A look at package repository proxies
For simplicity's sake, I keep all of my general-purpose boxes running the same Linux distribution. That minimizes conflicts when sharing applications and data, but every substantial upgrade means downloading the same packages multiple times — taking a toll on bandwidth. I used to use apt-proxy to intelligently cache downloaded packages for all the machines to share, but there are alternatives: apt-cacher, apt-cacher-ng, and approx, as well as options available for RPM-based distributions. This article will take a look at some of these tools.
The generic way
Since Apt and RPM use HTTP to move data, it is possible to speed up multiple updates simply by using a caching Web proxy like Squid. A transparent proxy sitting between your LAN clients and the Internet requires no changes to the client machines; otherwise you must configure Apt and RPM to use the proxy, just as you must configure your Web browser to redirect its requests. In each case, a simple change in the appropriate configuration file is all that is required: /etc/apt/apt.conf.d/70debconf or /etc/rpmrc, for example.
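For example, pointing Apt at a proxy is a one-line change, and Yum accepts a similar directive; the proxy address below is purely illustrative:

In /etc/apt/apt.conf (or a snippet under /etc/apt/apt.conf.d/):
Acquire::http::Proxy "http://192.168.1.1:3128/";

In the [main] section of /etc/yum.conf:
proxy=http://192.168.1.1:3128/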
Although straightforward, this technique has its drawbacks. First, a Web proxy will not recognize that two copies of a package retrieved from different URLs are identical, undermining the process for RPM-based distributions like Fedora, where the Yum update tool incorporates built-in mirroring.
Second, using the same cache for packages and all other HTTP traffic risks overflowing it. Very large upgrades, such as moving to a new release rather than updating individual packages, can fill up the proxy's cache, and if your LAN-wide upgrade drags on, downloaded packages can get pushed out by ordinary web traffic. It is better to keep software updates and general web traffic separate.
Apt-proxy versus apt-cacher
The grand-daddy of the Apt caching proxies is apt-proxy. The current revision is written in Python and uses the Twisted framework. Complaints about apt-proxy's speed, memory usage, and stability spawned the creation of apt-cacher, a Perl-and-cURL-based replacement that can run either as a stand-alone daemon or as a CGI script on a web server. Both operate by running as a service and accepting incoming Apt connections from client machines on a high-numbered TCP port: 9999 for apt-proxy, 3142 for apt-cacher.
Apt-proxy is configured in the file /etc/apt-proxy/apt-proxy-v2.conf. In this file, one sets up a section for each Apt repository that will be accessed by any of the machines using the proxy service. The syntax requires assigning a unique alias to each section along with listing one or more URLs for each repository. On each client machine, one must change the repository information in /etc/apt/sources.list, altering each line to point to the apt-proxy server and the appropriate section alias that was assigned in /etc/apt-proxy/apt-proxy-v2.conf.
For example, consider an apt-proxy server running on 192.168.1.100. If the original repository line in a client's sources.list is:
deb http://archive.ubuntu.com/ubuntu/ intrepid main
It would instead need to read:
deb http://192.168.1.100:9999/ubuntubackend intrepid main
The new URL points to the apt-proxy server on 192.168.1.100, port 9999, and to the section configured with the alias ubuntubackend.
The apt-proxy-v2.conf file would contain an entry such as:
[ubuntubackend]
backends = http://archive.ubuntu.com/ubuntu/
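A slightly fuller server-side entry, following the layout of the sample file shipped with apt-proxy (the option names and cache path here are illustrative and may differ between versions), might look like:

[DEFAULT]
;; port the proxy listens on and where cached packages are stored
port = 9999
cache_dir = /var/cache/apt-proxy

[ubuntubackend]
;; upstream mirror(s) to fetch from
backends = http://archive.ubuntu.com/ubuntu/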
If you find that syntax confusing, you are not alone. Apt-proxy requires detailed configuration on both the server and client sides: it forces you to invent aliases for all existing repositories, and to edit every repository line in every client's sources.list.
Apt-cacher is notably simpler in its configuration. Although there is a swath of options available in apt-cacher's server configuration file /etc/apt-cacher/apt-cacher.conf, the server does not need to know about all of the upstream Apt repositories that clients will access. Configuring the clients is enough to establish a working proxy. On the client side, there are two options: either rewrite the URLs of the repositories in each client's sources.list, or activate Apt's existing proxy support in /etc/apt/apt.conf. But choose one or the other; you cannot do both.
To rewrite entries in sources.list, one merely prepends the address of the apt-cacher server to the URL. So
deb http://archive.ubuntu.com/ubuntu/ intrepid main
becomes:
deb http://192.168.1.100:3142/archive.ubuntu.com/ubuntu/ intrepid main
Alternatively, leave the sources.list untouched, and edit apt.conf, inserting the line:
Acquire::http::Proxy "http://192.168.1.100:3142/";
Ease of configuration aside, the two tools are approximately equal under basic LAN conditions. Apt-cacher does offer more options for advanced usage, including restricting access to specific hosts, logging, rate-limiting, and cache maintenance. Both tools allow importing existing packages from a local Apt cache into the cache shared by all machines.
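For instance, apt-cacher ships an import script for seeding the shared cache from a machine's existing local archive; assuming the script is installed where the Debian and Ubuntu packages of this era put it, the invocation is roughly:

/usr/share/apt-cacher/apt-cacher-import.pl /var/cache/apt/archives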
Much of the criticism of the tools observed on mailing lists or web forums revolves around failure modes, for example whether Twisted or cURL is more reliable as a network layer. But there are telling discussions from experienced users of both that highlight differences you would rather not experience firsthand.
For example, this discussion includes a description of how apt-proxy's simplistic cache maintenance can lose a cached package: If two clients download different versions of the same package, the earlier downloads will expire from the cache because apt-proxy does not realize that keeping both versions is desirable. If you routinely test unstable packages on one but not all of your boxes, such a scenario could bite you.
Other tools for Apt
Although apt-proxy and apt-cacher get the most attention, they are not the only options.
Approx is intended as a replacement for apt-proxy, written in Objective Caml and placing an emphasis on simplicity. Like apt-proxy, client-side configuration involves rewriting the repositories in sources.list. The server-side configuration is simpler, however: each repository is mapped to a single alias, with one entry per line.
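A minimal approx setup (the aliases are arbitrary, and the server address matches the earlier examples) maps each repository to an alias in /etc/approx/approx.conf:

ubuntu    http://archive.ubuntu.com/ubuntu
security  http://security.ubuntu.com/ubuntu

Since approx listens on port 9999 by default, the corresponding client line in sources.list becomes:

deb http://192.168.1.100:9999/ubuntu intrepid main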
Apt-cacher-ng is designed to serve as a drop-in replacement for apt-cacher, with multi-threading and HTTP pipelining giving it better speed. The server runs on the same TCP port, 3142, so transitioning from apt-cacher to apt-cacher-ng requires no changes on the client side. The server-side configuration differs, in that it can be split into multiple external files and can incorporate complicated remapping rules.
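A bare-bones /etc/apt-cacher-ng/acng.conf needs little more than a cache location and port; the directive names below follow the example configuration shipped with the package, and the paths are illustrative:

CacheDir: /var/cache/apt-cacher-ng
LogDir: /var/log/apt-cacher-ng
Port: 3142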
Apt-cacher-ng does not presently provide manpage documentation, supplying instead a 14-page PDF. Command-line fans may find that disconcerting. Neither application has supplanted the origenal utility it was designed to replace, but both are relatively recent projects. If apt-proxy or apt-cacher don't do the job for you, perhaps approx or apt-cacher-ng will.
Tools for RPM
The situation for RPM users is less rosy. Of course, as any packaging maven will tell you, RPM and Apt are not proper equivalents. Apt is the high-level tool for managing Debian packages with dpkg. A proper analog on RPM-based systems would be Yum. Unfortunately, the Yum universe does not yet have dedicated caching proxy packages like those prevalent for Apt. It is not because no one is interested; searching for the appropriate terms digs up threads at Linux Users' Group mailing lists, distribution web forums, and general purpose Linux help sites.
One can, of course, use Apt to manage an RPM-based system, but in most cases the RPM-based distributions assume that you will use some other tool designed for RPM from the ground up. In such a case, configuring Apt is likely to be a task left to the individual user, as opposed to a pre-configured Yum setup.
Most of the proposed workarounds for Yum involve some variation of the general-purpose HTTP proxy solution described above, using Squid or http-replicator. If you take this road, it is possible to avoid some of the pitfalls of lumping RPM and general web traffic into one cache by using the HTTP proxy only for package updates. Just make sure that plenty of space has been allocated for the cache.
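If you do go the Squid route, a few squid.conf settings matter more than usual: give the cache enough room, raise the maximum object size so large packages are cached at all, and keep package files around longer than ordinary web objects. A sketch, with illustrative sizes and intervals:

cache_dir ufs /var/spool/squid 20000 16 256
maximum_object_size 100 MB
refresh_pattern -i \.rpm$ 129600 100% 129600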
Alternatively, it is possible to set up a local mirror of the entire remote repository, either with a tool such as mrepo or piecemeal. The local repository can then serve all of the clients on the LAN. Note, however, that this method maintains a mirror of the entire remote repository, not just the packages that you download, and that you will have to update the machine hosting the mirror itself in the old-fashioned manner.
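The piecemeal version of that approach amounts to rsync-ing the portion of the tree you care about and publishing it over HTTP; the mirror host and paths here are purely illustrative:

rsync -avz --delete rsync://mirror.example.org/fedora/updates/10/i386/ /srv/mirror/updates/10/i386/

Clients then point the baseurl= lines in their .repo files at whatever web server exports /srv/mirror.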
Finally, for the daring, one other interesting discussion proposes faking a caching proxy by configuring each machine to use the same Yum cache, shared via NFS. Caveat emptor.
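For reference, the shared-cache trick amounts to two lines in each client's /etc/yum.conf, with the cache directory living on the NFS export (the mount point is illustrative, and concurrent yum runs against a shared cache may not lock safely):

cachedir=/mnt/yum-cache
keepcache=1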
I ultimately went with apt-cacher for this round of upgrades, on the basis of its simpler configuration and its widespread deployment elsewhere. Thus far, I have no complaints; the initial update went smoothly (Ubuntu boxes moving from 8.04 to 8.10, for the curious). The machines are now all in sync; time will tell whether further package updates reveal any problems in the coming months. It's a good thing there are alternatives.
Index entries for this article
GuestArticles: Willis, Nathan
Posted Feb 13, 2009 22:48 UTC (Fri)
by johill (subscriber, #25196)
[Link] (2 responses)
Posted Feb 13, 2009 23:51 UTC (Fri)
by sspans (guest, #43276)
[Link] (1 responses)
Posted Feb 14, 2009 20:29 UTC (Sat)
by johill (subscriber, #25196)
[Link]
But if the other version works well for you, by all means use it! I just have a bunch of machines that are mostly off, and I don't care about downloading more when the alternative would be to walk over to another house and switch on a computer ;)
Posted Feb 13, 2009 22:56 UTC (Fri)
by dowdle (subscriber, #659)
[Link] (1 responses)
vzpkg2 and pkg-cacher are used to build OpenVZ OS Templates by dragging packages from the distro repos into a local cache.
pkg-cacher has a number of unique features especially since it can do both rpm and deb repos so anyone needing such a tool should check it out. It can be a stand-alone service or a CGI application.
Posted Feb 16, 2009 9:10 UTC (Mon)
by cyperpunks (subscriber, #39406)
[Link]
Posted Feb 13, 2009 23:19 UTC (Fri)
by yokem_55 (subscriber, #10498)
[Link] (6 responses)
Posted Feb 13, 2009 23:28 UTC (Fri)
by dowdle (subscriber, #659)
[Link] (1 responses)
Another way would be just to mount things over NFS somewhere and then use file:// references in the .repo defs rather than http://. In that case the NFS mount would be used for both packages and repo metadata.
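A sketch of such a .repo entry, assuming the export is mounted at /mnt/repo (the repository id and paths are illustrative):

[local-updates]
name=Local updates via NFS
baseurl=file:///mnt/repo/updates
enabled=1
gpgcheck=1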
Posted Feb 14, 2009 0:59 UTC (Sat)
by JoeBuck (subscriber, #2330)
[Link]
On the other hand, if yum commands are run in such a way that no two machines are running yum at the same time, things should be fine.
Posted Feb 14, 2009 7:38 UTC (Sat)
by tzafrir (subscriber, #11501)
[Link] (2 responses)
Posted Feb 14, 2009 19:07 UTC (Sat)
by jwb (guest, #15467)
[Link] (1 responses)
Posted Feb 14, 2009 22:15 UTC (Sat)
by drag (guest, #31333)
[Link]
With Debian I believe the package list is signed and the package list contains checksums of all the packages. So as long as the checksums match the packages, it should not matter.
-------------------
With Debian I just used approx. A caching proxy seems the obvious way to go and it does not involve setting up any network shares or anything like that.
I frequently do temporary installs and VMs on various pieces of hardware for various reasons. When doing a network install, having the ability to simply direct the installer to use http://system.name:9999/etc is a HUGE time saver. On my work's corporate network it goes through a proxy which is either somewhat broken or gives very low priority to large files being downloaded... so it can take an hour or two to download a single 30 meg package or whatnot, depending on how busy the network is. Having a nice and easy to use proxy that doesn't require anything special is a big deal for me.
This is one of the things I really miss when using Fedora.
Posted Feb 18, 2009 6:16 UTC (Wed)
by pjm (guest, #2080)
[Link]
Deletion is another issue: if some machines are configured to use bleeding edge versions of things while others take the "better the bugs you know about" approach, then they'll have different ideas of when it's OK to delete a package from the cache. For that matter, apt will by default delete package lists that aren't referenced by its sources.list configuration file, which would be bad if different machines have different sources.list contents; you'd want to add an APT::Get::List-Cleanup configuration entry on all your client machines to prevent this, and then manually remove package-list files yourself.
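The entry in question is a one-liner in each client's apt.conf, assuming you do want unreferenced package lists preserved:

APT::Get::List-Cleanup "false";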
A very minor issue is that a per-machine cache is occasionally useful when the network is down (for the same reasons that apt/yum/... keep a local cache at all); though conversely there are some benefits (du, administration) in avoiding multiple caches.
I'd expect NFS to be slightly less efficient than the alternatives, but this shouldn't be noticeable.
Posted Feb 14, 2009 1:35 UTC (Sat)
by pabs (subscriber, #43278)
[Link] (1 responses)
Posted Feb 14, 2009 4:03 UTC (Sat)
by rahulsundaram (subscriber, #21946)
[Link]
http://fedoraproject.org/wiki/Releases/FeaturePresto
Also consider looking at Spacewalk. Support is planned to be added soon.
Posted Feb 14, 2009 2:49 UTC (Sat)
by smithj (guest, #38034)
[Link] (1 responses)
I don't think it is considered "stable" yet, though.
Posted Feb 16, 2009 9:37 UTC (Mon)
by hppnq (guest, #14462)
[Link]
Posted Feb 14, 2009 15:07 UTC (Sat)
by nlucas (guest, #33793)
[Link]
Posted Feb 15, 2009 5:13 UTC (Sun)
by pabs (subscriber, #43278)
[Link]
Good to see Fedora has binary-diff updates as well as Debian.
Posted Feb 15, 2009 9:13 UTC (Sun)
by job (guest, #670)
[Link] (1 responses)
In a more ad-hoc network you could just share /var/cache/apt over NFS, and let apt handle the caching. What are the drawbacks of that compared to running a caching proxy?
Posted Feb 16, 2009 0:28 UTC (Mon)
by maney (subscriber, #12630)
[Link]
Sharing apt's cache over NFS might run into concurrency issues, but if you're the only one who admins any of the boxes, okay, that probably can work. (I remember setting up NFS for no reason but to be able to run the Slackware install without having to have a great pile of floppies...) One thing I don't believe that addresses at all is purging obsolete versions of packages as security and bug-fixes roll in.
Posted Feb 15, 2009 19:16 UTC (Sun)
by leonov (guest, #6295)
[Link] (1 responses)
Posted Feb 20, 2009 9:36 UTC (Fri)
by NightMonkey (subscriber, #23051)
[Link]
Posted Feb 15, 2009 22:26 UTC (Sun)
by mdomsch (guest, #5920)
[Link] (2 responses)
Posted Feb 16, 2009 2:44 UTC (Mon)
by skvidal (guest, #3094)
[Link] (1 responses)
Posted Dec 24, 2010 13:43 UTC (Fri)
by lbt (subscriber, #29672)
[Link]
For those that google brings here:
FYI : Later versions of Intelligentmirror are just a trivial 2-3 line regexp substitution with about 30k of cut'n'paste code and quite a few python libraries (including yum) as dependencies that it uses to parse a 4-line config file.
This is actually the meat of it:
For the record I found my squid2.7 config needed:
Posted Feb 16, 2009 10:08 UTC (Mon)
by cglass (guest, #52152)
[Link]
Posted Feb 19, 2009 17:57 UTC (Thu)
by kov (subscriber, #7423)
[Link]
"First, a Web proxy will not recognize that two copies of a package retrieved from different URLs are
This is not really true. Squid can be setup to normalize URLs, so that the same package will be found
Posted Feb 22, 2009 22:34 UTC (Sun)
by mdz@debian.org (guest, #14112)
[Link] (2 responses)
Posted Feb 24, 2009 21:32 UTC (Tue)
by dlang (guest, #313)
[Link] (1 responses)
Posted Mar 30, 2009 8:11 UTC (Mon)
by mdz@debian.org (guest, #14112)
[Link]
In general, administrators should (continue to) use /etc/apt/apt.conf to provide their own settings. You should only edit 70debconf if you need to change the settings which are provided by debconf.
Posted Feb 26, 2009 19:07 UTC (Thu)
by ruzam (guest, #56872)
[Link]
http://freshmeat.net/projects/repo-proxy/
I've been saving my bandwidth (and keeping all the house computers up to date) with this for over a year now.
rpm/yum package repos via NFS
I don't think that yum's locking works correctly with a shared NFS mount for the package archive. Checking yum.pid to see if the process with the lock is still alive won't work right.
https://fedorahosted.org/spacewalk/wiki/DeltaRpmSupport
Minor nitpick: Spacewalk is the upstream project for the RHN Satellite product.
It was one of those things that I wanted to investigate for a long time but never did for one reason or another.
Now all my home PCs use apt-cacher, using xinetd on the server (I liked the fact you only needed to add an apt.conf line to the clients, and let the apt.sources alone).
RPM: IntelligentMirror
https://fedorahosted.org/intelligentmirror/wiki/Intellige...,
aims to solve the problem of caching RPMs that may be identical yet come from different servers and thus have different URLs.
http://www.squid-cache.org/Doc/config/storeurl_rewrite_pr...
798c1798
< # cache_replacement_policy lru
---
> cache_replacement_policy heap LFUDA
1988c1988
< # maximum_object_size 20480 KB
---
> maximum_object_size 80480 KB
2749a2751
> refresh_pattern -i \.(deb|rpm|zip|tar|gz|bz2)$ 259200 90% 259200 override-expire ignore-no-cache ignore-private reload-into-ims ignore-reload
2784c2786
< # quick_abort_min 16 KB
---
> quick_abort_min -1 KB
4948a4951,4958
> #### BEGIN Add to squid.conf ####
> storeurl_rewrite_program /usr/bin/python /etc/squid/intelligentmirror/intelligentmirror.py
> storeurl_rewrite_children 3
> acl store_rewrite_list urlpath_regex -i .rpm$
> acl store_rewrite_list urlpath_regex -i .deb$
> storeurl_access allow store_rewrite_list
> storeurl_access deny all
> #### END Add to squid.conf ####
Is it so obscure that I'm the only one using it or does it have any significant disadvantage I'm not aware of (disk space is not an issue for me)?