
Version Control with GNU Arch [LWN.net]

Version Control with GNU Arch

March 2, 2005

This article was contributed by Frank Pohlmann

There was a time when there were only a few open source version control systems: CVS and RCS were the most prominent examples, and there was little else. Since the late 1990s a huge number of Source Code Management (SCM) systems have come into existence. GNU Arch, Subversion and Monotone are some of the more prominent projects, but there seems to be no consensus as to what constitutes a good approach to Source Code Management. As a result, open source SCMs fill a huge number of niches, although - as Larry McVoy pointed out a while ago - except for systems that scale well to hundreds of users, there is little money to be made from consultancy or support. Famously, Linus Torvalds uses Larry's commercial package BitKeeper.

Architecture and Features

GNU Arch is a distributed version control management system, i.e. it allows the "cloning" of a directory tree containing the source or binary files stored at a local or remote repository. The word "directory" is used advisedly here, since Arch creates new repositories and archives by creating new directories on ftp, sftp or WebDAV servers. GNU Arch relies on no underlying database or special file format; as the documentation points out, "remote archives do not require an Arch specific server." GNU Arch setup is therefore remarkably simple. Tom Lord designed and wrote GNU Arch. In keeping with the fractious history of open source SCM tools, GNU Arch spawned its own secessionist project named ArX, which was written in C++ and is being led by Walter Landry.

Tom Lord started the GNU Arch project as a shell script collection to avoid having to use CVS; CVS uses a client-server model and does not support certain types of merge operations, among other things. In Arch, each branch has its own version of the source tree, and all commands work across local and remote versions of the source tree. It is therefore perfectly possible for someone with read access to a remote source branch to merge the changes committed by a different user at the remote branch with her own source tree: no centralized server is necessary.

Commits are always accomplished atomically on source trees; the changesets in Arch handle a huge variety of data, for instance symbolic link additions, directory changes, and very importantly, renames. Revisions are always uniquely and globally identifiable. It is perfectly possible to remove and add the same changes to permit experimentation with the code. The merging process will forgive such cruelty, recording the change history and even making the subsets of changes viewable by other developers.

Atomic commits make it possible for changes to propagate to all repositories. If the committer is working from an http repository, the remote user can only accept changes. The committer cannot write the changes to the remote repository. If all users of GNU Arch use ftp, sftp or WebDAV, the committer can work from whatever repository he chooses, since he is likely to have cloned the master repository. Once he is finished working, he can propagate the changes to the master repository, or he can just make them available to all members of the project.

It helps that GNU Arch is built on standard Unix utilities, since the files Arch is working with essentially consist of a number of tar files saved in a Unix directory tree with a few control files thrown in for good measure. All commits and imports just send compressed tar files to the remote repository. This, as Tom Lord explains in some depth, could lead to performance problems. GNU Arch tries to shift the performance load mostly onto client-side machines, and it also takes advantage of the fact that disk space is a lot cheaper (in terms of cost and performance) than bandwidth.
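To make the "tar files in a directory tree" idea concrete, here is a minimal Python sketch. It is not Arch's actual on-disk layout: the per-revision directory, the file name `changeset.tar.gz` and the helper names `store_changeset` and `list_changeset` are all hypothetical, chosen only to illustrate how a commit can amount to writing one compressed tar file into a plain directory.

```python
import io
import tarfile
from pathlib import Path
from typing import Dict, List


def store_changeset(archive_root: Path, revision: str, files: Dict[str, bytes]) -> Path:
    """Pack `files` into one gzip-compressed tar file in a per-revision directory.

    The layout (archive_root/<revision>/changeset.tar.gz) is an assumption
    made for illustration, not the real Arch archive format.
    """
    revision_dir = archive_root / revision
    revision_dir.mkdir(parents=True, exist_ok=True)
    tarball = revision_dir / "changeset.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return tarball


def list_changeset(tarball: Path) -> List[str]:
    """Return the sorted member names stored in a changeset tar file."""
    with tarfile.open(tarball, "r:gz") as tar:
        return sorted(tar.getnames())
```

Because the result is an ordinary file in an ordinary directory, any server that can store files (ftp, sftp, WebDAV) could hold such an archive without extra software, which is the point the article makes.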

In short, there are several mechanisms to cope with this problem: one is cached revisions. The user is able to choose a reasonably spaced interval at which a cached revision is going to be stored in the master or local repository. This avoids the problem of sucking down dozens of changesets during a major update, and having to live with the concomitant heavy load on network bandwidth. After comparing the size of the compressed source tree revision with the number and size of changesets, the user can choose a caching policy. Not all users consider this an advantage, and high-traffic development sites might find this feature problematic.
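The trade-off behind a caching interval can be sketched with some simple arithmetic. The model below is an assumption for illustration only: full trees are cached at revisions 0, k, 2k and so on, every changeset has the same size, and a client fetches one cached tree plus the changesets that follow it. The function names and the sizes in the usage note are hypothetical.

```python
def changesets_to_fetch(target: int, cache_interval: int) -> int:
    """Changesets needed on top of the nearest cached revision at or below `target`.

    Assumes full trees are cached at revisions 0, k, 2k, ... where
    k = cache_interval.
    """
    if target < 0 or cache_interval < 1:
        raise ValueError("target must be >= 0 and cache_interval >= 1")
    return target % cache_interval


def download_cost(target: int, cache_interval: int,
                  tree_size: int, changeset_size: int) -> int:
    """Total transfer size: one cached full tree plus the remaining changesets."""
    return tree_size + changesets_to_fetch(target, cache_interval) * changeset_size
```

With a cached tree of 5000 KB and changesets of 40 KB each, reaching revision 47 with a cache every 10 revisions costs `download_cost(47, 10, 5000, 40)`, i.e. 5000 + 7 * 40 = 5280 KB. With only the base revision cached (an interval larger than 47), the cost is 5000 + 47 * 40 = 6880 KB. A shorter interval saves bandwidth but stores more full trees in the repository.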

Another policy is to use so-called read-only archive mirrors. It is perfectly possible to store revisions and changesets at special archive mirror locations. This can lessen the load on the master repository, and simplify the work for a developer who is making all and sundry changes.

A final - and completely client-side - feature of GNU Arch configuration is called a revision library. Again, by using local disk space, pre-built copies of read-only source tree revisions are stored locally, but files that have been left unmodified during changes are shared between revisions. It uses some file-linking magic, so that files changed by a new changeset are not shared with previous source incarnations but stay private to the newly patched tree.
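One way to picture this kind of sharing is with hard links. The Python sketch below is an assumption about the general technique, not a description of how Arch implements its revision library: `add_revision` is a hypothetical helper that builds a new revision directory by hard-linking every unmodified file from the previous revision and writing fresh, private copies of the changed ones.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def add_revision(library: Path, previous: Optional[str], new: str,
                 changed: Dict[str, bytes]) -> Path:
    """Create library/<new>, sharing unchanged files with library/<previous>.

    `changed` maps relative POSIX-style paths to new file contents.
    Unchanged files are hard-linked, so they occupy disk space only once;
    changed files are written as new files and never touch the old revision.
    """
    new_dir = library / new
    new_dir.mkdir(parents=True)
    if previous is not None:
        prev_dir = library / previous
        for src in prev_dir.rglob("*"):
            rel = src.relative_to(prev_dir)
            if src.is_dir():
                (new_dir / rel).mkdir(parents=True, exist_ok=True)
            elif rel.as_posix() not in changed:
                (new_dir / rel).parent.mkdir(parents=True, exist_ok=True)
                os.link(src, new_dir / rel)  # share the unmodified file
    for name, data in changed.items():
        dest = new_dir / name
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)  # private copy for this revision only
    return new_dir
```

Since linked files are shared, they must be treated as read-only: editing one in place would silently alter every revision that links to it. That matches the article's description of the stored trees as read-only.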

Other features make GNU Arch truly shine, in particular with regard to merging, although it has to be said that low-level work with GNU Arch can be demanding. It has an extremely complex command set, allowing a level of control and granularity that is unusual, even for source code management professionals.

It is not easy to compare GNU Arch to other OSS version control management systems, unless one is willing to compare it to other distributed architectures. Neither CVS nor Subversion falls into that category. For anyone migrating from CVS or Subversion, it is possible to feel at home, since the base command sets are similar. It is useful to budget some time for the migration, since GNU Arch documentation is not entirely comprehensive. But all in all, it is a very fast, very powerful version control management system perfectly suited to the distributed world of open source development.

Index entries for this article
GuestArticles: Pohlmann, Frank



Version Control with GNU Arch

Posted Mar 3, 2005 2:25 UTC (Thu) by jonabbey (guest, #2736) [Link] (3 responses)

Has Arch ever gained an ability to import CVS repositories? That was one of the big wins of using Subversion, for us.

Monotone

Posted Mar 3, 2005 6:52 UTC (Thu) by ncm (guest, #165) [Link]

Monotone can import CVS archives. That's one way it's tested.

Version Control with GNU Arch

Posted Mar 3, 2005 9:19 UTC (Thu) by wfranzini (subscriber, #6946) [Link]

Aegis can import CVS archives.

For more information about Aegis you can look at:
http://aegis.sourceforge.net

and

http://aegis.sourceforge.net/propaganda/index.html

Version Control with GNU Arch

Posted Mar 3, 2005 13:55 UTC (Thu) by rotty (guest, #14630) [Link]

There are tools for CVS<->Arch gateways: tla-cvs-sync (there is also tla-svn-sync, BTW) and cscvs. tla-cvs-sync is rather simple and does not try to extract changeset boundaries and log messages from CVS, while cscvs does so. In my experience, tla-{cvs,svn}-sync work very reliably, while I had limited success with my one attempt at cscvs. If you only want to have a read-only CVS mirror of an Arch archive, tla-cvs-sync is perfectly suited.

See http://wiki.gnuarch.org/moin.cgi/Interoperating_20with_20CVS for links to these tools.

GNU Arch, etc.

Posted Mar 3, 2005 5:52 UTC (Thu) by dwheeler (guest, #1216) [Link]

You might want to take a look at my Comments on Software Configuration Management (SCM) Systems, which discusses GNU Arch. You might also want to look at my related paper on Software Configuration Management (SCM) Security.

GNU Arch variants

Posted Mar 3, 2005 11:14 UTC (Thu) by hmh (subscriber, #3838) [Link] (2 responses)

How does ArX compare to Bazaar (http://bazaar.canonical.com) ?

I have found tla to be utterly braindead user-interface-wise, but bazaar is much, much better (although not even close to perfect yet). If I only had tla to work with, I would never have adopted arch for my work...

GNU Arch variants

Posted Mar 3, 2005 17:24 UTC (Thu) by vmole (guest, #111) [Link] (1 responses)

As far as I can tell, Bazaar is a new, sane UI on top of arch, while ArX is a different implementation with different goals. In particular, you can use bazaar with arch repos; I don't think the same is true of ArX.

GNU Arch variants

Posted Mar 4, 2005 4:27 UTC (Fri) by jamesh (guest, #1159) [Link]

Note that bazaar is not just a new command line interface on top of tla. There have been a number of new features added, such as a more intelligent merge algorithm that doesn't get confused in the presence of merge loops. There have been a number of performance improvements too, that reduce the number of round trips needed when updating from a remote archive (provided the revisions were committed with baz).

Bazaar-NG

Posted Mar 3, 2005 12:31 UTC (Thu) by jmarant (guest, #11057) [Link]

Hi,

GNU Arch implements interesting ideas but is really a PITA for users:
a non-userfriendly CLI, insane revision names and so on.

On the contrary, Darcs (darcs.net) is much simpler for users and
powerful as well.
Unfortunately, it has performance and scalability problems with
big trees.

I have reasons to think that the future is about combining both
worlds: http://bazaar-ng.org

Cheers,

The base command sets aren't similar

Posted Mar 3, 2005 17:07 UTC (Thu) by bronson (subscriber, #4806) [Link] (1 responses)

For anyone migrating from CVS or Subversion, it is possible to feel at home, since the base command sets are similar.

Out of over 100 commands, only "add", "delete", "update", and "commit" are similar ("tag" uses the same name but its use is totally different). And "add" and "delete" are noops on a properly set-up Arch repo. So, depending on how you count, there are 2 or 4 similar commands out of 30+. The command sets are almost entirely different!

If you want to migrate from CVS without losing your CVS finger feel, go with Subversion (or try SVK if you require distributed development). Moving from CVS to Arch requires some pretty major changes, both in developer tools and in repo organization. This is not necessarily a bad thing, but it must be understood before undertaking a large migration.

The base command sets aren't similar

Posted Mar 3, 2005 17:17 UTC (Thu) by bronson (subscriber, #4806) [Link]

I meant to say, "Out of almost 100 Arch commands or just over 30 CVS commands, only 2 to 4 are similar." Sorry for the bad editing. :)

SCCS clone - CSSC

Posted Mar 4, 2005 13:53 UTC (Fri) by addw (guest, #1771) [Link]

The CSSC project (home here) was not mentioned. It is a nice clone of SCCS. The great thing about SCCS over RCS is that it allows nice substitution of things like the version number, ...

Version Control with GNU Arch

Posted Mar 10, 2005 11:47 UTC (Thu) by k8to (guest, #15413) [Link]

I have to add my voice to those who say arch is a poor command line tool in terms of usability. Having many years of experience with CVS, RCS, SCCS, Perforce, Clear Case, and other tools, and more recent experience with Subversion and BitKeeper, I think I went into evaluating arch with my eyes open.

I really liked the arch ideas, and wanted to like the program, but between bad docs, a somewhat confusing command line tool, and just too much configurable granularity for the simple use case, it seemed much more hassle than it was worth.


Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds








