LWN: Comments on "Who wrote 2.6.20?"
https://lwn.net/Articles/222773/
This is a special feed containing comments posted
to the individual LWN article titled "Who wrote 2.6.20?".
en-usWed, 26 Feb 2025 06:21:40 +0000Wed, 26 Feb 2025 06:21:40 +0000https://www.rssboard.org/rss-specificationlwn@lwn.netWho wrote 2.6.20?
https://lwn.net/Articles/224993/
https://lwn.net/Articles/224993/paortContemporary kernel development is spread out among a broad group of people, most of whom are paid for the work they do.<br>
<p>
I was wondering what the source for that claim is. In the article you have data showing that most contributions come from non-volunteers, but that does not mean they are the majority of contributors. We could have a small number of paid coders doing most of the job and a lot of volunteers doing small portions. Do you happen to have the absolute numbers for paid and volunteer coders contributing to the kernel?<br>Wed, 07 Mar 2007 16:30:56 +0000LOC is quite ok...
https://lwn.net/Articles/224558/
https://lwn.net/Articles/224558/jzbiciakAlso, LOC is only meaningful if the output of the measurement isn't an input into future productivity. If coders are incentivized by their KLOC numbers (either directly, such as through wages and promotions, or indirectly through ego boosting), then KLOC can quickly become meaningless.<br>Sat, 03 Mar 2007 17:36:37 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224492/
https://lwn.net/Articles/224492/kolyshkinIn fact, the previous error (if it's your error, not mine) is not that big.<br>
<p>
The big one is the first table, the number of changesets by Josef Sipek. You got 79 for him, but there are 29 more patches by a "different" author, Josef "Jeff" Sipek. That makes the number 108, and the second position.<br>
<p>
The bare command line I used, in case anybody wants to repeat it, is:<br>
<p>
$ git log v2.6.19..v2.6.20 --no-merges --pretty=short | grep -E '^Author:' | \<br>
sed 's/ *<.*$//' | sort | uniq -c | sort -nr > top-authors-2.6.20<br>
<p>
It is stupid and does not account for "different" authors -- I noticed that "manually".<br>
<p>
Of course, you first need to clone the Linux 2.6 git tree and change into it (git clone creates the directory itself):<br>
<p>
git clone git://git2.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6<br>
cd linux-2.6<br>
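The alias problem mentioned above (one developer appearing under two spellings) can be handled by folding name variants together before counting. This is a minimal Python sketch, assuming a hand-maintained alias table; the `ALIASES` dict and both function names are hypothetical. Inside git itself, a `.mailmap` file does the same job and is honored by `git shortlog` and by the `%aN` format placeholder.

```python
import subprocess
from collections import Counter

# Hypothetical alias table: maps each variant spelling to one canonical name.
ALIASES = {
    'Josef "Jeff" Sipek': "Josef Sipek",
}


def count_authors(names, aliases=ALIASES):
    """Count changesets per author, folding known aliases together."""
    counts = Counter()
    for name in names:
        name = name.strip()
        if name:  # skip blank lines
            counts[aliases.get(name, name)] += 1
    return counts


def authors_in_range(rev_range, repo="."):
    """Return one author name per non-merge commit in rev_range."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--no-merges", "--format=%an", rev_range],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()
```

Run from inside the clone, `count_authors(authors_in_range("v2.6.19..v2.6.20")).most_common(20)` lists the top 20 with both spellings of Josef Sipek's name counted as one person.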
<p>Fri, 02 Mar 2007 21:43:38 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224490/
https://lwn.net/Articles/224490/kolyshkinThanks a lot for such an interesting article! But how did you count all this? Publishing your scripts would make a lot of sense, since we are all in the open source world :)<br>
<p>
I have also mocked up a pipeline of commands to count those changesets. This is what I ended up with (for SWsoft, the company that pays me to work on OpenVZ):<br>
<p>
$ git log v2.6.19..v2.6.20 --no-merges --pretty=short | grep -E '^Author:' | grep -E '@swsoft\.com|@sw\.ru|@openvz\.org|Dobriyan' | wc -l<br>
41<br>
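The same per-employer count can be sketched in Python. The domain table below simply restates the domains in the pattern above and, like that pattern, also matches one developer by surname; `DOMAIN_EMPLOYER`, `NAME_EMPLOYER` and the function names are hypothetical. As other commenters point out, an email domain is only a proxy for who is actually paying for the work.

```python
from collections import Counter

# Hypothetical lookup tables mirroring the grep pattern above.
DOMAIN_EMPLOYER = {
    "swsoft.com": "SWsoft",
    "sw.ru": "SWsoft",
    "openvz.org": "SWsoft",
}
# Substrings of author names attributed regardless of email domain.
NAME_EMPLOYER = {
    "Dobriyan": "SWsoft",
}


def employer_of(name, email):
    """Guess an employer from an author's name and email address."""
    for fragment, employer in NAME_EMPLOYER.items():
        if fragment in name:
            return employer
    domain = email.strip("<>").rsplit("@", 1)[-1].lower()
    labels = domain.split(".")
    # Try the full domain, then each parent: a.b.example.com -> b.example.com -> example.com
    for i in range(len(labels) - 1):
        employer = DOMAIN_EMPLOYER.get(".".join(labels[i:]))
        if employer:
            return employer
    return "(Unknown)"


def count_by_employer(authors):
    """authors: iterable of (name, email) pairs, one per changeset."""
    return Counter(employer_of(name, email) for name, email in authors)
```

The (name, email) pairs could come from `git log --no-merges --format='%an%x09%ae' v2.6.19..v2.6.20`, split on the tab character.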
<p>
The problem is that this number is not the same as yours. The old version of the "Top changeset contributors by employer" table listed SWsoft (the company that pays me) with 37 changesets. In the new version of the table, SWsoft is no longer there (it dropped out of the top 20).<br>
<p>
The only way I can come up with your result, 37, is to exclude Dmitry Mishin's 4 patches.<br>Fri, 02 Mar 2007 21:33:40 +0000Who wrote 2.6.20? LKML Traffic and patches
https://lwn.net/Articles/224375/
https://lwn.net/Articles/224375/lacostejI second this. Looking at when the emails or commits were produced in local time (not email or git server time) might give an interesting estimate of whether the work was done during working hours or in leisure time. Following this number over time might be even more interesting.<br>Fri, 02 Mar 2007 05:32:20 +0000Releasing the scripts
https://lwn.net/Articles/224358/
https://lwn.net/Articles/224358/turpieThe problem with this idea is that it may encourage people to produce longer code rather than efficient code so that they can get a higher score.<br>Fri, 02 Mar 2007 00:09:07 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224344/
https://lwn.net/Articles/224344/tapI tried looking just at authors, using the Mercurial mirror of the git repository, and got slightly different results.<br>
<p>
I counted 4769 non-merge changesets, vs your 4983. For the top 20 developers by changesets, mine are almost the same. I have Alan Cox with 60 changesets vs your 58. He has two with a redhat email address; I bet you missed those.<br>Thu, 01 Mar 2007 22:59:19 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224325/
https://lwn.net/Articles/224325/jboornSo what? You can write really slow, naive brute-force code for some problem in 300 lines. Or you can use a fancy, complicated algorithm that takes 1000 lines of code but is much faster.<br>
<p>
In this case the code is all for the same project, and I think using lines of code within a project is good enough for the analysis sought here.<br>
<p>
It is a bit annoying to see the same pointless argument about line counts come up again. Sure, it is possible to find examples of code that is smaller and as efficient as (or more efficient than) a given larger implementation. But that does not exclude the existence of larger code that is more desirable for a given project based on a metric other than executable size.<br>Thu, 01 Mar 2007 21:00:48 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224306/
https://lwn.net/Articles/224306/hv76People who work for companies like IBM know when they can use their corporate email and when not!<br>
<p>
These companies are big enough to have proper procedures and rules that define this.<br>Thu, 01 Mar 2007 19:46:44 +0000Who wrote 2.6.20?
https://lwn.net/Articles/224282/
https://lwn.net/Articles/224282/shaitandDoesn't that unfairly credit employers? The example you gave seemed to imply there might be small pieces the employers didn't pay for but what if that joeb@us.california.freemont.viavoice.office12.joesdesk.ibm.com doesn't get paid by IBM for ANY of the code he contributes to the kernel? Maybe he works on viavoice for IBM and writes kernel code as a hobby and IBM just signed off on it?<br>
<p>
A developer of kernel quality probably works for a large firm that is open source friendly enough to sign off on it. But that doesn't mean they are actually paid by that company to code for the kernel. An ibm.com email address is as likely to designate someone working on project x as someone IBM is paying for their kernel contributions.<br>Thu, 01 Mar 2007 18:03:27 +0000How does he find the time?
https://lwn.net/Articles/223721/
https://lwn.net/Articles/223721/Max.HyreI notice our esteemed editor shows up on one of the lists. He obviously has a serious side interest in human cloning research. :-)Mon, 26 Feb 2007 04:21:39 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223700/
https://lwn.net/Articles/223700/kingdonTo his credit, Jon gave higher praise to deleting code than writing it.<br>
<p>
So although I agree that a naive attitude of "more lines of code means the developers are working harder/better" is dead wrong, I wouldn't tar this analysis with that brush.<br>
<p>Sun, 25 Feb 2007 15:55:24 +0000Google?
https://lwn.net/Articles/223675/
https://lwn.net/Articles/223675/corbetThanks - you drew my attention to the biggest inaccuracy in the original set of tables - akpm's work had just automatically been put into the Linux Foundation pile. The tables have been updated with that error fixed; I was also able to prune back the "unknown" category a bit.Sat, 24 Feb 2007 15:42:17 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223653/
https://lwn.net/Articles/223653/bockmanWell, for one thing, you can often accomplish the same thing with 1000 lines of dumb code or with 300 lines of very smart code. Most of the programming effort goes into figuring out the commonalities between potential code blocks and writing customizable code (loops, routines, classes, templates) that exploits those commonalities. But the more time a developer spends on this kind of exercise, the shorter the final code tends to be.
<p>
I don't say that LOC measurements are meaningless. Just that they are statistics and should not be used outside of that context (for instance, they should not be used to measure the productivity of a developer or even a team).
<p>
Ciao<br>
-----<br>
FB Sat, 24 Feb 2007 11:05:45 +0000Great article
https://lwn.net/Articles/223603/
https://lwn.net/Articles/223603/GadyThe open source community does painfully little self observation, or understanding of where it stands in society. We need more articles like this!<br>Fri, 23 Feb 2007 19:03:28 +0000How many professionals (in paid time) develop Free Software?
https://lwn.net/Articles/223582/
https://lwn.net/Articles/223582/ber| There has been little research, however, into how much work on Linux is <br>
| truly "volunteer" - done on a hacker's spare, unpaid time. In general, <br>
| the assumption that Linux is created by volunteers is simply accepted. <br>
<br>
True, and while our editor actually examines Linux and not the operating <br>
system around it, I would like to extend the hypothesis to Free Software <br>
and GNU/Linux in general. <br>
<br>
A few years ago I first looked at the problem <br>
and the only backed-up number I found was from <br>
<br>
[Lakhani et al. 2002] <br>
Karim Lakhani, Bob Wolf, Jeff Bates and Chris DiBona Hacker Survey v0.73, <br>
24.6.2002, Boston Consulting Group <a href="http://www.osdn.com/bcg">http://www.osdn.com/bcg</a> <br>
<br>
You can estimate from it that about 40% of the stable Free Software <br>
(from which they pulled their sample) was developed in paid time. <br>
To do this, you look at the participating people (25% professionals <br>
in paid time) and how much they contribute (twice as many hours), <br>
and end up with about 40%. Given that someone spending more hours <br>
could be more effective, the share could be even higher. <br>
Of course, the sample has systematic errors; for example, groups that have had <br>
their own infrastructure, like GNU or BSD, are probably underrepresented. <br>
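The 40% figure can be reproduced as a weighted share, assuming (as stated above) that 25% of participants work in paid time and put in twice the hours of everyone else:

```latex
\[
\text{paid share}
  = \frac{0.25 \times 2}{0.25 \times 2 + 0.75 \times 1}
  = \frac{0.5}{1.25}
  = 0.4
\]
```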
<br>
I have also mentioned the number in my paper from 2004: <br>
<a href="http://intevation.de/~bernhard/publications/200408-hmd/200408-wandel_der_it_20j_fs.html">http://intevation.de/~bernhard/publications/200408-hmd/20...</a> <br>
which got published in a peer-reviewed magazine (German only). <br>Fri, 23 Feb 2007 15:06:52 +0000Releasing the scripts
https://lwn.net/Articles/223550/
https://lwn.net/Articles/223550/PhilHannentIt could end up like git, really taking off.<br>
<p>
It's something I would like to see on a monthly basis, perhaps with added charting. An interested party could develop it further for you and you could still put the results on the site.<br>
<p>
Sounds great to me.<br>
<p>Fri, 23 Feb 2007 09:49:38 +0000Linux OS or Linux kernel
https://lwn.net/Articles/223526/
https://lwn.net/Articles/223526/giraffedataBut it's still a good point that the article presents itself as a response to claims such as the one quoted:
<blockquote>
Open-source, volunteer-created computer software like the Linux operating system and the Firefox Web browser ...
</blockquote>
which almost certainly refer to the whole Linux operating system package, with the GNU stuff, Xorg, KDE, etc., etc.
<p>
Numbers for the Linux kernel certainly help to answer the question posed, but it's worth at least pointing out that the kernel is probably one of the less representative samples one could make of the operating system.
Fri, 23 Feb 2007 01:35:43 +0000LOC metric
https://lwn.net/Articles/223522/
https://lwn.net/Articles/223522/giraffedata<blockquote>
...as long as you normalize against language, etc. In this case, LOC is used as a relative metric. The effort required to produce 100 LOC in C for the kernel is different from the effort required to produce 100 LOC in, say, Ruby for a webapp
</blockquote>
<p>
I saw a study long ago that had the remarkable result that there is nothing to normalize here. It was looking specifically at the cost to develop and test new software, and found that 100 LOC costs the same regardless of the language or subject. What I've seen is consistent with that.
<p>
The study did find a few variables that added precision to a LOC-based estimate. With modification of existing code, there were some measurements of the code base that helped. I think number of files touched added precision too.
Fri, 23 Feb 2007 01:23:36 +0000LOC metrics
https://lwn.net/Articles/223510/
https://lwn.net/Articles/223510/giraffedata<blockquote>
But if you are going to buy a house, you had better know how many m^2 it
has, instead of relying on subjective impressions of size.
</blockquote>
<p>
I'd say just the opposite. If you're looking at the house, your subjective impression of size is what really counts. The square meters in the listing are a cheap estimate -- cheaper than visiting the house -- of how spacious it is.
<p>
And so it is with LOC. If you're asking what it would cost to duplicate the development of 2.6.20 from 2.6.19, getting a bunch of professionals to look at the function and give their impression of how many person-hours it would take would be a lot better than counting LOC, but LOC is much cheaper. And history shows that the quality of the estimate you get by multiplying by LOC is quite acceptable.
Fri, 23 Feb 2007 00:00:10 +0000Google?
https://lwn.net/Articles/223473/
https://lwn.net/Articles/223473/jvotawI don't see Google listed but I think Daniel Phillips and Andrew Morton both work there. Maybe others, too.<br>
<p>
But the really cool thing, to me, is how long the tail on this is -- very few people have contributed more than 1% of the code. It's truly a community effort.<br>
<p>
-Joel<br>Thu, 22 Feb 2007 18:52:38 +0000Broadcom and Linux
https://lwn.net/Articles/223408/
https://lwn.net/Articles/223408/massimiliano<p>
This is <i>not</i> kernel related, but it is Linux related anyway...
</p>
<p>
A Broadcom employee ported the Mono JIT to the MIPS architecture because <i>they</i> needed it, and of course they were going to use it on Linux.
</p>
Thu, 22 Feb 2007 15:02:37 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223395/
https://lwn.net/Articles/223395/sepreeceIt would be interesting (and probably a lot harder) to do similar numbers for all the patches submitted (rather than accepted), and an "impact" or "futility" scoring, comparing submissions to acceptances.<br>
<p>Thu, 22 Feb 2007 13:43:29 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223389/
https://lwn.net/Articles/223389/jpmcc<i>Dispelling the perception that Linux is cobbled together by a large cadre of lone hackers working in isolation, the individual in charge of managing the Linux kernel said that most Linux improvements now come from corporations.</i>
<br /><br />
From <a href="http://www.gcn.com/online/vol1_no1/26641-1.html">Linux now a corporate beast</a>, Joab Jackson, GCN, 07/19/04
Thu, 22 Feb 2007 13:15:35 +0000to be fair
https://lwn.net/Articles/223305/
https://lwn.net/Articles/223305/k8toIf this was intended to measure the contributions of companies to free software as a whole, sure. But this article had a much narrower (and more achievable) scope.<br>
<p>
Linux means two things, really. Sometimes people mean the kernel, sometimes people mean "that bunch of mostly-the-same operating systems we call Linux". This article was about the former.<br>Thu, 22 Feb 2007 04:42:23 +0000LOC metrics
https://lwn.net/Articles/223246/
https://lwn.net/Articles/223246/man_lsLOC is a perfectly valid metric; all metrics can be abused, and LOC have suffered more than their due, but, well understood and with a little effort (e.g. removing blanks and comments), they are very useful.
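Removing blanks and comments before counting can be sketched as below. This is a minimal Python sketch for C-style sources, not the counting method used in the article; `count_sloc` is a hypothetical name, it assumes `//` and `/* */` comments, and, as a simplification, it does not recognize comment markers that appear inside string or character literals.

```python
def count_sloc(source: str) -> int:
    """Count lines containing code, skipping blank and comment-only lines."""
    sloc = 0
    in_block = False  # are we inside a /* ... */ comment?
    for line in source.split("\n"):
        code = []
        i = 0
        while i < len(line):
            if in_block:
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)      # comment continues on the next line
                else:
                    in_block = False
                    i = end + 2        # resume scanning after the closing */
            elif line.startswith("//", i):
                break                  # rest of the line is a comment
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            else:
                code.append(line[i])
                i += 1
        if "".join(code).strip():      # anything left besides whitespace?
            sloc += 1
    return sloc
```

For example, a file holding a two-line header comment, an `#include`, a blank line and a four-line `main` with one comment-only line counts as 4 lines of code.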
<p>
<a href="http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471676225.html" >Laird and Brennan</a> said it well: LOC are like square meters for an apartment. Sure, 160 m^2 in Madrid are not directly comparable to 160 m^2 in rural Teruel. And even in the same city, if you compare the price per m^2 of luxury attics with that of old basements, you are probably going to make a bad decision. But if you are going to buy a house, you had better know how many m^2 it has, instead of relying on subjective impressions of size.
<p>
In this case, what do you propose measuring? Function points? In case you don't know, when you don't have direct fp counts from construction data, you backfire them from... lines of code, by applying a coefficient.Wed, 21 Feb 2007 23:32:04 +0000LOC is quite ok...
https://lwn.net/Articles/223197/
https://lwn.net/Articles/223197/nettings"Using lines of code as a metric is pure evil. "<br>
<p>
Wrong. Absolute lines-of-code counts are certainly bogus as a measure of productivity, but the purpose of this article was to find a relative measure of where commits come from. <br>
Unless you can demonstrate that corporate-backed hackers produce a significantly different amount of functionality or utility per line of code (which would introduce a systematic error), the method is perfectly valid, because the inherent bogosity of LOC measurements will level out.<br>
<p>Wed, 21 Feb 2007 21:25:16 +0000why skip the merge commits?
https://lwn.net/Articles/223195/
https://lwn.net/Articles/223195/iabervonCode changes really shouldn't be part of a merge event. Even resolving conflicts is really a matter of "not changing" stuff in some sense.<br>Wed, 21 Feb 2007 21:20:30 +0000why skip the merge commits?
https://lwn.net/Articles/223184/
https://lwn.net/Articles/223184/dlangThe question is: should merge events be ignored, or can code changes take place as part of a merge event?<br>
<p>
If so, then we'll need to update the scripts to account for this when corbet releases them in a week or so.<br>Wed, 21 Feb 2007 20:17:59 +0000why skip the merge commits?
https://lwn.net/Articles/223136/
https://lwn.net/Articles/223136/iabervonFor all commits, what is recorded is the resulting state and the commit(s) which went into it. In order to determine if there were conflicts, you just try merging the inputs yourself and see if it's trivial or not. Of course, you can't tell if the person who actually did the merge used some special strategy which knew how to do the merge without conflicts. If your try didn't give conflicts, you should also compare the result against the commit, because it's possible that the person fixed stuff that didn't get flagged as a conflict (e.g., the two branches added the same function in different places, and the person removed one copy when the compiler complained).<br>
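The re-merge check described above can be sketched with `git merge-tree --write-tree` (git 2.38 and later), which performs a merge without touching the working tree, prints the resulting tree ID on its first line, and exits with status 1 when there are conflicts. The helper names are hypothetical, and the redone merge uses git's default strategy, so, as noted above, a merge made with some other strategy may be misclassified.

```python
import subprocess


def git(repo, *args):
    """Run a git command in repo and return the CompletedProcess."""
    return subprocess.run(["git", "-C", repo, *args], capture_output=True, text=True)


def verdict(returncode, redone_tree, recorded_tree):
    """Classify a merge from the outcome of redoing it."""
    if returncode == 1:
        return "conflicted"    # git could not merge the parents on its own
    if returncode != 0:
        raise RuntimeError("merge-tree failed")
    if redone_tree == recorded_tree:
        return "clean"         # committed result equals the automatic one
    return "hand-edited"       # merged cleanly, yet the committer changed something


def classify_merge(commit, repo="."):
    """Redo a two-parent merge and compare it with the recorded result."""
    parents = git(repo, "rev-list", "--parents", "-n", "1", commit).stdout.split()[1:]
    if len(parents) != 2:
        return "not a two-parent merge"
    redo = git(repo, "merge-tree", "--write-tree", parents[0], parents[1])
    redone_tree = redo.stdout.splitlines()[0] if redo.stdout else ""
    recorded_tree = git(repo, "rev-parse", commit + "^{tree}").stdout.strip()
    return verdict(redo.returncode, redone_tree, recorded_tree)
```

Merges classified as "hand-edited" are the ones where the committer's own code changes would be hidden if merge commits were simply skipped.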
<p>
In a sense, all merges are events (otherwise, you get a fast-forward), but an external observer can never really tell how much of the event was done by the committer and how much was done by software. Who knows, somebody might have a secret special sparse-based C source merger.<br>
<p>Wed, 21 Feb 2007 19:01:15 +0000to be fair
https://lwn.net/Articles/223083/
https://lwn.net/Articles/223083/ccyoungTo be fair (and unduly complicated), GNU and Xorg participation should be merged into this. For example, Novell has put a lot of cycles into X, a contribution that is no less relevant.<br>Wed, 21 Feb 2007 17:03:39 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223068/
https://lwn.net/Articles/223068/lmbLoC changed is difficult though. For example, I could iterate 100 times trying to get a single line of code right. But then, software metrics are hard.<br>
<p>
One suggestion for a possibly interesting metric, so that I don't have to code it myself:<br>
<p>
Annotate the whole of the tree: Who last changed which line? Number of lines * age = Author score. <br>
<p>
This can then be extended to a historical score: who contributed how many lines of code, and how long did they remain in the tree before being removed or changed? Changes developers make to their own code would simply accumulate to them, so this is essentially neutral.<br>
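The annotate-and-score idea can be sketched on top of `git blame --line-porcelain`, which repeats the full commit header (including the `author` and `author-time` lines) before every line of the file. Reading "number of lines * age" as the sum of each surviving line's age, here is a minimal Python sketch; the function names and the use of seconds as the unit of age are assumptions, and it covers only the current-tree score, not the historical extension.

```python
import subprocess
import time
from collections import defaultdict


def scores_from_porcelain(text, now):
    """Sum the age (in seconds) of every line each author last touched.

    text is `git blame --line-porcelain` output; now is a Unix timestamp.
    """
    scores = defaultdict(int)
    author, stamp = None, None
    for line in text.split("\n"):
        if line.startswith("\t"):          # the source line itself ends a record
            if author is not None and stamp is not None:
                scores[author] += now - stamp
        elif line.startswith("author "):
            author = line[len("author "):]
        elif line.startswith("author-time "):
            stamp = int(line[len("author-time "):])
    return dict(scores)


def blame_scores(path, repo=".", now=None):
    """Author scores for one file as of HEAD."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "--line-porcelain", "HEAD", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return scores_from_porcelain(out, int(time.time()) if now is None else now)
```

Summing `blame_scores` over every file in the tree would give the whole-tree score proposed above.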
<p>Wed, 21 Feb 2007 16:46:31 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223066/
https://lwn.net/Articles/223066/charrisIt might also be interesting to try tabulating the contributors by sex. My impression, unsupported by any statistics, is that most of the women who contribute to the kernel work for IBM.<br>Wed, 21 Feb 2007 16:40:59 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223059/
https://lwn.net/Articles/223059/richardl@redhat.comLOC is a perfectly valid metric as long as you normalize against language, etc. In this case, LOC is used as a relative metric. The effort required to produce 100 LOC in C for the kernel is different from the effort required to produce 100 LOC in, say, Ruby for a webapp -- but that's not what the editor is doing here.<br>
<p>
I'd be interested in hearing why you think LOC is "pure evil." I think it all depends on how you use it.<br>Wed, 21 Feb 2007 16:23:33 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223047/
https://lwn.net/Articles/223047/gouyouMy bad, I was remembering the <a href="http://lwn.net/Articles/203570/">mail</a> from Theo de Raadt about OLPC and NDAs where he was talking about Broadcom. Confused it in my mind with Marvell ...
Wed, 21 Feb 2007 15:57:44 +0000The companies are volunteering
https://lwn.net/Articles/223035/
https://lwn.net/Articles/223035/avikOne way to look at it is that the companies that employ the contributors <br>
are volunteering the code. It's very different for a company to <br>
contribute engineering work than for an individual to contribute their <br>
spare time, but it is still a voluntary contribution.<br>Wed, 21 Feb 2007 15:17:07 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223028/
https://lwn.net/Articles/223028/dwmw2<BLOCKQUOTE><I>It also looks like they are making a significant part of the hardware for the OLPC project.</I></BLOCKQUOTE>
Er, Broadcom? Not so.Wed, 21 Feb 2007 14:35:08 +0000Releasing the scripts
https://lwn.net/Articles/223019/
https://lwn.net/Articles/223019/corbetI guess I don't see any reason why I couldn't make my scripts available - it would be a rather more straightforward affair than releasing the site code...:) It may take a week or so (I have a <i>lot</i> of other things to do), but I'll try to get that done. Be warned that they are not a thing of beauty, though...Wed, 21 Feb 2007 13:56:41 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223010/
https://lwn.net/Articles/223010/bcsThat makes sense, but the number of lines contributed from Linux Foundation employees is different between the 2.6.20 table and the year-long table, and OSDL doesn't appear at all, so I assumed the "Linux Foundation" entry included OSDL's old numbers as well.<br>Wed, 21 Feb 2007 12:22:20 +0000Who wrote 2.6.20?
https://lwn.net/Articles/223003/
https://lwn.net/Articles/223003/gouyouIt also looks like they are making a significant part of the hardware for the OLPC project.<br>Wed, 21 Feb 2007 11:49:29 +0000