Below are the findings from my recent disambiguation analysis. There were no real surprises, but a few conclusions did stand out for me.
Conclusions
- Between January 2007 and May 2009, the number of articles increased by 64%, and the number with "(disambiguation)" in their names increased by 66%, suggesting that the need for disambiguation has grown in line with the total number of articles.
- United States-related terminology was common: "United States" was the most common country to appear, and individual state names turned up more often than the second country in the list, the United Kingdom. The trend, however, has been to remove state names from bracketed disambiguation: in January 2007, seven different states were more common than "United States", and one state (Missouri) was found more often than all 50 states put together are today.
- Music-related terminology was the most common theme: three of the top four terms were music-related. These accounted for just under a sixth of the whole population, and just under a fifth of terms used more than once (53,789).
- The need to disambiguate between people was clear: a profession was a component of a sixth of the sample, though the profession-born-year notation was surprisingly uncommon (only 1,437 times in the entire population), considering how often it causes controversy.
- Since January 2007, the following other shifts in usage have occurred:
  - movie has successfully been amalgamated into the term film;
  - both game and computer game have fallen out of use, many instances becoming video game;
  - football player has become footballer;
  - every use of constituency (there were 565) now specifies what it is a constituency of;
  - many uses of single have been phased out in favour of alternatives such as song, or the more specific EP;
  - MO, a contraction, has been deprecated;
  - television has been replaced by more specific alternatives;
  - the names of sports such as ice hockey and American football have been some of the biggest gainers.
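The counts behind these conclusions come from tallying the bracketed term at the end of each article title. The original tooling is not described here, so the following is only a minimal sketch of how such a tally could be made; the function names and the sample titles are hypothetical and are not taken from the analysed data.

```python
import re
from collections import Counter

# Matches a trailing bracketed qualifier, e.g. "Mercury (planet)" -> "planet".
QUALIFIER = re.compile(r"\(([^()]+)\)\s*$")


def qualifier(title):
    """Return the trailing bracketed term of a title, or None if absent."""
    match = QUALIFIER.search(title)
    return match.group(1).strip() if match else None


def count_qualifiers(titles):
    """Count how often each bracketed term appears across the titles."""
    terms = (qualifier(t) for t in titles)
    return Counter(t for t in terms if t is not None)


# Hypothetical sample titles, for illustration only (not the analysed data).
sample = [
    "Mercury (planet)",
    "Mercury (element)",
    "Help! (album)",
    "Thriller (album)",
    "Jaws (film)",
    "Mercury",
]

counts = count_qualifiers(sample)
total = sum(counts.values())
for term, n in counts.most_common(3):
    print(f"{term}: {n} ({n / total:.0%} of bracketed titles)")
```

Running the same tally over title lists from two different dates would give the kind of before-and-after comparison made in the points above.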
Tabulated data
Top 50
The above list is imperfect, and some entries towards the lower end of the table may be missing because of errors carried forward from the sampling.
Special groupings
n.b. All counts in the table above are from the sample, not the full population.