Photo: Flickr user lifeontheedge

Saturday, September 09, 2006

Developing Wikipedia literacy.

Awhile back I buried some how-to-read-wikipedia advice in this post:

Wikipedia articles provide an abundance of informal cues that other media lack, cues that can seem unprofessional at first glance. For example, the USA Today article's main image shows (without adjacent comment) the paper's "Miners Survive!" headline gaffe. Since the purpose of the picture is to illustrate the paper, platonically, wouldn't a more neutral photo be better? Yes, but I'm not about to shoot one -- which is precisely the point: nobody cares much about USA Today; it doesn't have any advocates.

USA Today doesn't have any serious detractors, either, so the pic will probably change within a year. (If it were at the center of a controversy it wouldn't have lasted a day.) This type of informal information bubbles up all over. Warren and Livonia are both suburbs of Detroit, but Livonians don't think of themselves that way.

(Sure enough, the USA Today pic has changed.)

Tuesday, September 05, 2006

That "Wikipedia's written by the public" thing got picked up on Slashdot, where it birthed a few fresh ideas:

Personally, I think this policy of focusing on total edits for Wikiality is brilliant: it keeps the generalists/prestige mongers focusing on copy editing -- where they can help -- and away from content creation -- where they usually can't. Wikipedia is largely the creation of a bunch of specialist nuts. The "Wiki-Elite" are the nuts whose speciality is Wikipedia. Better to keep them away from the content; otherwise, it's akin to having someone with a degree in journalism reporting on a technical issue.

- DingerX

I think the problem is arising because of lack of distinction between two different types of "editors." There are people who edit the content of an article (content editors), and there are people who edit the copy (copyeditors). One is concerned with altering the actual material that is being presented to present a different subset of information. The other is concerned with making edits for grammatical consistency, readability, and style.

- gEvil

Wikipedia could help crack this whole logjam with some simple user interface improvements. Each titled section should have a "trackback" link for linking to it in another page (eg. if I linked/quoted it in this post). They've already got the "id" HTML tag. In fact, each paragraph should have a "link/quote me" link, maybe even a link that adds an ID to a sentence, phrase or paragraph fragment upon linking to it.

Wikipedia is an "open reference" site. It should include much more support for embedding its content into other content. Each entry could have stats of who links/quotes to it. And an interface with a customizable formula with user-specified weighting to factors like linking/quoting, editing, initiating, commenting. Then we could all easily use the Wikipedia at a meaningful level of granularity, encouraging much more quoting (which encourages more chance of editing by a wider audience), and backfeeding more data about how Wikipedia is created and used.

- Doc Ruby

With all that history data available, why doesn't Wikipedia have a "blame annotation" mode so I can see who last touched a given line of an article, and when?

- arodland
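
For the curious, here is a rough sketch of what such a "blame" view might compute. It is not anything Wikipedia provides; it assumes the article history is available as (author, timestamp, text) tuples, oldest first, and the function name, authors, and dates are made up for illustration. It leans on Python's standard difflib to carry line ownership forward from one revision to the next.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Annotate each line of the latest revision with who last touched it.

    `revisions` is a hypothetical input format: an iterable of
    (author, timestamp, text) tuples, oldest first.
    """
    lines, annotations = [], []
    for author, timestamp, text in revisions:
        new_lines = text.splitlines()
        matcher = SequenceMatcher(None, lines, new_lines, autojunk=False)
        new_annotations = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep their earlier annotation.
                new_annotations.extend(annotations[i1:i2])
            else:
                # Inserted or replaced lines belong to this revision's author;
                # a pure deletion has j1 == j2 and adds nothing.
                new_annotations.extend([(author, timestamp)] * (j2 - j1))
        lines, annotations = new_lines, new_annotations
    return [(who, when, line) for (who, when), line in zip(annotations, lines)]


# Made-up history: "alice" writes two lines, "bob" rewrites one and adds one.
history = [
    ("alice", "2006-09-01", "Line one\nLine two"),
    ("bob", "2006-09-05", "Line one\nLine 2\nLine three"),
]
for who, when, line in blame(history):
    print(f"{who:8} {when}  {line}")
```

A line that is deleted and later restored would be credited to whoever restored it, which is how most blame tools behave as well.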

Wikipedia really is written by the public -- if this tremendously important bit of research is independently confirmed, it upsets conventional thinking.

Wales seems to think that the vast majority of users are just doing the first two (vandalizing or contributing small fixes) while the core group of Wikipedians writes the actual bulk of the article. But that's not at all what I found. Almost every time I saw a substantive edit, I found the user who had contributed it was not an active user of the site. They generally had made less than 50 edits (typically around 10), usually on related pages. Most never even bothered to create an account.

To investigate more formally, I purchased some time on a computer cluster and downloaded a copy of the Wikipedia archives. I wrote a little program to go through each edit and count how much of it remained in the latest version. Instead of counting edits, as Wales did, I counted the number of letters a user actually contributed to the present article.
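
The essay doesn't show the program itself, so what follows is only a minimal sketch of the letter-counting idea, not the author's actual code. It assumes each revision is available as an (author, full text) pair in chronological order, and it uses Python's difflib to decide which characters survive from one revision to the next; the function name and sample data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def surviving_letters(revisions):
    """Count how many characters of the final text each author contributed.

    `revisions` is an assumed input format: a list of (author, full_text)
    pairs, oldest first.
    """
    text, owners = "", []  # owners[i] is the author of text[i]
    for author, new_text in revisions:
        matcher = SequenceMatcher(None, text, new_text, autojunk=False)
        new_owners = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Characters carried over keep their original author.
                new_owners.extend(owners[i1:i2])
            else:
                # New or replaced characters are credited to this editor;
                # deletions (j1 == j2) credit nobody.
                new_owners.extend([author] * (j2 - j1))
        text, owners = new_text, new_owners
    return Counter(owners)


# Made-up example: one substantive edit each, then an edit that changes nothing.
revisions = [
    ("outsider", "Anacondas are large snakes."),
    ("insider", "Anacondas are large snakes found in South America."),
    ("formatter", "Anacondas are large snakes found in South America."),
]
print(surviving_letters(revisions))
```

Counting edits would score all three users equally here; counting surviving letters credits only the two who added text. Character-level diffs are a crude heuristic, so text that is moved or reverted may be credited to the wrong person.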

If you just count edits, it appears the biggest contributors to the Alan Alda article (7 of the top 10) are registered users who (all but 2) have made thousands of edits to the site. Indeed, #4 has made over 7,000 edits while #7 has over 25,000. In other words, if you use Wales's methods, you get Wales's results: most of the content seems to be written by heavy editors.

But when you count letters, the picture dramatically changes: few of the contributors (2 out of the top 10) are even registered and most (6 out of the top 10) have made less than 25 edits to the entire site. In fact, #9 has made exactly one edit -- this one! With the more reasonable metric -- indeed, the one Wales himself said he planned to use in the next revision of his study -- the result completely reverses.

I don't have the resources to run this calculation across all of Wikipedia (there are over 60 million edits!), but I ran it on several more randomly-selected articles and the results were much the same. For example, the largest portion of the Anaconda article was written by a user who only made 2 edits to it (and only 100 on the entire site). By contrast, the largest number of edits were made by a user who appears to have contributed no text to the final article (the edits were all deleting things and moving things around).

When you put it all together, the story becomes clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site -- the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it's the outsiders who provide nearly all of the content.

And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job, you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.

On the other hand, everyone has a bunch of obscure things that, for one reason or another, they've come to know well. So they share them, clicking the edit link and adding a paragraph or two to Wikipedia. At the same time, a small number of people have become particularly involved in Wikipedia itself, learning its policies and special syntax, and spending their time tweaking the contributions of everybody else.

Other encyclopedias work similarly, just on a much smaller scale: a large group of people write articles on topics they know well, while a small staff formats them into a single work. This second group is clearly very important -- it's thanks to them encyclopedias have a consistent look and tone -- but it's a severe exaggeration to say that they wrote the encyclopedia. One imagines the people running Britannica worry more about their contributors than their formatters.

And Wikipedia should too. Even if all the formatters quit the project tomorrow, Wikipedia would still be immensely valuable. For the most part, people read Wikipedia because it has the information they need, not because it has a consistent look. It certainly wouldn't be as nice without one, but the people who (like me) care about such things would probably step up to take the place of those who had left. The formatters aid the contributors, not the other way around.

Sunday, September 03, 2006

Flickr's added geotagging and there are already millions of photos on the map. Two thoughts spring immediately to mind:

1. Someone needs to mash this up with Wikimapia. (It can be done; there's an API.)

2. The hill ahead of Digital Universe keeps getting steeper.