Photo: Flickr user lifeontheedge

Tuesday, March 04, 2008

The Final Wikipedia Article

Wikipedia Weekly is a podcast (available on iTunes). It is also the single best source for Wikipedia commentary.

This week's episode is actually the first in several months, but boy is it worth it.

It starts with some reassuring financial news:

The foundation is now audited properly; I think it's doing quite well. They're running around San Francisco drumming up support, drumming up big donors ... it's being run more professionally ... There is enough goodwill out there, there are enough rich people who are willing to fund us if we ask. And the foundation is now going about doing that.

But the really awesome part is this discussion around the 40-minute mark. It was so good I transcribed it:

- There's only so many articles you can actually make -- a lot of people see wikipedia as being "finished", in a way. Once you conquer everything, what is left to conquer?

- "All the virgin areas have at least been addressed", right? In terms of the knowledge of humanity. And you're kind of replenishing things on top of it that are current events, and you might be shoring up some parts of it, but the major work has been done. So how does the project survive when most of the work has been done?

- I'll quote that back to you in 20 years, fuzzy, when wikipedia is -- you know, owns its own mountain. Like that famous guy who closed down the patent office in like 1890 and said "oh, there's nothing left to invent; it's just a classification scheme from now on".

- I understand that sentiment, but you have to admit that with 2 million-some articles, most of the sum of human knowledge has been put down on virtual paper in Wikipedia. Right? You can't argue against that. That is a fact!

- No. I argue against that.

- The sister projects also have a big reach that they have to cover -- wikispecies, wikiquotes, wikiversity.

- But I think that's one of the fallacies of the wikiprojects -- that you can suddenly turn to these people who are interested in writing about Pokemon and sex acts and cartoons and Naruto and say "now, take the time that you had, and start making a taxonomy of human life". No -- you just can't do that, because they're not interested in it. Or "come in and write definitions".

- It's because those people who were interested in those things were the first to arrive, the first adopters of Wikipedia. But millions of people out there -- think of the academics in religious studies. They haven't caught on to this yet, they haven't come to this. When the baby boomers all start retiring and have the time to actually edit Wikipedia, and they know it and they're used to the idea, they'll fill out all their stuff. There are whole areas that aren't nearly as covered as Naruto and molecular biology, and we're all aware of that. It doesn't mean that we're nearly finished, and we don't need to take those people who were on Naruto and divert them onto this new topic. There's lots more to be done.

- Well, I think there's more to be done. I'm not sure there's lots more to be done, that's the thing.

- There's heaps more -- we've just scratched the surface.

- No, I think we're 80% done with the surface (laughs). The sum of human knowledge is a finite thing, right? It's growing, but it's finite.

- No! No!

- What about the articles that'll happen about future events, crazy kids who have parties and get themselves arrested and stuff like that?

- But then that becomes current events, keeping up with the headlines. And that's my passion, I love writing articles about current events as a role in history of what's going on, but if we have 2-point-some million articles now, do you see it being 20 million articles in five years?

- Yes!

- You do? I actually don't.

- What else is there to write about?

- We're seeing the top of the S-curve already, Liam! We've already done podcasts about this. We're hitting the top of the S-curve, you can see that in the statistics. You can't make it to 20 million in five years given the curve we see now.

- Tell me in five years. Tell me the same thing in five years.

- Alright; this is a virtual handshake.

- You can't be the patent office man and say, "I know what's going to happen in the future". I can't tell you which articles haven't been made, because they haven't been made yet.

- The difference is that with patents, with creating inventions, that's synthesis. It's creating new things out of elements that you have right now. If you're talking about the sum of all human knowledge, that's a very different enterprise. You're documenting what is known in the known world. It's finite, but it's growing -- but it's not growing at the exponential rate all the time, it's not infinitely growing in exponential ways.

- But you're assuming that the only way wikipedia would grow is by adding new articles.

- Well, it has to, unless you think that the cat article should grow from five pages to ten pages to 20 pages to fifty pages.

- Or we get more loose with our standards and we allow every single thing ever to become an article.

- Quantity does not equal quality.

- I agree with you somewhat and I think there's going to be a new type of article. You already have articles about africa, then you have articles about the economics of africa; and then you're going to have comparisons of african economics to blah blah blah, or "the evolution of african economics through history". So you're going to have some "analysis" articles that are growing --

- -- precisely. Precisely.

- Those type of things might grow, but those are tougher to write. And that's why I think the growth will be somewhat limited.

- They'll be tougher to write and they'll be slower to write, but they're just as important!

- I agree.

- If not more important. We've got the article on africa, and we've got the article on cat, but we haven't got the article on all types of variants in the history of the cat, or the economy of africa, or whatever.

- We'll have a special gambling episode where we all bet on how much we think it's going to grow in the future. There used to be a pool on wikipedia that predicted when we're going to hit 1 million, 5 million, etc. I think it's still around, and we should take a look at that.

- And not just that: there was a page called the "millionth article pool", which was not just about when it would happen but what that article would be entitled. (laughter)

- There's also the final article pool. (laughter)

- The final article pool. What is the last article in wikipedia going to be about?

- 42!


David Gerard said...

The "Wikipedia is a work in progress" essay has links to missing article lists at the bottom. One guess is at least 20 million possible articles we've yet to write, easily.

Every state-level politician ever in every country? Every town in every country, the way we have complete coverage of towns in the US? Every article in every existing encyclopedia and every other Wikipedia? The WikiProject missing topic lists? The requested articles? Editors' own lists?

It's not even the low-hanging fruit that's been picked - it's the fruit that was just lying on the ground. There's plenty still in arm's reach.

Bored? Policy-weary? Write something.

llywrch said...

David Gerard's comment misses an important point: the "low-hanging fruit" theory doesn't apply to Wikipedia content in general, but to the current mode of content creation.

Based on my years of experience both improving content & watching or helping other editors do the same, our current mode of production is based on the premise of "I have a spare hour right now; what can I do?" So contributions tend to be of the easy-to-do variety: fix misspellings & typos, format articles, sort stubs. A few more imaginative types (like me) will grab a book & add material from it to relevant articles as they read through the source.

But what about creating articles from scratch? Improving articles on general topics? Or writing articles on topics absent from most encyclopedias which require combining material -- while avoiding the problems of original research? This requires a lot more time, effort, & discipline. But what if after all of that effort the contributor runs into serious problems, say another editor starts making drastic revisions to this work? (Often the revisions are improvements, but almost as often they are not.) Yes, we are all supposed to check our egos at the door before we click on the "edit" button, but it would be wrong to say that this always -- or even most of the time -- happens.

Retreating to the corners of Wikipedia to do this kind of work only delays the problem. An example is the current ArbCom case concerning PHG: an editor did just that & now a number of other editors are concerned about the quality of his work. (At least one of his articles under suspicion even passed a Featured Article review.) Anything a constructive contributor can do, so can a tendentious one.

Even if this were not the case, I can tell you from experience that working on articles in these little-frequented areas gets lonely; it would be nice to get some useful & timely input on the many articles I have worked on concerning Ethiopia. Or even receive more than the occasional clue that my contributions are being read. As a result, when I find that I have the time to start working on new articles for which I have collected ample source material, recently I have decided to spend my time on other things.

This is how we lose the dedicated editors who actually try to reach up into the tree.


Sage said...


If you want to know how much your articles are being read, there's a hit count utility up now. (I don't know when it went live, but I just noticed it a few days ago.)

Brian Mingus said...

.2% of articles are Featured class.
.7% of articles are A class.
.3% of articles are GA class.
4.4% of articles are B class.
26% of articles are Start class.
68% of articles are Stubs.

It's extremely easy to create a stub, but extremely hard to write an article. The number of "articles" by the loose definition used is a poor measure of how much work has to be done. The number of edits per article has been increasing faster than the number of articles created for years, and that trend will continue.