Photo: Flickr user lifeontheedge

Thursday, June 19, 2008

Deletionism by the numbers

Deletionism continues to annoy the public. Plus, they finally know it's called deletionism!

It wouldn't matter to me if it were just about the article being deleted. But it's about more than that, which makes me sad (see kitten).

Let's run the numbers.

Everyone who adds knowledge to Wikipedia does so for a reason. The motives might vary (that warm fuzzy feeling of helping humanity? enjoying the sound of one's own fingers on the keyboard?), but let's invent a generalized unit of contributor motivation called a kitten.

1 kitten = the amount of motivation needed to get 1 person to spend 1 minute trying to improve an article

We can say, quite literally, that Wikipedia runs on kittens. In fact, entrepreneurs discover this every day when they try to start a "crowdsourcing" site and nobody shows up.

So, what generates kittens? Foremost, it's the possibility of someone else learning from what you wrote -- not just immediately, but at any time in the future. (This point is vital; I'll come back to it.)

The Heavy Metal umlaut article contains a history section that begins:

The German progressive rock band Amon Düül II (aka Amon Duul II) released their first album in 1969. However, their name came from "Amon, an Egyptian sun god, and Düül, a character from Turkish fiction", so this use of umlauts was not gratuitous. The third part of Yes's progressive rock epic "Starship Trooper" is entitled "Würm" (on The Yes Album, released 1971). However, this again is probably not gratuitous, seemingly coming from the Würm glaciation.
That section is read by about 700 people per day.

Now, how many people will read that section over the entire course of history? It's actually possible to estimate this using calculus. (Even if, like me, you failed that course.)

Imagine that for every day that passes, there's a 1-in-ten-thousand chance that the Heavy Metal Umlaut article will vanish even without getting formally deleted (Wikimedia servers might perish in a global thermonuclear war, for example).

Now, if this has already happened, and you have somehow escaped Arthur Dent-style, you can key the data into your portable Hitchhiker's Guide and come out with something like this:



The area underneath the line is the total number of views. If the article has been around for five years, that's:

5 years x 365 days a year x 700 views a day = 1,277,500 views

That's a lot of kittens. And note that the sooner nuclear war happens, the fewer kittens there are (because who's going to write about umlauts when they should be stocking the fallout shelter?).

Even if you have no way of knowing the exact date the article will be destroyed, only the chance it has of surviving each day, you can still graph the expected number of times the article will be viewed on any particular day in the future.




Even with a 1-in-ten-thousand chance of blowing up each day, the article can expect to last more than a quarter of a century (10,000 days, on average)!
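For the curious, here's a rough sketch of that graph in code. It assumes, as above, a constant 700 views per day and a fixed daily chance p of vanishing; the function names are made up for illustration. The expected views on day t are just the daily views times the chance the article has survived that long.

```python
# Hypothetical sketch: constant daily views, fixed daily chance of vanishing.
DAILY_VIEWS = 700        # views per day, from the umlaut example
P_VANISH = 1 / 10_000    # assumed daily chance the article disappears


def survival_probability(day: int, p: float = P_VANISH) -> float:
    """Chance the article is still online after `day` days."""
    return (1 - p) ** day


def expected_views_on_day(day: int, views: float = DAILY_VIEWS,
                          p: float = P_VANISH) -> float:
    """Expected views on a future day: daily views times chance of surviving that long."""
    return views * survival_probability(day, p)


def expected_lifetime_days(p: float = P_VANISH) -> float:
    """Expected number of days the article is online: sum of (1-p)**t for t >= 0, i.e. 1/p."""
    return 1 / p


if __name__ == "__main__":
    for years in (0, 10, 25, 50, 100):
        day = years * 365
        print(f"year {years:3d}: {expected_views_on_day(day):6.1f} expected views/day, "
              f"{survival_probability(day):.1%} chance it still exists")
    print(f"expected lifetime: {expected_lifetime_days() / 365:.1f} years")
```

Under these assumptions the expected lifetime is 1/p = 10,000 days (about 27 years), and the chance of the article surviving a full century comes out around 2.6%.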

In the same way, you can find the expected total number of times the article will have been viewed by a particular day. This is just the area underneath the first graph:




With a 1-in-ten-thousand chance of being destroyed each day, the article can expect to rack up seven million views over its lifetime.
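Here's where the seven million comes from, as a worked equation under the same simplifying assumptions. The symbols are introduced here for illustration: v = 700 views per day, and p = 1/10,000 is the daily chance of vanishing. The expected total is a geometric series. The continuous version the calculus hint points to, which replaces (1-p)^t with e^{-pt}, gives the same lifetime answer.

```latex
\[
\mathbb{E}[\text{views by day } T] = \sum_{t=0}^{T-1} v\,(1-p)^{t} = v\,\frac{1-(1-p)^{T}}{p}
\]
\[
\mathbb{E}[\text{lifetime views}] = \sum_{t=0}^{\infty} v\,(1-p)^{t} = \frac{v}{p}
  = \frac{700}{1/10\,000} = 7\,000\,000
\]
\[
\int_{0}^{\infty} v\,e^{-pt}\,dt = \frac{v}{p} = 7\,000\,000
\]
```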

That, my friends, is just a fuckload of kittens.

So what exactly is the point of all this? And did you really fail calculus?

Yes, but never mind that. The point is that deletionism is very damaging.

This is a story about trees.
I think of the oak beams in the ceiling of College Hall at New College, Oxford. Last century, when the beams needed replacing, carpenters used oak trees that had been planted in 1386 when the dining hall was first built. The 14th-century builder had planted the trees in anticipation of the time, hundreds of years in the future, when the beams would need replacing.
Think about that: what would it be like to live in a climate of such incredible stability? And how many factors allowed the college to survive?
  • It was never expunged by a theocratic regime (political stability).
  • It was never destroyed in a war (geographic location on a rainy island far from Napoleon and Hitler).
  • It was never invaded by marauding wolves (luck?).
If you wanted to graph Oxford University, its daily survival rate would be like 99.99999%.
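A daily survival rate of 99.99999% sounds only slightly better than the umlaut article's 99.99%, but the gap is enormous. Here's a rough sketch under the same constant-daily-risk assumption. The helper names are made up, and 600 years is just a round figure close to the age of the hall in the story.

```python
# Hypothetical helpers: same fixed-daily-risk model as the umlaut example.
def mean_lifetime_years(daily_survival: float) -> float:
    """Expected lifetime in years when each day carries the same chance of ending."""
    daily_risk = 1 - daily_survival
    return 1 / daily_risk / 365


def chance_of_lasting(years: float, daily_survival: float) -> float:
    """Probability of surviving every single day for `years` years."""
    return daily_survival ** (years * 365)


if __name__ == "__main__":
    for label, rate in [("umlaut article", 0.9999), ("Oxford", 0.9999999)]:
        print(f"{label}: mean lifetime {mean_lifetime_years(rate):,.0f} years, "
              f"chance of lasting 600 years {chance_of_lasting(600, rate):.1%}")
```

Three extra nines take the mean lifetime from about 27 years to about 27,000, and the odds of lasting six centuries from essentially zero to roughly 98%.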

Stability is a good thing. (There's a reason Singapore solved malaria before it could embark on economic growth. Time-sensitivity keeps people poor. If you're worried about dying from malaria, you might just take that 5000% payday loan.)

Stability is also what gets people to write articles for keeps. And even tiny changes in the daily chance of deletion create huge cumulative effects over time.

These videos show the lifetime pageviews of an article, just like before. (The horizontal scale is ten years, instead of a hundred. The first video shows views per day; the second shows total views.)

At the start of the videos, the daily chance of the article vanishing is 1 in 10,000. At the end, it's 1 in 500.
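The videos can't be reproduced here, but the same comparison can be sketched numerically. This assumes the constant 700 views per day from the umlaut example and a ten-year window of 3,650 days. The helper names are hypothetical. For each daily chance of vanishing, it sums the expected views day by day over the window and also gives the closed-form lifetime total, views / p.

```python
# Hypothetical sketch: how the daily chance of vanishing drives cumulative views.
DAILY_VIEWS = 700
HORIZON_DAYS = 10 * 365  # ten-year window, as in the videos


def expected_views_within(horizon_days: int, p: float,
                          views: float = DAILY_VIEWS) -> float:
    """Expected views over the first `horizon_days` days: sum of views * (1-p)**t."""
    return sum(views * (1 - p) ** t for t in range(horizon_days))


def expected_lifetime_views(p: float, views: float = DAILY_VIEWS) -> float:
    """Expected views over the article's whole life: views / p."""
    return views / p


if __name__ == "__main__":
    for one_in in (10_000, 5_000, 2_000, 1_000, 500):
        p = 1 / one_in
        print(f"1 in {one_in:>6,}: "
              f"{expected_views_within(HORIZON_DAYS, p):>12,.0f} views in ten years, "
              f"{expected_lifetime_views(p):>12,.0f} over its lifetime")
```

Going from 1 in 10,000 to 1 in 500 raises the daily risk by less than 0.2 percentage points, yet it cuts expected lifetime views twentyfold, from 7,000,000 to 350,000.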




But that's oversimplified!

Well, it's just an illustration. There are all sorts of feedback loops and variations between articles, and the whole thing is far too complex to analyze using the mathematical part of your brain; you need the social part (which is 10 times bigger).

And really, the social part already knows that deletionism sucks. :)

I'd like to thank Red Bull for making this article possible. If you enjoyed it, and want to help me buy more Red Bull, you can pre-order our upcoming book, How Wikipedia Works.

Cheers,

Ben

7 comments:

pfctdayelise said...

Well hey... thank you Red Bull!

Great post, Ben, and great graphs.

llywrch said...

You eat a lot of Red Bull, Yates, back in the hippie days?

Geoff

LA2 said...

I guess you need to have failed calculus to understand these graphs.

What exactly has future readership got to do with deletionism? If I write that the president has a love affair with an actress, that would get a lot of readers, but it isn't true so it isn't the kind of free knowledge that Wikipedia is about. The problem with non-notable subjects that can't be verified by sources is that we can't know if they are true or not.

Should we promote free knowledge or any free stuff that people might want to read? That's the issue here.

Ben Yates said...

Geoff: It was really back in '57, with johnny and the crew, up in the palo alto strip. (But I hardly need to tell you that.)

LA2: I didn't exactly say "future readership"; I said "the possibility of someone else learning from what you wrote". Wikipedia already does a pretty good job of alienating people who are trying to spread misinformation, so I don't think that's a huge concern.

I think we can both agree that Heavy Metal Umlaut is an article that could never have been started today -- or it would be so much work to keep it from getting deleted. That's a bad thing. Really, all you need to do is go to AfD to see well-cited articles coming from a different cultural perspective (like this one on a way of thinking about music common in northern Europe) getting wiped away because of self-appointed purifiers who haven't bothered to take the time or energy to actually understand where the article's coming from or think about what it could become in the future. They're squelching out a lot of potential in advance, and that fact is immediately apparent to anyone who explores the site.

I still like Wikipedia, and will still like it even if it becomes way more deletionist, but (as I said) this type of thing is a little sad.

Ben Yates said...

Here's the AfD for that music article. Note that despite the term having 60 thousand Google hits, the people who happened to be in the room decided it was original research. Poof, the article is gone forever. It will never attract more Scandinavian musicologists. It will never be incorporated into a cool dynamic music-genre-tree built in 2012.

Hell, I don't think I would have started the article Metrosexual now; I wouldn't have wanted to throw a bunch of time into babysitting it in AfD.

Ben Yates said...

Er. Also, note that sleaze rock had about 7000 pageviews a month and was referenced in about 50 other articles.

llywrch said...

Considering Ben's recent responses:
1. This trend towards doctrinaire deletionism is disturbing. As for "articles like this from the past would no longer be allowed to start and grow"... I've seen a few examples of articles from the past which were a solid part of accepted Wikipedia culture being tossed into the maws of AfD to vanish. (BJAODN, for example.) In other words, even [[Heavy Metal Umlaut]] is not safe.
2. On the other hand, certain kinds of articles always draw out the unimaginative types who want Wikipedia to be just like a 1950s encyclopedia -- even if they have never seen one. Some kinds of articles will always need to be baby-sat to keep from being deleted. (I look forward to my fight over "Ethiopian Revolution" & "Ethiopian Civil War" -- while it is an agreed point that there was an Ethiopian Revolution, when it exactly ended, & when that country's civil war began is an unresolved issue, one which I plan on solving by simply selecting an arbitrary event in time: everything that happened before that point is part of the "Revolution"; everything after is part of the "Civil War".)
3. Back when I was more involved in AfD, whether an article had any links to it was one of my tests whether to keep/delete; a pretty good kook-detector is if certain kinds of articles have any links to them. I wish that this had been more widely accepted.

Geoff

PS -- I'm a little surprised you didn't pick up on my allusion to the movie "Repo Man", but I understand: the life of a Wikipedia editor is always intense.