Photo: Flickr user lifeontheedge

Thursday, April 03, 2008

Wikipedia Weekly roundup

Update: Erik Moller responds in the comments -- they're in the process of getting a donated statistics server.

Like I said, the podcast is on a roll.

Episode 43.

Skip the belated discussion on Jimmy's shenanigans -- it's mostly there because they feel obligated to cover it. Start at the 20-minute mark instead, and don't miss the Wii metaphor at 42:11.

(I also got namedropped; hurrah.)

Episode 44.

Stable versions are finally implemented in software! (They're late; they were promised to us in 2006.)

What are stable versions? Well. Unlike a traditional piece of writing, a Wikipedia article is never finished. Over time mistakes get removed, but they also get inserted; writing sometimes gets better, but it doesn't always stay better.

Stable versions are particular revisions of an article that are marked as superior. You'll be able to read a stable version and trust that the very text you're reading has been vetted, that someone didn't add "leona is gay!" five seconds before you opened the page.

Plus, stylistic edits will stick! Right now, you can go through a whole article and shoehorn it into a tight, structured, beautifully clear piece of writing. I used to spend hours doing that. But after a couple months of heavy editing -- people appending facts willy-nilly -- the article always descends back into the murk. It gets more accurate, but waaaaaay less readable. Theoretically, stable versions will fix this problem by inserting periodic finish lines.
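
For the programmers in the audience, here's a toy sketch of the idea in Python. It is not how MediaWiki actually implements stable versions; the `Revision` class, the `stable` flag, and the `version_to_show` function are all hypothetical names made up for illustration. The assumption is simply that each revision can be marked as vetted, and that readers get the newest vetted revision when one exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    stable: bool = False  # hypothetical flag: someone vetted this revision


def version_to_show(history: List[Revision]) -> Optional[Revision]:
    """Pick the revision a reader sees (history is ordered oldest to newest)."""
    if not history:
        return None
    # Walk backwards so the newest stable revision wins.
    for rev in reversed(history):
        if rev.stable:
            return rev
    # Nothing has been vetted yet: fall back to the latest edit.
    return history[-1]


history = [
    Revision(1, "A rough first draft."),
    Revision(2, "A tight, structured rewrite.", stable=True),
    Revision(3, "leona is gay!"),  # vandalism added after the stable version
]

shown = version_to_show(history)
print(shown.rev_id, shown.text)
```

In this sketch the reader gets revision 2, and the vandalism in revision 3 stays out of sight until someone vets a newer revision. Who gets to set that flag is the social question, not the technical one.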

Anyway, stable versions are coded into software, but there are a lot of kinks to be worked out before they're actually rolled out. (Social kinks, not technical ones.) In social software, interface decisions are political decisions, so we've got to tread carefully.

The other interesting thing in the podcast: discussion about the million-dollar donation Wikipedia just got. It's a fucking godsend. No more need for advertising. (It's sort of funny that five minutes of talking to a rich guy pays off more than years of painstaking debate. Interesting times we're living in.)

But Andrew Lih's also got a dead-on analysis of where the foundation is going wrong. It should be spending its energy trying to keep the community healthy, but it doesn't even have tools for monitoring the community's health.

One of the biggest things that we've had problems with currently is statistics -- I mean, we haven't had good English-language statistics since 2006, which is absolutely unforgivable given the way things are going on now.

If we have one million dollars a year, a good chunk of that should be dedicated to finding out what's going on in the community. Is it healthy? Are these projects sustainable? Identifying places of weakness, of strength, and strategizing over that.

We don't need twenty staff members to do that. We do need a statistics server. We do need CPU power, memory, and computing power to do that. And that's something that has been ignored for at least 2 years now, and that's something we really need to do, but we don't have that on the radar screen of the foundation right now.


The dump of the entire English-language database is 133 gigabytes! Which means just downloading takes a long time, and to uncompress it, to process it, you need a computer with tons of memory, and we're not talking like 4 gigs or 8 gigs; we're talking multiple tens of gigabytes just to load it into memory, and that's something we just don't have but I think is extremely important.

Tuesday, April 01, 2008

Wikimedia receives 3-year, $3,000,000 grant. That's a large fraction of the budget (and it's from the Alfred P. Sloan Foundation, in case you're worried).

Sunday, March 30, 2008

Some tech kids from the University of Edinburgh sent me a link to wiki-answers, their Wikipedia search engine.