Photo: Flickr user lifeontheedge

Friday, December 23, 2005

Good data visualizations would be fucking helpful (for evaluating an article's trustworthiness, finding related content, etc.). But right now the best visualizations are run on static Wikipedia dumps, and even those models are probably too complex for my mid-speed computer. And if my computer can't handle a dynamic model of a static database, the Wikimedia servers can't possibly accommodate rich realtime visuals.

Ah, well. At least there's an application for Moore's Law.

Analyzing Wikipedia's Categories -- "the first semantic map of the English Wikipedia data" (pdf)

Three academics have created an awesome visual model of the Wikipedia category structure. Their model can be tweaked and tilted and filtered in various ways, but they've only provided static snapshots -- bah. My kingdom for a good, free model in Java. (Well, in Cocoa and Core Image, as long as I'm fantasizing.)

There are three diagrams. I've taken a GIF snapshot of one of them, for those who don't want to page through the pdf:


Each dot represents a category (not an article). The categories/dots are arranged by how much they have in common -- if an article is in two categories at once, those categories are considered to have something in common. Categories with lots of shared articles (like "Cities in Michigan" and "County seats in Michigan") are therefore close together; so are categories within categories.
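To make the "shared articles" idea concrete, here is a minimal sketch that counts how many articles each pair of categories has in common. It is only an illustration of the co-occurrence idea described above, not the paper's actual similarity measure or layout algorithm; the function name `shared_article_counts` and the toy article data are my own assumptions.

```python
from collections import defaultdict
from itertools import combinations


def shared_article_counts(article_categories):
    """Count, for every pair of categories, the articles that sit in both.

    article_categories maps an article title to a list of category names.
    Returns a dict keyed by (category_a, category_b), with the pair sorted
    alphabetically so each pair appears only once.
    """
    counts = defaultdict(int)
    for categories in article_categories.values():
        # set() guards against a category being listed twice on one article
        for a, b in combinations(sorted(set(categories)), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Toy data for illustration only (hypothetical subset, not a real dump).
articles = {
    "Ann Arbor, Michigan": ["Cities in Michigan", "County seats in Michigan"],
    "Grand Rapids, Michigan": ["Cities in Michigan", "County seats in Michigan"],
    "John Adams": ["1735 births", "1826 deaths"],
    "Thomas Jefferson": ["1743 births", "1826 deaths"],
}

pair_counts = shared_article_counts(articles)
for pair, n in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
    print(n, pair)
```

A layout would then pull pairs with high counts closer together. Raw counts favor huge categories, so a real model would presumably normalize them somehow; that step is left out here.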

The diagram can be color-coded in all sorts of ways, but this particular snapshot color-codes by words included in category names. You can see a cluster of "birth" and "death" categories in the lower right because almost every biographical article is in one category of each type. John Adams, for example, is in "1735 births" and "1826 deaths" (he shares the latter with Jefferson).
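Color-coding by words in a category name could be sketched roughly like this. The keyword-to-color table and the function name `color_for` are hypothetical; the paper's snapshot may well use different words and colors.

```python
def color_for(category, keyword_colors, default="gray"):
    """Return the color of the first keyword found in the category name.

    keyword_colors maps a lowercase keyword to a color name; matching is a
    case-insensitive substring check, so "birth" also matches "births".
    """
    name = category.lower()
    for keyword, color in keyword_colors.items():
        if keyword in name:
            return color
    return default


# Hypothetical palette, chosen only for this example.
palette = {"birth": "red", "death": "blue"}

for cat in ["1735 births", "1826 deaths", "Cities in Michigan"]:
    print(cat, "->", color_for(cat, palette))
```

Since nearly every biography carries one "births" and one "deaths" category, dots colored this way would be expected to land near each other in a layout driven by shared articles.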

The text is less exciting than the graphics, but it's good reading, and it reinforces the graphics' coolness and conceptual importance:

By espousing an inclusive point of view policy and involving non-experts in the discussion, Wikipedia arguably has the potential to provide an open and dynamic platform complementary to the scientific peer-review process for reasoned debate on issues for which there is no accepted expert view.

Wikipedia is by no means the first website that relies on massive user participation: anyone can post news on Slashdot, offer goods at eBay or review books at Amazon. These sites, though, facilitate participation through hard-coded reputation mechanisms. Users grade others' contributions, and the website compiles an overall score to help direct further interaction. Reputation is computed based on individual assessments using a predefined algorithm.

Wikipedia, on the other hand, relies on facilitating human interaction rather than superseding it. Encyclopedic content is so complex that 'a process of reasoned discourse' is the only practical way to reach agreement; contributors can get to know each other and a community forms. The decisions on new structures and procedures, such as how to go about deleting articles or when to temporarily block editing, are then delegated to the community as well rather than instituted centrally. Individual reputation forms as 'a natural outgrowth of human interaction'.


The resulting constitution of decision making in Wikipedia is hybrid. Members actively avoid majority voting, instead striving to reach consensus on any issue, but can use polls (democracy) as a non-binding tool in this process. Individual users who gain reputation through their contributions form a merit-based aristocracy, with several layers of privilege: anonymous users, regular users, administrators who can, e.g., delete or block pages in a single Wikipedia, and two higher levels that can, e.g., confer administrator status. Mediation and arbitration committees resolve disputes, while a rare issue may require the judgment of the 'benevolent dictator', Mr. Wales (monarchy).

[Note from Wikipedia Blog: I've written about Wikipedia's resemblance to real-life political structures.]


This paper presented, to our knowledge, the first semantic map of the English Wikipedia data.
