Saturday, December 24, 2005
- "Three-strikes laws", under which a third felony conviction yields life imprisonment, are intended to deter repeat offenders. However, they may encourage repeat criminals to kill witnesses — since the sentence for murder is no worse than the sentence for a lesser third offense.
- In India, a program paying people a bounty for each rat pelt handed in was intended to exterminate rats. Instead it led to the farming of rats.
- Paying the executives of corporations proportionately to the size of their corporation is intended to encourage them to grow their companies by growing the bottom line (and not their earnings per share). However, it may cause them to pursue mergers to grow their companies, to the detriment of their shareholders' interest.
Friday, December 23, 2005
Good data visualizations would be fucking helpful (for evaluating an article's trustworthiness, finding related content, etc.). But right now the best visualizations are run on static Wikipedia dumps, and even those models are probably too complex for my mid-speed computer -- and if my computer can't handle a dynamic model of a static database, Wikimedia servers can't possibly accommodate rich realtime visuals.
Ah, well. At least there's an application for Moore's Law.
Analyzing Wikipedia's Categories -- "the first semantic map of the English Wikipedia data" (pdf)
Three academics have created an awesome visual model of the Wikipedia category structure (their model can be tweaked and tilted and filtered in various ways, but they've only provided static snapshots -- bah. My kingdom for a good, free model in Java. Well, in Cocoa and Core Image as long as I'm fantasizing.)
There are three diagrams. I've taken a GIF snapshot of one of them, for those who don't want to page through the pdf:
Each dot represents a category (not an article). The categories/dots are arranged by how much they have in common -- if an article is in two categories at once, those categories are considered to have something in common. Categories with lots of shared articles (like "Cities in Michigan" and "County seats in Michigan") are therefore close together; so are categories within categories.
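The paper's exact measure of "how much they have in common" isn't spelled out here. This is only a minimal sketch that assumes one simple choice: Jaccard similarity, meaning shared articles divided by all articles in either category. The article-to-category table is made-up toy data ("Town A" and the other town names are placeholders), and the function names are hypothetical. Turning these scores into positions on a map would need a separate layout algorithm, which isn't shown.

```python
from collections import defaultdict
from itertools import combinations

def category_members(article_categories):
    """Invert an article -> categories table into category -> set of articles."""
    members = defaultdict(set)
    for article, cats in article_categories.items():
        for cat in cats:
            members[cat].add(article)
    return members

def shared_counts(article_categories):
    """Count how many articles each (alphabetically ordered) pair of categories shares."""
    counts = defaultdict(int)
    for cats in article_categories.values():
        for pair in combinations(sorted(set(cats)), 2):
            counts[pair] += 1
    return dict(counts)

def jaccard(members, a, b):
    """Shared articles / articles in either category: 0.0 (nothing shared) to 1.0."""
    union = members[a] | members[b]
    return len(members[a] & members[b]) / len(union) if union else 0.0

# Hypothetical toy data, not taken from the actual Wikipedia dump.
toy = {
    "Town A": ["Cities in Michigan", "County seats in Michigan"],
    "Town B": ["Cities in Michigan", "County seats in Michigan"],
    "Town C": ["Cities in Michigan"],
    "John Adams": ["1735 births", "1826 deaths"],
}
members = category_members(toy)
print(shared_counts(toy))
print(jaccard(members, "Cities in Michigan", "County seats in Michigan"))  # 2 shared / 3 total
```

A layout would pull high-scoring pairs, like the two Michigan categories here, close together. Pairs with no shared articles score 0 and would drift apart.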
The diagram can be color-coded in all sorts of ways, but this particular snapshot color-codes by words included in category names. You can see a cluster of "birth" and "death" categories in the lower right because almost every biographical article is in two of these types of categories (for example, John Adams is in "1735 births" and "1826 deaths"; he shares the latter with Jefferson).
The text is less exciting than the graphics, but it's good reading, and it reinforces the graphics' coolness and conceptual importance:
By espousing an inclusive point of view policy and involving non-experts in the discussion, Wikipedia arguably has the potential to provide an open and dynamic platform complementary to the scientific peer-review process for reasoned debate on issues for which there is no accepted expert view.
Wikipedia is by no means the first website that relies on massive user participation: anyone can post news on Slashdot, offer goods at eBay or review books at Amazon. These sites, though, facilitate participation through hard-coded reputation mechanisms. Users grade others' contributions, and the website compiles an overall score to help direct further interaction. Reputation is computed based on individual assessments using a predefined algorithm.
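Here is a minimal sketch of what such a "predefined algorithm" can look like, for contrast with Wikipedia's approach. The rule is a hypothetical one chosen for illustration, not Slashdot's, eBay's or Amazon's actual formula. Each grade is +1, 0 or -1. A grader's latest grade of a user replaces any earlier one. Self-grading is ignored. A user's score is the sum of the grades they received.

```python
from collections import defaultdict

def reputation(grades):
    """Compile each user's overall score from (grader, user, grade) triples.

    Hypothetical rule: grades are -1, 0 or +1; only a grader's most recent
    grade of a given user counts; grading yourself is ignored.
    """
    latest = {}
    for grader, user, grade in grades:
        if grade not in (-1, 0, 1):
            raise ValueError(f"grade must be -1, 0 or 1, got {grade}")
        if grader == user:
            continue  # self-grading is not counted
        latest[(grader, user)] = grade  # a later grade overwrites an earlier one
    scores = defaultdict(int)
    for (_, user), grade in latest.items():
        scores[user] += grade
    return dict(scores)

# Hypothetical users and grades.
sample_grades = [
    ("alice", "bob", 1),
    ("carol", "bob", 1),
    ("carol", "bob", -1),   # carol changes her mind; only this grade counts
    ("bob", "alice", 1),
    ("alice", "alice", 1),  # ignored: self-grade
]
print(reputation(sample_grades))
```

However much discussion lies behind the individual grades, the score itself falls out of a fixed formula that nobody deliberates over. That is the distinction the authors draw with Wikipedia.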
Wikipedia, on the other hand, relies on facilitating human interaction rather than superseding it. Encyclopedic content is so complex that 'a process of reasoned discourse' is the only practical way to reach agreement; contributors can get to know each other and a community forms. The decisions on new structures and procedures, such as how to go about deleting articles or when to temporarily block editing, are then delegated to the community as well rather than instituted centrally. Individual reputation forms as 'a natural outgrowth of human interaction'.
The resulting constitution of decision making in Wikipedia is hybrid. Members actively avoid majority voting, instead striving to reach consensus on any issue, but can use polls (democracy) as a non-binding tool in this process. Individual users who gain reputation through their contributions form a merit-based aristocracy, with several layers of privilege: anonymous users, regular users, administrators who can, e.g., delete or block pages in a single Wikipedia, and two higher levels that can, e.g., confer administrator status. Mediation and arbitration committees resolve disputes, while a rare issue may require the judgment of the 'benevolent dictator', Mr. Wales (monarchy).
[Note from Wikipedia Blog: I've written about wikipedia's resemblance to real-life political structures.]
This paper presented, to our knowledge, the first semantic map of the English Wikipedia data.
The Long Tail weblog has some interesting things to say about Wikipedia and other stuff (and scroll down to the comments for a good discussion).
...these systems operate on the alien logic of probabilistic statistics, which sacrifices perfection at the microscale for optimization at the macroscale.
When professionals--editors, academics, journalists--are running the show, we at least know that it's someone's job to look out for such things as accuracy. But now we're depending more and more on systems where nobody's in charge; the intelligence is simply emergent. These probabilistic systems aren't perfect, but they are statistically optimized to excel over time and large numbers. They're designed to scale, and to improve with size. And a little slop at the microscale is the price of such efficiency at the macroscale.
Probability-based systems are, to use Kevin Kelly's term, "out of control". His seminal book by that name looks at example after example, from democracy to bird-flocking, where order arises from what appears to be chaos, seemingly reversing entropy's arrow.
Tuesday, December 20, 2005
Yet another chapter in the embarrassing war for "I Invented Wikipedia!" credit between Wales and Sanger. I tend to side with Sanger, mostly because managers often take credit where it's not due, but it's impossible to know what really happened (nor does it particularly matter).
Gunslinger is a name given to men in the American Old West who had gained a reputation as being dangerous with a gun.
Jesters typically wore brightly colored clothing in a motley pattern. Their hats were especially distinctive; made of cloth, they were floppy with three points (liripipes), each of which had a jingle bell at the end. The three points of the hat represent the asses' ears and tail worn by jesters in earlier times.
A rake is a stock character, a man who wastes his (usually inherited) fortune on "wine, women, and song," incurring lavish debts in the process.
A redshirt is a stock character whose sole purpose is to die violently soon after being introduced.
Nasreddin was a populist philosopher and wise man, remembered for his funny stories and anecdotes. He often appears as a whimsical character of a large Persian, Arab, and Turkish folk tradition of vignettes, not entirely different from Zen koans.
Monday, December 19, 2005
A solid Globe editorial with a nice closing line:
The Seigenthaler affair is a reminder that the age of the casual reader, if it ever in fact took place, is rapidly passing away. Most readers may not fancy themselves encyclopedists, authors, or journalists-manqués, but they can no longer assume that what passes for fact is unimpeachable. The ecology of information turns them into editors and reviewers perforce. The effect of this revelation may in time prove healthy, if we wake up to our responsibilities as readers.
"Thus, a straightforward phenomenon such as the probability of finding a raisin in a slice of cake growing with the portion-size does not generally require a theory of emergence to explain. It may, however, be profitable to consider the 'emergence' of the texture of the cake as a relatively complex result of the baking process and the mixture of ingredients."
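The raisin half of that quote really is straightforward. This sketch assumes a Poisson model, which is my assumption and not the quote's: raisins are scattered independently through the batter at an average of `density` raisins per unit volume. Under that model the number of raisins in a slice of volume V has mean density * V. The chance of getting at least one raisin is then 1 - e^(-density * V), which rises with slice size. The function name and the density of 0.5 raisins per cubic inch are hypothetical.

```python
import math

def p_at_least_one_raisin(density, volume):
    """P(slice holds >= 1 raisin) under an assumed Poisson model:
    raisins scattered independently at `density` raisins per unit volume."""
    if density < 0 or volume < 0:
        raise ValueError("density and volume must be non-negative")
    # P(zero raisins) = exp(-density * volume), so take the complement.
    return 1.0 - math.exp(-density * volume)

# Hypothetical density of 0.5 raisins per cubic inch; bigger slices, better odds.
for volume in (1, 2, 4):
    print(volume, round(p_at_least_one_raisin(0.5, volume), 3))  # 0.393, 0.632, 0.865
```

No emergence is needed here. The curve follows directly from the independence assumption, and the texture of the cake is where things get interesting.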