Saturday, October 4, 2008

Analyze this

There is a real story hidden in this silly "analysis" of the language Joe Biden and Sarah Palin used in their debate. But first, the bullshit detection.

Text readability metrics are not bunk - until you try to use them this way. They measure surface features of a text, typically sentence length and word length, and map the result onto a grade level. The sort of opaque, self-interrupting sentences that politicians in general - and both Biden and Palin in particular - often utter look complex to the algorithm. Gibberish is often complex too, and the readability algorithm doesn't care about underlying meaning.
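To make that concrete, here is a rough Python sketch of the standard Flesch-Kincaid grade-level formula. The formula itself is the published one; the sentence splitter and the syllable counter are my own crude assumptions (the names `count_syllables` and `fk_grade` are made up for this post), and real tools, Word included, almost certainly count differently. Note that nothing in it looks at what the words mean.

```python
import re


def count_syllables(word):
    """Crude heuristic (an assumption): one syllable per run of vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("make"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text):
    """Flesch-Kincaid grade level computed from surface counts only."""
    # Naive splitting: sentences end at . ! or ?; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


clear = "The cat sat. The dog ran."
rambling = ("The cat, which is to say the dog, and also the energy "
            "situation, sat where the economy ran")

print(round(fk_grade(clear), 2))
print(round(fk_grade(rambling), 2))
```

The second string says nothing the first doesn't, yet it scores several grades higher simply because it is one long sentence with longer words in it.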

Paul Payack and CNN appear to be praising Palin's grade level. Instead, they should spend time on clarity of presentation, which Palin can only manage in the most superficial way. CNN and Payack barely scratch that story. After all, it's a judgement call, not something Microsoft Word can give you at the touch of a button.

By the way, Payack apparently did not use Word. When I ran his example sentences through Word's built-in Flesch-Kincaid algorithm, I got significantly different scores than he reported. That's another problem with text metrics: there are several of them, and they don't necessarily agree.

(Note: Word says that this entry's grade level is 8.6. With a little tinkering, I can get it down to 7.5, but that doesn't make it better. A little tinkering in the other direction gets me to 10.5 with little change in meaning.

Also, if you have nothing better to do, you can play on your own with the transcribed text from here.)
