It's unwise to overlook significant data trends, even if you're not sure what's causing them.
That was my immediate response to Wonkblog's Ezra Klein, who just published a column titled "Data are no good without theory." Read that again. Just rolls off the tongue, doesn't it? The thing is, I always like to stop and think when someone says something that is supposed to be just obviously true.
Data are no good without theory. Or, stated more fully... But that's just what Klein doesn't do. Instead, he gives us some examples:
- If the Washington Redskins win their last home game before a presidential election, the incumbent party wins -- a rule that has held true in 18 out of 21 elections since the Redskins moved to DC.
- Statistical data about climate change is "a bit on the noisy and ambiguous side" -- it's believed because the underlying causal explanation is persuasive.
- In the absence of good reasons to believe that "a high debt:GDP ratio would cause slow real growth even in the absence of high interest rates," you look for overwhelming empirical evidence that it's the case (and Reinhart-Rogoff notoriously didn't have it).
Well, that's a nice, quick ramble around the data rose garden, but if you're thinking it doesn't really support the headline claim, I'd say you're right. The first example is just silly -- it's not that we don't know how football results cause election outcomes; rather, we know they don't.
The second example is misleading. It's hardly surprising that evidence in such volume should have some rough edges, but it's overwhelmingly consistent. If you don't agree, go argue with NASA. The final example confuses the issue: Reinhart and Rogoff's problem was not a missing theory but an Excel coding error.
Let's rewind. Between 1950 and 1954, a series of epidemiological studies published in the US and the UK showed a powerful statistical association between smoking and lung cancer. The data was sufficient for health authorities to issue warnings about smoking and health, and ultimately for tobacco to be classed as a human carcinogen.
More than 50 years later, researchers are still looking for the precise causal mechanism involved, and for a reliable toxicological model. This is an example of data being so persuasive that you don't wait for a satisfactory explanation before acting on it.
If you want to look further back, John Snow, a London physician, halted a cholera outbreak in 1854 by having a public water pump shut down. He didn't know the mechanism of transmission of cholera -- nobody did then -- nor were analyses of water samples from the pump conclusive. The data showing incidence of cholera among pump users was unexplained, but nevertheless conclusive. Data scientists can still learn from his work.
Here's the takeaway. If the data makes a compelling case that a business strategy is wrong -- that it's shrinking revenues and alienating customers -- it's probably best not to go full steam ahead just because you don't know why the bad effects are happening.
In fact, changing course might help elucidate the causes of the problem.
Sure, it's possible that someone entered the wrong numbers in the spreadsheet. If Klein had written, "Bad data are no use at all," he'd have been right. And yes, in an ideal world, it should be possible to describe the reasons for data trends.
But good data has its own story to tell, independent of its theoretical underpinnings. Be sure to listen.
-- Kim Davis, Senior Editor, Internet Evolution