Flu season is almost over and the healthcare community is using lessons it learned this year to prepare for the 2013-2014 season.
One tool at its disposal is Google Flu Trends, a fascinating experiment that sits at the intersection of numerous buzz-worthy terms -- data mining, crowdsourcing, analytics, and digital healthcare. More impressively, it combines these tactics into a cohesive product that produces genuinely useful results. But as much as we can learn from its relative success, we can learn just as much from its recent miss, in which it predicted a peak infection rate nearly twice that of the Centers for Disease Control and Prevention's (CDC's) real-world data.
So what can your firm learn from Google’s algorithmic misfire?
Sampling-based predictions sometimes fail -- and so does crowdsourcing
Any company involved in analytics can learn a lot from Google Flu Trends.
(Source: Creative Commons / Flickr / William Brawley)
While Google missed, it was not alone; the Flu Near You service, which relies on volunteer self-reporting (a different data-collection method), underpredicted the infection rate by almost as large a margin. The best tracking method is, of course, to collect data that is comprehensive and absolute -- which is what the CDC does with its flu tracking, taking information directly from medical centers. However, such tracking is typically slow.
For faster tracking, there are two major classes of data collection: self-reporting (volunteer programs and surveys, for example) and crowdsourcing. Each has its own flaws. Self-reporting tends to suffer from small data sets (mainly because people don't have the time or inclination to participate). Crowdsourcing can suffer from skewed demographics or emotional factors. In the Google Flu Trends miss, for example, widespread media coverage provoked public fear (an emotion), leading to more flu-related searches and causing Google's algorithm to overshoot.
Use multiple strategies
Both crowdsourcing and self-reporting failed to accurately predict flu trends this winter. But that doesn’t make them bad tools. Rather, it shows the importance of not being overly reliant on one data analysis tactic. Try to incorporate multiple strategies into your company’s toolkit.
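One simple way to combine strategies is to blend the estimates they produce, trusting each source in proportion to its past accuracy. The sketch below is a minimal illustration of that idea; the sources, figures, and weighting scheme are all hypothetical, not how Google or the CDC actually combine data.

```python
# Hypothetical sketch: blend independent flu-rate estimates, weighting each
# source by the inverse of its historical absolute error so chronically
# inaccurate sources count for less.

def blend_estimates(estimates, past_errors):
    """estimates / past_errors: parallel lists, one entry per data source."""
    weights = [1.0 / err for err in past_errors]
    weighted_sum = sum(est * w for est, w in zip(estimates, weights))
    return weighted_sum / sum(weights)

# Invented inputs: a search-based estimate, a self-reported estimate, and a
# clinical estimate (percent of population), with each source's typical error.
blended = blend_estimates([10.6, 3.9, 6.0], [4.0, 2.5, 0.5])
```

Because the clinical source has the smallest historical error, it dominates the blend, pulling the result toward its estimate even when the noisier sources over- or undershoot.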
Anticipate the effects of emotion, media
Whether you’re studying brand perception or trying to analyze the stock market, humans are emotional creatures. Thus algorithms designed to predict or quantify human behavior must be placed in the context of emotion. In modern society perhaps no single factor sways human emotion as much as mass media. The media doesn’t agree on everything, but when outlets do focus unilaterally on an issue (like a record flu season), a savvy algorithm would recognize that certain public behaviors will be inflated (such as, for example, over-searching flu terms).
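Adjusting for that inflation could be as simple as discounting the raw signal when media attention runs above its baseline. The sketch below is purely illustrative: the media-coverage index, the sensitivity parameter, and the formula are invented assumptions, not Google's actual correction.

```python
# Hypothetical sketch: deflate a raw search-volume signal by a media-coverage
# index (1.0 = normal coverage) so fear-driven searching doesn't get counted
# as extra infections. The sensitivity factor is an invented tuning knob.

def media_adjusted(search_volume, media_index, sensitivity=0.5):
    """Scale search volume down as coverage rises above the 1.0 baseline."""
    excess_coverage = max(media_index - 1.0, 0.0)
    return search_volume / (1.0 + sensitivity * excess_coverage)

# During heavy coverage (index 2.0), a raw reading of 100 is discounted:
adjusted = media_adjusted(100.0, 2.0)  # -> 100 / 1.5, about 66.7
```

At baseline coverage the signal passes through untouched; the correction only engages when the media index exceeds its normal level.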
Leverage the data you already have
Whatever your business or organization, you likely have a wealth of data already at your disposal. While it lacks the cachet of slick tactics like crowdsourcing or volunteer self-reporting, old-fashioned collection of massive, transparent data sets can be tremendously helpful. It may generally be a slower, more expensive, and less efficient method than more modern data-analysis tactics, but compiling such data allows you to fact-check your algorithms and the results collected via other strategies.
Tirelessly improve your algorithms
If you have an algorithm, chances are it is not flawless. Humans are imperfect, and so are our algorithms. Developing optimal data-mining tools requires a commitment to continuous algorithmic improvement. Don’t take your misses as failures; take them as opportunities for future gains. Google took a long time to make Flu Trends as accurate as it is today -- and there’s still a long way to go to make it better and prevent future misses.
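Improvement only counts if you can measure it. One lightweight habit is backtesting each revision of an algorithm against a trusted reference (for flu, that would be CDC figures). The sketch below uses invented numbers for both the reference and the model revisions; it simply shows the bookkeeping.

```python
# Hypothetical sketch: score successive model revisions against ground-truth
# data using mean absolute error, so "improvements" are verified, not assumed.
# All figures below are invented for illustration.

def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and reference values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

cdc_actual = [2.1, 3.4, 6.0, 4.2]   # invented reference infection rates
model_v1 = [2.5, 4.8, 10.6, 5.9]    # early revision: overshoots the peak
model_v2 = [2.2, 3.9, 6.8, 4.6]     # later revision: tracks much closer

v1_error = mean_abs_error(model_v1, cdc_actual)  # about 2.03
v2_error = mean_abs_error(model_v2, cdc_actual)  # about 0.45
```

Logging an error score for every revision turns "we think the algorithm is better now" into a number you can track season over season.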
If you adopt a mindset of continuous improvement, your algorithms will benefit.
— Jason Mick is senior news editor at the independent tech news site DailyTech.