Chances are that you’ve left a considerable electronic trail behind you in your travels across the Internet. Your email address, mailing address, birth date, age, credit card numbers, and more are all stored in scores of e-commerce systems, social networking sites, and maybe even a job board or two. And your business likely has a trove of similar data about everyone you’ve ever done a transaction with over the Web.
Of course, the longer that data is there, the greater the probability that it will be inappropriately disclosed -- either accidentally, or through a cyber-attack. The resulting exposure can lead to identity theft, or any number of digital assaults on individuals’ privacy. And as the folks at TJX Corp. can testify, that sort of data breach can cost your company hundreds of millions of dollars as well.
A Dutch researcher proposes that the way to eliminate the risk of accidental data disclosure is to let the data slowly decay until all the data fades away. Dr. Harold van Heerde of the Centre for Telematics and Information Technology (CTIT) at the University of Twente is researching ways to gradually replace details from a database of personal information with more and more general information over time.
Of course, letting data degrade is the exact opposite of what most IT managers strive to do with customer data. After all, customer data is an asset: We use it more and more each day in an attempt to improve our relationship with customers, deliver better service, and understand patterns in their behavior. So high data quality is important. But for most uses beyond the transactional relationship with customers, we don’t need high-resolution data. Often, the data can be "anonymized" to a large degree for the purposes of larger analytical tasks, and there’s definitely a shelf-life attached to the value of data for any given transaction.
Van Heerde and a team of computer scientists from the Netherlands and France originally proposed the idea of data degradation to protect private information in a paper presented at the 2008 Conference on Information and Knowledge Management. The idea itself seems simple enough: as personally identifying information is gradually stripped out, the data remains useful for things like market analytics and other business intelligence applications, but becomes useless to anyone who gains access to it accidentally or through deliberate hacking.
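To make the idea concrete, here is a minimal Python sketch of how a stored address might be generalized as it ages. It is an illustration only, not the mechanism described in the paper: the `ADDRESS_LEVELS` schedule, its thresholds, the field names, and the `degrade_address` function are all assumptions made up for this example.

```python
from datetime import date, timedelta

# Hypothetical degradation schedule: (minimum record age, fields still kept).
# The thresholds and field names are illustrative assumptions, not values
# taken from the researchers' paper.
ADDRESS_LEVELS = [
    (timedelta(days=0), ("street", "city", "region", "country")),
    (timedelta(days=30), ("city", "region", "country")),
    (timedelta(days=365), ("region", "country")),
    (timedelta(days=3 * 365), ("country",)),
    (timedelta(days=5 * 365), ()),  # nothing left: the record has faded away
]


def degrade_address(address: dict, recorded_on: date, today: date) -> dict:
    """Return a copy of the address holding only the detail allowed at its age."""
    age = today - recorded_on
    kept = ()
    # The schedule is ordered by age, so the last matching step wins.
    for min_age, fields in ADDRESS_LEVELS:
        if age >= min_age:
            kept = fields
    return {name: address[name] for name in kept if name in address}


customer = {
    "street": "12 Main St",
    "city": "Enschede",
    "region": "Overijssel",
    "country": "NL",
}
recorded = date(2008, 1, 1)

print(degrade_address(customer, recorded, date(2008, 3, 1)))
print(degrade_address(customer, recorded, date(2012, 1, 1)))
```

In this sketch a two-month-old record has already lost its street address, and a four-year-old record retains only the country, which may still be enough for regional sales analysis but says little about any one person.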
A similar sort of time-bomb approach to data destruction was introduced in some mobile applications based on Java in the past decade. Mobile clients that use "data fading" keep track of how much time has elapsed since the last successful synchronization of the data with the source, and then start to destroy the data after a certain maximum "quiet period."
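The "quiet period" logic can be sketched in a few lines of Python. This is a simplified, hypothetical illustration rather than code from any real mobile framework: the `FadingCache` class and its method names are invented here, and it wipes everything at once when the deadline passes instead of destroying data gradually.

```python
from datetime import datetime, timedelta


class FadingCache:
    """Hypothetical local data store that wipes itself after a quiet period."""

    def __init__(self, max_quiet_period: timedelta):
        self.max_quiet_period = max_quiet_period
        self.last_sync = None  # time of the last successful synchronization
        self._records = {}

    def sync(self, records: dict, now: datetime) -> None:
        """Replace local data with a fresh copy and restart the clock."""
        self._records = dict(records)
        self.last_sync = now

    def read(self, key: str, now: datetime):
        """Return a record, destroying all local data first if it is stale."""
        self._fade_if_stale(now)
        return self._records.get(key)

    def _fade_if_stale(self, now: datetime) -> None:
        never_synced = self.last_sync is None
        if never_synced or now - self.last_sync > self.max_quiet_period:
            self._records.clear()


cache = FadingCache(max_quiet_period=timedelta(days=7))
synced_at = datetime(2009, 1, 1)
cache.sync({"cust-1": "Alice"}, synced_at)

print(cache.read("cust-1", synced_at + timedelta(days=3)))
print(cache.read("cust-1", synced_at + timedelta(days=8)))
```

The first read falls inside the seven-day window and returns the record; the second comes after the window has lapsed, so the local copy is cleared before anything is returned.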
There are some significant barriers to data fading on a database server -- many of them pointed out by van Heerde and his colleagues in their original paper. For example, there’s the issue of data that’s been "destroyed" remaining in database backups. And while there have been plenty of exposures of personal data through cyber-attack, the most wide-ranging and severe exposures have often resulted from the loss of backup tapes in shipment, or from data that has simply "walked out the door" on removable media.
Also, data degradation can’t be entirely automatic -- it would require some integration with data retention policy tools, particularly with data that might fall under data retention regulations (or might be the target of legal discovery). If you’re purging your transactional databases of older data on a regular basis and moving it to an offline backup, you’re likely already doing most of what data degradation would achieve from the standpoint of protecting customer data.
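As a rough illustration of the retention-policy point, a degradation job would need a guard along these lines before it touches a record. The `RecordPolicy` fields and the `may_degrade` function are hypothetical names chosen for this sketch; a real deployment would pull these flags from whatever retention tooling is already in place.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RecordPolicy:
    """Hypothetical retention metadata attached to a stored record."""
    retain_until: Optional[date] = None  # regulatory minimum retention date
    legal_hold: bool = False             # record is subject to legal discovery


def may_degrade(policy: RecordPolicy, today: date) -> bool:
    """Allow degradation only when no hold or retention rule forbids it."""
    if policy.legal_hold:
        return False
    if policy.retain_until is not None and today < policy.retain_until:
        return False
    return True


print(may_degrade(RecordPolicy(), date(2009, 6, 1)))
print(may_degrade(RecordPolicy(legal_hold=True), date(2009, 6, 1)))
```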
There’s also the question of whether there’s anything really gained in terms of personal data protection from cyber-attacks. While letting data degrade can protect information from older transactions if a site is compromised, it still leaves the most recent and potentially most valuable data vulnerable.
"Data degradation however, as any data retention model, cannot defeat trail disclosures performed by an adversary spying the database system from its creation," van Heerde and his colleagues wrote. So data degradation technology itself can’t prevent a breach of sensitive data -- it’s just an enhancement to standard access controls.
Most of what van Heerde’s proposed technology would do could be handled simply by good data management practices. Unfortunately, like common sense, good data practices are not common enough.
— Sean Gallagher is an award-winning IT journalist and the former head of InformationWeek Labs. Gallagher is now an independent journalist and technology consultant based in Baltimore. He can be reached at: firstname.lastname@example.org.