"Do Not Track" is controversial. Internet Evolution executive editor Nicole Ferraro learned that firsthand last August in her quest to ferret out whether allowing marketers to track our Internet travels is a good idea and, if so, for whom.
Unlike Nicole, I cower at controversy, preferring to focus on "what ifs."
We are assured by those tracking us that the data they gather is anonymous. What if it’s not? Would that change your mind about the benefits of targeted advertising?
I can read your minds. “No, not another piece on why we shouldn’t trust advertisers.” Nope. For argument’s sake, let’s say those capturing our personal information are completely trustworthy.
So, where am I going with this? While preparing this post, I found several sources arguing that "anonymization" of our personally identifiable information (PII) doesn’t work.
A paper titled "Broken Promises of Privacy," written by Professor Paul Ohm at the University of Colorado Law School, demonstrates that supposedly anonymized databases aren’t anonymous -- not even close -- and that individual identities can be "de-anonymized" with relative ease. After a phone conversation with Dr. Ohm, I let something he said to me sink in:
“Data can either be useful or perfectly anonymous, but never both.”
Professor Ohm backs up his claim by citing the research of Dr. Latanya Sweeney. In her paper, "Computational Disclosure Control," Dr. Sweeney studied anonymized data about state employee hospital visits released by the Group Insurance Commission (GIC) of Massachusetts. Dr. Ohm explains:
"By removing fields containing name, address, social security number, and other 'explicit identifiers,' GIC assumed it protected patient privacy, despite the fact that 'nearly one hundred attributes' per patient and hospital visit were still included; including ZIP code, birth date, and sex."
Using just ZIP code, birth date, and sex, and matching those three fields against public voter registration records that carry names, Dr. Sweeney was able to pick out the health records of individual employees, including a former governor of Massachusetts.
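The mechanics of this kind of re-identification are simple enough to sketch. The example below is a minimal illustration, not Dr. Sweeney's actual procedure: every record, name, and field name in it is invented, and it assumes the attacker holds a second dataset that pairs names with the same three attributes. It joins the two datasets on ZIP code, birth date, and sex, and reports only the "anonymous" records that match exactly one named person.

```python
from collections import defaultdict

# All records below are invented for illustration.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

# "Anonymized" data: names removed, quasi-identifiers left in place.
hospital_records = [
    {"zip": "02138", "birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-03-02", "sex": "F", "diagnosis": "condition B"},
    {"zip": "02139", "birth_date": "1971-11-20", "sex": "F", "diagnosis": "condition C"},
]

# A second, hypothetical dataset that carries names alongside the same fields.
named_records = [
    {"name": "Alex Example", "zip": "02138", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "Blair Sample", "zip": "02139", "birth_date": "1962-03-02", "sex": "F"},
    {"name": "Casey Placeholder", "zip": "02139", "birth_date": "1962-03-02", "sex": "F"},
    {"name": "Dana Demo", "zip": "02140", "birth_date": "1971-11-20", "sex": "F"},
]


def quasi_key(record):
    """Return the (zip, birth_date, sex) tuple used to join the two datasets."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymized, named):
    """Pair each anonymized record with a name when exactly one person matches."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[quasi_key(person)].append(person["name"])

    matches = []
    for record in anonymized:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record["diagnosis"]))
    return matches


reidentified = link(hospital_records, named_records)
print(reidentified)
```

In this toy data only the first hospital record is re-identified. The second matches two named people and stays ambiguous, and the third matches no one. The more attributes a released record keeps, the more often the match is unique.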
Another example involves Dr. Arvind Narayanan and his advisor, Dr. Vitaly Shmatikov, co-authors of the above-linked paper on PII. They discovered that one-third of Twitter users also have a Flickr account. Cross-referencing anonymized Twitter social graphs with Flickr connection information allowed the researchers to identify Twitter accounts.
What immediately intrigued me was Dr. Narayanan’s comment:
"The level of anonymity that society expects -- and companies claim to provide -- in published databases is fundamentally unrealizable."
That was back in 2009. I recently chatted with Dr. Narayanan about some new concerns of mine. “Remember what you told me two years ago? Is it still relevant?”
Dr. Narayanan responded:
"I’d perhaps change 'published' to 'published or outsourced.' I would add that companies need to be more honest with consumers about what guarantees they can and cannot provide, and sooner or later they need to move away from data anonymization as a silver bullet."
Moving away from data anonymization is not going to happen anytime soon. I checked out the privacy policies of a few well-known websites. The policy at Facebook is a typical example:
"The only information we provide to advertisers is aggregate and anonymous data, so they can know how many people viewed their ad and general categories of information about them."
The numerous mentions of "anonymous" had me asking: Is there some sort of standard that must be followed to anonymize data? I was able to find only one standard.
Before any data can be considered anonymous, the Health Insurance Portability and Accountability Act (HIPAA) requires the removal of any reference to "18 Identifiers" (Box 2, near the bottom). Fortunately, ZIP code and birth date are both covered by the 18 identifiers, so Dr. Sweeney’s approach would not work against data scrubbed to that standard. As for other online entities such as Facebook, it appears the definition of "anonymous" is up to the individual organization.
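To make the idea of scrubbing identifiers concrete, here is a minimal sketch of what coarsening two of those fields might look like. It is an illustration only, not a HIPAA compliance routine: the record layout and the `scrub` helper are hypothetical, it touches only name, ZIP code, and birth date, and it assumes the common treatment of keeping the first three ZIP digits and the birth year while ignoring the standard's other identifiers and exceptions.

```python
def scrub(record):
    """Return a copy of a record with name removed and ZIP/birth date coarsened.

    Hypothetical helper: handles three fields only and skips the standard's
    other identifiers and special cases.
    """
    cleaned = dict(record)
    cleaned.pop("name", None)                     # drop the explicit identifier
    cleaned["zip"] = record["zip"][:3]            # keep only the first three digits
    cleaned["birth_year"] = record["birth_date"][:4]  # keep the year, drop month and day
    del cleaned["birth_date"]
    return cleaned


# Invented example record.
original = {
    "name": "Alex Example",
    "zip": "02138",
    "birth_date": "1950-01-15",
    "sex": "M",
    "diagnosis": "condition A",
}

scrubbed = scrub(original)
print(scrubbed)
```

After scrubbing, many more people share the same combination of three-digit ZIP, birth year, and sex, so an exact join on those fields is far less likely to single out one person. The record is also less precise, which is the trade-off Dr. Ohm describes.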
Hopefully, I've avoided the online-tracking debate and shed light on a real concern: that anonymous data as currently defined really isn’t anonymous. Stay tuned. Help is on the way, courtesy of Dr. Narayanan and his team.
— Michael Kassner is a writer and consultant specializing in information security.