The Macrosite for News, Analysis and Opinion about the Future of the Internet

Anonymized Doesn't Mean Anonymous

Written by Gordon Haff
3/16/2010 29 comments

Netflix Inc. got lots of publicity from its million-dollar Netflix prize contest, as researchers vied “to improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences.”

Algorithms were developed using records for roughly 480,000 customers and their associated ratings of movies, along with the date of each rating and the title and year of release of the rated movie. No other personal data was included; that is, it was supposedly anonymized, with customer names and other identification replaced with randomly assigned IDs.

The key word here is “supposedly.” Researchers at the University of Texas at Austin wrote a paper in which they claim that it’s sometimes possible to de-anonymize data by correlating it with other datasets tied to a person’s real-world identity, such as a profile on the Internet Movie Database (IMDb).
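
To make the idea of correlating datasets concrete, here is a minimal sketch in Python. It is not the researchers' actual algorithm; it is a simplified illustration that assumes an attacker holds a few ratings from a public profile and looks for the one "anonymized" record that agrees with them on title, approximate star rating, and approximate date. All names, IDs, titles, and thresholds below are hypothetical.

```python
from datetime import date

# Hypothetical "anonymized" release: random ID -> {title: (stars, rating date)}
anonymized = {
    "u1001": {
        "Movie A": (5, date(2005, 3, 1)),
        "Movie B": (2, date(2005, 3, 4)),
        "Movie C": (4, date(2005, 4, 9)),
    },
    "u1002": {
        "Movie A": (3, date(2005, 1, 15)),
        "Movie D": (5, date(2005, 2, 2)),
    },
}

# Hypothetical public profile that carries a real name
public_profiles = {
    "Jane Example": {
        "Movie A": (5, date(2005, 3, 2)),
        "Movie C": (4, date(2005, 4, 10)),
    },
}


def similarity(anon, public, max_days=3, max_star_gap=1):
    """Count public ratings that have a close match in one anonymized record."""
    hits = 0
    for title, (stars, when) in public.items():
        if title in anon:
            a_stars, a_when = anon[title]
            close_stars = abs(a_stars - stars) <= max_star_gap
            close_date = abs((a_when - when).days) <= max_days
            if close_stars and close_date:
                hits += 1
    return hits


def best_match(public, release, min_hits=2):
    """Return the anonymized ID that clearly best fits the public ratings, or None."""
    scored = sorted(
        ((similarity(record, public), uid) for uid, record in release.items()),
        reverse=True,
    )
    top_hits, top_id = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    # Only claim a match when it is strong enough and strictly beats the runner-up
    if top_hits >= min_hits and top_hits > runner_up:
        return top_id
    return None


for name, ratings in public_profiles.items():
    print(name, "->", best_match(ratings, anonymized))
```

In this toy data, two overlapping ratings are enough to single out one record. With real data an attacker would need more care to avoid false matches, but the principle is the same: the random ID only hides identity until some other dataset supplies the link.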

In response to a lawsuit and an inquiry from the Federal Trade Commission over privacy concerns, Netflix has decided not to go ahead with a sequel to the Netflix Prize. This second contest was to have provided additional data such as renters’ ages, ZIP codes, and genders. Even without combining information from multiple sources, some types of data are fairly straightforward to tie to a person. When a team at AOL released search data in 2006, The New York Times showed how easy it was to track down one of the searchers.

What surprised me at the time wasn’t the simplicity of the unmasking, but that so many people apparently didn’t do searches that made it obvious who they were. Certainly in my case, I have no doubt that my queries would map to my “true name” with little ambiguity if only because I frequently check to see what comes up if someone searches for me online.

And, even in the absence of such personally identifiable information, just a birthdate, ZIP code, and gender are sufficient to identify something in the neighborhood of 63 to 87 percent of the US population.
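
A rough way to see why those three fields are so revealing is to count how many records are the only one with their particular combination. The sketch below uses a handful of made-up records, not census data, and the `generalize` helper is an assumed illustration of one common mitigation: coarsening the birthdate to a year and the ZIP code to its first three digits.

```python
from collections import Counter

# Hypothetical records: (birthdate, ZIP code, gender)
records = [
    ("1970-01-15", "02139", "F"),
    ("1970-01-15", "02139", "F"),
    ("1982-07-04", "02139", "M"),
    ("1982-07-09", "02141", "M"),
    ("1991-11-30", "02141", "F"),
]


def unique_fraction(rows):
    """Fraction of rows whose combination of fields appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def generalize(row):
    """Coarsen a record to (birth year, 3-digit ZIP prefix, gender)."""
    birthdate, zip_code, gender = row
    return (birthdate[:4], zip_code[:3], gender)


print("unique on full fields:   ", unique_fraction(records))
print("unique after coarsening: ", unique_fraction([generalize(r) for r in records]))
```

In this tiny sample, most records are unique on the full fields and far fewer are unique after coarsening. Coarsening trades detail for cover; it lowers the share of one-of-a-kind records but does not guarantee that none remain.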

But aggregating data sources, as in the Netflix Prize example, shows how much you can reveal even when you may not think you’re revealing anything at all. The pipl search engine gives some sense of the amount of personal information available for the mining; if you’re like me you may find the results a bit unnerving.

Clearly, common names make it harder to zoom in on a particular individual. Ongoing research may also provide ways to make it possible to release data sets for research purposes that are effectively anonymized. However, it’s fair to say that as our digital footprints grow, the potential to connect the dots among different parts of that footprint grows as well.

Does it matter? Much of the online commentary seemed to take issue with the researchers, the FTC, and lawyers more than it did with Netflix. I suspect that’s because the data was going to a geekily respectable purpose, improving movie recommendations, rather than, say, to an insurance company or employer looking for reasons to deny coverage or a job.

But it’s worth noting that a federal law, the Video Privacy Protection Act, limits the disclosure of video rental information, so concern about this sort of information becoming public is hardly a newfound and academic concern. When that law was enacted, its purpose was quite narrow -- to keep political opponents or others from using video rental history to embarrass someone. (It was passed partly in reaction to the publication of Robert Bork’s video rental history during his 1987 Supreme Court nomination process.)

Yet, in today’s interconnected world, such information is not just information in its own right. It’s also a potential window into other aspects of someone’s online identity.

— Gordon Haff, Senior Analyst at Illuminata Inc. on grids/supercomputing

Ira Winkler
Thinkernetter
Sunday March 21, 2010 6:44:07 PM

If you're being serious, I would love a mainstream reference for that.

aum007
IQ Crew
Sunday March 21, 2010 10:16:30 AM

Ira,

Astrology actually does impact Movie choices...

Ashish.

 

Ira Winkler
Thinkernetter
Saturday March 20, 2010 12:41:22 PM

Stopping Netflix from putting together a contest to try to give people better service, because of concerns that are already out of the proverbial barn, seems like a lose-lose situation. We are not any better protected, and we don't get better movie recommendations. Researchers lose the chance to make some money by winning a contest.

Why couldn't they just add rules like only giving birth year and 3-digit ZIP code (first three digits)? Unless you believe astrology impacts movie choices, you don't need the date, just an age range.

JoeFoster
Rank: Web master
Friday March 19, 2010 5:33:13 PM

Once something is on the Net it's no longer anonymous.

Much like when I went to High School. I graduated from a High School in a different country, along with barely 300+ in the 20 years the school was open. The school was an international school.

How hard would it be to find out my history there, at least as much history was available? Not very if someone wanted to search back to the mid 1960's.

At the Veteran's hospital I go to, there are three people with the same name as myself and one even has the same last four digits as my Social Security number. And this is a VA that is considered small. But there's a picture of me so that's a second identifier. I wonder if the NSA or some other governmental department has my information? Especially since I had a very high security clearance during my years in the military.

I think I'll try pipl and see what it says about me.

jj

JC Cameron
IQ Crew
Thursday March 18, 2010 5:16:02 PM

Where will all of this lead?!? Right now, every second, someone is sharing something they shouldn't about themselves, their families, and their friends (never mind intel on their business, their government, and other truly sensitive info).

Technological convenience is far outstripping our ability to understand the ultimate impact our online interactions will have.  Back in the day (cough - 10 years ago), nearly everyone hid behind nicknames and pseudonyms.  Now, we are blogging and twittering, reconnecting with high school friends on Facebook and posting our resumes for anyone to read. 

All of this data is immensely valuable information to all kinds of different groups (friends, co-workers, competitors, governments, crime syndicates). We are just now seeing the tip of the iceberg regarding the security (actually insecurity) of all of our tech advances...from thieves breaking into your home because you posted the fact that you were out of town to crime syndicates who have nearly unlimited information on how to steal your identity.

So, yea, this social networking craze is absolutely amazing - but we really should be worried where it will all lead because most of it is scary.

JC Cameron, President
Revenution, Inc.

mtechie
IQ Crew
Thursday March 18, 2010 3:33:38 PM

You had me at pipl search.  Not only is a lot available online, it can be neatly organized for easy reading.  I knew it was easy to find information online, I just wasn't aware it was that easy.  The pipl search site could be used as an example to younger web users who aren't yet aware so much about them is easily accessible.  Online reputation management services might have a big boom during the next several years as more people learn how easily they can be found online.

dbergman
IQ Crew
Wednesday March 17, 2010 10:04:04 PM

If someone wants to find out some private info about you, they will. Any site that has information about you can only protect you so much. Your friend who takes a snapshot of your private Facebook profile and posts it. The person behind you at the library who snaps a photo with his cell phone of your banking page. The pressure is not on the gov't to protect us, nor is it solely on the shoulders of the sites...it is our own responsibility to be smart as well. If someone is willing to go to such great lengths as to reverse engineer TCP/IP packets and go through your trash and tape together your shredded documents...well...they are going to get it.

Kurtkeys
IQ Crew
Wednesday March 17, 2010 2:15:18 PM

My background has been a matter of public record since my first Security Clearance Background Investigation took place in 1976. Every friend, every group association, every address and every police contact I have ever had is on the books. And then a second time when I applied for a Federal Firearms License. And I have given you more personal information right here than you will ever find on the internet about me.

Respectfully,

Kurt

SeanFromIT
IQ Crew
Wednesday March 17, 2010 1:46:05 PM

They only make the info public in anonymized forms...the problem is that the info can be pieced back together too easily. There's a good satirical clip about government/private companies overstepping and using such info in America: From Freedom to Fascism by Aaron Russo. Funnily enough, it doesn't look like Netflix has his film :-)

Gordon Haff
Thinkernetter
Wednesday March 17, 2010 1:37:20 PM

A lot depends on how common your name is. My name, for example, appears to be the only Gordon Haff with any Web presence. And even if there's some ambiguity it's usually not hard to narrow down the choices based on rough location. So what? Maybe nothing but having access to all that data at least raises the possibility of tying that known identity to other, supposedly, private/anonymous information.
