The Macrosite for News, Analysis and Opinion about the Future of the Internet

Facebook Gets Harvested

Written by Nicole Ferraro
2/16/2010 23 comments

Claiming 400 million users, Facebook has become a rich hub of global data. Such information proves useful to advertisers, who get to target individuals based on their alleged interests. Now one engineer is looking to make data available to the academic community, but his methods are questionable.

Pete Warden, a former Apple engineer, is trying to make data on 215 million Facebook users publicly available to the academic research community, which he hopes will use it to better understand the world. He's spent the past six months analyzing such data, thanks to a Facebook flaw, which enabled him to harvest public profiles without being logged in. Avoiding the log-in allows Warden to subvert Facebook's terms of service, which, he says on his blog, is important because the terms "prohibit you from these sorts of shenanigans."

If that sounds sketchy, here's more: While Warden says he's removed identifying profile URLs from his data, he's kept locations, Fan page lists, partial Friends lists, and names.

"I find it extremely problematic that Warden plans to release the data with only the most minimal attempt to anonymize it," Michael Zimmer, a professor in the School of Information Studies at the University of Wisconsin-Milwaukee, told us via email. "This is extremely sensitive information to be releasing, especially given how other researchers have routinely shown how easily seemingly anonymous datasets can be re-identified simply by comparing social graphs."

Warden's work seems ethically challenged for other reasons as well. For starters, thanks to Facebook's new "privacy" controls, users with public profiles have no choice but to make Fan Page lists available.

Zimmer also expresses his concern that, while users' profiles may be public, it's unlikely that they expect to have their data mined in this way.

"This is a version of the 'privacy via obscurity' concept -- the data is public, but it is unlikely that most people will ever access it," he writes. "Simply, by making some profile information public, users might expect some people to see it, but their expectations quite likely do not include the possibility that a researcher will randomly look them up, scrape their data, and the data of all their friends, and systematically do that for millions of accounts."

With all that we're learning about the way our data is used online, it's getting harder to make the case for the unknowing user. Users should be aware that, when posting content to the Internet, anything is possible. But Zimmer is correct in asserting that most users are not... and Warden's decision to include people's names in this research seems both unnecessary and dangerous.

Furthermore, is it even worth it? Sure, Facebook is full of information about hundreds of millions of people, but how accurate is it when the site is also filled with fake and duplicate profiles? How accurate is it when people are becoming "Fans" of pages for reasons other than being an actual fan? Considering the fact that Facebook users aren't actively participating in a data mining experiment, there's no onus on them to provide accurate information. This diminishes the potential for real research here.

Warden's initial findings based on his data are pretty uninspiring as well. He's found that God is the No. 1 Fan Page in almost every southern U.S. state, whereas in San Francisco it's Barack Obama.

Wow. Really? This is what you've bypassed Facebook's TOS for?

While it may seem silly to think about researchers diving into mindless Facebook data, it should be a concern, particularly as we learn that corporations, the government, and banks are among the many institutions interested in finding out who our Friends are.

Warden was supposed to release the data to the research community last week, but Facebook has asked him to hold off while it assesses the privacy implications. As of this blog's publication, a Facebook spokesman hadn't responded to a request for comment.

— Nicole Ferraro, Site Editor, Internet Evolution

Paul Whyte
Researcher
Tuesday March 9, 2010 1:33:04 AM

Given the hacking history of Facebook's founder, I'm now more than skeptical that your information on Facebook is anywhere close to private:

At Last -- The Full Story Of How Facebook Was Founded

DavidSilversmith
Thinkernetter
Wednesday February 24, 2010 10:52:04 PM

This post talks about how this person was able to "harvest public profiles without being logged in." This raises an interesting legal issue.

If data is publicly posted on a web server for use on the Internet, when do the terms and conditions, and potentially the laws of the country where the server is located, come into play?

If you, as this person did, never load a page and thus never see the terms and conditions, are you bound by them?

If you visit a web page and your ISP mistakenly blocks the terms and conditions box as spam, are you still subject to a privacy policy you never saw?

These issues remain, to the best of my knowledge, unanswered.

We once found a team of developers who, on their resumes, listed that they ran or worked on a project that was mining data from my company's web site.  Clearly they did not see anything wrong with this!



mamaflynny
IQ Crew
Saturday February 20, 2010 10:30:25 AM

I agree with you, Ariella. I subject everything I post on FB to the New York Times test, knowing full well someone may be able to access my data without my permission.

A while back I posted about analyzing your FB friends with SAS for those with little else to do with their time. http://blogs.sas.com/sasdummy/index.php?/archives/97-Running-SAS-PROCs-on-your-Facebook-Friends.html

homesteadtraders
IQ Crew
Wednesday February 17, 2010 5:52:30 PM

I stated in another article on Facebook, that I would not have anything but a commercial account, which is exactly what I am developing now.

Now, what I have also found that I don't like, is that when friends and relatives want me to look at/join their pages, I'm "told" (when I try to use the links) that I need to have a "personal page" in order to even look at any of them. Seriously? I'm invited to join a page, and because I refuse to have a "personal page", (but I am a member with 2 business pages in the works) I can't even look at them? Really?

I had been thinking about putting a bare bones personal page up, but now that this has happened and I'm being forced to open a personal page or I'm denied entry to those pages I've been invited to look at, I am now certain that I will not have any type of personal page.

You know, it is one thing if you don't have an account at all and want to look at pages. I guess I can understand being denied access then. However, when you have an account, and someone has invited you to look at their page, and you're still denied because of no "personal page", that is just not right.

kenton
IQ Crew
Wednesday February 17, 2010 4:06:22 PM

It is important to make the distinction between deletion and deactivation. If you delete your account, all content is deleted from the Facebook servers. If you deactivate, your information is stored and you can return. The Canadian Privacy Commissioner had Facebook clarify the distinction so that people could better understand what would happen to their data in either instance.

Oh, and as for the post: he's no different from someone who goes and takes pictures through your window of all your stuff and then posts them online. Is it illegal? If he has a good enough lawyer he'll get off. Is it immoral? In most social circles I would hope so. As someone else has already commented, getting academics to use this kind of information in their studies is not very likely. They have standards to uphold and this guy didn't meet any of them.

Fiercesome
Rank: Scrivener
Wednesday February 17, 2010 3:12:18 PM

I agree with Kurt, this guy needs to be tried.  He didn't go down to the Library of Congress and check out some books or copy down a phone book, he circumvented FB's TOS.  That's like saying a thief isn't guilty for rifling through your stuff because he didn't pick the front door lock, he climbed in through an open window.

Thank god my FB profile is full of lies but I'm sure some of my group memberships reveal more about me personally than I'd care to share with strangers looking to sell to me or farm me or what have you.

There is a management quiz I took once when applying for a job and after it was over, I was told I got a question correct that 90% of applicants get wrong.  The question was something to the effect of:  "The manager goes in the safe to get change and leaves it open as he walks up front to give it to the cashier.  While he is gone, a thief steals the rest of the money out of the safe.  Who is at fault?"  A) Manager  B) Cashier  C) Thief or D) All of the above.  The correct answer is C, but I almost got it wrong by picking A (as most management or management potential did).

I'm not going to debate the legal ramifications of what Warden did, I'm sure other forums will.  But just because you can do something doesn't mean you should.  Jeffrey Dahmer liked to taste test different people to see what the differences were.  But that doesn't make it right.  That's a gross (literally) example, but c'mon...  we already have to worry about hackers doing it, now this?

Terri Eberle
IQ Crew
Wednesday February 17, 2010 1:31:38 PM

One would think the academic research community would have better things to do with their time - cure cancer? suggest ways to improve the economy?

chayes
IQ Crew
Wednesday February 17, 2010 11:29:09 AM

Agreed that disabling my page won't delete the content.  This just points out one more issue with Facebook and social media sites in general.  We are aware that breaches like this can occur, and I don't think that there is anyone on Internet Evolution who is surprised by the latest information.  That being said, there are many Facebook users who would be surprised by this.  They don't stop to think that what they post is public, and that their information can be harvested by someone.  I would like to think that this flaw was made public in the hope that it would be fixed, and not  exploited for personal gain.  But the cynic in me says otherwise.

Ariella
IQ Crew
Wednesday February 17, 2010 9:39:41 AM

I regard anything I put up on Facebook as 100% public, the same way I regard my own site.  I don't rely on privacy settings to keep info secure.  If I want to speak only to particular people, I'll send them a direct email.  However, some Facebook users seem to feel as secure as they would among a group of close friends.  They give out information that could be harvested for identity theft, such as birthdates, anniversary dates, and maiden names.  One of my FB connections just reported that someone hacked into her bank account and removed hundreds of dollars.  She was shocked that it could happen.  But I wonder if the thief mined some of the information available about her online to get into her account.

tnieusma
IQ Crew
Wednesday February 17, 2010 9:27:17 AM

Warden is quoted as saying, "Hopefully I'll get to see a bunch of interesting [academic research] papers come out of it, worst case. And I'd like to be the guy people turn to when they need stuff like this."

The challenge here is that academic research has standards. "Some guy, though clever, who hijacked Facebook information" is not an appropriate addition to any reference list. Though some of what he is finding out may be interesting on the surface, there is a huge opportunity for error.

For example, I have many friends who have "dummy" FB pages just to play games on or pages for every pet they own etc. With many people having multiple pages, much of the data is bound to be skewed. The general trends identified may be accurate, but we know much of that data already. I mean, Mormons in Utah - what a shock??? (not)

If there were accuracy to the data, it would be far more interesting and useful. Companies do this type of information crunching all of the time based on credit card purchases, frequent shopper cards, etc. With those, there is some accuracy. In the social virtual world, it is just not the same, and it opens doors for flimsy research.
