What's private on the Web?


Hilary Mason of bit.ly wrote a nice summary of some key issues raised at the recent Search in Social Media 2010 workshop. (For other commentary, see Daniel Tunkelang’s post and our pre-workshop comments.) Hilary asked several important questions that break down into two main topics. The first is what we can compute from social data and how. The second is what the implications of those computations are. Aspects such as computing relevance, architecting social search engines, and representing users’ information needs appropriately all fall into the first, what-and-how category. We can be sure that adequate engineering solutions will be found for these problems.

The second topic, however, is more problematic because it deals with the impact that technology has on the individual and on society, rather than with technology per se.

Hilary asks:

What data is available to social search? There are many kinds of social data, from e-mail (private) to blogs (public) and tweets (mostly public) — what is and should be searchable? How do we handle issues of privacy and identity management?

How do we evaluate accuracy and truthiness of social data?

How do we characterize social connections, around concepts like strong vs weak ties, and friend-of-a-friend vs friend-of-a-friend’s-friend? Can we converge on a single social graph representation?

Finally, how do we deal with the chasm between the industry participants (who have LOTS of data) and the academic participants, who suffer from a lack of public (and publishable) data?

This is a fascinating list permeated by issues of privacy. Despite assertions that privacy is a thing of the past and we should get over it, the public reaction to Google Buzz’s fizzy debut argues against that position. In fact, privacy may be a particularly thorny problem for searching and aggregating social media. People leave extensive traces of their online activity on social sites (and on search engines in general). A range of Social Network Analysis algorithms, originally developed by sociologists to analyze research populations, can be brought to bear at web scale on the problem of federating partially overlapping social networks.

The danger, of course, is doing it too well! We have seen cases where the release of public data has had serious consequences for vulnerable people, including women and political dissidents.

While the tools we create are neutral, they enable both positive and negative activities. Twiangulate.com can be used to find prospective clients and like-minded individuals, or it can be used to piece together networks of political dissidents. Google Buzz can make it easy to keep track of your friends, or to engage in verbal abuse and sexual harassment.

Finally, the issue of what constitutes legitimate use of public social network data (such as Pete Warden’s Facebook crawl) needs to be discussed and understood. Last week the CSCW 2010 conference saw a lively debate (ironically captured through Twitter) on the role of IRBs in collecting public data for research. There are interesting points on both sides. We as researchers need to work through the issues and understand what is appropriate and inappropriate use of this data, and the public at large needs to better understand the implications of making this data publicly available. At the moment, I don’t think we have a good grasp of either aspect.


  1. Twitter Comment

    Posted “What’s private on the Web?” [link to post] #ssm2010 #cscsw2010

    Posted using Chat Catcher

  2. I have one question for anyone who insists that privacy is dead: What’s your e-mail password?

    Thanks for this thoughtful perspective. You’re right — privacy is not an engineering problem. But there are things we can do as engineers to give people control and ownership of their own data within the systems that we design, and we should be conscious of the privacy implications of the engineering decisions that we make.

  3. People who say privacy is dead are just trying to be provocative. What is dead (or in its death throes) is privacy through difficulty.


  4. […] by Maribeth Back in her post on ChatRoulette or the issues with Google Buzz discussed as part of Gene Golovchinsky’s post “What’s private on the Web?”. Those examples made sensitive information available directly. In the case of the following three […]
