Hillary Mason of bit.ly wrote a nice summary of some key issues raised in the recent Search in Social Media 2010 workshop. (For other commentary, see Daniel Tunkelang’s post and our pre-workshop comments.) Hillary asked several important questions that break out into two main topics: what we can compute from social data and how, on the one hand, and what the implications of those computations are, on the other. Questions such as how to compute relevance, how to architect social search engines, and how to represent users’ information needs in appropriate ways all fall into the what-and-how category. We can be sure that adequate engineering solutions will be found for these problems.
The second topic, however, is more problematic because it deals with the impact that technology has on the individual and on society rather than with technology per se.
What data is available to social search? There are many kinds of social data, from e-mail (private) to blogs (public) and tweets (mostly public) — what is and should be searchable? How do we handle issues of privacy and identity management?
How do we evaluate accuracy and truthiness of social data?
How do we characterize social connections, around concepts like strong vs weak ties, and friend-of-a-friend vs friend-of-a-friend’s-friend? Can we converge on a single social graph representation?
Finally, how do we deal with the chasm between the industry participants (who have LOTS of data) and the academic participants, who suffer from a lack of public (and publishable) data?
This is a fascinating list permeated by issues of privacy. Despite assertions that privacy is a thing of the past and we should get over it, the public reaction to Google Buzz’s fizzy debut argues against that position. In fact, privacy may be a particularly thorny problem for searching and aggregating social media. People leave extensive traces of their online activity on social sites (and on search engines in general), and a range of Social Network Analysis algorithms originally developed by sociologists to analyze research populations can be brought to bear at web scale on the problems of federating partially-overlapping social networks.
While the tools we create are neutral, they enable both positive and negative activities. Twiangulate.com can be used to find prospective clients and like-minded individuals, or it can be used to piece together networks of political dissidents. Google Buzz can make it easy to keep track of your friends, or to engage in verbal abuse and sexual harassment.
Finally, the issue of what constitutes legitimate use of public social network data (such as Pete Warden’s Facebook crawl) needs to be discussed and understood. Last week the CSCW 2010 conference saw a lively debate (ironically captured through Twitter) on the role of IRBs in collecting public data for research. There are interesting points on both sides. We as researchers need to work through the issues and understand what is appropriate and inappropriate use of this data, and the public at large needs to better understand the implications of making this data publicly available. At the moment, I don’t think we have a good grasp on either aspect.