Making sense of Twitter search
Wednesday, February 10th, 2010 by Gene Golovchinsky
Last week Jeremy and I attended the SSM2010 workshop held in conjunction with WSDM2010. In addition to chairing one of the panels, I got an opportunity to demonstrate an interface that I built to browse Twitter search results, to which Daniel alluded in his summary of the workshop. The system is described in a position paper (co-authored with Miles Efron) that has been accepted to the Microblogging workshop held in conjunction with CHI 2010.
The idea behind this interface is that Twitter displays its search results only by date, thereby making it difficult to understand anything about the result set other than what the last few tweets were. But tweets are structurally rich, including such metadata as the identity of the tweeter, possible threaded conversation, mentioned documents, etc. The system we built is an attempt to explore the possibilities of how to bring HCIR techniques to this task.
Each tweet is classified as an “original” tweet or a retweet. Retweets and replies are grouped into conversations. Retweets are detected through a number of heuristics, including whether they share URLs, include patterns such as RT @xx or via @xx (and a few other variants), contain similar text, etc. Detecting retweets is non-trivial in some cases because retweets lack a structural representation. The new retweet API may mitigate this problem, but it has not been widely adopted because it does not allow the person doing the retweeting to comment on the original tweet.
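To make the heuristics above a bit more concrete, here is a minimal sketch in Python. It is not the code behind the prototype: the regular expression, the function names, and the 0.8 similarity threshold are all assumptions chosen for illustration, and a real detector would need more marker variants plus the shared-URL check.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pattern covering only the "RT @user" and "via @user" markers;
# other variants mentioned in the text are not handled here.
RETWEET_MARKER = re.compile(r"\b(?:RT|via)\s*@(\w+)", re.IGNORECASE)


def retweeted_users(text):
    """Return the user names credited by RT/via markers in a tweet's text."""
    return RETWEET_MARKER.findall(text)


def is_retweet(text):
    """Treat a tweet as a retweet if it contains at least one marker."""
    return bool(retweeted_users(text))


def similar_text(a, b, threshold=0.8):
    """Rough text-similarity check; the threshold is an arbitrary assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

Marker matching is cheap and catches the common cases; the similarity check is a fallback for retweets that drop the marker, at the cost of comparing pairs of tweets.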
The system organizes the results into people, tweets, and documents, each displayed in a separate tab. The people view is further split into people who tweeted and those who retweeted; a person may appear in each pane.
People in the view are currently sorted by the number of tweets they contributed to the result set; other sort orders, such as the number of followers, recency of tweet, TunkRank, etc., are also possible.
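Sorting people by their contribution to the result set can be sketched as follows. The tweet representation (a dict with a "user" key) and the function name are assumptions made for this example only.

```python
from collections import Counter


def rank_people(tweets):
    """Return (user, tweet_count) pairs, most prolific first.

    Each tweet is assumed to be a dict with a "user" key; this shape is
    hypothetical and not taken from the Twitter API.
    """
    counts = Counter(tweet["user"] for tweet in tweets)
    return counts.most_common()


results = [
    {"user": "alice", "text": "first"},
    {"user": "bob", "text": "second"},
    {"user": "alice", "text": "third"},
]
ranking = rank_people(results)
```

Other orderings would only need a different sort key, for example a follower count stored alongside each user.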
The tweet view (shown partially below) groups tweets by the size of the conversation.
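One way to approximate this grouping is to follow each tweet back to the root of its conversation and then order the groups by size. The sketch below assumes every tweet carries an "id" and an optional "parent_id" (the tweet it replies to or retweets); both field names are hypothetical.

```python
from collections import defaultdict


def group_conversations(tweets):
    """Group tweet ids into conversations, largest conversation first."""
    parent = {t["id"]: t.get("parent_id") for t in tweets}

    def root(tweet_id):
        # Walk up the reply/retweet chain until the parent is missing from
        # the result set; `seen` guards against malformed cyclic data.
        seen = set()
        while tweet_id not in seen:
            seen.add(tweet_id)
            up = parent.get(tweet_id)
            if up is None or up not in parent:
                break
            tweet_id = up
        return tweet_id

    groups = defaultdict(list)
    for t in tweets:
        groups[root(t["id"])].append(t["id"])
    return sorted(groups.values(), key=len, reverse=True)


sample = [
    {"id": 1},
    {"id": 2, "parent_id": 1},
    {"id": 3, "parent_id": 2},
    {"id": 4},
    {"id": 5, "parent_id": 99},  # parent not in the result set
]
conversations = group_conversations(sample)
```

A tweet whose parent is not part of the search results becomes the root of its own conversation.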
Finally, the document view shows documents mentioned by the tweets in the search results. Documents can be sorted by the number of mentions, by the first mention, or by the last mention. Mentions are identified by comparing URLs; shortened URLs are expanded prior to comparison, and some attributes tacked on by Twitter clients are stripped out to determine canonical URLs.
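The canonicalization step might look roughly like the sketch below. It assumes that shortened URLs have already been expanded (that requires a network request and is omitted here) and that the client-added attributes are query parameters starting with "utm_"; that prefix list is a guess, not the list the prototype uses.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed prefixes of tracking parameters added by clients; illustrative only.
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url):
    """Lower-case the host, drop tracking parameters and the fragment."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def count_mentions(urls):
    """Return (canonical_url, mention_count) pairs, most mentioned first."""
    return Counter(canonical_url(u) for u in urls).most_common()
```

Counting over canonical URLs rather than raw strings lets two tweets that point at the same page through different clients count as mentions of one document.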
Ordering by the number of mentions allows the discovery of important documents through the tweeters’ consensus. Drilling into each document shows the tweets (and people) that mentioned the document. Clicking on the document name opens the document in an adjacent pane. This interface allows the user to explore the documents and tweets that comment on them in an integrated way.
More to come
The system we built just scratches the surface with respect to potentially useful ways to browse Twitter search (and other) results. For small and medium-sized result sets, it makes sense to display all results and let the user browse them directly; for larger collections that may contain thousands of tweets, a hierarchical browsing interface may be more appropriate. It should be possible to group tweets topically based on content, or geographically when geocoded. People can be grouped based on their location, based on the strength of relations as determined by social network analysis, or based on ad hoc categories created by the user. Results may also be filtered to remove people who only contributed a single tweet, etc. The exact criteria for grouping will obviously depend on the specific tasks that the interface is designed to support, but the above set of criteria seems reasonably general.
Another useful direction to explore is to use full-text search on the collection of tweets and on the document set referred to by them to help browse and filter the results. Using this technique, we could, for example, find people based on the contents of the documents they tweet about. In reasonably large collections, this may be a viable means of finding key people for a particular topic.
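As a rough illustration of finding people through the documents they tweet about, the sketch below does a naive all-terms match over document text and credits each person once per matching document mentioned. The data shapes (a URL-to-text dict and a list of (user, url) pairs) and the scoring are assumptions; a real system would presumably use a proper full-text index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into a set of lower-case alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find_people(query, documents, mentions):
    """Rank users by how many query-matching documents they mentioned.

    documents: dict mapping URL -> document text (hypothetical shape)
    mentions:  list of (user, url) pairs (hypothetical shape)
    """
    terms = tokenize(query)
    matching = {
        url for url, text in documents.items()
        if terms and terms <= tokenize(text)
    }
    scores = defaultdict(int)
    for user, url in mentions:
        if url in matching:
            scores[user] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, a query such as "exploratory search" would surface the people whose tweets link to documents containing both terms, even if the tweets themselves never use those words.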
In the months to come, Miles and I will continue to explore the possibilities suggested by this interface, and will try to deploy it in a more public way. Among the challenges to overcome is making the system sufficiently responsive: the current prototype is hampered by certain inefficiencies in the Twitter API.