Making sense of Twitter search

Last week Jeremy and I attended the SSM2010 workshop held in conjunction with WSDM2010. In addition to chairing one of the panels, I had the opportunity to demonstrate an interface that I built for browsing Twitter search results, which Daniel mentioned in his summary of the workshop. The system is described in a position paper (co-authored with Miles Efron) that has been accepted to the Microblogging workshop held in conjunction with CHI 2010.

The motivation for this interface is that Twitter orders its search results only by date, which makes it difficult to learn anything about the result set other than what the last few tweets were. Yet tweets are structurally rich: they carry metadata such as the identity of the tweeter, possible membership in a threaded conversation, and the documents they mention. The system we built is an attempt to explore how HCIR techniques can be applied to this task.

Each tweet is classified as either an “original” tweet or a retweet, and retweets and replies are grouped into conversations. Retweets are detected through a number of heuristics, including whether tweets share URLs, contain patterns such as RT @xx or via @xx (and a few other variants), or have similar text. Detecting retweets is non-trivial in some cases because there is no structural representation of retweets. The new retweet API may mitigate this problem, but it has not been widely adopted because it does not allow the person doing the retweeting to comment on the original tweet.
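The retweet heuristics above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the tweet dict shape (`user`, `text`), the exact regular expressions, and the 0.8 similarity threshold are all assumptions, and text similarity is approximated here with Python's `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher

# Assumed approximations of the "RT @xx" / "via @xx" conventions plus one variant
MARKERS = [
    re.compile(r"\bRT\s*@(\w+)", re.IGNORECASE),
    re.compile(r"\bvia\s+@(\w+)", re.IGNORECASE),
    re.compile(r"\bretweeting\s+@(\w+)", re.IGNORECASE),
]
URL_RE = re.compile(r"https?://\S+")


def credited_user(text):
    """Return the screen name credited by an RT/via marker, or None."""
    for marker in MARKERS:
        match = marker.search(text)
        if match:
            return match.group(1)
    return None


def normalize(text):
    """Drop URLs and RT/via markers, lowercase, and collapse whitespace."""
    text = URL_RE.sub(" ", text)
    for marker in MARKERS:
        text = marker.sub(" ", text)
    return " ".join(text.lower().split())


def is_retweet_of(candidate, original, threshold=0.8):
    """Heuristically decide whether `candidate` retweets `original`.

    Tweets are dicts with 'user' and 'text' keys (an assumed shape).
    """
    credited = credited_user(candidate["text"])
    if credited is not None and credited.lower() == original["user"].lower():
        return True
    shared_urls = set(URL_RE.findall(candidate["text"])) & set(URL_RE.findall(original["text"]))
    similarity = SequenceMatcher(None, normalize(candidate["text"]),
                                 normalize(original["text"])).ratio()
    # Without an explicit marker, require both a shared URL and near-identical text
    return bool(shared_urls) and similarity >= threshold
```

A tweet that credits the original author with an RT or via marker is accepted directly; otherwise a shared URL plus near-identical text is required, since a shared URL alone would merge unrelated tweets about the same popular link.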

The system organizes the results into people, tweets, and documents, each displayed in a separate tab. The people view is further split into people who tweeted and people who retweeted; a person may appear in both panes.

Tweet Analysis UI showing the people view: tweeters (left) and retweeters (right), with tweets for @jeremyhylton (tweets) and @dtunkelang (retweets)

People in the view are currently sorted by the number of tweets they contributed to the result set; other sort orders, such as number of followers, recency of tweets, or TunkRank, could also be supported.
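A minimal sketch of the people view's two panes, assuming each tweet is a dict with `user` and a boolean `is_retweet` flag, and that follower counts are available as a plain dict (hypothetical data; the post does not say how they are obtained). TunkRank and recency are omitted for brevity.

```python
from collections import Counter


def people_view(tweets, key="tweet_count", followers=None):
    """Split authors into tweeters and retweeters, each sorted by a chosen key.

    Returns two lists of (user, tweet_count) pairs. `key` is either
    "tweet_count" (the default ordering) or "followers", which uses the
    hypothetical `followers` mapping of user -> follower count.
    """
    tweeters = Counter(t["user"] for t in tweets if not t["is_retweet"])
    retweeters = Counter(t["user"] for t in tweets if t["is_retweet"])
    followers = followers or {}

    def sort_key(item):
        user, count = item
        if key == "followers":
            return (-followers.get(user, 0), user)
        return (-count, user)  # most tweets first, ties broken by name

    return (sorted(tweeters.items(), key=sort_key),
            sorted(retweeters.items(), key=sort_key))
```

Because the two panes are counted independently, someone who both tweets and retweets shows up in each list, matching the behavior described above.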

The tweet view (shown partially below) groups tweets into conversations and orders the conversations by size.
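One plausible way to form conversations is to treat reply and retweet links as edges and take connected components, here with a small union-find. The `in_reply_to` and `retweet_of` fields are assumed names for links coming from reply metadata and from retweet detection; the post does not describe its actual grouping algorithm.

```python
from collections import defaultdict


def group_conversations(tweets):
    """Group linked tweets into conversations, largest conversation first.

    Each tweet is a dict with an 'id' and optional 'in_reply_to' or
    'retweet_of' ids (an assumed shape). Returns lists of tweet ids.
    """
    parent = {t["id"]: t["id"] for t in tweets}

    def find(x):
        # Path halving keeps the union-find trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for t in tweets:
        for link in ("in_reply_to", "retweet_of"):
            target = t.get(link)
            if target in parent:  # ignore links to tweets outside the result set
                union(t["id"], target)

    groups = defaultdict(list)
    for t in tweets:
        groups[find(t["id"])].append(t["id"])
    return sorted(groups.values(), key=len, reverse=True)
```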

Partial shot of the tweet view showing tweets grouped by conversation

Finally, the document view shows documents mentioned by the tweets in the search results. Documents can be sorted by the number of mentions, by the time of the first mention, or by the time of the last mention. Mentions are identified by comparing URLs: shortened URLs are expanded before comparison, and some attributes tacked on by Twitter clients are stripped out to determine canonical URLs.
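URL canonicalization and the three document orderings might be sketched as below. Two assumptions stand out: shortened URLs are expanded through a lookup table (`expansions`, hypothetical) rather than by following HTTP redirects, and `utm_` query parameters stand in for the unspecified attributes that Twitter clients append.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed example of client-added parameters; the post does not list them
TRACKING_PREFIXES = ("utm_",)


def canonical_url(url, expansions):
    """Expand a short URL via a lookup table, then strip tracking parameters.

    Also lowercases scheme and host and drops any #fragment.
    """
    url = expansions.get(url, url)
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if not k.startswith(TRACKING_PREFIXES)]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path,
                       urlencode(query), ""))


def document_mentions(tweets, expansions):
    """Map each canonical URL to its mention count and first/last mention times.

    Tweets are dicts with 'urls' (a list) and a comparable 'time' (assumed).
    """
    stats = {}
    for t in tweets:
        for url in t["urls"]:
            key = canonical_url(url, expansions)
            s = stats.setdefault(key, {"count": 0, "first": t["time"], "last": t["time"]})
            s["count"] += 1
            s["first"] = min(s["first"], t["time"])
            s["last"] = max(s["last"], t["time"])
    return stats
```

Sorting the resulting dict by `count` (descending), `first`, or `last` gives the three orderings described above; for example, `sorted(stats, key=lambda u: -stats[u]["count"])` ranks documents by consensus.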

Ordering by the number of mentions allows the discovery of important documents through the tweeters’ consensus. Drilling into each document shows the tweets (and people) that mentioned the document. Clicking on the document name opens the document in an adjacent pane. This interface allows the user to explore the documents and tweets that comment on them in an integrated way.

Partial shot of the document view showing popular tweets and a document fragment. The timeline is not yet fully debugged!

More to come

The system we built just scratches the surface of potentially useful ways to browse Twitter search (and other) results. For small and medium-sized result sets, it makes sense to display all results and let the user browse them directly; for larger collections that may contain thousands of tweets, a hierarchical browsing interface may be more appropriate. It should be possible to group tweets topically based on content, or geographically when they are geocoded. People can be grouped by location, by the strength of relations as determined by social network analysis, or by ad hoc categories created by the user. Results may also be filtered, for example to remove people who contributed only a single tweet. The exact grouping criteria will obviously depend on the specific tasks that the interface is designed to support, but the criteria above seem reasonably general.
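As a sketch of one filtering-plus-grouping step, the function below drops people who contributed fewer than `min_tweets` tweets and groups the rest by location. The `location_of` mapping is hypothetical profile data, and location is only one of the grouping keys mentioned above; social-network or user-defined groupings would slot in the same way.

```python
from collections import Counter, defaultdict


def filter_and_group(tweets, location_of, min_tweets=2):
    """Drop people with fewer than `min_tweets` tweets, then group the rest by location.

    Tweets are dicts with a 'user' key (assumed). Users missing from the
    hypothetical `location_of` mapping are grouped under "unknown".
    """
    counts = Counter(t["user"] for t in tweets)
    groups = defaultdict(list)
    for user, n in counts.items():
        if n >= min_tweets:
            groups[location_of.get(user, "unknown")].append(user)
    return {loc: sorted(users) for loc, users in groups.items()}
```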

Another useful direction to explore is to use full-text search on the collection of tweets and on the document set referred to by them to help browse and filter the results. Using this technique, we could, for example, find people based on the contents of the documents they tweet about. In reasonably large collections, this may be a viable means of finding key people for a particular topic.
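A minimal sketch of finding people through the documents they tweet about: build an inverted index over the document text, intersect the posting sets for the query terms, and credit each person once per matching document they mentioned. The inputs (`documents` as URL to fetched text, `mentions` as URL to tweeting users) are assumed, and a real system would likely use a proper search engine with ranked retrieval rather than this boolean match.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Inverted index: term -> set of document URLs containing it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def people_for_query(query, documents, mentions):
    """Rank people by how many query-matching documents they tweeted about."""
    index = build_index(documents)
    terms = tokenize(query)
    if not terms:
        return []
    # Boolean AND: a document must contain every query term
    matching = set.intersection(*(index.get(term, set()) for term in terms))
    scores = Counter()
    for url in matching:
        for user in set(mentions.get(url, [])):  # count each person once per document
            scores[user] += 1
    return scores.most_common()
```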

In the months to come, Miles and I will continue to explore the possibilities suggested by this interface, and will try to deploy it more publicly. One of the challenges to overcome is making the system sufficiently responsive: the current prototype is hampered by certain inefficiencies in the Twitter API.

16 Comments

  1. Twitter Comment


    RT @HCIR_GeneG: Posted “Making sense of Twitter search” [link to post] #chi2010

    Posted using Chat Catcher

  2. Twitter Comment


    Oh shit! There goes the future—a glimpse! [link to post] Twitter search, navigation. Structure! Delight! #ssm2010

    Posted using Chat Catcher

  3. Twitter Comment


    Agree that looks a lot like one of my next projects :-) RT @jny2: the future [link to post] Twitter search, navigation. Structure!

    Posted using Chat Catcher

  4. Twitter Comment


    Ha! Yours and everyone’s! RT @xamat Agree that looks a lot like one of my next projects :-) RT @jny2: twitter future [link to post] …

    Posted using Chat Catcher

  5. Twitter Comment


    Posted “Making sense of Twitter search” [link to post] #chi2010

    Posted using Chat Catcher

  6. Twitter Comment


    @sophiabliu Actually, I prototyped something like that (near real-time, but can add realtime) [link to post] #cscw2010

    Posted using Chat Catcher

  7. An extra challenge for those of us working with twitter data… Dealing with topic (and other aspects of meaning) is interesting/hard in this space. As you point out, most tweets are useful insofar as they form part of some group of tweets that relate to a given topic, event, or whatever. So in many cases, a tweet’s topic is latent–not directly observable from lexical features.

    If we do want to enlist topical aboutness in our analysis of microblog data, I think we’ll need to do a bit of re-tooling of our standard arsenal. To take one example: measures such as TF-IDF have shown themselves to be unreliable in some initial work I’ve done… obviously TF doesn’t have much meaning here. And the intuition behind IDF doesn’t hold very well in microblogs (e.g. relatively common hashtags are often quite indicative of topic, while rare hashtags aren’t of much interest).

  8. My guess is that temporal clustering around parts of the social graph and around hashtags might create statistically-meaningful clumps. Also would be useful to see how document content can be pushed back through the link to inform tweet grouping.

    One challenge in all of this, however, is to retain transparency and predictability. If people don’t understand why some set of tweets is grouped, then it’s not useful to group them that way.

  9. Twitter Comment


    FXPAL: Making sense of twitter search – interesting [link to post] cc: @ristoh

    Posted using Chat Catcher

  10. Twitter Comment


    RT @genebecker: FXPAL: Making sense of twitter search – interesting [link to post] cc: @ristoh

    Posted using Chat Catcher

  11. […] For more on this and a cool demo, check out Gene Golovchinsky’s look at the SSM2010 twitter coverage. […]

  12. Shyam Kapur says:

    It looks like you guys are having a lot of fun with tweets. I am not at all surprised. I have also been working on designing some good interfaces on top of some sophisticated analytics technology for data such as tweets. I think some of you have already seen one manifestation of my work, TipTop at http://FeelTipTop.com . Visit again often to see how it is evolving. Those who have not seen it so far might want to visit TipTop soon.

  13. […] was also an interesting comment that relates to my interest in managing Twitter search results. Ehlrich and Shami write that Although BlueTwit contained mostly work related posts, possibly […]

  14. […] instead, a more holistic approach is more appropriate.  I described in one such approach in Making Sense of Twitter Search (the position paper was co-authored with Miles Efron and was presented at a CHI 2010 workshop on […]

  15. Twitter Comment


    Twitter: FXPAL Blog » Blog Archive » Making sense of Twitter search: [link to post]

    Posted using Chat Catcher
