Aggregating Twitter


There are lots of ways to display search results, and the familiar (if impoverished) ranked list of links with snippets is just one possibility. It doesn’t work particularly well for Twitter, for example, because for many kinds of searches it’s hard to make sense of the tweets individually; a more holistic approach is more appropriate. I described one such approach in Making Sense of Twitter Search (the position paper was co-authored with Miles Efron and was presented at a CHI 2010 workshop on microblogging). Paper.li is another approach to visualizing sets of tweets. For a given topic or user, it identifies documents referred to by your followers and builds a two-column, online newspaper-style layout out of those documents. It classifies documents by broad categories (media, education, technology, etc.) and prominent hashtags (e.g., #facebook), and shows the leading paragraph or two of each document along with the person who tweeted it. Media such as YouTube videos are embedded directly into the layout. And you can, of course, switch to a list view.

I like this interface because it helps make sense of an otherwise hard-to-track information source, particularly if one follows a few people who tweet (or retweet) a lot. I follow three or four such people, whom I relegate to a separate column in my TweetDeck setup so that their tweets don’t drown out the less frequent tweeters. Even so segregated, the high volume of these tweets makes it difficult to find useful references unless I am actively following the tweets. Paper.li makes it easier to track things because it automatically generates a digest of recently referenced documents. It’s not perfect, but it seems useful. There are several things I would change about it:

  • Aggregation. In some cases, it would be nice to turn off the more prolific tweeters to see who else said something interesting. Alternatively, a weight could be assigned to people based on their tweeting frequency, with prolific tweeters discounted relative to infrequent ones. More sophisticated schemes derived from information retrieval (e.g., TF-IDF weighting) could also be applied.
  • Categories. I would like to define my own categories so that the documents are organized in a way that I find useful, rather than having to hunt through the various sections to understand what’s there.
  • Depth of content. Having more articles in each category would make it easier to understand a topic; if not enough current material is available, older documents could be drawn in to provide depth. Also, it seems that many of the documents mentioned in tweets do not make it to the front page of Paper.li, but are accessible through the list view. It would be good to have deeper linking from the front page to other related documents.
  • Curation. In the Paper.li metaphor, the person whose feed is being aggregated is called the curator, but there seems to be little provision for actual curation. Having more control over which articles get included, excluded, prioritized, etc. would make the final product more useful to me, and perhaps to others as well.
  • Transparency. It is not clear to me how the system decides which documents to highlight: whether this is based on document content, or on some Twitter-based metrics such as the number of times a document has been retrieved. It might also be interesting to know, for a given document, who (other than the person in your social network) retweeted it. In our work on Twitter search, the document view organized tweets by document mentioned, so that for any given document you could see how many people (and who, specifically) had tweeted about that document and what was said. This seems like useful information to have in certain circumstances.
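
To make the weighting and transparency ideas above concrete, here is a minimal sketch in Python. It is not how Paper.li works (its ranking is not documented here); it simply assumes the input is a list of hypothetical (user, url) pairs, gives each user an inverse-frequency weight loosely analogous to IDF, and groups tweets by the document mentioned so that you can see who referred to each one. The function name, the input format, and the log-based weight are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def score_documents(tweets):
    """Score documents from (user, url) pairs, discounting prolific users.

    Returns a dict mapping each url to (score, sorted list of users).
    The weighting formula is an illustrative assumption, not a known
    Paper.li algorithm.
    """
    # Count how many tweets each user contributed to the sample.
    tweets_per_user = Counter(user for user, _ in tweets)
    total = len(tweets)

    # IDF-like weight: the fewer tweets a user posts, the larger the weight.
    weight = {
        user: math.log(1 + total / count)
        for user, count in tweets_per_user.items()
    }

    # Group by document; each user counts once per document.
    users_by_doc = defaultdict(set)
    for user, url in tweets:
        users_by_doc[url].add(user)

    return {
        url: (sum(weight[u] for u in users), sorted(users))
        for url, users in users_by_doc.items()
    }


if __name__ == "__main__":
    sample = [
        ("prolific", "doc1"), ("prolific", "doc2"), ("prolific", "doc3"),
        ("prolific", "doc1"), ("quiet", "doc2"), ("rare", "doc4"),
    ]
    ranked = sorted(score_documents(sample).items(),
                    key=lambda item: item[1][0], reverse=True)
    for url, (score, users) in ranked:
        print(f"{url}: {score:.2f} via {', '.join(users)}")
```

With a scheme like this, a document mentioned once by an infrequent tweeter can outrank one mentioned repeatedly by a prolific tweeter, and the per-document user list provides the "who tweeted this" information discussed under Transparency.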

Overall, however, I am a fan of this sort of innovative approach to presenting search results. In particular, the newspaper metaphor seems under-appreciated for its effectiveness at communicating complex, multidimensional information. While the design of the front page of a major newspaper is a complex manual task that balances many competing factors, the spirit of the multi-column, multi-emphasis layout can be automated in a straightforward fashion to display search results. This can be applied to summaries such as the one generated by Paper.li, or in response to queries, as I had done in my PhD work.

One of the findings from my research was that displaying more documents at a time (with less text for each) produced better recall. Those results, however, were based on a single-screen layout with vertical scrolling limited to each article only. It would be interesting to explore the effect that extract size (ranging from a standard search snippet to the entire document inserted into an iframe) has on the usability and usefulness of the system. Another area worth exploring involves representing multiple different ranked lists of results in the same display. This could be used, for example, to provide both high-precision and high-diversity results.
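
One simple way to put two ranked lists into the same display is to interleave them. The sketch below is only an illustration of that idea, not something from my research system: it assumes two hypothetical lists of document identifiers (one tuned for precision, one for diversity), alternates between them, skips duplicates, and labels each result with the list it came from so that a layout could style the two sources differently.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for the shorter list running out


def interleave(precision_ranked, diversity_ranked, limit=10):
    """Alternate between two ranked lists, keeping the first occurrence
    of each document and recording which list supplied it."""
    merged = []
    seen = set()
    if limit <= 0:
        return merged

    pairs = zip_longest(precision_ranked, diversity_ranked,
                        fillvalue=_MISSING)
    for from_precision, from_diversity in pairs:
        candidates = (("precision", from_precision),
                      ("diversity", from_diversity))
        for source, doc in candidates:
            if doc is _MISSING or doc in seen:
                continue
            seen.add(doc)
            merged.append((doc, source))
            if len(merged) >= limit:
                return merged
    return merged


if __name__ == "__main__":
    for doc, source in interleave(["a", "b", "c"], ["b", "d"]):
        print(doc, source)
```

A newspaper-style layout could then, for instance, place precision-sourced items in the main column and diversity-sourced items in a sidebar; that mapping is a design choice left open here.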

Finally, it’s interesting to compare the site to existing news portals. It’s worth exploring how much value manually edited web sites provide over a crowd-sourced newsfeed like Paper.li. With the right choice of contributors (editors, really), Paper.li could be a viable way to produce and consume news. It might also serve as an interesting way to organize previously captured information, such as feeds associated with events captured by TwapperKeeper.
