Twitter is a trending topic in HCI research these days. The ICWSM conference is awash with interesting papers on mining and analyzing the Twitter stream, and the upcoming CHI 2010 microblogging workshop promises to be full of interesting discussion on a range of topics around how people use Twitter to communicate.
One of the established ways of studying Twitter use is to collect samples of tweets (e.g., see here) and perform statistical and social network analysis to uncover the patterns latent in them. This makes for interesting and, just as importantly, publishable research.
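As a toy illustration of this kind of aggregate analysis, the sketch below builds a directed mention network from a small tweet sample. The `(author, text)` pairs are a simplified stand-in for whatever fields an actual crawl of the Twitter API would return; the names are made up.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count directed author -> mentioned-user edges in a tweet sample.

    `tweets` is a list of (author, text) pairs -- a simplified stand-in
    for the fields a real tweet sample would carry.
    """
    edges = Counter()
    for author, text in tweets:
        for mentioned in MENTION_RE.findall(text):
            edges[(author, mentioned.lower())] += 1
    return edges

sample = [
    ("alice", "@bob have you seen the CHI 2010 workshop CFP?"),
    ("bob", "@alice yes! cc @carol"),
    ("alice", "@bob thanks"),
]
print(mention_edges(sample))  # edge counts such as ('alice', 'bob'): 2
```

Feed the resulting edge counts into any graph library and you have the raw material for the network-level analyses these papers report.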
On the other hand, the focus on large datasets and aggregate behavior overlooks the individual. Not the individual as a person who contributes tweets to the larger collection, but the individual who needs to use Twitter to meet his or her information needs.
The kinds of research questions that can be answered by analyzing data collected through the fire hose or the garden hose are often very different from the questions that need to be answered to understand how to help people find information in Twitter.
We can think of this as the distinction between macro- and micro-level analysis: the macro level addresses network-wide phenomena, helping us understand overall data flow through the network and the aggregate purposes to which the data are put.
The micro-level, on the other hand, deals with how an individual (who cannot digest tens of thousands of tweets) approaches Twitter with specific tasks with associated information needs. Many of these are familiar to the information seeking research community:
- Re-finding a specific tweet or piece of information
- Trying to understand a particular topic and its associated documents
- Understanding a conversation among several people on a topic
- Catching up
- Following a live event
To build tools that help people manage information on Twitter, we need insight into the challenges specific to this medium, and we need a reliable means of evaluating our tools to drive innovation. While some kinds of evaluation may be achieved through open-ended deployments (beta releases), more rigorous methods may also be useful. This rigor can come from test collections that can be used to compare systems or alternatives.
While the information retrieval community has benefited tremendously from the various TREC collections, no similar corpora exist for microblogging data. A test collection is more than just a large body of retrievable objects. It also needs to have well-defined information needs for which some kind of ground truth can be identified, so that principled comparisons can be made.
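To make the anatomy of such a collection concrete, here is a minimal TREC-style sketch: retrievable objects (tweets), well-defined information needs ("topics" in TREC terminology), and ground-truth relevance judgments (qrels). All IDs and texts are invented for illustration.

```python
# Retrievable objects: a small corpus of tweets, keyed by ID.
documents = {
    "t001": "Registration for the CHI 2010 microblogging workshop is open",
    "t002": "Just had a great sandwich for lunch",
    "t003": "Reading ICWSM papers on analyzing the Twitter stream",
}

# Well-defined information needs ("topics").
topics = {
    "q1": "microblogging workshop at CHI 2010",
}

# Ground truth: relevance judgments (qrels) per (topic, document) pair,
# the piece that makes principled system comparisons possible.
qrels = {
    ("q1", "t001"): 1,  # relevant
    ("q1", "t002"): 0,  # not relevant
    ("q1", "t003"): 0,  # not relevant
}
```

The documents alone are easy to gather; it is the topics and qrels, which require human judgment, that make the corpus a test collection rather than just a dataset.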
We need collections that can be used to help evaluate systems that help users with specific micro-level tasks. Researchers need to think about how to construct these collections, and how to perform evaluation on them. How, for example, do we establish some kinds of ground truth to make it possible to apply familiar IR metrics? What other kinds of metrics are appropriate to evaluate systems that (claim to) support the various tasks? Are there useful metrics specific to microblogging collections?
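Once some ground truth exists, the familiar IR metrics follow directly. As one example, here is a sketch of average precision for a single topic, computed over a hypothetical system ranking and made-up relevance judgments:

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic: the mean of precision@k taken
    at each rank k where a relevant document appears."""
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical system output for one topic, judged against invented qrels.
ranking = ["t007", "t003", "t010", "t001"]
relevant = {"t003", "t001"}
print(average_precision(ranking, relevant))  # relevant docs at ranks 2 and 4
```

Averaging this over all topics gives MAP, the workhorse of TREC-style evaluation; whether that metric actually reflects success at tasks like "catching up" or following a live event is exactly the open question raised above.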
The advantage of Twitter as a data source for collection building is that it is relatively easy to build targeted collections. What's lacking, it seems, is wide agreement on what makes for interesting research on micro-level Twitter phenomena.