On the Science of IR


Miles Efron recently posted his take on the progress of the IR field, in response to a question posed by Andrew Dillon at the last ASIST conference. Miles’ take was that progress is indeed being made, for two reasons: the SIGIR conference has become more competitive over the years, and the diversity of corpora under the TREC umbrella has also increased. Unfortunately, I wasn’t there to hear the question or the subsequent discussion, but my guess is that Andrew Dillon’s question was not one of statistical significance, but rather one of magnitude.

Every year we see incremental improvements in Mean Average Precision (MAP) scores reported at SIGIR (and at CIKM, and at other venues) for some narrow conception of the search task. The gains are real, but they may not matter. Similarly, Google recently reported (thanks Jeremy, thanks Greg) that an increase in latency from 100 msec to 400 msec reduced the number of queries people ran by about 0.5%. Statistically significant, yes. Important? Maybe not.
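For readers who haven’t worked with it, here is a minimal sketch of what those MAP numbers actually measure. The helper names and the two toy queries below are mine, not from any TREC run; MAP is simply the mean, over queries, of per-query average precision, where average precision is the mean of precision-at-k taken at each rank where a relevant document appears.

```python
def average_precision(ranked, relevant):
    """Average precision for one query: mean of precision@k at each relevant hit."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: mean of per-query average precision over (ranking, relevant-set) pairs."""
    return sum(average_precision(ranking, rel) for ranking, rel in runs) / len(runs)

# Two hypothetical queries with judged relevant sets.
runs = [
    (["d1", "d3", "d2"], {"d1", "d2"}),  # AP = (1/1 + 2/3) / 2 ≈ 0.833
    (["d5", "d4"], {"d4"}),              # AP = (1/2) / 1 = 0.5
]
print(round(mean_average_precision(runs), 3))  # → 0.667
```

Seen this way, a year’s headline result is often a shift in the third decimal place of that final number, which is exactly why statistical significance and practical importance can come apart.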

The scientists among us like to measure things. That’s how we (and others) know we did something interesting. But it seems that what we really want to measure is difficult to observe, and so we settle on some plausible proxy. And so begins the slippery slope.

It is certainly true that having ongoing improvement in indexing and retrieval algorithms is a good thing. But in some ways it has become a victim of its own success, and, like commercial agriculture, now produces decent commodity goods at ridiculously low cost. To continue with the analogy, we need to diversify our notion of information retrieval to include not only the supermarket (where any time of day you can find exactly the same product that you’ve always bought but without the ability to really understand or control what’s in the box) but also the farmer’s market, where you can find more variety, more surprises, and more interaction with the people who grow the food you will be eating.

So there is still room for progress in the field of information retrieval, but the low-hanging fruit of precision-oriented search has been harvested. We now need to look to more difficult tasks: to exploratory search, to interaction, to collaboration. Looking beyond the ranked list is not only a pragmatic strategy for innovation, it’s also good science.



  1. I like the farmer’s market metaphor. That’s a more elegant way of saying what I tried to get at at ASIST. For critics of experimental IR, TREC makes an easy punching bag insofar as it (or more accurately its data) has supported the incremental gains we’re familiar with (it also supports extremely valuable experimentation, of course).

    In response to Andrew’s question, and on the post you mention, I brought up TREC because I’ve seen many new IR problems emerge from TREC. Things like expert finding in the enterprise track, topic distillation in the Web and blog tracks, and novel evaluation in the million-query track. I see TREC as an incubator where researchers formulate new ideas of what IR is and what it should be. The downside, of course, is that its admirable focus on empirical testing constrains this process. How do you measure the success of a solution to a problem that may be only partially formed?

    TREC speaks to the vitality and inventiveness of IR as a field. But when it comes to hybridizing, localizing, and making retrieval organic (to continue, alas badly, the farming metaphor) we still face familiar and steep challenges.

  2. […] his comment to an earlier post, Miles Efron reiterated the usefulness of the various TREC competitions to […]
