How much does time weigh?


As Miles wrote yesterday, our paper was accepted to SIGIR 2011. The idea that time affects how documents should be ranked is not new; the problem is knowing when to take it into account. For example, while Li and Croft showed that incorporating the notion of recency improved ranking, we found that their algorithm degrades performance on non-temporal queries. (In a sense this is obvious: if a ranking algorithm is biased toward more recent documents and recency is not important for a given query, the algorithm will demote otherwise well-matching older documents, thereby reducing MAP.)
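The MAP argument in the parenthetical can be made concrete with a toy example. This is a minimal sketch, assuming a recency prior in the spirit of Li and Croft's exponential prior on document age, P(d) ∝ λe^(−λ·age), added in log space to a query-likelihood score. The documents, scores, timestamps, and λ = 0.05 are invented for illustration and are not taken from the paper. With a single query, MAP reduces to that query's average precision (AP).

```python
import math

def recency_log_prior(doc_time, now, lam):
    """log of an exponential prior lam * exp(-lam * age); newer docs score higher."""
    age = now - doc_time
    return math.log(lam) - lam * age

def rank(docs, now=None, lam=None):
    """Rank docs by log P(q|d), optionally adding the recency log-prior."""
    def score(name):
        loglik, doc_time = docs[name]
        if lam is None:
            return loglik
        return loglik + recency_log_prior(doc_time, now, lam)
    return sorted(docs, key=score, reverse=True)

def average_precision(ranking, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)

# Invented data for a NON-temporal query: name -> (log P(q|d), timestamp).
docs = {
    "old_relevant": (-4.0, 10),
    "mid_relevant": (-4.5, 50),
    "new_nonrelevant": (-5.0, 100),
}
relevant = {"old_relevant", "mid_relevant"}

plain = rank(docs)
biased = rank(docs, now=100, lam=0.05)
print(plain, round(average_precision(plain, relevant), 3))
print(biased, round(average_precision(biased, relevant), 3))
```

With these numbers the unbiased ranking puts both relevant documents first (AP = 1.0), while the recency prior lifts the newest, non-relevant document to the top and AP drops to about 0.583.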

One of the contributions of our work, then, is a way to estimate whether a particular query is sensitive to time, and to apply temporal smoothing only when the query is likely to benefit from it. The approach improved performance on temporal queries without hurting performance on other kinds of queries. Of course the algorithm isn’t perfect, because it cannot divine the user’s intent. For interactive information seeking, therefore, it probably makes sense to give users control over how much importance to assign to recency. Several ways of incorporating this input come to mind:
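The idea of applying temporal smoothing only to queries that seem to need it can be sketched as a gate in front of the recency prior. The estimator below is HYPOTHETICAL and is not the method from the paper: it simply compares the mean timestamp of the top-ranked documents with that of the collection and squashes the difference through a sigmoid. The `scale` and `threshold` parameters, the timestamps, and the scores are all illustrative assumptions.

```python
import math
from statistics import mean

def temporal_probability(top_k_times, collection_times, scale=10.0):
    """HYPOTHETICAL estimate of P(query is recency-sensitive).

    If the top-ranked documents are much newer than the collection as a
    whole, the sigmoid pushes the estimate toward 1; if they are older,
    toward 0. `scale` controls how sharp that transition is.
    """
    shift = mean(top_k_times) - mean(collection_times)
    return 1.0 / (1.0 + math.exp(-shift / scale))

def selective_rank(docs, now, lam, p_temporal, threshold=0.5):
    """Apply the recency prior only when the query looks temporal."""
    use_prior = p_temporal >= threshold

    def score(name):
        loglik, doc_time = docs[name]
        if use_prior:
            # log(lam) is omitted: it is the same for every doc, so rank-neutral.
            return loglik - lam * (now - doc_time)
        return loglik
    return sorted(docs, key=score, reverse=True)

collection_times = list(range(10, 101, 10))  # invented timestamps, mean 55
p_recent = temporal_probability([90, 95, 100], collection_times)  # about 0.98
p_stable = temporal_probability([10, 50, 60], collection_times)   # about 0.18

docs = {
    "old_relevant": (-4.0, 10),
    "mid_relevant": (-4.5, 50),
    "new_nonrelevant": (-5.0, 100),
}
print(selective_rank(docs, now=100, lam=0.05, p_temporal=p_stable))
print(selective_rank(docs, now=100, lam=0.05, p_temporal=p_recent))
```

For the "stable" query the gate stays closed and the original query-likelihood order is kept; for the "recent" query the prior is applied and the newest document moves to the top. A hard threshold is the simplest choice; a softer variant could scale the prior by `p_temporal` instead.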

  • We can make the estimate of the probability that the query is a recency query more conservative.
  • We can incorporate the user’s weight into the smoothing function.
  • We can calculate the unbiased and the temporally-biased ranked lists separately, and then let the user control the mixing weight à la Pickens and Golovchinsky.
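The third option can be sketched as a user-controlled blend of two score lists. This is a minimal sketch, assuming min-max normalization followed by linear interpolation with a slider `alpha`; both choices are illustrative assumptions and are not necessarily how Pickens and Golovchinsky mix lists. The scores are invented.

```python
def min_max(scores):
    """Rescale a {doc: score} dict to [0, 1] so two score lists are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all scores are equal
    return {doc: (s - lo) / span for doc, s in scores.items()}

def mix_rankings(unbiased, biased, alpha):
    """Linearly blend normalized unbiased and recency-biased scores.

    alpha is a HYPOTHETICAL user-controlled slider in [0, 1]:
    0 ignores time entirely, 1 ranks purely by the recency-biased scores.
    """
    u, b = min_max(unbiased), min_max(biased)
    mixed = {doc: (1 - alpha) * u[doc] + alpha * b[doc] for doc in u}
    return sorted(mixed, key=mixed.get, reverse=True)

# Invented log scores for the same three documents under each ranker.
unbiased = {"old": -4.0, "mid": -4.5, "new": -5.0}
biased = {"old": -8.5, "mid": -7.0, "new": -5.0}
for alpha in (0.0, 0.3, 0.8):
    print(alpha, mix_rankings(unbiased, biased, alpha))
```

Normalizing first matters because the two rankers produce scores on different ranges; without it, whichever list has the wider spread would dominate the blend regardless of the slider position.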

It’ll be interesting to put together an interactive system around these algorithms to see how well users can manage the additional complexity and whether they perceive the benefits.

3 Comments

  1. Nice concept! Temporal attributes of a query *can* be important in certain contexts.

  2. Gene, is the approach targeted towards any temporal distribution, or does it only try to identify queries that benefit from recency? (Sorry if I’m missing the obvious; I went through the paper very quickly.)

    Btw, you can replace the reference to the Dakka/CIKM08 paper with the more complete journal version: http://bit.ly/dGIFd3 (and fix the typo for my name, which is not Periotis :-)

  3. @Panos, we evaluated it with respect to recency. It could be adapted to other distributions. Certainly a simple translation in time would give recency with respect to that point. It should also be possible to flip things around and look at primacy — which documents first mention a particular concept — although the trick there is to establish an appropriate baseline from which to look for deviations. Perhaps a different approach, one not based on differences in time, might solve that issue.

    And yes, we’ll fix the typo! Sorry!
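The reply to Panos notes that a simple translation in time gives recency with respect to a chosen point, and that flipping the direction gives primacy. A minimal sketch of both, assuming the same kind of exponential prior used for recency; `ref_time`, `first_time`, and the rule that documents dated after `ref_time` are excluded are illustrative assumptions. The sketch does not address the baseline problem the reply raises.

```python
import math

def recency_log_prior(doc_time, ref_time, lam):
    """Exponential recency relative to ref_time, a shifted 'now'.

    Documents dated after ref_time are treated as not yet published
    (an assumption of this sketch) and get a log-prior of -inf.
    """
    if doc_time > ref_time:
        return -math.inf
    return math.log(lam) - lam * (ref_time - doc_time)

def primacy_log_prior(doc_time, first_time, lam):
    """Mirror image of recency: documents closest to first_time score highest."""
    return math.log(lam) - lam * (doc_time - first_time)

def order_by(prior, times):
    """Sort document names by a prior computed from their timestamps."""
    return sorted(times, key=lambda d: prior(times[d]), reverse=True)

times = {"a": 10, "b": 50, "c": 100}  # invented timestamps
print(order_by(lambda t: recency_log_prior(t, 100, 0.05), times))  # ['c', 'b', 'a']
print(order_by(lambda t: recency_log_prior(t, 60, 0.05), times))   # ['b', 'a', 'c']
print(order_by(lambda t: primacy_log_prior(t, 10, 0.05), times))   # ['a', 'b', 'c']
```

Moving `ref_time` from 100 to 60 shifts the peak of the prior back in time, so the document nearest to that point wins; the primacy prior instead rewards the earliest document.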
