Blog Archive: 2010

Slides from CIKM 2010 Reverted Indexing talk

Here are the slides from our talk at CIKM 2010 last week. More details on reverted indexing can be found in an earlier post and on the FXPAL site; the full paper is available here, and the previous post describes why the technique works. The contribution of the paper can be summarized as follows:

We treat query result sets as unstructured text “documents” — and index them.
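The one-line summary can be made concrete with a few lines of Python. This is a minimal sketch that assumes made-up result sets for three hypothetical one-term queries and ignores retrieval scores and ranks entirely. Each result set is written out as a text "document" whose words are doc ids. Indexing those pseudo-documents the ordinary way then yields a map from each doc id to the queries that retrieved it.

```python
from collections import defaultdict

# Hypothetical result sets for three one-term queries (doc ids are made up).
result_sets = {
    "reverted": ["d1", "d7"],
    "index": ["d1", "d2", "d7"],
    "expansion": ["d2"],
}

# Step 1: serialize each result set as a text "document" whose words are doc ids.
pseudo_docs = {query: " ".join(doc_ids) for query, doc_ids in result_sets.items()}

# Step 2: index the pseudo-documents as ordinary text would be indexed:
# each "word" (a doc id) maps to the "documents" (queries) that contain it.
reverted_index = defaultdict(list)
for query, text in pseudo_docs.items():
    for doc_id in text.split():
        reverted_index[doc_id].append(query)

print(reverted_index["d1"])  # ['reverted', 'index']
print(reverted_index["d2"])  # ['index', 'expansion']
```

A real reverted index would presumably also carry retrieval scores as posting weights. This sketch keeps only the structural idea.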

On term selection in reverted indexing

Jeremy Pickens contributed to this post.

Jeremy did a great job of presenting our Reverted Indexing paper, but the short session made it difficult to answer all questions and comments thoroughly. For example, William Webber wrote up a post summarizing our work, in which he observed:

The authors surmise that the reverted index is more effective because it suggests more selective expansion terms, and they reproduce example term sets as evidence. This explanation is convincing enough as far as it goes; but what is not explained is why the reverted index’s expansion terms are more selective. The reason is not obvious. A single-term reverted index is not much more than a weighted direct index, mapping from documents to the terms that occur in them

I would like to address his comment here, because it touches on a key aspect of Reverted Indexing.

Sue Dumais at CIKM 2010

Sue Dumais of MSR gave an excellent keynote address at CIKM last week, in which she emphasized the temporal nature both of the collections used for information retrieval and of the way people access information on the web. This was by far the most user-oriented talk I attended at the conference, and a refreshing change from the vast array of machine learning papers in the rest of the program.

The slides from the talk will be available on her site, but are substantially similar to her ECDL 2010 keynote talk. In short, Sue described how collections and documents change over time, and how people’s patterns of visiting web sites change in response to content evolution. She also introduced a new browser plugin for Internet Explorer called Diff-IE that helps people understand changes to the web sites they visit.

A future of search

Jamie Callan of CMU gave an interesting and thought-provoking keynote talk at CIKM 2010. Traditionally, search engines have been used more or less directly to identify useful documents, which the user would then (manually) incorporate into other tasks. Jamie suggested a new class of applications that would use a search engine to identify documents, or parts of documents, in some collection, and would then apply this information in pursuit of some other, more specialized task.

While the notion of using a search engine as a component of another system is not particularly novel, the kinds of requirements that his proposed use imposes on search engines would certainly push the envelope.

Reverted Indexing

Traditional interactive information retrieval systems work by building inverted lists, or term indexes. For every term in the vocabulary, the system creates a list of the documents in which that term occurs, along with the term's relative frequency within each document. Retrieval algorithms then use these term frequencies alongside other collection statistics to identify the documents that match a query.
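For reference, here is a minimal Python sketch of the inverted index this paragraph describes. It assumes naive whitespace tokenization and stores the relative term frequency (occurrence count divided by document length) in each posting. Production indexes also compress postings, record positions, and keep collection statistics such as document frequency. None of that is shown here.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, relative term frequency)."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenization (an assumption)
        if not tokens:
            continue  # skip empty documents to avoid dividing by zero
        for term, count in Counter(tokens).items():
            index[term].append((doc_id, count / len(tokens)))
    return dict(index)


docs = {
    "d1": "reverted indexing maps documents to queries",
    "d2": "inverted indexing maps terms to documents",
    "d3": "query expansion uses query terms",
}
index = build_inverted_index(docs)
print(index["maps"])   # [('d1', 0.16666666666666666), ('d2', 0.16666666666666666)]
print(index["query"])  # [('d3', 0.4)]
```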

In a paper to be published at CIKM 2010, Jeremy Pickens, Matt Cooper and I describe a way of using the inverted index to associate document ids with the queries that retrieve them. Our approach combines the inverted index with the notion of retrievability to create an efficient query expansion algorithm that is useful for a number of applications, including relevance feedback. We call this kind of index a reverted index because, rather than mapping terms onto documents, it maps document ids onto the queries that retrieved those documents.
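Here is a minimal sketch of the overall idea. It is not the implementation or weighting from the paper, and every function name is hypothetical. It does the following:

- Runs every vocabulary term of a toy collection as a one-term "basis query", scored with a simple tf-idf (an assumption).
- Keeps each query's top-k documents. This is a crude stand-in for retrievability: a document is posted under a query only if that query actually retrieves it.
- Reverts those result sets into doc id to query postings.
- Uses a set of relevance-feedback doc ids as the "query" against the reverted index, summing scores to pick expansion terms.

```python
import math
from collections import Counter, defaultdict


def basis_query_results(docs, top_k=2):
    """Run each vocabulary term as a one-term basis query (toy tf-idf scoring)."""
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n_docs = len(docs)
    results = {}
    for term in df:
        idf = math.log(n_docs / df[term]) + 1.0  # +1 so ubiquitous terms keep some weight
        hits = [(doc_id, counts[term] * idf) for doc_id, counts in tf.items() if term in counts]
        hits.sort(key=lambda hit: hit[1], reverse=True)
        results[term] = hits[:top_k]  # only the top-k documents count as retrieved
    return results


def revert(results):
    """Turn {query: [(doc_id, score), ...]} into {doc_id: [(query, score), ...]}."""
    reverted = defaultdict(list)
    for query, hits in results.items():
        for doc_id, score in hits:
            reverted[doc_id].append((query, score))
    return reverted


def expand(feedback_doc_ids, reverted, n_terms=3):
    """Use feedback doc ids as a query against the reverted index; sum query scores."""
    weights = Counter()
    for doc_id in feedback_doc_ids:
        for query, score in reverted.get(doc_id, []):
            weights[query] += score
    return [query for query, _ in weights.most_common(n_terms)]


docs = {
    "d1": "reverted index maps documents to queries",
    "d2": "reverted index supports query expansion",
    "d3": "inverted index maps terms to documents",
    "d4": "keynote on temporal dynamics of web content",
}
reverted = revert(basis_query_results(docs))
print(expand(["d1", "d2"], reverted, n_terms=2))  # ['reverted', 'index']
```

With the feedback documents d1 and d2, "reverted" ranks first because both documents fall in its top-k. Any term whose top-k results contain neither d1 nor d2 (for example "inverted" or "keynote") cannot be suggested at all.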