Released: Reverted Indexing source code

I am pleased to announce that we are releasing a version of the reverted indexing framework as open source software! The release includes the framework and an implementation in Lucene.

Reverted indexing is an information retrieval technique for query expansion, relevance feedback, and a variety of other operations. The details are described on our web site, in several posts on this blog, and in our CIKM 2010 paper. The source code and JAR file can be downloaded from the Reverted Indexing page; see the Javadocs for details of the API.

I’ve tried to make the code as easy to use as possible, but it would be useful to get feedback on what could be improved to make the library more useful. Of course others are welcome to implement the framework for other search engines, and I am happy to offer some help on such efforts. In principle, though, it’s quite simple: the Lucene implementation is just a few classes, with a few methods each.

Examples

Here is how you can create a reverted index, given that you already have a Lucene inverted index:


// Create the spec for the inverted index
IndexSpecification spec = 
  new IndexSpecification("inverted","id","body",1000);


// Create a searcher that will execute the basis 
// queries.
Searcher searcher = new LuceneSearcher( spec );


// Create the iterator that will generate the index 
// terms from an inverted index to be used as 
// basis queries
BasisQuerySource termSource =
  LuceneIndexTermIterator.createStandardIterator(
    spec.getLocation(),null,5,"basis-queries.txt");


// Create a RevertedDocument iterator with the 
// searcher and the iterator over the basis queries
RevertedDocumentIterator rdi =
  new RevertedDocumentIterator(searcher, termSource);


// Build the index
RevertedIndexing indexer = 
  new RevertedIndexingLucene();
indexer.buildRevertedIndex(rdi);
indexer.close();
searcher.close();
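
To make the data flow concrete, here is a toy, in-memory sketch of what building a reverted index amounts to conceptually: each basis query is run against the inverted index, and the results are regrouped by document id. It does not use the released library. The class RevertedIndexSketch, the revert method, and the scores are hypothetical names and values made up for illustration; the real implementation stores this structure in a Lucene index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration only (hypothetical class, not part of the released library). */
final class RevertedIndexSketch {

    /**
     * Regroups basis-query results by document id.
     *
     * @param basisResults basis query -> (docid -> retrieval score), i.e. what
     *                     each basis query returned from the inverted index
     * @return docid -> (basis query -> retrieval score)
     */
    static Map<String, Map<String, Double>> revert(
            Map<String, Map<String, Double>> basisResults) {
        Map<String, Map<String, Double>> reverted = new TreeMap<>();
        for (Map.Entry<String, Map<String, Double>> basis : basisResults.entrySet()) {
            for (Map.Entry<String, Double> hit : basis.getValue().entrySet()) {
                // The docid becomes the key; the basis query becomes the "document".
                reverted
                    .computeIfAbsent(hit.getKey(), k -> new TreeMap<>())
                    .put(basis.getKey(), hit.getValue());
            }
        }
        return reverted;
    }

    public static void main(String[] args) {
        // Made-up results for two basis queries.
        Map<String, Map<String, Double>> basisResults = new LinkedHashMap<>();
        basisResults.put("body:lucene", Map.of("doc1", 2.0, "doc2", 1.0));
        basisResults.put("body:index", Map.of("doc2", 3.0));
        System.out.println(revert(basisResults));
    }
}
```

In this picture, looking up a docid returns the basis queries that retrieved it, which is the property the querying examples rely on.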

Once the reverted index is created, you can query it to perform a number of standard operations. First, you need to set up searching, which involves creating a Searcher on each index, and putting them together like this:

// Create the searcher on the inverted index
IndexSpecification spec = 
  new IndexSpecification("inverted","id","body",1000);
Searcher inverted = new LuceneSearcher(spec);



// Create the searcher on the reverted index
IndexSpecification revertedSpec = 
  RevertedIndexingLucene.revertedIndexSpec(
    "reverted", 500);
RevertedSearcher reverted =
    new LuceneRevertedSearcher(revertedSpec);


// Create the reverted querying instance
RevertedQuerying revertedQuerying =
    new RevertedQuerying(inverted, reverted);

Now you can do relevance feedback, assuming some user input:


String userQuery = "good stuff";
String[] docids = {"docid1", "docid2"};
RankedList relDocs = RankedList.fromDocIds(docids);
ExpansionResults results =
    revertedQuerying.runRelevanceFeedbackQuery(
        userQuery, relDocs, false );

Pseudo-relevance feedback works like this:


results =
    revertedQuerying.runPseudoRelevanceFeedbackQuery(
        userQuery, 5, false );

In each case, the ExpansionResults instance will contain the set of documents retrieved by the original (unexpanded) query, the set of expansion terms, and the final set of documents. When doing relevance feedback, you can also ask for the residual document list, which excludes the documents used for relevance feedback.
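
The idea of a residual list can be sketched in plain Java. The post does not show which ExpansionResults accessor returns it, so the snippet below does not call the library at all: ResidualListSketch and its residual method are hypothetical, and the document ids are made up. It only illustrates the filtering that a residual list implies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy illustration only (hypothetical class, not part of the released library). */
final class ResidualListSketch {

    /** Returns the ranked ids with the feedback documents removed, order preserved. */
    static List<String> residual(List<String> ranked, Set<String> feedbackDocs) {
        List<String> out = new ArrayList<>();
        for (String docid : ranked) {
            if (!feedbackDocs.contains(docid)) {
                out.add(docid);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up final ranking and the documents the user marked as relevant.
        List<String> finalRanking = List.of("docid2", "docid7", "docid1", "docid9");
        Set<String> feedback = Set.of("docid1", "docid2");
        System.out.println(residual(finalRanking, feedback));
    }
}
```

Evaluating on the residual list avoids giving the expanded query credit for documents the user had already supplied.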

If you just want to get some expansion terms given a collection of document ids, you can do this, without bothering with a RevertedQuerying object:


String[] docids = {"doc1", "doc2", "doc27"};
RankedList terms = reverted.runDocumentQuery(docids);

Each item in terms will contain a docid and a score, where the docid is a basis query that can be run against the inverted index. Typically, it will be of the form field:value, as generated by the indexing process.
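
As a rough mental model of a document query, the toy sketch below looks up each docid in an in-memory reverted structure and ranks the basis queries it finds. Summing the stored scores is a simplifying assumption made here for illustration; the released framework ranks results through Lucene, and DocumentQuerySketch, expansionTerms, and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration only (hypothetical class, not part of the released library). */
final class DocumentQuerySketch {

    /**
     * Ranks basis queries for a set of docids.
     *
     * @param reverted docid -> (basis query -> score)
     * @param docids   the documents to find expansion terms for
     * @return basis queries with summed scores, highest first (ties broken by name)
     */
    static List<Map.Entry<String, Double>> expansionTerms(
            Map<String, Map<String, Double>> reverted, String... docids) {
        Map<String, Double> totals = new HashMap<>();
        for (String docid : docids) {
            Map<String, Double> basis = reverted.getOrDefault(docid, Map.of());
            for (Map.Entry<String, Double> e : basis.entrySet()) {
                // Assumption: combine evidence by summing scores across docids.
                totals.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        List<Map.Entry<String, Double>> ranked = new ArrayList<>(totals.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up reverted index with field:value basis queries.
        Map<String, Map<String, Double>> reverted = Map.of(
            "doc1", Map.of("body:lucene", 2.0),
            "doc2", Map.of("body:index", 3.0, "body:lucene", 1.5));
        for (Map.Entry<String, Double> term : expansionTerms(reverted, "doc1", "doc2")) {
            System.out.println(term.getKey() + " " + term.getValue());
        }
    }
}
```

Each returned key is a field:value string, so it can be used directly as a query against the inverted index.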

Acknowledgments

While I am the one who bundled the code together and pushed it out, much of the credit for this work belongs to Jeremy Pickens, who conceived of the idea in the first place. Of course, his inspiration was Leif Azzopardi and Vishwa Vinay’s work on retrievability, and Leif also planted the idea that we should release the software as open source. Finally, I would like to thank Abdigani Diriye for beta-testing this API, and Andreas Girgensohn for his suggestions.

Comments and suggestions are always welcome!
