A future of search


Jamie Callan of CMU gave an interesting and thought-provoking keynote talk at CIKM 2010. Traditionally, search engines have been used more or less directly to identify useful documents that the user would then (manually) incorporate into other tasks. Jamie suggested a new class of applications that would use search engines to identify documents or parts of documents in some collection, and would then apply this information in pursuit of some other, more specialized task.

While the notion of using a search engine as a component of another system is not particularly novel, the kinds of requirements that his proposed use imposes on search engines would certainly push the envelope.

He started by motivating his talk with three applications: computer-assisted language learning, question answering, and the Never-Ending Language Learner (NELL). Such applications are currently built by running some query, extracting information from the results, and throwing away the junk. Queries are typically keyword searches or simple patterns, but these often do not meet the true requirements of the applications that consume the results.
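A minimal sketch of that query, extract, discard pattern may make the mismatch easier to see. Everything here is an assumption for illustration: the three-sentence collection, the `keyword_search` and `extract_facts` helpers, and the "capital of" pattern are hypothetical and are not taken from the talk.

```python
import re

# Hypothetical toy collection; a real application would query a search engine.
DOCS = [
    "Paris is the capital of France.",
    "The capital markets rallied in France.",
    "Canberra is the capital of Australia.",
]

# Hypothetical extraction pattern for "<City> is the capital of <Country>".
PATTERN = re.compile(r"([A-Z]\w+) is the capital of ([A-Z]\w+)")


def keyword_search(query, docs):
    """Return every document containing all query terms (case-insensitive)."""
    terms = query.lower().split()
    return [text for text in docs if all(t in text.lower() for t in terms)]


def extract_facts(texts):
    """Keep only the (city, country) pairs the pattern finds; drop the rest."""
    facts = []
    for text in texts:
        facts.extend(PATTERN.findall(text))
    return facts


hits = keyword_search("capital", DOCS)   # over-retrieves: matches all three
facts = extract_facts(hits)              # the application discards the junk
print(len(hits), facts)
```

The keyword query cannot express what the application actually wants (a city-country relation), so the engine returns the "capital markets" sentence too and the filtering happens after retrieval, in the application.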

What he was proposing was a general solution that allows the search engine to know as much as possible about the application’s information need and the document contents. His claim is that current structured queries can handle simple document structure, but are too brittle to handle more complex structures. His examples centered on using various NLP techniques to parse document structure and on the kinds of failures that might break traditional approaches to indexing structured documents.

The best example of an alternative approach that Jamie described was work on indexing PubMed using a relational schema that combined elements of a traditional document (author, title, abstract, journal, etc.) with more meta-level information such as gene-to-gene relationships. (Unfortunately, I wasn’t able to catch the reference to this work; stay tuned.)

He concluded with a call for research on this new class of applications, which combine multiple forms of knowledge, language analysis, metadata, and structure of varying reliability. These applications pose many interesting unsolved core IR problems, require diverse information resources to exploit, and create opportunities for new retrieval models.

While I found the talk interesting and inspiring, I think some of the kinds of indexing and ranking algorithms that he wants to see already exist. Two examples come to mind: Ancestry.com and PowerSet.

Ancestry.com’s search interface implements a rich and flexible schema that combines exact matching with a flexible best-match approach. While this is not exactly what Jamie suggested because it solves a direct search problem, it does address some of the search infrastructure requirements of flexible search capability.

As I understand it, PowerSet’s approach is to extract (subject, verb, object) triples from documents and then index them in a more traditional manner. This seems to be a step in the right direction, and fallback strategies for compensating for parser failures should make this more robust. Its incorporation into Bing for Wikipedia search is further indication of the viability of this approach.
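To illustrate the general idea of triple indexing with a keyword fallback, here is a rough sketch. It is not PowerSet's implementation, which I don't know in detail. The `extract_triple` "parser" is a hypothetical stand-in that only handles three-word sentences, and the `TripleIndex` class and its methods are likewise invented for this example.

```python
from collections import defaultdict

ROLES = ("subject", "verb", "object")


def tokenize(sentence):
    return sentence.lower().rstrip(".").split()


def extract_triple(sentence):
    """Hypothetical stand-in for an NLP parser: succeeds only on
    three-word 'subject verb object' sentences, otherwise fails."""
    words = tokenize(sentence)
    return tuple(words) if len(words) == 3 else None


class TripleIndex:
    def __init__(self):
        self.triples = defaultdict(set)   # (role, term) -> doc ids
        self.keywords = defaultdict(set)  # term -> doc ids (fallback index)

    def add(self, doc_id, sentence):
        # Always index plain keywords so parse failures stay findable.
        for word in tokenize(sentence):
            self.keywords[word].add(doc_id)
        triple = extract_triple(sentence)
        if triple is not None:
            for role, term in zip(ROLES, triple):
                self.triples[(role, term)].add(doc_id)

    def search(self, subject=None, verb=None, obj=None):
        """Return (structured matches, keyword-only fallback matches)."""
        constraints = [(role, term.lower())
                       for role, term in zip(ROLES, (subject, verb, obj))
                       if term]
        if not constraints:
            return set(), set()
        structured = set.intersection(
            *(self.triples.get(c, set()) for c in constraints))
        by_keyword = set.intersection(
            *(self.keywords.get(term, set()) for _, term in constraints))
        return structured, by_keyword - structured


index = TripleIndex()
index.add(1, "Dogs chase cats.")
index.add(2, "Cats chase dogs.")
index.add(3, "Sometimes the cats chase dogs around.")  # toy parser fails here
print(index.search(subject="cats", verb="chase"))
```

The structured result contains only the document where "cats" really is the subject. The fallback set recovers the sentence the toy parser could not handle, at the cost of also admitting a document where the roles are reversed, so an application would presumably rank fallback hits below structured ones.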

10 Comments

  1. […] This post was mentioned on Twitter by Gene Golovchinsky, Tatsumi Kobayashi. Tatsumi Kobayashi said: RT @HCIR_GeneG: Posted "A future of search" http://palblog.fxpal.com/?p=4866 #cikm2010 Reflections on Jamie Callan's keynote […]

  2. Makes me think of the Remembrance Agent, an Emacs plugin which would search using the contents of the current buffer, and show those results continually in a small window. The idea was that related documents (code, email messages, etc.) would always be at hand.

  3. Thanks for the writeup.

    I posted Michael’s notes from Jamie’s presentation. It includes the link to the paper on relational retrieval.

  4. Twitter Comment


    RT @HCIR_GeneG: Posted “A future of search” [link to post] #cikm2010 Reflections on Jamie Callan’s keynote

    Posted using Chat Catcher

  5. Twitter Comment


    I’ve just read it but thank you :) RT @ArjumandYounus: @miguelmalvarez Here … @HCIR_GeneG [link to post] #cikm2010

    Posted using Chat Catcher

  6. Twitter Comment


    Posted “A future of search” [link to post] #cikm2010 Reflections on Jamie Callan’s keynote

    Posted using Chat Catcher

  7. Twitter Comment


    @miguelmalvarez Here are some notes from it posted by @HCIR_GeneG [link to post] #cikm2010

    Posted using Chat Catcher

  8. @Ian, I think Jamie had in mind applications that have specific (potentially richly-structured) information needs that identify information that would then be transformed and incorporated into that application’s tasks. Remembrance Agent implemented half of that vision by creating queries from user activity, but it then presented the results as a simple list rather than doing something with them. But it does share something of that spirit.

  9. @Jefff, thanks for the notes & the link!

  10. Giovanna says:

    I just heard about this talk from Twitter and blogs. What I understand from your post is that Information Retrieval is now performed by machines or applications. This poses brand new challenges in how to formulate queries to represent information needs.
    Interesting also from the point of view of IR evaluation: in this case one needs pure task-oriented measures and can skip all the cognitive modeling of users, criteria related to usability etc.
