Recall-oriented search on the web


Most popular web search engines are optimized for precision: getting the useful document into the top five or ten hits so that the user doesn’t have to page through the results to find it. This works well for known-item search (finding the address of a restaurant, the birthday of a movie star, etc.) and for searches that rely on combinations of keywords.

But some kinds of information needs don’t fit that pattern well. Sometimes the information being sought is spread over multiple documents; sometimes people need to find multiple documents that match a query in order to compare or contrast them. The task becomes recall-oriented rather than precision-oriented. Furthermore, these searches may be repeated over time, as the user finds information that causes the information need to change. Medical information seeking is one obvious example. Are there others?

Some possibly useful directions for research in this space:

  • Identify, describe, and catalog classes of information seeking that exhibit these recall-oriented behaviors
  • Identify web-based resources that are associated with specific areas of recall-oriented search
  • Identify evidence of such searches in web logs for this sort of behavior
  • Create heuristics for detecting recall-oriented search behavior in near-real-time
  • Devise user interfaces to support recall-oriented search
  • Don’t reinvent the wheel — try to leverage existing search tools such as Yahoo BOSS or the Google Search API
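As a rough illustration of the heuristics item above, here is a minimal sketch of a rule-based detector. Everything in it is an assumption made for illustration: the `SessionStats` fields, the thresholds, and the idea that these particular signals indicate recall-oriented behavior are all hypothetical and would need to be validated against real logs. The counts are simple enough to update incrementally as a session unfolds, which is what would make a near-real-time check plausible.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    """Per-session counts. Every field is a hypothetical, log-derived signal."""
    num_queries: int
    num_reformulations: int  # queries that share a term with the previous query
    max_result_page: int     # deepest result page viewed (1 = first page)
    num_clicks: int          # distinct results clicked in the session


def looks_recall_oriented(stats: SessionStats, min_signals: int = 2) -> bool:
    """Flag a session when at least `min_signals` of the rules below fire.

    The thresholds are illustrative guesses, not values taken from any study.
    """
    signals = [
        stats.num_reformulations >= 3,  # the user keeps refining the same topic
        stats.max_result_page >= 3,     # the user pages well past the top hits
        stats.num_clicks >= 5,          # the user opens many results
    ]
    return sum(signals) >= min_signals


if __name__ == "__main__":
    exploratory = SessionStats(num_queries=6, num_reformulations=4,
                               max_result_page=4, num_clicks=2)
    known_item = SessionStats(num_queries=1, num_reformulations=0,
                              max_result_page=1, num_clicks=1)
    print(looks_recall_oriented(exploratory))
    print(looks_recall_oriented(known_item))
```

Requiring two signals rather than one is a deliberate choice in this sketch: any single signal (for example, deep paging) could just as easily reflect a failed precision-oriented search.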

I am sure that much of this work has already been done (as Daniel Tunkelang points out), but it would be useful, I think, to bring it together in a coherent way to inform system design.


  1. Identify evidence of such searches in web logs for this sort of behavior

    What would such evidence look like? How would we know it if we see it?

    I am wondering this because I’ve heard stories from some of these web search engines about how there is no evidence in the logs for this type of user behavior. And so the search engines say that nobody engages in this type of behavior. But in reading the blogosphere, and even in talking with others, I see examples all the time of people expressing information needs that are recall-oriented. And I find it hard to believe that they don’t use the web search engines to meet those needs.

    So maybe the web search engines just don’t know how to look for it?

  2. […] Check out their recent post about “Recall-oriented search on the web”. […]

  3. Presumably, search logs include cookie-based session information based on which a particular user might be identifiable. My guess is that it may be possible to group queries based on time and keywords, and (with some more heavy-weight analysis) by clustering queries based on documents that were retrieved by them. Retrieved documents are probably not in the log, but queries could be re-run offline to approximate their search results. For query-query similarity computations (useful for establishing a search topic), it’s probably not so important to have the same result that the user saw, and you could probably get away with clustering based on the top 20-100 documents. Might be an interesting exploration.
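The grouping idea in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the log is taken to be a list of `(timestamp, query)` pairs for one cookie, the 30-minute gap and the 0.2 similarity threshold are arbitrary, and the result lists are assumed to come from re-running the queries offline. The function names are hypothetical.

```python
from datetime import datetime, timedelta


def split_sessions(events, max_gap=timedelta(minutes=30)):
    """Group one user's (timestamp, query) events into time-based sessions.

    A new session starts whenever the gap since the previous query exceeds
    `max_gap`. The 30-minute default is an assumption, not a known constant.
    """
    sessions = []
    for ts, query in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, query))
        else:
            sessions.append([(ts, query)])
    return sessions


def jaccard(docs_a, docs_b):
    """Jaccard overlap between two collections of retrieved document ids."""
    a, b = set(docs_a), set(docs_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_topic(results_a, results_b, threshold=0.2):
    """Treat two queries as on-topic if their result sets overlap enough.

    `results_a` and `results_b` stand for the top-k documents obtained by
    re-running each query offline; the threshold is an illustrative guess.
    """
    return jaccard(results_a, results_b) >= threshold


if __name__ == "__main__":
    t0 = datetime(2009, 1, 1, 12, 0)
    log = [
        (t0, "migraine treatment"),
        (t0 + timedelta(minutes=5), "migraine treatment side effects"),
        (t0 + timedelta(hours=2), "pizza near me"),
    ]
    for session in split_sessions(log):
        print([query for _, query in session])
    print(same_topic(["d1", "d2", "d3"], ["d2", "d3", "d4"]))
```

Set overlap ignores rank, which matches the comment's point that the offline results need not be identical to what the user saw; a rank-weighted measure would be a natural refinement.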

  4. Two thoughts:
    There are several non-web search tasks that require high recall – legal & patent search come to mind, where exhaustivity is part of the task description. An academic literature review is another. Do these have analogues in web search?

    Does the size of a web collection make a truly recall-oriented task impractical? Does it even make sense in such a collection? You can think of many recall-oriented “web search” tasks that are limited to sub-collections of the web: find all the airlines that fly from San Francisco to Pittsburgh; find all blog posts that discuss some current event. Exhaustive search for this information is not so tough with an appropriate tool for that collection.

  5. jeremy says:

    Jon, I think there are a lot more scenarios than just legal, patent, lit review, etc. Think about any situation/information need in which you’re not just doing known item or navigational search. For example, searching for new music. Or trying to figure out where you want to go eat. Or even picture search! Sometimes I really am looking for a particular, famous, known picture. But most of the time I want to see all the photos that have been taken at/near a particular location. Or of a particular subject (e.g. “inauguration”) or at a particular time of day (e.g. “sunrise”). I’m not looking for any one sunrise picture, any one location picture, or any one event picture. I want to explore the whole lot of them, to see all the beautiful images that someone has created. It’s recall-oriented.

    Some of these recall-oriented needs are easier to address than others, especially if you have the proper metadata. But whether or not you have the metadata, the point is that the recall-oriented information need still exists. And to a much larger degree than the major web search engine companies let on. Like I said, folks from some of the major engines have claimed that they see no evidence for recall-oriented tasks in their logs. I find that very hard to believe.

    Google, for example, has been extremely slow in recognizing the fact that recall-oriented searches do exist. But that is slowly changing. Here’s one example where they’ve finally realized that 10 blue links isn’t enough:

    Note the comment, however, from the author of this article: “Location-friendly service Yelp has offered something similar to this for quite some time now; Google, however has not been quite as innovative with its own tools.”

    What that says to me is that this is a domain in which recall-oriented information needs exist, but Google has been very slow to recognize that fact.

    I’m sure if we sat down for an hour together, we could come up with dozens and dozens of different recall-oriented examples/types that are applicable to the general web.

  6. So my question is: given that this behavior exists, can we devise heuristics for identifying it in search logs? One obvious solution is to let the user decide whether to use the simple (“I’m feeling lucky”) interface or the more capable (“I am feeling inquisitive”) interface.

  7. I think part of the problem is that, since the current search engines do not support recall-oriented behavior, a recall-oriented user has to work around the search engine, e.g., going to Wikipedia or other information sources. I don’t know that the behavior shows up in the search logs. Offering a more flexible interface would help–perhaps something as simple as what you suggest.
