Searching deeper


Daniel Russell wrote up a nice summary of my search for the origins of Daniel Tunkelang’s name. Daniel R. drew two lessons from the exercise: first, that social search (although in this case I would say the social was bordering on the collaborative) can be effective because it integrates the insights of multiple people; and second, that some domain knowledge helped me navigate the search results more effectively.

I’d like to expand his second point a bit.

The effectiveness of modern search engines may have lulled people into the implicit belief that if Bing (or Google or Yahoo!) can’t find a document in the top few results, then that document doesn’t exist. This assumption may hold for really common information (such as what Britney Spears is up to at any given moment), but many kinds of information are not readily findable using a typical search engine. Sometimes, as Daniel Russell likes to document on his blog, it’s a matter of phrasing the query appropriately; but often you have to think more broadly about which collections you search. (For more on this, see Daniel Russell’s HCIR 2010 keynote slides.)

Web search technologists sometimes talk about the Deep Web, that is, information that is accessible through a web browser but not findable through a horizontal search engine (Google, Yahoo!, Bing, etc.). There are many databases (genealogical records, for example) that you can reach through the web, but whose contents are not indexed along with all the other web documents.

There are several reasons why this information is not available through a single search interface: database owners may want to charge (or otherwise control) access to the data, the data may be of only specialized interest, and the data may have a well-defined structure that a horizontal engine will not be able to manage effectively.

Again, in the genealogical realm, one can search using all sorts of (partial) information such as names, dates, places, and relationships, but these values are logically related by very well-defined rules that don’t apply to other kinds of data. The search engine needs specific knowledge about what makes two records similar; simply lumping names and dates into a “document” will not improve the quality of search results.
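To make the point about record similarity concrete, here is a minimal Python sketch. The record fields (`surname`, `given`, `birth_year`, `place`), the linear year tolerance, and the equal weighting of fields are all assumptions for illustration, not how any particular genealogy site works. It compares a field-aware similarity against the “lumping” approach of flattening a record into a bag of tokens.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Record:
    # Hypothetical genealogical record; real databases use richer schemas.
    surname: str
    given: str
    birth_year: int
    place: str


def name_sim(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], tolerant of spelling variants."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def year_sim(a: int, b: int, tolerance: int = 5) -> float:
    """1.0 for identical years, falling linearly to 0 at `tolerance` years apart (assumed rule)."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def record_sim(q: Record, r: Record) -> float:
    """Field-aware similarity: each field compared by its own rule, then averaged."""
    scores = [
        name_sim(q.surname, r.surname),
        name_sim(q.given, r.given),
        year_sim(q.birth_year, r.birth_year),
        1.0 if q.place.lower() == r.place.lower() else 0.0,
    ]
    return sum(scores) / len(scores)


def as_document(r: Record) -> set[str]:
    """The 'lumping' approach: flatten a record into a bag of tokens."""
    return {r.surname.lower(), r.given.lower(), str(r.birth_year), r.place.lower()}


def token_sim(q: Record, r: Record) -> float:
    """Jaccard overlap of the flattened token sets."""
    a, b = as_document(q), as_document(r)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    query = Record("Smith", "John", 1852, "Cork")
    candidate = Record("Smyth", "John", 1851, "Cork")
    print(round(record_sim(query, candidate), 2), round(token_sim(query, candidate), 2))
```

With a query for John Smith, born 1852 in Cork, and a candidate recorded as John Smyth, born 1851 in Cork, the field-aware score comes out at roughly 0.9. The token overlap is only about 0.33, because “smith” and “smyth”, and 1852 and 1851, count as entirely different tokens.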

So one lesson for searching more effectively is to understand the nature of relevant collections, and to recognize when to use a web search engine as a search engine, and when to use it as a finding aid to find other collections.

Another thing to keep in mind as you’re searching databases is how much you trust the ranking function: modern search engines estimate the quality of each match to improve precision, and that works well as long as the estimates are accurate. In some cases, however, it may be useful to look deeper in the result set if there is reason to suspect that the ranking algorithm isn’t giving you what you expect. This is particularly true of genealogical data, where there are multiple competing dimensions of similarity and the data is noisy or even wrong.
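The risk of trusting the ranking can be sketched the same way. In this hypothetical Python example, each candidate carries made-up per-dimension similarity scores, and `r3` stands in for the true match whose birth year was mis-transcribed. Two equally plausible weightings of the dimensions put it at very different ranks. Pooling the top results from several weightings is one simple way to look deeper without reading every result; that pooling step is my own illustration, not a standard feature of any search engine.

```python
# Hypothetical candidates with per-dimension similarity scores in [0, 1].
# "r3" plays the true match; its birth year was mis-transcribed, so date = 0.0.
CANDIDATES: list[tuple[str, dict[str, float]]] = [
    ("r1", {"name": 0.9, "date": 1.0, "place": 0.0}),
    ("r2", {"name": 0.8, "date": 1.0, "place": 0.0}),
    ("r3", {"name": 1.0, "date": 0.0, "place": 1.0}),
    ("r4", {"name": 0.7, "date": 0.9, "place": 0.0}),
]

# Two plausible (assumed) weightings of the competing similarity dimensions.
DATE_HEAVY = {"name": 0.3, "date": 0.6, "place": 0.1}
PLACE_HEAVY = {"name": 0.4, "date": 0.2, "place": 0.4}


def rank(cands, weights):
    """Return candidate ids sorted by weighted score, best first."""
    def score(cand):
        _, dims = cand
        return sum(w * dims[d] for d, w in weights.items())
    return [cid for cid, _ in sorted(cands, key=score, reverse=True)]


def position(ranking, cid):
    """1-based rank of a candidate id in a ranking."""
    return ranking.index(cid) + 1


def pooled_top_k(cands, weightings, k):
    """Union of the top-k ids under each weighting: a cheap way to widen the net."""
    pooled = set()
    for weights in weightings:
        pooled.update(rank(cands, weights)[:k])
    return pooled


if __name__ == "__main__":
    for label, weights in [("date-heavy", DATE_HEAVY), ("place-heavy", PLACE_HEAVY)]:
        ranking = rank(CANDIDATES, weights)
        print(label, ranking, "r3 at rank", position(ranking, "r3"))
    print("pooled top 2:", sorted(pooled_top_k(CANDIDATES, [DATE_HEAVY, PLACE_HEAVY], 2)))
```

Under the date-heavy weighting the true match lands last, at rank 4, below the cut-off for someone who reads only the top three results. Under the place-heavy weighting it comes first. Pooling the top two from each weighting recovers it.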

There is a well-known (at least to me) joke about a guy looking for his glasses not where he lost them but under a streetlamp. Why? Because that’s where the light is. We often make the same mistake when we pick a horizontal web search engine for our searches rather than searching a more focused collection. Knowing where to look is one of the skills that define good reference librarians, and it is a skill that anyone can learn and practice.
