History matters
Monday, May 14th, 2012 by Gene Golovchinsky
Exploratory search is an uncertain endeavor. Quite often, people don’t know exactly how to express their information need, and that need may evolve over time as information is discovered and understood. This is not news.
When people search for information, they often run multiple queries to get at different aspects of the information need, to gain a better understanding of the collection, or to incorporate newly-found information into their searches. This too is not news.
The multiple queries that people run may well retrieve some of the same documents. In some cases, there may be little or no overlap between query results; at other times, the overlap may be considerable. Yet most search engines treat each query as an independent event, and leave it to the searcher to make sense of the results. This, to me, is an opportunity.
Design goal: Help people plan future actions by understanding the present in the context of the past.
While web search engines such as Bing make it easy for people to re-visit some recent queries, and early systems such as Dialog allowed Boolean queries to be constructed by combining results of previously-executed queries, these approaches do not help people make sense of the retrieval histories of specific documents with respect to a particular information need. There is nothing new under the sun, however: Mark Sanderson’s NRT system flagged documents as having been previously retrieved for a given search task, VOIR used retrieval histograms for each document, and of course a browser maintains a limited history of activity to indicate which links were followed.
A document's retrieval history can help a searcher answer questions such as:
- Is a particular document central to a given information need?
- Is this the first time this document has been retrieved?
- Have I ever seen this document before?
- Is this query breaking new ground, or largely just re-ranking previously-retrieved documents?
Querium includes several visualizations to help answer these and related questions. These include histograms that display the retrieval history of each document, filters that allow documents to be selected based on whether they were previously retrieved, clicked on, etc., and a query-centric overview of search results.
Querium keeps track of the top 100 documents retrieved by each query; a simple inversion associates each document with the queries that retrieved it. Whenever a search result is displayed, it is accompanied by a histogram that shows its retrieval history:
The histogram on the left, for example, shows that the document in question was retrieved by three of the four queries that were run, that it was ranked lower in query 2 than in queries 1 and 3, and that it was retrieved by more than one person (the colors represent different collaborators).
This histogram, on the other hand, shows that the document has only been retrieved once, and was highly ranked. Thus given a series of documents retrieved by a query, the searcher can quickly tell which documents are new, which ones have been seen before, etc.
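The bookkeeping behind these histograms can be sketched in a few lines of Python. This is only an illustration of the inversion described above, not Querium's actual code: the function names, the dictionary layout, and the sample document IDs are all assumptions, and collaborator identity (the colors) is left out for brevity.

```python
from collections import defaultdict

TOP_K = 100  # the post states that the top 100 documents per query are tracked


def build_retrieval_history(query_results, top_k=TOP_K):
    """Invert {query_id: ranked doc ids} into {doc_id: [(query_id, rank), ...]}.

    Hypothetical helper; ranks are 1-based, with 1 the best rank.
    """
    history = defaultdict(list)
    for query_id, ranked_docs in query_results.items():
        for rank, doc_id in enumerate(ranked_docs[:top_k], start=1):
            history[doc_id].append((query_id, rank))
    return dict(history)


def histogram_row(doc_id, history, query_ids):
    """Return one entry per query: the document's rank, or None if not retrieved."""
    ranks = dict(history.get(doc_id, []))
    return [ranks.get(q) for q in query_ids]


# Made-up session of four queries (illustrative data only).
query_results = {
    1: ["d1", "d2", "d3"],
    2: ["d4", "d2", "d1"],
    3: ["d1", "d5"],
    4: ["d6"],
}
history = build_retrieval_history(query_results)

# d1 was retrieved by three of the four queries, ranked lower in query 2.
print(histogram_row("d1", history, [1, 2, 3, 4]))  # [1, 3, 1, None]
# d6 was retrieved only once, at the top rank.
print(histogram_row("d6", history, [1, 2, 3, 4]))  # [None, None, None, 1]
```

Each row maps directly onto one histogram: a bar per query, with its height derived from the rank and a gap where the document was not retrieved.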
In addition to aggregating metadata from retrieved documents to generate filtering facets, we can aggregate process metadata derived from retrieval patterns within a search mission. We have decomposed process metadata into three facets: retrieval, viewing, and assessment. Retrieval counts how many times a document has been retrieved; viewing counts how many snippets were viewed and how many were clicked on to show the full document; and assessment counts documents that were explicitly bookmarked (positive assessment) or marked as not useful (negative assessment). We can also filter based on the identity of the person running the query.
These counts form the basis of filtering operations that can be used to surface as-yet-unexamined documents, to highlight documents that represent recurring themes in the searches, and to review key information that has been identified for a given information need. And of course they can be combined with document metadata for more precise explorations.
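As a rough illustration of how such counts might drive filtering, here is a minimal Python sketch. The `ProcessCounts` record, its field names, and the three filter functions are hypothetical; they mirror the retrieval, viewing, and assessment facets described above but are not taken from Querium.

```python
from dataclasses import dataclass


@dataclass
class ProcessCounts:
    """Per-document process metadata (field names are assumptions)."""
    retrieved: int = 0      # retrieval facet: times the document was retrieved
    snippet_views: int = 0  # viewing facet: times its snippet was viewed
    clicks: int = 0         # viewing facet: times the full document was opened
    bookmarked: int = 0     # assessment facet: positive assessments
    not_useful: int = 0     # assessment facet: negative assessments


def unexamined(counts):
    """Documents that were retrieved but never viewed or clicked."""
    return sorted(d for d, c in counts.items()
                  if c.retrieved > 0 and c.snippet_views == 0 and c.clicks == 0)


def recurring(counts, min_retrievals=2):
    """Documents retrieved repeatedly; the threshold is an arbitrary choice."""
    return sorted(d for d, c in counts.items() if c.retrieved >= min_retrievals)


def key_documents(counts):
    """Documents that were explicitly bookmarked."""
    return sorted(d for d, c in counts.items() if c.bookmarked > 0)


# Made-up counts for four documents.
counts = {
    "d1": ProcessCounts(retrieved=3, snippet_views=2, clicks=1, bookmarked=1),
    "d2": ProcessCounts(retrieved=2),
    "d3": ProcessCounts(retrieved=1, snippet_views=1, not_useful=1),
    "d4": ProcessCounts(retrieved=1),
}

print(unexamined(counts))     # ['d2', 'd4']
print(recurring(counts))      # ['d1', 'd2']
print(key_documents(counts))  # ['d1']
```

Because each filter is just a predicate over the counts, it could be intersected with a document-metadata filter in the same way.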
Finally, we can display the overall retrieval history of a mission by looking at the queries as a whole. For each query, we show a symbolic representation of each retrieved document (a rectangle), decorated to reveal its history of interaction. The decorations indicate whether the document has been viewed (grey box), clicked on (grey lines), assessed positively (green lines), negatively (red lines), or used as part of a relevance feedback query (checkmark). The screenshot fragment below shows such a view:
As the searcher mouses over each document, all other retrieval instances are highlighted with a black border to show that a document may have been retrieved by other queries.
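Two small computations could sit behind a query-centric view like this one; the Python sketch below shows them under stated assumptions. The novelty measure (the share of a query's results that no earlier query retrieved) is one possible way to quantify whether a query is breaking new ground; the post does not say Querium computes it. Both function names and the sample data are hypothetical.

```python
def novelty_by_query(query_results):
    """For each query, in order, the fraction of its results not seen before.

    Relies on dict insertion order reflecting the order the queries were run.
    """
    seen = set()
    novelty = {}
    for query_id, docs in query_results.items():
        unique = set(docs)
        new = unique - seen
        novelty[query_id] = len(new) / len(unique) if unique else 0.0
        seen |= unique
    return novelty


def retrieval_instances(doc_id, query_results):
    """All (query_id, rank) pairs to highlight when a document is hovered."""
    return [(query_id, docs.index(doc_id) + 1)
            for query_id, docs in query_results.items()
            if doc_id in docs]


# Made-up session of three queries.
query_results = {
    "q1": ["a", "b", "c"],
    "q2": ["b", "c", "d", "e"],
    "q3": ["a", "b"],
}

print(novelty_by_query(query_results))
# {'q1': 1.0, 'q2': 0.5, 'q3': 0.0}
print(retrieval_instances("b", query_results))
# [('q1', 2), ('q2', 1), ('q3', 2)]
```

In this example the third query retrieves nothing new, which is the "largely just re-ranking" case from the list of questions above.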
All of these techniques are designed to help people understand how a particular set of results relates to previous activity within the search task. We are in the process of evaluating the system to see which aspects we got right and which ones need more work. In particular, it will be interesting to see if making these patterns of retrieval visible will affect people’s perceptions of their search activity.