Exploratory search often takes place over time. Searchers may run multiple queries to understand the collection, to refine their information needs, or to explore various aspects of the topic of interest. Many web search engines keep a history of a user’s actions: Bing makes that history readily available for backtracking, and all major search engines presumably use the click-through history of search results to affect subsequent searches. Yahoo Search Pad diagnoses exploratory search situations and switches to a more elaborate note-taking mode to help users manage the found information.
But none of these approaches makes it easy for a searcher to manage an ongoing exploratory search. So what could be done differently? We explore this topic in a paper we’ll be presenting at the IIiX 2010 conference this August. Our paper reviews the literature on session-based search, and proposes a framework for designing interactions around information seeking. This framework uses the structure of the process of exploratory search to help searchers reflect on their actions and on the retrieved results. It treats queries, terms, metadata, documents, sets of queries, and sets of documents as first-class objects that the user can manipulate, and describes how information-seeking context can be preserved as the searcher moves from one kind of object to another.
We illustrate the framework with an example from a system we are building. The system allows searchers to explore a collection in a session-based manner. For each session, the system keeps track of all queries that were run (whether typed or generated through relevance feedback) and all documents that were viewed or saved. It uses this information to help searchers make sense of the results accumulated over time.
Queries can be run by typing terms into a text box, or by selecting groups of useful documents for relevance feedback. Documents can be selected in an ad hoc manner, or by grouping all documents found relevant with respect to a particular query.
In addition to the results view, the system maintains a document history view and a query history view. The document history view shows a fused list of documents identified by all queries run in a particular session. It can be sorted in a variety of ways, including by a fusion score (we use CombMNZ). This approach can surface documents that were retrieved by multiple queries but were never ranked highly by any single query.
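To make the fusion step concrete, here is a minimal sketch of CombMNZ, not the system's actual code. CombMNZ sums a document's normalized scores across queries and multiplies that sum by the number of queries that retrieved the document. The function names, the input layout (one dict of raw scores per query), and the choice of min-max normalization are all assumptions made for illustration.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale one query's raw scores into [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full-strength hit.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(results_per_query):
    """Fuse per-query score dicts into one CombMNZ score per document.

    results_per_query: list of {doc_id: raw_score}, one dict per query
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)  # sum of normalized scores
    hits = defaultdict(int)      # number of queries that retrieved the doc
    for scores in results_per_query:
        if not scores:
            continue  # a query with no results contributes nothing
        for doc, s in min_max_normalize(scores).items():
            totals[doc] += s
            hits[doc] += 1
    return {doc: totals[doc] * hits[doc] for doc in totals}


def fused_ranking(results_per_query):
    """Document ids sorted by descending CombMNZ score, ties by id."""
    fused = comb_mnz(results_per_query)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Three queries; "c" is never the top hit but appears in all of them.
session = [
    {"a": 10, "c": 6, "x": 0},
    {"b": 10, "c": 6, "y": 0},
    {"d": 10, "c": 6, "z": 0},
]
print(fused_ranking(session))
```

In this toy session, document "c" is second in every result list yet comes first in the fused ranking, because the multiplier rewards documents that several queries agree on.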
Whenever a document is shown in a results list (whether for a particular query or for the entire session), the system also shows that document’s retrieval history in a histogram. The histogram allocates a bar for each query in the query history. The height of the bar indicates how high the document ranked in response to that query. A gap in the histogram indicates that the document was not retrieved by a particular query. This visualization makes it quite easy to tell whether a particular query retrieved new documents or merely re-ranked previously retrieved ones, and it makes it easy to tell whether some documents are being ranked consistently higher than others.
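The data behind such a histogram can be sketched in a few lines. This is an illustration under stated assumptions rather than a description of the system: the function names, the representation of each query's results as an ordered list of document ids, the cutoff `depth`, and the linear mapping from rank to bar height are all hypothetical.

```python
def retrieval_history(doc_id, query_rankings, depth=10):
    """Return one bar height per query, in query-history order.

    query_rankings: list of ranked doc-id lists, one per query (assumed layout).
    A height of 1.0 means rank 1; smaller values mean lower ranks.
    None marks a gap: the query did not retrieve the document within `depth`.
    """
    bars = []
    for ranking in query_rankings:
        top = ranking[:depth]
        if doc_id in top:
            rank = top.index(doc_id) + 1          # 1-based rank
            bars.append((depth - rank + 1) / depth)
        else:
            bars.append(None)
    return bars


def new_documents(query_rankings, query_index, depth=10):
    """Documents the given query retrieved that no earlier query had."""
    seen = set()
    for ranking in query_rankings[:query_index]:
        seen.update(ranking[:depth])
    return [d for d in query_rankings[query_index][:depth] if d not in seen]


history = [
    ["a", "b", "c"],   # first query
    ["c", "a"],        # second query: only re-ranks known documents
    ["d", "b"],        # third query: brings in "d"
]
print(retrieval_history("a", history, depth=4))
print(new_documents(history, 1, depth=4))
print(new_documents(history, 2, depth=4))
```

Here the second query contributes no new documents, which in the histogram view would show up as bars that shift in height without filling any earlier gaps.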
The system also keeps track of multiple users in each session, making it possible to collaborate with others. Each user’s queries and saved documents are added to the shared history, and color-coding in the query history view and in the document histograms is used to indicate who contributed what. New relevance feedback queries can be run on documents retrieved by other users in a session.
Much remains to be done on this project, including developing strategies for effective evaluation of this tool. Recall and precision measures based on a priori judgments of relevance are hard to apply to this kind of multi-query environment, which is designed to allow users to pivot around their found objects in a variety of ways. We will need to seek outcomes beyond the search task, and to collect qualitative use data to help us understand the effectiveness of these tools.