In a recent blog post, Vegard Sandvold proposed a taxonomy of search systems based on two dimensions — algorithmic vs. user-powered and information accessibility. The first dimension represents a tradeoff between systems and people in terms of who does the information seeking, and the second one measures the ease of finding information in some search space. His blog post was intended to solicit discussion, and, in that spirit, here is my take on his ideas.
Algorithms vs. People?
The distinction between the algorithmic and user-powered ends of this dimension seems difficult to apply. It is possible to design algorithms that leverage people’s task behavior to help them with information seeking. One classic example is the Remembrance Agent, which used recently-typed text in an editor to recommend similar documents. Another (shamelessly self-promoting) example is using the presence of freeform digital ink annotations on text to construct queries, based on the annotated passages, that find related documents. These examples illustrate that simple actions, not even explicitly search-oriented ones, can cause information to be retrieved and offered to the user. In some ways these actions are user-powered (the system responds to what the user is doing), but in other ways they are algorithmic.
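To make the idea concrete, here is a minimal sketch of context-triggered retrieval in the spirit of these systems (not the Remembrance Agent’s actual implementation): the user’s recently typed text is treated as an implicit query and documents are ranked by TF-IDF cosine similarity. All names and weighting choices here are illustrative.

```python
# Sketch of context-triggered retrieval: rank documents against the text the
# user has recently typed, using TF-IDF weights and cosine similarity.
import math
from collections import Counter

def tokenize(text):
    # Lowercase, split on whitespace, and keep only alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]

def tf_idf_vectors(docs):
    """Build a TF-IDF vector (as a dict of term -> weight) per tokenized doc."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(1 + n / df[t]) for t in tf})
    return vectors, df, n

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(recent_text, documents, k=1):
    """Return the k documents most similar to the recently typed text."""
    docs = [tokenize(d) for d in documents]
    vectors, df, n = tf_idf_vectors(docs)
    query_tf = Counter(tokenize(recent_text))
    qvec = {t: query_tf[t] * math.log(1 + n / df[t])
            for t in query_tf if t in df}
    ranked = sorted(range(len(documents)),
                    key=lambda i: cosine(qvec, vectors[i]), reverse=True)
    return [documents[i] for i in ranked[:k]]
```

The point of the sketch is the blurring itself: the retrieval is fully algorithmic, yet it is powered entirely by what the user happens to be doing.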
Perhaps a better way to conceive of this dimension is whether it empowers the person: can useful information be identified effectively? Can the system help refine and focus the information need? Can the results be integrated into the activities that motivated the search in the first place?
Another way to look at these interactions is the degree to which they are open-ended. A known-item search is like a toy with one button: press it, and music plays. Hurrah! But these kinds of toys do not encourage learning and interaction as much as open-ended toys such as blocks and Play-Doh, which can be used in different ways and encourage exploration. So exploratory search needs open-ended tools that help people learn, tools that can be used in a variety of ways, including ways not anticipated by their creators.
Maybe one way to interpret the horizontal dimension in Vegard’s post, then, is to look at the degree to which interactions with the system are open-ended. While the challenge of finding useful measures still exists, some places to start are:
- The diversity of means of expressing the information need
- The diversity of means of presenting search results
- The ease with which the user can move between various phases of the information seeking process
Information Accessibility

Based on Vegard’s examples, this dimension seems to capture the degree to which the system’s representation of the found information matches the searcher’s information need. The highest point is question answering; the lowest is something like “I’m Feeling Lucky.” Accessibility is something of a reserved term, at least in American English. Perhaps what we’re really after here is the granularity of search results, or the degree to which the information the system presents reflects the latent information need rather than the overt document structure in which that information is embedded. Of course, for systems like Wolfram|Alpha, or for other structured databases, this distinction may blur. But in cases where the searcher is interested in finding information that’s embedded in documents, this dimension can capture the ease with which that information can be identified.
How can information granularity be measured? The better a system can localize potential answers, whether by constructing answers using NLP, by identifying extracts or well-matching passages, or by identifying sets of terms that represent document clusters, the higher it scores on this dimension. The more work the user must do to extract this information, the lower the score. But precise metrics seem difficult to identify, and it is unclear how to compare, for example, matching passages with automatically-generated cluster labels. In part the different representations are based on different approaches to information extraction, and in part searchers’ tasks will affect the utility of different representations. Despite these difficulties, this dimension may offer a broad characterization of the way information is presented to the user.
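One of the localization strategies mentioned above, identifying well-matching passages, can be sketched very simply: slide a fixed-size window over a document and return the window with the most query-term matches, rather than the whole document. The window size and the overlap score below are illustrative choices, not a reference metric.

```python
# Sketch of passage-level granularity: return the best-matching fixed-size
# window of a document instead of the whole document.
def best_passage(document, query, window=8):
    words = document.lower().split()
    terms = set(query.lower().split())
    best_start, best_score = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        span = words[start:start + window]
        score = sum(1 for w in span if w in terms)  # query-term overlap
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window]), best_score
```

Even this toy version makes the measurement problem visible: the overlap score ranks passages against each other, but it says nothing about how such a passage compares to a cluster label or a synthesized answer.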
In summary, it’s probably useful to think of the two dimensions as input and output in the range of human and system behaviors that contribute to the information seeking process. These coarse dimensions offer a broad characterization of the design space, but more nuanced models may be useful to inform the design of specific information-seeking interactions.