The work honored with the paper award at the ECIR 2010 conference described an experiment that assessed the effectiveness of a case-based reasoning mechanism for suggesting possible actions for users engaged in an exploratory search task. The authors constructed DAFFODIL, a sophisticated interface for issuing queries, for saving documents, and for suggesting potentially useful query expansion terms. They performed a preliminary evaluation of the system on three search tasks, and compared subjects’ performance and behavior patterns with and without system-generated suggestions.
They found that people using the assistance feature issued fewer queries, viewed more results, and saved more documents, though the experiment did not have enough power to reach statistical significance for the first two measures. The propensity to save more documents also carried over to saving more relevant documents for two of the three tasks; there was no difference in performance on the third.
The experiment also lacked the power to find statistically significant differences in query reformulation strategies, although the reported effects were borderline significant. Some query manipulations, such as restricting search to specific fields, were used seven times more frequently in the experimental condition than in the control, and term suggestions offered by the system were incorporated almost ten times more frequently.
Finally, the study results suggest a learning effect: people applied suggestions learned from two topics (with hints active) to the third topic (without hints). The study's short duration makes it impossible to tell whether the effect is durable.
It would be interesting to learn whether there were significant individual differences in subjects' strategies for performing these tasks, and the extent to which those strategies depended on the suggestion mechanism. In my PhD work, I observed significant preferences in people's propensity to pursue specific search tactics, and also found that some people had a bias toward viewing more retrieved documents. A cluster analysis of people based on the interface tactics they adopted showed that those who viewed more documents achieved higher recall without a loss of precision.
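The kind of analysis meant here can be sketched as follows: represent each subject as a vector of tactic counts and cluster the vectors. This is a minimal, self-contained illustration with invented toy data and a bare-bones k-means; it is not the analysis from my dissertation, just the shape of it.

```python
# Hypothetical sketch: clustering subjects by interface-tactic usage.
# Feature vectors and cluster count are invented for illustration.

def kmeans(points, k, iters=20):
    """Minimal k-means on lists of floats (no external libraries)."""
    centroids = points[:k]  # naive init: first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Toy data: [queries_issued, docs_viewed] per subject.
users = [[12, 5], [11, 6], [3, 40], [4, 38], [10, 7], [2, 44]]
centroids, clusters = kmeans(users, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]: query-heavy vs. viewing-heavy
```

In practice one would use many more tactic dimensions and then compare each cluster's recall and precision, which is how a "viewers achieve higher recall" pattern would surface.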
Complex and powerful information-seeking interfaces such as DAFFODIL offer many means of expressing information needs, which may or may not produce statistically significant (or important) differences in performance. The lack of significant performance differences, however, does not mean that a range of different tactics should not be supported.
We need to build up a better understanding of tactics that people adopt when using complex information exploration interfaces. In addition to (or perhaps instead of) running controlled hypothesis-testing experiments, researchers should also instrument their systems to collect sufficient usage data to perform exploratory data analysis on their users’ behavior. This kind of data is relatively easy to collect (think about logging interaction events with enough context to tie them to performance), and does not complicate any existing experimental setup.
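The instrumentation itself need not be elaborate. A minimal sketch, assuming a JSON-lines event log, might look like this; the event names and fields are illustrative, not taken from any particular system.

```python
# Hypothetical interaction logger: one JSON record per event, with
# enough context (session, task, timestamp) to tie behavior back to
# performance in later exploratory analysis.
import json
import time

class InteractionLogger:
    def __init__(self):
        self.events = []

    def log(self, session_id, task_id, action, **context):
        self.events.append({
            "ts": time.time(),
            "session": session_id,
            "task": task_id,
            "action": action,   # e.g. "query", "view_doc", "save_doc"
            **context,          # e.g. query text, doc id, suggestion used
        })

    def dump(self):
        """Serialize as JSON lines, suitable for appending to a file."""
        return "\n".join(json.dumps(e) for e in self.events)

log = InteractionLogger()
log.log("s1", "topic-1", "query", text="information seeking", suggested_terms=1)
log.log("s1", "topic-1", "view_doc", doc_id="d42")
log.log("s1", "topic-1", "save_doc", doc_id="d42", relevant=True)
print(log.dump())
```

Because logging is purely additive, it can run alongside any controlled experiment without altering the conditions being tested.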
The benefits are two-fold: first, the community will accumulate a growing body of work that characterizes how people use complex tools, from which we may be able to identify more generalizable patterns; and second, if hypothesis testing fails to produce any statistically significant results, post-hoc behavioral analysis may save the day.