How far to generalize?


The importance of understanding people’s activity to inform design is one of the central tenets of HCI. When design is grounded in actual work practice, it is much more likely to produce artifacts that fit with the way people work and the way they think. One key challenge when studying people for the purpose of informing design is to understand what aspects of existing work practice are essential to the work and what aspects are side-effects of existing technology (or lack thereof) and are fair game for innovation.

While HCIR research often relies on recall and precision measures to compare systems, qualitative methods are used as well. For example, Vakkari and his colleagues studied several students performing research for their Master’s thesis work. The researchers used a variety of techniques, including diary entries and interviews, to assess the evolution of searchers’ behavior over the course of a few months. Their findings led them to fill in some of the details of Kuhlthau’s model of information seeking.

While their study shed some light on existing practices of OPAC search, it is less clear how their findings inform the design of alternative ways of supporting people engaged in similar information seeking tasks. For example, they report on how searchers’ use of Boolean operators varied over the course of their research, finding that:

As the task performance proceeded, the students’ use of Exhaust [entering all terms] decreased and Select [entering a fraction of the terms] increased. Those who used Exhaust were not as far advanced in their process, and their conceptual constructs were less developed than those who started with Select.

They also found that over the course of the project, students’ tactics evolved toward the use of more complex Boolean queries that contained several facets (synonyms ORed together) combined with AND operators. This is also the technique favored by those searching other complex collections with Boolean operators (see, for example, the PubMed Search Strategies blog, which I’ve written about a few times). So their results appear accurate for Boolean queries, but are not as useful for the more modern best-match search systems used by the majority of searchers. We can infer from the results of this study that people tend to start broadly and then make their searches more specific as they gain an understanding of the topic. But it’s hard to know what to make of the findings beyond that level of generality, and it is unclear whether these observations will apply with the same clarity to probabilistic search engines and relevance feedback queries.
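
To make the faceted query structure mentioned above concrete (synonyms ORed together within a facet, facets combined with AND), here is a minimal Python sketch. The function name `build_faceted_query` and the example facets are my own illustrations, not anything taken from the study or from a particular OPAC, and real systems differ in their exact operator and phrase syntax.

```python
def build_faceted_query(facets):
    """Combine facets with AND; synonyms within a facet are ORed together.

    `facets` is a list of lists of strings, one inner list per facet.
    (Hypothetical helper for illustration only.)
    """
    groups = []
    for synonyms in facets:
        # Quote multi-word terms so they are treated as phrases.
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        group = " OR ".join(terms)
        # Parenthesize only when a facet has more than one synonym.
        groups.append(f"({group})" if len(terms) > 1 else group)
    return " AND ".join(groups)


# Hypothetical example facets, not taken from the study.
facets = [
    ["information seeking", "information retrieval"],
    ["students", "undergraduates"],
    ["thesis"],
]
query = build_faceted_query(facets)
print(query)
```

Adding a synonym to a facet broadens the search, while adding a new facet narrows it, which is roughly the lever the students appear to have learned to pull as their understanding of the topic developed.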

Ultimately, the challenge for both quantitative and qualitative methods is to obtain findings that can generalize beyond the experimental situation. It’s often tempting to generalize beyond the facts of the experiment, particularly because the language we use to describe people’s behaviors can mask important differences in systems. Resisting that temptation, on the other hand, leads to seemingly “smaller” contributions that are so grounded in the task that was studied that their lessons are difficult to translate to other situations. Is it an impossible task, then, to derive broad, meaningful results from specific studies? I am not sure I am willing to generalize that far.