HCIR 2011 took place almost three weeks ago, but I am just getting caught up after a week at CIKM 2011 and an actual almost-no-internet-access vacation. I wanted to start off my reflections on HCIR with a summary of Gary Marchionini's keynote, titled "HCIR: Now the Tricky Part." Gary coined the term "HCIR" and has been a persuasive advocate of the concepts represented by the term. The talk used three case studies of HCIR projects as a lens to focus the audience's attention on one of the main challenges of HCIR: how to evaluate the systems we build.
The first case study, of the Open Video project, introduced the notion of surrogates: representations of metadata designed for human consumption. One significant challenge, particularly for video objects, is how to evaluate the effectiveness of surrogates. While most evaluations are done as laboratory studies, do these findings retain their validity in naturalistic settings?
The second case study focused on the Relation Browser, a faceted browsing tool. The Relation Browser is designed to reveal relationships across facets, and supports exploration through browsing rather than keyword search. The system has undergone many revisions and redesigns over a decade, and the lab's experience with the project raised several important issues that apply much more broadly: how does one capture the design rationale that goes into evolving a system over time? How can one show the benefits of HCIR interfaces when users prefer interfaces with which they are already familiar?
The third case study, based on the ongoing Results Space project, explored HCIR issues related to surrogates (are there better ways to represent search results than lists of snippets?), and touched on issues of awareness in collaborative search.
Having presented the case studies, Gary raised some broader issues for the HCIR field. While he admitted that most of the work that he has been involved with focused on lab studies, he recognized the need for field observations as well. In fact, he suggested that neither approach would be truly useful in isolation. A process that starts with qualitative observations to generate hypotheses that are then tested in the lab is one effective way to integrate the methods, but Gary also raised the possibility of proceeding in the opposite direction.
He also identified the following challenges:
1. Query quality: can we assess the quality of a query, which is often the first signal of user behavior? Can this be done pre-retrieval as well as post-retrieval? How can we elicit human judgments of query quality?
2. Behavior as evidence: what are the various ways in which searchers’ behavior can be converted into implicit relevance feedback? Can we match behaviors to queries to infer what people might be interested in?
3. How do we design and evaluate surrogates that help people perceive, recognize, gist, understand, interpret, analyze and evaluate search results?
4. How do the tools we build affect searchers’ cognitive load? As we add more tools, are we helping or hurting? Does collaboration simply increase the load further without offering tangible benefits, or is there a useful tradeoff?
5. How do we measure session quality and search quality? Recall and precision are point-based measures that don’t capture the process characteristic of session-based search. How do we adapt them to the session as a unit of interaction (as opposed to a query)? What other metrics are appropriate?
6. Finally, he touched on the tricky issue of recording and capturing experimental traces: annotated logs, video recordings, data sets, etc. Should researchers publish traces of system use to characterize participants’ search activities?
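To make the fifth challenge concrete, here is a minimal sketch (with hypothetical document IDs and relevance judgments) contrasting per-query precision, a point-based measure, with a simple session-level measure that credits distinct relevant documents found anywhere in the session. The `session_recall` function is an illustrative stand-in, not a metric proposed in the talk: it treats the session, rather than the individual query, as the unit of interaction.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant (per query)."""
    if not retrieved:
        return 0.0
    return len([d for d in retrieved if d in relevant]) / len(retrieved)

def session_recall(session, relevant):
    """Fraction of all relevant documents seen at least once across
    every query in the session (the session as the unit of interaction)."""
    seen = set()
    for retrieved in session:
        seen.update(d for d in retrieved if d in relevant)
    return len(seen) / len(relevant)

relevant = {"d1", "d2", "d3", "d4"}   # hypothetical relevance judgments
session = [                            # three reformulated queries
    ["d1", "d9"],        # query 1 finds one relevant document
    ["d1", "d2", "d8"],  # query 2 re-finds d1, adds d2
    ["d3", "d7"],        # query 3 adds d3
]

for i, retrieved in enumerate(session, 1):
    print(f"query {i} precision: {precision(retrieved, relevant):.2f}")
print(f"session recall: {session_recall(session, relevant):.2f}")  # 0.75
```

Note how the per-query numbers (0.50, 0.67, 0.50) hover around the same level while the session-level view shows steady accumulation of relevant material — exactly the process characteristic that point-based measures fail to capture.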
The talk also included an interactive interlude, in which Gary showed the audience the following graph:
The vertical axis was some kind of performance metric, and the horizontal axis represented queries over time. The dashed green line shows steady progress from lower-left to upper-right, while the solid blue line takes a more erratic route between the same starting and ending points. Gary's question to the audience was which line represented better, or perhaps more desirable, system performance. A number of suggestions were offered, including the benefits of steady progress, of relieving frustration through surprise (the aha! moment reflected in a sharp upward spike of the erratic curve), of achieving maximum performance even if only for a short time, etc. There was some consensus that the line that went up and down reflected a learning process that is often characteristic of exploratory search. Which one represents more desirable performance, however, may well depend on the task and the measure selected.
In short, the talk gave a great summary of some accomplishments of HCIR research and of some of the challenges ahead. We expect that next year’s presentations will offer some insights into these thorny issues!