Test-driven research

This has been a busy summer for the ReBoard project: Scott Carter, Jake Biehl and I spent a bunch of time building and debugging our code, and Wunder-intern Stacy ran a great study for us, looking at how people use their office whiteboards before and after we deployed our system. We’ll be blogging more about some of the interesting details in the coming months, but I wanted to touch on a topic that occurred to me as we work on the CHI 2010 submission.

One of the things we did a bit differently this time (results TBD) was to start thinking about the details of the CHI submission early in the project. This of course affected our choices of design and evaluation methodology, but it also put us on a better schedule to meet the submission deadline. So here we are in mid-August: the system has been built, the data has been collected, and ideas and insights are dancing in our heads. As we dive into the details of analysis and presentation (that is, how to tell our story), we are trying to buttress some of the qualitative results with corresponding quantitative data.

When we started thinking about this deployment, we decided on a particular approach, formulated some broad research hypotheses and questions we wanted to address, and designed our data collection methodology accordingly. But it’s difficult to predict the future. As our understanding of our users’ work practices improved, and as some constraints in system capabilities were identified, we adjusted our approach to collect useful data that we hope will tell a compelling story.

All of this preamble led me to the (perhaps obvious) connection between test-driven development and the kinds of formative and summative evaluations we are doing. Test-driven development argues for writing tests that exercise your code as that code is being created, and for running those tests as you go. In combination with other agile programming techniques, this can lead to more robust, less buggy components, along with a number of other advantages.
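To make that concrete, here is a tiny, made-up Python sketch of the test-first idea (the function and its behavior are hypothetical, not anything from the ReBoard codebase): the test pins down the behavior you want, and in strict TDD it is written before the code it exercises.

```python
import unittest

# Hypothetical helper under test. In test-first practice, the test class
# below would be written first and would fail until this exists.
def snapshot_changed(previous_pixels, current_pixels, threshold=0.02):
    """Return True if enough pixels differ between two board snapshots."""
    if len(previous_pixels) != len(current_pixels):
        raise ValueError("snapshots must be the same size")
    changed = sum(1 for a, b in zip(previous_pixels, current_pixels) if a != b)
    return changed / len(previous_pixels) > threshold

class TestSnapshotChanged(unittest.TestCase):
    def test_identical_snapshots_are_unchanged(self):
        self.assertFalse(snapshot_changed([0, 0, 0, 0], [0, 0, 0, 0]))

    def test_mostly_different_snapshots_count_as_changed(self):
        self.assertTrue(snapshot_changed([0, 0, 0, 0], [1, 1, 1, 0]))

if __name__ == "__main__":
    unittest.main()
```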

How does this relate to HCI and experimental design?

For one thing, when the artifact under test includes a significant software component, you wind up doing a lot of logging: logging of system functions, to help characterize performance and to debug failures, and logging of users’ activities, to collect a useful record of how the system is actually used.
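As a rough illustration (the event names and fields are hypothetical, not our actual log format), structured logging along these lines keeps both kinds of records in one place and stays easy to parse later:

```python
import json
import logging
import time

# A sketch of structured event logging; one JSON object per line.
logging.basicConfig(filename="events.log", level=logging.INFO, format="%(message)s")
logger = logging.getLogger("whiteboard")

def log_event(event_type, **fields):
    """Write one event as a JSON line: easy to grep now, easy to parse later."""
    record = {"ts": time.time(), "event": event_type}
    record.update(fields)
    logger.info(json.dumps(record))

# System-side events, for performance characterization and debugging ...
log_event("capture_completed", camera="board-3", duration_ms=842)
log_event("capture_failed", camera="board-3", error="timeout")

# ... and user-side events, for the behavioral record.
log_event("image_viewed", user="u17", board="board-3")
```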

Logging and experimental design go hand-in-hand: you need to know what to log (and often in what format) to make your hypothesis-testing analyses possible. Pilot testing helps to some extent, but by the time you’re running pilot tests, a lot of decisions are hard to reverse. It is probably better to unit-test your logging and user-data analysis tools before the entire system is built, so you have a better handle on the kinds of analyses you want to run. Do this early enough, and you might get a chance to iterate on your experimental design as well (if you’re running controlled experiments).
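For example, a log parser and a simple analysis helper can be written and unit-tested against hand-made log lines long before a deployed system is producing them. This is just a sketch with made-up names and a made-up format, but it shows where a mismatch between what you log and what you want to analyze would surface early:

```python
import json
import unittest

# Hypothetical parser and analysis helper, testable before the system exists.
def parse_log(lines):
    """Turn JSON-lines log text into a list of event dicts, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]

def events_per_user(events, event_type):
    """Count how many events of a given type each user generated."""
    counts = {}
    for event in events:
        if event.get("event") == event_type and "user" in event:
            counts[event["user"]] = counts.get(event["user"], 0) + 1
    return counts

class TestLogAnalysis(unittest.TestCase):
    def test_counts_only_the_requested_event_type(self):
        lines = [
            '{"ts": 1, "event": "image_viewed", "user": "u17"}',
            '{"ts": 2, "event": "capture_failed"}',
            '{"ts": 3, "event": "image_viewed", "user": "u17"}',
        ]
        counts = events_per_user(parse_log(lines), "image_viewed")
        self.assertEqual(counts, {"u17": 2})

if __name__ == "__main__":
    unittest.main()
```

If the analysis you plan to run can’t be fed from the format you’re logging, this is the cheapest possible moment to find out.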

Having collected a bunch of observational and interview data, we are now looking at log analysis to quantify some of the effects we saw, and may need to write some additional log parsing and analytic tools to get at the data in the right way. The upshot is that in a perfect world, we’d have had enough time to do a real pilot study with different subjects, to refine our methodology, to build a set of analytic tools, and only then to run the real experiment. As it is, we are quite happy with having collected data in the month of August!
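The kind of tooling I mean is nothing fancy; something like the following sketch (with a made-up deployment date and the same made-up event format as above) is often enough to put numbers behind what the observations and interviews suggest:

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical deployment date used to split the log into before/after.
DEPLOYMENT = datetime(2009, 7, 1, tzinfo=timezone.utc)

def daily_counts(events):
    """Count events per calendar day; events carry a Unix 'ts' timestamp."""
    days = Counter()
    for event in events:
        days[datetime.fromtimestamp(event["ts"], tz=timezone.utc).date()] += 1
    return days

def before_and_after(events):
    """Split the event total into pre- and post-deployment counts."""
    before = sum(
        1 for event in events
        if datetime.fromtimestamp(event["ts"], tz=timezone.utc) < DEPLOYMENT
    )
    return before, len(events) - before
```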

Some day I’ll blog about my experience with test-driven development.