Blog Category: Information seeking

HCIR intern, 2012 edition


Update: This intern slot has been filled.

It’s intern season again! I am looking for a PhD student well versed in the persuasive computing, affective computing, and captology literature to participate in a research project on improving the quality of interaction in information-seeking environments. The goal of the project is to explore how to increase people’s engagement with systems while they perform exploratory search. We would like to make our current system more usable and to explore some novel interaction techniques.

Applicants should be familiar with basic tactics of designing affective and engaging interfaces in a web-based environment. The internship will last three months, and will be structured to produce and evaluate research systems. As a further incentive, we expect to publish the results of this work at CHI 2013, which will be held in Paris. For more information on the intern process, please see the FXPAL web site, or contact me directly. I would like to fill this internship slot as soon as possible.

Collaborative search on the rise?


I am seeing an interesting not-quite-yet-a-trend: the emergence of collaborative search tools. I am not talking about research tools such as SearchTogether or Coagmento, but about real companies started for the purpose of putting out search tools that support explicit collaboration. The two recent entries in this category that I am aware of are SearchTeam and Searcheeze. While they share some similarities, they are actually quite different tools.


A quick study of Scholar-ly Citation


Google recently unveiled Citations, its extension to Google Scholar that helps people to organize the papers and patents they wrote and to keep track of citations to them. You can edit metadata that wasn’t parsed correctly, merge or split references, connect to co-authors’ citation pages, etc. Cool stuff. When it comes to using this tool for information seeking, however, we’re back to that ol’ Google command line. Sigh.


Recall vs. Precision


Stephen Robertson’s talk at the CIKM 2011 Industry event caused me to think about recall and precision again. Over the last decade, precision-oriented search has become synonymous with web search, while recall has been relegated to narrow verticals. But is precision@5 or NDCG@1 really the right way to measure the effectiveness of interactive search? If you’re doing a known-item search, looking up a common factoid, etc., then perhaps it is. But for most searches, even ones that might be classified as precision-oriented, the searcher might wind up making several attempts to get at the answer. Dan Russell’s A Google a Day lists exactly those kinds of challenges: find a fact that’s hard to find.
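To make the contrast concrete, here is a minimal Python sketch of the single-query measures mentioned above (precision@k, plus recall@k and NDCG@k with the usual log2 rank discount). The document ids and relevance grades are hypothetical. Each function scores one ranked list in isolation; none of them says anything about how the searcher fared across the several queries of a session.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at 1-based rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked, grades, k):
    """DCG of the ranking, normalized by the DCG of the ideal ordering of the judged documents."""
    gains = [grades.get(d, 0) for d in ranked]
    ideal = dcg_at_k(sorted(grades.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical judgments for one query in a longer session.
    ranked = ["d3", "d1", "d7", "d2", "d9"]
    grades = {"d1": 2, "d2": 1, "d5": 3}  # graded relevance; unjudged documents count as 0
    relevant = set(grades)
    print(precision_at_k(ranked, relevant, 5))  # 2 of the top 5 are relevant
    print(recall_at_k(ranked, relevant, 5))     # 2 of the 3 relevant documents were found
    print(ndcg_at_k(ranked, grades, 1))         # rank 1 is non-relevant, so this is 0
```

In this made-up example the query scores zero on NDCG@1 even though two of the three relevant documents are in the top five, and a follow-up query that found the third one would be scored entirely separately.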

So how should we think about evaluating the kinds of searches that take more than one query, ones we might term session-based searches?


HCIR 2011 keynote


HCIR 2011 took place almost three weeks ago, but I am just getting caught up after a week at CIKM 2011 and an actual almost-no-internet-access vacation. I wanted to start off my reflections on HCIR with a summary of Gary Marchionini’s keynote, titled “HCIR: Now the Tricky Part.” Gary coined the term “HCIR” and has been a persuasive advocate of the concepts represented by the term. The talk used three case studies of HCIR projects as a lens to focus the audience’s attention on one of the main challenges of HCIR: how to evaluate the systems we build.


Looking for volunteers for collaborative search study


We are about to deploy an experimental system for searching through CiteSeer data. The system, Querium, is designed to support collaborative, session-based search. This means that it will keep track of your searches, help you make sense of what you’ve already seen, and help you to collaborate with your colleagues. The short video shown below (recorded on a slightly older version of the system) will give you a hint about what it’s like to use Querium.


How much does time weigh?


As Miles wrote yesterday, our paper was accepted to SIGIR 2011. The idea that time has an impact on ranking documents is not new; the problem seems to be knowing when to take it into consideration. For example, while Li and Croft showed improvements in ranking when incorporating the notion of recency, we found that their algorithm degrades performance on non-temporal queries. (This is obvious, in a sense: if a ranking algorithm is biased toward more recent documents, and recency is not important for a given query, it will de-emphasize otherwise well-matching documents, thereby reducing MAP.)
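The parenthetical argument can be illustrated with a toy example. The Python sketch below multiplies a content score (treated as a likelihood, as in query-likelihood retrieval) by an exponential prior on document age, loosely in the spirit of Li and Croft’s time-based approach. The decay rate, scores, and ages are made up, and this is not the ranking function from our paper. For a query where recency does not matter, the prior pushes an old but well-matching document down, and average precision (the per-query quantity that MAP averages) falls.

```python
import math


def recency_prior(age_days, rate):
    """Exponential prior on document age; rate=0 disables the recency bias."""
    return math.exp(-rate * age_days)


def rank(docs, rate):
    """docs: list of (doc_id, content_score, age_days). Sort by score * prior, best first."""
    scored = [(score * recency_prior(age, rate), doc_id) for doc_id, score, age in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # A hypothetical non-temporal query whose best match is an old document.
    docs = [("old_match", 0.9, 400), ("new_partial", 0.5, 10), ("new_weak", 0.4, 5)]
    relevant = {"old_match"}
    for rate in (0.0, 0.01):
        ranked = rank(docs, rate)
        print(rate, ranked, average_precision(ranked, relevant))
```

With no recency bias the old document ranks first and average precision is 1.0; with the prior it drops to third place and average precision falls to 1/3.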


Released: Reverted Indexing source code


I am pleased to announce that we are releasing a version of the reverted indexing framework as open source software! The release includes the framework and an implementation in Lucene.

Reverted indexing is an information retrieval technique for query expansion, relevance feedback, and a variety of other operations. The details are described on our web site, in several posts on this blog, and in our CIKM 2010 paper. The source code and JAR file can be downloaded from the Reverted Indexing page; see the Javadocs for details of the API.
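For readers who haven’t seen the paper, here is a toy, in-memory Python sketch of the general idea: run a set of basis queries (here, simply every single term) against an ordinary inverted index, then store each query’s result list “reverted,” so that document ids play the role of terms and basis queries play the role of documents. Querying that structure with a set of relevant document ids ranks basis queries, which can then serve as expansion terms. This is not the released Lucene code or its API; the function names are hypothetical, and the raw term-frequency scores and simple summation are simplifying assumptions.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """docs: dict doc_id -> text. Returns term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def build_reverted_index(inverted, top_k=10):
    """Run each single-term basis query and store its top_k results reverted:
    doc_id -> {basis_query: retrieval score}. Here the score is just term frequency."""
    reverted = defaultdict(dict)
    for term, postings in inverted.items():
        results = sorted(postings.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for doc_id, score in results:
            reverted[doc_id][term] = score
    return reverted


def expansion_terms(reverted, relevant_docs, n=3):
    """Query the reverted index with document ids: basis queries that retrieved
    many of the relevant documents, with high scores, rank highest."""
    totals = defaultdict(float)
    for doc_id in relevant_docs:
        for term, weight in reverted.get(doc_id, {}).items():
            totals[term] += weight
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [term for term, _ in ranked[:n]]


if __name__ == "__main__":
    docs = {
        "d1": "reverted index query expansion",
        "d2": "query expansion feedback",
        "d3": "cooking pasta recipes",
    }
    reverted = build_reverted_index(build_inverted_index(docs))
    print(expansion_terms(reverted, ["d1", "d2"], n=2))
```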


Looking for an HCIR intern


It’s intern time again! I am looking for someone to help me run an exploratory study of a collaborative, session-based search tool that I’ve been building over the last few months. Session-based search frames information seeking as an ongoing activity, consisting of many queries on a particular topic, with searches conducted over the course of hours, days, or even longer. Collaborative search describes how people can coordinate their information-seeking activities in pursuit of a common goal.
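As a rough illustration of how those two ideas fit together, the hypothetical Python class below keeps one record per topic: every query issued by any collaborator, in order, and which documents each person has already been shown. The names (`SearchSession`, `record`, `unseen`, `seen_by_others`) are invented for this sketch and do not describe how our tool is implemented; treating “shown” as “seen” is also a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SearchSession:
    """Hypothetical shared session for one topic, spanning many queries and people."""
    topic: str
    queries: list = field(default_factory=list)  # (user, query) pairs in the order issued
    seen: dict = field(default_factory=dict)     # doc_id -> set of users shown that document

    def record(self, user, query, shown):
        """Log a query and the documents that were shown to this user for it."""
        self.queries.append((user, query))
        for doc_id in shown:
            self.seen.setdefault(doc_id, set()).add(user)

    def unseen(self, results):
        """Results that nobody on the team has been shown yet."""
        return [d for d in results if d not in self.seen]

    def seen_by_others(self, user, results):
        """Results that some collaborator other than `user` has already been shown."""
        return [d for d in results if self.seen.get(d, set()) - {user}]


if __name__ == "__main__":
    session = SearchSession("reverted indexing")
    session.record("alice", "reverted index", ["d1", "d2"])
    session.record("bob", "query expansion", ["d2", "d3"])
    print(session.unseen(["d1", "d4"]))
    print(session.seen_by_others("alice", ["d1", "d2", "d3"]))
```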

The intern for this project will help frame a set of research questions around collaborative, session-based search, and then take the lead on an experiment to gain insight into this rich space and to help understand how to improve our search tool. The intern will also participate in writing up this work for publication at a major conference such as CHI, CSCW, or JCDL.


When is one>two and seven==eight?


So Google recently released the Google Books N-gram Viewer along with the underlying datasets.

There’s been plenty of press about it, and the Science article based on this data is an interesting read.

I was trying to come up with a simple, yet insightful query. My initial trial was modernism,postmodernism which immediately had me wondering about hyphenation or the lack thereof…  In any case, the upshot seems to be that the use of the term postmodernism started 1978ish. Neat, though I think I won’t need to clear space for my Nobel Prize anytime soon.

I toyed a little with other terms, like generation X, which has an odd sort of bump in the graph around 1970. I’m not sure what’s up with that, though perhaps it’s a data-collection artifact of the kind discussed in this article. I wasn’t inclined to dig deeper, and was happy enough to have my prior knowledge confirmed: use of “generation X” took off in the mid-1990s.

My final trial was a bit more on the minimal side: one,two,three,four,five,six,seven,eight,nine,ten. It should come as no surprise that “one” is more common than “two”, which is more common than “three”, which in turn is more common than “four”. It probably also shouldn’t be a surprise that each succeeding number is less frequent by roughly a factor of 2.
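For anyone who would rather check the factor-of-2 impression against the downloadable data than against the chart, here is a small Python sketch. It assumes the 1-gram files are tab-separated with the n-gram, the year, and the match count as the first three fields (any later columns are ignored), and it computes count(n) / count(n+1) for adjacent number words in a given year. Matching is exact and case-sensitive, and the sample lines in the usage example are invented to exercise the code; they are not real counts.

```python
from collections import defaultdict

NUMBERS = ["one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"]


def load_counts(lines):
    """Parse tab-separated 1-gram lines into word -> {year: match_count}.

    Only the first three fields (ngram, year, match_count) are used.
    """
    counts = defaultdict(dict)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        word, year, match_count = fields[0], int(fields[1]), int(fields[2])
        counts[word][year] = counts[word].get(year, 0) + match_count
    return counts


def successive_ratios(counts, year, words=NUMBERS):
    """Return {(a, b): count(a) / count(b)} for each adjacent pair in words.

    A missing or zero denominator is reported as float("inf").
    """
    ratios = {}
    for a, b in zip(words, words[1:]):
        ca = counts.get(a, {}).get(year, 0)
        cb = counts.get(b, {}).get(year, 0)
        ratios[(a, b)] = ca / cb if cb else float("inf")
    return ratios


if __name__ == "__main__":
    # Invented counts, for illustration only; real input would come from the 1-gram files.
    sample = ["one\t1900\t800\t90", "two\t1900\t400\t80", "three\t1900\t200\t70"]
    counts = load_counts(sample)
    print(successive_ratios(counts, 1900, ["one", "two", "three"]))
```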

[Figure: Occurrence of numbers in the Google Books N-gram Viewer]

Less intuitive (to me, anyway) is that “ten” squeezes in ahead of “seven” and “eight” (OK, so maybe it’s a round number) and that “seven” and “eight” are basically tied. Odder still, before 1790 or so, the putative occurrences of “six” and “seven” were virtually nonexistent.

[Figure: Detail on number occurrences]

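It turns out to be the same “medial s” issue that Danny Sullivan describes in greater detail in his post: older typography printed a long s (ſ) at the start and in the middle of words, and OCR tends to read it as an “f”, so “six” and “seven” come out as “fix” and “feven”. In other words, it’s an artifact of OCR and an indication of the evolution of typography rather than the evolution of language.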
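The mechanism is easy to model. The Python sketch below applies a deliberately simplified rule (every lowercase s except a word-final one becomes f, standing in for a long s misread by OCR; real typesetting conventions had more exceptions) and checks which of the ten number words it would affect.

```python
NUMBERS = ["one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"]


def long_s_misreading(word):
    """Simplified long-s OCR error: every non-final 's' (printed as ſ) is read as 'f'."""
    if not word:
        return word
    return word[:-1].replace("s", "f") + word[-1]


if __name__ == "__main__":
    affected = {w: long_s_misreading(w) for w in NUMBERS if long_s_misreading(w) != w}
    print(affected)  # {'six': 'fix', 'seven': 'feven'}
```

Only “six” and “seven” contain a non-final s, which matches the two words that all but vanish before about 1790. The misread forms can’t simply be added back into the counts, though, since “fix” is also a perfectly good word.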

One mystery solved; now why are “seven” and “eight” tied in frequency?

Kudos to Google for releasing the viewer and data.