Blog Author: Jeremy Pickens

SIGIR Papers Announced


The complete list of accepted SIGIR papers was announced yesterday:

I think there is much greater diversity of topics this year, a trend that has been growing in recent years.  In fact, the only topic with more than a single session is clustering.  A few titles that I personally find intriguing include:

Assessing the Scenic Route:  Measuring the Value of Search Trails in Web Logs
Ryen White (Microsoft Research Redmond), Jeff Huang (University of Washington)

Relevance and Ranking in Online Dating Systems
Fernando Diaz, Donald Metzler, Sihem Amer-Yahia (Yahoo! Labs)

Comparing User Preferences for Relevance and Diversity with Test Collection Outputs
Mark Sanderson, Monica Lestari Paramita, Paul Clough, Evangelos Kanoulas (University of Sheffield)

Evaluating Verbose Query Processing Techniques
Samuel Huston, Bruce Croft (University of Massachusetts Amherst)

In particular, the question of “going the scenic route” is one that deserves much more study.  Information Retrieval is most often concerned with effectiveness and efficiency: the straight path to relevance.  As well it should be.  But there are other valuable goals that are just as much a part of information seeking, such as serendipity, diversity, and, well, scenery.  Evaluation becomes interesting, and difficult, when exploration is the goal rather than merely part of the process.

When Web Apps Aren’t

on Comments (2)

One of the ongoing debates I have with some of my co-workers is whether web apps will take over the majority of applications that users interact with on a daily basis, or whether the future will remain in the hands of internet-enabled desktop apps. I maintain that desktop apps with integrated connectivity are the future.  Many of my co-workers place their trust in software that runs only in the cloud.

So what is a web app? Continue Reading

SIGIR Reviews as Pseudo-Relevance Feedback

on Comments (18)

Some ACM conferences such as CHI offer authors an opportunity to flag material misconceptions in reviewers’ perceptions of submitted papers prior to rendering a final accept/reject decision. SIGIR is not one of them. Its reviewers are free from any checks on their accuracy from the authors, and, to judge by the reviews of our submission, from the program committee as well.

Consider this: We wrote a paper on a novel IR framework which we believe has the potential to greatly increase the efficacy of interactive Information Retrieval systems. The topic we tackled is (not surprisingly) related to issues we often discuss on this and on the IRGupf blog, including HCIR, Interactive IR, Exploratory Search, and Collaborative Search.  In short, these are all areas that could be well served by an algorithmic framework that supports greater interactivity.

Continue Reading

What If Everyone Were Number One?


I’ve been doing a bit of thinking lately about search engines, algorithmic openness, and spammers.  I suppose this was all prompted by a recent blog post on the Meaning of Open:

In this post, it is claimed that openness is good: open systems, open source, open data.  This claim is held forth as true for everything except search algorithms.  In the case of algorithms, the secret sauce must be kept exactly that: secret.  Spammers would otherwise have too much power.

That claim makes me want to play around with a little thought experiment.  What if the search algorithm were indeed fully open?  What if everyone in the world knew exactly how rankings were done, and could modify their web pages to adapt to whatever the ranking function is?  In short, what if everyone were number one?  Continue Reading
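The degenerate endpoint of this thought experiment can be made concrete with a toy sketch (my own illustration, not anything a real engine uses): if the ranking function is fully public and trivially gameable, every page author can emit the score-maximizing page, and the ranking collapses into an uninformative tie.

```python
def score(page, query_terms):
    """Toy, fully open ranking function: fraction of page words
    that are query terms (a crude term-frequency score)."""
    words = page.split()
    return sum(words.count(t) for t in query_terms) / max(len(words), 1)

def fully_adapt(query_terms):
    """With the function public, the score-maximizing page is
    just the query terms repeated: nothing but 'relevant' words."""
    return " ".join(query_terms * 10)

query = ["cheap", "flights"]
pages = [fully_adapt(query) for _ in range(3)]
scores = [score(p, query) for p in pages]
print(scores)  # every adapted page hits the maximum; nobody is "number one"
```

The point of the toy is only that a transparent and gameable objective loses its discriminative power once everyone optimizes against it, which is exactly the spammer argument for secrecy.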

Should IR Objective Functions be Obfuscated?

on Comments (3)

I have a question. It’s a general question, directed at anyone and everyone.

When one builds an Information Retrieval system, one uses target objective function(s) that indicate the performance of the system, and designs the system (algorithms, interfaces, etc.) toward those targets.  Sometimes those functions are open and well understood.  Other times they are proprietary and hidden.

My question is: Does it do the users of an IR system a service or disservice to hide from them the function that is being optimized?  Or is it completely neutral?  In other words, does the user have to understand, or at least be given the chance to understand, what it is that the system is trying to do for them in order to get the best value out of that system?  Or can a user get results just as good without having to have a clear mental model of what the retrieval engine is trying to do?  In short, does it matter if the user does not understand what the system is trying to do for him or her?

Can someone point me to research that may have looked at this question?  If one were trying to publish original research on the topic, how would one go about designing an experiment that (1) tests this hypothesis, and (2) does so in a way that generalizes, or at least hints at possible generalization?

Data Liberation: What do you Own?

on Comments (3)

Recently Google announced a new initiative: The Data Liberation Front:

The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products.  We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products.  This is our mission statement: Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.

This is a fantastically worthy goal, and I whole-heartedly applaud it.  However, I am beginning to wonder: What data is yours to own, in the first place?

For example, consider web searching.  Continue Reading

Google Wave: Explicit Collaboration

on Comments (2)

Just announced is an interesting new platform from Google, built around shared collaboration environments.  Explicitly shared environments.

A “wave” is equal parts conversation and document, where people can communicate and work together with richly formatted text, photos, videos, maps, and more…Here’s how it works: In Google Wave you create a wave and add people to it. Everyone on your wave can use richly formatted text, photos, gadgets, and even feeds from other sources on the web. They can insert a reply or edit the wave directly. It’s concurrent rich-text editing, where you see on your screen nearly instantly what your fellow collaborators are typing in your wave. That means Google Wave is just as well suited for quick messages as for persistent content — it allows for both collaboration and communication. You can also use “playback” to rewind the wave and see how it evolved.

Now, add a search layer into this rich, shared space, and you’ll have something quite akin to Merrie Morris’ SearchTogether system, which combines real-time awareness with a collaboratively authored results and note set.  Put some algorithmic mediation under that, and you’ll have some of the projects that we’ve been working on over the past few years here at FXPAL, which use the real-time actions and behaviors of multiple, explicitly collaborating team members to alter and inform the information that each individual sees.  We think that the ability to put jointly-relevant information on the same real-time page, while also letting users explicitly work together in the finding and discovery of that information, is and will continue to be an extremely useful capability.
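To make the idea of algorithmic mediation a bit more concrete, here is a deliberately simplified sketch (my own toy, not the actual FXPAL algorithm): the mediator watches which documents each teammate has already examined and divides the remaining results among the team, so that each person's view is shaped by everyone's behavior.

```python
def mediate(ranked_docs, seen_by_user):
    """Toy algorithmic mediation: filter out documents any teammate
    has already examined, then round-robin the fresh documents among
    the collaborators so the team's effort is spread, not duplicated."""
    if not seen_by_user:
        return {}
    seen_anywhere = set().union(*seen_by_user.values())
    fresh = [d for d in ranked_docs if d not in seen_anywhere]
    users = list(seen_by_user)
    assignments = {u: [] for u in users}
    for i, doc in enumerate(fresh):
        assignments[users[i % len(users)]].append(doc)
    return assignments

# Alice has looked at d1, Bob at d3; the mediator splits the rest.
team = {"alice": {"d1"}, "bob": {"d3"}}
print(mediate(["d1", "d2", "d3", "d4", "d5"], team))
```

Even this toy shows the key property: what Alice sees depends on what Bob has done, which is exactly the kind of cross-user influence that a purely UI-level shared workspace does not provide.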

Looks like this space is heating up.

Communicating about Collaboration: Depth of Mediation

on Comments (8)

Thus far in our series on Collaborative Information Seeking we have explored two dimensions: Intent and Synchronization. The next dimension is the Depth at which the mediation (aka support, facilitation) of the multi-user search process occurs.

We can talk about three levels of mediation: communications tools independent of the search engine (e.g., chat, e-mail, voice, etc.), UI-level mediation, and algorithmic mediation. The first level typifies most searching currently being performed on the web, whereas the other two are more commonly found in research prototypes. Continue Reading

Social Search Redux

on Comments (7)

A week or so ago, we wrote a post on Social Search, and how (we believe) it is different from Collaborative Search.  We have also begun laying out a taxonomy of the various factors or dimensions that characterize information seeking behaviors involving more than one person.  So far, we have listed two dimensions: Intent and Synchronization.  We will continue with two additional dimensions over the next few weeks: Depth and Location.

But in the meantime, we note that Intent and Synchronization already give us enough material to draw descriptive and discriminatory lines between various types of multi-user search.

Continue Reading