Citing Best Papers

Jeff Huang recently published a list of papers from several major conferences that won Best Paper awards. It’s a nice collection of papers, highlighted in a way that is difficult to obtain from the ACM Digital Library. (Why that should be the case is a different story.)

Clearly winning a Best Paper award is a significant achievement and authors of such papers should be proud of their work. But does this merit translate into impact? For example, do papers that win Best Paper awards get cited more frequently than other papers from the same conference?

I wish I had access to the ACM metadata for the papers in question, as it would make a larger-scale analysis of citation rates straightforward. As it was, I resigned myself (again) to some manual analysis: I recorded the within-ACM citation counts for all full papers published in the SIGIR 2001-2005 conferences, as shown in Figure 1. (The data itself is here.)

Figure 1. Citation counts for full papers for SIGIR 2001-2005

First, the general shape is not surprising: a few papers receive a lot of citations, and the rest trail off. Even though the number of papers accepted varied from year to year (46, 44, 46, 58, 71), the range of citations was quite comparable, so I didn’t bother normalizing the data.

So where do the best papers lie? For the years 2001-2005, the best paper winners were cited 39, 68, 41, 39, and 36 times, respectively. This corresponds to ranks 17.5, 3, 13, 9, and 9. This is summarized in the following table:

Year | Papers | Rank of best paper | % of rank | # citations | % of max
2001 | 46     | 17.5               | 38%       | 39          | 17%
2002 | 44     | 3                  | 6.8%      | 68          | 63%
2003 | 46     | 13                 | 28%       | 41          | 28%
2004 | 58     | 9                  | 16%       | 39          | 39%
2005 | 71     | 9                  | 13%       | 36          | 23%
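For concreteness, here’s a minimal sketch of the table’s arithmetic in Python. It assumes you have each year’s full list of per-paper citation counts (recorded in the linked data, not reproduced here); the example list at the bottom is invented purely to show the shape of the computation. Ties get averaged positions, which is how a fractional rank like 17.5 arises.

```python
def best_paper_stats(counts, winner_citations):
    """Compute the table's columns for one conference year."""
    ranked = sorted(counts, reverse=True)
    # Average the positions of papers tied at the winner's count;
    # a two-way tie can yield a fractional rank such as 17.5.
    positions = [i + 1 for i, c in enumerate(ranked) if c == winner_citations]
    rank = sum(positions) / len(positions)
    return {
        "rank": rank,
        "pct_of_rank": 100.0 * rank / len(counts),           # e.g. 17.5/46 ≈ 38%
        "pct_of_max": 100.0 * winner_citations / ranked[0],  # winner vs. most-cited paper
    }

# Invented counts for illustration only: 46 papers, winner cited 39 times.
example = [120, 80, 60, 39, 39] + [20] * 41
print(best_paper_stats(example, 39))
```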

This is not enough data to draw strong conclusions, but it seems the award winners receive, on average, about a third of the citations of the most-cited paper in a conference, and rank, on average, in the top quintile.

In another attempt to visualize the data, I subtracted the citation count of each year’s Best Paper winner from the citation counts of all the other papers published that year, to produce Figure 2.

Figure 2. Citation rates by year offset by the citation rate of the Best Paper award winner
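The offset behind Figure 2 amounts to one subtraction per paper. A minimal sketch, assuming counts_by_year holds each year’s per-paper counts; the winners’ counts are taken from the text above:

```python
# Best Paper winners' citation counts, per the text above.
winner = {2001: 39, 2002: 68, 2003: 41, 2004: 39, 2005: 36}

def offsets(counts_by_year):
    """Shift each year's citation counts so the award winner sits at zero."""
    return {year: [c - winner[year] for c in counts]
            for year, counts in counts_by_year.items()}
```

Papers cited less than the winner come out negative, which is why 2002, where most offsets fall below zero, stands out.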

In this chart, we can see that 2002’s citation counts predicted the Best Paper award winner significantly better than the other years’ did (most of that year’s papers were cited less than the winner), but the other years were largely indistinguishable.

I suppose there are many reasons why a Best Paper award winner might not achieve a high citation rate:

  • some of this is due to chance (perhaps the highly-cited papers were also considered for the award)
  • some may be due to innovation (ground-breaking work may not get as much recognition as solid work in an established area)
  • some may be due to selection bias within the nominating committee that may consider certain papers to be flawed or otherwise unacceptable, while the public at large may recognize something important in the paper anyway
  • some may be due to exposure (how well-attended a paper’s presentation is can make a difference in generating buzz about the work)

The upshot, however, is that it’s quite difficult to predict the real impact of a paper at the time it is published. It might be interesting for specific communities to hold reviews of papers published five to ten years ago to assess their impact retrospectively. This should, of course, involve more than citation counts: it is important to look at the reasons papers get cited. A measurement that aggregates the degree of importance of a paper to the papers that cite it might approximate this longer-term impact, although such a measure might be quite subjective.
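As one crude way to make that aggregate concrete (my own sketch, not a worked-out proposal), a PageRank-style recursion over the citation graph lets a citation from an important paper count for more than one from an obscure paper:

```python
def importance(cites, damping=0.85, iters=50):
    """PageRank-style importance scores over a citation graph.

    cites: dict mapping each paper to the list of papers it cites;
           every cited paper must also appear as a key.
    Simplified sketch: papers with no outgoing citations simply leak
    their damped mass rather than redistributing it.
    """
    papers = list(cites)
    score = dict.fromkeys(papers, 1.0 / len(papers))
    for _ in range(iters):
        new = dict.fromkeys(papers, (1.0 - damping) / len(papers))
        for paper, refs in cites.items():
            for cited in refs:
                # An important citing paper passes along more weight.
                new[cited] += damping * score[paper] / len(refs)
        score = new
    return score

# Toy graph: B and C both cite A; C also cites B.
print(importance({"A": [], "B": ["A"], "C": ["A", "B"]}))
```

Even so, a score like this inherits all the citation biases discussed in the comments below, so it approximates impact rather than settling it.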

6 Comments

  1. The best paper might not be the most widely cited.

    For example, flagship papers for a widely-used system will be cited a lot, even if that system is an incremental advance on a predecessor or simply takes well-understood principles and deploys them in a well-engineered package.

    At times, really important work appears in a paper that’s flawed. The Best Paper committee sees the flaws, but people writing the citation only remember the good part. Nobody cites [Bush 45] for the pages on photography, but rather for the paragraphs on hypertext.

    Sometimes, a bad paper will be cited a lot because it advocates a position that becomes popular. Consider various “electronic books are bad for you” papers we’ve all reviewed for program committees over the years; some that barely made it into the conference get cited a lot because the pundits cite them for political reasons and everyone else then needs to cite them in the course of refutations.

  2. Another reason for best papers not to get cited is when they “close” a field, rather than opening a new direction.

    I noticed a few best papers that essentially settle an open question, making any further incremental contribution rather meaningless. So, people move on to other directions.

    So, I suspect (but cannot quantify) that a first paper in an area gets a lot of citations because it is relatively easy to build better techniques on top of the idea, while an advanced paper will get fewer citations, even if it is more technically sophisticated and better than the original paper.

    In a sense, my belief is that citations reward new ideas more than complete ones. Which has both upsides and downsides.

  3. I agree with both of these explanations. It would be good to be able to record in some manner the reasons for a paper’s citation popularity, the fact that it won the Best Paper award, etc., in the ACM DL (and in other DLs), so that when people new to the field come across a paper, they can get a sense of whether the community thinks this is valuable and/or important work, and why.

    For example, the Best Paper committee should write up a paragraph explaining its choice, and that should be published on the landing page for that paper. Similarly, papers in the top quintile or quartile with respect to citation rates probably warrant a short blurb that explains why the paper is important (or not).

    It might be possible to create this kind of content in a wiki, allowing the community to contribute these annotations. It is important, however, that the commentary be found with the paper, not on some other site that few people will see. I suppose CiteSeer could provide this capability as well.

  4. 1. ICML has a “best paper from 10 years ago” award.

    2. There’s another uncontrolled effect here. The best papers may very well be more widely cited just because they’re best papers. The reason is that they get linked from blogs and tweets, not to mention the awards ceremony at the conference, so more people are likely to see them. I don’t see program committees going for a random control best paper, though, so I don’t see how to estimate the effect.

    3. As they say, only publish the first or last paper on a topic.

    I don’t think these being “last papers” is what’s going on. I can’t think of a single paper that “closed a topic” in the sense that the whole area was no longer cited, but that’s probably just my lack of imagination. Panos — what examples were you thinking of?

    I’ll conjecture the exact opposite: last papers in a field are cited the most. Everyone knows what to cite for their work in this case, whereas all the stuff leading up to it is piecemeal and you don’t have room for dozens of citations.

    I think Mark’s comments are more interesting in that they tease apart different kinds of contributions. I’d add benchmark performance papers that everyone tries to beat (until they’re no longer the “benchmark”) [this is usually also a crime against statistics, but that’s another story].

    Going back to somewhat agreeing with Panos, I found when I wrote books that they started, much to my chagrin, sucking up references to the original works, which I thoroughly cited. Survey papers can play the same role.

  5. Stefano Mizzaro says:

    So, two different notions of “quality” measure two rather different things. This should be told to those people advocating citation-based quality measures — if only they listened ;)

  6. Gene,

    I might have the metadata that you’re looking for. I don’t have best papers recorded in it, but I’ve been using it along with some colleagues at UVA to look at the impact of author gender on citation. (See our upcoming CACM paper for a look at gender in authorship in the ACM, with any luck!)

    David Nyugen and I were talking about having a post-CHI-deadline beers at the Tide House on Villa St in Mountain View from 6pm onwards on Friday. Maybe see you there and we can see if the data I have are useful to you?

    Jofish
