How to give up on reviewing


Angst turns to anger to acceptance (of your lot, if not of your paper). Yes, it’s the CHI 2010 rebuttal period. A short few days to try to address the reviewers’ misreading of your paper before the program committee throws it into the reject pile, requiring you to rewrite it for another conference. While it is easy to find fault with the process that puts one or more person-years of work into the hands of “three angry men” who may or may not be in a position to judge the work fairly, it is not clear how to improve the process. James Landay recently wrote about the frustrations of getting systems papers accepted, and in a comment on that post, jofish pointed out that the concerns apply more widely because CHI consists of many small communities with different methodological expectations that are not necessarily appreciated by reviewers.

So how could you make this process more reliable? How could you make reviewers more accountable? One possible solution is to publish the reviews in addition to the paper, thereby bringing public attention to an otherwise private process with poor accounting for performance. To keep conflict over paper reviews from escalating beyond the paper in question (which may well happen when people’s tenure cases are involved), reviews would need to be anonymous. Unfortunately, it is unlikely that the reviewers would remain anonymous if enough people see the reviews. Thus it seems improbable that reviews could be published systematically as a side effect of the current review process.

The crux of the problem, as I see it, is that there aren’t enough qualified, motivated reviewers for most peer-reviewed submissions at CHI, and elsewhere. One key reason for this, of course, is that the only reward for good reviewing is more reviewing. Thus instead of increasing the pool of qualified, competent reviewers, the existing system is forced to resort to large numbers of less capable (and perhaps less motivated) reviewers, resulting in a rather noisy review process that often makes incorrect decisions, and occasionally quite spectacularly bad ones.

There remains the possibility of public, signed (or anonymous) reviews that characterize the publish-then-filter model. The scheme works like this: a paper is published on some open-access site such as arxiv.org. People read it and comment on it. Other people rate the comments. In the end, you have a set of high-quality comments that comprise the review of the paper. Authors are free to add new versions that address some of the comments, thereby improving the paper. A subset of these papers that receive positive reviews can then be selected for presentation at conferences. Reviewers who consistently write good reviews should be rewarded by recognizing reviewing as another class of contribution in the tenure review process. Applied more generally, this approach should also shift credit for contributing to the field from journal editors to journal reviewers, and may shift the “brand” associated with quality scholarship from the journal to the reviewer.
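
To make the mechanics of that scheme concrete, here is a minimal sketch in Python. Everything in it is invented for illustration (the class names, the rating scale, the selection threshold); the point is only that the “review” becomes a ranked set of community comments, and conference selection becomes a filter applied after publication rather than a gate before it.

```python
from dataclasses import dataclass, field
from statistics import mean

# A toy model of the publish-then-filter workflow described above.
# Every name, scale, and threshold here is invented for illustration.

@dataclass
class Comment:
    author: str                  # signed or pseudonymous commenter
    text: str
    assessment: float            # -1.0 (negative) .. +1.0 (positive) view of the paper
    ratings: list = field(default_factory=list)  # community ratings of this comment

    def quality(self) -> float:
        """How useful the community found this comment."""
        return mean(self.ratings) if self.ratings else 0.0


@dataclass
class Paper:
    title: str
    comments: list = field(default_factory=list)

    def review(self, top_k: int = 3) -> list:
        """The paper's 'review' is simply its top_k highest-rated comments."""
        return sorted(self.comments, key=lambda c: c.quality(), reverse=True)[:top_k]

    def standing(self) -> float:
        """Aggregate the assessments carried by the top-rated comments."""
        top = self.review()
        return mean(c.assessment for c in top) if top else 0.0


def select_for_conference(papers, threshold: float = 0.5):
    """Selection happens after publication: a filter, not a gate."""
    return [p for p in papers if p.standing() >= threshold]
```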

Another advantage of this approach is that it dissociates the authorship of an idea from the vetting process. It admits a more graduated notion of importance than the existing filter-then-publish model permits. The existing model rejects as unpublishable (and therefore un-credited) ideas that have certain real or perceived flaws. But while the flaws may have nothing to do with the core idea of the paper, lack of publication denies authors credit for those ideas, and denies the community an opportunity to correct inaccurate impressions. Finally, to address James Landay’s point, it allows different communities to recognize the value of specific papers without the overhead of having to establish separate peer review processes. This is particularly useful for a field that is characterized by so many discipline-bridging efforts.

28 Comments

  1. “People read it and comment on it. Other people rate the comments.” – this is, IMO, the weakest point of the proposed model. What if not many people are interested in your paper and thus don’t read it? Papers containing complex formulae, say, may discourage people from reading them. And how would you align the commenting activity with conference deadlines?

    I 100% agree that the current reviewing model is deeply flawed, and everything said above about it is true. But it does make it mandatory for reviewers to provide feedback, which is something I don’t see in the proposed model. Perhaps a solution would be some combination of the two models…

  2. Twitter Comment

    RT @eastgate: RT @HCIR_GeneG: Posted “How to give up on reviewing” [link to post] on the CHI rebuttal period

  3. Twitter Comment

    RT @HCIR_GeneG Posted “How to give up on reviewing” [link to post]

  4. @Dmitry, I think that to make the proposal practical, the academic community must provide incentives to reviewers just as it provides incentives to authors. Making reviewing an expected, rather than just accepted, part of academic work practice should provide a larger pool of reviewers; making the reviews public (but not a mechanism for excluding papers) should improve the quality of reviewing.

  5. Another reason to publish reviews is that a lot of conversation can occur between reviewers and authors before a paper gets published—information that would add to the body of work. Why don’t reviews get archived like papers do?

  6. Twitter Comment

    Posted “How to give up on reviewing” [link to post]

  7. Twitter Comment

    RT @HCIR_GeneG: Posted “How to give up on reviewing” [link to post] on the CHI rebuttal period

  8. Twitter Comment

    Good comments on the academic publishing process. RT @HCIR_GeneG: Posted “How to give up on reviewing” [link to post]

  9. I think that Gene’s suggestion is a great step in the right direction for the following reason:

    If the point of having review committees is to obtain the best possible work for a conference, then reviews shouldn’t just be about rejecting or accepting papers. Effort should be put into making papers better (obviously not a universal goal at this point, judging from a two-sentence review, including one sentence of summary, recently given on a paper).

    Gene’s suggestion of iteratively reviewing and improving papers starts to get at this – by soliciting feedback and letting authors choose what to apply from it, they can create the best possible product (at least from their perspective; you can’t optimize for everyone).

  10. @brynn, @sanjay: I agree that publishing reviews makes the publication richer. The challenge is to minimize injecting politics and (perceptions of) personal slight into the process. That’s why I think it’s important not to make publication contingent on reviews, but rather to use the reviews as a way of recognizing good work.

  11. Twitter Comment

    Another good post on the (academic) review process: RT @HCIR_GeneG: “How to give up on reviewing” [link to post]

  12. Twitter Comment

    More discussion on academic review process from @HCIR_GeneG ([link to post]) and @landay (http://bit.ly/S7ttr). /via @brynn, @msbernst

  13. Twitter Comment

    Another good post on the (academic) review process: RT @HCIR_GeneG: “How to give up on reviewing” [link to post] < @brynn

  14. Great ideas, Gene. I heard similar thoughts from Danah Boyd at HCIC ’09 and agree with the added benefit she cites: that paper submission quality will increase when authors cannot hide behind anonymity. I’d like to take this one step further (and disagree with you!) by suggesting that reviewers also be named. You mention the danger of grudges gained and tenure cases lost (the latter, I presume, less likely?), but what are the benefits? I think reviewer quality will also rise. And we may discern mentors from students in a crowd. It would be quite similar to responses to blog posts or to comments made after paper presentations at conferences – in essence, an important opportunity for community-building.

  15. Stacy, my sense, and that of a few people I talked to about this, is that the review process is too emotionally laden already; the risk of personal reactions from some (but by no means all) authors likely outweighs the advantages of openness when publication is predicated on positive review. If the two are dissociated, we stand a much better chance of having rational discourse about the merits of the work, and then your points about reviewer quality apply.

  16. James Landay says:

    “People read it and comment on it. Other people rate the comments.” — what if the “people” aren’t qualified to comment? I worry that the broader “public” doesn’t necessarily know what is good or not. That is already the problem with CHI. The values of what good research is are wrong, IMHO.

  17. Twitter Comment

    I dislike anonymous review. One should stand by their name. Golvchinsky has right idea. [link to post]

  18. James, as you point out, the current system of predicating publication on un(der)-qualified reviewers tends to reject certain kinds of work. This process makes false-negative errors. If, instead, we publish first and worry about whether the work is sound later, we will make some false-positive errors.

    Subsequent reviewing and commentary should reduce the false-positive rate by distinguishing promising (if imperfect) work from the less promising variety. The main challenge to the community is to accept the notion that good reviewing represents a significant intellectual and time commitment that needs to be rewarded.

  19. The point that James makes is one I’ve been considering, myself. Two of the four papers I reviewed/submitted this year received a pair of competing written reviews with a 2-point gap in score (2s and 4s, respectively). But behind the mixed results were four very high-quality reviews. I’ll admit that I’m a “newbie” (in Saul’s words) in the community, especially when it comes to evaluating reviews. Even so, each was considerate yet critical, reflected thorough reading, and provided specific examples and citations for guidance.

    So, how can two quality reviews have drastically different comments and scores? My hunch is “ideological incongruences.” One review called for more implications for design, while the accepting counterpart lauded the thick ethnographic descriptions as such. The former considered the software tool the core contribution, the latter the rich user data. One reviewer wanted reports of a priori design rationale, while the other found the designerly approach to a novel system adequate. This, in my opinion, is not a problem with recognition of “significant intellectual and time commitment” (although I can’t say the same for the other reviews – Gene is obviously right that this is a core problem). This situation, however, boils down to a matter of (scientific) values. And it reflects a need for education, tolerance, and flexibility within the community.

  20. Twitter Comment

    RT @sharoda @landay’s post on CHI/UIST papers and reviewing and @HCIR_GeneG’s related post [link to post]

  21. Twitter Comment

    .@landay’s post on CHI/UIST papers and reviewing and @HCIR_GeneG’s related post [link to post] (via @msbernst, @nirmalpatel)

  22. There’s also a problem with calibrating numerical scales (also a huge problem for collaborative filtering, as seen in the Netflix Prize competition). I’ve seen some reviewers give almost everything a 5 and other reviewers give almost nothing a 5. When step 1 is to look at papers with high averages, this different calibration of scores leads to a huge bias.

    Another huge issue is background. Non-experts tend to rank a paper more highly than experts. To experts, papers in their own field tend to look less novel and have more aspects that can be nitpicked.

    SIGGRAPH has an amazing policy of taking papers and providing feedback before submission. I think image processing folks work harder than the rest of us.

  23. CHI and some other conferences have done feedback before submission through a mentoring program for first-time authors. I don’t know if it still exists.

    I agree that simply computing average scores is not appropriate. A more nuanced approach that considers reviewer variance and paper variance (e.g., ANOVA) for understanding the scores is warranted. I think this is a rich area for research. Starting to collect data may make it easier to test different algorithms over time. Furthermore, different applications of these judgments may choose different algorithms for aggregating them.
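
    A minimal sketch of the kind of calibration this implies appears below. The reviewers, papers, and scores are all invented; the point is only that re-expressing each reviewer’s scores against that reviewer’s own mean and spread (a per-reviewer z-score) before averaging blunts the “everything is a 5” versus “nothing is a 5” bias that a raw average ignores. A fuller treatment would model reviewer and paper variance jointly, along the ANOVA lines mentioned above.

    ```python
    from collections import defaultdict
    from statistics import mean, pstdev

    # Toy data: (reviewer, paper) -> raw score on a 1..5 scale.
    # Reviewers, papers, and scores are invented purely to illustrate calibration.
    scores = {
        ("R1", "P1"): 5, ("R1", "P2"): 5, ("R1", "P3"): 4,  # R1 rates nearly everything high
        ("R2", "P1"): 3, ("R2", "P2"): 2, ("R2", "P3"): 4,  # R2 is a harsher grader
        ("R3", "P2"): 3, ("R3", "P3"): 5,
    }

    # Estimate each reviewer's personal calibration (their own mean and spread).
    by_reviewer = defaultdict(list)
    for (reviewer, _paper), s in scores.items():
        by_reviewer[reviewer].append(s)

    def calibrated(reviewer: str, score: int) -> float:
        """Express a score relative to the reviewer's own distribution (z-score)."""
        vals = by_reviewer[reviewer]
        mu, sigma = mean(vals), pstdev(vals)
        return (score - mu) / sigma if sigma > 0 else 0.0

    # Aggregate per paper using calibrated scores rather than raw averages.
    by_paper = defaultdict(list)
    for (reviewer, paper), s in scores.items():
        by_paper[paper].append(calibrated(reviewer, s))

    for paper, zs in sorted(by_paper.items()):
        print(paper, round(mean(zs), 2))
    ```

    With this toy data, the harsher reviewer’s 4 ends up counting as praise while the generous reviewer’s 4 counts as mild criticism, which is the intended effect.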

  24. A major frustration comes from how many unprofessional, uncommitted, and irresponsible reviewers are out there. Many reviewers start with a tendency to _reject_ a paper rather than to _evaluate_ it. I think this leads to many reviews that focus only on flaws rather than on contributions. How many reviewers out there spend as much time reviewing a CHI submission as they spend comprehending and appreciating a published CHI paper? Just look at how many reviewers put off the duty until the deadline. Maybe most spend only an hour on a paper before firing off their critiques. With such a short time, of course, it’s easier to find flaws than to really interpret what the authors want to contribute.

  25. Tao, I think this speaks again to the lack of incentives for quality reviewing, coupled with pressure from conferences, based both on space constraints at the venue and on the low acceptance rates that conferences use to justify their existence. With the decline of paper proceedings (and their publishing constraints), it should be easier to relax acceptance rates.

  26. Gene, not only a lack of incentives, but also a lack of _penalties_ for doing bad/irresponsible/unprofessional reviews. I think I am with the earlier suggestion of publishing reviewers’ identities along with their reviews. I totally understand why the double-blind policy is in place right now, but to some degree it allows for the opportunity to be unprofessional.

  27. Lars Erik Holmquist says:

    Gene: You and others here propose various alternatives for refining the anonymous review system, and I’m kind of surprised nobody seems to have noticed that this was already done at CHI and UBICOMP several years ago! Barry Brown and I introduced what we called the Open Session at UBICOMP 2006 and then re-used the concept at CHI 2007 under the alt.chi name. The idea was to have completely open submissions and reviews in a sort of Wiki model. We also required all submitters to review a certain number of the other papers. It worked… OK, I guess. There certainly were several interesting papers presented that would never have got through the regular CHI process. But it also exposed a lot of the inherent weaknesses in this system, e.g. that the number of reviews is not necessarily an indicator of a paper’s quality, and that it is hard to get reviewers as well as contributors to commit to an untested format. Overall it was an interesting experiment, but more useful for the ideas than the actual results. This year CHI has scrapped the open review process and is apparently experimenting with a “juried” process for alt.chi, which sounds like it could work just as well.

    However, from what I understand, the open submission/review process is by now well established in some other fields, including physics. Even the venerable Nature got in on the game:
    http://www.nature.com/nature/peerreview/debate/nature04992.html

  28. Lars, thanks for pointing this out! Has anyone published an analysis of the strengths and weaknesses of the alt.chi review process as you and Barry ran it?

    Also, I don’t mean to claim credit for these ideas — there is nothing new under the sun. I am aware of the physics/math review processes and the notion of overlay journals. My thought is that the CHI community should take a good look at this means of vetting research, and that a lasting solution to these problems has to involve more formal recognition of reviewing, good and not so good.
