Why is peer review so random?

Allure 08/13/2018. 9 answers, 8,053 views
peer-review

8 Scientific Papers That Were Rejected Before Going on to Win a Nobel Prize

Funding Analysis: Researchers Say NIH Grant Funding Allocation Seems No Better Than Lottery

The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'

For people whose profession revolves around making order out of seemingly random observations, scientists sure are inconsistent at judging the work of other scientists. Why? It certainly doesn't seem to be like this at all levels. For example, according to the GRE's website,

For the Analytical Writing section, each essay receives a score from two trained raters, using a six-point holistic scale. In holistic scoring, raters are trained to assign scores on the basis of the overall quality of an essay in response to the assigned task. If the two assigned scores differ by more than one point on the scale, the discrepancy is adjudicated by a third GRE reader. Otherwise, the two scores on each essay are averaged.

This implies that it's uncommon for two assigned scores to differ by more than one point on the scale, i.e. GRE essay raters usually agree. Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work. Yet once it gets to research-level material, peer reviewers no longer seem to agree. Why?

9 Answers


Alice 08/14/2018.

Good question. Hard to answer. Some thoughts:

  • reviewers are not trained
  • reviewers are anonymous
  • reviewers receive little feedback on their performance
  • reviewers are also authors, competing for the same funds/prestige
  • reviewers are specialized in a narrow discipline
  • reviewers are volunteers
  • reviewers are scarce
  • the review system lacks an external (independent) control system (audit)
  • reviewers are humans, with their own personal interests, emotions, capabilities

Considering these observations, it is unrealistic to expect two review reports to be aligned. The difficult decision then transfers to the associate editor, who is also a volunteer and not specialized in the author’s field.

That leaves the question of why this is accepted within science when it wouldn’t be outside it. Honestly, I don’t know. Just some guesses:

  • Science is a powerful isolated sector with its own rules?
  • The current system works for established research groups?
  • Journals do not have the funding to train and attract qualified professionals/scientists as reviewers?
  • There is no easy solution or alternative?

Added based on a comment:

  • reviewers are busy scientists
  • reviewers are not rewarded career-wise for conducting reviews


Ian Sudbery 08/13/2018.

The biggest difference is that up to PhD thesis level, the person doing the assessing is more of an expert than the person being assessed. In almost all these cases there is an agreed set of standard skills, techniques and knowledge that any assessor can be expected to possess and against which any assessee is being measured.

This isn't so true of a PhD thesis, but in the end, once a supervisor/thesis committee has green-lit a student, almost all PhD theses are passed.

It's definitely not true higher up. In almost all cases the person being reviewed will be more of an expert in their work than anyone doing the reviewing. The only exceptions will be direct competitors, and they will be excluded. We are talking about work right at the edge of human knowledge, where different people have different knowledge and skill sets.

I'm quite surprised that the GRE scores are so consistent. It's long been known that essay marking is pretty arbitrary (see, for example, Diederich 1974). Mind you, one mark on a six-point scale is roughly 17% - a pretty big difference. In our degree, a 70 and above is a first-class degree - the best mark there is - whereas 55 is a 2:2, a degree that won't get you an interview for most graduate jobs. Losing that much on a grant assessment will almost certainly lose you the grant.

But even to obtain this level of consistency, the graders must have been given a pretty prescriptive grading rubric. In research, no such rubric exists: there are no pre-defined criteria against which a piece of research is measured, and any attempt to lay one down would more or less defeat the whole point of research.


Ray 08/13/2018.

With respect to the problem of good papers being rejected, a factor that doesn't seem to have been mentioned yet is that the consequences of accepting a bogus paper are much worse than those of rejecting a good paper. If a good paper is rejected, it can always be resubmitted to a different journal. And if the authors first revise according to the reviewer comments, the version that ends up getting published may well be better written than the one that was rejected. All that's lost is time.

But if a bogus paper is accepted, other scientists may see it in the literature, assume its results to be valid, and build their own work upon it. This could cost them significant time, as experiments that depend on the bogus result don't work out as they should (which at least may lead to the bogus paper being retracted if the errors are bad enough). Or maybe they'll avoid researching along a line that would have worked, because the bogus paper implies it wouldn't, or, worse, they'll end up with inaccurate results themselves and put another paper with bad data into the literature. All of these are far worse outcomes than just needing to resubmit a paper, so false negatives are preferred to false positives when reviewing.
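To put rough numbers on that asymmetry, here is a toy expected-cost comparison (the figures are entirely made up for illustration, not taken from any study): if a wrongful rejection only costs the authors a resubmission, while a wrongful acceptance wastes the time of several groups that build on the bogus result, then even a modest chance that a borderline paper is bogus tips a cost-minimising reviewer towards rejection.

```python
# Toy illustration only: the costs (in researcher-months) and probabilities are invented.
cost_false_reject = 2    # a good paper is rejected: authors resubmit elsewhere, lose some time
cost_false_accept = 24   # a bogus paper is accepted: several groups build on it and waste effort

# q is the (hypothetical) probability that a borderline paper is actually bogus.
for q in (0.1, 0.3, 0.5):
    expected_cost_reject = (1 - q) * cost_false_reject  # cost incurred only if the paper was actually good
    expected_cost_accept = q * cost_false_accept        # cost incurred only if the paper was actually bogus
    print(f"q = {q:.1f}: expected cost of rejecting = {expected_cost_reject:.1f}, "
          f"of accepting = {expected_cost_accept:.1f} researcher-months")
```

Even at q = 0.1, rejecting is the cheaper option under these made-up costs, which is one way to see why reviewers err on the side of rejection.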


aeismail 08/13/2018.

Different tasks, different results.

All GRE graders have to do is assign scores, but they are doing so for dozens or hundreds of essays. They receive clear guidance and examples about what score a given essay should probably receive. So it’s basically checking boxes to justify a small set of results.

Peer review is fundamentally different, since you’re asking for a much more technically difficult task. Reviewers have to evaluate whether the analysis is accurate, not whether it’s responsive to a prompt. There’s no set of examples to draw on either. So the focus of peer review can be very different for different reviewers, who may have different sets of expertise and will certainly have their own points of view.


Pete L. Clark 08/13/2018.

To compare academic peer review to GRE grading -- that makes apples and oranges look all but identical. Let's step a little closer:

Similarly, as far as I know, undergraduate thesis readers, MS thesis readers and even PhD thesis readers don't usually come to diametrically opposed judgments on the piece of work.

That is certainly not always true, and it is highly field-dependent. In certain parts of academia it is a standard grad-student horror story that Committee Member A insists that the thesis be cast in terms of Theoretical Perspective X, while Committee Member B insists that it be cast in terms of Theoretical Perspective Y, where X and Y may be intellectually or sociologically incompatible: i.e., each theory has rejection of the other as a central tenet. This is more common in the humanities, where the relationship of "theory" to the rest of the work is rather different, but it is not unheard of in the sciences either.

As a frequent committee member, I also happen to know that coming to a consensus judgment is a sociological phenomenon as well as an intellectual one -- i.e., some differences in judgment are confined to the private discussion following the defense, and other differences in judgment are never verbalized at all.

This is helpful in understanding the disparity in peer review: the different referees are (in my experience, at least) never in direct communication with each other, and in fact may never see each other's verdicts at all; as a referee, I believe that I have never been shown another referee's report. In fact,

Who watches the watchmen?

There is no aspect of the academic process that makes me feel like a lone masked vigilante more than being a referee. Surely people who do GRE grading go through some lengthy training process of repeated practice evaluations, feedback on those evaluations, discussion of the larger goals, and so forth. There is nothing like this for academic referees. We get no practice, and there is very little evaluation of our work. If I turn in what is (I guess!) an unusually comprehensive report unusually quickly, I will often get a "Hey, thanks!" email from the editor. In the (thankfully rather small) number of instances where my referee reports were months overdue, I either heard nothing from the editors (I am ashamed to say that I once figured out on my own that a paper I thought I had had for a few months had actually been with me for an entire year) or got carefully polite pleas to turn in the report. I have never gotten any negative feedback after the fact. Unlike GRE graders, referees are volunteers.

I find (again, in my experience and in my academic field of mathematics) that referees are almost never given instructions that amount to any more than "1) Use your best judgment. 2) We are a really good journal and want you to impose high standards." I also notice that 2) is said by journals of wildly differing quality. What does it mean to "impose high standards"? I take that directive seriously and fire my shots into the dark as carefully as I can, but... of course that is ridiculously, maximally subjective.


WBT 08/13/2018.

Contributing a point beyond other answers:

Different levels of effort going into the review lead to different outcomes.

Papers are often written so that, on a first-pass read, they come across as "pretty good", even if a more critical deep read and/or a check of references would expose gaping holes, serious methodological issues, and alternative explanations for the observed results. Sometimes an even more effortful review finds that these issues don't actually matter in the particular case applicable to that specific paper (though the authors should generally add this to the paper text itself).

While reviewers are incentivized to do a good job by the general knowledge that the system depends on it, specific reviews are generally not incentivized, and they sometimes get left to the last minute by a reviewer who's short on sleep and long on other tasks and who doesn't put in the effort for a good review. Thus, the result could be very different even from the same paper being reviewed by the same reviewer at a different time. With no visibility into the factors affecting that outcome, it seems random.


Anony-Mousse 08/13/2018.

Submission overload

We write more and more, and the typical submission quality seems to be going down. This has various reasons, including bad incentives, in particular in China. If your salary depends directly on the number of papers accepted, quantity beats quality...

IMHO we are close to a tipping point now. Many expert reviewers refuse almost every reviewing request, because so many submissions are so sloppy that it's quite annoying to review them. It should be the other way around: most submissions should be of such high quality that you enjoy reading them and can focus on the details. So more and more experts are simply annoyed. They delegate more of the reviewing to students, or refuse outright. But that means the remaining reviewers get more requests, and more bad papers. This can tip quickly, just like most ecosystems.

So the editors need to find other reviewers, and we get fewer and fewer expert reviewers. This also opens doors to scams and schemes. Multimedia Tools and Applications, for example, seems to have fallen prey to an editor- and reviewer-manipulation scheme.

So what's the solution? I don't know.

  • Make the handling editor's and the reviewers' names public, and thus accountable, on accepted papers - this used to be quite common; expert reviewers tend to stand publicly behind their reviews if they accept the paper (in some fields this used to be an actual endorsement of the work). This makes corruption and scheming easier to discover with modern analysis. But it will likely just make it harder to find expert reviewers...
  • Do the first review with only one reviewer (to reduce the load on the reviewers), with possible outcomes "early reject" and "full review".
  • Require authors to do 5 reviews per accepted publication (so the experts cannot just stop reviewing completely unless they retire - the first publication is "free", but you cannot keep rejecting review requests)?
  • Actually pay the expert reviewers! Once you are confident that a submission is worth it, that should be an option, given Elsevier's absurd profit margins. I believe the main problems here are the bureaucracy involved (who decides whether a review is high quality?) and the different wage costs in different countries. Nevertheless, combined with the pre-review above, this would increase the "worth" of good reviews. But it, in turn, increases the risk of schemes to make money from it...
  • Review reviews, and give out best reviewer awards.
  • Punish repeated rejections with a resubmission delay, to make spamming costly.
  • Put a limit on the number of submissions per author and year (probably will just mean the submissions go elsewhere - so it may help a particular journal, but not the entire community)?
  • Ban financial incentives - if your country/university has such a direct financial incentive, you aren't allowed to submit to certain top journals. These bad incentives would then be quickly abolished, because they tend to also demand publication in top journals. But it may just make the payments delayed or hidden... I don't know. I am not an expert on policy. These are just some ideas.

Buffy 08/13/2018.

This won't really answer your question, I realize, but I'd like to address your first example - rejected papers that later led to Nobel prizes.

Sometimes a piece of work is Frame Breaking and it leads to a Paradigm Shift within a field. This has happened many times in history, since at least Copernicus and Galileo. Einstein's early work on relativity was rejected by the physics/astronomy hoi oligoi, as it was too different from the belief in the Aether at the time. The most prominent members of the field reject a radically new idea, and their students, who are pervasively represented, usually go along.

It has been said that revolutions in physics require the death or retirement of the most respected researchers so that the ideas of the young can get a fair hearing and come to the fore.

That is in fact an explanation of at least some of the eight papers referenced in your first link.

I don't think that many of us write paradigm-changing papers, but it occasionally happens. The truly brilliant (not guilty) among us must often labor in near silence and obscurity for most of a generation. The next generation may celebrate them, or it may take even longer.

When a reviewer is faced with a truly frame-breaking paper, they by definition have no frame of reference in which to evaluate it. It is orthogonal to their entire way of thinking. "This must be nonsense" is the all-too-natural response.


Wrzlprmft 08/14/2018.

To address the aspect of:

The same paper resubmitted to the same journal after several years often ends up rejected due to 'serious methodological errors'

In about one third of the papers I reviewed, I identified fundamental flaws that could not be addressed by revising the paper (you would have to write a new paper instead). Some examples just to give you a taste:

  • The entire analysis was a self-fulfilling prophecy, i.e., the result was an assumption.
  • A proposed characteristic was a roundabout way of measuring some much more trivial property. (This has happened twice already.)
  • A proposed model ignored the dominant mechanism behind what it was supposed to model.
  • The entire study was about understanding an artefact of a well-known beginner’s mistake (without identifying that mistake).

While I may have been wrong about these things, the authors never addressed my concerns, be it in a rebuttal or in a version of the paper published in another journal (which, in most of these cases, never happened).

Now, these issues may seem like they should have been easy to spot, but evidently they weren’t: I only spotted some of these flaws while writing up the actual review, and I witnessed (and performed) quite a few jaw drops when discussing papers with co-reviewing colleagues¹ whom I knew to be thorough. Also, in some cases I saw reports from other referees that were otherwise exhaustive but did not spot the issues.

So, to summarise: even spotting fundamental flaws in a paper can be very difficult. A given reviewer has only a comparatively small chance of spotting a flaw in a given paper. Therefore there is a considerable chance that all of the reviewers fail.
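To illustrate that last point with a back-of-the-envelope calculation (the per-reviewer probabilities below are made up, purely for illustration): if each of n reviewers independently spots a given fundamental flaw with probability p, the flaw slips past all of them with probability (1 - p)^n, which remains substantial for the usual two or three reviewers.

```python
# Purely illustrative: p is an invented per-reviewer chance of spotting a fundamental flaw.
# With n independent reviewers, the flaw is missed by everyone with probability (1 - p)**n.
for p in (0.3, 0.5, 0.7):
    for n in (2, 3):
        missed = (1 - p) ** n
        print(f"p = {p:.1f}, {n} reviewers -> flaw missed with probability {missed:.2f}")
```

Even with a 50% per-reviewer detection rate and two reviewers, the flaw survives a quarter of the time under this simple independence assumption.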


¹ Yes, that’s a thing in my field and fully accepted by the journals.

