Monday, April 25, 2016

Misperception of incentives for publication

There's been a lot of conversation lately about negative incentives in academic science. A good example is Xenia Schmalz's nice recent post. The basic argument is that professional success comes from publishing a lot and publishing quickly, while scientific values are best served by doing slower, more careful work. There's perhaps some truth to this argument, but it overstates the misalignment between scientific and professional incentives. I suspect that many people believe quantity matters more than quality, even when the facts are the opposite.

Let's start with the (hopefully uncontroversial) observation that the number of publications will be correlated, to some degree, with scientific progress. That's because, for the most part, if you haven't done any research you're not likely to be able to publish, and if you have made a true advance it should be relatively easy to publish.* So there will be some correlation between publication record and theoretical advances.

Now consider professional success. When we talk about success, we're mostly talking about hiring decisions. Though there's something to be said about promotion, grants, and awards as well, I'll focus here on hiring.** Getting a postdoc requires the decision of a single PI, while faculty hiring generally depends on committee decisions. It seems to me that many people believe these hiring decisions come down to the weight of the CV. That doesn't square with either my personal experience or the incentive structure of the situation. My experience suggests that the quality and importance of the research is paramount, not the quantity of publications. And more substantively, the incentives surrounding hiring also often favor good work.***

At the level of hiring a postdoc, what I personally consider is the person's ideas, research potential, and skills. I will have to work closely with this person for the next several years, and the last person I want to hire is someone sloppy and concerned only with career success. Nearly all postdoc advisors I know feel the same way, because our incentive is to bring in someone who is a strong scientist. When a PI interviews a postdoc candidate, they talk to the person about ideas, listen to them present their research, and read their papers. They may be impressed by the quantity of work the candidate has accomplished, but only when that work is well done and on an exciting topic. If you believe that PIs are motivated at all by scientific goals – and perhaps that's a question for some people at this cynical juncture, but it's certainly not one for me – then I think you have to believe that they will hire with those goals in mind.

At the level of faculty hiring, the argument is similar. I have never sat on a hiring committee whose actions or articulated values were consistent with finding the person with the longest CV. In fact, the hires I've seen have typically had fewer publications than competing candidates. What we are looking for instead is the most exciting, theoretically deep work in a particular field. In the committees I've been on, we read people's papers and argue about them in depth, discussing whether they excited us, whether they were well written, and whether they put us to sleep. Could we read more? Definitely. But we do read, and that reading is the basis for our decision-making. That's because the person we hire will be our colleague, will teach classes for our students, and will be our collaborator. Again, the incentives are towards quality.****

A critic here could argue that the kind of exciting work we respond to is more often wrong. I'd reply that evaluating the soundness of scientific work is precisely what we are trained to do when we watch a talk or read a paper by a candidate. Are there confounds? Are the constructs well-identified? Are the measures good? Is the work statistically sound? These are precisely the kinds of questions I and others ask when we see a job talk. When was the last time you walked out of a talk and said, "that research was terrible, but I love that there was so much of it"? Quality is a prerequisite.

Now, the critical point about misperceptions: while hiring decisions are (often) made by people with deep stakes in the outcome, e.g., the potential postdoc advisor or future colleagues, observers of the decision almost always have less at stake. So whatever level of engagement the original decision-makers have with the substance of the research – reading papers, reading the literature for context, asking experts – observers will have less. But observers can still see the CV, which will have some correlation with the actual record of achievement. Hence, observers will likely be biased to use the knowledge they have – the number of publications – to explain the PI's or committee's decision, even if the decision was made on the basis of independent, causally-prior criteria (namely, the actual quality of the work).
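
To make that logic concrete, here's a minimal simulation sketch (every number here is invented for illustration, not data): suppose a latent quality variable drives both publication counts (noisily) and the hiring decision (directly, because the decision-makers read the work). An observer who sees only the publication counts will still find that successful candidates publish more, and may wrongly conclude that the counts did the causal work.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical candidate pool; all parameters are made up for illustration.
n = 1000
quality = rng.normal(size=n)  # latent quality of each candidate's work

# Publication counts track quality, but only noisily.
pubs = np.round(np.exp(1.5 + 0.5 * quality + 0.5 * rng.normal(size=n)))

# The decision-makers hire on quality alone: they read the work, not the CV.
hired = quality > np.quantile(quality, 0.95)

# An observer who sees only CVs still finds that hires publish more.
print("mean publications, hired:    ", pubs[hired].mean())
print("mean publications, not hired:", pubs[~hired].mean())
```

Even though publication count never enters the decision in this toy setup, it differs sharply between hires and non-hires, so from the outside, counting publications looks like a good explanation of the outcome.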

In sum: from an external viewpoint, we see publication numbers as the driver of professional success, but that is – at least in part – because CVs are easy to observe, and scientific progress is hard to assess. But in many cases decision-makers tend to know more and care more about the candidate's actual work than external observers, and so tend to decide more on the substance.

Could we do better? Of course. There are plenty of biases and negative incentives, and we need to work to decrease them. For example, there was a recent Twitter discussion of "n-best" evaluations. Such evaluations (considering only n papers per candidate) might help committees focus more explicitly on reading a few papers in depth and assessing their impact. What I've tried to argue here, though, is that counteracting the perception that quantity matters more than quality may be important as well. Quality really does matter; it's a shame more people don't know that.

---
* I'm not trying to suggest that scientific publication is perfect. It isn't. I'm not even arguing here that it's unbiased. I'm arguing only that publication record carries some signal about scientific success. Hopefully that shouldn't be a controversial claim, even for people who are quite skeptical of our current publication model.
** Actually, on this model, grants and awards might be much more biased by CV weight, since A) the consequences for the person doing the granting/awarding are more limited, and B) they are less likely to be expert in the area. And to the extent that these grants and awards are weighed in hiring decisions, this could be an additional source of bias. Hmm...
*** There's plenty to say here about people's ignorance of what good work is. That's a problem! But let's assume for a second that at least someone knows what good work looks like.
**** I actually think it's more or less a threshold model. If you have published more than N papers (where N is small), then the committee's focus is, "is the work solid and exciting?" If fewer than N, that typically means the person is not far enough along in their work for their contributions to be evaluated.

10 comments:

  1. Thanks, Michael, for this interesting and encouraging blog post!

    I would really like to hope that you’re right: that the majority of hiring (and funding) decisions are based on a careful evaluation of the candidates’ potential, rather than on the number of papers. I am sure that this is how it works a lot of the time - in fact, I got my current post-doc position with “only” two published papers.

    However, there are clear examples that this is not always the case. It may differ, e.g., across countries, but there are definitely some cases where it was the number of papers that determined an (unfavourable) outcome:

    1) A friend of mine got rejected for a university Early Career Researchers fellowship (up to 3 years post-PhD). She managed to get some insider info from the selection committee, and found out she was not even considered because she had fewer than 10 publications.
    2) A relative of mine did not publish during her PhD, mainly because of health problems (caused, at least in part, by working in a “pimp’s” lab). I know her well enough to confidently say that she’s exceptionally intelligent and hard-working. Now she cannot even get an RA job.
    3) Her “pimp” - though I have no sympathy for them - is under pressure, because if they don’t publish a certain amount of papers per year, they get demoted to a part-time position.
    4) Two colleagues from another university quit their permanent jobs in protest, because the university wanted to increase their teaching load, while expecting them to publish the same amount, to manipulate the university’s (calculated) research output.
    5) Taken from a blog I read just the other day (http://blogs.lse.ac.uk/impactofsocialsciences/2016/03/14/addicted-to-the-brand-the-hypocrisy-of-a-publishing-academic/; about impact factors rather than the number of publications): “And very often, the committee members will say something along the lines of “Well, Candidate X has got much better publications than Candidate Y”…without ever having read the papers of either candidate. The judgment of quality is lazily “outsourced” to the brand-name of the journal”.

    It is possible that these are just isolated instances - and I would be very happy if someone convinced me of that. However, even if they are, there is often the perceived pressure: I can’t think of any senior scientist who has given me career advice who hasn’t told me that I need to publish a lot to get a job and funding in the future, and that my current output is probably not sufficient. And in many stories that I've heard about "pimps", ECRs are drilled with the credo "quantity matters, not quality". Whether or not this is justified, I think the perceived pressure alone poses a threat to the quality of science.

    Replies
    1. Thanks for the comment, Xenia. I'm sorry to hear about these negative experiences. There are always bad actors in any field, as well as some bad incentive structures (I am learning from comments that strict numerical cutoffs are more prevalent outside the US). The one consolation I have is that there tend to be fewer bad actors in academia than outside, in part because the incentives for success are far smaller. :) Most of us do this because it's fun.

  2. Isn't there quite a bit of variability across academic systems? Some countries reportedly do require a specific number of publications for junior faculty positions, starting grants, promotion, and so on. My speculation is that that's most clearly the case in countries where bureaucrats or out-of-field faculty play a major role in the decision, but it's hard to know.

    Replies
    1. Yes, agreed - I think this is true and lines up nicely with the idea that the more the decision-makers have a stake in the person, the less numbers matter.

  3. This is an interesting question. I think at the extreme, nobody's saying that quantity is the only thing that matters. Of course quality matters, of course any decent hiring committee is going to read publications as part of its process. I think the more interesting argument is, how *much* does quantity matter? And is it getting more weight than it should? That's a harder question, because "how much weight should it get?" is a fairly subjective question.

    When I think about this issue, there are a few things that come to mind. One is how expectations of publication records have grown. When I was in grad school on a search committee, the expectation was that one first-authored paper in a good journal was good enough. My strong impression today is that it would be pretty damn hard to get a tenure-track job at an R1 university with just 1 publication. When I talk to folks who've been around longer than me they report having a similar impression - the bottom-line expectation of quantity has gone up. That is going to have a particularly strong effect on grad students.

    A second thing is that I think quantity probably matters less the further through any selection process you go - maybe because we get more sophisticated about how we make decisions, but also maybe because by that point we've already sucked out a lot of the quantity variance by selecting on it. The cut from 200 initial applications to a "closer read" list of 20 probably looks a lot different from the cut from 3 interviews to 1 job offer. Once you've already made your cuts on quantity, restriction of range and Berkson's paradox and survivorship bias are going to make it look relatively less important. It continues to matter, but its relative weight is probably greatest at the earliest stages of selection. But someone coming out of grad school is (legitimately) worried about making it through that first cut.
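
    To see the restriction-of-range point, here is a minimal simulation sketch (the parameters are invented purely for illustration): publication counts and committee ratings both track a latent quality signal, and a first cut keeps only the top of the pool by publication count. Among the survivors, the correlation between publications and ratings shrinks, even though the underlying process never changed.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical applicant pool; all numbers are made up for illustration.
    n = 2000
    quality = rng.normal(size=n)
    pubs = quality + rng.normal(size=n)          # publication count tracks quality, noisily
    rating = quality + 0.5 * rng.normal(size=n)  # committee's quality-based rating

    # First cut: keep the top 10% of the pool by publication count.
    survivors = pubs > np.quantile(pubs, 0.90)

    # Correlation of publications with ratings, before and after the cut.
    print("pubs vs. rating, full pool:",
          round(np.corrcoef(pubs, rating)[0, 1], 2))
    print("pubs vs. rating, survivors:",
          round(np.corrcoef(pubs[survivors], rating[survivors])[0, 1], 2))
    ```

    In this toy setup the full-pool correlation is sizable, but it drops substantially among the survivors: once everyone left has a long CV, publication count can no longer distinguish candidates, so it looks unimportant at the later stages.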

    A third thing is that things vary. Like yours, my department also has a pretty strong ethos of reading and discussing papers. One of the things I discovered when I was putting out feelers about that N-best evaluation thing is that how departments make decisions varies widely. Tal Linzen above mentioned some other countries count publications, but I've heard about departments in the U.S. that do the same thing - like, literally having points systems based on numbers of publications in various tranches of journals. And less formally, I hear about people being told that they need X number of publications per year or something like that to have a safe chance at tenure.

    Again, none of this means that we are ignorant of quality. Of course we aren't. Nor should we ignore quantity altogether — as you say, people have to be productive. The question, though, is: are we giving quantity the right emphasis, and not too much? And maybe it's less a matter of yes vs. no than a matter of what parts of the decision process we are talking about. Quantity is easier to detect and easier to agree on than quality. My own hunch is that the places where we're most at risk of overweighting it are where we are pressed for time and attention (like first cuts) or where decision-makers have less expertise to evaluate quality (as you move from committee to department to dean's office). If you're further along in the process, or if you've got enough quantity to satisfy those decision-makers, then it can feel like quality matters more. But that may not be equally true for everyone at all times.

  4. Hiring professors is something I've been involved in here in Germany, and one thing that should not be undervalued is the personality of the candidate---will people be able to work with them? A socially inept hire can cause havoc in a department even if he/she brings in millions of Euros in funding or tons of publications.

    In Germany, we get about 30-40 minutes with each candidate called in for a talk, and it's almost impossible to get a feeling for the person that quickly (however, I have developed tricks for getting a quick spot-check).

    Another thing worth mentioning is that "branding" matters. A potential hire coming from Stanford will be taken much more seriously a priori than one from a no-name university in the middle of nowhere. The priors are set by such cues, and they are very hard to override with actual data on the person.

    I see this most dramatically in the reviewing process for funding; as a reviewer, I have seen funding agencies automatically fund a researcher who's too big to fail, even after a reviewer identified serious errors in their papers. Similarly, in hiring, after a certain threshold of achievement has been reached (and this can be as much a random walk to the top as a talent-driven rise), you will be golden no matter what. There can come a point when the science and its quality cease to matter.

    Replies
    1. I agree about branding. That's a tough problem and there has been some interesting research on this:

      http://advances.sciencemag.org/content/1/1/e1400005.short

    Glad to hear that the hiring committees that you served on were doing such a great job. Unfortunately, your experience may not be representative of other committees. I have talked to many senior researchers who regularly serve on hiring committees, and almost all of them agree that the quantity of publications and the rank of the journals they appear in are the most important factors determining whether you make it onto the short list. Given that departments are typically inundated with large numbers of applications, that's actually not surprising, because no one has the time to read articles from all those applicants to see how engaging their writing is and how deep their ideas are. (How did you solve this problem?) Granted, once a candidate is on the shortlist, publications don't seem to matter so much, but that's hardly comforting for those who didn't make it onto that shortlist.

    This is how things work in the US; the UK is a different matter because of the REF evaluations. I've seen job ads from UK universities that made it perfectly explicit that publication metrics are the most important factor in the hiring decision. Given that publications in the UK more or less directly translate into money for the universities, this is not surprising either.

    Replies
    1. Screening applicants is indeed a very hard problem. In the searches I've been involved in, we typically try to get two readers for every application, and those readers look for work that catches their interest. It's tough to do that for 40 candidates, but way easier than doing it for 150.

      I'm not trying to argue that the quantity of publications and their outlets *doesn't* matter. That would be silly. I also think it's likely that, across the field as a whole, those signals are decent but highly imperfect *correlates* of quality research.

      What I'm arguing is that candidates often *overweight* quantity relative to its actual importance, due to availability bias, and so they should try to correct for that bias in discussion and evaluation.
