“The Natural Selection of Bad Science”

Posted on June 1, 2016 10:30 PM by Andrew

That’s the title of a new paper by Paul Smaldino and Richard McElreath which presents a sort of agent-based model that reproduces the growth in the publication of junk science that we’ve seen in recent decades.

Even before looking at this paper I was positively disposed toward it for two reasons. First because I do think there are incentives that encourage scientists to follow the forking paths toward statistical significance and that encourage journalists to publish this sort of thing. And I also see incentives for scientists and journals (and even the Harvard University public relations office; see the P.P.S. here) to simply refuse to even consider the possibility that published results are spurious. The second reason I liked this paper before even reading it is that the second author recently wrote an excellent textbook on Bayesian statistics which in fact I just happened to recommend to a student a few hours ago.

I have some problems with the details of Smaldino and McElreath’s model—in particular, I hate the whole “false positives” thing, and I’d much prefer a model in which effects are variable, as discussed in this recent paper. But overall I think this paper could have useful rhetorical value; I place it in the same category as Ioannidis’s famous “Why most published research findings are false” paper, in that I agree with its general message, even if it’s working within a framework that I don’t find congenial.

In short, I agree with Smaldino and McElreath that there are incentives pushing scientists to conduct, publish, promote, and defend bad science, and I think their model is valuable in demonstrating how that can happen. People like me who have problems with the particulars of the model can create their own models of the scientific process, and I think they (we) will come to similar conclusions.

I hope the Smaldino and McElreath paper gets lots of attention because (a) this can motivate more work in this area, and (b) by giving a systemic explanation for the spread of junk science, it lets individual scientists off the hook somewhat. This might encourage people to change the incentives and it also gives a sort of explanation for why all these well-meaning researchers can be doing so much bad work. One reason I’ve disliked discussions of “p-hacking” is that it makes the perpetrators of junk science out to be bad guys, which in turn leads individual researchers to think, Hey, I’m not a bad guy and my friends aren’t bad guys, we’re not p-hacking, therefore our work is ok. I’m hoping that ideas such as the garden of forking paths and this new paper will give researchers permission to critically examine their own work and the work of their friends and consider the possibility that they’re stuck in a dead end.

I fear it’s too late for everyone involved in himmicanes, beauty and sex ratios, ovulation and clothing, embodied cognition, power pose, etc.—but maybe not too late for the next generation of researchers, or for people who are a little less professionally committed to particular work.

46 thoughts on ““The Natural Selection of Bad Science””

Paul Smaldino on June 1, 2016 at 11:03 pm said:

Hi Andrew,

Thanks for the kind words! I agree (and I think Richard would as well) that true/false hypotheses and positive/negative results are ultimately dissatisfying dichotomies. Our intention was, as you suggest, to use it as a rhetorical device à la the Ioannidis paper. Perhaps in future work we’ll manage to switch over to using variable effects and a continuum of associations more befitting a sort of Bayesian mindset.

-Paul

- Andrew on June 1, 2016 at 11:07 pm said:
  
  Paul:
  
  It’s funny—I was quoting Ioannidis’s paper for years before I suddenly realized that, to me, the statement, “most findings are false” is meaningless! But, upon further reflection, I felt that Ioannidis’s general point was still valid. It would be worth doing some formal analysis connecting this false-positive, false-negative framework to a continuous model of varying effects, to make this correspondence more explicit. I’ve been thinking about this for awhile but have never gotten around to actually doing it, which is one reason I was happy to see your paper which is the product of real effort!
  
  - Paul Smaldino on June 1, 2016 at 11:17 pm said:
    
    I think I know what you mean. The philosopher in me reads Ioannidis (2005) and thinks “Wait, what does it mean for a hypothesis to be TRUE?” But I also find the idea useful. Our intention with this and the previous paper (in PLOS ONE) was to put the main idea of that 2005 paper into a population dynamics context and to consider the community-level consequences. I think there’s still a lot to do in terms of more realistic modeling of the acquisition of scientific knowledge. I’m glad you like the work so far!
    
  - LemmusLemmus on June 2, 2016 at 12:23 pm said:
    
    Would you think it’s correct to say “Most published research findings are overestimates”?
    
    - Andrew on June 2, 2016 at 12:41 pm said:
      
      Lemmus:
      
      Yeah, for sure. I can’t imagine anyone in a reasonable frame of mind objecting to that statement. Even under the most naive models of research practices, you’ll get this bias in the effect size estimates. After all, if E(theta_hat) = theta, then E(|theta_hat|) > |theta|, it’s just simple mathematics. The real point is not that there’s a bias but that the bias can be huge relative to the size of the underlying effect.
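
A quick way to see the point about E(|theta_hat|): the sketch below simulates unbiased, normally distributed estimates and compares their average magnitude with |theta|, then repeats the comparison keeping only the estimates that clear a 1.96-standard-error cutoff. The normal sampling model, the parameter values (theta = 0.1, se = 1), and the function names are assumptions made for illustration, not anything taken from the paper or the post.

```python
import random
import statistics


def simulate_estimates(theta, se, n_sims, seed=0):
    """Draw unbiased estimates theta_hat ~ Normal(theta, se) (assumed model)."""
    rng = random.Random(seed)
    return [rng.gauss(theta, se) for _ in range(n_sims)]


def magnitude_bias(theta=0.1, se=1.0, n_sims=100_000, z_crit=1.96, seed=0):
    """Compare the average estimate, average magnitude, and average
    magnitude among 'statistically significant' estimates."""
    estimates = simulate_estimates(theta, se, n_sims, seed)
    # Estimates whose magnitude exceeds the conventional cutoff.
    significant = [abs(e) for e in estimates if abs(e) > z_crit * se]
    return {
        "mean_estimate": statistics.fmean(estimates),
        "mean_abs_estimate": statistics.fmean(abs(e) for e in estimates),
        "mean_abs_significant": statistics.fmean(significant),
        "share_significant": len(significant) / n_sims,
    }


if __name__ == "__main__":
    for name, value in magnitude_bias().items():
        print(f"{name}: {value:.3f}")
```

With a true effect that is small relative to the standard error, the average magnitude should come out well above |theta|, and the average among the "significant" estimates larger still. That is the sense in which the bias can be huge relative to the underlying effect.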
  - Martha (Smith) on June 2, 2016 at 5:21 pm said:
    
    I agree that the “True/False” dichotomy is meaningless. But in many practical situations, there may be a “meaningful effect/no meaningful effect” dichotomy, in which case “false positive” and “false negative” could be useful concepts. (i.e., bringing in the ideas involved in “practical significance” or “difference detectable by the measure being used”)
    
Rahul on June 2, 2016 at 5:08 am said:

Andrew:

Why is it a good thing to “let individual scientists off the hook”? Would *you* write, or even recommend publication of, something like the himmicanes paper?

If not, then why hold the himmicanes-authors to a lesser standard?

- Shravan on June 2, 2016 at 6:43 am said:
  
  Because a large share of the blame goes to the advisors, who led their students down the wrong path. And the advisors of the advisors. It’s hard to figure out who exactly to blame. Might as well focus on constructively trying to fix the broken world of research rather than berating individual researchers who may not even know what’s wrong (e.g. Amy Cuddy doesn’t seem to understand the criticism, probably in part due to the way statistics is taught).
  
  - Rahul on June 2, 2016 at 8:13 am said:
    
    I think we can & must hold individuals accountable *without* imputing bad motives.
    
    We should steer clear of accusations of malice or fraud but not shy away from holding individuals responsible for sloppy logic or erroneous methodology. Or for generating irresponsible hype or overly broad conclusions not justified by the data.
    
    Ignorantia juris non excusat. So also, “not having been taught good practices” should not be allowed as an excuse for sloppy scholarship & to let individual scientists off the hook.
    
    If we absolve individuals of responsibility we will find it very hard to ever resolve the problem.
    
    - Shravan on June 2, 2016 at 8:24 am said:
      
      The problem is that this doesn’t improve things. People dig in and defend defend defend. It may be more effective to just quietly lead by example, in the hope that eventually critical mass will build up. E.g., I put up my published data online once the paper is published. In psychology and linguistics, I know only one or two other people who do that. Similarly, we can try to do better experimental work, and hope that others will do that too.
      
      Although I can see that Andrew’s vocal criticism has a positive effect too.
  - Martha (Smith) on June 2, 2016 at 5:11 pm said:
    
    Shravan said, “Amy Cuddy doesn’t seem to understand the criticism, probably in part due to the way statistics is taught”
    
    I think a lot of people don’t understand the criticism, as a result of the poor (watered down to the point of being morphed to something very different than it really is) way statistics is often taught. But for some people there is another contributing factor, which came to mind after happening on a radio interview with Amy Cuddy the other day: Some researchers live in a state of denial — the interview sure sounded like Cuddy is in this category. She was essentially advocating power pose to help people “really believe” in what they were advocating.
    
    - Shravan on June 3, 2016 at 12:27 am said:
      
      Actually, her slogan in the TED talk is “Fake it till you make it”, so, if you are unsure about yourself (in your job say), simulate feeling competent and you will eventually be competent, or at least successful. I guess she needs to read the “unskilled and unaware of it” paper. It would not have been so exciting to advocate hard work and diligent study to improve one’s ability; just fake the feeling of confidence.
      
      Maybe she’s just talking about the imposter syndrome; but as I understand it, the person feeling like an imposter is actually competent, they just don’t feel it. This is not the issue she seems to be targeting.
    - Andrew on June 3, 2016 at 7:51 am said:
      
      Shravan:
      
      No joke, the power pose researchers and the embodied cognition researchers etc. seem to be applying the “Fake it till you make it” principle to their own careers. The difficulty comes when “Fake it till you make it” crashes into another principle: “Saying it don’t make it so.”
    - Martha (Smith) on June 5, 2016 at 4:25 pm said:
      
      “Fake it till you make it” may be OK if you are trying to “make it” in a field like acting or comedy. But for wider applicability, it needs a “Caution!” statement such as “Warning: Do not use if you are a parent, teacher, medical personnel, peace officer, vehicle operator, or in another position of responsibility; or if you care about intellectual honesty or ethics.”
Richard McElreath on June 2, 2016 at 6:04 am said:

Completely agree about the true/false dichotomy. Paul and I struggled with the same issue in our previous paper, where we cited Gelman & Loken on that point.

Also agree about absolving individuals of some of the blame. Most of the time, we shouldn’t impute bad motives to those using bad methods. They were possibly taught that those methods would discover lasting truths about the Universe. Sympathy melded with firm criticism is what’s called for, perhaps.

A related issue is the adequacy of research designs and general approaches for studying contemporary social science and natural science topics. That “Could a neuroscientist understand a microprocessor?” paper from last week (https://biorxiv.org/content/early/2016/05/26/055624) has gotten me thinking of attempting something similar with ordinary social science methods probing a simulated dynamical system. Could a social scientist understand a society, even in principle, using the methods we were all taught in PhD school?

I’m not sure how to approach that problem, but attempting it would help us move past the unsatisfying true/false framework. There can be many “true” and repeatable effects that just don’t teach us anything meaningful about the phenomena of interest, just like brain lesions may not teach us much about cognition.

- jrc on June 2, 2016 at 5:35 pm said:
  
  What about taking samples from big datasets (Census, CPS, NHANES, DHS, whatever) and running “experiments” on that real-world data? I like papers that show that common methods fail in simulated data, but I think they are even more convincing when you show that they fail in real-world data.
  
  I also think research in this vein has been really influential, but that the field remains strangely immature in terms of what could be done with that kind of setup. One great example of what can be accomplished is the whole literature following LaLonde*. Another is the literature on inference and coverage rates**, including some work by your colleagues over in the Death Star***. Neither of these is exactly what you are thinking about, but I think the ways they try to use real data and some “known” treatment effect can be a really convincing strategy for methods research.
  
  If you simulate something, you know the DGP, and you can make your models work or fail as you please. When you have real data, you can only argue for some DGP, and that is the world in which we social scientists actually operate.
  
  This also lets you explore not just “true/false” propositions, but questions of effect size and precision. Which of course you could also do on fully-simulated data by just picking the kind/size of “treatment effect(s)” in the data.
  
  * https://people.hbs.edu/nashraf/LaLonde_1986.pdf
  ** https://economics.mit.edu/files/750
  *** https://www.mitpressjournals.org/doi/abs/10.1162/rest.90.3.414?journalCode=rest#.V1Cj3avSO0E
  (note: “Death Star” is UCD-speak, not a general reference to the mythical place where economists work)
  
  - Daniel Lakeland on June 2, 2016 at 7:44 pm said:
    
    This is a really interesting idea. Maybe go further: can we compare two different “styles” of research, one based on Bayesian mechanistic type models and one based on traditional NHST search for patterns, see what results the two groups of people get, and then apply the models to a very large holdout dataset and see what “replicates” in the larger, more accurate context?
    
    - jrc on June 2, 2016 at 8:18 pm said:
      
      So we allocate groups of people that have some similarities into 7 “placebo schools”….
      
      My interest in a research program like this is mostly to get an empirical understanding of the believability of the empirical evidence in various social science fields. I want some sort of an empirically based *descriptive* claim about the state of knowledge.
      
      But you are right – there can be a *normative* component too by testing multiple methods on each dataset(up), and evaluating (based on various metrics) which ones “work best”.
      
      Including various kinds of frequentist and Bayesian methods would be an important part of either one of these research goals. This seems like a discussion that should be continued over meat patties.
    - Daniel Lakeland on June 3, 2016 at 10:57 am said:
      
      Can you think of some commonly discussed claims, not already based on these datasets, which could be examined within these datasets to determine how well the existing theories play out?
      
      My guess is it might be hard: because these datasets are already out there, people are probably basing their theories on patterns within these datasets, right?
    - jrc on June 3, 2016 at 11:34 am said:
      
      Daniel,
      
      More like, randomly (arbitrarily) assign observations to “treatment” or “control”, and for the treatment group add into the data some effect. You could assign T however you wanted – individually, clustered, correlated with some variable that you do/don’t include in the regressions, and you could make its effect whatever you wanted (constant, varying, big, small). Then you see how well your methods recover the true treatment effect.
      
      Alternatively, you take a giant experiment, and then subsample T/C from that, and see how often your CI covers the “true” mean from the large experiment. The sub-sampling would mimic potential small-N/power=.06 experiments. If you had a large experiment with large number of cluster-assigned treatments, you could experiment with cluster size. You could test the effects of re-randomization schemes on inferential properties by simulating throwing out some potential T/C differences on baseline within the sub-samples. You could basically mimic whatever experimental conditions you wanted, and, with very large N in the real experiment, have a pretty good “gold standard” against which to judge the methods.
      
      Apologies if something doesn’t make sense, pre-coffee commenting is highly discouraged, but I wanted to get to this before work, because after work, it is USA USA USA!!!!1!
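
The first procedure jrc describes above can be sketched in a few lines: draw a subsample, split it at random into treatment and control, add a known effect to the treated outcomes, and count how often a standard 95% interval for the difference in means covers that planted effect. The stand-in population below is simulated only to keep the example self-contained; in the setup described above it would be replaced by outcomes from a real dataset. The function names, the constant additive effect, and the normal-approximation interval are all assumptions for illustration.

```python
import math
import random
import statistics


def make_stand_in_population(n=10_000, seed=0):
    """Hypothetical placeholder for real outcome data (e.g. a survey variable)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, 0.5) for _ in range(n)]


def diff_in_means_ci(treated, control, z=1.96):
    """Difference in means with a normal-approximation 95% interval."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return diff, diff - z * se, diff + z * se


def coverage_with_planted_effect(outcomes, effect, n_per_arm, n_reps=2000, seed=0):
    """Share of replications whose interval covers the planted effect."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_reps):
        # rng.sample returns the draw in random order, so slicing it
        # amounts to a random assignment to treatment and control.
        sample = rng.sample(outcomes, 2 * n_per_arm)
        treated = [y + effect for y in sample[:n_per_arm]]
        control = sample[n_per_arm:]
        _, lower, upper = diff_in_means_ci(treated, control)
        covered += lower <= effect <= upper
    return covered / n_reps


if __name__ == "__main__":
    population = make_stand_in_population()
    rate = coverage_with_planted_effect(population, effect=0.2, n_per_arm=50)
    print(f"coverage of the planted effect: {rate:.3f}")
```

Clustered or covariate-dependent assignment, varying effects, and the subsampling-from-a-large-experiment variant would each change only how `treated` and `control` are built inside the loop.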
    - Daniel Lakeland on June 3, 2016 at 12:00 pm said:
      
      “Alternatively, you take a giant experiment, and then subsample T/C from that, and see how often your CI covers the “true” mean from the large experiment”
      
      This was what I was thinking of, except using real-world data without an artificial “effect” added in. Just try to look for some already “understood” effect in the literature, and then see if it plays out on both a small and full data set.
      
      And, in fact, it would be interesting to do different kinds of sub-selection, i.e., random, uniform with an RNG vs. selection by simulating some kind of “recruitment” (basically a much more complicated RNG which may interact with the data/outcomes). How well does statistical analysis work when the sub-selection isn’t uniform random but instead complex and poorly understood (method withheld)?
- Matthew Zefferman on June 2, 2016 at 11:05 pm said:
  
  Richard, you (and maybe Paul) might want to take a look at this paper that I quite like with Kyle Joyce and Phil Arena. They run an agent-based model of idealized versions of popular theories in international relations and then see if the standard statistical methods used to “test” those theories in the real world actually detect (or not) the process.
  
  Arena, Philip and Kyle Joyce. 2015. “Challenges to Inference in the Study of Crisis Bargaining.”
  Political Science Research and Methods, 3 (3): 569–587.
  
  - Paul Smaldino on June 3, 2016 at 1:37 pm said:
    
    Sounds interesting, Matt. White and Smith (2007) did something similar with field sampling methods in behavioral ecology that I’ve always thought was terrific.
    
    https://www.psych.upenn.edu/~whitedj/pubs/WS07.pdf
    
- Keith O'Rourke on June 6, 2016 at 2:30 pm said:
  
  Richard: Don’t know if it’s too late for a comment but – the “Could a neuroscientist understand a microprocessor?” paper is an interesting paper but is this not just “we never get to see/access reality directly” but only indirectly through fallible representations that reality might complain about (not fit) or not?
  
  You make an artificial reality (microprocessor) that you do understand fully (can access directly) but if you try learning about it just with empirical methods – then rather than being surprised that you end up just with representations that are consistent with reality (those that reality did not complain about) rather than a literal representation (true representation) should it not be the other way around?
  
  For instance in this animation https://galtonbayesianmachine.shinyapps.io/GaltonBayesianMachine/ on the right the representation literally represents a true prior and data generating model that I have decided to make up (Reality unseen and true parameter unseen but data outcome seen).
  
  On the left is a guessed at representation of what the prior and data generating model actually were (here a perfect guess) that generated the data on the right and by running the representation on the right you get a posterior for the true parameter drawn from the true prior.
  
  Now there is nothing on the left side from which you can directly access the right side.
  
  Though if you guessed at a prior and data model that never could or seldom would generate the data outcome on the right side you could set that representation aside.
  
  (I only went with a perfect guess as it simplified the programming – but that was a mistake – should have made it slightly different.
  Then you get a posterior that you would take for the true parameter but drawn from the wrong prior and/or data generating model – which is always the case in actual analyses.)
  
  - Keith O'Rourke on June 6, 2016 at 2:32 pm said:
    
    Arggh – generated the data on the right and by running the representation on the LEFT [right]
    
AJ on June 2, 2016 at 11:59 am said:

We hear a lot about dubious statistical practices (r(df), garden forks), but publication demands diminish research quality in other ways as well–incomplete literature reviews (e.g., find only those articles that I need to back my argument), relying on known methods rather than learning ones better suited to the problem. It’s about the time constraint; the time spent making any one publication truly thorough or excellent is essentially time “wasted” when the incentivized goal is quantity.

- Andrew on June 2, 2016 at 12:06 pm said:
  
  AJ:
  
  I don’t disagree with your general point but I do disagree that the incentivized goal is quantity. I think the incentivized goal is total impact, whether measured by citations or journal quality or TED talks or whatever. Researchers’ first preference is to publish in the tabloids (Science, Nature, PPNAS, etc.) or in a top journal in their field (Cell, Psych Science, APSR, etc.), next best is in a field journal, next best is in a lower-tier journal. It’s not just quantity: just about anyone would prefer 1 paper in a top journal to 5 papers in lower-tier journals. One problem, though, is that top tier journals aren’t always looking for “thorough” or “excellent.” The tabloids have very strict space limitations, which work against thoroughness. And top journals are often pretty explicit that they want pathbreaking new ideas, not merely science-as-usual. But new and exciting is often counterintuitive and false, so the push toward novelty and excitement can directly work against the goal of excellence.
  
  - AJ on June 2, 2016 at 2:18 pm said:
    
    Yes, “quantity, given a certain level of journal prestige” I think would have fit the bill better. I can also see editors being caught in an interesting quality/novelty tradeoff: “how many methodological shortcomings am I willing to tolerate given the novelty in the findings?” Strong pressures on their part to keep the IF up.
    
Anonymous baker on June 5, 2016 at 11:03 am said:

I am a baker, and in our profession we too have to deal with bad incentives, which has a negative effect on the quality of our bread.

Here is my story, perhaps it can help with coming up with solutions to improving things in your occupation:

I started out making healthy, nutritional bread, and selling it but people didn’t like it, didn’t recommend it to their friends, and it wasn’t tasty, and didn’t look nice so it didn’t receive any baking-prizes. I wasn’t making enough money as a result of this all. There were other bakers who made truly healthy and nutritional bread as well, but they made and sold more of it than I did because they were better bakers than me, and their bread looked nicer and tasted better than my bread and they received awards for it.

This made me think: Whoever thought of a system in which you need to sell bread that is being recommended by people, that is tasty, and that looks nice, and maybe even receive an award in order to make money?! That’s just a system with bad incentives. Such a system just asks for bakers to start using fake stuff to make “bread”. So, because of these bad incentives in our system, I then started to replace flower, and other ingredients, with cheaper fake stuff. This “bread” didn’t really have any nutritional value anymore, but I still sold it as bread. People bought my “bread” because it now was tasty, and looked nice, and I was able to make enough money.

Soon, some of my colleagues also started to use fake ingredients for their “bread”. Now, other bakers and the general public became aware of the now widespread use of these “questionable baking practices” (QBP’s), and the fact that there was now a lot of non-nutritional “bread” on the market. This was a major problem. So how could bakers make all of their “bread” healthy, and nutritional bread again? How could the bakers fix this tremendously difficult problem?

You might think: “Hey, why not simply ban the use of cheap and fake ingredients because they make the bread non-nutritional, and this is bad for the general public. Doing this would simply make all “bread”, healthy, and nutritional bread again which is good for the general public. This would automatically also lead to having a fair playing field where the best bakers would thrive in, and the tastiest bread would be sold and recommended and be given awards”, but we decided that that would go too far.

I hope you can come up with some real solutions to your problems in your occupation. Good luck with that!

- fred on June 6, 2016 at 4:34 pm said:
  
  I then started to replace flower, and other ingredients, with cheaper fake stuff.
  
  Like flour maybe? But hey, you’re the “baker”.
  
  - Anonymous baker on June 6, 2016 at 5:37 pm said:
    
    Thank you for the correction, I indeed meant “flour”! Although flowers are sometimes used in cooking/baking, and I feel inspired to develop a whole new line of flower-bread! I don’t know about the nutritional value but at least it will be novel, which sells big time!
    
    I’ll leave science for the scientists now. I thought it could be helpful to think about how other professions deal with incentives and how they relate to improving their output/work, but maybe science is special, and unique, and deserves its own solutions. I just hope scientists will come up with the right ones, that’s what is most important I think.
    
    At least I had a revelation because of this all: I’m not going to make flower-bread simply because it is novel. I am going to make myself some healthy, nutritional bread instead! I’ve decided to leave my QBP’s for what they are, and reasoned it is more important to provide the people with healthy and nutritional bread. In the end, they pay for it so I reason they are entitled to the best quality bread possible. I reason that providing the people with healthy, and nutritional bread is the most important thing in the end. At least it is for me.
    
    - Daniel Lakeland on June 6, 2016 at 8:59 pm said:
      
      See, the problem is, when you give people what they want (Wunderbar Bread (TM)) (contains no food ingredients), they pay through the nose for it, and then they get obese and they pay through the nose to their doctors for long lasting insulin medicines to counteract the bad effects of buying the crap that sells really well. We measure how well we’re doing by GDP and so this outcome is “Optimal (TM)”. Oh sure, there are a few hippies in the suburbs of Portlandia that buy Dave’s Killer Bread, but even that is not just because it’s actually tasty and relatively healthy, but also in big part because it tells a story right there on the package about what a good little person you are for buying this bread invented by this guy who turned his whole life around from prison and drugs to baking bread… what’s that you say, Dave is back in jail on reckless driving and assaulting a police officer charges? well yeah, he does have Bipolar syndrome, and the company is actually owned by Flower Foods who has consolidated the baking industry and now owns the rights to both Sara Lee and Hostess, so Dave’s Killer Bread is really made by the people who might be about to revive the Twinkie which is after all the reason why Dan White killed Harvey Milk and George Moscone in San Francisco in the 1970’s. So I guess it makes sense that Dave’s Killer Bread really is Killer.
      
      https://modernfarmer.com/2014/04/illustrated-odd-true-tale-dave-killers-bread/
      
      Wait, what were we talking about again?
    - Daniel Lakeland on June 6, 2016 at 9:02 pm said:
      
      Because somehow the take-away message seems to be ban the use of p values and forking paths or whatever just like you can’t actually put marble dust into the bread flower anymore the way that they did when my father in law was in Mussolini’s Italy. But, I’m not quite following how that’s going to fix academia.
- Andrew on June 6, 2016 at 9:16 pm said:
  
  As an amateur baker myself, I love the direction of this thread!
  
- Daniel Lakeland on June 7, 2016 at 12:40 pm said:
  
  Ok, let’s go back to the Baker analogy. Here are some additional thoughts I’ve had.
  
  One of the issues here is asymmetry of information. Specifically, if a baker knows he’s putting marble dust in the flour but the buyers don’t, then his lower price seems like a deal to the buyer.
  
  By analogy, if a scientist produces regular “discoveries” but nobody knows that they’re unreproducible crap, then the lower price of discoveries seems like a deal.
  
  The partial solution in Baking is to require that people list all their ingredients. The problem with applying that to science is that baking ingredients are an objective easy to quantify thing, whereas science quality isn’t.
  
  Next comes the problem of agency. If you pay someone $500 to buy your groceries for the week, and you tell them “buy 100 pounds of food and spend all the money” and they come back with 100 pounds of soybean oil and tell you they spent all the money, you pretty much know you got screwed. But if they come back with 100 pounds of mixed groceries, having bought something cheap that looked healthy every time they had a chance to buy something healthy and pocketed the difference, you’d still be getting screwed, just less obviously. Similarly if they went to their brother-in-law’s store and bought stuff at high prices, splitting the profits with their brother-in-law and giving you lower quality than you would have gotten if they’d gone to a competitor store… you’re getting screwed.
  
  When it comes to science, the way things get funded is to allocate a pot of money to essentially the brothers-in-law of the scientists applying for funding and then ask this group to choose the best things to fund. These brothers-in-law choose “best” based on criteria like “has a lot of publications in sexy journals” and “is doing similar work to what I do and I met them at a conference and I had a great time talking to them about my own work while drinking Tequila in Cancun” (and if you think that’s not really how it’s done, then you’re deluded). Furthermore, anyone who is doing innovative work that threatens the established way of doing things will not get funded because it’s “not achievable” or they have “too few publications” or whatever.
  
  So, how do we fight this system? The first thing is to acknowledge that we have little information about what is and what isn’t good. The second thing is to acknowledge that we have an agency problem that perpetuates low quality research in a “scratch my back” cycle. One solution is to simply limit the effect that those things can have, à la randomized selection of research grants, using the expert feedback as only a small bias on the probability of being chosen.
  
  https://models.street-artists.org/2015/10/19/a-much-better-grant-evaluation-algorithm/
  
  The incentive then becomes to put in as many grants as possible, so it might make sense to further limit the rate at which you can apply.
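
The randomized-selection idea above can be sketched as a weighted lottery in which reviewer scores only nudge each proposal's chance of being drawn. The specific weighting rule (a baseline weight of 1 plus at most `bias` for the top-scored proposal) and all names below are assumptions made for illustration; the linked post may define the algorithm differently.

```python
import random


def selection_weights(scores, bias=0.2):
    """Map reviewer scores to lottery weights in the range [1, 1 + bias].

    Assumed rule: every proposal gets weight 1, plus a bonus that grows
    linearly with its score and tops out at `bias` for the best score.
    """
    lowest, highest = min(scores), max(scores)
    span = highest - lowest
    if span == 0:
        return [1.0] * len(scores)
    return [1.0 + bias * (s - lowest) / span for s in scores]


def run_lottery(proposals, scores, n_awards, bias=0.2, seed=None):
    """Draw winners without replacement, with scores as a small bias."""
    rng = random.Random(seed)
    pool = list(zip(proposals, selection_weights(scores, bias)))
    winners = []
    for _ in range(min(n_awards, len(pool))):
        weights = [w for _, w in pool]
        # Pick one index in proportion to the remaining weights.
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        winners.append(pool.pop(idx)[0])
    return winners


if __name__ == "__main__":
    proposals = ["A", "B", "C", "D", "E"]
    scores = [3.0, 9.0, 5.5, 7.0, 1.0]
    print(run_lottery(proposals, scores, n_awards=2, seed=1))
```

With `bias=0.2`, the top-scored proposal is 20% more likely than the lowest-scored one to be picked on any draw in which both remain, so expert feedback tilts the odds without determining the outcome. A per-applicant cap on submissions, as suggested above, would be applied to the proposal list before calling `run_lottery`.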
  
  - Anoneuoid on June 7, 2016 at 1:37 pm said:
    
    “The problem with applying that to science is that baking ingredients are an objective easy to quantify thing, whereas science quality isn’t.”
    
    Is it that hard?
    
    1) Can multiple other people often get similar results? Good job, you have figured out a reliable way to measure some relatively stable aspect of the world.
    
    2) Can your theory/model make precise predictions about something other than what motivated the theory? Great, your theory/model looks like it will be useful to predict the future.
    
    I don’t think there is anything novel about either criterion. For sure, many research fields (like psychology, medical research) seem to have abandoned the quest to meet them, which can lead to a fair amount of cognitive dissonance.
    
    Instead it is popular to claim it is “too expensive to replicate” or the subject matter is “too complicated to make precise predictions”, along with other excuses/insults for those who actually try to do science (“research parasites” for people who try to come up with mathematical models, “replication bullies” for those who try to get enough info to replicate these studies).
    
  - Rahul on June 7, 2016 at 1:41 pm said:
    
    Isn’t what you described just the Principal Agent problem?
    
    - Daniel Lakeland on June 7, 2016 at 1:46 pm said:
      
      Two problems: the first is asymmetry of information (what constitutes true and useful science is tricky), and the second is Principal-Agent (who gives out the money vs. who pays). They’re well-known problems, but they don’t have a single easy fix.
    - Rahul on June 7, 2016 at 1:48 pm said:
      
      Correct. I’m just waiting for this to turn into a Market for Lemons. Perhaps Psych already has?
    - Daniel Lakeland on June 7, 2016 at 1:59 pm said:
      
      I love it. Wasn’t familiar with that paper until now, but in my opinion all of US academic science is on that route; different subjects have gone different distances down that path, but they’re all going down it. People have been using “quality of the journal” as an information signal for a long time, but it’s becoming a poorer and poorer signal as the incentives at Nature and Science and PNAS and so forth are also becoming skewed. A feedback loop between who gets hired based on where they publish and who gets published based on where they were hired maintains the reputation of these journals even now that they publish crap on a stick regularly.
    - Rahul on June 7, 2016 at 2:05 pm said:
      
      The best will eventually start leaving, or already have, lest they get tarred with the same brush.
      
      e.g. Even if I was really good & passionate about Soc. Psych. do I really want to be associated with that field right now? It’s like having Cold Fusion on your Resume.
    - Daniel Lakeland on June 7, 2016 at 3:26 pm said:
      
      This isn’t just happening in social psych. I see it in biology: people graduating with PhDs and postdocs are being told by their PIs that the PIs don’t recommend that they go into academia. I see it in Engineering. If you aren’t willing to write engineering grants to run massive climate simulations on $500M worth of computing hardware (because with that many resources it MUST be important, right?) or write papers about how you’re going to solve pollution problems by millions of remote sensors installed in everyone’s vehicles and connected to the internet by cell phones… then you might as well get out. Not so much because your reputation will suffer, but because you aren’t going to get funding, you won’t get tenure, and then you’ll be out on your ear in 4 or 5 years anyway.
      
      In Biology, you’d better be writing grants on how you’re going to measure the bacterial cultures in the seats of airplanes and detect bio-terrorism agents, or design cancer drugs by sequencing the genomes of thousands of tumors and doing “Big Data” genomic analyses, or whatever. If it’s expensive, fancy sounding, and produces lots of probably meaningless papers, it can get funding much more easily than carefully controlled experiments on mice that take 5 to 7 years of colony management and breeding and developing a surgical technique and actually finding out how to prevent kidney damage or whatever.
Apple on June 6, 2016 at 7:27 am said:

“An incentive structure that rewards publication quantity will, in the absence of countervailing forces, select for methods that produce the greatest number of publishable results. This in turn will lead to the natural selection of poor methods and increasingly high false discovery rates.”

Here is what I don’t understand:

1) What if there were stricter rules by journals (e.g. demand power of .80, pre-registration), so you could say that the output of this system is of “high quality”.

2) What if the incentive structure of rewarding publication quantity remains in place.

Then a result of this system might be that researchers start collaborating more in order to achieve high-powered research and in order to maximize the quantity of their output. For science, this can be considered a “good” thing, I would say (?).

So, my point is that I don’t understand all the focus on “incentives” like quantity. This only becomes a problem when the quality is low. So, the solution seems really simple to me: enforce rules which automatically increase the quality of all the work produced. Then all the incentives that are supposed to be having an effect at the moment can remain the same, and perhaps the case can even be made that they are useful incentives (e.g. why not reward a researcher whose output of high quality work is 10x larger than his/her colleague’s? why not reward a researcher who comes up with novel high quality work?).

Where am I going wrong in my reasoning?

- Keith O'Rourke on June 6, 2016 at 9:33 am said:
  
  > enforce rules which automatically increase the quality of all the work produced
  Try to work through something that would actually increase the quality (the first use of published study quality guidelines/indexes seemed to be by authors of poor studies learning what to say they did in order to increase the probability of publication).
  
  Assessing the quality of research without access to the raw data, study protocol and amendments (or actually doing replications) is close to hopeless; some discussion of that here: https://biostatistics.oxfordjournals.org/content/2/4/463.full.pdf
  
  - Apple on June 6, 2016 at 10:19 am said:
    
    I am under the assumption that there are ways to increase the quality of your work and for others to easily assess this. I thought increasing power and pre-registering your analyses were two examples of this.
    
    If it is not possible to increase the quality of your work via examples like these, then my reasoning makes no sense of course.
    
    If there is some consensus regarding ways to increase and assess the quality of your work (be it my two examples or other ones), my above reasoning still stands for now: it’s not the incentives that are the problem, it’s the standards of publication.
    
Michael Koksharov on January 2, 2026 at 7:16 pm said:

Thanks for the nice summary!
