Gresham’s Law of experimental methods

A cognitive scientist writes:

You’ll be interested to see a comment from one of my students, who’s trying to follow all your advice:

It’s hard to see all this bullshit in top journals, while I see that if I do things right, it takes a long time, and I don’t have the beautiful results these top journals want, even though I did a ton of experiments…

It’s an interesting situation: we either have to essentially fake our results or be doomed to taking a back seat in the scientific debates, because our papers won’t come out in top journals if they don’t have crystal-clear results. It doesn’t matter to me any more, but it will matter to young people starting out.

Almost every dataset I have reanalyzed from other people’s published work has failed to pan out; in every case there was p-value hacking or forking paths of one sort or another.

Indeed, if people can get published in a top journal by conducting a two-month exercise involving a Mechanical Turk survey, a burst of data analysis, and some well-placed theory, then why conduct a long research project involving careful data collection?

The answer has to be that you have to have a deeper motivation. Your aim has to be to do the best possible work. If you get published in Psychological Science, fine—indeed, a lot of excellent work does get published in these journals—but publication in those journals can’t be your goal.

My colleague continues:

If you have any suggestions on how these people starting out their careers can move forward without doing all this unethical stuff with their data, it’d be a big step forward.

To me, the hard part about doing things right is not the analysis, it’s the data collection. When it comes to design and analyses of studies, I recommend moving to a paradigm in which researchers seek to push their models hard and find problems with their theories, rather than the currently standard approach in which researchers try to find confirming evidence for their theories by rejecting straw-man null hypotheses.

P.S. Six years later, an update from our anonymous correspondent:

– it is absolutely possible to avoid any of the usual bad practices and still have a career and a good publication record, but one has to work harder and be persistent and patient (many more rejections)

– the main reason people don’t go down this route is that they don’t know what the issue is; this comes from a broken education system and poorly trained (in data interpretation) advisors and editors. Once people become senior they can’t turn around and say, wait a minute, everything I have been doing so far is wrong; this leads to entrenchment effects

– one price to be paid for telling the story in a paper without embellishment and creative wording is that one is sometimes unable to publish in prestige journals; I am comfortable with that, but it can cost early-career postdocs cushy jobs, and they may have to settle for less prestigious universities

– the adversarial (towards one’s own ideas) approach that Andrew suggested in his blog post is very important and really works well, but it does mean that most of one’s theories are going to come out wrong; I don’t know many people (actually, I can’t think of a single person in my field) who are willing to give up on one of their own scientific proposals in the face of counterevidence (side conditions are created that allow the theory to live)

– however, one does have to be publishing steadily (say one paper a year on average for an early career postdoc without a big lab or huge resources) to stand any chance of getting a tenured job—that reality is impossible to change

58 thoughts on “Gresham’s Law of experimental methods”

  1. One major problem I see here is that data collection, when done properly, is time consuming, and it is also harder to get money for.

    Graduate students (like myself), post-docs, and anyone on the tenure track have fairly constrained timelines. Being early-career and less well funded, they also have less clout to be able to do expensive projects that collect data well. Publish-or-perish is a real problem specifically because pumping out analyses based on smaller, quickly gathered samples or pre-existing data is much faster and cheaper than trying to collect data properly.

    • David:

      Yes, that’s one way that Science, Nature, PPNAS, Psych Science, and all the other “letters”-style journals are poisoning the well: by offering big rewards for small, cute studies. If you can do a Mechanical Turk experiment and write it up in 2 weeks, and there’s a 10% chance, say, of getting it published in a top journal, the cost-benefit analysis almost forces you to go for it.

      And then when you’re out there looking for a job, you have to compete with whoever has recently released a PPNAS paper on power pose or himmicanes or whatever.
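The cost-benefit intuition in this comment can be sketched in a few lines. All the numbers below (study durations, acceptance probabilities) are made-up illustrative assumptions, not data from the post:

```python
# Illustrative sketch of the incentive math described above.
# Every number here is a made-up assumption for illustration only.

def expected_top_papers_per_year(weeks_per_study: float, p_top_journal: float) -> float:
    """Expected top-journal papers per year of full-time effort."""
    studies_per_year = 52 / weeks_per_study
    return studies_per_year * p_top_journal

# Quick Mechanical Turk study: ~2 weeks of work, say a 10% shot at a top journal.
quick = expected_top_papers_per_year(2, 0.10)      # about 2.6 per year

# Careful multi-year project: ~2 years of work, even granting a 50% shot.
careful = expected_top_papers_per_year(104, 0.50)  # 0.25 per year

print(f"quick: {quick:.2f}, careful: {careful:.2f}")
```

On these hypothetical numbers the quick-study strategy wins by an order of magnitude, which is the sense in which the incentives “almost force you to go for it.”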

      • I think the biggest challenge is how to get into the academic arena and stay there.

        I did get pushed out of my last position in clinical research for various reasons, but if I had gamed for more publications at the time, it would have been much harder for them to push me out.

        If you can get in and obtain some staying power, you can then do some good work.

    • This avoids the real world aspect of the problem: How to get all the layers of the current academic hierarchy to change their mindset/usual practices.

  2. The real problem is that there are far too many grad students/post-docs compared to faculty positions, and grant applications compared to grant money. So you have the same problem that you have with any job that gets a lot of applications: you have to stand out somehow to get your foot in the door. Right now you do that by publishing a lot or in big journals (or both!). So students could do good work properly, but if it isn’t “sexy,” they still have to do it quickly enough to pile up publications and hope that gets their application through the CV cut. Even then there’s no guarantee; I worked with a post-doc who literally had a first-author Science paper (on solid work) and couldn’t get an academic job.

    Rahul’s suggestion sounds good, but then how do universities make hiring decisions? Reference letters, which are likely almost universally glowing? I think that having less of an emphasis on publications would be great, but it’s a practical problem.

  3. At least in my field (molecular biology), publishing in top journals is not as important for an academic career as most junior scientists think it is. Plenty of people get faculty jobs or tenure without a paper in one of the “best” journals. Reminding junior scientists of this might help.

    PS to Alex — the people making the hiring decisions should actually read the candidate’s papers! Although that does take a lot of time.

    • You’re right in principle, and I’m very pleased that my department in general does evaluate “content” rather than “glamor” in publications, but:
      200 applications x 3 papers each x 30 minutes per paper = 300 hours of reading for a search committee member. How exactly is this going to happen?

      • +1 in principle — but it’s actually not quite that bad; the 200 (or whatever the initial number of applications is) can usually be pared down to a smaller number of applicants whose papers need to be read. Or only applications with an advocate (i.e., someone who has read the candidate’s papers and given a brief review of them) will be considered by the committee. (Even then there’s a lot of total time spent reading the papers.)

        • I’ve been on hiring committees in Germany, and we usually ask for 2-4 representative papers from each candidate after paring the list down to a first short list. Then we divide up the labor between the committee members; it’s completely doable. Of course, we don’t get as many applicants as in physics say.

      • Why does every committee member have to read all applications? Division of labor?

        Also, *must* we read all papers? You could ask candidates to identify what they feel is their best work. Or pick one at random. Or assign different papers to different committee members. Or something.

        It doesn’t seem to be an insurmountable problem.

        • People use the name of the journal a paper is published in as a filter. So if you published a couple of Nature papers you get filtered into the FANCY category. Only THEN, after the 200 applicants are filtered down to the 5 or 10 fanciest, do they go through and start considering what to read.

          It’s broken, but so is getting 200 applicants for every job.

          Is it insurmountable? No, but it requires a lot of political realignment to accomplish something better.

          Also, people don’t just want a good filter, they also want a filter they can game… the postdocs and things are aiming at Nature papers because it’s a hurdle they can game, like HS Juniors taking SAT prep courses.

        • Rahul:

          I don’t know but I’m guessing that Science, Nature, and PPNAS are pretty good when it comes to their core competence in biology and maybe in other fields like chemistry and physics. But when they go for the social and behavioral sciences they have a weakness for simplistic “breakthroughs.”

          To put it another way, they give too much respect to statistical findings and are too ready to believe anything that has a “p less than .05” attached to it.

          But the big problem, I think, is that their criterion for acceptance of papers is “importance” rather than “correctness.”

        • What should be the criteria for using a paper to “accept” a faculty candidate? Importance? Correctness? Something else?

        • So, we are saying that they do an unusually bad job at being a filter for the Soc. Sci. papers.

          Fine. So, if this opinion is widely held, why do the faculty in the Soc. Sci. Depts. persist in giving Nat Sci papers undue importance? Can’t they switch to using “filters” that do a good job? Which Journals may those be?

        • Rahul said: “So, if this opinion is widely held, why do the faculty in the Soc. Sci. Depts. persist in giving Nat Sci papers undue importance? Can’t they switch to using “filters” that do a good job?”

          Many of the faculty in the social sciences may not accept that anything is wrong with the current system; they may be tied into it and like it the way it is.

        • They’re not, it’s just that the criteria being used are:

          1) Can a person get funding?
          2) Are they doing something that sounds fancy and can produce good press releases?
          3) Do they have the support of your best buddies in the field?
          4) Will they make you look bad?
          5) Will they play along to get along at the academic game?

          All of those are signaled pretty well by Nature/Science/PPNAS papers.

        • @Daniel:

          In which case, what are we agonizing over?

          It sounds like Soc. Sci. Depts. are getting the sort of candidates they want / deserve.

        • @Rahul: at your and my expense!

          I don’t mind paying for actual scientific research, but I really mind paying for stuff that we know has no chance of furthering our knowledge even before the experiment / analysis is performed.

          I also mind a lot that people doing poor science are out-competing people who might do good science because the evaluation criteria are all wrong from the customer point of view (us, taxpayers).

          If you were a philanthropist giving money to the trustees of your foundation to further a reduction in breast cancer deaths, and they had been spending the money on early detection techniques for 30 years, and they were doing a great job of early detection, but deaths had stayed constant for decades… and by the way the trustees were all developers of early detection techniques… would you feel cheated?

          The situation is similar in academia. Academics are supposed to further real knowledge; they apply for grants and evaluate each other’s grants. But if they are evaluating on the criteria I listed, then it isn’t furthering knowledge, it’s feathering the nests of the departments.

          There are a lot of academics, so we can’t apply the average case to each individual, but we can sure as heck apply it to the system as a whole. It fleeces people. Did it ALWAYS fleece people? Maybe not, I don’t know.

          We just got finished discussing how incentives may have strongly affected hospital performance. How are incentives affecting research performance?

    • Morgan – that would be great, but it seems unlikely at best. When I was on the market a couple years ago, the psychology positions I was applying to were getting about 100 applicants, on average. If each applicant had as few as 5 papers, which seems like a small number, then the department would have to drum up time to read 500 papers on top of all the demands already placed on a research professor. Maybe you could read some or all of them after you’ve cut your list to the top 5 or 10, but at that point you’ve likely already made decisions based on publishing records.

  4. The thing I think you’re missing Andrew is that in some fields, a LOT of money is required to do good science. 5 years of housing 200 cages of mice in a colony at $1/day/cage is $365,000 and I’d guess that’s about 1/4 of the cost of doing a good biology project, maybe as little as 1/10 for some projects.

    How do you compare that to someone who hacks up some microarray data with 2 biological replicates in a petri dish and claims to have found “the cause for cancer” or some garbage like that?
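The back-of-the-envelope arithmetic in this comment can be checked directly (the $1/day/cage rate and the 1/4 fraction are the commenter’s own guesses):

```python
# Back-of-the-envelope cost of the mouse-colony example above.
# The daily rate and the "housing is ~1/4 of total cost" fraction
# are the commenter's guesses, not real figures.
cages = 200
dollars_per_cage_per_day = 1
years = 5

housing = cages * dollars_per_cage_per_day * 365 * years
print(housing)  # 365000 dollars, for housing alone

# Scaling up by the guessed 1/4 fraction gives a rough total project cost:
total_estimate = 4 * housing
print(total_estimate)  # 1460000 dollars
```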

  5. When I’ve advised students to do the right thing, the near-inevitable result has been at least a small meta-analysis in the introduction to their paper. These are very valuable and make getting published in top journals much easier for them. This “trick” won’t work for long, because eventually you’ll just be building on someone else’s meta-analysis and it will be commonplace. But for now, and for several years to come, advising grad students to do that as part of their paper is a great way to increase the paper’s publishability, impact, and citation rate.

  6. I’d advise thinking outside the box. Academic ≠ Scientist.

    Work on something important, get grants, Kickstarter, private labs, etc…

    You’ll probably end up doing much more meaningful research people care about.

  7. well said dr gelman.

    dr friston, dr calhoun, and jovo (josh vogelstein) also hammered home these salient points a few years ago.

    was hard to accept at the time, given my results and the utility of roweis&i’s contribution, but so be it.

    to be fair, my case may have been the “baroque” writing style, which i have no control over. nonetheless, i felt that the results and contribution were sufficiently useful that they should have overcome any disposition to my prose.

    the sad thing is i cannot write any other way. in canada we weren’t really taught how to write a certain way. i’m one of many who has a distinct writing style. i guess a sophisticated writing style can clash with high-impact journals given that a non-trivial majority of their readership probably did not learn english as their first language.

    patience, perseverance, and some stoicism are crucial to seeing the entire process through.

  8. As a junior faculty member who a) refuses to publish work they don’t believe in (because at that point why would I want this job) and b) would like to some day be tenured (because I like doing this job when I can do it ethically and with self-respect), I’ve been thinking quite a bit about this problem. My current strategy involves trying to focus on projects that are interesting regardless of the result.

    For me, that often means my papers are not written so as to provide evidence for some theory, but instead to provide a new estimate on some parameter (a parameter and estimate that is interesting regardless of the specific theory one holds), or to illuminate some new perspective on the world that allows researchers to harness some new type of identifying variation (and then apply that thinking to a particular problem).

    This keeps me from feeling pressure to get a particular result, and puts pressure on me to do the best I can to estimate the thing I’m interested in. I’ll let you all know in a few years how this strategy works out, tenure-wise. But at the very least I get to enjoy my job while I have it and do work I find meaningful and (hopefully) useful to the field. It just isn’t worth it to me to do work I don’t believe in.

  9. The Paris Catacombs are an example of combining two problems to solve both. 1. The cemeteries are full and unhealthy. 2. There are all these limestone mines going God-knows-where that are starting to collapse. Solution: Shore up the limestone tunnels and move the cemeteries’ contents down there. Two problems solved!

    In looking for a complementary problem to the problem of sloppy research being published in order to chase tenure/grants, perhaps we should look back in time, to the time when lots of science was done by British clergymen. “Early 19th-century science was dominated by clergymen-scientists” (http://www.theguardian.com/commentisfree/belief/2009/apr/24/religion-science-creationism-reiss). Recall that Bayes was a Presbyterian minister. In many instances, being a clergyman provided a reasonable income, and the intellectually curious were much better advised to investigate math or natural science than to poke around in theology. Doing science as a hobby meant that in many cases you could take years to refine your thought and perform your observations.

    This time hasn’t necessarily passed. The Catholic Church is very short of priests and nuns, for example, and is reasonably tolerant of science (evolution is OK, for example). There’s the issue of celibacy and/or vows of poverty and chastity, but these are surely minor things to people who are dedicated to scientific inquiry.

    • Interesting, if unusual :), way to separate conflicting goals – doing good research and needing an income to survive. Of course, the arts have patronage, and even a modern social version via patreon.com, whereby an artist can gather a reasonable income from a large group of supporters. Think of it like an ongoing Kickstarter or GoFundMe.

      Of course, there are other options, like my alma mater, Hampshire College, which doesn’t offer tenure and is more focused on teaching than research. The professors still do research, but there’s not a publish-or-perish mentality.

      • Yes, Adam, you’ve caught the gist of the idea. We have many posts about how the incentive system is currently screwed up, but it’s hard to see how the current system is likely to change from within. In brief, this is because the institutions have incentives to keep it the way it currently works. So the western academic model of knowledge generation has developed what seem to be inherent problems. And research done in commercial settings will inevitably be aimed mostly at narrower commercial issues (unless, perhaps, it’s part of a tech monopoly like Bell Labs was).

        For truly radical change within your lifetime (assuming the statistical likelihood that you are decades younger than I am) some alternative/competing source of scientific knowledge generation needs to be there, IMO.

        I’m not really suggesting chastity for scientists (although as I write this, it is April Fools Day) just pointing out that different models of scientific support have led to some pretty good science in the past — so some alternative model to Big Academics might emerge to improve the future.

        Thought experiment: how good would “On the Origin of Species” have been if Darwin had been up for tenure review in 1837, after returning from the 1831–1836 voyage of the Beagle? Or would he have just churned out a bunch of papers on the individual characteristics of various finches?

        • Perhaps the expectations of the system are wrong? That is, in my role, I assume that any randomly selected software developer I deal with is “average,” even though developers as a whole are likely brighter than, say, a random person in the US. The same ought to be true for academics: lots of smart people (compared to the overall population) who, compared to their peer group, are likely to be “average.” The result, as with programmers, is that we all assume we’re the ones who are above average and disparage each other’s work, yet lack the capability to see the mediocrity within.

          Supposing that’s true, I’d expect even top journals to be filled mostly with, on average, average contributions to the field… perhaps little tidbits, a nudge in this direction or the other, punctuated by breakthroughs from time to time. It’s a distribution, so sure, assuming variation in performance both within and between individuals, there’s going to be some “below average” stuff that gets published. It’s easy to find and pick on, but it’s just a reality of any system with variation in it. And if journals upped their game to include better stuff (or exclude bad stuff), the average would move up, and what was once average would now be below average. And we could still complain about it.

          The alternative being proposed – that the system is set up to create dysfunctional behavior – would generate the outcome we see, but is that necessary to produce the current state, or could a less nefarious explanation be the case?

        • Adam:

          Yes, I discussed this expectations thing here. Part of the problem is that the statistical-significance system can result in pure noise being published, over and over.

        • I don’t disagree that that will happen. Yet it strikes me as a bit of declinism – imagining that today’s science is so much worse than what was practiced before this academic system or NHST or whatever was in place. The counter-evidence is in what mankind has continued to accomplish – in medicine, in physics, etc. – despite the flawed system we imagine it to be. Perhaps a better comparison (totally off the cuff) would be: does academia or private industry produce better progress in your field? The day it’s private industry, I’d say you have a problem with the academic system.

          What I take from others’ comments is that if you do good work, for the personal benefit of doing good work, then your contributions will (hopefully) raise the bar in your field of study. If so, the expected standard of work in the field will improve when new, useful findings arise from good research, and today’s below-average work will fall to the bottom. And then the cycle repeats. But the average people will always be there relative to the current standard, doing work that we deride for countless reasons – from falsification to poor methodology to well-intentioned-but-just-kooky and everything in between.

          Maybe I’m just too much of an optimist.

        • Fair question… I guess that depends on the time period. :) Since Darwin… average life expectancy, vaccinations, chemotherapy, pacemakers, and on and on. In a shorter time period, 15 years ago what required a major open heart surgery for my father was replaced by a minor procedure and overnight stay for my mother-in-law for (from what I can tell as a layperson) a problem that was substantially similar. My coworker just had a heart valve replaced but the valve was grown from animal cells rather than being human made.

          I suspect that once you start solving major problems (like reducing infections through hand washing), the classes of remaining problems get harder to solve, and thus I’d expect progress to slow. But again, as an outsider I’m still impressed.

        • Thanks, a lot of that was developed quite a while ago. From my reading, NHST-thinking really didn’t take hold in medicine until the 1980s, although it depends exactly what area you look at.

          I’m always interested in how successful research is done, if you could share the specific names of the procedures you mention that would be great. If it is a hassle don’t bother though.

        • On the exact technical names for the procedures, I have no idea. Both my father and my mother-in-law had several mostly (90% or so, if I recall) blocked arteries. My father had a quadruple bypass; my mother-in-law had stents put in. But again, that’s the layperson’s definition of what happened. Not sure on the valve replacement.

          I guess from my perspective I see things happening that improve my life and assume they’re the result of gains in research in many fields. Perhaps I’m mistaken and some measure of progress would indicate that it has indeed stalled.

        • Commenting on Adam’s mention of blocked arteries and bypass vs stents: A friend recently was diagnosed with congestive heart failure. From what the cardiologist said, I got the impression that whether or not a stent could be used depended on how much blockage there was in the artery; but just what degree of blockage is treatable with a stent, I don’t know.
          But my impression from my brother (who was diagnosed with CHF more than twenty years ago, and is still alive) is that treatment options have improved considerably since then; he expected not to live as long as he has.

  10. The closest thing I have to advice is similar to jrc’s. I’ve recently made the jump from postdoc to junior faculty and share (a) and (b) of jrc’s values/goals. Sticking to (a) hasn’t always been easy but again – why would you want the job otherwise?

    More generally – be willing to fail, or at least lose some battles, and focus on the process itself. Accept that an experiment might not work, that you might not get a Nature paper, that you might not get a tenured position, that you might spend 30 years on a theory that turns out to be wrong, that you might not have the ‘right’ metrics, etc.

    Just focus on the process and doing your best work. Then see what happens.

      • Hi David, thanks for the link. I’m not sure I understand it, though. The implication seems to be that if I choose the right rational way of thinking then I will a) accept the many-worlds QM interpretation and b) avoid doing useless things for decades.

        But if I accept (a) then surely there are plenty of worlds in which I think correctly but do useless things for decades? Perhaps I’m just a pessimist but I actually have reconciled myself with a’) the Copenhagen interpretation of QM (actually I’ve become quite fond of it) and b’) the possibility of failure.

        Interesting read though – I might be wrong after all!

        • The point isn’t that many-worlds is correct; it’s that it illustrates why conventional science can lead to dead ends, simply because it doesn’t appreciate a better method that exists – namely, Bayesian confirmation theory and the idea that you should pre-select the most promising avenues (those most likely to update your priors significantly) rather than just do useless things for decades.

          So I’m unclear why you’re OK with doing useless things for decades, given that you could do better if you’re willing to change your viewpoint.

  11. Solution: grant-giving committees should have claw-back provisions for studies that fail to replicate. Once historical grants become personal liabilities to bodgy researchers, the problem will vanish.

    • Jim:

      The funny thing is, I get a lot of push-back even against the idea of dinging someone’s reputation for hyping something that didn’t replicate. We had this discussion on the replication group listserv last year: someone said that, in trying to run replications, we should make it clear that we have no intention of hurting anyone’s career. And I said: But it’s a two-way street, right? If somebody gets a publication in a top journal, that helps their career, right? So if that publication is revealed to be seriously flawed, that should be a minus. Otherwise the incentives are all wrong. As I wrote somewhere else on this thread: think of the careers of all the people who don’t do sloppy research, out of some combination of scruple and statistical understanding.

  12. How far can this analogy be taken? Here is a quick sketch:

    1) Publications are the currency (coins) of academia; they are supposed to contain a minimum amount of value, but this has been gradually debased by NHST.

    2) Just as the population of coins is of heterogeneous intrinsic value, because coins may be clipped, counterfeited, or produced in different eras, so is the research literature. Assaying the actual contents of either is not free.

    3) Coins have different nominal denominations, which may (often enough to use as a heuristic but not necessarily) coincide with the actual value. Likewise, publications appear in journals with various levels of prestige.

    4) The nominal value of a NHST-driven publication is much, much higher than the actual value.

    5) Lots (how much?) of useful research gets classified or locked up as intellectual property or trade secrets. Also, people may keep their best ideas/data secret for other reasons: why publish that necessary methodological trick that helps the competition if you don’t need to? Just get some stars by your p-values and write it up.
