Last month we discussed an opinion piece by Mina Bissell, a nationally recognized leader in cancer biology. Bissell argued that there was too much of a push to replicate scientific findings. I disagreed, arguing that scientists should want others to be able to replicate their research, that it’s in everyone’s interest if replication can be done as fast and reliably as possible, and that if a published finding cannot be easily replicated, this is at best a failure of communication (in that the conditions for successful replication have not been clearly expressed), or possibly a fragile finding (that is, a phenomenon that appears under some conditions but not others), or at worst a plain old mistake (possibly associated with lab error, or maybe with statistical error of some sort, such as jumping to certainty based on a statistically significant claim that arose from multiple comparisons).
So we disagreed. Fair enough. But I got to thinking about a possible source of our differences, arising from the different social and economic structures of our scientific fields. I thought about this after receiving the following in an email from a colleague:
The people who dominate both the natural and social sciences primarily think in terms of reputation and career. They think that the point of making a scientific discovery is to publish a paper and further your career.
Such people are, basically, impatient, and empathize with impatience. So instead of celebrating the fact that it ONLY takes one year to replicate a ten-year study (taking as a given that before you proceed as though the ten-year study is true you put in the 10-9 years of work to make sure that you understand the result), they lament the additional year it’ll take just to confirm that something is true.
She obviously identifies strongly with people “who were perhaps operating under a multi-year federal grant and aiming for a high-profile publication” and is just irritated with the idea that ordinary scientists should be able to replicate easily, several times, what they’ve done once, for the first time in human history. But the whole point of expensive, high-profile research is to save those who don’t have the same funding the trouble of making the discoveries themselves. The discovery is precisely supposed to be something you can demonstrate in an ordinary workaday lab … or it just ain’t yet scientifically “demonstrated”.
I agree, and I think I resonate with this because I’m that way too. (Consider, for example, my hobby-like goal of publishing papers in over 100 journals, or my habit of repeatedly googling myself, etc etc.) In fairness, bio apparently is a more hierarchical, pyramid-shaped field than statistics: bigger rewards at the apex (the top academic biologists have budgets far exceeding those of the top academic statisticians), while at the bottom I think stat PhD graduates get higher salaries and more opportunities than the comparable graduates in biology. So, in bio, status counts for more; also perhaps there’s more insecurity, even at the top, that you might slip down if you’re not careful. Also perhaps more motivation for people of lower ranks to make a reputation by tangling with someone higher up. Put it all together and you have some toxic politics, much different from what you’ll see in a flatter field such as statistics. Or so I speculate.
I think the mistake you may be making is to think of Mina Bissell as representative of biologists as a profession. My model is that there are similar Mina Bissells in almost every field: the high-profile, highly funded elite who find it annoying when others ask to replicate, or in other ways scrutinize, their findings.
I’d make this more of a “Mina Bissell vs. the rest of us” dichotomy rather than Andrew’s Statistics-vs-Biology.
I do think biology and statistics are different. The top biologists have huge labs, they have factories. Even asst profs in biology typically have a bunch of postdocs and a lab named after themselves (e.g., Prof. Smith will have something that’s actually called “the Smith Lab”). These are little empires.
In contrast, sure, some statisticians hire postdocs and are involved in big projects. But a lot of statistics professors, even the most successful and important ones, just work on their own or with a few students. Brad Efron’s managed to make a contribution or two over the years without needing an “Efron Lab” with an army of assistants.
My point is the opposite actually: Many of the biologists (even the ones with huge factory labs) do not agree with the sentiments expressed by Mina Bissell.
Let’s not consider Mina Bissell to be the norm for the field. There might indeed be a scale difference between statistics and biology, but this scale disparity doesn’t necessarily translate into an aversion to replication.
Again, I don’t think the opinions expressed by Mina Bissell are typical for the field & some biologists I’ve discussed this with are quite appalled by her article.
I agree with Rahul that many or most biologists are not of the same opinion as Mina Bissell. Most of my bio friends are very interested in replication. Most of them actually do replicate their own findings several times before publishing.
That being said, you’re absolutely right, in biology “the Smith Lab” is how it works. You can’t do biology experiments hands on all day long and also run a lab. It’s critically important for biology PIs to find people with “good hands”. People who can make biology experiments happen: running reactions, sectioning samples, doing microsurgery, staining things, diluting appropriately etc.
Imagine if, as a statistics professor, you first had to personally go door to door across the entire country collecting survey responses before you could even think about the models and software and things you are going to use to explain how voting works in the US … that’s a little like running a biology lab.
The reality is that it is important to be identified as a winner rather than an also-ran in one’s career. There are often many investigations of a given question in medicine. If one study out of dozens shows a “statistically significant” result, some might say it is practically inevitable that some study will give good results given the large number of studies, and that it should be duplicated. The producers of the “good” results will claim that it represents a big step in the right direction. We see this all the time in oncology, when a study produces a few weeks of improvement against a background of dozens of negative studies. Somebody won the Powerball today for $425 million; that person will congratulate themselves on their wisdom and not call for a redo of that drawing.
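The “practically inevitable” point above is just arithmetic: if dozens of independent studies each test a true-null effect at the conventional 0.05 level, the chance that at least one comes up “significant” grows fast. A quick sketch (my illustration, assuming independent tests; not part of the original comment):

```python
# Probability that at least one of n independent true-null studies
# crosses the alpha = 0.05 significance threshold by chance alone.
alpha = 0.05
for n_studies in (1, 10, 20, 50):
    p_at_least_one = 1 - (1 - alpha) ** n_studies
    print(f"{n_studies:2d} studies: P(>=1 'significant') = {p_at_least_one:.2f}")
```

With 10 studies the chance of a spurious “winner” is already about 40%, and with 50 it is over 90% — hence the Powerball analogy.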
Replication in bio is often more like cooking than like implementing a regression (despite the rather false analogy often made between the two kinds of recipe). A world-class chef can publish a recipe or take a video of him or herself whisking egg whites, but that doesn’t mean I can walk in off the street and make a great soufflé by following the recipe (I’ve tried and know this for a fact). You often have to go to the kitchen and learn how they do it. And spend a lot of time practicing and being critiqued. Isn’t that why you (Andrew) have people over to your kitchen to show you how to make things? (Maybe we should all take time out to watch Jiro Dreams of Sushi or read The Making of a Chef.)
Let’s take an example of that tarte alsacienne you tried to make the other night. You tried to follow the recipe and it didn’t work. Had you not diagnosed the problem with your scale, you might’ve tried to publish a reply to the recipe saying it wasn’t replicable. Or you might’ve made it and had it come out worse than the original chef’s — the dough might not have risen as high, or the onions might not have been as wonderfully caramelized — whereas someone else might be able to follow the same recipe and get a great result. Sure, you could say it’s a fault of the recipe, but no matter how good the recipe is, it’s not enough by itself.
Let’s take a second example. You spent forever trying to implement EP and VB based on a bunch of different published algorithms. It wasn’t straightforward for you, with all your knowledge of programming and exponential families and ability to diagnose failures of fits. I might have just said the recipes weren’t good enough (they certainly weren’t good enough for me). Now is this a failure of the science? I think part of the worries scientists have is that someone who’s not good at it will try to replicate their result and get a false failure to replicate (type R error?).
Now a third example. Derek Jeter takes a video of himself batting and fielding. An athletic 16 year old can watch it all he wants, but he’s not likely to be able to replicate Jeter’s performance and wind up a shortstop for the Yankees. It doesn’t mean his technique doesn’t work for him or even that his technique is wrong, just that replication is a subtle and tricky business.
In the example of the world-class chef, we know they know what they’re doing because the food tastes great. The proof is in the pudding, so to speak. That is, the whole point of the procedure is to produce an end product, the food. It strikes me that biology is different in that the goal is not just to produce some compound at the end of a process but to make inferences about what steps of the process caused the compound to appear. (In case you haven’t guessed from my vague language, I’m not a biologist.) It’s as if the chef claimed not only that he knows how to make a good soufflé but that a particular ingredient is responsible for the goodness. If so much finesse is necessary to produce the soufflé in addition to the ingredient, can we have confidence in the chef’s claim that the ingredient is important?
I agree that replication can fail because of shortcomings in craftsmanship. And failure to replicate does not necessarily falsify a result. But can we really claim to understand well what was going on in an experiment if we can’t replicate it?
I think that depends what we count as understanding. I’d say that if you can invite someone to your lab for a six-month stage and teach them how to do it, then it’s replicable. The proof is in the pudding, as you say.
I completely agree it’d be better to have a nice reduction to step-by-step instructions, each of which is so crystal clear that a first-year grad student can follow them. That’s the ideal that everyone’s aiming for.
Even if it were possible, it might not be worth the effort. But as I understand biology and cooking, it’s just not possible yet. It’s going to take a breakthrough in cookbook writing and lab manual instructions for it to happen, which I think we could all agree would be a great thing for the chefs (and diners) and biologists (and patients).
It occurred to me that the science 101 definition of findings being scientific that I recall is that they _should_ be replicable by _qualified_ others with adequate resources. Some of the commenters seem to think they must always be replicated, and/or by anyone. Bob’s comments clarify some difficulties with what counts as qualified others and perhaps adequate resources (a scale that works). What is meant by “should” is also tricky to pin down exactly (19 out of 20).
But math is probably where this “[proof] _should_ be replicable [steps verified] by _qualified_ others with adequate resources” is clearest – always (except for careless mistakes) by mathematicians adequately trained in that area with access to pen and paper and protected time (perhaps today also software and computer resources).
Now think about the problems caused, even in this clearer area, by Fermat’s “trust me, I am a real mathematician; I have found this wonderful proof but there is not space to write it out in the margin” …
Just for the record, I’ve made tarte flambée lots of times and it comes out great. What happened the time we had you over for dinner was that the kitchen scale was broken and the proportions of the ingredients came out all wrong.
I know. That’s why the comment said “Had you not diagnosed the problem with your scale, …”.
What I meant was that this kind of thing can go wrong in a replication and therefore the replication can itself be in error. In other words, we need to replicate the failure to replicate. Turtles all the way down, since I seem to be stuck on metaphors.
And now I’m in a terminological bind. I tried to look up the difference between an alsacienne and a flambée, and they seem to be synonymous.
As the saying goes: Don’t let the perfect be the enemy of the good.
PS I have great success with Cooks Illustrated and their Test Kitchen recipes.
(a) I’m not trying to say don’t replicate, just that it’s a concern for people having their work replicated, and (b) this was Andrew cooking, not me, and it was an equipment failure, not a recipe failure, in this particular case.
I really don’t like Cooks Illustrated, because I think their approach of 1 1/8 cups of this and 3/8 tablespoon of that is totally misguided. You need to learn what a medium-rare steak feels like, how a dough is supposed to feel, the right rate of simmer for a braise, and the texture of a short rib when it’s done — not how many microseconds to cook it for or how many grams of flour to add to the bread dough. And lest this seem irrelevant, I think this is a similar problem to that found in replicating some bio experiments.
All those cooks who say “I don’t use a recipe, I just throw a little of this and a little of that in” are really saying that they know how to steer. Once you get the basic technique for a soup or a stir-fry or stew or a braise (or whatever) down, you really don’t need a recipe. Following a strict recipe with times and heat levels is like trying to plan a driving trip from New York to L.A. with dead reckoning. You just don’t know what’s going to come up. If you’re brown-frying onions, you don’t know how dry they are to start or their sugar level or what the stickiness properties of your pan are, so you can’t give directions like “10 minutes on medium-high heat, stir frequently” and expect them to work — you need to describe the end product and give the cook tips on how to steer their way to it correcting as they go. A reference I really like, that I think can be applied to cooking as well as programming is The Pragmatic Programmer, which has tips like “ready, fire, aim, aim, aim”.
When it comes to Spanish or Sichuan food I don’t need recipes.
I need recipes when I don’t know what I am doing. I don’t make stuffed chicken every day. Most recipes I only cook once. So I love Cooks Illustrated because when I do something I have never done before, and will likely never do again, it takes me there. Like Google Maps.
Back to science. I would say replicating someone else’s work is like reverse-engineering the recipe from the final dish, and then communicating it precisely.
PS To carry the last analogy further: in cooking, Andrew’s failure to cook a tarte flambée was directly observable and obvious. In science it is different. You see a wonderful tart, but the question is: Is it kosher, or organic, or whatever it claims to be? That requires going through the whole process of how they got to the wonderful kosher tart.
Notice that what is important here is not that the tart is wonderful, so much as that it is kosher. To say biology is a craft that cannot be replicated is to imply that all their tarts are kosher but unverifiably so. This defeats the whole point of science, for the science is in the method, the kosher part, not the flavor of the tart.
Do all commentators get to taste your tarte flambée? :)
Only the ones that can spell his name right ;-)
What rotten luck! :)
Visit us in Paris! Or in the Fall, when we’re back in NY. Both Andrew and I love to cook.
Thanks Bob! :) Someday!
One last response. Derek Jeter is a great hitter, but that doesn’t mean he understands any principles of hitting. An advance in the science of hitting (as opposed to the art of hitting, I guess) would be some explanation of the effect of approaching a swing or an at bat one way compared to another. Derek Jeter probably wouldn’t be great at predicting whether an at bat was successful or not given a bunch of information about the at bat excluding the outcome.
Even if one is not narcissistic, funding agencies also look in the reflecting pool that is Google and the “published record”. As do tenure and promotion committees, assuming you want to find a permanent academic job on the tenure track.
It is a bit absurd to read opinions like this. I’m in biology and let me say this: a major part of our lab’s activity (i.e., my advisor’s lab) is to develop/design tools that can simply be purchased outside the lab, so that most of our results can be replicated easily without hassle (all are open and we don’t get any monetary credit, though some things depend on institute policy). My PI spends tons of money to design a very small bit of a system and he just opens and releases it (sure, after publication).
I don’t agree with Mina Bissell if she really meant what you say she meant. But look at the problem from a different point of view. “Replication” in biology is difficult and expensive. I mean, replicating another biologist’s result is a huge investment. Do you want to waste your time replicating other people’s results? In statistics, it is quick and easy with computers. In biology, even the simplest result needs huge effort and time (like 1–2 years, 1–3 people, at least a $50,000/yr budget, etc.). Where can I get a grant for such a thing? That’s why our lab spends a lot of resources to develop/open/cheaply commercialize/standardize/modularize our experimental systems: because we do NOT want other people to waste their time. We want to minimize such time. But for ordinary labs, such investment (like simplifying things) is itself a huge, huge investment. Also, some systems are just plain impossible for a single lab to make simpler. Poor labs have to wait a decade to own such systems. Remember, the human genome sequencer has become manageable now, but it is not yet within a single lab’s reach, and it still took more than 10 years.
I agree that there might be some cultural issues. But we’re scientists and we want our results to be replicated over and over again. That helps our reputation. But sometimes (especially with results from prominent labs), it is just plain impossible. And that’s not rare, especially in bleeding-edge fields requiring multi-million-dollar equipment. Rich labs that own such expensive tools don’t want to waste a time slice of them to replicate another lab’s results. (Of course, some results are so enticing that replication is initiated within a day of publication. But that’s really rare.)
I think across scientific disciplines, even psychology, there are many scientists who work more carefully and don’t just publish “findings”. Instead, they might develop procedures, and so their work is more like a recipe or a proof. Or they might study phenomena in depth and under a variety of different contexts, so that their work already contains multiple replications. In short there are scientists who might even pretend to know or care very little about fancy new statistical methods, and yet they are diligent about collecting evidence and extremely careful with their interpretations. Their data often meets the eyeball test and their claims are not wild extrapolations. Scientists like these, whether from biology, physics, or psychology may well be puzzled about the degree of fear that science is not effectively self-correcting. And they might also think that funding replications is less necessary in a reduced funding environment. (To go back to our other analogy – if you’re not over-leveraging, robo-signing, passing the buck or waiving all borrowing requirements, then you might be a bit irked by increased regulatory oversight designed to prevent you doing these things.)
Put more simply, I think a lot of honest and careful scientists are just tuning out the noise and doing good work.
But replication is not about regulation, command and control, or a scientific police state.
It’s simply about a change in attitudes. It’s about taking scientific findings more seriously. About questioning authority. About truth.
Also, biologists, in general, are scientists doing science. That could factor into their views on science a little, maybe.
I don’t really understand what you’re saying. I think you’re being dryly witty and saying something like that we should agree with the biologists because they’re real scientists. On the other hand, I have a few commenters above saying that I shouldn’t take Mina Bissell as representative of biologists. So I don’t know what to make of your comment.
In any case, when I’m doing political science, I’m a scientist doing science. And, as a scientist doing science, I’m happy to see people criticizing and replicating my work. Indeed, it puzzles and bothers me to see scientists who don’t welcome replication and criticism.
Fair response Andrew. Actually you yourself seem generally quite careful and recognize how easy it is to criticize compared to how hard it is to do science.
Sometimes I just find the community of statisticians and computer science types so arrogant and quick to hold forth on how others should do science. Perhaps this meant I fell victim to my own point above, re criticism.
However, regarding the commenters above, I take their main point to be that biologists etc. do reflect on science and do strive to do better. Furthermore, if the point is that one particular scientist such as Mina Bissell doesn’t represent the community as a whole, then that seems inconsistent with the general sentiment of the post that their views are shaped to a large extent by a toxic environment mainly concerned with status.
So, no, I don’t think we should believe everything they say because they are ‘real’ scientists doing difficult work, but we should take them seriously and give them some more credit.
hjk: “So, no, I don’t think we should believe everything they say because they are ‘real’ scientists doing difficult work, but we should take them seriously and give them some more credit.”
You make it sound as if scientists are a persecuted minority and we, the critics, their tormentors. I’d say it is we who are in the minority. In my experience “Give me interesting ‘findings’, truth be damned” is alive and kicking in social science. As for replications, good luck getting stuff published.
To give you an example. The American Journal of Political Science, one of the top journals in the field, has an explicit anti replication policy. As stated in its Guideline for Manuscripts: “The American Journal of Political Science does not review manuscripts that: … Are unsolicited rejoinders to recently published articles”.
As a poli sci refugee (to the much happier, for me, waters of psychometrics/mathematical psychology), I can comment on that a bit.
In the late ’90s/early ’00s there was a spate of mean-spirited “gotcha” articles that grad students, mostly coming from elite institutions with lots of new statistics skills, were using to get cheap publications, often based on reanalyses from class projects. Unsurprisingly, a number of published articles were found to contain outright mistakes and, more typically, suboptimal choices that probably seemed reasonable when the analysis was done. This really poisoned the well for honest reanalyses. Several journals got burned, and editors said, effectively, “I’m not going to get in a food fight to defend some punk against a senior professor somewhere….”
I’m not saying that the policy is right and indeed think there would be a lot of useful room for thoughtful reanalysis, but to assume that editors don’t think carefully about their journals as a brand is to be willfully ignorant.
What exactly is a mean-spirited “gotcha” article?
I wrote some below, but maybe a slight elaboration is in order:
(1) Request data from someone.
(2) Analyze it using newfangledtechnique that the prior scholar didn’t know about. Likely find some errors as well. (All papers have them, even mine.)
(3) Write this up and say “see how much better the world is now that we have newfangledtechnique?”
On the surface, that’s fine. It’s progress, after all.
I am totally in agreement that poorly analyzed datasets and bad data management abound in science, that more openness is very much in order, and that reanalysis helps quite a bit. Political science has actually been a leader in this regard because top journals require data to be made available. This is very much not how things work in other disciplines. However, the way that it was done then was very much in the spirit of affluent white kids showing up in a poor neighborhood telling the inhabitants what to do. The subtext is right out of “Mean Girls”: “All the cool kids are using newfangledtechnique….”
You write, “grad students mostly coming from elite institutions with lots of new statistics skills were using to get cheap publications.”
There’s something about this framing that bothers me. First, I used to be a grad student from an elite institution, and I don’t think there’s anything wrong with that. Second, I don’t like the talk of “cheap publications.” If someone publishes an article revealing “suboptimal choices that probably were considered reasonable when the analysis was done,” I think that’s a good thing to do! One way to advance knowledge is to reexamine past data analysis choices. I don’t like to see this activity disparaged.
Ah well my phrasing managed to capture the issue as to how things were viewed then. Inadvertent, but perhaps telling!
I totally agree with you that re-analysis (which is what was really being done, not a replication) is a good thing. However, if it’s done in a way that comes off as being contemptuous of the efforts of prior scholars, that’s going to be a recipe for discord. When I say “cheap publications” I mean doing something like asking someone for their dataset, finding some mistakes or reanalyzing using a slightly better model, and then writing that up. This is an incredibly difficult thing to do without pissing people off. There were a number of divisive battles around that. I think that editors just decided they didn’t want to “go there” again, have their time wasted, and have to deal with the fallout next time they decided to ask one of the senior folks for a review, AE, etc. I’m not condoning it, but dealing with senior folks when you’re junior is very tricky.
Past that, if you’ve never actually *been* at a non-elite program, the way that places like Ivies, Stanford, Berkeley, etc., are viewed needs to be taken into account as well. My psychometrics PhD is from a top 3 program (Illinois Psychology), so I too have the elite (if not Ivy) degree, but I’ve had some experience not in an elite environment, too. The elite schools are viewed about the same way as rich kids are by everybody else.
The final thing I’ll note is that, at least at that time, people in poli sci had a very defensive mindset. Many people felt that all anyone did was take ideas developed by economists or econometricians and figure out a way to apply them. There were a lot of fads going around, too. I believe that that’s changed, but it needs to be borne in mind. That mindset was one of the reasons I got out (though the big one was that I was simply not interested in scholarship in that area, ultimately).
> how easy it is to criticize
Purposeful, productive criticism is very hard with few rewards.
What is meant by replications is highly specific to the claim being made and how the support for it was obtained.
In the no-genius areas where support is obtained from straightforward randomized studies, it’s some overlap of confidence intervals, 19 times out of 20. Other areas are much harder, but it is definitely not “trust me, I am a real scientist” (cue the Ghostbusters music playing in the background).
In college I had one foot in the world of chemistry and the other in the world of biology. In the former the business of labeling and numbering everything made sense. Benzene was benzene (yes, we could tritiate it, etc but we could still tell the difference) and the results of various experiments were robust because we knew what we were dealing with so that the numbers subsequently crunched related precisely to the labels. Yet in the latter world, that of biology, the labeling was always the result of some compromise that meant that the result of number crunching didn’t permit an inference from the particular to the general but rather an inference from the particular to some property of an arbitrary category of incompletely understood things.
By way of trivial example say we’re looking at some intervention on a crop, e.g. fruit trees. The apples vs oranges issue appears obvious and readily avoided merely by appending “-apples” to “fruit trees” but the harder you look the more it becomes clear that it’s apples and oranges all the way down. Which variety of apple? What sort of soil is it grown in? What’s its pH? How and when is it fertilized, and with what? At what latitude is your farm and how many days of sun do you have each year? On and on it goes.
In the Ag business nobody’s searching for “The Law of Apples”. They just want 10% more yield. So when someone says “If you’re growing such and so variety in a sandy loam and can regulate your pH to x and fertilize to y and have 180+ sunny days then our micronutrient z will increase your yield by 10% on average” a farmer knows whether it applies to her operation and if it does how to attempt to replicate the enticing claims about z. Farmers perhaps more than maybe any other group get the concept, and the impossibility, of ceteris paribus.
Not so with many researchers in the bio-medical sciences. Perhaps they don’t get it, or perhaps they know that those who pay their mortgages don’t get it. Buyers of cancer research are in the market for “The Law of Cancer” or at least “One of the Laws of Cancer,” so that’s what some researchers are trying to sell. The resultant product/finding, however, is seldom robust and is often utterly dependent upon conditions which are foreign to, and unreproducible in, most patients. Nevertheless, the result is sold as a discovery about cancer when in fact it’s just a discovery about a particular in vitro system that is exquisitely dependent upon a multitude of human interventions. Maybe the real question is “Even if it can be replicated, so what?”
In any event my view is that replication is important not only as a sort of confirmation but also as a test of the robustness, or perhaps generality, of the claim being made. If minute and seemingly insignificant changes in methods and materials are enough to dramatically alter the outcome of an experiment it ought to make one wonder whether the scientist has mislabeled her subject and oversold her product.
Being an economist, I think in terms of costs and benefits. The types of trials she would be replicating are costly in terms of money and time. Basically it is as costly to replicate the experiment as it was to pursue the original study. So the obvious benefit of replication is, as you said, robustness. There is no one-size-fits-all answer. Important, influential findings would seemingly pass the test of replication. But in science most studies probably don’t pass that test, given the cost involved and that the findings are small nibbles on the edge. In an ideal world everything would be replicated, but we live in a world of budget constraints and doing so isn’t really practical.
“Basically it is as costly to replicate the experiment as it was to pursue the original study.”
I think that’s wrong. Imagine if someone said, “Basically it’s as costly to go to the moon a second time as it was to go the first time.” Sure, the mission costs may be the same. But going the first time required the research and development to do so. Replication is always cheaper than the original study because you now know exactly what you’re looking for and how to find it. Or, rather, you’re no longer searching for it at all, you just do what the original study did and see whether or not you find what the original study says it found. I.e., you repeat the mission and see if you end up on the moon.
It’s all the trial and error and invention of new processes (sometimes also equipment) in the original study that costs something. After those costs have been absorbed by the original research grant (which is a kind of venture capital that we should also expect, and accept, sometimes leads to no useful, publishable result), the basic operating budgets of our research institutions should cover the costs of replication.
Indeed, replication is just the means by which another lab incorporates the original study into its own operations. In an important sense, we don’t know what the original study’s discovery means until we’ve reproduced their result according to the same procedure as described in the write-up. It should not be seen as a venture, but as an operating cost.
The ‘Us vs Them’, ‘Statisticians vs Biologists’, ‘Level Playing Field vs Toxic Politics’, ‘Reproducibility And Truth vs Impatient Career Optimizers’ framing is just lazy thinking. Why not read Dr Bissell’s paper and discuss her arguments?
1) You would find that she speaks a lot about reproducing results – in her own lab and collaborating with others. She is certainly not an enemy of reproducibility.
2) And her arguments are not half bad. For example (I quote from her paper): “[W]ho will evaluate the evaluators?” I can fail to reproduce YOUR results just by being sloppy and lazy. Does that make you a bad scientist?
3) “[I]t is sometimes much easier not to replicate than to replicate studies, because the techniques and reagents are sophisticated, time-consuming and difficult to master.”
In my field, computational biology, the statistical/computational/data analysis part of a paper can often be reproduced by an undergrad student with knowledge of R and the right set of Bioconductor packages. Statistics is the easy bit, because it can be communicated in equations and code – the actual experiments are often much harder, especially in Dr Bissell’s field. If it takes years to hone 3D cell culturing skills then naturally only a small number of people are out there who could potentially replicate results, and most others will fail.
4) Dr Bissell is indeed not a good ‘representative of biologists as a profession’, but for different reasons than you might think. For decades she was an outsider and was being ignored by the biology establishment (the genome people). What they thought of her was (I quote from her life story linked to from her webpage): ‘Oh, it’s cute, there is this little excitable Persian woman over there screaming about whatever.’
If she is successful now, it is after years and decades of her and her work being scrutinized, and by now I’d assume she knows a thing or two about reproducibility. Some of the comments read like: ‘Oh, she only says that because she is an elitist career scientist who is more interested in publishing papers than truth’. Dismissing her views in this way is lazy.
It’s ridiculous for you to write, “Why not read Dr Bissell’s paper and discuss her arguments?”
Of course I read Dr. Bissell’s paper and discussed her arguments; just see the linked post. And nowhere in that post or this one did I say “vs.” or “versus”. That’s coming from you. Bissell is a successful researcher and got a megaphone to publish her views in a widely read journal. I don’t think it’s so bad for us on this little blog to express some disagreement with her attitudes.
P.S. You write that her arguments “are not half bad.” Maybe so. I assume her article would not have been published if she had nothing to say at all. Nonetheless, I’m bothered by the 1/2 or 1/3 or whatever of her arguments that I think are bad. I am disturbed by the defensiveness of your comment: I reacted negatively to certain very specific parts of Bissell’s article, and your reaction is to impute all sorts of conflict without addressing what I actually wrote. Why can’t it be simultaneously true that (a) Bissell is an accomplished researcher who’s overcome a lot of adversity, (b) Bissell has important things to say about research practice in her lab, and (c) Bissell makes some recommendations that would be bad ideas if applied in general? To put it another way, consider the many many scientists (including me and the commenters in this thread) who think that criticism and replication are increasingly important: maybe we have something relevant to say here too. Even if you judge that Bissell makes some good points in her article, that does not mean you should assume she’s 100% correct.
Your #2: Does your cliched “who-will-evaluate-the-evaluators” stop you from using all forms of evaluation, in general? I might fail to assess YOUR exam fairly just by being sloppy and lazy. Does that stop us from having exams?
The replicator does not supersede the original; in case of disagreement, well, we just go for another replicator. If multiple replicators produce disparate findings with no internal consensus it’s usually a good hint that we may be missing something. A productive undertaking nevertheless.
Your #3 is even more ridiculous: Just to save reagents &amp; time we should skip verification? What’s the alternative? Blind faith? Do you have some deep reason to believe Mina Bissell can never be wrong? If a result is really important surely it is worth replicating?
I trust your allusion to Mina Bissell’s Persian ethnicity, excitability, short stature, loudness etc. are irrelevant to the discussion at hand? Interesting asides, no doubt.
Point (2) is what I was getting at with the tart example in my comment above, especially as it relates to comment (3).
As to “us vs. them”, I don’t think it’s “versus,” but I do think, as you point out, different fields have different costs and different degrees of difficulty for replication. It’s relatively easy in stats to replicate someone’s statistical analysis, though much harder if there’s serious data munging involved, or if you need to spider the whole web and run 10,000 computers for a week to do it. Not documenting the steps you followed to get your results could be seen as laziness, but it’s also partly a matter of computer expertise in scripting and a time-benefit tradeoff. You could be doing more research rather than polishing the scripts you used to run the last experiments.
So a question for Andrew: do you think it’s a good idea for someone to replicate the American National Election Surveys? It would be very expensive, like a medium-large bio experiment. And it would be unlikely to replicate exactly, unless you used exactly the same wording for questions and the same sampling procedure, and even then, it’d be off by sampling differences. But if you can’t change the questions a bit, is a notion of “happiness” or “satisfaction” really meaningful? And the question is what would you de-fund in order to fund it (assuming poli sci funding is a zero-sum game, which it obviously isn’t)?
Bob, I think we are talking about different notions of replication. I discuss some of these here: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2256286
The ANES case would be a statistical replication, which does not require the exact same responses for success.
PS Most replications in social science take data as given, focus on procedures.
In biology they can do that too, if labs share data, but my sense is they often try statistical replication by taking a fresh sample of data. Typically I would do the former first.
I do think that NES and other surveys should give out the exact wording and order of their questions along with their exact sampling procedure. One of my problems with many of the Psychological Science papers is that they don’t describe exactly what is done in their survey. I don’t think NES gives out all the details of their design but they do give their survey questions in order, and indeed other survey organizations do use these questions in their surveys. Same with GSS, maybe even more so: it’s used as a template for other surveys.
Psych Science papers deserve http://instantrimshot.com.
all three editions of BDA stare at me reproachfully while I write this.
This ‘little blog’ is a widely read megaphone of its own. So it’s good to have this discussion here:
1) In the title of your post you claim ‘diverging attitudes’ and further imply that statisticians are pro reproducibility while biologists are against it. From working in both fields I can say this is not true.
My own work has often been constrained by badly documented statistical analyses, unavailability of code and badly discussed parameter choices.
At the same time all the biologists I work with routinely replicate other people’s work and benchmark new methods. My own wetlab, as small as it is, does the same.
I see no evidence whatsoever that biologists lack awareness of replication (as some commentators believe) or need to be lectured on it.
2) Basing the claim of ‘diverging attitudes’ on Mina Bissell’s paper would be wrong, because she actually does not argue against reproducing results. Her article is about risks and limitations – this is different from saying ‘it’s a waste of time, don’t do it’.
The first and third paragraph of her paper describe how much effort her own group puts into reproducing their previous results (I assume it’s a way to train newcomers to the lab). In the last paragraph she even proposes peer-reviewed reports to settle reproducibility problems. Nowhere does she say reproducibility was a bad thing. To conclude like commentators here do that she proposes to ‘skip verification’ is wrong.
3) No one disagrees with your statement: “it’s in everyone’s interest if replication can be done as fast and reliably as possible”. I certainly don’t; all my big papers have a data package on Bioconductor for exactly this reason. And if Mina Bissell disagrees with this statement she did not write it in her article.
The problem is your general claim: “[I]f a published finding cannot be _easily_ replicated, this is at best a failure of communication” (my emphasis).
Depending on what you mean by ‘easily’, your claim might be true in a field where all relevant information can be communicated in writing (equations and code) and where execution is cheap and fast.
Biologists are not counting peas anymore. In modern experimental sciences replication is much harder (no one says ‘impossible’ or ‘do not try’ or ‘it’s not important’) because reproducing complicated experiments in highly specialized fields needs lots of experience, lots of training, lots of technical skills, lots of time, lots of money, lots of reagents and lots of other resources. No matter how good your communication skills are, these are real limiting factors.
This is different from statistics, where I see my friends having careers with nothing more than a whiteboard, a laptop and sometimes a cluster, ie much less specialized commodities. Proposing to use ‘multiple replicators’ (as one commentator did) severely underestimates the complexity of many biological replication tasks.
4) In summary, the technical and training requirements in highly-specialized fields provide limitations for reproducibility that need to be acknowledged. Often replication cannot be done easily, fast or reliably. This is not due to a lack of awareness in a group of scientists, but is a feature of the work they do.
(That said, many of the technological limitations are not permanent, and what was impossible to reproduce easily 10 years ago can suddenly be done overnight – sequencing a human genome, for example.)
So while there is no evidence that the attitudes regarding criticism and replication of scientific claims are any different in biology compared to statistics, the differences in the scientific work being done (not necessarily the social or economic structure) lead to constraints on how much reproducibility is easily possible.
As I wrote in the linked post, I think that where Bissell went wrong is by thinking of replication in a defensive way, and thinking of the result being to “damage the reputations of careful, meticulous scientists.” Instead, I recommend she take a forward-looking view, and think of replicability as a way of moving science forward faster. If other researchers can’t replicate what you did, they might well have problems extending your results. The easier you make it for them to replicate, indeed the more replications that people have done of your work, the more they will be able, and motivated, to carry on the torch.
In the comments here, both you and Bob Carpenter make the very reasonable point that biology studies are harder to replicate than studies in statistics or political science. That makes sense. I am still disturbed by Bissell’s defensive attitude, her seeming assumption that it is right and proper for replication to require phone calls and lab visits. After all, a paper that is published in Cell or whatever is reviewed by a bunch of referees who only read the written material. If published research can’t be replicated (and, remember, in fields like psychology, much high-profile research is simply misguided and really can’t be replicated because it’s just noise), I think a more useful stance for influential leaders such as Bissell would be to move to make replication easier, not more difficult.
Sometimes replication just takes a ton of time and resources no matter how well you communicate it. Consider generating mutant mice with multiple combined allele knockouts, some of which are lethal in certain combinations. Perhaps from a random assortment standpoint you need to cross type A with type B to get type C, cross type D with type E to get type F, and then finally cross type C with type F to get type X. Suppose A×B → C with p = 1/8, D×E → F with p = 1/8, and C×F → X with p = 1/8.
The strains of mice produce litters of size 8 or so every month or so. Animals become sexually mature after 3 months… and you need, say, 40 of type X to get a decent sample to evaluate the claim about type X… this could easily take 2 or 3 years just breeding mice, paying a technician say $50k/yr to monitor the whole breeding operation, and then paying a postdoc $50k/yr to analyze the type X mice that you do get eventually… using sophisticated staining and sectioning and imaging techniques…
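The breeding arithmetic above can be sanity-checked with a crude Monte Carlo sketch (all numbers are the hypothetical ones from this comment: p = 1/8 per cross, litters of 8 per month, 3 months to sexual maturity, 40 type X mice needed; the function name and the simplified breeding scheme are my own invention, not anyone’s actual protocol):

```python
import random

def months_to_get(n_needed=40, p=1/8, litter=8, maturity=3, seed=0):
    """Toy model of the breeding timeline: each cross yields a litter of
    `litter` pups per month, each pup carries the desired genotype with
    probability p, and a positive pup needs `maturity` months to breed."""
    rng = random.Random(seed)

    def months_until_first_positive():
        # months until a cross produces at least one pup of the right type
        m = 0
        while True:
            m += 1
            if any(rng.random() < p for _ in range(litter)):
                return m

    # A×B → C and D×E → F run in parallel; wait for the slower of the two,
    # then let the positive pups mature before starting the final C×F cross
    months = max(months_until_first_positive(),
                 months_until_first_positive()) + maturity

    # accumulate n_needed type-X pups from the C×F cross
    got = 0
    while got < n_needed:
        months += 1
        got += sum(rng.random() < p for _ in range(litter))
    return months

print(months_to_get())  # simulated months of breeding; lands in the multi-year range
```

Even this toy model, which ignores lethal combinations, cage space, and every other husbandry reality, comes out consistent with the 2 or 3 year estimate.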
I think you can see here that replicating this experiment, while desirable, would have much less reward/cost than say replicating some bioinformatics procedure involving running software on a cluster combing through public data banks for a weekend.
As Bayesians I am going to guess that we are both comfortable with the idea of doing some kind of decision theory over the value of replication vs the probability that the original work was flawed etc. Bissell may naturally hang out in the corner of biology where the original experiments are long, expensive, and involve a lot of resources, and therefore the original research is tightly controlled, well documented, and involves internal replications. The rewards/cost ratio in her corner of biology could be extremely low while still not invalidating your basic point which I take to be something like “replication should be considered, possibly in some kind of decision theoretical context”
It does not matter if there are practical obstacles to replication. The default position, based on hundreds of years of experience, is that unreplicated results are unreliable. If there is a strong theory capable of making precise predictions and the outcome of the experiment matches these predictions, then this requirement can be waived somewhat. However, this is not the case in medicine/biology, where there is often no quantitative model. All we have to work with is whether effect sizes are similar when coming from independent researchers. Have the alternative approaches (tightly controlled, well documented, etc.) been validated against the gold standard of replication, or is this an assumption? These appear to be precursors to replication that leave out the final step!
While I think the costs of replicating will always be lower than the cost of the original study (when the original procedure is properly described, of course, so that the replicating lab doesn’t have to learn everything by trial and error that the original lab learned by trial and error), I agree with Q that the costs shouldn’t really matter, or, perhaps better, should just be taken in stride. Until the result is independently replicated it can be considered reliable.
I think I said in the original discussion of Bissell that we have to remember that in the case of medical discoveries the result will ultimately be “tested” in practice. That is, we take a lab discovery and begin to convert it into a drug. All of these translations into applications are costly, too. And the final development and approval of the drug is still more expensive. If the end product is to “work”, however, the original discovery has to be “true”. And replication is part of the process that ensures (not perfectly, to be sure, but still) truth.
In the perspective of this much larger R&amp;D process, the cost of a replication or two is a drop in the bucket. This is why it’s correct to say that the argument against replication is impatient and somewhat parochial. If we’re talking about a chunk of funding and a one-year delay before a result is taken to have been reliably established (and the possibility that its reliability will actually be questioned), then it’s insignificant in the “big picture”. But it’s certainly noticeable from inside the smaller frame of a single lab’s research budget and a single researcher’s career. Still, it’s work that simply has to get done.
I meant it canNOT be considered reliable in the last sentence of the first para.
Thomas, yes, I think we agree now. Another way of putting it is that the benefit of unreplicated science is zero, so the cost of the experiment is irrelevant. What is the benefit of a result that did not use proper controls, or is based on data entry errors, etc., just because replication would be more expensive? Independent replication is just as important as these aspects. I would go so far as to call science without replication cargo cult science (see the Feynman quote in my post below). Unreplicated experiments are perhaps still superior to just making up explanations (guessing), but they are not the science that has led to the benefits we all enjoy today.
Thomas, no one here can calculate the benefit of a researcher’s claim nor the cost of the same claim being wrong. That requires seeing how others use that result in the future. The cost-benefit approach to this problem is rife with endless arguments outside the realm of science.
I think we agree on that. There will always be a cost-benefit argument in regards to funding the original research (instead of funding another piece of research or some social program), of course. But the cost of replication, like I say, should be considered a drop in the bucket. Its benefits are immeasurable, having to do with essence of science, not some future extra-scientific outcome.
This is definitely not the view in chemistry. You should be able to mix the reactants and make the new compound. I think this is some bio-fluffiness. “Nanoscience” has some of this same attitude. Stripey nanoparticles, Schon, etc.
No doubt replicating biological studies is more difficult, though presumably this is what biologists are trained to do. And yes, biologists do a lot of replication.
But the bottom line is this. If nobody can replicate a finding, we will never know whether the finding is true — and we are all less skilled than the original researcher — or whether we are all performing fine and the original researcher made an inadvertent error. That is, we will never be able to tell fact from artifact.
In light of this, and the fact that biology experiments are hard to replicate, biologists need to pay more attention, not less, to replication. This includes making it as easy as possible for others to replicate.
Florian did a good job listing the defensible excuses for non-replication. OTOH sometimes replications are hard because elite labs keep protocols secret or will only share them with a select few collaborators. Often crucial details are left out, or access to a certain cell line or reagent is withheld.
In short, not all problems are scientific, often it has to do with retaining the first-publisher advantage in a competitive field.
We should consider what different fields mean by replication. Here is the usual biomedical method:
1) Generate two groups that you have only purposefully varied by one factor. Assume everything else of importance is held the same (this is the infamous nil null hypothesis)
2) Compare small samples of these groups. For in vitro work usually n=3, for animal studies n~5-15. Answer the question: are the averages of the two groups significantly different (p<.05)? If the results are almost significant, perform the experiment a few more times to see if you can get that magic p value. Also, you may think of reasons that some results may be considered outliers (there will always be many reasons) and can be ignored in some "fair" way. However, DO NOT perform this last part if the results are already consistent with your theory.
3) If the averages are significantly different in the direction predicted by your theory, accept that this difference is due to the factor you varied. If they are not significantly different, either do not publish because this is an uninteresting negative finding, or publish and claim that your intervention does nothing. If they are significantly different in the opposite direction, sometimes alter the theory, other times come up with ad hoc explanations for this that save the initial theory. Either way you do attempt to publish in the latter case.
4) If published, others read your work and use it to support their own theory if they like it, often ignoring the result if they do not like it. Sometimes the latter will get caught in peer review.
5) If it is cheap to do an exact direct replication, someone may do so. ***The original claim is considered replicated if the result is significant in the same direction as the original study, and not replicated otherwise. There is no consideration of whether the effect size is similar.*** If it is not cheap to do an exact replication, someone may instead attempt to "build upon" your finding and perform a similar study where some related variable is altered or the experiment is performed in a different system (animal strain, cell line). In this case, a failure to get significant results in the same direction is attributed to "strain differences", while otherwise it is taken as support for the original publication.
Please note that in this entire process we have never gotten any idea of what the distribution of individual results looks like, due to the small sample sizes and the presentation of results as dynamite plots (mean +/- error bars). All sources of variation other than the one in question are treated as noise and assumed to balance out perfectly. It is impossible to build a quantitative theory that would allow us to escape the nil null hypothesis out of this type of "increases/decreases" evidence.
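A quick simulation makes the starred point concrete (a hedged sketch, not anyone’s actual protocol; all numbers are hypothetical: a real standardized effect of 0.5, n = 5 per group, and a rough small-sample t cutoff of 2.3; function names are my own):

```python
import math
import random
import statistics

def significant_in_direction(treated, control, crit=2.3):
    """Crude Welch-style t statistic: 'replicated' only if t exceeds a rough
    small-sample 5% cutoff AND the difference goes in the predicted direction."""
    t = (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return t > crit  # direction-plus-significance, per the criterion above

def replication_rate(effect=0.5, n=5, trials=2000, seed=1):
    # fraction of exact replications declared "successful" under that
    # criterion, when the effect is real and of the given standardized size
    rng = random.Random(seed)
    wins = sum(
        significant_in_direction(
            [rng.gauss(effect, 1) for _ in range(n)],  # treated group
            [rng.gauss(0, 1) for _ in range(n)],       # control group
        )
        for _ in range(trials)
    )
    return wins / trials

print(replication_rate())  # a real effect "fails to replicate" most of the time
```

Under this criterion a perfectly real effect is declared a replication failure the large majority of the time at n = 5, and effect-size agreement is never examined at all.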
Bissell is a careerist scientist with high status. Of course she has disdain for the more ethically pure desire for replication.
While I’m sure there are some differences in the prevailing opinions of different fields (e.g. math > physics > stats > chemistry >> biology > psychology > education), I think you will find the bigger differentiator is one of ethics compromisers versus non-compromisers… and that many people are unhappy with the careerists.
P.s. And all the careerists will deny that they are sleazy. But it’s like girl(boy)friends, judge them by their actions, not their words.
P.p.s. You can find lots of classic support for the need for honesty, replication etc. from E. Bright Wilson, Feynman, etc. (and comments about careerists)
+1 on Feynman:
“I think the educational and psychological studies I mentioned are examples of what I would like to call cargo cult science. In the South Seas there is a cargo cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas–he’s the controller–and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land. So I call these things cargo cult science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land.”
“Other kinds of errors are more characteristic of poor science. When I was at Cornell, I often talked to the people in the psychology department. One of the students told me she wanted to do an experiment that went something like this–it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do A. So her proposal was to do the experiment under circumstances Y and see if they still did A.
I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person–to do it under condition X to see if she could also get result A, and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control.
She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1947 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happened.
Nowadays, there’s a certain danger of the same thing happening, even in the famous field of physics. I was shocked to hear of an experiment being done at the big accelerator at the National Accelerator Laboratory, where a person used deuterium. In order to compare his heavy hydrogen results to what might happen with light hydrogen, he had to use data from someone else’s experiment on light hydrogen, which was done on different apparatus. When asked why, he said it was because he couldn’t get time on the program (because there’s so little time and it’s such expensive apparatus) to do the experiment with light hydrogen on this apparatus because there wouldn’t be any new result. And so the men in charge of programs at NAL are so anxious for new results, in order to get more money to keep the thing going for public relations purposes, they are destroying–possibly–the value of the experiments themselves, which is the whole purpose of the thing. It is often hard for the experimenters there to complete their work as their scientific integrity demands. ”
So articulate. This is exactly the stuff people like Mina Bissell don’t get. Sad.
“So I have just one wish for you–the good luck to be somewhere where you are free to maintain the kind of integrity I have described, and where you do not feel forced by a need to maintain your position in the organization, or financial support, or so on, to lose your integrity. May you have that freedom.” (Feynman)
I do think this is the biggest problem to overcome and too bad it is so hard to assess in the published papers.
Sturgeon’s law applies to physics as well as to art, but people think, for some odd reason, that it shouldn’t.
That is, we are not surprised when 99% of, say (let’s pick a low-status field), McDonald’s employees do a bad job.
We are surprised when 99% of astrophysicists or psychologists are.
It’s kinda about expectations and social snobbery more than anything else.
As a molecular biologist, I was trained to regard the peer-reviewed, decent-journal literature with great suspicion: only a few professors were considered trustworthy in general; as for the 3rd and 4th rate journals, so far as I know, no one even reads them (average citation index of something like &lt;1).
Good point about Sturgeon’s Law!
I think it applies to one’s own work as well.
One of the reasons why we might want LESS science funding, especially in bio. There are diminishing returns. Not clear at all that throwing more money materially advances innovation speed overall.
How many people here have actually met Mina Bissell or spent time around a wet lab? How many of you know anything at all about oncology, or anything remotely related to health science? It sounds like a bunch of economists in here…
Mina Bissell isn’t a careerist, huckstering, self-promoting member of the scientific elite (she’s actually a pretty nice person and a damned smart lady). As I recall she pretty much forced LBNL to remove her from a high position in the Life Sciences Department (maybe it was “head of”, I don’t remember exactly) because she hated politicking and just wanted to do science. LBNL doesn’t cater all that well to people who bullshit their way around the scientific community. They usually end up getting kicked out.
Believe it or not, Bissell’s drive isn’t the result of some selfish desire to be famous, and she actually knows what she’s talking about when it comes to actually doing science. Your colleague wrote you a letter which was absolutely nothing except supposition fueled by the fact that he didn’t like the sentiments of Mina’s op-ed. I’d also venture the guess that he’s a little bit jealous of her grant money, though I couldn’t say for sure even if I met him/her.
I’ve met plenty of huckstering douchebags (specifically in molecular bio type fields) whose only goal in life is to receive honors and be fawned over by everyone who doesn’t know any better. I will say again that Bissell is not one of them. I’ll also add that I’ve never heard an economist say anything about science or scientific practice which wasn’t frighteningly stupid.
There are quite a few of us here in the physical sciences who aren’t economists or social scientists. This is a rather lazy way to dismiss a point of view.
I don’t have an opinion on Mina Bissell, but there is certainly no way in hell that there currently exists “too much of a push to replicate scientific findings”.
Andrew said: “So, in bio, status counts for more, also perhaps there’s more insecurity, even at the top, that you might slip down if you’re not careful. Also perhaps more motivation for people of lower ranks to make a reputation by tangling with someone higher up. Put it all together and you have some toxic politics, much different than what you’ll see in a flatter field such as statistics. Or so I speculate.”
Biology is a broad field. I am guessing that your speculation may be based on just parts of it. Your speculation does not seem to fit the biologists I know, who are mostly in evolutionary or conservation biology. These are areas where “running experiments” is often impossible, or at best subject to unpredictable complications. Possibly that brings a greater humility than in areas where one can run tightly controlled experiments.