I think that there’s some more formal way to say all of this, but informally it goes something like this: I’ve generally thought of junk science as being produced by sincere, well-meaning researchers who think that the way to do science is to have a vague idea, conduct an experiment (or, in many areas of social science, a “natural experiment”), and then find some p-values to support their claim. That is, I’ve been assuming the problem is incompetence rather than fraud (with the incompetence, crucially, being not just in analysis but in research design: Fixing so-called “questionable research practices” would not turn bad science into good science; at best, it would just make it harder for researchers to fool themselves with bad science), but now I’m thinking that a big driver of the problem is . . . not quite “fraud,” exactly, but people producing work that is motivated primarily by politics, or scholarly politics, or career motivations . . . There’s so much of this, that a lot gets published.
It’s not a sharp line: I guess that the authors of political hackwork think they’re on the side of the angels . . .
Ummm, let me put it this way. I think that many or most researchers think of statistical tests as a kind of annoying paperwork, a set of forms they need to fill out in order to get their work published. That’s an impression I’ve had of researchers for a long time: they feel they already know the truth–they conducted the damn experiment already!–so then they find whatever p-values are necessary to satisfy the reviewers. My new thought (or, new to me, at least) is that many researchers also think of research itself as a kind of annoying paperwork. They already know what they want to say about disparities, or climate change, or evolutionary psychology, or the 2020 election, and, yeah, they’ll do an experiment or a data analysis or whatever, but that’s just a means to an existing end. They’re not doing science. My thought in the above-linked post is that there maybe is so much of this hackwork that it overwhelms the system.
I sent the above to Joe Bak-Coleman, who wrote:
Putting my ecologist hat on, but I think this emerges as a result of the context in which studies like this occur. A perfect storm for garbage research is (a) a researcher who wants to find something, (b) statistical methods that aren’t up to the task given (c) data which has signal << noise and confounds abound. On motivations, sometimes it’s for political reasons (Ioannidis, the cited papers), because there’s an information void (the Al-Aly stuff that someone really needs to dig into…) or because someone has built a career studying the same topic and needs to find an answer (Ariely, the Protzko paper). I think each of these are contexts where it’s hard to imagine the researchers coming to the opposite conclusion and, like you say, the science is a formality to enable rhetoric. On the stats, when you’re asking a question as complex as gun violence and trying to do it with effectively death ~ lm(kitchen sink)... all you get from your model are coefficients, p-values and CIs. Appropriately dealing with causal structure, or (especially in this case) the underlying generative process over time is just something researchers don’t spend time on. Why would they? There’s enough noise in the system to dredge up significance especially when you’ve got large N. Which gets to the final problem of signal << noise and/or confounds abounding. Most/all of the big effects in gun violence researc have probably been identified. Countries with lots of guns will have more gun violence than countries with no guns. People who own guns will be more likely to get shot than those that don’t. Etc. etc. Disentangling effects of policy which may be time-lagged or experience hysteresis because the state *already has a lot of guns floating around* wades in complexity, confounds, and a host of other things that if not properly dealt with will result in GIGO.
I think there’s probably good work done and to be done in gun violence, but there’s just so much noise, confounding and heterogeneity to huff. That said, where I might depart from some of the folks here is that I just don’t know that this perfect storm is universal in science; it is probably more characteristic of disciplines or areas of research than Science. I think these are the fields that tend to draw the attention of folks like us. The engineers, physicists, and many ecologists I’ve worked with usually have really difficult technical challenges to deal with but once they’re worked out the effect sizes often just scream at you and scarcely need stats. Much more often than in the troublesome areas of science there are explicit (perhaps multiple) mathematical models that are either worked out analytically or used to simulate expected results before digging into a multi-year project. Sometimes there are clear motives to find something (e.g. conservation) and sometimes this goes awry . . . but it’s harder to finagle those from the noise when the signal is huge. And in many cases, “null” findings are also interesting because established theory verified in a different context might predict something quite different. Perhaps where I might also depart from some here is that the context sensitivity makes me wonder if there is any real universal solution or set of solutions to these problems. The stand out example for me is nutrition science. A lot of the big, obvious effects have been picked through and now so much of it is simmering in noise with strong incentives to find various different things by getting significance. Alcohol/chocolate/coffee does, doesn’t, does, doesn’t, does, doesn’t cause increased mortality. I don’t know how we could expect that discipline to turn around. There is good work being done here and there, but so much of it is GIGO. I have a paper in the works trying to sort out how we can know if a field is producing knowledge or just chasing ghosts . . .
I see what he’s saying about causal inference. One problem I’ve seen is that researchers have been taught that causal identification is the #1 issue, so they do a randomized experiment or they use a so-called identification strategy (difference-in-differences, instrumental variables, regression discontinuity, whatever) and then they think they’re done.
Regarding nutrition science: yeah, this is another field where there’s endless crap being hyped. Also related areas in health science such as that stupid cold-shower study or all the crappy sleep research. I don’t have any sense of an escape route for all this. On one hand, nutrition, health behavior, exercise, sleep, etc., are hugely important and worth scientific study. On the other hand, these fields are so rotten, with really incompetent or unethical people deeply embedded within the system of academic publication and news media promotion, that sometimes it just seems entirely hopeless.
I talk about this a lot with my students. I have referred to this issue as those who play the academic game to get tenure or other status, and those who legitimately do the hard science because they believe in the research and science they do. Both groups succeed in academia, but the ones that make the headlines are often the ones that play the game (run quick studies, publish quickly, clickbait headlines of their studies). I want my students to be skeptical of ideas, including mine, and think deeply about what their question is related to previous literature, but also to the real lives of individuals–not what we have made up in our theories about human behavior or learning. As I have learned from my economist colleagues: assume any finding you have is spurious and do everything you can to knock it out. If it is still there after you have tried your best to remove it, then maybe you have a signal. As a psychologist, I am interested in all those variables that might knock at the effect–because it helps me begin to draw the picture of what might be going on with complex behavior and learning in context.
Yup. Spot on. Some also play the industry consultant game and learn how to produce results for clients. They have no interest in science.
To prove that I have actually read today’s stuff, I found this typo:
“gun violence researc have”
Also, this sentence, “On the stats, when you’re asking a question as complex as gun violence and trying to do it with effectively death ~ lm(kitchen sink)… all you get from your model are coefficients, p-values and CIs.”
I have lived in several foreign countries (Norway, Netherlands, England) and gun violence is not that complex because guns are not easily available. So to speak, “packing heat” is not a way of life in those countries.
Some may be incompetence, but I suspect most misdeeds are the result of poor incentives and even poorer quality control. Any issue that has political, economic, or popular importance (which covers most of economics, medicine, psychology, ecology, and probably everything else, though perhaps not to the same extent) can get the attention of interested parties. This often translates to money and influence: grants, consulting fees, promotions. Given abundant and diverse data (often not readily available to others), this leaves plenty of room for shoddy work and deceitful practices. I recall one intense regulatory proceeding where the consultant sitting next to me (working for the other side) turned to me and happily said, “we’re getting paid for this,” even as his testimony was being shredded. It is a lucrative game. And if you can’t garner dollars for your work, there is always your vitae you can pad.
Given the real world of uncertainty and complex causation, I doubt much can be done to remove such incentives. Quality control is a different matter. If publication practices were reformed, and if political processes (including judicial ones) were reformed, I think there would be more progress. I believe (at least it is my recollection) that Canadian regulatory processes often involve experts who serve the court and not any side in a dispute. That seems like a superior system to me. And something like that in the political process would be welcome. Unfortunately, the US political process does have third party “unbiased” organizations that serve such roles – but they have all been rejected as biased if they conclude anything that one party does not like. Even the judiciary as a whole has been rejected as biased (by both parties). The more polarized the environment, the more I would expect to see intentionally distorted work proliferate. Can I propose that as “the theorem of shoddy statistical work?”
“motivated primarily by politics, or scholarly politics, or career motivations”
Man, I can’t tell you how disheartening I found this when I started grad school. I was expecting to find like-minded people, being a curious nerd. But the overwhelming majority of them were either careerists who just wanted to do what it took to get status and a good job, or advocates who just wanted to support their predetermined worldview and conclusions (political or otherwise). There was disappointingly little in terms of actual curiosity and frankly it soured me on academia (even though I still work there).
not understanding statistics, plays a big role.
but I might be biased.
statistics are taught technically but without requiring understanding, I think.
this is what I feel deeply.
but! what % of the causation belongs to this Vs the multiple other causes??
my argument is that lack of empathy makes someone a likelier murderer. and innumeracy makes someone a likelier bad science peddler
Andrew said “My new thought (or, new to me, at least) is that many researchers also think of research itself as a kind of annoying paperwork”.
As another ecologist primarily working in conservation, I would say this is doubly true, in that the paperwork is aimed at “making the world better” (the underlying values are assumed to be universal and unquestionably good), so it hardly matters if the data and analyses demonstrating declines or whatever are slightly dubious (nonprobability samples with changing coverage every year anyone?). Scientists insisting on good analyses or honesty around problematic biases are just seen as hindering the dissemination of the message (and/or funding acquisition/personal status).
This is doubly true in social sciences where it involves any sort of social advocacy. This has been a sea change in the last 10 years that I witnessed first hand. 10 years ago I could stand in front of an audience and say that I am trying to research questions about inequity and unequal treatment with a dispassionate view and see nods from the audience. When I talked about this a couple of years ago I got a completely blank look from the graduate students in the audience.
In private conversations with them, they make it clear that advocacy is an important goal (although thankfully not the overriding goal) of their work. These are smart, capable students, so it is especially sad but also alarming to see the lack of appreciation for how the two can at times be in conflict.
I’ve seen much of the same. What I find strange is the idea that one can be politically effective even though one’s thinking is mistaken or at least over- or underweighting relevant factors. Persuasion takes precedence over figuring stuff out. Was this less the case in decades past? It’s possible that there were critical mass effects involving gatekeepers, deans etc.
Sociology is the most corrupted in this respect. An expose is
http://Www.ihatesociology.com
The president of Wellesley college made the possibly fair point that some/many students at Harvard dismiss this nonsense. But the indoctrination is effective at most other schools.
My perspective is informed by having taught both statistics and case study methodology. I think case studies, properly done (and that’s a big, complicated if), can go a long way toward weeding out unwarranted expectations from large-n work. If you approach them in a way designed to observe potential causal mechanisms at work (again the big if), you can also see what models do or don’t make sense in a sampling context. Of course, cases can be undertaken dishonestly or in a haphazard way, and they can be cherry-picked for confirmation of a favored theory. But what I’m trying to say is, done right, each can offset some of the shortcomings of the other.
One of the things that struck me when I first waded into the quantitative literature on compensating wage differentials (value of statistical life) was the absolute dearth of case studies. That was a big warning sign. One of the first things I did was simply to look through curricula and textbooks for training safety managers to see if there was discussion of the possibility the company might have to jack up wages if there were more accidents on the job. Sometimes it can be very simple — the thing is to look for a plausible causal mechanism.
Peter,
I’m curious, can you point to well done case studies? Topic doesn’t matter, so long as it is social science. I’m a quantitative sociologist and I am sceptical of case studies. I feel if a field is dominated by case studies you have a whole bunch of ideas / theories and effects floating around but the bigger picture is missing. Also, I can not remember having read a good one.
This is an excellent post. I pointed to it just this morning when a colleague, involved in teacher education, pointed out how awful the majority of the education research literature is. Bak-Coleman’s statement that “There’s enough noise in the system to dredge up significance especially when you’ve got large N” is often very true. Reading yesterday an awful paper on gut microbes and performance on memory tests — the gut microbiome being another field replete with garbage, though not to the extent of education, nutrition, and others — I thought to myself that it would be hard *not* to put together at least one “p < 0.05” conclusion from the dataset. Unfortunately, many researchers simply do not understand this, and we don’t have mathematical / logical / statistical competence as a requirement in many fields of science.
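To make the “hard not to find p < 0.05” point concrete, here is a minimal simulation sketch in Python. It is not taken from any paper discussed here: the group size, the choice of 20 outcome measures, and the function names are all assumptions picked for illustration. Every outcome is pure noise with known variance, and the outcomes are independent, so the chance of at least one nominally significant comparison per dataset should be about 1 - 0.95^20, or roughly 64%.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def min_p_pure_noise(rng, n_per_group=30, n_outcomes=20):
    """Smallest p-value across outcomes when no true effect exists.

    Assumption for illustration: two groups, independent outcomes,
    noise drawn from N(0, 1) with known variance (so a z-test is exact).
    """
    se = math.sqrt(2.0 / n_per_group)  # std. error of a difference in means
    p_values = []
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        p_values.append(two_sided_p(diff / se))
    return min(p_values)


def share_with_a_hit(n_datasets=1000, alpha=0.05, seed=0):
    """Fraction of pure-noise datasets with at least one p < alpha."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_datasets) if min_p_pure_noise(rng) < alpha)
    return hits / n_datasets


if __name__ == "__main__":
    print("share of noise-only datasets with a 'finding':", share_with_a_hit())
```

Real datasets have correlated outcomes, so the exact rate will differ, but the qualitative point stands: with enough things to compare, a “significant” result is the expected outcome of noise alone.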
It's especially sad, though, to be writing this now. On the one hand, science is beset by many self-inflicted problems, like these. On the other, we have the sudden and chaotic bombardment of externally inflicted problems in the U.S., with the government hacking away at student training, research funding, and more. "Clowns to the left of me, jokers to the right," as they say…
I can’t help but respond to this. I was always bothered by overt politicization in academia, and from time to time I would say to speakers/colleagues/etc. some version of “Do you understand how real politics works? Do you think you can congratulate each other on how subversive you all are and not get pushback from the people you’re trying to subvert? Are you prepared for this battle?” None of this justifies the anti-intellectual backlash we’re seeing now, but naivete among the professorial set sure didn’t help.
Absolutely. There has been a real naivete and triumphalism at work. It also irks me that the politicization has been largely, though certainly not all, driven by humanities faculty, but the costs have now disproportionately fallen on us STEM faculty. At the moment, it’s definitely more important to unite together, but I do wish there was an acknowledgement of this by the folks who spearheaded the politicization. Unfortunately I have about as much trust in them as I do in the current administration.
Perhaps the one bright spot is that now we have a sense of how damaging politicization can be, those of us in the “center” can push back with real force in the future. Of course, saying this is like saying that some things improved after the black death.
Peter:
You write, “Do you understand how real politics works? Do you think you can congratulate each other on how subversive you all are and not get pushback from the people you’re trying to subvert? Are you prepared for this battle?”
That’s me, I guess, in that I point out the problems with junk science and I try to do good science myself, but I don’t fight the political battle. I’m not out there trying to defund Ted talks or to push the charlatans out of the National Academy of Sciences, etc. I’ve been myself the victim of academic politics, and that did not lead me to want to get political myself; it left me disgusted with politics.
I agree with you 100% that politics is necessary, that when we criticize bad work we should not be surprised when some of the criticized people respond ungraciously, even to the extent of misrepresenting or flat-out lying, and I’ve heard that lots worse has happened to other people. Yes, this is politics, which I hate. I’m just hoping there can be some division of labor between the people such as me who point out the problems and the people who fight the political battles. I don’t know that my path–research and criticism not backed up by political action–is “naive” so much as hopeful.
Just by analogy, some professors do research and some write textbooks. I happen to do both. But I don’t object that some researchers don’t write textbooks at all. They’re relying on a division of labor.
We obviously agree, but I’d like to emphasize that there’s a false consciousness (so to speak — I hate this term) at work in much academic “subversion”. In certain circles, they *think* they are doing politics when they publish stuff that attacks those in power, but they confuse analysis and criticism with political action. They really want to think they’re changing the world, but they don’t understand what that would mean concretely. (Tangent: this reflects the “cultural turn” taken by much of the Left after the collapse of Marxism, where politics is downstream from culture, and the cultural critic is manning [personning?] the true barricades.) You and I both know when we’re doing politics and when we’re simply examining things carefully and as objectively as we can. I wish this difference had been clearer to others too.
Thanks, that perspective hadn’t occurred to me before. There is a commonality between replication crisis deniers and those of us who want to separate advocacy from the science. Both can invoke the “bigger picture” when justifying their actions.
What techniques have you checked? The ones that worked in the past are independent replication and making “otherwise surprising” (typically meaning precise quantitative) predictions.
And I have asked multiple people I knew personally why they (obviously) cut the corners they did after a presentation. The answer was always along the lines of “I did what I needed to survive”.
To me, that shows the problem is low standards. Eg, if people are willing to pay for NNT ~100 medical treatments, that is what they will get. Placing the blame on the researchers for following incentives doesn’t work. To fix the problem the blame needs to be placed on the source of funding.
“I have a paper in the works trying to sort out how we can know if a field is producing knowledge or just chasing ghosts” – Echoes of Lakatos’ ‘progressive and degenerative programs’ here :)
Causal inference is of particular value to policy-makers, as it predicts the effects of interventions. Policy-makers with a legal or political background might be impressed by an adversarial process, and this has been suggested. In “Unsettled” by Koonin, he argues for a so-called Red Team approach to investigate climate change.
… In essence, a qualified adversarial group would be asked “What’s wrong with this argument?” And, of course the “Blue Team” (presumably the report’s authors) would have the opportunity to rebut the Red Team’s findings…
In an article in The Guardian newspaper “Ten lockdown lessons to learn for next time”, epidemiologist Mark Woolhouse is quoted as follows:
…“Its network of scientific advisers got it wrong,” writes Woolhouse in his book, The Year the World Went Mad. These events led him to call for a reorganisation of the UK’s scientific advisory system. Britain needs to separate evidence generation from evidence evaluation, and that was not happening during the pandemic, he argues, adding: “Next time, we need to have proper independent scrutiny of evidence from modelling or any other source.”…
Alas, I do not see how academia could assemble an adversarial team when the argument has progressed to a stage that the team would be regarded as XXX-deniers. As a second best, perhaps, before taking scientific advice, policy-makers should request that their intelligence services provide a report on the likely bias, susceptibility to political influence, and general reliability of the organisations producing that advice. :-)
One aspect of this is that it isn’t so clear how to do statistics “correctly”. Of course we are the “experts” and we like to think that we understand how to do it and those idiots in different fields don’t (or they are not idiots but villains and understand how to do it right but don’t do it anyway). But if an informed non-statistician looks at statistics, even at work done by “the experts”, they may well get the impression that it’s a holy mess.
Just to illustrate: Statistical methods have formal assumptions, these assumptions are taught, and we also teach to care about them. But then statistics is applied all the time in situations in which model assumptions are violated (beginning from the fact that we assume a continuous distribution for data that are discrete). Statisticians seem to somehow say “model assumptions are violated so you can’t do this, do something else instead”, and sometimes “model assumptions are violated but I don’t care” (for example applying something that assumes continuous data to discrete). Occasionally a statistician explains why one violation is harmful and another one isn’t, but this happens very rarely and isn’t always clear cut, and people may well never be exposed to such a discussion before they have a permanent position in research.
Furthermore, some intelligent statisticians ;-) have warned about the “garden of forking paths” issue, but professional intelligent statisticians look at data informally to make decisions about what analysis they run all the time despite the fact that these analyses are based on the assumption that no data dependent selection has taken place, and that there is no formal way to analyse the implications of such informal decisions.
Then there’s the Bayes-frequentist divide where group A of statisticians says “don’t do what group B tells you” and the other way round. I can go on…
We need to have some understanding for people who look at our discipline and think that either “anything goes”, or what goes and what doesn’t is extremely cryptic and nobody really understands it (or at least those who do don’t communicate it in a comprehensible manner). Well and then… they do all kinds of stuff that we don’t like… but surely we like what we do ourselves!?
I wish I had a solution for all this but I don’t.
Christian:
I agree with you. Just one thing about the forking paths. I’ve “warned” about the implication of the garden of forking paths for p-values. I don’t think forking paths are a bad thing. I follow forking paths all the time; indeed, my colleagues and I are writing a book, Bayesian Workflow, that is all about forking paths in data analysis and computation. As Loken and I discuss in our article, forking paths are a particular concern for p-values because the p-value is explicitly defined based on what would have been done had the data come out differently. Forking paths have relevance to all statistical analyses, but I think they’re much more of an issue for p-values than for other inferential summaries.
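The claim that a p-value depends on what would have been done had the data come out differently can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the analysis from the article mentioned above: the two subgroups, the sample size, and the function names are hypothetical. In each simulated study there is no true effect and the analyst reports exactly one test, but in the “forked” version the choice of which subgroup to test is made after looking at the data.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_z(rng, n):
    """z-statistic for treated vs. control when the true effect is zero.

    Assumption for illustration: N(0, 1) noise with known variance.
    """
    treated = [rng.gauss(0, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(treated) / n - sum(control) / n
    return diff / math.sqrt(2.0 / n)


def rejection_rates(n_sims=4000, n=25, alpha=0.05, seed=0):
    """Return (prespecified, forked) false-positive rates under the null."""
    rng = random.Random(seed)
    fixed = 0
    forked = 0
    for _ in range(n_sims):
        z_a = null_z(rng, n)  # hypothetical subgroup A
        z_b = null_z(rng, n)  # hypothetical subgroup B
        # Prespecified plan: always test subgroup A, whatever the data show.
        if two_sided_p(z_a) < alpha:
            fixed += 1
        # Forking path: test whichever subgroup shows the bigger difference.
        # Still only ONE test is run and reported per study.
        z_chosen = z_a if abs(z_a) >= abs(z_b) else z_b
        if two_sided_p(z_chosen) < alpha:
            forked += 1
    return fixed / n_sims, forked / n_sims


if __name__ == "__main__":
    print(rejection_rates())
```

Under these assumptions the prespecified analysis should reject about 5% of the time, while the data-dependent one should reject closer to 1 - 0.95^2, or about 10%, even though no one ran more than one test on any given dataset.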
In general I recommend displaying and analyzing all comparisons of potential interest rather than selecting among them using statistical significance or Bayes factors or anything like that. Indeed, just a few days ago I read a paper and I recommended that the author use a table or graph to display an array of comparisons in one place, rather than having lots of little tables and jumping around saying this is significant, this is not, etc. The fundamental problem was not “forking paths” but rather that summarizing in that jumpy way adds noise and makes it harder to learn from the data.
Yeah, I agree, even though my wording may not have suggested it. That’s the beauty and the difficulty of statistics. I do however think that we need to better explain why, how and when it’s a good thing, and when it isn’t, on what basis (strongly connected by the way to the objectivity/subjectivity issue).
I think the same thing is true about economics. Economists disagree about “how to do economics correctly.” The subject has its own forking paths where theoretical assumptions are made or ignored and recommendations diverge as a result. I think those outside the profession (and many inside) have the same reaction you state:
“We need to have some understanding for people who look at our discipline and think that either “anything goes”, or what goes and what doesn’t is extremely cryptic and nobody really understands it (or at least those who do don’t communicate it in a comprehensible manner). Well and then… they do all kinds of stuff that we don’t like… but surely we like what we do ourselves!?”
I wish it was solely a matter of not communicating effectively with non-economists (though that is part of the problem). But much of the reason has to do with a recent post about junk science. No amount of training is going to get economists to agree on policy recommendations. Unfortunately, the result is often dismissal of anything economists say. Another unfortunate result is that non-economists end up with zero understanding of the most basic idea (international trade is a prime example – the economics is quite clear, the politics is not).
Sorry about the reference to a recent post about junk science – I didn’t realize this comment was in that post!
A good example of religious ideology infiltrating/underpinning social science research:
https://www.sciencedirect.com/science/article/pii/S2666560324000823
“This essay highlights the normative theological commitments that inform HFP’s measures and findings, and it foregrounds the risks of introducing theological views into the design and conduct of social scientific research. If religious commitments shape concept formulation and measurement, then those commitments will inevitably influence the findings and policy recommendations that result. Researchers, policymakers, public health professionals, and others interested in engaging with HFP’s instruments, findings, and recommendations need to be aware of the context of their emergence as well as the normative assumptions upon which they rest.”
What I was hoping to convey, more than anything, is not that we’re drowning in garbage science but that there are local ecologies where puglistic noise merchants thrive. The conditions under which that occurs are the fascinating part, and effectively boil down to whether there’s a large enough ladder to pick whatever fruit is left on the tree—and demand for more fruit. If not, someone will be willing to bring back a basket of rocks and weigh it as apples. It isn’t interesting, really, who that person is.
In the case of the Protzko paper, for example, there (apparently) isn’t enough of a collective epistemic ladder in metascience to realize that causal claims require variation in treatment; or to reliably check a prereg… and plenty of demand for evidence that various reforms fix the replication crisis which motivated the reforms in the first place. Of course someone is going to cobble together a paper touting ‘the reforms are working’, without doing the careful work of understanding their data and appraising it earnestly. Here, if the field had more of a ladder (Devezer’s formal methodology) and a bit less demand for one type of result (Rubin’s QMPs), one couldn’t make such claims and hope to get published. There’s plenty of fruit on the tree but it’s just out of reach of collecting whatever apple-shaped objects are on the ground below. Emerging fields, like “Mturk studies but use LLMs instead” are bound to produce quite a bit of nonsense, not for lack of interesting questions but for poor methods and theory. If significance is your inference criterion, don’t be surprised if your infinite population of LLM responses is significantly different from humans in some way (or not-different, then sell that).
In metascience the tree hasn’t been plucked, but in some fields it has. We have a pretty good understanding of the proximate impacts of nutrition: have someone survive on Mountain Dew, potato chips, and pizza for a year and I’m gonna go ahead and guess their health is worse off. Big effects at this scale have been found, but what we need are the interventions in the complex systems that define diet; relegating some people to have enough food in terms of calories but few options for avoiding the dew/pringles/pizza diet. Data from import-heavy, poorer island nations screams signal. Other cases, we actually have the science that matters mostly figured out, but ladders keep getting longer and good work is going on. Still there’s enough noise and incentive structure to be a contrarian (Climate, vaccines), and challenge consensus.
But this isn’t all, or even most of science! Lenacapavir trials had as big of an effect size as you’d ever hope to see on an issue that couldn’t be more important (preventing HIV infection). Doubting that it’s a worthwhile intervention, or the product of p-hacking would be *absurd*. What’s more interesting is that progress in developing something like Lenacapavir followed decades of tiny effects, failures, and disappointments. Years ago we might have mistaken the field for one stuck in the mud like many others. I think it would have been wrong to say it was drowning in junk because the tendency of those studies was to find promising avenues; which often floundered. It also isn’t abundantly clear it was waste, because claiming that requires a fraught assumption that had those studies not been conducted we’d have wound up with Lenacapavir. Even if the intervening years were entirely junk, they would have served the purpose of keeping the field and hope alive until the stars aligned to produce Lenacapavir. Zooming out, it’s not clear that a literature of disappointing findings is drowning, perhaps it’s treading water.
And of course there are still other fields doing just fine, or stuck for reasons that seem, well, reasonable. Hubble tension isn’t because of bias, p-hacking, or anyone trying to be political contrarians. It’s a *real* scientific problem and answering it would almost certainly help us understand the universe better. Turns out it’s just hard to measure the expansion rate of the universe and develop a theory that explains, well, everything. I’m OK with science getting a little stuck because the questions are hard. Overall, I think I’d push back on the notion that we’re drowning in junk science and instead frame it as interesting questions about the ecologies under which science becomes messy (metascience, nutrition); whether and when this is bad (HIV prophylaxis); where science works well; and what can be done in terms of ladder-building to kick a field from messy to good. Good to avoid getting blinkered by various piles of garbage and the people who litter, so we can ask these broader questions.
“puglistic”?
Pugilistic*
Boxers of sorts, the kind of scientists who treat others how Fisher, Neyman and Pearson treated Jeffreys. You know the type.
I had been wondering if that was a typo for “pugilistic,” but that didn’t seem quite right, because my impression is that noise merchants are looking for happy harmony and an unending stream of publications, corporate and government contracts, Ted talks, Freakonomics podcasts, and NPR profiles. They’re not pugilistic at all; indeed, they’re kind of puzzled that skeptics keep getting on their case.
Jeffreys was well-regarded by Fisher, and they got along personally despite their statistical disagreements. This was unusual for Fisher. I don’t know how Neyman and Pearson treated Jeffreys.
Depends on which noise merchants you’re talking about. Kennedy, Rogan, et al. sell noise by “fighting” the science “establishment” (with reams of bullshit).
I don’t know about that case, but have a non-absurd example:
It seems every cancer drug has “side-effects” of nausea, vomiting, etc. Then caloric restriction for slowing tumor growth has a century of evidence behind it. For the sake of argument*, say the actual mechanism by which a cancer drug “works” is caloric restriction. Wouldn’t the existence of the much cheaper and safer intervention (caloric restriction) that accomplished the same thing render that drug pretty worthless?
Point being that the value of the intervention is relative to the alternatives.
* I have never seen a cancer trial check for this, e.g., by including a caloric restriction control group, or even by comparing a rough proxy like weight loss to the tumor growth rates.
https://www.science.org/content/blog-post/starving-cancer-cells-where-they-live
Cancer drugs target a lot of different pathways in tumour cells and there are other therapies like CAR-T that kick the immune system into action to destroy tumours. The link above has some interesting info on the calories-and-cancer issue. However, slowing tumour growth is ok as far as it buys some time but what you really want is tumour destruction. There are a lot of different types of cancer and not all are going to be sensitive to any one therapy but if this nutrient competition approach works for some cancers, that would be great.
Even if Andrew was right that researchers [in real fields] think “yeah, they’ll do an experiment or a data analysis or whatever, but that’s just a means to an existing end. They’re not doing science,” it’s not necessarily a problem for the field – not everyone needs to want to do science. That is, scientific progress doesn’t need to occur in individual minds in order for there to be collective progress. I see this as similar to efficient markets, where even if a majority of money and a majority of trades are “dumb,” the fact that some traders are using information and trying to profit means that the aggregate movement is towards correct prices, and the noise actually provides useful friction. Bad papers can be falsified, and push towards better overall understanding; except that unlike finance, in science, we have a much stronger bias towards reality, since reality gives us actual feedback every time anyone runs a test, regardless of their epistemic virtue. Moving too far from actual scientific ideas makes it increasingly difficult to torture your data into supporting them. So as long as there are some people actually checking and looking for truth, it seems science as a field should progress.
But you made a narrower claim – it’s “not that we’re drowning in garbage science but that there are local ecologies where puglistic noise merchants thrive.” This seems to push against my claims, in that those local ecologies could go crazy permanently, without correction. I think this is plausible and very unfortunate, but even diverted effort leading nowhere doesn’t hurt the useful areas, except by stealing otherwise useful funding. Subfields that don’t have substantive predictive value are unfortunate, but aren’t particularly destructive of the broader way that science evolves. (Astrology didn’t need to stop for progress to happen in astronomy, even though it’s garbage.)
Wrote a response to this! https://open.substack.com/pub/stepstophaeacia/p/does-science-require-truth-seeking?
tl;dr is that I don’t think you can hope for unbiased researchers. You have to design the system with the dogmatism of researchers in mind.
Ben:
In your post, you summarize my argument as:
And you continue:
I see two problems with your argument. First, you seem to be confusing scientific practice with morality. As I’ve said many times, honesty and transparency are not enough. This goes both ways: (1) just because someone is doing bad statistics–bad enough to invalidate their research claims–that doesn’t mean they are bad people; (2) you can be a good person and trying your best to do good science but still be stuck using useless methods. To flip this around, I don’t care how wonderful a scientist is, it doesn’t matter if, in your words, that scientist “is a beacon of hope and light, always ready to sacrifice their personal beliefs at the altar of truth”; if that wonderful person is trying to answer questions that can’t be answered with the data at hand, it’s not going to work.
The other thing is that I think you’re framing this as a false dichotomy, between what you’re calling “bias” and what you’re calling being “engaged in a fearless pursuit of objectivity.” I think that most scientists are doing their best, making use of the tools available to them. It’s fine to have research goals: if you think treatment X should work, it makes sense that you study X and not Y or Z, and it makes sense that you will try your best to design an experiment to prove the efficacy of X. You can be “biased” in this sense but still not see research as a “kind of annoying paperwork.”
To put it another way, if I express the hope that scientists will not see research as a “kind of annoying paperwork,” I’m not asking for some sort of moral regeneration; it’s enough for scientists–including those with strong views and goals–to recognize that research is itself a valuable tool to help you proceed. Even if you’ve already convinced yourself that X is effective and important, careful research on X should enable you to understand the conditions under which X will be more effective.
Ten years ago, I published an article, The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective, making the connection between better scientific practice and the technical idea of treatment interactions. If a researcher is naively following the traditional statistical paradigm of trying to estimate “the treatment effect,” and the researcher is already convinced that the treatment works, then, yeah, it almost makes some kind of sense to think of research as “paperwork,” a lot of effort undertaken just to “prove” something that is already “known.” But once the researcher opens the door to the idea that the treatment works under some conditions but not others, and indeed sometimes has negative impacts, then there’s a clear motivation for research. It has a purpose.
Excellent; I updated the post with a link to your response.
I won’t retaliate here, I’ll think it over and see if this changes my mind :)
Ben:
Please don’t “retaliate”; I’d rather discuss! I do think you make valuable points in your post, and I expect there will be further value in clarifying how your points and mine are consistent (and maybe pointing out things I got wrong).
I would argue science as a semiotic process already does work this way – fields may only progress slowly, but as long as there are empirical tests that can be performed using good methods, science is generally robust to bad work in the limit. That doesn’t make this optimal, but I think it does point to real progress amidst the current “crises” – we don’t actually throw out decades of bad work, we learn from them even if they are invalid. (So, for example, power posing and lots of framing effects work have been usefully thrown out, and the narrower effects that do exist are better understood. That’s not a great use of resources, but it was figured out and the actual science slowly emerged.)
That’s an interesting introductory sentence to your blog post, Ben. It’s understandable that you portray a dismal picture of empirical science on a blog since that helps attract eyeballs, but those seem lazy, second-hand notions to me. Here’s an example of what one might want to find out if seriously interested in replication/reproducibility in empirical science:
In the cancer replication project that I assume you are referring to, a bunch of experiments in several studies were not replicated at a particular point of time quite a few years after the original studies were published (you can search for “cancer replication project” for details). That doesn’t necessarily have much to say about the progression of a scientific field (we’re ultimately much more interested in reproducibility), and in fact the authors of the replication efforts were honest and helpful about the significance of their analysis in a way that has been lost in a deluge of (IMO) agenda-led science bashing of which reference to “replication crises” is a common theme.
I have a problem with the extent to which the somewhat manufactured “crisis” POV is uncritically accepted when in fact it wouldn’t be so difficult to establish whether (referring to the cancer replication project) a non-replicated paper was in fact usefully reproducible. I had a go with just one of the papers in the cancer project (this was one of their “Partial Replication”s): Sharma, S. V. et al. A chromatin-mediated reversible drug-tolerant state in cancer cell subpopulations. Cell 141, 69–80 (2010).
It seems pretty obvious that the observation of subpopulations of cancer cells that become tolerant to anticancer drugs is highly reproducible and that this involves amongst other things, chromatin remodelling. This was a hugely important observation [you could read about it here if you felt like it: Boumahdi, S., de Sauvage, F.J. The great escape: tumour cell plasticity in resistance to targeted therapy. Nat Rev Drug Discov 19, 39–56 (2020)].
What are we to make of a study that wasn’t replicated at a point of time but which has been multiply reproduced and led to fundamental insights? What does it have to say about the progression of scientific fields that you discuss in your blog post? I don’t think one can reliably address these questions without stepping outside of the “drowning in junk science” perspective.
Chris –
, but those seem lazy, second-hand notions to me.
Interesting. I didn’t read that as Ben saying it’s necessarily true that the empirical sciences are an epistemic nightmare, but that it’s a common complaint that they are. And not in a Trump apophasis kind of way (“Many people are saying I’m the most Christ-like president who ever existed”…)
And I think it’s fair to say that “Psychology, medicine, nutrition science, cancer research, economics, and other disciplines have dealt with embarrassing replication crises over the past decade. All have massive, conflicting literatures, suggesting that much of the published research is wrong.” is more or less a statement of fact.
My take was that he’d largely agree that there’s a problem with the “somewhat manufactured” aspect of the “crisis” framing. Maybe I need to go back and read it again. Did I get it wrong?
I didn’t look into the details of that topic, however this sounds like *direct* replication failed but *conceptual* replication succeeded.
This pattern is very common in medical literature and is expected based on the standard practices.
How it works is that by changing a key detail, there is an “out” any time the conceptual replication fails.
Eg, original study was done in male mice, but replication used female mice.
If results are similar, great! Successful replication!
If not, great! We learned something new about the difference between males and females!
In this way a byzantine misinformation-filled narrative can be progressively woven for generations without ever actually confirming anything or solving the problem society expects the researchers to be working on (eg, curing disease).
This was already described in 1967 by Meehl here:
https://statmodeling.stat.columbia.edu/2016/05/06/needed-an-intellectual-history-of-research-criticism-in-psychology/
Also the cancer replication project had to drop half the studies because no one from the original labs could share enough info to even attempt a replication. Stuff like DMSO concentration (the hospital pharmacist made it and got a new job) and so on. So they are starting from half being unreproducible even in principle.
Another way of putting all this is that all of the quality assurance/control has been dropped so cancer research is freerunning detached from any real feedback.
Joshua, maybe but it’s not so clear to me and Ben does go on to say “I agree with Gelman that it’s a mess out there—many empirical sciences are in crisis, and much “research” of the past 20 years should be heavily discounted.” That sounds to me that he’s agreeing with the “crisis” narrative.
In the field that I am familiar with (broadly speaking, cell/mol biol and preclinical biomed science) I don’t believe the crisis narrative applies and it’s lazy (especially for a PhD researcher with an interest in “truth-seeking”!) to spread the crisis narrative (which may apply to psychology and social science – that’s certainly the impression one gets on this blog), into fields where it may not apply – that simply serves dubious agendas.
I do agree with Ben though about the value of dogged persistence in science.
I disagree with your statement (my edit) that: “…cancer research has a massive, conflicting literature(), suggesting that much of the published research is wrong.” is more or less a statement of fact. That’s an assertion that really requires some evidence. IMO the cancer replication doesn’t provide that for reasons including what I wrote in my post (studies that failed to replicate at one moment in time but were actually widely reproduced don’t constitute evidence that “much of the published research is wrong”).
It would be valuable for those interested in this topic to go back to the original Cancer Replication Study papers and read what conclusions the authors made. In fact their interpretations were not that the studies they failed to replicate were “wrong”, but that better means of documenting and disseminating research findings would be incredibly useful…many of their suggestions have been or are being implemented.
Unfortunately this nuance has been buried beneath a deluge of agenda-led misrepresentation which unfortunately is reaping some of its rewards (IMO also).
I don’t think that’s right Anoneuoid. A paper was published that apparently showed that subpopulations of cancer cells can become tolerant to anticancer drugs and that this involves changes in chromatin. The cancer replication study was able to only partially replicate the experiments in the study.
In the meantime a vast body of research following up this early study has reproduced this observation widely, including in humans. Failure to replicate a study at one particular moment in time doesn’t somehow negate all the subsequent research that validates the conclusions. As the authors of the Cancer Replication Project are careful to point out, failure to replicate doesn’t mean that the study is “wrong”.
Here is from the original:
https://pmc.ncbi.nlm.nih.gov/articles/PMC2851638/
Then the replication is reported here: https://elifesciences.org/articles/73430#content
I will parse the jargon. The original authors reported if you treat cancer cells in a dish with a certain anti-cancer drug dosage schedule, that 99+% of them will die*. The remaining ~1% are deemed special and resistant to the cancer drug. Further that they were still affected by the drug (suppressed EGFR kinase activity) in the same way as it is expected to kill the cells. However, at very high doses, those resistant cells will still be killed by the drug.
The replication found that the original authors did not report the DMSO (solvent) concentration used to dissolve the cancer drug. Note that DMSO is well-known for affecting cell death and proliferation rates on its own, via an unknown mechanism. So already they can’t be sure they are actually replicating the study. They try out a couple concentrations and pick one that seems reasonable.
They find that using the reported drug dose schedule, more like 5% of cells survived and the EGFR activity was *not* suppressed. However, they kind of see a similar result if the drug is given in the very high doses *also* reported to kill the resistant cells in the original study. But not really, because there is *still* substantial EGFR kinase activity even at these high doses. Such results indicate the cells could be resistant because they are pumping the drug out or it isn’t getting in, or something.
At this point they decide it’s unclear what’s going on, this is a mess, let us devote our resources to something else.
So the original conclusion that there is a very small percent of cells resistant to the cancer drug for reasons besides pumping the drug out, metabolizing it, etc is not supported by the replication.
Shouldn’t that affect your claim that “subpopulations of cancer cells that become tolerant to anticancer drugs is highly reproducible and that this involves amongst other things, chromatin remodelling”? If not, then what is the point of these experiments?
I mean it seems obvious to me that there will be subpopulations of resistant cells too. We don’t need dubious experimental results to think that.
* Really it doesn’t measure cell death, they measure the density of cells which is determined by both death rate and proliferation rate. But for simplicity here I refer to it as a cell killing effect.
Anoneuoid
What do you mean by “these experiments”? The original study or the replication attempts?
In any case I think we can be guided by the authors of the replication study who say (amongst lots of other things):
And:
Those seem quite appropriate statements in the context although I would substitute “reproductions” for “replications” (or “replications and reproductions”) in the last two sentences of the second quote. For example, the original study was done using a human lung cancer cell line (PC9) but we are very interested in the possibility that the observations can be more widely generalized to other contexts and that’s what the follow up work has also done. As is implicit in the statements of the replicability project I pasted above, the inability to replicate some of the experiments from the original study at some point in time doesn’t negate the reality that the observations (a subpopulation of cancer cells that become resistant to drug treatment; the mechanism seems to be epigenetic rather than a result of mutations; the cells can undergo reversible phenotypic switching etc.) have been reproduced by others in other contexts. I cited a review above where one can explore this.
Otherwise, I agree with the conclusions of the authors of the Cancer Replicability Project. It would be very helpful if mechanisms for making it easier for researchers to repeat published experiments were implemented – some of this is being done (more extensive methods and documentation as in the STAR Methods used by CellPress journals and similar by others; videos to support experimental descriptions; stricter enforcement of data and resource sharing, which was clearly a problem with some of the replication attempts, etc.). The latest NIH/NSF government budget had (IIRC) a $10 million sum to explore replicability. I would like to see a system (would be easy to implement) whereby authors could update or supplement their published papers with recent information about their experiments. In my experience lots of struggles with replicating someone else’s study (and even one’s own) result from some technical change in a reagent, or a manufacturer changing the coating on their multiwell plates so that they no longer work as described in the paper, etc., and some of this could be alleviated if authors could easily provide updated information.
Stop! We’re in what Phil so memorably has called “garbage time.”
Andrew —
You write
I’m having trouble threading the needle between what I call “bias” and what you call seeing research as “a kind of annoying paperwork.” You seem to view these as distinct. For all practical purposes, they seem synonymous to me.
Take the recent brouhaha over teen mental health and social media. My view on the situation is that the statistical evidence for causation is weak to non-existent. But several prominent psychologists who shall remain nameless are staking their reputations on the connection. I’m sure they are biased in favor of this conclusion, and view their “meta-analyses” as simply a means to prove their point (i.e., as annoying paperwork that they need only show in their substack posts, not in their books or Atlantic articles).
On my view, it’s too high a bar to ask them *not* to view statistics as annoying paperwork. Instead, cultural + scientific norms need to enforce that they do their statistics correctly, so that the rest of us can be more assured of their conclusions.
I can’t tell to what extent we’re disagreeing :)
Ben –
I enjoyed your post and agree that the truth-seeking debate often becomes too binary. Most scientists are a mix: biased, sometimes annoyed by the grind of research and publishing, yet still driven to uncover scientific truth. This human complexity can lead to unproductive finger-pointing and tribalism, which hinders progress. Your focus on the system’s role in filtering errors is spot-on. It would help if it were easier, or at least more common, for people to accept that we’re all necessarily vulnerable to cognitive biases.
Some but not all of the physics education research types would be pretty explicit in conversations about seeing “research itself as a kind of annoying paperwork”.
What is up with physics education anyway?
It seems like I constantly come across stuff like “the aether was disproven”, then digging deeper find “Well, Einstein said he just renamed the aether to spacetime”.
Or “speed of light is constant in a vacuum.” Dig deeper: “Well, Einstein said this is only true in a homogenous universe, ie one without gravity. Actually the speed slows down in stronger gravitational fields.”
This happens so often, practically every physics topic you can dig into. There is some misleading superficial description, then a real description so different it may as well contradict the simple one.
Or “CO2 causes warming”. “Well, actually its the water vapor per se that causes all the warming”.
The sort of physics education issues that I was discussing here are more politicized ones, not the genuine difficulties in working over from familiar concepts to less familiar ones.
Still, it’s fun to discuss the points you raised.
On spacetime/aether: Einstein’s claim was that all physical laws, including all the ones that had not been found at the time, would be invariant under a broad class of coordinate transformations. That’s radically different from any previous aether theory.
On speed of light: The constancy holds on any local coordinate patch. However you choose to describe the space-time light paths on a larger scale it cannot be “the speed slows down in stronger gravitational fields” since the gravitational field is not an invariant. Perhaps you are referring to how clock rates change as a function of gravitational potential, not field.
On “its the water vapor per se that causes all the warming”: No, CO2 directly causes a lot. It’s primarily H2O vapor that causes the positive feedback that roughly doubles the direct effect.
Regarding the aether:
https://www.gutenberg.org/files/7333/7333-h/7333-h.htm
I actually posted about the variable speed of light when I was looking into it here:
https://statmodeling.stat.columbia.edu/2025/02/14/maybe-they-should-just-write-some-papers-about-their-priors-and-not-mess-around-with-actual-data/#comment-2392480
For greenhouse effect, the wikipedia claim is that 70+% is due to water vapor (clouds also require water vapor):
https://en.wikipedia.org/wiki/Greenhouse_gas#Contributions_of_specific_gases_to_the_greenhouse_effect
I’ll also note:
Following ref 2:
Yet for the moon (which has no atmosphere) the same calculation gives ~270 K:
https://nssdc.gsfc.nasa.gov/planetary/factsheet/moonfact.html
Obviously, this difference in albedo is because the moon lacks the substantial clouds/ice that, on the Earth, require an atmosphere that includes water. Point is, there are technical issues with that calculated greenhouse effect to begin with.
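For anyone who wants to check the arithmetic, the standard no-atmosphere calculation can be sketched as below. It balances absorbed sunlight against blackbody emission, T = (S(1 - A) / (4σ))^(1/4). The solar irradiance of 1361 W/m² and the Bond albedos (0.306 for Earth, 0.11 for the Moon) are assumed inputs for illustration, and the function name `equilibrium_temperature` is made up here.

```python
# Stefan-Boltzmann constant in W m^-2 K^-4.
SIGMA = 5.670374419e-8
# Assumed solar irradiance at 1 AU in W m^-2.
SOLAR_IRRADIANCE = 1361.0


def equilibrium_temperature(bond_albedo, irradiance=SOLAR_IRRADIANCE):
    """Blackbody equilibrium temperature (K) of a rapidly rotating sphere.

    Absorbed flux, averaged over the whole sphere, is S * (1 - A) / 4;
    setting that equal to sigma * T**4 and solving for T gives the result.
    """
    absorbed_flux = irradiance * (1.0 - bond_albedo) / 4.0
    return (absorbed_flux / SIGMA) ** 0.25


# Assumed Bond albedos, used only to illustrate the calculation.
earth_k = equilibrium_temperature(0.306)
moon_k = equilibrium_temperature(0.11)

print(f"Earth (A = 0.306): {earth_k:.1f} K")
print(f"Moon  (A = 0.11):  {moon_k:.1f} K")
```

With these inputs the only thing that differs between the two bodies is the albedo, so the whole gap between the two temperatures in this sketch comes from how much sunlight each reflects.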
If you could resolve the apparent discrepancies for me that would be great!
I will bow out of the thread either way, but will be sure to read any responses.
I agree with Ben’s points about bias above – it cannot be removed from science nor should it be. I also agree with Andrew’s response that bias is fine, but it should not stand in the way of examining evidence and being open to critique and change. The comment about viewing statistics as “annoying paperwork” is valid – bias need not, and should not, make it so. I think both points (as well as other comments) point to the need for the system to work better – to promote good scientific practice despite the reality of biased humans. And, our current systems don’t seem to be doing a very good job.
In that context, there is a comment by David Manheim above that got my attention. He certainly is more optimistic than me regarding the self-correcting nature of science, in the long run:
“I would argue science as a semiotic process already does work this way – fields may only progress slowly, but as long as there are empirical tests that can be performed using good methods, science is generally robust to bad work in the limit.”
We can certainly disagree about this optimism. I usually see the glass as 1/3 full. But I think an important issue is how this “semiotic process” is changing over time. I think technology has dramatically increased the speed of change. I would argue that the challenges to the self-correcting nature of science have increased more rapidly than the process itself. The long run, or “limit,” is becoming more elusive. While we wait for dangerous unscientific beliefs to be cast aside, more and more damage occurs. Each generation (showing the burdens or wisdom of age) seems to be less capable of evaluating evidence, tolerating ambiguity, and exhibiting patience in learning. Self-correcting processes take time, and time is the one resource that seems to be getting scarcer the most quickly.