And, if we really want to get real, let’s be open to the possibility that the effect is positive for some people in some scenarios, and negative for other people in other scenarios, and that in the existing state of our knowledge, we can’t say much about where the effect is positive and where it is negative.

Posted on June 24, 2019 9:51 AM by Andrew

Javier Benitez points us to this op-ed, “Massaging data to fit a theory is not the worst research sin,” where philosopher Martin Cohen writes:

The recent fall from grace of the Cornell University food marketing researcher Brian Wansink is very revealing of the state of play in modern research.

Wansink had for years embodied the ideal to which all academics aspire: innovative, highly cited and media-friendly.

I would just like to briefly interrupt to note that not all academics aspire to be media-friendly. I have that aspiration myself, and of course people who aspire to be media-friendly are overrepresented in the media—but I’ve met lots of academics who’d prefer to be left in peace and quiet to do their work and communicate just with specialists and students.

But that’s not the key point here. So let me continue quoting Cohen:

[Wansink’s] research, now criticised as fatally flawed, included studies suggesting that people who go grocery shopping while hungry buy more calories, that pre-ordering lunch can help you choose healthier food, and that serving people out of large bowls leads them to eat larger portions.

Such studies have been cited more than 20,000 times and even led to an appearance on The Oprah Winfrey Show [and, more to the point, the spending of millions of dollars of government money! — ed.]. But Wansink was accused of manipulating his data to achieve more striking results. Underlying it all is a suspicion that he was in the habit of forming hypotheses and then searching for data to support them. Yet, from a more generous perspective, this is, after all, only scientific method.

Behind the criticism of Wansink is a much broader critique not only of his work but of a certain kind of study: one that, while it might have quantitative elements, is in essence ethnographic and qualitative, its chief value being in storytelling and interpretation. . . .

We forget too easily that the history of science is rich with errors. In a dash to claim glory before Watson and Crick, Linus Pauling published a fundamentally incoherent hypothesis that the structure of DNA was a triple helix. Lord Kelvin misestimated the age of the Earth by more than an order of magnitude. In the early days of genetics, Francis Galton introduced an erroneous mathematical expression for the contributions of different ancestors to individuals’ inherited traits. We forget because these errors were part of broader narratives that came with brilliant insights.

I accept that Wansink may have been guilty of shoehorning data into preconceived patterns – and in the process may have mixed up some of the figures too. But if the latter is unforgivable, the former is surely research as normal.

Let me pause again here. If all that happened is that Wansink “may have mixed up some of the figures,” then this is not “unforgivable” at all. We all “mix up some of the figures” from time to time (here’s an embarrassing example from my own published work), and nobody who does creative work is immune from “shoehorning data into preconceived patterns.”

For some reason, Cohen seems to be on a project to minimize Wansink’s offenses. So let me spell it out. No, the problem with the notorious food researcher is not that he “may have mixed up some of the figures.” First, he definitely—not “may have”—mixed up many—not “some”—of his figures. We know this because many of his figures contradicted each other, and others made no sense (see, for example, here for many examples). Second, Wansink bobbed and weaved, over a period of years, denying problems that were pointed out to him from all sorts of different directions.

Cohen continues:

The critics are indulging themselves in a myth of neutral observers uncovering “facts”, which rests on a view of knowledge as pristine and eternal as anything Plato might have dreamed of.

It is thanks to Western philosophy that, for thousands of years, we have believed that our thinking should strive to eliminate ideas that are vague, contradictory or ambiguous. Today’s orthodoxy is that the world is governed by iron laws, the most important of which is if P then Q. Part and parcel of this is a belief that the main goal of science is to provide deterministic – cause and effect – rules for all phenomena. . . .

Here I think Cohen’s getting things backward! It’s Wansink’s critics who have repeatedly stated that the world is complicated and that we should be wary of taking misreported data from 97 people in a diner to make general statements about eating behavior, men’s and women’s behaviors, nutrition policy, etc.

Contrariwise, it was Wansink and his promoters who were making general statements, claiming to have uncovered facts about human nature, etc.

Cohen continues a few paragraphs later:

Plato attempted to avoid contradictions by isolating the object of inquiry from all other relationships. But, in doing so, he abstracted and divorced those objects from a reality that is multi-relational and multitemporal. This same artificiality dogs much research.

Exactly! Wansink, like all of us, is subject to the Armstrong Principle (“If you promise more than you can deliver, then you have an incentive to cheat.”). Most scholars, myself included, are scaredy-cats: in order to avoid putting ourselves in a Lance Armstrong situation, we’re careful to underpromise. Wansink, though, he overpromised, presenting his artificial research as yielding general truths.

In short, we, the critics of Wansink and other practitioners of cargo-cult science, are on Cohen’s side. We’re the ones who are trying to express scientific method in a way that respects the disconnect between experiment and real world.

Cohen concludes:

Even if the quantitative elements don’t convince and need revising, studies like Wansink’s can be of value if they offer new clarity in looking at phenomena, and stimulate ideas for future investigations. Such understanding should be the researcher’s Holy Grail.

After all, according to the tenets of our current approach to facts and figures, much scientific endeavour of the past amounted to wasted effort, in fields with absolutely no yield of true scientific information. And yet science has progressed.

I don’t get the logic here. “Much endeavour amounted to wasted effort . . . And yet science has progressed.” Couldn’t it be that, to a large extent, the wasted effort and the progress have been done by different people, in different places?

To use general scientific progress as a way of justifying scientific dead-end work . . . that’s kinda like saying that the Bills made a good choice to keep starting Nathan Peterman, because Patrick Mahomes has been doing so well.

Who cares?

So what? Why keep talking about this pizzagate? Because I think misconceptions here can get in the way of future learning.

Let me state the situation as plainly as possible, without any reference to this particular case:

Step 1. A researcher performs a study that gets published. The study makes big claims and gets lots of attention, both from the news media and from influential policymakers.

Step 2. Then it turns out that (a) the published work was seriously flawed, and the published claims are not supported by the data being offered in their support: the claims may be true, in some ways, but no good evidence has been given; (b) other published studies that appear to show confirmation of the original claim have their own problems; and (c) statistical analysis shows that it is possible that the entire literature is chasing noise.

Step 3. A call goes out to be open-minded: just because some of these studies did not follow ideal scientific practices, we should not then conclude that their scientific claims are false.

And I agree with Step 3. But I’ve said it before and I’ll say it again: We should be open-minded, but not selectively open-minded.

Suppose the original claim is X, but the study purporting to demonstrate X is flawed, and the follow-up studies don’t provide strong evidence for X either. Then, of course we should be open to the possibility that X remains true (after all, for just about any hypothesis X there is always some qualitative evidence and some theoretical arguments that can be found in favor of X), and we should also be open to the possibility that there is no effect (or, to put it more precisely, an effect that is in practice indistinguishable from zero). Fine. But let’s also be open to the possibility of “minus X”; that is, the possibility that the posited intervention is counterproductive. And, if we really want to get real, let’s be open to the possibility that the effect is positive for some people in some scenarios, and negative for other people in other scenarios, and that in the existing state of our knowledge, we can’t say much about where the effect is positive and where it is negative. Let’s show some humility about what we can claim.
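To make that last possibility concrete, here is a minimal simulation sketch. Everything in it is hypothetical: the two groups "A" and "B", the effect sizes of +1 and -1, the noise level, and the function names are made up for illustration and have nothing to do with any real study. The point is only that an intervention can help one group and hurt another while the pooled comparison shows an average effect close to zero.

```python
import random
from statistics import mean


def simulate(n_per_group=10_000, seed=0):
    """Simulate a randomized experiment with two hypothetical groups.

    Assumption for illustration: the treatment shifts the outcome
    by +1 in group "A" and by -1 in group "B".
    """
    rng = random.Random(seed)
    rows = []
    for group, effect in (("A", 1.0), ("B", -1.0)):
        for _ in range(n_per_group):
            treated = rng.random() < 0.5  # coin-flip assignment
            outcome = rng.gauss(0.0, 1.0) + (effect if treated else 0.0)
            rows.append((group, treated, outcome))
    return rows


def diff_in_means(rows):
    """Estimate the treatment effect as treated mean minus control mean."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


rows = simulate()
overall = diff_in_means(rows)
by_group = {g: diff_in_means([r for r in rows if r[0] == g]) for g in ("A", "B")}

print(f"pooled estimate: {overall:+.2f}")
for g, est in by_group.items():
    print(f"group {g} estimate: {est:+.2f}")
```

In this toy setup we get to see the group labels. In real applications we usually don't know which people and scenarios fall on which side, which is exactly why the pooled estimate alone tells us so little.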

Accepting uncertainty does not mean that we can’t make decisions. After all, we were busy making decisions about topic X, whatever it was, before we had any data at all—so we can keep making decisions on a case-by-case basis using whatever information and hunches we have.

Here are some practical implications. First, if we’re not sure of the effect of an intervention, maybe we should think harder about costs, including opportunity costs. Second, it makes sense to gather information about what’s happening locally, to get a better sense of what the intervention is doing.

All the work that you haven’t heard of

The other thing I want to bring up is the selection bias involved in giving the benefit of the doubt to weak claims that happen to have received positive publicity. One big big problem here is that there are lots of claims in all sorts of directions that you haven’t heard about, because they haven’t appeared on Oprah, or NPR, or PNAS, or Freakonomics, or whatever. By the same logic as Cohen gives in the above-quoted piece, all those obscure claims also deserve our respect as “of value if they offer new clarity in looking at phenomena, and stimulate ideas for future investigations.” The problem is that we’re not seeing all that work.

As I’ve also said on various occasions, I have no problem when people combine anecdotes and theorizing to come up with ideas and policy proposals. My problem with Wansink is not that he had interesting ideas without strong empirical support: that happens all the time. Most of our new ideas don’t have strong empirical support, in part because ideas with strong empirical support tend to already exist so they won’t be new! No, my problem with Wansink is that he took weak evidence and presented it as if it were strong evidence. For this discussion, I don’t really care if he did this by accident or on purpose. Either way, now we know he had weak evidence, or no evidence at all. So I don’t see why his conjectures should be taken more seriously than any other evidence-free conjectures. Let a zillion flowers bloom.

33 thoughts on “And, if we really want to get real, let’s be open to the possibility that the effect is positive for some people in some scenarios, and negative for other people in other scenarios, and that in the existing state of our knowledge, we can’t say much about where the effect is positive and where it is negative.”

jd on June 24, 2019 10:17 AM at 10:17 am said:

I think the Armstrong Principle needs another name, or another application. If I remember correctly, Armstrong said he realized he had brought a knife to a gunfight, so he decided to go to the gun store. He started doping long before a Tour victory and long before cancer and long before he was a household name.
As it started, he simply wanted to compete in Europe, where he felt everyone was doping, and not go home.
As it stands, the Armstrong Principle sounds like you make a big promise, realize you can’t deliver, and so now have an incentive to cheat. But for Armstrong, the doping came first, as a means to even be there. It was an entry card.
The story just doesn’t sound like quite what you are getting at with the principle.
See Lance Armstrong: Next Stage (full interview) on YouTube, for a recent interview.

- Brent Hutto on June 24, 2019 10:37 AM at 10:37 am said:
  
  I think the publication situation is a fairly good parallel to how you describe Armstrong’s doping history. In most fields, if a junior faculty person wants to publish in any of the accepted journals they 100% absolutely, positively must report NHST p-values arrived at by some method or model that’s common in their field. Maybe they can put some alternative treatments of their data in there but NHST p-values are required.
  
  My perception of the NHST p-value thing is exactly like Armstrong’s perception that he would never even be a decent Domestique in European cycling without doping. Maybe I’m wrong, maybe he was too cynical about the doping situation in Europe. But it’s the same line of thinking and both ideas, on their face, are consistent with observed reality.
  
  A lot of the advice or guidance or whatever you call it that’s espoused on this blog is very much superior to the usual practices of NHST, p-value, intent to treat arm waving, non-hierarchical group-by-time definitions of treatment effect and so forth. But it’s advice that’s only accessible to those who already have a seat at the table or whose careers do not depend on acquiring a seat at the table.
  
  It seems fine with me to puncture the claims of anyone selling fantastic or fanciful narratives to the media based on flawed analysis. But be careful with blanket criticism of everyone who dares to put their name on a paper with a bunch of associations listed in a table, each with or without an asterisk for “p<0.05”. Just like not every cyclist who engages in doping woke up one morning and decided to cheat their way to a yellow jersey just because they think they can.
  
Anoneuoid on June 24, 2019 11:37 AM at 11:37 am said:

Linus Pauling published a fundamentally incoherent hypothesis that the structure of DNA was a triple helix

That characterization of this work looks way off. He published a precise, quantitative but tentative hypothesis and described it as such:

We have now formulated a promising structure for the nucleic acids, by making use of the general principles of molecular structure and the available information about the nucleic acids themselves. The structure is not a vague one, but is precisely predicted; atomic coordinates for the principal atoms are given in table 1. This is the first precisely described structure for the nucleic acids that has been suggested by any investigator. The structure accounts for some of the features of the x-ray photographs; but detailed intensity calculations have not yet been made, and the structure cannot be considered to have been proved to be correct.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1063734/

I’m not willing to figure out how similar the structures are but triple helical DNA does form:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2586808/

But anyway, the idea that this was somehow a problematic publication is wrong.

- Jordan Anaya on June 24, 2019 4:33 PM at 4:33 pm said:
  
  Linus Pauling’s model is deserving of ridicule, he had the phosphates in the center which is impossible.
  
  - Anoneuoid on June 24, 2019 4:48 PM at 4:48 pm said:
    
    Before I discovered how useless NHST was I had come to the conclusion that everything I could imagine is possible when it comes to bio.
    
    Anyway, is this impossibility known from the data available in 1953? Were they sure the data was clean enough to be relied on? He literally says it was the first precise quantitative model… I have difficulty faulting him for that.
    
Steve on June 24, 2019 11:42 AM at 11:42 am said:

Cohen says, “The critics are indulging themselves in a myth of neutral observers uncovering “facts”, which rests on a view of knowledge as pristine and eternal as anything Plato might have dreamed of.” This is a great example of what I call the “Horns of a Dilemma Fallacy.” We start with a dichotomy that seems plausible. You reject the dichotomy and then falsely believe that you have to choose one horn of the dichotomy when, in fact, a continuum may be a perfectly good alternative to a dichotomy. Here it is the dichotomy between objective facts and opinion or between neutral observers uncovering facts and researchers whose subjective judgments enter into their research. Cohen rejects the dichotomy. Neutral observers are a myth. But, nothing follows from that. We can locate research on a continuum of degrees of researcher freedom or something similar. We are perfectly capable of distinguishing good from bad even if there is no such thing as perfect.

- Martha (Smith) on June 24, 2019 11:56 AM at 11:56 am said:
  
  I mostly agree, but I’ve got a problem with the last sentence. Specifically, “We are perfectly capable of distinguishing good from bad …” itself seems to posit a dichotomy between “good” and “bad”.
  
  I’d say something more like,
  “We can still make reasonable judgments to rate research quality and credibility on a continuum from very good to very bad”.
  
  - Steve on June 24, 2019 1:36 PM at 1:36 pm said:
    
    Fair point. The problem is that we are so used to putting things into dichotomies that it is hard to think continuously. I wasn’t supposing that there is a dichotomy between good and bad research. I am supposing that there is a continuum, and that we can decide which parts of the continuum hold acceptable research and which hold unacceptable research. We still have to make decisions, which often are dichotomous (e.g., what research to publish, what research to base policy on) even if the bases of those decisions are not.
    
    - Martha (Smith) on June 24, 2019 4:07 PM at 4:07 pm said:
      
      “The problem is that we are so used to putting things into dichotomies that it is hard to think continuously.”
      
      This is an indictment of the poor quality of education. I admit that I was lucky to go to an unusually good (public) high school, but I believe that I had learned to think continuously by the time I graduated from it. It’s sad that not everyone has had such good educational opportunities. And very disturbing that so many people get through graduate school without having learned to think continuously.
- Corey on June 24, 2019 12:36 PM at 12:36 pm said:
  
  I’ve heard something very similar being called the fallacy of gray, after this quote:
  
  “The Sophisticate: “The world isn’t black and white. No one does pure good or pure bad. It’s all gray. Therefore, no one is better than anyone else.”
  The Zetet: “Knowing only gray, you conclude that all grays are the same shade. You mock the simplicity of the two-color view, yet you replace it with a one-color view….”
  ― Marc Stiegler, David’s Sling
  
  (The difference is that in your fallacy one erases the distinction by disregarding one alternative and picking the other by default while in the fallacy of gray one erases the distinction by disregarding both alternatives.)
  
  - Steve on June 24, 2019 2:06 PM at 2:06 pm said:
    
    +1 I like it, both fallacies stem from the same problem, failing to recognize that a continuum could replace the dichotomy.
    
Alex on June 24, 2019 11:46 AM at 11:46 am said:

The thing about calls to be “open-minded” in the face of negative results is that we mostly hear it when the result that didn’t replicate flatters a particular mindset. Much (not all) of the replication crisis has involved studies about how humans can be manipulated. If true, these findings mean that, on the one hand, the rest of us need to be humble because we are so easily manipulable, while the social scientists who demonstrate our manipulability are amazing wizards who can control the human race. And if we just learn their simple hacks then we too can be wizards!

If tomorrow somebody did a good study finding that a particular cognitive test does not predict performance on whatever task, and the study was properly designed and executed and had a large N and was consistent with other studies, nobody would say “Now, now, let’s be open-minded.” Those who see value in cognitive tests would just say “Well, we’ve always known that these tests have limitations, and apparently they don’t predict performance on this task. That’s fine. They predict plenty of other things.” Meanwhile, those who don’t see value in cognitive tests would crow about this negative result, rather than encouraging everyone to be open-minded and wait to see if maybe a subsequent study shows something else.

- Martha (Smith) on June 24, 2019 11:58 AM at 11:58 am said:
  
  Good points.
  
Peter Dorman on June 24, 2019 12:50 PM at 12:50 pm said:

The op-ed by Cohen is behind a paywall, so I won’t be giving him a close reading, but the snippets reproduced here are familiar. He is a critic of quantification, empirical skepticism and appeals to objective criteria under the banner of complexity and positionality (we are in the world rather than outside it). He emphasizes and endorses the non- or metarational side of communication (storytelling and I would assume expressive motives and playful reframing as well) while arguing the severe limitations of rational understanding predicated on intersubjectivity. Maybe I’m wrong, but the snippets seem to place Cohen within a recognizable stream of academic discourse.

I agree with Andrew’s general theme here, that, insofar as we recognize the limits of knowability, the mismatch between simple, general “laws” and a complex world, etc., we who have questioned research malpractice are on the side of humility, not hubris. But I wouldn’t want to let the perspective Cohen represents pass without scrutiny.

On some deep level, I think these anti-empiricist writers confuse necessary and sufficient conditions for knowledge. They conflate the hypothetical claim that gathering and analyzing statistical data are sufficient for understanding the world, which they obviously aren’t, with the far more reasonable stance that a claim that is contradicted by the available evidence has to be substantially false. This confusion has consequences in many specific areas of methodology/epistemology; for instance, the criteria for what constitutes credible evidence depend on what the evidence is to be used for—the immaculate construction of an “objective” knowledge (which is rightfully challenged) or simply the testing and filtering of potential claims based on their consistency with the aspects of the world we are able to incorporate in our research methods.

I may be overreacting here, but I’ve been forced to deal with more than my share of this stuff as a teacher and researcher, and I think the urge to live and let live should be resisted. It’s not healthy to have a large chunk of the academic and intellectual universe given over to militant irrationalism. For an example of a recent encounter with the anti-empiricist world, see https://econospeak.blogspot.com/2019/02/nonsense-on-stilted-language-review-of.html

- Martha (Smith) on June 24, 2019 4:12 PM at 4:12 pm said:
  
  “immaculate construction” — great turn of phrase.
  
- Marius on June 24, 2019 9:24 PM at 9:24 pm said:
  
  Thanks for this – your essay in the linked post is fantastic. Like you, I do realize there is value to cultural criticism, and I think you do a good job of spelling out what that value is. But I have a huge issue with the “confusion often found in cultural studies between things and our understanding of them”, and it seems like this confusion isn’t challenged enough.
  
  The last research centre I worked at had a strong contingent of these anti-empiricists, and I had to sit through multiple seminars feeling immense frustration at the guff they were spouting.
  
  - Martha (Smith) on June 25, 2019 1:44 AM at 1:44 am said:
    
    From the beginning of Dorman’s essay:
    
    “Murphy’s Economization of Life” is a work of cultural critique. Its underlying premise (which it shares with many other works) is that history is essentially a sequence of forms of consciousness—concepts, mental patterns, assumed values — that determine the events that take place in the lives we actually live. At any moment a particular consciousness is dominant, and the job of the cultural critic, like Murphy, is to unmask its biases and hidden assumptions. This is accomplished by holding up examples for inspection, such as books and journal articles, produced objects, programs, laws, or anything else that might display the ruling mentality.”
    
    Wow — that underlying premise doesn’t make sense to me. The assumption that certain “concepts, mental patterns, assumed values” actually “determine the events that take place in the lives we actually live” seems really out of touch with the complexity, variability, and uncertainty of reality. Sure, there are likely to be concepts, etc. that *influence* events, but “determine” is much, much too strong to describe the world that I know.
    
    Perhaps some might argue that “determinism” was a belief that strongly influenced [Oh, how my subjectivity is influencing me! I can’t bring myself to say “determined”] many people’s beliefs in past eras, and that “complexity”, “variability”, and “uncertainty” are “modern” concepts that have influenced me. Or maybe I would have been put to death as a heretic in past eras?
    
David Paterno on June 24, 2019 7:17 PM at 7:17 pm said:

The other odd claim I find advanced by Cohen is that ethnographic research is mere story-telling.

Such a view overlooks the reality that an ethnography, with appropriate detail and care, can be organised to collect data and advance falsifiable claims – claims that are potentially open to replication.

David J. Littleboy on June 24, 2019 9:48 PM at 9:48 pm said:

In a related note:

Amy Cuddy is back.

And as a deadhead, no less. Will wonders never cease.

https://www.salon.com/2019/06/21/communing-with-the-dead-i-followed-the-grateful-dead-to-escape-and-ended-up-finding-home/

- Andrew on June 24, 2019 9:57 PM at 9:57 pm said:
  
  David:
  
  This part was interesting:
  
  Hart explained to me how they make the magic happen. “We’re talking to each other. We have all these conversations going — in the rhythm, in the melody, in the nuances of everybody’s feelings there that night. If you’re aware of your partners’ feelings — really listening, really really closely — not trying to overplay him or anything like that, trying to form a group instead of a bunch of notes and soloists and playing parts — then it happens.”
  
  Maybe something could be learned from a comparative study of different musical groups, a qualitative study of how they work together and how they sometimes fail in the attempt. I guess it would help to have some stable source of income—I guess the Dead gets this from touring—as I suspect it’s easier to have group harmony when the flow of money is somewhat steady with no big increases or decreases. Not that I’m saying that money is the only thing or even the most important thing—ultimately it’s about the music and the interpersonal connections among the artists. It would just be interesting to try to understand more about the conditions where a group can work well together toward a common purpose.
  
  - David J. Littleboy on June 25, 2019 9:37 AM at 9:37 am said:
    
    Although not a deadhead, I’ve always enjoyed their music, and as someone trying to learn jazz, I respect them as solid musicians. There’s a lot in Hart’s “not trying to overplay him or anything like that.” They managed to avoid the whole thing becoming ego trips and kept it about the music. I don’t know who you are comparing them to, or thinking about as the other “different musical groups”, but the Grateful Dead were unique in the extreme. If you read discussions of their early albums, they were all over the place, trying all sorts of things to figure out what they were doing, and their early concerts were pretty rough. But they managed to speak to their audience. Only the Rolling Stones are anywhere close in terms of longevity and number of concerts.
    
    FWIW, Garcia claimed that they never made as much from touring as most people thought: as the concerts got larger, the costs of putting on the shows went up as well, and it at least felt as though they had to do the next tour to pay the workers for the last one.
    
    I worry about over-rating the Dead. They’re not jazz, they’re folk-rock; musically, jazz (especially bebop and hard bop) is a completely different world (harmonically of course, but especially in terms of improvisation and group dynamics; many GD tunes have tightly composed structures). And, sure, the lyrics are inventive and lovely, but there are some rough edges, too. Still there is some depth there. For example, I take “strangers stopping strangers, just to shake their hand” as being if anything cynical/sarcastic (since it follows “I had to learn the hard way, to let her pass by– let her pass by” it’s saying watch out for these blokes). Cuddy takes it bog-literally, apparently missing the point of the song. FWIW, there are pages with the lyrics and some discussion for most of the Dead’s songs. Search for “Annotated “.
    
    Whatever, Mayer has done a great job making Dead and Company work. Very kewl.
    
    *: https://www.youtube.com/watch?v=QMOEX0T10S8
    
    - David J. Littleboy on June 25, 2019 9:48 AM at 9:48 am said:
      
      Oops. Search for “Annotate Scarlet Begonias” (or whatever) to find the lyrics.
      
      Also, that link is for an example of what hard bop jazz players do with the blues. It makes the GD seem quite tame.
Mark Palko on June 24, 2019 10:37 PM at 10:37 pm said:

This hits one of my pet peeves about Wansink et al. His defenders constantly frame his claims of magnitude as claims of sign. Wansink didn’t just suggest that “people who go grocery shopping while hungry buy more calories, that pre-ordering lunch can help you choose healthier food, and that serving people out of large bowls leads them to eat larger portions”; he promised enormous, life-changing effects requiring only a few small and easy steps.

Based on personal experience and common sense, I’m inclined to believe that making food more accessible tends to increase consumption. That’s very different than claiming that changing the shelf you keep your cereal on will make you drop twenty pounds. It’s that second claim that gets you on Oprah.

Joe on June 25, 2019 11:03 AM at 11:03 am said:

I think I may have said this before here, but the problem lies with the incentive structure that modern social science has created by fusing theory and empirics, and the lack of a consensus on the ideal way for research to progress.

Modern social science (in contrast to, say, physics today or economics in the not-so-distant past) expects the researcher to do two things: 1) propose some “new” theory and 2) test that theory. You get published if your theory “passes” the test you designed and conducted. You don’t get published if it “fails”. Surprise, surprise, researchers tend to design and conduct tests their theories can pass.

My own take is that science works best when there’s a division of labor between theory and testing. The independent tester’s incentive is just to get it right and be objective. I, for one, never really believe results unless they’ve been either produced or replicated by a disinterested third party. The problem is not the research practices (or at least any specific set of research practices), it’s the incentives. Until we return to more of a division between theories and empirics, the incentives are just wrong and nothing will ever fix the problem.

For the present, though, we live in a world where theory and empirics are fused. So, how should we approach it? I think there are two philosophies out there. By analogy to the criminal justice system, these are:

1) The Cop Philosophy: The researcher is like a cop, going out to investigate. The job is to both develop a theory of the crime and gather the evidence to test this. The ideal cop does this with a constantly open mind and as objectively as possible. It’s up to you to get to the bottom of things, and the goal is just to get it right. Done right, the cop is all you need.

2) The Lawyer Philosophy: The researcher is an advocate for the theory. The job is to provide the evidence in favor of the theory, not the evidence against it. This does not extend to deceit or fraud, but the lawyer is there to argue one side, not find the truth. The lawyer model works because there’s also someone arguing the other side. If each argument is presented in the most effective, sympathetic manner possible, then the judge/jury will be equipped to find out the truth.

Andrew is very clearly in the “cop” camp, whereas Benitez seems to be in the “lawyer” camp. Both sides often seek to take the moral high ground, but it’s fairly obvious that neither has the high ground.

The “cop” approach works if it’s possible for researchers to actually overcome their incentives and be objective. But, of course, we’ve all seen how well that works in the criminal justice system. Even with pure intentions, it’s easy to get attached to preconceived notions and subconsciously shade the facts. Researchers have to be much more than just ethical to make this work. The problem with the cop system is that it holds researchers to an unreasonable standard for objectivity.

The “lawyer” approach works if the debate is sufficiently vigorous for outside observers to draw clear conclusions. You need someone to be actively pushing both (or all) sides of the argument, or else the flaws in each case will never become evident. The system works best when some baseline ethical level is maintained, but the real effectiveness comes from the argument process. You need opposing advocates who are willing and able to push hard. The problem with this system is that, outside some particular and fiercely contested research areas, researchers are often reluctant to criticize one another. Adversarial testing only works with an adversary.

At the moment, in many fields, researchers talk like cops but act like lawyers. And that’s probably the worst of both worlds. I don’t really know which model of the scientific community works better, though my stylized reading of the history and philosophy of science is that it’s actually the lawyer model. Many important theories are easily dismissed as wrong at the outset and need a dedicated, partisan advocate (who may be bending some rules) to even make it to their “day in court.” And wrong ideas can be helpful ecologically because they genuinely do lead to right ideas.

- Andrew on June 25, 2019 11:16 AM at 11:16 am said:
  
  Joe:
  
  I don’t “hold researchers to an unreasonable standard for objectivity.” I just want to see the evidence for people’s claims. If Brian Wansink wants to propound theories based on his qualitative understanding of the world, that’s fine with me. If he wants to claim quantitative support for his theories based on data, I’d like to see the data and evidence. Wansink did not supply the evidence; rather, he supplied piles of numbers that were not consistent with any possible data, then he tried to talk his way out of the problem.
  
  It is not asking for an unreasonable standard for objectivity to ask that, when researchers publish data summaries, these be summaries of actual data. It is not asking for an unreasonable standard for objectivity to ask that researchers admit rather than minimize their mistakes.
  
  As I wrote elsewhere, Wansink has every right not to share his data and research protocols, and we have every right not to believe a damn thing he says.
  
  Regarding “wrong ideas can be helpful ecologically because they genuinely do lead to right ideas”: Yes, maybe so. I wrote about that here.
  
  - Joe on June 25, 2019 1:05 PM at 1:05 pm said:
    
    Sorry, I didn’t mean that you were holding anyone to an unreasonable standard. I also think that Wansink did act unreasonably, even under the standards of what I’m calling the “lawyer” approach. Deception is always unreasonable.
    
    My point was that, more broadly, it’s unreasonable to hold people to the standard of being able to objectively judge the validity of their own theories. We recognize that it’s a conflict of interest when authors take even very small amounts of money from sources who are (potentially) interested in a particular research outcome. But we seem to be in denial about the conflict of interest when researchers test theories in which they are personally and professionally invested. The lifetime earnings premium associated with an early career publication in a top journal could easily run into the hundreds of thousands of dollars, and you can (for the most part) only get one if your results are in a particular direction.
    
    The effects of that are really substantial, even for people who hold themselves to the highest standards. Have you ever seen the Wiseman and Schlitz paper on experimenter effects in parapsychology (https://marilynschlitz.com/wp-content/uploads/2014/11/Wiseman-and-Schlitz-1999.pdf)? The goal was to test the parapsychological hypothesis that “remote staring” (i.e., someone staring at your image on a monitor in another room) causes a physiological response. Wiseman and Schlitz both studied the phenomenon independently, conceptually replicating early work. Wiseman, a skeptic, found no evidence for the phenomenon. Schlitz, a believer, found evidence for it. Probably not so surprising. Experimenter effects are large in parapsychology generally because people have an incentive to come to a particular finding.
    
    So, the two of them decide to team up and do the right thing from the “cop” perspective. They agree to do a joint study of remote staring using exactly the same methods, materials, facilities, etc. Each of them conducts half the trials. At the end, analyzing the trials shows an effect among the subjects tested by Schlitz and no effect among those tested by Wiseman (the sample size here is small, but that’s not really the point; after all, they chose that together). Wiseman and Schlitz have no idea how/why this happened. It could be fraud or malpractice, of course, but they did agree to do the experiment together. They were, by every indication, trying to do everything right, and it’s not clear at all where anything went wrong. After all, this is a procedure chosen by mutual agreement. It seems extremely unlikely that either of them did anything deliberately to mess with the results, and whatever may have happened subconsciously was so subconscious that they couldn’t dig it out themselves even after the divergence in results became apparent.
    
    The point here isn’t that the study was perfect (it wasn’t, of course), but rather that these researchers were trying their very best to be objective, and they still couldn’t escape whatever subtle, undetectable way that they (or, really, one presumes Schlitz) influenced the result. This kind of thing is a little more obvious in parapsychology, but the same tendencies are there and are powerful.
    
    We know that parapsychology is bunk, but not because of effective, objective “cop” researchers. It was the powerful effort of motivated, aggressive and decidedly non-objective debunkers. Uri Geller could fool the well-meaning parapsychologists at Stanford because they wanted to believe and were incentivized to do so. Puthoff and Targ could never have published a paper in Nature showing that Geller did *not* have psychic abilities. The debunkers get cast as heroes (and perhaps rightly so), but they weren’t neutral either. They were both predisposed and incentivized to take down Geller. James Randi was not trying to do a neutral replication and probably couldn’t have done so if he wanted to. Likewise, Ray Hyman was very committed to the idea that Geller did *not* have psychic powers and he wasn’t out to do a dispassionate analysis.
    
    I’m not saying that adversarial testing of ideas is the ideal, but at least it’s realistic about people’s motivations. As I said before, I think the ideal is a division of labor that minimizes the incentive to find results that go in a particular direction, like the division between theorists and experimentalists in physics.
    
    But the bottom line is that the model of expecting researchers to be the judge in the case of their own theories will always have problems. Such a system guarantees that some unscrupulous people will put their fingers on the scale through deliberate or conscious deceit. But even if everyone is ethical and following best practices, the incentives are still there both consciously and unconsciously, influencing every little step. I have yet to see a human system where bad incentives can be fixed through individual ethics. The more adversarial approach seems to do more to fix incentives (except the aversion to criticizing others).
    
    - Andrew on June 25, 2019 1:26 PM at 1:26 pm said:
      
      Joe:
      
      Agreed.
    - Keith O'Rourke on June 26, 2019 8:18 AM at 8:18 am said:
      
      Perhaps the one hopeful difference in science is that, unlike in adversarial debate where you just want to cause others to agree with your take on something (that happened), you really (should) want to cause others to understand why it is worth persisting in working with your take on something (that will continually happen).
      
      Ideally, the persistence in working (scientifically) on your view by others will eventually remove all errors and personal idiosyncrasies…
    - Martha (Smith) on June 26, 2019 3:35 PM at 3:35 pm said:
      
      +1
- gec on June 25, 2019 12:27 PM at 12:27 pm said:
  
  > My own take is that science works best when there’s a division of labor between theory and testing.
  
  Unfortunately, this also leads to perverse incentives for theorists to propose theories that cannot (at least in the foreseeable future) be tested at all. This is currently the case in much of particle physics as documented strenuously (and amusingly) by Sabine Hossenfelder (https://backreaction.blogspot.com/). The result is particularly “lawyerly” in that advocacy happens on the basis of essentially aesthetic criteria of formal elegance rather than via empirical constraint, providing quite a lot of vigorous debate without much progress on understanding the world.
  
  - Anoneuoid on June 25, 2019 2:36 PM at 2:36 pm said:
    
    > Unfortunately, this also leads to perverse incentives for theorists to propose theories that cannot (at least in the foreseeable future) be tested at all.
    
    It is up to whoever came up with it to find a meaningful (can distinguish the predictions from those of other theories) test for it. Why should anyone pay attention if you can’t/won’t do that?
    
    If people want to waste their time digging deep into untestable theories and arguing about logical/mathematical/philosophical issues then that is their prerogative I guess. Personally, I don’t want taxes going to that until the theory has already shown some merit. I’m tending to want separation of science and state in general anyway though.
    
    - Andrew on June 25, 2019 2:45 PM at 2:45 pm said:
      
      Anon:
      
      The trouble is that people do pay attention to bad science. Even if you don’t want to pay attention to it, and I don’t want to pay attention to it, decision makers might. Hence the effort I put into posts like this one. Or the whole pizzagate thing. It’s all too easy for people who want to believe in the results to accept claims that are supported only by sloppy research, and often these believers seem to think they have rigor on their side.
    - Anoneuoid on June 25, 2019 9:27 PM at 9:27 pm said:
      
      Yes, of course. Not only do people pay attention to bad research, but the world is simply swamped with it right now. It is to the point where they do not even recognize real science (derive a model from some postulates, determine some predictions, go get data to check those predictions) or believe it is possible anymore. Wansink’s methods could not have defenders in a sane reality.
      
      I’m just saying it is pretty easy to tell when a theory isn’t worth spending much time on, because at least one meaningful prediction (either already checked, or awaiting someone to check it) should be found in the first paragraph you read about the theory.

Statistical Modeling, Causal Inference, and Social Science
