Fabrication in survey research!

Mike Spagat writes:

I think some of your loyal readers will be interested in this conference on fabrication in survey research.

You certainly have covered this topic from time to time but I think it would be fair to say that it’s still a little bit too far under the radar screen. The LaCour experience would be a case in point where it initially didn’t occur to you (or to me) that the data may have been fabricated.

I believe that there are still a few empty seats left for the conference but, more importantly, there will be live streaming with recordings being posted afterwards as well.

The conference is at the National Opinion Research Center’s offices in Bethesda, Maryland. Great lineup of speakers but I was disappointed not to see Weggy there. Maybe he’ll be a surprise guest? He’s only one state away, and I’m sure he has lots to say on the topic.

If I were speaking at this conference, I’d emphasize what I see as the continuity between data fabrication, cheating in data analysis, sloppily opportunistic data analysis (for example, dubious rounding processes that conveniently turn p=.052 into p=.048), and general overestimation of what can be learned from small samples (this is what doomed the studies of beauty and sex ratio, ovulation and clothing, etc.). The common theme is researchers believing they have the answer, and thinking of data collection and statistical analysis as just a series of hoops they must jump through to get their expected confirmation. To put it another way: Stapel made everything up; LaCour seems to have made up a lot of things; Hauser might still believe that his codings of the notorious monkey videos are correct; the ovulation-and-voting people committed no misconduct at all, they were just following the rules that were taught to them. But all of them want to prove the truth of a hypothesis, and none of them has any real evidence in its favor.
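To make the rounding point concrete, here is a toy calculation (all numbers invented for this sketch) showing how an opportunistic treatment of an intermediate quantity, in this case truncating a standard error to two decimals, can nudge a p-value from just above .05 to just below it:

```python
# Toy illustration of an "opportunistic" intermediate rounding step.
# The estimate and standard error are made up for this sketch.
import math
from scipy.stats import norm

est, se = 0.97, 0.4993                  # hypothetical estimate and standard error

p_exact = 2 * norm.sf(est / se)         # ~= 0.052: just misses "significance"

se_trunc = math.floor(se * 100) / 100   # conveniently drop the trailing digits -> 0.49
p_trunc = 2 * norm.sf(est / se_trunc)   # ~= 0.048: now "significant"

print(f"p with full-precision SE: {p_exact:.3f}")
print(f"p with truncated SE:      {p_trunc:.3f}")
```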

21 thoughts on “Fabrication in survey research!”

  1. I’m not sure treating this as a continuum is helpful. I think they’re conceptually and ethically quite distinct, but at the most immediate level they also differ practically: the way to prevent things that strongly violate disciplinary norms (Stapel, LaCour, Hauser) is completely different from the way to prevent things that basically follow those norms.
    To prevent the ovulation-and-voting studies and the like, we need a combination of better training, better reviewing, and better/more skeptical science reporting. Those are hard to do, but on the plus side, the flaws of such papers are relatively obvious to people who actually know statistics, so they don’t become established scientific knowledge. On the other hand, the LaCour and Stapel studies (I don’t know the Hauser case very well) were, even to a highly statistically literate reader like yourself, pretty convincing. LaCour had all the training to make everything look right on paper. Heck, he even tried to fake the pre-registration of the study. It’s much less clear what we can do to catch outright fraud like that, especially data fabrication.

    • “It’s much less clear what we can do to catch outright fraud like that, especially data fabrication.”

      This seems like a solved issue.

      Make it standard operating procedure to perform independent replications. Require the theory to make predictions about data from the future or otherwise outside the control of the person coming up with the theory.

      Lots of research that no one cares enough about to replicate will go away, and so will a lot of post hoc theories that don’t make distinctive predictions.

  2. When discussing the Garden of Forking Paths Andrew tends to emphasize that he doesn’t think that people are intentionally cheating, but, rather, making a subtle statistical error. A potential advantage of making this point is that it can open up some possibilities for dialogue that might not be present if the claim were that someone had intentionally cheated. I also think that there is a lot of reality behind the idea that people keep trying specifications until they find one that “works” and then honestly come to believe that this is the right specification.

    I guess you could say that the next step is to more knowingly go about tenderizing your data until you get a p below 0.05.

    The next time maybe you make up a few data points to get you where you need to go. This is OK because you know your theory is right in the first place but the data are being a bit recalcitrant.

    Eventually you’re just making stuff up from scratch.

    So maybe the continuum idea makes some sense.

    Still, there may be two discontinuity points.

    1. Where you cross over from fooling yourself to knowingly fooling other people. But this might still be a rather blurred line.

    2. Where you cross over from manipulating your data to making up the data. Still, with techniques like hot-deck imputation of missing data (sketched below), this too can be a bit of a blur.
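    For readers unfamiliar with the term, here is a minimal sketch of hot-deck imputation; the data, column names, and grouping variable are invented for illustration. The imputed values are real respondents’ answers copied into other respondents’ records, which is part of why the line between manipulating data and making up data can blur:

    ```python
    # Minimal hot-deck imputation sketch: fill missing values with observed
    # values "donated" by respondents in the same cell. Data are invented.
    import random
    import pandas as pd

    random.seed(0)

    df = pd.DataFrame({
        "region": ["north", "north", "south", "south", "south"],
        "income": [42000, None, 35000, None, 38000],   # None = item nonresponse
    })

    def hot_deck(data, group_col, target_col):
        """Replace missing target_col values with a random observed value
        drawn from the same group_col cell (the 'deck')."""
        out = data.copy()
        for _, block in out.groupby(group_col):
            donors = block[target_col].dropna().tolist()
            if not donors:
                continue
            missing_idx = block.index[block[target_col].isna()]
            out.loc[missing_idx, target_col] = [random.choice(donors) for _ in missing_idx]
        return out

    print(hot_deck(df, "region", "income"))
    ```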

    • Mike:

      Another point that often comes up is data-processing rules. For example, consider this:

      Cortisol scores at both time points were sufficiently normally distributed, except for two outliers that were more than 3 standard deviations above the mean and were excluded; testosterone scores at both time points were sufficiently normally distributed, except for one outlier that was more than 3 standard deviations above the mean and was excluded.

      That’s just one example. Lots of these Psychological Science-style papers feature idiosyncratic data-exclusion rules which I suspect are set up to exclude data that don’t fit the researchers’ story. No data are made up. Then again, a sculptor working in stone doesn’t create anything; he or she just chips away, creating a pattern by removing the bits of marble that don’t fit the image.
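      To show what such a rule looks like in practice, here is a sketch with simulated (made-up) data applying the one-sided 3-SD exclusion from the quote above; the threshold, the direction, and the decision to exclude at all are all researcher choices:

      ```python
      # Simulated data; the quoted rule: drop values more than 3 SD above the mean.
      import numpy as np

      rng = np.random.default_rng(1)
      cortisol = rng.lognormal(mean=2.0, sigma=0.5, size=50)   # skewed, as hormone data often are

      z = (cortisol - cortisol.mean()) / cortisol.std()
      kept = cortisol[z <= 3]                                  # one-sided 3-SD exclusion rule

      print(f"excluded {len(cortisol) - len(kept)} of {len(cortisol)} observations")
      print(f"mean before: {cortisol.mean():.2f}, after: {kept.mean():.2f}")
      ```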

      • Thanks Andrew. This is a highly relevant observation about something I wasn’t thinking about.

        Separately, great choice of the movie ad to accompany the post. This movie is incredibly good and pertinent. It really brings home just how weak our defenses against fraud really are.

      • Or consider one of your favorite examples, that notorious Iraq survey. Was there fraud? Presumably not initially, as they were just trying to do their best, conducting a survey under difficult conditions. But then after problems were pointed out with the survey, the researchers didn’t defend their claims very well. I don’t know if I’d call that fraud either, but it’s a pattern I see a lot, that a researcher cuts corners and then tries to paper over errors. This is what seems to have happened with Amy Cuddy too: she is, I assume, a true believer in her research programme and so she would not see it as inappropriate to use opportunistic data analysis rules, inappropriate rounding of intermediate quantities, etc.—but it’s at the very least unscientific for her to not want to address the problems in her work. Similarly for Marc Hauser: if he really cared about his research hypotheses and truly respected their empirical implications, he’d want as many coders as possible to look at those videotapes. Again, not quite fraud but it seems to me to be on the spectrum.

      • Andrew said, “Lots of these Psychological Science-style papers feature idiosyncratic data-exclusion rules which I suspect are set up to exclude data that don’t fit the researchers’ story.”

        I’m not so sure about the “set up to exclude data that don’t fit the researchers’ story.” I do suspect this is true in some cases, but I’m pretty sure that in other cases people act on the misguided belief that outliers “should” be excluded. This is in part fostered by software that spits out lots of information, including “outliers” as determined by some default procedure. But it’s also fostered in part by poor education — too many textbooks and instructors do not emphasize that outliers are part of the distribution, and should not automatically be excluded.

        An interesting/amusing anecdote: A student once came to me concerned because when she removed the “outliers” identified by her software and ran the regression again without them, the software identified new “outliers”. She feared that she would have no data left if she continued this process, so she asked for advice on when to stop. It was good that she realized a possible problem in her assumed proper way to proceed — I of course took advantage of the opportunity to supply “just in time” teaching about outliers (which might have been something I had already said in class, but which she was now very open to hearing).
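        The student’s worry is easy to reproduce in a small simulation (the data and the 2-SD threshold are my own assumptions): with perfectly well-behaved normal data, each round of deleting flagged “outliers” shrinks the spread, so the next round flags new ones.

        ```python
        # Simulated normal data; repeatedly drop points beyond 2 SD and re-flag.
        import numpy as np

        rng = np.random.default_rng(42)
        y = rng.normal(size=1000)

        for step in range(5):
            z = (y - y.mean()) / y.std()
            flagged = np.abs(z) > 2
            print(f"step {step}: n = {len(y)}, newly flagged 'outliers' = {flagged.sum()}")
            if not flagged.any():
                break
            y = y[~flagged]
        ```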

        • Or maybe another way to ask this: what are (references to) best practices for determining the circumstances (if any) under which it is appropriate to exclude outliers?

        • This is a better question, though the adoption of ‘best practices’ often seems to be a way to avoid doing the critical thinking oneself.

          Ask yourself the question. Under what circumstances do you think you could logically defend excluding outliers?

        • Something strange about the outlying observations that can be articulated (with a straight face), but not modeled for some reason?

        • Some reasons that come to mind are:

          1. If, in a survey with quite a few questions, all answers are exactly the same even though this would be logically impossible given the positive and negative framing of the same question. So the cases where responses were all 1’s or all 5’s might be candidates.

          2. If a very small number of people in a dataset that includes income as a variable make an order of magnitude more than the next set of people, one might want to run the analyses both with and without them and discuss the differences. Or at the very least compute the statistics that allow such a wide divergence between the mean and the median to be explained (as sketched below).

          These are just a couple of reasons, but there are likely many more justifiable reasons for excluding outliers or at the very least running the analyses both with and without.
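          As a tiny illustration of point 2 (with made-up incomes), one extreme value pulls the mean far from the median, which is worth reporting whether or not the case is excluded:

          ```python
          # Made-up incomes in $1000s; the last value is the extreme case.
          import numpy as np

          incomes = np.array([32, 41, 38, 45, 50, 36, 44, 1200])

          print(f"with outlier:    mean = {incomes.mean():.0f}k, median = {np.median(incomes):.0f}k")
          print(f"without outlier: mean = {incomes[:-1].mean():.0f}k, median = {np.median(incomes[:-1]):.0f}k")
          ```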

        • I agree with Curious that “the adoption of ‘best practices’ often seems to be a way to avoid doing the critical thinking oneself. ”

          One example I often give is in studying heights of some population of adult men. Heights of 7 feet and 11 feet would both be considered outliers. But 7 feet is a possible height for an adult man, so it should not be thrown out. In contrast, 11 feet is highly implausible as the height of a man, so if you find a height of 11 feet, it is likely to be a mistake in data recording or transcription. Thus you should investigate this possibility and correct any recording or transcribing error you might find. If you can’t find clear evidence of a recording or transcribing error, and also no evidence that this is indeed a real height, then the best solution may be to exclude that case. However, even then, it would be a good idea to analyze the data with and without the outlier to see whether it makes any difference in the results.

  3. The picture that accompanies Andrew’s blog post refers to Orson Welles’s film “F for Fake,” which deals with a fake within a fake: the art forger Elmyr de Hory and his biographer, Clifford Irving, who in turn went on to fake the biography of Howard Hughes. Irving was especially gifted in the forging trade because of his ancillary acting prowess:

    “Less than two months after the [1972] CBS News interview, Irving admitted that his book was a hoax. Time magazine dubbed him ‘Con Man of the Year.’ And CBS News, too, gave him an award worthy of his performance.”

    “On March 19, 1972, 60 Minutes nominated Irving as best actor of the year in a starring role.”

    The best response regarding faking I ever came across was from the infamous Nazi Hermann Goering, a noted “collector,” that is, thief, of artworks. During his trial at Nuremberg, when questioned about stolen paintings in his possession, he was informed that some of the Vermeers he had “purchased” were actually done by Han van Meegeren, a Dutch forger who was himself on trial for collaborating with the Nazis. Goering replied, “I never realized how low humanity could sink.”
