
“In general I think these literatures have too much focus on data analysis and not enough on data collection.”

Mike Zyphur pointed me to an article appearing in Psychological Bulletin with a meta-analysis of ovulatory cycle effects:

Title: Do Women’s Mate Preferences Change Across the Ovulatory Cycle? A Meta-Analytic Review
Authors: Gildersleeve, K; Haselton, MG; Fales, MR
Source: PSYCHOLOGICAL BULLETIN, 140 (5):1205-1259; SEP 2014
Abstract: Scientific interest in whether women experience changes across the ovulatory cycle in mating-related motivations, preferences, cognitions, and behaviors has surged in the past 2 decades. A prominent hypothesis in this area, the ovulatory shift hypothesis, posits that women experience elevated immediate sexual attraction on high- relative to low-fertility days of the cycle to men with characteristics that reflected genetic quality ancestrally. Dozens of published studies have aimed to test this hypothesis, with some reporting null effects. We conducted a meta-analysis to quantitatively evaluate support for the pattern of cycle shifts predicted by the ovulatory shift hypothesis in a total sample of 134 effects from 38 published and 12 unpublished studies. Consistent with the hypothesis, analyses revealed robust cycle shifts that were specific to women’s preferences for hypothesized cues of (ancestral) genetic quality (96 effects in 50 studies). Cycle shifts were present when women evaluated men’s “short-term” attractiveness and absent when women evaluated men’s “long-term” attractiveness. More focused analyses identified specific characteristics for which cycle shifts were or were not robust and revealed areas in need of more research. Finally, we used several methods to assess potential bias due to an underrepresentation of small effects in the meta-analysis sample or to “researcher degrees of freedom” in definitions of high-and low-fertility cycle phases. Neither type of bias appeared to account for the observed cycle shifts. The existence of robust relationship context-dependent cycle shifts in women’s mate preferences has implications for understanding the role of evolved psychological mechanisms and the ovulatory cycle in women’s attractions and social behavior.

and to some critiques, starting with this one, published in the same journal:

Title: Elastic Analysis Procedures: An Incurable (but Preventable) Problem in the Fertility Effect Literature. Comment on Gildersleeve, Haselton, and Fales (2014)
Authors: Harris, CR; Pashler, H; Mickes, L
Source: PSYCHOLOGICAL BULLETIN, 140 (5):1260-1264; SEP 2014
Abstract: Gildersleeve, Haselton, and Fales (2014) presented a meta-analysis of the effects of fertility on mate preferences in women. Research in this area has categorized fertility using a great variety of methods, chiefly based on self-reported cycle length and time since last menses. We argue that this literature is particularly prone to hidden experimenter degrees of freedom. Studies vary greatly in the duration and timing of windows used to define fertile versus nonfertile phases, criteria for excluding subjects, and the choice of what moderator variables to include, as well as other variables. These issues raise the concern that many or perhaps all results may have been created by exploitation of unacknowledged degrees of freedom (“p-hacking”). Gildersleeve et al. sought to dismiss such concerns, but we contend that their arguments rest upon statistical and logical errors. The possibility that positive results in this literature may have been created, or at least greatly amplified, by p-hacking receives additional support from the fact that recent attempts at exact replication of fertility results have mostly failed. Our concerns are also supported by findings of another recent review of the literature (Wood, Kressel, Joshi, & Louie, 2014). We conclude on a positive note, arguing that if fertility-effect researchers take advantage of the rapidly emerging opportunities for study preregistration, the validity of this literature can be rapidly clarified.

I replied that I read that first abstract and I’m skeptical. Then again, I would be.

Then I read the second abstract and it seems to make sense.

But I’d probably add one thing, which is that if researchers really are serious about learning about the topic of mate preferences and the ovulatory cycle, they should be doing within-person studies with more precise measurements. In general I think these literatures have too much focus on data analysis and not enough on data collection.

Preregistration: what it does and does not do

To put it another way, let me pick apart the following reasonable-sounding sentence from the abstract of the critical review by Harris, Pashler, and Mickes:

We conclude on a positive note, arguing that if fertility-effect researchers take advantage of the rapidly emerging opportunities for study preregistration, the validity of this literature can be rapidly clarified.

Sure, but . . . let’s be clear about our goals. If we seek to evaluate the validity of various claims in the fecundity-and-psychology literature then, yes, preregistered replication of existing studies is the way to go; this would (I believe) continue to show that the published claims in much of this literature are nothing but noise. But if we seek to learn anything positive about fecundity and psychology, then I don’t think preregistration will get you anywhere, unless you improve the data collection and design of your studies. Preregistration is a great way to get a sense of what information you have but not necessarily a great way to learn anything new. To put it another way, preregistration removes bias, which is great, but it does not increase effect size or reduce variance, except in the indirect sense that, if you know your study is preregistered, you also realize that it will not be so easy to attain statistical significance, which in turn might motivate you to perform more careful studies with cleaner effects and less noise. But this will happen only if you get serious about your design and data collection, not if you think of your study as a static entity.

This logic is one reason I get so frustrated with empirical researchers who, when faced with skepticism and criticism regarding the garden of forking paths, just stand there and fight rather than taking the criticism to heart. All the replication and preregistration in the world won’t help you, if you’re tied to a weak, noisy design.
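
To make the bias-versus-variance point concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration, not taken from any study in this literature: a small true shift of 0.1, large person-to-person variation (sd 1.0), smaller measurement noise (sd 0.3), and a crude |estimate| > 1.96 standard errors rule for "significance". Every simulated study is analyzed in one fixed way, which stands in for preregistration. The only thing that differs is the design: different women compared across phases, or the same women measured in both phases.

```python
import random
import statistics


def simulate(within_person, true_effect=0.1, sd_person=1.0, sd_noise=0.3,
             n=50, n_sims=1000, seed=1):
    """Simulate many studies that all use one fixed, pre-specified analysis.

    All parameter values are hypothetical and chosen only for illustration.
    Both designs collect n low-fertility and n high-fertility measurements.
    """
    rng = random.Random(seed)
    estimates = []    # estimated shift from every simulated study
    significant = []  # |estimate| from studies that reach "significance"
    for _ in range(n_sims):
        if within_person:
            # Same women measured in both phases: the person-level term
            # cancels in each difference, leaving only measurement noise.
            diffs = []
            for _ in range(n):
                person = rng.gauss(0, sd_person)
                low = person + rng.gauss(0, sd_noise)
                high = person + true_effect + rng.gauss(0, sd_noise)
                diffs.append(high - low)
            est = statistics.fmean(diffs)
            se = statistics.stdev(diffs) / n ** 0.5
        else:
            # Different women in each phase: person-level variation stays
            # in the comparison and inflates the standard error.
            low = [rng.gauss(0, sd_person) + rng.gauss(0, sd_noise)
                   for _ in range(n)]
            high = [rng.gauss(0, sd_person) + true_effect + rng.gauss(0, sd_noise)
                    for _ in range(n)]
            est = statistics.fmean(high) - statistics.fmean(low)
            se = ((statistics.variance(low) + statistics.variance(high)) / n) ** 0.5
        estimates.append(est)
        if abs(est) > 1.96 * se:
            significant.append(abs(est))
    return {
        "mean_estimate": statistics.fmean(estimates),
        "sd_estimate": statistics.stdev(estimates),
        "share_significant": len(significant) / n_sims,
        # Average size of a "significant" estimate relative to the truth.
        "exaggeration": (statistics.fmean(significant) / true_effect
                         if significant else float("nan")),
    }


between = simulate(within_person=False)
within = simulate(within_person=True)
for name, result in (("between-person", between), ("within-person", within)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under these assumptions both designs are unbiased on average, since the analysis is fixed in advance. The between-person design, though, produces estimates that vary much more from study to study, and the estimates that happen to reach significance overstate the true shift by a larger factor. Fixing the analysis does nothing about that; only the change in design and measurement does.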

25 Comments

  1. Like you, Andrew, I think that “the published claims in much of this literature are nothing but noise”. In fact, I think that holds for a great many literatures. But it’s a weird sort of belief to hold about “much of” social science. At least, I feel weird holding it. There’s something nihilistic about it, which we’ve talked about before. What does it mean to live in a society the mainstream (i.e., “scientific”) understanding of “much of” which is based on noise? Even if I ignore the noise in forming my own opinions (and policy proposals), I’m still living in a world full of “studies showing” (and people believing) a bunch of things that just aren’t true.

    • Keith O'Rourke says:

      Thomas:

      This is the position I came to about published clinical research in recent years: my default is to assume it is wrong.

      I think one can get past this only if one has personal information about the research group and how they work, or in a regulatory context where the trials had to be pre-discussed and all data and materials can be audited.

      It’s not just me; here is another group arguing pretty much the same thing:

      Tom Jefferson, et al (of The Cochrane Collaboration). Risk of bias in industry-funded oseltamivir trials: comparison of core reports versus full clinical study reports http://bmjopen.bmj.com/content/4/9/e005253.full

      Their conclusion:
      “This approach is not possible when assessing trials reported in journal publications, in which articles necessarily reflect post hoc reporting with a far more sparse level of detail. We suggest that when bias is so limiting as to make meta-analysis results unreliable, either it should not be carried out or a prominent explanation of its clear limitations should be included alongside the meta-analysis.”

      Now, way back when Andrew originally complained about the sex and beauty paper, I said “If it’s true, it will replicate”.
      That’s not quite right; I should have said “If it’s true, it would replicate with adequate study design and execution”.
      (At least I did not say something silly like “If it replicates, it’s true”!)

      So no need to be nihilistic, just a need to be capable of designing and carrying out good studies.

    • Rahul says:

      @Thomas:

      One way to assuage your fears about nihilism is if most of this work is irrelevant; i.e. most people just don’t care. Or they discount these results mentally as unreliable. e.g. red dress = fertility

      Basically, I think a huge chunk of Soc. Sci. research has become an inward-looking enterprise. Academics commission studies. Other academics perform them. Still other academics decide whether to publish or read them. No significant harm gets done if results are crap. Serious policy-making doesn’t rely on these studies.

      That’s my cynical outlook.

  2. Rahul says:

    The academic system has evolved to maximize article production. It is far easier to get a publishable journal article by analysis than (high quality) data collection.

  3. Martha says:

    Andrew said: “This logic is one reason I get so frustrated with empirical researchers who, when faced with skepticism and criticism regarding the garden of forking paths, just stand there and fight rather than taking the criticism to heart. “

    So maybe what society/social science really needs is some serious study of how to promote taking criticism to heart and decrease standing there and fighting.

  4. Clyde Schechter says:

    Well, good designs can be hard to carry out and might not settle the question anyway.

    For example, to do the red-dress fertility study with a strong within-person design, you would have to recruit some women to participate in your study and come back for repeat assessments. To truly measure where they are in their cycles on each occasion you would probably need to measure follicle-stimulating hormone and luteinizing hormone levels (or something like those), which involves drawing blood. And you would probably need to conceal the true purpose of the study from them so that their knowledge of your goals didn’t actually influence what they chose to wear on each occasion. That probably also entails coming up with some extraneous activity for them to do at each assessment to provide some ostensible reason for them to be there. Also, you might want to assess the colors of their clothing under standardized lighting conditions, perhaps including photographing them and doing some kind of photometric color classification. All of this is more burdensome to the participants and researchers than just approaching women haphazardly one day, asking a question about their last menstrual period, and observing the color of their dresses informally.

    On top of that, not everyone who starts a study will show up subsequently, even if you treat them very nicely and provide incentives, etc. And not only can you not exclude the possibility that the no-shows differ from those who return for follow-up in ways that are related to key study variables, in many contexts it is even likely that this is the case. So, there is a considerable risk you will end up with data that are missing not at random, and unable even to model the missingness. So even after doing the study, considerable doubt about the validity of the results might remain.

    To be clear, I’m not arguing against better designs. I’m just saying that they are typically more cumbersome and resource intensive (so we probably wouldn’t want to waste the effort on unimportant questions like red dress = fertility), and that inevitable problems in implementation can muddy the findings of these studies too.

    • Elin says:

      Yes, I would think at minimum you would need to go to them to do the data collection if you were going to do the blood work every day. But I think before spending all that money you could do a within-subjects study to see if there is any indication that the preference for wearing red (or whatever) over time follows the pattern you would expect if the hypothesis were true. What you would expect is very well defined: high, a drop off a cliff once a month, steady low, then a pretty fast increase. Then if you actually saw such a pattern you could do the data collection on the hormones, to see if the pattern is actually associated with the timing of the same pattern in hormone levels.

  5. dab says:

    No one said science should be easy or cheap. If the question is important enough to study, it seems it should be important enough to study with a design that minimizes the noise. Maybe these researchers should pool resources to publish one decent paper rather than a series of shoddy ones? (But I guess it was Rahul’s point above that there are no incentives for that and there are many against it.) And if you want to talk about MNAR, how about all of the women who aren’t out and about to “approach haphazardly one day and answering a question about their last menstrual period and having the color of their dresses observed informally”? I think the designs being used are inferior in every way to the one you proposed as being too complex and costly (except for cost and complexity, of course). But, again, no one said science should be cheap or easy.

    • dab says:

      Oops. This was meant as a reply to Clyde Schechter above….

    • Clyde Schechter says:

      I’m not trying to say that science should be easy or cheap. I’m just pointing out two things:

      1. Some questions are probably not worth the trouble of investigating adequately. (I will not disagree if you want to argue that, in that case, they shouldn’t be investigated at all.)

      2. Rigorous designs often involve extra difficulties in implementation. As a result, what we end up with is not the low-bias, low-variance study we set out to do, but a pale version of it. And the answer it provides may not be all that much better than the one provided by a less well designed study that might get better execution.

  6. zbicyclist says:

    Just FYI, David Brooks has a short piece about a study in a related area a few years ago. But in this case it’s hard to argue with him:

    “The most amazing thing about the study is that there are apparently people capable of looking at Georgia O’Keeffe paintings and not seeing sexual themes. These are the people who need to be studied.”

    http://brooks.blogs.nytimes.com/2011/05/31/the-moons-tide/

  7. Nadia says:

    The authors analyzed the distribution of p values in this cycle, and found many more p values close to 0 than close to .05. It would be interesting to see what a high quality study would find.

