How does a Nobel-prize-winning economist become a victim of bog-standard selection bias?

Posted on July 20, 2017 9:58 AM by Andrew

Someone who wishes to remain anonymous writes in with a story:

Linking to a new paper by Jorge Luis García, James J. Heckman, and Anna L. Ziff, economist Sue Dynarski makes this “joke” on Facebook, or maybe it’s not a joke:

How does one adjust standard errors to account for the fact that N of papers on an experiment > N of participants in the experiment?

Clicking through, the paper uses data from the “Abecedarian” (ABC) childhood intervention program of the 1970s. Well, the related ABC & “CARE” experiments, pooled together. From Table 3 on page 7, the ABC experiment has 58 treatment and 56 control students, while CARE has 17 treatment and 23 control. If you type “abecedarian” into Google Scholar, sure enough, you get 9,160 results! OK, but maybe some of those just have citations or references to other papers on that project… If you restrict the search to papers with “abecedarian” in the title, you still get 180 papers. If you search for the word “abecedarian” on Google Scholar (not necessarily in the title) and restrict to papers by Jim Heckman, you get 86 results.

That’s not why I thought to email you though.

Go to pages 7-8 of this new paper where they explain why they merged the ABC and CARE studies:

CARE included an additional arm of treatment. Besides the services just described, those in the treatment group also received home visiting from birth to age 5. Home visiting consisted of biweekly visits focusing on parental problem-solving skills. There was, in addition, an experimental group that received only the home visiting component, but not center-based care.[fn 17] In light of previous analyses, we drop this last group from our analysis. The home visiting component had very weak estimated effects.[fn 18] These analyses justify merging the treatment groups of ABC and CARE, even though that of CARE received the additional home-visiting component.[fn 19] We henceforth analyze the samples so generated as coming from a single ABC/CARE program.

OK, they merged some interventions (garden of forking paths?) because they wanted more data. But, how do they know that home visits had weak effects? Let’s check their explanation in footnote 18:

18: Campbell et al. (2014) test and do not reject the hypothesis of no treatment effects for this additional component of CARE.

Yep. Jim Heckman and coauthors conclude that the effects are “very weak” because they ran some tests and couldn’t reject the null. If you go deep into the supplementary material of the cited paper, to tables S15(a) and S15(b), sure enough you find that these “did not reject the null” conclusions are drawn from interventions with 12-13 control and 11-14 treatment students (S15(a)) or 15-16 control and 18-20 treatment students (S15(b)). Those are pretty small sample sizes…
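
To see why a non-rejection at these sample sizes says so little, here is a rough power calculation. It is only a sketch under assumptions that are not in the paper: a normal approximation to the two-sample test, a two-sided 5% level, and hypothetical standardized effect sizes. The function name is made up for illustration; the group sizes are in the ranges quoted above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def approx_power(n_treat, n_control, std_effect, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    std_effect is the true difference in means divided by the outcome's
    standard deviation (a hypothetical input, not a value from the paper).
    """
    se = sqrt(1 / n_treat + 1 / n_control)      # SE of the standardized difference
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    shift = std_effect / se
    # Probability that the z-statistic lands in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n_t, n_c in [(12, 12), (20, 16)]:       # sizes in the range of S15(a) and S15(b)
        for d in (0.2, 0.5):                    # hypothetical small and moderate effects
            print(n_t, n_c, d, round(approx_power(n_t, n_c, d), 2))
```

Under these assumptions, even a moderate standardized effect of 0.5 would be detected well under half the time, so “did not reject” is a long way from “very weak.”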

This jumped out at me and I thought you might be interested too.

My reply: This whole thing is unfortunate but it is consistent with the other writings of Heckman and his colleagues in this area: huge selection bias and zero acknowledgement of the problem. It makes me sad because Heckman’s fame came from models of selection bias, but he doesn’t see it when it’s right in front of his face. See here, for example.

The topic is difficult to write about for a few reasons.

First, Heckman is a renowned scholar and he is evidently careful about what he writes. We’re not talking about Brian Wansink or Satoshi Kanazawa here. Heckman works on important topics, his studies are not done on the cheap, and he’s eminently reasonable in his economic discussions. He’s just making a statistical error, over and over again. It’s a subtle error, though, that has taken us (the statistics profession) something like a decade to fully process. Making this mistake doesn’t make Heckman a bad guy, and that’s part of the problem: When you tell a quantitative researcher that they made a statistical error, you often get a defensive reaction, as if you accused them of being stupid, or cheating. But lots of smart, honest people have made this mistake. That’s one of the reasons we have formal statistical methods in the first place: people get lots of things wrong when relying on instinct. Probability and statistics are important, but they’re not quite natural to our ways of thinking.

Second, who wants to be the grinch who’s skeptical about early childhood intervention? Now, just to be clear, there’s lots of room to be skeptical about Heckman’s claims and still think that early childhood intervention is a good idea. For example, this paper by Garcia, Heckman, Leaf, and Prados reports a benefit/cost ratio of 7.3. So they could be overestimating their effect by a factor of 7 and still have a favorable ratio. The point is, if for whatever reason you support universal day care or whatever, you have a motivation not to worry too much about the details of a study that supports your position.
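
The arithmetic behind that claim is simple enough to sketch. This assumes, as a simplification, that any overestimate scales the benefits proportionally while costs stay fixed; the function name and the list of factors are made up for illustration.

```python
def adjusted_ratio(reported_ratio, exaggeration_factor):
    """Benefit/cost ratio after dividing benefits by a hypothetical exaggeration factor."""
    return reported_ratio / exaggeration_factor


if __name__ == "__main__":
    reported = 7.3  # ratio quoted above from Garcia, Heckman, Leaf, and Prados
    for factor in (1, 2, 5, 7, 8):  # hypothetical degrees of overestimation
        ratio = adjusted_ratio(reported, factor)
        verdict = "favorable" if ratio > 1 else "unfavorable"
        print(f"overestimated {factor}x -> ratio {ratio:.2f} ({verdict})")
```

Under this simplification the ratio stays above 1 at a factor of 7 and drops below 1 only once the overestimate exceeds a factor of 7.3.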

Again, I’m not saying that Heckman and his colleagues are doing this. I can only assume they’re reporting what, to them, are their best estimates. Unfortunately these methods are biased. But a lot of people with classical statistics and econometrics training don’t realize this: they think regression coefficients are unbiased estimates, but nobody ever told them that the biases can be huge when there is selection for statistical significance.
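
Here is a minimal sketch of how large that bias can get. It assumes the estimate is normally distributed around a positive true effect with a known standard error, and that only estimates passing a two-sided 5% test get reported. The function names and the example inputs are hypothetical, chosen for illustration rather than taken from any of the papers discussed here.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def design_analysis(true_effect, se, alpha=0.05):
    """Closed-form power, wrong-sign rate, and exaggeration ratio.

    Assumes estimate ~ Normal(true_effect, se) with true_effect > 0, and that
    only estimates with |estimate| > z_crit * se are reported.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    a = z_crit - true_effect / se     # standardized upper cutoff
    b = -z_crit - true_effect / se    # standardized lower cutoff
    p_hi = 1 - STD_NORMAL.cdf(a)      # significant with the correct sign
    p_lo = STD_NORMAL.cdf(b)          # significant with the wrong sign
    power = p_hi + p_lo
    # Partial expectations of |estimate| over the two rejection regions.
    e_hi = true_effect * p_hi + se * STD_NORMAL.pdf(a)
    e_lo = -true_effect * p_lo + se * STD_NORMAL.pdf(b)
    exaggeration = (e_hi + e_lo) / power / true_effect
    wrong_sign = p_lo / power
    return power, wrong_sign, exaggeration


def simulate_filter(true_effect, se, n_sims=200_000, seed=1, alpha=0.05):
    """Monte Carlo check: mean |estimate| among significant draws, over the truth."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    draws = (rng.gauss(true_effect, se) for _ in range(n_sims))
    kept = [abs(est) for est in draws if abs(est) > z_crit * se]
    return fmean(kept) / true_effect


if __name__ == "__main__":
    power, wrong_sign, exaggeration = design_analysis(true_effect=1.0, se=1.0)
    print(f"power={power:.2f} wrong_sign={wrong_sign:.3f} exaggeration={exaggeration:.2f}")
    print(f"simulated exaggeration={simulate_filter(1.0, 1.0):.2f}")
```

With a true effect equal to one standard error, only about one estimate in six clears the significance threshold, and under these assumptions the ones that do overstate the effect by roughly a factor of 2.5. Every individual estimate is unbiased before the filter; the bias comes entirely from which ones survive it.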

And, remember, selection for statistical significance is not just about the “file drawer” and it’s not just about “p-hacking.” It’s about researcher degrees of freedom and forking paths that researchers themselves don’t always realize until they try to replicate their own studies. I don’t think Heckman and his colleagues have dozens of unpublished papers hiding in their file drawers, and I don’t think they’re running their data through dozens of specifications until they find statistical significance. So it’s not the file drawer and it’s not p-hacking as is often understood. But these researchers do have nearly unlimited degrees of freedom in their data coding and analysis, they do interpret “non-significant” differences as null and “significant” differences at face value, they have forking paths all over the place, and their estimates of magnitudes of effects are biased in the positive direction. It’s kinda funny but also kinda sad, that there’s so much concern for rigor in the design of these studies and in the statistical estimators used in the analysis, but lots of messiness in between, lots of motivation on the part of the researchers to find success after success after success, and lots of motivation for scholarly journals and the news media to publicize the results uncritically. These motivations are not universal—there’s clearly a role in the ecosystem for critics within academia, the news media, and in the policy community—but I think there are enough incentives for success within Heckman’s world to keep him and his colleagues from seeing what’s going wrong.

Again, it’s not easy—it took the field of social psychology about a decade to get a handle on the problem, and some are still struggling. So I’m not slamming Heckman and his colleagues. I think they can and will do better. It’s just interesting, when considering the mistakes that accomplished people make, to ask, How did this happen?

P.S. This is an important topic. It’s not ovulation-and-voting or air rage or himmicanes or anything silly like that: We’re talking about education policy that could affect millions of kids! And I’m not saying I have all the answers, or anything close to that. No, it’s the opposite: data are relevant to these questions, and I’m not close to the data. What’s needed is an integration of theory with observational and experimental data, and it’s great that academic economists such as Heckman have put so much time into studying these problems. I see my role as statistician as a helper. For better or worse, statistics is a big part of the story, and when people are making statistical errors, we should be correcting them. But these corrections are not the end of the story; they’re just necessary adjustments to keep research on the right track.

74 thoughts on “How does a Nobel-prize-winning economist become a victim of bog-standard selection bias?”

Anonymous on July 20, 2017 12:33 PM at 12:33 pm said:

While I appreciate that this is a carefully written and sensitive treatment of a difficult topic, I feel like this doesn’t really present a balanced view of the subject because it ignores the scarcity of randomized studies of early childhood care with high quality long term follow up data. Before I delve more into this, full disclosure: I worked as a research assistant for Professor Heckman during and after college.

The fact is, the Abecedarian, and also Perry preschool studies while we’re at it, are vastly overrepresented in the early childhood literature because they are the only studies that combine both high quality study design (randomized treatment) with high quality, expensively collected long term follow up data. To my knowledge, alternative studies of the effects of early childhood education are based on either (a) more recent, better designed studies with larger sample sizes and randomized treatment assignment but very short term or low quality follow up information, e.g. the Head Start Impact Study or (b) much worse designed studies with long term follow up information, e.g. the Chicago Longitudinal Study. To point out their overrepresentation in the literature without acknowledging this seems unfair. Furthermore, both studies have so many papers on them in part because there has been a new wave of papers every time new data has become available (which has happened many times as more follow ups were conducted).

Now regarding the concerns you raised about the way Professor Heckman’s group combined the Abecedarian and CARE data, I am a little confused. As I understand it, your concern is that they (a) pooled the home-visiting plus center-based CARE group with the Abecedarian treatment group and (b) omitted the home-visiting only treatment group from the CARE data from their statistical analyses. My reading of the paper is that the test is only relevant to (a), and it is used to argue that there is no evidence in the data suggesting that the additional bit of home visiting treatment received by the CARE center-based treatment group will dominate the estimated effects of treatment based on the pooled data. With respect to (a), given that they are interested in estimating the effect of participation in an “enriched early childhood education program” from 8 weeks to age five with several years of a center-based treatment, is it really that unreasonable to pool the two groups, which differ only by < 3 home visits a month, to boost the sample size? My reading of the paper is that they made this decision based on the parameter they wanted to estimate, not the results of the test, and presented the results of the test to demonstrate that their decision was not obviously unreasonable. With respect to (b), my reading of the paper is that this was done to reflect the specific effect they wanted to estimate (specifically, participation in an "enriched early childhood education program" from birth to age five compared with participation in control conditions (home care or low quality center-based care), to which a home-visiting only treatment group is not relevant), again not because of the results of the test. From your perspective, were there different, better decisions they should have made?
It's not obvious to me what they should have done, although I would say that given that their decisions were likely based on the parameter they wanted to estimate and not the results of the test, they maybe should have just excluded the results of the test to avoid confusion.

As a last point, I disagree with the characterization of the benefit/cost ratio being too high as a *statistical* problem. I would agree that the benefit/cost ratio is too high, but I would argue that that is as much if not more so due to the subjectivity involved in performing a cost benefit analysis (which is not itself a statistical procedure but rather a difficult economic one of assigning monetary costs and benefits to things like years of education, jail time, etc.), lack of generalizability of the Abecedarian sample (which is a problem that afflicts any results based on a relatively small sample from decades ago regardless of how well the data are analyzed), and lack of scalability (it is probably impossible to replicate the Abecedarian/CARE treatment at the same cost if at all at present).

- Andrew on July 20, 2017 2:06 PM at 2:06 pm said:
  
  Anon:
  
  You mention “the scarcity of randomized studies of early childhood care with high quality long term follow up data,” the “lack of generalizability” of the study, and “lack of scalability.” These are all good points, and they’re part of the story as to why I think researchers in this area should show a bit more caution in their statements.
  
  For example, here’s Heckman: “A substantial literature shows that U.S. early childhood interventions have important long-term economic benefits.” This can be misleading, partly because the literature may be substantial but the data are not, and partly because of his strong, deterministic way of putting it. This sort of statement by Heckman does not, it seems to me, respect the concerns that you give in your comment.
  
  Here’s another statement of Heckman that’s just wrong from a statistical perspective: “The fact that samples are small works against finding any effects for the programs, much less the statistically significant and substantial effects that have been found.” Again, this does not address the issues of scarcity of data, lack of generalizability, and lack of scalability that you raise.
  
  Finally, one reason the benefit/cost bias is a statistical problem is that the benefit of the program is estimated using published estimates that are subject to the statistical significance filter. These are statistically biased estimates, even setting aside subjective considerations in defining costs.
  
  - Martha (Smith) on July 20, 2017 6:03 PM at 6:03 pm said:
    
    +1
    
  - Anonymous on July 20, 2017 11:50 PM at 11:50 pm said:
    
    Thank you for taking the time to respond. That said, I’m disappointed that the response you gave doesn’t really address the questions I asked.
    
    With respect to the two quotes you provided, I’m not sure they actually address the questions I’ve raised. As far as I can tell, you use the fact that Professor Heckman and his coauthors have made two questionable claims in the past to criticize the quality of their research in general. I would note also that the first quote is taken completely out of context from an abstract of a different paper. When the same claim is repeated in the body of the paper, at least 6 papers summarizing the results of at least 4 different studies providing evidence of that claim are cited. While this is not necessarily the amount of evidence I’d want to see to make the claim Professor Heckman and his coauthors made in my own work, is it really that unreasonable if they are basing that claim on the results of several studies?
    
    It is also still not clear to me what you mean about the results of the specific paper at hand having been run through a statistical significance filter. Did you think the way they constructed the samples was chosen to obtain a statistically significant result? Or is it that they reanalyzed a study that has previously been found to have positive effects? In general, statisticians do have to make decisions about sample composition because study designs are often messy, and statisticians also often have to make decisions about working with data that is of high quality but limited in scope and possibly overrepresented in the literature, as opposed to data that is of poor quality but broad in scope and possibly underrepresented. What would you actually suggest they do?
    
    - Martha (Smith) on July 21, 2017 12:31 AM at 12:31 am said:
      
      Anonymous,
      
      In response to your last paragraph, see
      https://statmodeling.stat.columbia.edu/2011/09/10/the-statistical-significance-filter/ and the links from that page.
      
      (Also, please note that what Andrew said was “the benefit of the program is estimated using published estimates that are subject to the statistical significance filter.” I hope that you will see after reading the link above that what you said, “the results of the specific paper at hand having been run through a statistical significance filter” is not the same as what Andrew said.)
    - Andrew on July 21, 2017 8:42 AM at 8:42 am said:
      
      Anon:
      
      In answer to your question, “What would you actually suggest they do?”, to start I would suggest they temper their claims and they stop reporting unadjusted estimates with such large biases; for more on this example, see Section 2.1 here. More generally I recommend they become more open to the idea that they might be making statistical errors. I also think they should recognize the limitations of these small experimental studies as noted in your above comment.
    - B on July 23, 2017 11:36 PM at 11:36 pm said:
      
      Andrew –
      
      In terms of what people actually should do, what is your solution when the only thing you can get is 100 observations? It is very difficult to do a randomized control trial with thousands of people. You note in your paper “That’s the way it goes: getting better data can take work. Once the incentives have been changed to motivate higher-quality data collection, it can make sense to think about how to do this” – I think what you can’t forget is that it can take infinite time and money to get perfect data. Decisions do have to be made with data and some statistical tests are better than none. Should we discount positive effects found in research of a new cancer drug because the sample size is 20? Heckman realizes this and cites to multiple studies that have all found similar effects — you are reading that he cited a study with a sample size of 30 and determining that all his work is garbage.
      
      Lastly, I find it ironic that you insult Heckman’s lengthy research history while citing a paper you wrote that cites yourself 10 times, all mentioning the same things. Although I find your work illuminating and I certainly believe Heckman flaunts his Nobel Prize around town, it is a reminder that we all have agendas.
    - Andrew on July 24, 2017 9:04 AM at 9:04 am said:
      
      B:
      
      1. You ask, “what is your solution when the only thing you can get is 100 observations?”
      
      I don’t have a “solution”; indeed I think that part of the problem is what Tukey referred to as “the aching desire for an answer” in his famous quote. (“The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.”) Sometimes the data at hand just don’t tell us that much beyond what happened to the particular people in the study.
      
      As I wrote in my comment above, I would suggest researchers recognize the limitations of these small experimental studies and stop reporting unadjusted estimates with such large biases. That said, decisions need to be made. I’m certainly not saying that early childhood intervention should not be funded just because it hasn’t been proven to work. We make decisions given the information that is available. As a statistician, I’m interested in quantifying the information we do have, and part of this work is to assess concerns such as selection bias.
      
      3. You ask, “Should we discount positive effects found in research of a new cancer drug because the sample size is 20?” My response is that it depends on the context, and it depends what you mean by “discount.” But, just speaking generally, in what we call in statistics a frequentist sense, if a series of medical experiments with N=20 are performed, and in each case the outcome data are highly variable, and there is an implicit rule to find and publish statistically significant comparisons, then, yes, this will produce positively biased estimates of treatment effects, and these biases can be huge.
      
      Cancer is important; so is preschool education. Both can be difficult to study, in large part because outcomes are highly variable and it can require a long time to find out the results, which in turn leads to problems of missing data. All these issues are real, and we’re not doing cancer patients or schoolchildren any favors by sidestepping them.
      
      4. You write that Heckman cites “multiple studies that have all found similar effects.” The trouble is that these studies are in general subject to selection bias—the statistical significance filter, which causes overestimation of effects, what we’ve also called type M (magnitude) errors—and also as my correspondent pointed out, many of these studies were done on the same small datasets, so they don’t represent independent evidence. I have no doubt that Heckman believes these effects are large, and the data are consistent with large effects, but the data also appear to be consistent with small or even slightly negative effects.
      
      5. You write that I “insult Heckman’s lengthy research history” and that I am “determining that all his work is garbage.” Nope. Nowhere did I insult Heckman’s research history, nor did I say that all his work is garbage. Nothing of the kind. What I said is that Heckman made a mistake. We all make mistakes. As I wrote in my post, “I’m not slamming Heckman and his colleagues. I think they can and will do better.” People have pointed out errors in my work too, and not always so politely; see here, for example. When they do this, I don’t say they’re insulting me, I say they’re pointing out possible errors in some of my work. They’re doing me a favor, actually.
      
      6. I have no problem with Heckman letting people know he has a Nobel prize. The point of the prize is to recognize his work; it’s not to be a secret.
      
      7. In my post above, I link to my paper with John Carlin with 31 references, including five articles in which I was an author or coauthor. I published another related article, mentioning the Gertler, Heckman, et al. paper specifically, and with 26 citations, including four articles in which I was an author or coauthor and also four of my blog posts. I included the blog posts because, as stated on the first page of that article, some of the material in the article appeared in these blog posts, and if you repeat previously published work it’s recommended scholarly practice to cite it. More generally, I cite the work that seems most relevant in its development. If appropriate, I will cite my published work five times or whatever. I don’t see what’s wrong with that.
      
      Anyway, the short answer is that Heckman and his colleagues made a mistake: they published estimates with clear biases. To say this is not to “insult Heckman’s lengthy research history” nor is it to say that “all his work is garbage,” nor does it represent an “agenda.” It’s statistical criticism. Studying the frequency properties of statistical estimates is part of a statistician’s job, and the job is hard enough without people going around attaching it to claims of insults, garbage, agendas, etc. I’m glad that other people point out statistical errors that I make too.
    - Anoneuoid on July 24, 2017 11:50 AM at 11:50 am said:
      
      Should we discount positive effects found in research of a new cancer drug because the sample size is 20?
      
      No, you should discount such results because the people claiming they understand the effects of the drug can’t predict anything specific about what it will do. Sample size of 20 is huge if you have a theory that predicts some precise thing will happen to the people who get the drug. How much got done with 9 planets + a few moons and comets for astronomy?
      
      The problem is the people who do eg medical research are by and large “non-quantitative”, so are unfamiliar with the tools they need (calculus, programming) to do a good job or to understand when someone else has done a good job. Instead these red herrings like small sample size are focused on, and they come up with excuses about how complicated their topic matter is.
    - Andrew on July 24, 2017 11:54 AM at 11:54 am said:
      
      Anon:
      
      I’m thinking we could get a sample size of a few hundred planets pretty quickly using Mechanical Turk.
    - Matt on July 25, 2017 7:10 AM at 7:10 am said:
      
      How about the fact that physics is in a league of its own in terms of the precision of its theory? You keep asking for that in medical research and social science; sorry, not going to happen. There is simply not going to be theory in these areas that has comparably specific and quantitative predictions to those in physics.
    - Martha (Smith) on July 25, 2017 5:40 PM at 5:40 pm said:
      
      I’m inclined to agree with Matt, but with some elaboration: Medicine and social science inherently involve more randomness (uncertainty, if you will) than physics, which means that we can’t expect results as precise as with physics. However, I think we can do much better in medicine than we are now doing (I am not so sanguine about social science) — and part of doing better is accepting the uncertainty and not looking for so much certainty.
    - N on July 24, 2017 10:45 PM at 10:45 pm said:
      
      Andrew, I’m still trying to understand where the bias is coming from here. In terms of the “garden of forking paths” analogy, what seems to have happened here is: (i) the authors had a choice between merging the ABC & CARE samples, versus treating them as distinct samples (ii) relying on an estimate of the effect of home visits which is derived from the same data, they conclude that home visits have weak effects (iii) based on (ii), they chose to merge the two samples. Any bias is thus coming from the data-dependent choice in step (iii): had the researchers been presented with a different dataset where the estimated effect of home visits was strong, they would have chosen not to merge the two samples in step (iii).
      
      Is that the critique of this paper?
      
      If so, it seems to me the obvious solution (and one that a good referee who reads the paper carefully would surely suggest) is to report the coefficient estimates without merging the ABC/CARE samples. If it then turns out the coefficient estimates are not sensitive to this choice, we should be less worried about bias stemming from making data-dependent choices. I am not sure why the solution should instead be to “stop reporting unadjusted estimates with such large biases”.
      
      I completely agree by the way that researchers should temper their claims, because right now we are in an unfortunate equilibrium where researchers make strong claims and the audience has to mentally discount the strength of these claims. An unfortunate side-effect of this of course is that a researcher who makes temperate claims gets ignored completely in the rabble.
    - Andrew on July 24, 2017 11:51 PM at 11:51 pm said:
      
      N:
      
      1. There are many forking paths in the analysis, not just the two or three mentioned in your comment above. Given all the forking paths, it is no surprise that the researchers were able to find statistically significant differences. In the context of forking paths, I think it’s appropriate to do something like a multiverse analysis (see here) and a hierarchical model (see here).
      
      2. But even without any forking paths at all, the estimate is biased because of selection on statistical significance. In this particular example, I think the bias is large; see section 2.1 of this paper.
      
      Point 2 is why I don’t think it’s appropriate to report the raw estimates with no bias correction.
      
      As I wrote in my above post, what bothers me is not just the lack of bias correction in the published paper, but also the apparent lack of recognition that the estimates are biased in the first place. Sure, this bias issue is tricky—that’s why I needed to publish several articles on the topic, and that’s why it’s necessary to be explaining it in this blog comment—but the whole point of having a statistician or econometrician as a coauthor on a project like this is to fix this sort of statistical problem.
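
      To make the multiverse idea in point 1 concrete, here is a minimal sketch that runs every combination of a few analysis choices and reports all of the results instead of one. Everything in it is an assumption made for illustration: the data are synthetic with no true treatment effect, the outcome names and the three choices are hypothetical, and only the group sizes echo the counts quoted in the post.

      ```python
      import random
      from itertools import product
      from math import sqrt
      from statistics import fmean, variance


      def make_synthetic_data(seed=0):
          """Made-up records with NO true treatment effect (group sizes as quoted in the post)."""
          rng = random.Random(seed)
          arms = [("ABC", True, 58), ("ABC", False, 56), ("CARE", True, 17), ("CARE", False, 23)]
          rows = []
          for site, treated, n in arms:
              for _ in range(n):
                  rows.append({"site": site, "treated": treated,
                               "iq": rng.gauss(100, 15), "earnings": rng.gauss(30, 10)})
          return rows


      def diff_in_means(rows, outcome):
          """Treatment-minus-control difference and its z-score for one specification."""
          t = [r[outcome] for r in rows if r["treated"]]
          c = [r[outcome] for r in rows if not r["treated"]]
          est = fmean(t) - fmean(c)
          se = sqrt(variance(t) / len(t) + variance(c) / len(c))
          return est, est / se


      def multiverse(rows):
          """Run every combination of the hypothetical analysis choices and keep all results."""
          choices = {
              "sample": ["ABC only", "ABC + CARE"],
              "outcome": ["iq", "earnings"],
              "trim": [False, True],  # drop the top and bottom 5% of the outcome
          }
          results = []
          for sample, outcome, trim in product(*choices.values()):
              subset = [r for r in rows if sample == "ABC + CARE" or r["site"] == "ABC"]
              if trim:
                  ordered = sorted(subset, key=lambda r: r[outcome])
                  k = len(ordered) // 20
                  subset = ordered[k:len(ordered) - k]
              est, z = diff_in_means(subset, outcome)
              results.append({"sample": sample, "outcome": outcome, "trim": trim,
                              "estimate": est, "z": z})
          return results


      if __name__ == "__main__":
          for row in multiverse(make_synthetic_data()):
              print(row)
      ```

      The point of the design is that all eight specifications are reported side by side, so a reader can see how much the estimate moves with choices that could reasonably have gone another way. A hierarchical model would go further by partially pooling these estimates, which this sketch does not attempt.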
    - N on July 25, 2017 1:35 AM at 1:35 am said:
      
      Andrew:
      
      Thanks for the response and the links to the papers. As you mention in the first paper, many economics papers carry out some form of robustness/sensitivity analysis. If there is some step in the analysis where there are potentially multiple ways to proceed, they show the estimates for each of these possibilities and discuss how “robust” the results are. To me this seems to be a crude form of multiverse analysis (though it would be helpful if people presented something like Table 1 in the first link!). This however is more common for the analysis carried out after data processing, and much less common during the data processing stage, so I agree there is much room for improvement here.
      
      On point 2, I am not sure if I am quite following you. Suppose researchers set out to study effect X, and there are no forking paths in the analysis. Now one of two things could happen.
      
      [A] If they find statistically significant evidence of effect X, the paper gets published in a top journal.
      [B] If not, they can either (i) choose not to write the paper (ii) write the paper anyway and publish it probably at a less well-known journal.
      
      In scenario [A] (which is where Heckman et al. find themselves), there should only be a bias correction if in the counterfactual scenario [B], the authors would have chosen to abandon the project and not write the paper at all. If instead, the authors are committed to writing the paper whatever the results may be, then they can safely publish the uncorrected estimates in scenario [A]. Right?
      
      Of course I realize there is an obvious peril here, which is that no researcher will admit (maybe even to themselves) that in scenario [B] they would in fact have taken route (i).
    - Keith O'Rourke on July 25, 2017 9:12 AM at 9:12 am said:
      
      N: You need to be able to rule the study credibly outside the reference class of studies that could end up being published selectively or with possibly selective analyses being done within them. And you have to do that with just the published paper and public documents.
      
      I do believe that in some areas of regulatory review, with adequately supervised and audited research (study was discussed before it was conducted, a detailed protocol was set out, deviations from that were documented, raw data was provided in full, all questions to investigators, including suggested sensitivity analyses, were answered, and any other study being done, except possibly in third world countries, would be known about), that can be the case.
      
      In an academic setting that sort of confidence _today_ would require insider’s knowledge which by definition isn’t public. Some partial fixes suggested here https://www.stat.columbia.edu/~gelman/research/published/incrementalism_3.pdf
    - N on July 25, 2017 4:45 PM at 4:45 pm said:
      
      Keith:
      
      (For some reason I was not able to reply directly to your comment so I have replied to Andrew’s original comment instead).
      
      I looked at the paper and I believe most of the suggestions are various ways to avoid making multiple comparisons, for example by appropriate design and by doing a multiverse analysis. Assuming that researchers do deal with any multiple comparisons in their own analysis and the study was done correctly, and if they remain in the frequentist framework, I still do not see why they shouldn’t then report just the raw estimates without bias correction. Publication bias is very much a real thing but there does not seem to be a way for an individual study to adjust for that when reporting its own findings (except by doing the kind of counterfactual thinking I mentioned, where they compute the probability that the paper would still be published had they found an insignificant estimate). Instead, it seems that publication bias correction is better done in surveys of the literature, which can look at all the studies published on a given effect, and come up with an E(beta) which takes into account the fact that statistically significant estimates are much more likely to be published.
    - Andrew on July 25, 2017 4:52 PM at 4:52 pm said:
      
      N:
      
      You write, “Assuming that researchers do deal with any multiple comparisons in their own analysis and the study was done correctly . . .” Here’s the problem: As Eric Loken and I discuss in our “garden of forking paths” article, the problem is not just multiple comparisons, it’s also multiple potential comparisons.
      
      Regarding your suggestion that the authors report their biased point estimates and leave it for later researchers to follow up and fix the biases using meta-analysis: The trouble is that Heckman and others are taking these numbers and trying to use them to make policy conclusions. I don’t think we should be making policy based on estimates that have such large biases.
      
      I’m offering two possible solutions: (a) analyze all the potential comparisons using a multilevel model, or (b) make some sort of bias correction on the reported results. I’d prefer option (a), but given that some estimates have already been published and publicized, I also see the virtue of option (b) as it would allow some salvage of what’s out there. Either (a) or (b) would pull estimated effect sizes toward zero.
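The reason either option would pull estimates toward zero can be seen in a small simulation. This is only a rough sketch under made-up assumptions: the true effect (0.1) and standard error (0.2) are hypothetical numbers, not taken from the ABC/CARE data, and the function `simulate_significance_filter` is an illustrative name rather than anyone's actual method. It fits no multilevel model and applies no particular correction; it only shows how much estimates that pass a significance threshold can overstate a small true effect.

```python
import random
import statistics


def simulate_significance_filter(true_effect=0.1, se=0.2, n_studies=20000, seed=0):
    """Simulate many noisy studies of one effect and keep the 'significant' ones.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Each study's estimate is the true effect plus normal sampling noise.
    estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # The filter: an estimate is "significant" if it is more than
    # 1.96 standard errors away from zero.
    significant = [est for est in estimates if abs(est) > 1.96 * se]
    mean_abs_significant = statistics.fmean([abs(est) for est in significant])
    return {
        "share_significant": len(significant) / n_studies,
        "mean_all": statistics.fmean(estimates),
        "mean_abs_significant": mean_abs_significant,
        # How many times larger than the truth the surviving estimates are.
        "exaggeration_ratio": mean_abs_significant / true_effect,
    }


result = simulate_significance_filter()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Across all simulated studies the average estimate stays close to the true effect, but the estimates that clear the threshold are several times too large on average. Any correction that accounts for the filter therefore has to shrink them.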
    - Keith O'Rourke on July 26, 2017 8:06 AM at 8:06 am said:
      
      N: We just have to keep responding to “Andrew says: July 24, 2017 at 11:51 pm”
      
      > Assuming that researchers do deal with any multiple comparisons in their own analysis and the study was done correctly, and if they remain in the frequentist framework, I still do not see why they shouldn’t then report just the raw estimates without bias correction.
      
      OK but those are seldom reasonable assumptions unless you have insider knowledge, and then all you can say is “trust me, you can trust that these guys did it right.” If you just have the published paper, there is ample empirical evidence that what’s claimed in published papers to have been done was often not actually done in the study. Additionally, your insider knowledge might have overlooked that the authors overlooked the multiple potential comparisons Andrew mentions.
      
      More generally, all the statistical techniques and methods to avoid bias are just attempts – they are not guarantees. My personal sense of the remaining biases toward finding and exaggerating effects in adequately supervised and audited research is that the assessed p-values are about half of what they actually should be, and if it’s good to find an effect it is overestimated and if it’s bad it is underestimated. Life finds a way to screw up controls put on it.
      
      Even more generally, no one should try to conclude anything from a single study (except maybe in a pandemic emergency) but rather should simply be reporting on what was done and what was observed – ideally making all the data securely available to confidential access by third party researchers to do a comprehensive meta-analysis [ Meta-Analysis. S Greenland, K O’Rourke https://www.amazon.ca/Modern-Epidemiology-Kenneth-J-Rothman/dp/1451190050 ]. Anything else puts everyone at risk of more harm than good.
    - Sameera Daniels on February 5, 2019 8:39 PM at 8:39 pm said:
      
      Yes, they may have been making statistical errors.
M on July 20, 2017 1:14 PM at 1:14 pm said:

Successful people deal with huge research compromises, and some succumb to compromises.

phayes on July 20, 2017 2:00 PM at 2:00 pm said:

“zero acknowledgement of the problem.”

That seems to be standard practice in economics. Economics bloggers are even worse.

Jimbo on July 20, 2017 7:52 PM at 7:52 pm said:

*revs engine*

Terry on July 20, 2017 10:22 PM at 10:22 pm said:

There is a fairly famous PNAS study “showing” that parole board decisions are heavily influenced by how hungry the judge is.

Here is a pretty funny refutation of the study.
https://daniellakens.blogspot.nl/2017/07/impossibly-hungry-judges.html

Thought you might like this because the argument is that the result is too ridiculously large to be real (society would be completely different to take the effect into account if it were actually true).

- Thomas on July 21, 2017 7:58 AM at 7:58 am said:
  
  Nick Brown’s comment to that post is great: “A lot of (social) psychology seems to be about chasing effects that are invisible to the naked eye, and were unknown to Plato or Shakespeare, yet apparently emerge the size of an elephant with a sufficiently clever protocol.”
  
  Ordinary claims require ordinary evidence. Claims about psychological mechanisms that were unknown to Shakespeare require more than a clever protocol.
  
  - Andrew on July 21, 2017 8:54 AM at 8:54 am said:
    
    Thomas:
    
    To be fair, often the ideas in these social psychology papers have been anticipated by Plato, Shakespeare, etc. See items 2 and 12 in Zwaan’s list here.
    
    - Thomas on July 21, 2017 1:31 PM at 1:31 pm said:
      
      Good point. I guess these sorts of studies provide ordinary evidence where either extraordinary evidence or none at all is called for. They’re bringing their knives to either gun or fist fights.
    - Andrew on July 21, 2017 1:38 PM at 1:38 pm said:
      
      Thomas:
      
      Yes, but I’d prefer to avoid these analogies to combat and war. It reminds me too much of Fiske’s equating scientific criticism with personal attacks and terrorism.
Terry on July 20, 2017 11:17 PM at 11:17 pm said:

This obsession with the Abecedarian (and Perry Preschool) data is pretty weird. The studies are tiny and about 50 years old. Is that really the best evidence available? Haven’t there been dozens of similar projects in the meantime? If you have to go back 50 years to find data to support your hypothesis, doesn’t that pretty much demonstrate that you don’t have good support?

Also, people used to talk about the Milwaukee Project too, but that collapsed in a cloud of fraud. So shouldn’t there be a correction for possible fraud (or more likely, well-intentioned strong-arming of the data) in any analysis of the Abecedarian and Perry projects? The Milwaukee Project proved that the probability of such conduct is greater than zero in these types of studies.

- Elin on July 27, 2017 9:47 AM at 9:47 am said:
  
  No I don’t think there have been “dozens” or even a few that feature both reasonable controls and long term follow up. It’s the same as with the Jamaica study: the fact that they have such complete long term follow up is the distinctive element. If you want 10, 20, 30 years of follow up that is a major commitment of resources and time plus implies the need for patience. Yes, it is all very well established from many small studies of various levels of quality that there is measurable short term benefit (say roughly through the end of first grade) of high quality preschool. Yes, based on the subset of studies that continue the differences basically disappear as children move into the childhood (as opposed to early childhood) levels of schooling. (Does that mean it’s wrong to help kids be more successful in kindergarten and first grade? Personally no, I think those grades where you are learning to read are important but YMMV.)
  Now we want to know what happens long term, high school graduation, college enrollment, pregnancy, marriage, jobs. There are very few studies that have had the continuity of both vision and funding to be able to assess those.
  That said I feel that when we start adding more and more potential outcome variables, all of which are completely justifiable and substantively important (as well as predicted by theory) as people move through the life course we are adding more and more researcher degrees of freedom. There is no way to avoid this because studying humans is a lot harder than studying plants or particles because they are so complex and (thankfully) long lived.
  
  To me, part of the issue is that the multiwave follow up is both
  
Ruben on July 21, 2017 2:03 AM at 2:03 am said:

> who wants to be the grinch who’s skeptical about early childhood intervention?

it’s not a nice job to do, but these two do it well:
https://sites.uci.edu/dhbailey/publications/
I especially liked: https://www.researchgate.net/publication/307944850_Persistence_and_Fadeout_in_the_Impacts_of_Child_and_Adolescent_Interventions
https://www.johnprotzko.com/cv/

I think Heckman can be called dishonest. For example, he puts out made-up graphs like this one
https://twitter.com/heckmanequation/status/880052656785375233

and says on his website that fadeout is a myth https://heckmanequation.org/
he invents the word “fade up” to counter it, when all the evidence to the contrary is a clear example of picking and choosing outcomes after the fact.

Really dishonest or heavily deluded.

Don’t forget advocating this type of expensive intervention is probably doing active harm compared to the way simpler alternative of “giving money to the needy” (e.g. see GiveDirectly being used as a baseline in GiveWell).

- Andrew on July 21, 2017 9:01 AM at 9:01 am said:
  
  Ruben:
  
  Just on your last point, it’s not clear to me that “giving money to the needy” is the alternative that would be occurring if these programs were abolished. It’s not like there’s a fixed amount of resources to be given to poor kids, and if more is given to program A, then less will be given to program B. It could even be the other way: to the extent that program A is viewed as a success, this could create the political environment under which more would be spent on program B. This is all separate from the question of the efficacy of these programs; I just think you have to be careful when saying that a program is good or bad by comparing it to alternatives in this way.
  
  - Kyle MacDonald on July 21, 2017 1:16 PM at 1:16 pm said:
    
    It’s interesting that Ruben’s example is GiveDirectly being used as a baseline for GiveWell. A single person’s decision about where to donate money might well be a case where there effectively is “a fixed amount of resources to be given to poor kids, and if more is given to program A, then less will be given to program B”. People might dig a bit deeper in their pockets if it looks like a particular charity is doing really good work, but not to the extent that a national government, say, can decide to dedicate more of the federal budget to foreign aid or education. Another good example of how it’s dangerous to compare personal finance and economic policy.
    
  - Ruben on July 22, 2017 3:18 AM at 3:18 am said:
    
    Sure. That was kind of a throwaway at the end. You should probably compare it to something rather than nothing though (both in the RCT and in the policy discussion), but I don’t know enough about politics to say what would be realistic.
    
- Kyle C on July 21, 2017 11:41 AM at 11:41 am said:
  
  And (although I got pushback here last time I raised this) remember that any long-term *employment* effects such studies measure take place in an environment of less than full employment (certainly at the lower end of the wage scale). This means that, ceteris paribus, even if an early intervention has a positive effect on adult employment, you have basically swapped some underprivileged people (your subjects) into jobs that other underprivileged people could have filled. There is no question that every person deserves the best possible education, but the overarching policy interest in these interventions is clearly to lift underprivileged *populations* out of poverty, and it is far from clear that “more skills” is even theoretically how to do that.
  
  - Kyle MacDonald on July 21, 2017 1:29 PM at 1:29 pm said:
    
    Could you explain why you think that any increased employment for children who receive some kind of intervention would generally be at the same kind of job that they would have if they didn’t receive an intervention? I’m not familiar with the details of the studies, but it seems like any employment benefit would derive at least in part from better education outcomes that gave access to jobs that they wouldn’t have been able to land otherwise, and for which they therefore wouldn’t be competing with their fellows who didn’t receive the intervention. IIRC, the children who received the intervention were in fact more likely to leave the country later in life, which doesn’t suggest that they were succeeding by becoming more competitive within the domestic job market.
    
    If the only job available was bricklaying at an hourly rate, and the only possible skill you could acquire was to lay bricks better and thus become a better job applicant than your neighbour, then I would agree that skill improvement would be a zero-sum game. But I’m not convinced that any labour market actually looks like that.
    
    I wasn’t around last time you raised this, so if I’m re-hashing old points, please tell me.
    
    - Kyle C on July 21, 2017 2:19 PM at 2:19 pm said:
      
      I’m not familiar with the data on leaving the country, which would certainly add a twist. In the studies I am thinking of, members of the treatment group, as adults, tended to have (as I recall) only about 4 negative interactions with law enforcement, compared to about 7 for the control group, and tended to hold their jobs longer and earn a bit more. Those results did not suggest that the treatment group had “jumped their class,” as a Brit might say, or were competing for totally different jobs than the other group.
    - Kyle MacDonald on July 21, 2017 5:11 PM at 5:11 pm said:
      
      Fair point. For this study, I’m not really interested in chasing down the details, but I’ll definitely keep this kind of thing in mind when reading similar things in the future.
    - Elin on July 27, 2017 1:33 PM at 1:33 pm said:
      
      Actually if you look at the Jamaica study two of the things that made it complicated were that the intervention students were more likely to emigrate and many more of them were still in school at the time income was measured for some of the follow ups. So it is indeed very complicated to understand direct and indirect differences with long term follow up.
    - Kyle C on July 27, 2017 4:53 PM at 4:53 pm said:
      
      Understood. My general assertion is that the question funders and policy makers really want answered is, “Would giving this intervention to *every* person in the subject population help society to close the socioeconomic gaps we are worried about?” whereas the between-group comparisons answer a different question (does the intervention help, at the margin, the people who get it, versus those who don’t). It’s almost as if they were giving intensive swimming lessons to young triathletes. Probably the treatment group would place higher in future triathlons, but that can’t mean that giving everyone the same lessons would make everyone place higher. One could argue that the analogy doesn’t hold, and that class barriers are in fact so low that a few years of Head Start for everyone in the lowest income quintile would greatly increase their group mobility into higher quintiles (as many would hope). But that could only be addressed at a macro level, and you would need to identify a credible mechanism for mass improvement in employment prospects, which is not evident (to my knowledge) in either micro or macro data. This is, in a way, a “researcher degree of freedom” to design studies to answer an unhelpful question.
- Elin on July 27, 2017 1:30 PM at 1:30 pm said:
  
  Even though my kid had strep throat before and antibiotics helped, she still got strep throat again 2 years later. Fade out effect?
  
Jordan Anaya on July 21, 2017 10:20 AM at 10:20 am said:

Perspectives on Psychological Science just posted what appear to be the equivalent of blog posts (the irony) from the “Special Symposium on the Future Direction of Psychological Science”:
https://journals.sagepub.com/toc/pps/current

They really got the A-Team of people to discuss methods. I didn’t read all of them, but you will probably want to take a look at Fiske’s, which mentions your blog in the context of personal attacks.

- Martha (Smith) on July 21, 2017 10:47 AM at 10:47 am said:
  
  Susan Fiske’s abstract contains the sentence, “Principles facilitating this adversarial collaboration include using our respective tribes as secure bases for exploration,” which boggles my mind. Tribes???
  
  Scott Lilienfeld’s abstract (https://journals.sagepub.com/doi/full/10.1177/1745691616687745) seems the best of the lot, by a long shot.
  
  - AnonAnon on July 21, 2017 2:23 PM at 2:23 pm said:
    
    Martha,
    
    Here’s a decoding of “Principles facilitating this adversarial collaboration include using our respective tribes as secure bases for exploration.”
    
    Tribalism (e.g. respective tribes) in the stereotyping and prejudice literature, which is one of the areas that Susan contributes to, commonly refers to a construct describing the role that group membership plays in things like stereotype formation. Tajfel, H. (1982). Social psychology of intergroup relations. Annual review of psychology, 33(1), 1-39. would be a classic reference I believe.
    
    Susan mentions secure bases for exploration in a call out to attachment theory.
    
    In the 1970s, problems with viewing attachment as a trait (stable characteristic of an individual) rather than as a type of behaviour with organising functions and outcomes, led some authors to the conclusion that attachment behaviours were best understood in terms of their functions in the child’s life.[123] This way of thinking saw the secure base concept as central to attachment theory’s logic, coherence, and status as an organizational construct.[124] Following this argument, the assumption that attachment is expressed identically in all humans cross-culturally was examined.[125] The research showed that though there were cultural differences, the three basic patterns, secure, avoidant and ambivalent, can be found in every culture in which studies have been undertaken, even where communal sleeping arrangements are the norm.
    
    - Daniel Lakeland on July 21, 2017 5:38 PM at 5:38 pm said:
      
      All of this indicates to me that the goal of Fiske et al is mainly rent-seeking: keep a cushy job by continuing to stay “attached” to a “tribe” that provides a “secure base” where she’s more or less immune to having to deal with the issue that the field doesn’t provide the service that it’s being paid to provide.
      
      Being attached to some happy people who nurture you is not in and of itself a problem. I mean, Comic-Con and The Grateful Dead are the same thing. None of that would be problematic except for the part where dollars are taken from the public and funneled into labs run by people in her “tribe” for the purpose of providing useful accurate predictive theories about policy and everyday life and all that stuff she happy-talks in Andrew’s quote below but in fact the useful accurate predictive stuff never happens.
      
      https://statmodeling.stat.columbia.edu/2017/07/20/nobel-prize-winning-economist-become-victim-bog-standard-selection-bias/#comment-527731
    - AnonAnon on July 21, 2017 6:31 PM at 6:31 pm said:
      
      Daniel,
      
      I understand the temptation to scare quote attached, tribe, and secure base, but that’s sloppy and doesn’t capture what I think Susan is trying to communicate. Granted she’s relying heavily on the jargon associated with those terms. One could easily substitute other jargon to see what (I think) Susan is trying to say here:
      
      Principles facilitating this adversarial collaboration include using our respective theoretical frameworks as fulcrums for joint scientific exploration.
      
      Susan is using jargon familiar to her to express what I think is at least on its own a reasonable premise. Namely, that adversarial collaborations don’t require abandoning our preferred theoretical frameworks. One can work on a problem and be ecumenical in the use of frequentist or bayesian frameworks to tackle it.
      
      Now while I might be willing to buy that premise in another argument, I don’t happen to buy it here because I believe Susan is being sloppy.
      
      As for your point regarding rent seeking behavior by Fiske: Even if I were to grant you that she’s made poor editorial choices as a gatekeeper for PPNAS you’d still have the remaining body of her research to contend with and arguably discredit before you could sustain that line of argument.
    - Daniel Lakeland on July 21, 2017 11:35 PM at 11:35 pm said:
      
      Let’s just say that looked at from the outside the jargon doesn’t help.
      
      I wasn’t referring to Fiske directly in terms of rent-seeking behavior. I point this finger at all of Psychology (based mainly on information from this blog, I don’t have lots of background in Psych), and Medicine (where I know more) and lots of Biology (where I’ve actually published), and lots of Engineering (which I studied and have a PhD in and published in), and let’s not forget all the Wansinkers, and that’s not even to mention all the outright intentional frauds such as LaCour and Potti and whoever. Everywhere I’ve looked, sure there are some people doing some good work, but plenty of rent-seeking made easier by in-group out-group politics and special privileges.
      
      Today grants are handed out by committees that are composed primarily of … people who write grants to be handed out by other committees composed of… you guessed it. Academia as a whole is largely rent-seeking directly from the government.
      
      It might be acceptable if at least we occasionally got flying cars and major reductions in air pollution and cheap low pollution energy production, and cures for breast cancer, and all that nice stuff we’re always promised by the University Public Relations office whenever they tout the new multi-year multi-center multi-PI comprehensive cancer eradication and smart energy CO2 sequestration and cure for obesity grant extravaganza and continual soiree.
      
      I don’t buy it anymore, unfortunately the track record is pretty darn bad.
      
      Not that the private sector is any less rent-seeking… just that it’s infected our society in bulk as an equal opportunity productivity killer.
    - Daniel Lakeland on July 21, 2017 11:42 PM at 11:42 pm said:
      
      “I wasn’t referring to Fiske directly in terms of rent-seeking behavior” should really read “I wasn’t singling out Fiske only”, I did after all say “Fiske et al” but I meant for the “et al” to be pretty broad, basically any of the many poorly behaving academics across all fields.
      
      It might help to know that I’ve been personally following this story fairly closely:
      
      https://www.latimes.com/local/california/la-me-usc-doctor-20170717-htmlstory.html
      
      I think it reeks of the stench of Academia-as-tax-exempt-hedge-fund that permeates the modern University. The story sounds just like the ones about all the high-flying-crash-and-burn startups in Silicon Valley as well, also feeding off the government teat of near zero interest rates.
    - Andrew on July 22, 2017 10:37 AM at 10:37 am said:
      
      Daniel:
      
      I followed your link, and . . . wow! Among other things, this story indicates the value of a free press at the national, state, and local levels. The basic setup is a dog-bites-man mashup of the familiar stories of swaggering-executive and bigshot-doctor-becomes-drug-addict; what makes it stand out is how the doc/executive combined the two roles, being not just a quiet addict (that must happen all the time) but getting into the whole underworld lifestyle.
    - AnonAnon on July 22, 2017 3:05 PM at 3:05 pm said:
      
      Daniel,
      
      Ah I see what you’re saying now regarding rent seeking. Thanks for clarifying and expanding your point. I think it’s a good one.
      
      I’ve been on the short end of the funding stick before as a graduate student because I wanted to push a more statistically complicated model as part of a DDIG grant. The cynical part of me wants to point out how similar rent-seeking behavior is to the standard meritocracy model of science where the best students work with the best researchers who run the best labs.
      
      Now that I’m in industry though I’ve seen enough start ups implode that I wonder what other factors might be contributing. For example, how much inefficiency in federal funding comes as a byproduct of knowledge extraction. I mean cures for cancer, flying cars, etc are hard problems which will probably produce plenty of abortive attempts. So how does one distinguish between the nature of the risk in investing and the rent seeking behavior? (Genuinely open ended question for a Saturday morning with admittedly little thought put into it.)
    - Martha (Smith) on July 22, 2017 5:06 PM at 5:06 pm said:
      
      The Puliafito story is indeed distressing.
    - Daniel Lakeland on July 25, 2017 2:14 PM at 2:14 pm said:
      
      AnonAnon:
      
      The rent seeking behavior is where people who are supposed to be working on problems like flying cars and cures for cancer and cheap energy are instead doing X,Y,Z which have basically no hope of moving us towards those goals, have “scientistic” justifications that don’t hold up under close examination, and involve all the usual flim-flam of used car salesmen.
      
      Science is a classic case of a combination of asymmetry of information, and the principal-agent problem:
      
      The general public wants … scientific good stuff (flying cars, cancer cures, lowered pollution, high quality chemicals). But it is by and large incapable of evaluating the research proposals. Instead, agents evaluate the research proposals. Typically, these are agents called “program officers” in federal organizations like NIH, NSF etc. These program officers are not really capable of evaluating either the volume or the content of many proposals, and so they pass the duty on to further agents in the form of … volunteers from among the grant-seekers, in other words people whose whole livelihood depends on continued flow of money from the agencies into their own pockets. Now, of course people don’t review their *own* grants, that would be too obvious of a conflict of interest, but they do review grants *of their friends and colleagues* and so an effective strategy in this game-theoretic scenario is tit-for-tat in which the grant reviewers (AKA the scientists) don’t work too hard on the interests of the people they represent (the taxpaying public) and don’t look hard at the big picture (why are you doing a study of which chemical is better at killing prostate cancer cells in a dish, when you don’t even have a comprehensive theory of what it means for a cell to be “a prostate cancer cell”?). No, instead we have lots of picky fiddly bits about ultimate irrelevancies regarding specific methodologies in a poorly motivated study, plus a GENEROUS helping of “name recognition”.
      
      The public gets exactly what it would get if everyone who wanted a car had to pay money into a web site and then a third party person would go to some big CarMax type dealer, and get you a car and drive up to your door with whatever they decided was best for you, and by the way the third party people who do this are… you guessed it, people who sell used cars to the car lots.
    - Martha (Smith) on July 22, 2017 12:21 AM at 12:21 am said:
      
      AnonAnon,
      
      Thanks for providing the “decoding”/explanation of Fiske’s terminology and background. It helps a little — but the differences in background do till seem rather overwhelming to me.
      
      Your translation, “Principles facilitating this adversarial collaboration include using our respective theoretical frameworks as fulcrums for joint scientific exploration” is a little more intelligible to me than what Fiske wrote, but “fulcrum” seems out of place to me (I’m not criticizing; just saying that we seem to be using different languages that seem to reflect different ways of thinking.)
      
      I also don’t agree with your further translation “adversarial collaborations don’t require abandoning our preferred theoretical frameworks.” I can accept that this may sometimes be true, but believe that it is often not true — the difference in “preferred theoretical frameworks” may in some cases be crucial. I see your example “One can work on a problem and be ecumenical in the use of frequentist or bayesian frameworks to tackle it,” as a case where, although there are serious differences between Bayesian and frequentist frameworks, they both rely on a common underlying framework, namely, probability. I’m afraid I don’t (at least at this point) see any analogous commonality between my thinking/worldview and Fiske’s.
    - Martha (Smith) on July 22, 2017 12:23 AM at 12:23 am said:
      
      oops — “do still seem rather overwhelming to me.” (not “till”)
    - AnonAnon on July 22, 2017 2:50 PM at 2:50 pm said:
      
      Martha,
      
      I can understand why you wouldn’t see any analogous commonality between your worldview and Susan’s. To be fair, even as a social psychologist by training, I don’t see analogous commonality between my worldview and Susan’s. In reading her article, I found her argument to be sloppy (as I noted briefly in my reply to Daniel). She’s trying to equivocate between psychological and methodological frameworks. It’s like comparing the situationist account for human behavior (i.e. social settings direct our behavior a majority of the time) with the frequentist framework in statistics. It’s an apples and oranges comparison. It’s sloppy.
      
      And I think that as you keenly pointed out, for an adversarial collaboration to work, those frameworks must be rooted in some common framework. So in my opinion what Susan is doing is some very sloppy sleight of hand.
- Andrew on July 21, 2017 11:37 AM at 11:37 am said:
  
  Jordan:
  
  I followed the link and read Fiske’s article. Her characterization of my work is inaccurate.
  
  Fiske writes, “some critics go beyond scientific argument and counterargument to imply that the entire field is inept and misguided (e.g., Gelman, 2014; Shimmack, 2014).” I can’t speak for Shimmack (actually, I think Fiske misspelled his name) but I took another look at the cited Gelman, 2014, which is this post entitled, “How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?” Nowhere in that post did I say or imply anything about “the entire field.” I don’t just mean that I never used the phrase “the entire field”; I mean that I never talked about the entire field of psychology, or social psychology, or whatever. I did write that there was bad research, and I referred to hype as “the Psychological Science strategy,” but it’s clear in the post that I was referring there to the journal Psychological Science, not the entire field of psychology.
  
  The other place Fiske mentions my name is here: “often going hand-in-hand with impugning my (and other people’s) motives (search the Gelman or Shimmack [sic] blogs for my name; I prefer not to rehash the personal attacks).” It’s impossible for me to respond to a criticism with no specifics, so all I can say is that I have never personally attacked Fiske. I’ve questioned her scientific judgment on a number of occasions. That’s not a personal attack. Fiske and I just have a scientific disagreement about the merits of articles she has published as journal editor on himmicanes, air rage, ages ending in 9, etc. I have in many places given my scientific reasons for why I am not convinced by those published papers, and I’ve also speculated as to how it is that Fiske and others could be misled by such work, in the same way that earlier journal editors were misled by the work of Daryl Bem, Satoshi Kanazawa, etc. None of this is a personal attack. Science is hard, statistics is hard, and it’s worth exploring how people can get things wrong. But I think it will be difficult for Fiske to make progress as long as she’s operating from a position of happy talk such as “Psychology is definitely going in the right direction . . . we are everywhere in policy advice and popular culture . . . the Obama White House listened to a Social and Behavioral Sciences Team . . . psychological science appears regularly in the New York Times . . . TED talks . . . Consumers of popular culture cannot get enough of our field. . . .”
  
  Beyond this, I find it laughable that someone who wrote about “methodological terrorists” and never apologized for it, should be talking about “the tone of online discussions.”
  
  - Jordan Anaya on July 21, 2017 12:05 PM at 12:05 pm said:
    
    My main issue with these articles by Fiske, Cuddy, or whomever, is that they selectively present evidence, and when they do, it’s an obscure reference to some blog, somewhere.
    
    Here’s an example, Fiske writes:
    “Indeed, I have received nearly 100 emails from people supporting the APS Observer column and voicing their relief that someone less vulnerable would raise the issue of tone.”
    
    Every time I hear something to this effect I can’t help but think of Donald Trump and him saying “lots of people tell me…many people are saying”.
    
    I don’t doubt that Fiske has received emails–I’ve received emails from people thanking me for my work on exposing Wansink (though not 100). But I don’t use that as evidence that I was correct in publicizing problems with Wansink’s work, as I suspect people who disagree with what I did wouldn’t email me.
    
    Fiske seems to be assuming that despite her position, she is actually speaking for young scientists who are afraid of these vicious terrorists that are just waiting to criticize their amazing work, which is turning people away from the field. Has she ever thought about the possibility that there are students who are being turned away from the field because it is dominated by researchers who perform terrible science, use said science for fame and profit, and then try to discredit anyone who criticizes the work? Maybe she just doesn’t get those emails. Maybe we should do something about that.
    
    - Andrew on July 21, 2017 1:21 PM at 1:21 pm said:
      
      Jordan:
      
      Yes, I receive emails too. I’m sure that Fiske is speaking for some young scientists, and I’m speaking for other young scientists. The trouble is, it’s easy to suppose that there’s a “silent majority” agreeing with you, even if this isn’t the case.
  - Anoneuoid on July 21, 2017 1:47 PM at 1:47 pm said:
    
    “the entire field is inept and misguided”
    
    Not “entire” but close enough, and psychology is the least problematic instance of this (education, medical, etc). Many fields of research are 99.9% a waste of time right now, and have been that way for a generation at this point. Future generations will need to redo all of it to figure out what is actually going on.
    
    When I did biomed research I really wanted to do a good job and help people. There were so many needless obstacles put in the way, all due to people like Fiske believing something about statistical significance that isn’t true. So much wasted time/effort unlearning wrong (or who knows?) things and pointing out the same errors over and over…
    
  - Jordan Anaya on July 26, 2017 1:58 PM at 1:58 pm said:
    
    Andrew:
    I know you aren’t on Twitter so I thought you might be interested to see that Chris Chambers has publicized the issue: https://twitter.com/chrisdc77/status/890212842779136000
    
    I’m not sure if you plan on blogging about the incident.
    
    - Andrew on July 26, 2017 2:22 PM at 2:22 pm said:
      
      Jordan:
      
      I find the whole thing very upsetting, and in all seriousness I wish nobody had told me about that article by Fiske. I feel like I should respond because she was mischaracterizing my work (and that of Schimmack, whose name she couldn’t even bother to spell correctly). I’m tempted to echo Bob Dole and say that Fiske should “stop lying about my record,” but I don’t know that she’s lying. Lying is the knowing telling of an untruth, and it could well be that Fiske didn’t actually read the posts by Schimmack and me that she cited, or that she thought that criticism of certain published work in her field was somehow a criticism of “the entire field.” Maybe she even thinks that her work and that of her friends is “the entire field” of psychology. I have no idea.
      
      Anyway, I sent an email to the journal editor suggesting a correction so I’m hoping that will work out. I’ll blog too, but I thought it could make sense to get the correction settled first. The journal editor persuaded me that they can’t easily correct Fiske’s claim that Schimmack and I had done “personal attacks” as this is just too subjective a concept—it makes me wonder, if the concept is so subjective, how it ended up in a peer-reviewed article published in Perspectives on Psychological Science, but that’s another story—but the mischaracterization of my work and Schimmack’s (as well as the spelling of Schimmack’s name) are clear errors, so I assume the journal will run a correction there.
      
      If they don’t correct it then they really are, ummm, not “lying” exactly . . . What’s the word for if you make a false statement but then refuse to correct it after the error has been pointed out to you? “Brooksing”? In any case, it’s not something that anyone would want to be doing in a serious scientific journal.
      
      I have no desire to make a big deal about this. Fiske is bothered, perhaps legitimately, by the habit that Schimmack and I have of not showing deference to various published articles in social psychology. She can do her best to make the case that disrespectful criticism is a bad thing, it can foster a defensive attitude, it can intimidate people from trying out speculative research ideas, all sorts of things. Perhaps statistical criticism is a bad thing because all sorts of wonderful ideas from air rage to ESP to the behavior of patrons at pizza buffets can be unfairly disrespected just because they happened to have appeared in papers with errors in statistics and methodology. I disagree with Fiske’s position on these issues—I think that, on balance, severe criticism is a good thing, indeed I’ve welcomed severe criticism of my own work, and I’d prefer to see criticism appear right away, on social media if necessary, rather than waiting for long review processes—but I respect that Fiske has a view here, which she supports with arguments even if with no data. It should be possible for Perspectives on Psychological Science to correct Fiske’s clear errors while allowing her to state her opinions, expressed as such.
    - jrc on July 26, 2017 2:51 PM at 2:51 pm said:
      
      “Whether rank is chronically possessed or temporarily embodied, higher ranks create psychological distance from others, allow agency by the higher ranked, and exact deference from the lower ranked. Beliefs that status entails competence are essentially universal.”
      
      And knowing is (apparently only) half the battle.
      
      https://static1.1.sqspcdn.com/static/f/1605966/27061973/1465245648073/Fiske+COP+2016.pdf?token=CQKLsHU5ttAwDAK61%2BcZduFqeos%3D
    - Andrew on July 26, 2017 6:02 PM at 6:02 pm said:
      
      Jrc:
      
      Wow. That’s an amazing quote from Fiske. From her point of view, though, she’s in the business of protecting low status people such as her collaborators Cuddy and Norton, and the sorts of people who would be intimidated away from working on topics such as air rage, himmicanes, ESP, pizza consumption, etc., because of fear that someone like Nick Brown might check to see if their t-statistics add up.
    - jrc on July 26, 2017 7:28 PM at 7:28 pm said:
      
      I googled something like “fiske deference status” and those are literally the 3rd and 4th sentences of the paper that popped up.
    - Martha (Smith) on July 27, 2017 1:09 AM at 1:09 am said:
      
      A little further than jrc’s quote is a section titled “Defining power and status”. It begins,
      
      “Expert consensus is clear: Power is asymmetrical control over resources, and status is social prestige [2,3*].”
      
      I can’t imagine myself writing a section titled “Defining x and y” that starts, “Expert opinion is clear.”
      
      BTW, the cited references are:
      “2. Fiske ST: Interpersonal stratification: status, power, and subordination. In Handbook of Social Psychology. Edited by Fiske ST, Gilbert DT, Lindzey G. Wiley; 2010:941-982.”
      
      and
      
      “3. Galinsky AD, Rucker DD, Magee JC: Power: Past findings, present considerations, and future directions. In APA Handbook of Personality and Social Psychology, vol 33. Edited by Mikulincer M, Shaver PR. American Psychological Association; 2015:421-460. Authoritative review of power.”
      
      So I guess Fiske considers herself an expert?
      
      It’s a whole different world (and a whole different set of values) than the world I live in.
      
      Then again, the article is from “Current Opinion in Psychology” — well, it certainly qualifies as opinion. But to me, “opinion” needs to be distinguished from “fact” — in particular, by using qualifying language and refraining from sounding so certain.
    - Andrew on July 27, 2017 3:36 PM at 3:36 pm said:
      
      P.S. I wrote:
      
      If they don’t correct it then they really are, ummm, not “lying” exactly . . . What’s the word for if you make a false statement but then refuse to correct it after the error has been pointed out to you? “Brooksing”?
      
      Maybe it is ok to just call it lying. Bret Stephens today in the New York Times writes, “The C.I.A. has not publicly corrected the record. The White House is knowingly allowing Scavino’s falsehood to stand. That’s called lying . . .”
      
      Here’s the definition of “lie” according to Merriam-Webster:
      
      1 : to make an untrue statement with intent to deceive . . .
      
      2 : to create a false or misleading impression . . .
      
      Definition 2 suggests that you can lie “by accident,” as it were. Susan Fiske created a false and misleading impression of the writings of Ulrich Schimmack and myself by stating, incorrectly, that we had implied that “the entire field [of psychology or social psychology] is inept and misguided.” But I assume she did not lie under Definition 1, because I assume she did not know her statement was untrue: in that sense, she was sloppy, not lying.
      
      But what happens when the journal refuses to correct the error? They’ve moved into the scenario described by Bret Stephens, of letting an untruth stand, which in some ways could be said to be morally equivalent to lying but does not quite follow the definition.
      
      Stephens’s example seems crystal clear—even clearer than the one about Schimmack and myself, in that the record to be inspected is just a few seconds of video rather than two blog posts—so I assume the next step will be for the government to scrub the false statement from the record entirely, and pretend it never happened. That’s another sort of lie, I guess.
    - Andrew on January 1, 2023 9:52 AM at 9:52 am said:
      
      P.S. I contacted the journal and they flat-out refused to correct the lying article that they published. Very annoying. Not a surprise, exactly, given the love that the Association for Psychological Science continues to give to clickbait junk science, but still a disappointment.
zbicyclist on July 21, 2017 2:51 PM at 2:51 pm said:

“When you tell a quantitative researcher that they made a statistical error, you often get a defensive reaction, as if you accused them of being stupid, or cheating”

For sure.

And this is a particular problem because statistics is such a wide field, and a field that is much wider than it was, say, before the advent of computerized methods made so many things possible. Nobody can really cover the field adequately. Of course, that’s the same thing in other fields like chem, bio, physics, etc. But my naive impression of statistics in, say, the 1950s is that you could pretty much cover the entire field to some degree.

So, no matter how smart you are or how much you’ve studied, there’s still stuff others know that you don’t.

- Martha (Smith) on July 22, 2017 12:30 AM at 12:30 am said:
  
  +1 to last sentence.
  
- Ben Prytherch on July 22, 2017 1:18 AM at 1:18 am said:
  
  “It might be well for all of us to remember that, while differing widely in the various little bits we know, in our infinite ignorance we are all equal” – Karl Popper
  
Anoneuoid on July 25, 2017 10:25 AM at 10:25 am said:

Matt wrote:

How about the fact that physics is in a league of its own in terms of the precision of its theory? You keep asking for that in medical research and social science; sorry, not going to happen. There is simply not going to be theory in these areas that have comparably specific and quantitative predictions to those in physics.

As I said, this is an excuse. Could the typical medical/psych/social researcher today ever hope to come up with something like a quantitative law given their training? Having been through the training, I doubt it. Anyway, there are tons of such laws, and I don’t think it is an accident that most were devised before 1940, if not in the early 1900s (i.e., pre-NHST):

https://en.wikipedia.org/wiki/Law_of_mass_action
https://www.ncbi.nlm.nih.gov/pubmed/17845298
https://en.wikipedia.org/wiki/Law_of_effect
https://www.tandfonline.com/doi/pdf/10.1080/00221309.1934.9917847
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2916857/
https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2007940/pdf/brjcancer00386-0010.pdf
https://en.wikipedia.org/wiki/Cardiac_output

The bigger problem is that researchers in those areas are not collecting the type of data we need to constrain the parameters of these models and compare between competing models. Also, in some cases collecting the needed data is difficult/unethical (e.g., very high-quality epidemiological data).

- Daniel Lakeland on July 25, 2017 12:43 PM at 12:43 pm said:
  
  I totally agree with you on this. The lack of quantitative models in medicine or biology is entirely down to the lack of quantitative knowledge of the researchers. In fact, there are lots of areas these days where that’s changing. Lots of people are looking at quantitative models for things like the transport of signaling molecules and their occupation levels in cells to explain how it is that cells decide to differentiate into various tissues in development, or during repair.
  
  Sure, it’s the case that these models are going to look different from Coulomb’s law, because they are going to be statistical in nature (in the sense of describing the average/median/typical behavior of some group of things) but they’re still quantitative. If you want to know something about say inflammatory bowel disease, you should be looking for quantitative information about how certain triggers activate certain immune cells, what the network of signaling is, how large are the resident populations of immune cells, how much recruitment is there from other parts of the body through the bloodstream, what markers are there circulating in the blood. But, no. There are maybe some biologists doing this, but in clinical practice it’s all about “try this and see if it’s better than the null hypothesis” or “test for non-inferiority of this drug relative to that drug” or whatever.
  
  - Anoneuoid on July 25, 2017 4:37 PM at 4:37 pm said:
    
    “If you want to know something about say inflammatory bowel disease, you should be looking for quantitative information about how certain triggers activate certain immune cells, what the network of signaling is, how large are the resident populations of immune cells, how much recruitment is there from other parts of the body through the bloodstream, what markers are there circulating in the blood. But, no. There are maybe some biologists doing this, but in clinical practice it’s all about ‘try this and see if it’s better than the null hypothesis’ or ‘test for non-inferiority of this drug relative to that drug’ or whatever.”
    
    Yep, here is a more recent model:
    
    “Because many of the reaction coefficients in Fig. 1 B are also unknown, we allocated a number of possible parameter sets to qualitatively analyze the kinetics of these reactions”
    
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1366631/
    
    That was in 2005. How much work has since gone into testing models like this by collecting the data needed to constrain the parameters, rather than doing NHST? I bet the situation has barely changed, if at all.
    
Esso on August 9, 2017 4:39 PM at 4:39 pm said:

Early childcare programs and aggressive marketing of early childcare could add many stay-at-home mothers to the labour supply. This could have a negative effect on the price of labor. It is likely some people believe they can greatly profit from cheaper labor, and these people might be ready to invest in studies that argue for their favored policies.

Not trying to ascribe dishonest motivations to anyone in particular, just speculating about some obvious possibilities.


Statistical Modeling, Causal Inference, and Social Science