Only positive reinforcement for researchers in some fields

This is Jessica. I was talking to another professor in my field recently about a talk one of us was preparing. At one point, the idea came up of mentioning, in a critical light, some well-known recent work in the field, since this work had failed to consider an important aspect of evaluation that would help make one of the points in the talk. I thought it seemed reasonable to make the comment, but my friend (who is more senior than me) said, ‘We can’t do that anymore. We used to be able to do that.’ I immediately knew what they meant: that you can’t publicly express criticism of work done by other people these days, at least not in HCI or visualization.

What I really mean by “you can’t publicly express criticism” is not that you physically can’t or even that some people won’t appreciate it. Instead, it’s more that if you do express criticism or skepticism about a published piece in a public forum outside of certain established channels, you will be subject to scrutiny and moral judgment for being non-inclusive or “gate-keeping” or demonstrating “survivor bias.” The overall sentiment is that expressing skepticism about the quality of some piece of research outside the “proper” channels of reviewing and post-conference-presentation Q&A makes you somehow threatening to the field. It’s like people assume that critique cannot be helpful unless it’s somehow balanced with positives, or provided in some anonymous format, or delivered at a time when authors have prepared themselves to hear comments and so will not be surprised if someone says something critical. Andrew has of course commented numerous times on similar things in prior posts.

I write these views as someone who dislikes conflict and publicly bringing up issues in other people’s work. If I’m critiquing something, my style tends to involve going into detail to make it seem more nuanced and shading the critique with acknowledgement of good things to make it seem less harsh. Or if there are common issues, I might write a critical paper pointing to the problems in the context of making a bigger argument so that it feels less directed at any particular authors. But I don’t think all this hedging should be so necessary. Criticism in science should be acceptable regardless of how it comes up, and you can’t imply it should go away without seeming to contradict the whole point of doing research. This has always seemed like a matter of principle to me, even back when I was getting critiqued myself as a PhD student and not liking it. So I still get surprised sometimes when I realize that my attitude is unusual, at least in the areas I work in.

One thing I really dislike is the idea that it’s not possible to be both an inclusive field and a field that embraces criticism, as if the only way to have the former is to suppress the latter. It’s unfortunate, I guess, that some fields that embrace criticism are not very diverse (say, finance or parts of econ), while other fields that prioritize novelty and diversity in methods over critiquing what exists tend to be better on diversity, like HCI or visualization, which do pretty well in terms of attracting women and other groups.

In a different conversation, the same friend mentioned that once, when giving an invited seminar talk at another university, another professor we know at that university made some critical comments, and my friend got into a back-and-forth with them about the research. My friend didn’t think much of it, but as the visit went on, they got the impression that some of the PhD students and other junior scholars who had attended saw the critique and exchange between my friend and the other faculty member as embarrassing (to my friend) and inappropriate. This was surprising to my friend, who felt it was totally normal and fine for an audience member to give blunt remarks after the talk. I had a similar experience during an online workshop a few months back, where a senior, well-known faculty member in the audience had multiple critical comments and questions for the keynote speaker, which I thought made for a great discussion. But others seemed to view it as an extreme event that bordered on inappropriate.

Related to all this, I sometimes get the sense that many people see it as predetermined that open criticism will have more negative consequences than positive ones, because it will a) undermine the apparent success of the field and/or b) discourage junior scholars, especially those who bring diversity. On the latter, I’m not sure how much evidence people opposed to criticism have in mind, versus simply being able to imagine a situation where some junior person gets discouraged. But a different way to think about it could be: it’s the responsibility of the broader field, not just the critic, if we have junior researchers fleeing in light of harsh critique. I.e., where are the support structures if all it takes is one scathing blog post? There’s sort of an “every man for himself” attitude that overlooks how much power mentors can have in supporting students who get critiqued. Similarly, there’s a tendency to downplay how one person’s research getting critiqued is often less about that particular person being incompetent than about various ways in which methods get used or claims get made in a field that are conventional but flawed. If we viewed critique more from the standpoint of ‘we’re all in this together,’ maybe it would be less threatening.

A few months ago I wrote a post on my other blog that tries to imagine what it would look like to be more open-minded about critique, e.g., by taking for granted that we are all capable of making mistakes and updating our beliefs. I would like to think it is possible to have healthy open critique. But sometimes when I sense how uncomfortable people are with even talking about critique, I wonder if I’m being naive. For all the progress I’ve seen in my field in some respects (including more diversity in background/demographics, and better application of statistical methods) I haven’t really seen attitudes on critique budge.

How did NPR’s pre-election poll get things so so so wrong? The friends/family/coworkers question

There’s been a lot of discussion of how the polls performed well in the recent midterm elections. Response rates are so low that pollsters need to do lots of adjustments to aim to approximate attitudes in the population. We don’t know that they’ll succeed so well in the future, but they get blamed for their mistakes so it’s only fair to give them credit when they succeed.

We learn more from mistakes than successes

We learn from mistakes, though, so let’s look for examples where polling went wrong. Jay Livingston shares an example:

Maybe the usual question — “Who are you going to vote for?” — is not the best way to predict election results.

The most recent episode of NPR’s Planet Money explored this question and in the end tried a less direct approach that some polls are now using. They went to the Marist College poll and got the directors to insert two questions into their polling on local House of Representatives races. The questions were:

– Who do you think will win?

– Think of all the people in your life, your friends, your family, your coworkers. Who are they going to vote for?

At the time, on the direct question “Who will you vote for?” the split between Republicans and Democrats was roughly even. But the two new questions showed Republicans way ahead. On “Who will win?” the Republicans were up 10 points among registered voters and 14 points among the “definitely will vote” respondents. On the friends-and-family question, the corresponding numbers were Republicans +12 and +16.

What’s interesting about this NPR report is that, first, it came out on 4 Nov, just a few days before the election; and, second, that it was soooo far off. Sure, people were talking about a so-called red wave, but nobody—nobody—was predicting that the Republicans would win by 12 or 16 percentage points.

So this is an interesting example of a poll that anyone could tell was way off, even before the election actually occurred.

Oddly enough, NPR just reported these numbers with a straight face, without saying anything about how off they seemed to be. But there’s nothing stopping us from trying to figure out what went wrong.

What happened?

I’ll discuss this in three parts, corresponding to each of the three survey questions:

1. “Who do you plan to vote for?”: Respondents were split 50/50 on this one, which matches the election outcomes. So nothing much to explain here: the pollsters did their best to adjust to attain a representative sample, and it seems they succeeded.

2. “Who do you think will win?”: 10-14 percentage points more people thought Republicans would win than thought Democrats would win. I’m actually surprised the gap wasn’t larger! The major media outlets were saying that the Republicans were favored, so the real question is why as many as 43-45% of respondents thought the Democrats were favored. I’m guessing that these were hard-core Democrats who were answering the question as a sort of surrogate “Who do you plan to vote for?” question, but I don’t really know. In any case, as Julia Azari and I wrote back in 2017 (see section 4 of this article), it’s not at all clear that the responses to the “Who do you think will win?” question are giving us any useful information.

3. In that 2017 article, we wrote:

In future studies, we recommend studying information about networks more directly: instead of asking voters who they think will win the election, ask them about the political attitudes of their family, friends, and neighbors.

The good news is that this is what the above-linked NPR survey did! The bad news is that it didn’t go so well: “On the friends-and-family question, the corresponding numbers were Republicans +12 and +16,” numbers which were implausible at the time and even more so once the election results came out.

So what happened? One possibility is that Republican family/friends/coworkers were more public about their political views, compared to Democratic family/friends/coworkers. So survey respondents might have had the impression that most of their contacts were Republicans, even if they weren’t. Another way things could’ve gone wrong is through averaging. If Republicans in the population on average have larger family and friend groups, and Democrats are more likely to be solo in their lives, then when you’re asked about family/friends/coworkers, you might be more likely to think of Republicans who you know, so they’d be overrepresented in this target group, even if the population as a whole is split 50/50.
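The averaging mechanism is a version of what network researchers call the friendship paradox: people with bigger circles show up in more of other people's circles. Here's a minimal simulation of just that mechanism, under loudly hypothetical assumptions: Republicans have 30 contacts and Democrats 20 (made-up numbers), friendships form at random with no partisan sorting, and contacts are drawn with replacement. The function names are illustrative, not anything the NPR/Marist team used, and the sketch ignores the other possibility (Republicans being more vocal).

```python
import itertools
import random


def expected_contact_share(share_rep, contacts_rep, contacts_dem):
    """Republican share among reported contacts when each person appears in
    others' circles in proportion to their (hypothetical) network size."""
    rep_weight = share_rep * contacts_rep
    dem_weight = (1 - share_rep) * contacts_dem
    return rep_weight / (rep_weight + dem_weight)


def friends_question_share(n_people=10_000, share_rep=0.5,
                           contacts_rep=30, contacts_dem=20,
                           n_respondents=2_000, contacts_per_respondent=15,
                           seed=2022):
    """Average Republican share that respondents see among their contacts.

    Hypothetical assumptions: no partisan sorting of friendships, and the
    chance of being someone's contact is proportional to network size.
    """
    rng = random.Random(seed)
    n_rep = round(n_people * share_rep)
    is_rep = [True] * n_rep + [False] * (n_people - n_rep)
    sizes = [contacts_rep if rep else contacts_dem for rep in is_rep]
    cum_sizes = list(itertools.accumulate(sizes))  # computed once for fast sampling

    total = 0.0
    for _ in range(n_respondents):
        # Each respondent's contacts are drawn in proportion to network size.
        contacts = rng.choices(range(n_people), cum_weights=cum_sizes,
                               k=contacts_per_respondent)
        total += sum(is_rep[c] for c in contacts) / contacts_per_respondent
    return total / n_respondents


share = friends_question_share()
print(f"population: 50% Republican; contacts reported: {share:.1%} Republican")
print(f"analytic value: {expected_contact_share(0.5, 30, 20):.1%}")
```

With these made-up network sizes the analytic value is 60% Republican among contacts (a +20 margin) even though the population is split exactly 50/50, and the simulation lands close to it; setting `contacts_rep` equal to `contacts_dem` makes the gap disappear. This is why breaking down the responses, as suggested next, seems worthwhile.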

My recommended next step is to break down the responses to the family/friends/coworkers question: look at the proportion of Republicans and Democrats in the social networks of respondents for different ethnic groups and age groups, and also compare by state, urban/rural, education, etc., and by the party ID of respondents.

Also it would be good to see exactly how that question was worded and what the possible responses were. When writing the above-quoted bit a few years ago, I was imagining a question such as, “Among your family and friends who are planning to vote in the election, how do you think they will vote?” and then a 6-point scale on the response: “all or almost all will vote for Republicans,” “most will vote for Republicans,” “slightly more will vote for Republicans,” “slightly more will vote for Democrats,” “most will vote for Democrats,” “all or almost all will vote for Democrats.” I might have even put a question like this on a survey at some point; I can’t remember! Anyway, I’m not committed to that question wording; I’d just like to know the wording that the NPR survey used.

In any case, I hope they look into what went wrong. It still seems to me that there could be useful information in the family/friends/coworkers question, if we could better understand what’s driving the survey response and how best to adjust the sample.

Dying children and post-publication review (rectal suppositories edition)

James Watson (this guy, not the guy who said cancer would be cured in minus 22 years) writes:

Here’s something that you may find of interest for your blog: it involves causal inference, bad studies, and post publication review!

Background: rectal artesunate suppositories were designed for the treatment of children with suspected severe malaria. They can be given by community health care workers who cannot give IV treatment (the gold-standard treatment). They provide temporary treatment whilst the kid gets referred to hospital. Their use is supported by (i) one moderately large RCT; (ii) our understanding of how severe malaria works; (iii) good pharmacological evidence that the suppositories do what they are supposed to do (kill parasites very quickly); and (iv) multiple large hospital-based RCTs showing that artesunate is the best available drug.

The story: A group at Swiss TPH got a very large grant (19 million USD) to do a ‘deployment study’: basically look at what happened when the suppositories were rolled out in three countries in sub-Saharan Africa. This study (CARAMAL) asked the question: “Can the introduction of pre-referral QA RAS [quality-assured rectal artesunate] reduce severe malaria case fatality ratio over time under real-world operational circumstances in three distinct settings?” (see NCT03568344). But they saw increases in mortality after roll-out, not decreases! In Nigeria, mortality went up nearly fourfold (16.1% versus 4.2%)! In addition, the children who got the suppositories were more likely to die than those who didn’t. These results led the WHO to stop deployment of these suppositories in Africa earlier this year. This is a really serious decision which will probably result in preventable childhood deaths.

The authors put their findings online last year as 10 pre-prints. In July we wrote a commentary on the overall decision to stop deployment and the reported analyses in the pre-prints. The main points were:
– No pre-specified analysis plan (many degrees of freedom in the exact comparisons to make and in what to include as baseline variables). This is unusual for a big clinical study.
– Temporal confounding: COVID-19 being one small difference between the pre- and post-roll-out worlds…!
– Confounding by indication: differences between children who did and didn’t get the suppositories after roll-out are probably due to health workers giving them to the sicker children.
– Mechanistic implausibility of having a massive increase in mortality within 48 hours of giving the drug compared with not giving the drug (this just doesn’t fit with our model of the disease and the pharmacology).

Unsurprisingly (?) the authors did not comment on our piece, and their main paper was published in October with no apparent changes from the pre-print version…

The now published paper (BMC Med) uses pretty strong causal language in the abstract: “pre-referral RAS had no beneficial effect on child survival in three highly malaria-endemic settings”. Given the design and data available in the study, this causal language is clearly not justified. I emailed the authors to ask them for their study protocol (some things are unclear in the paper, like how they exactly recorded who got the suppositories and whether there was a possibility of recall bias). I also wrote my main concerns as a thread on twitter.

They answered:

“We saw that you have just publicly on Twitter implied that we conducted the CARAMAL study without a study protocol nor an analysis plan. You are essentially publicly accusing us of conducting unethical and illegal research. This was less than 24 hours after sending this email requesting us to share the study protocol.

Your Tweet adds to the tendentious and poorly-informed commentary in BMJ Global Health. We fail to see how this style of interaction can lead to a constructive discussion of the issues at the heart of the CARAMAL project.

We have provided all necessary information including the study protocol and the analysis plan to a panel of international experts gathered by the WHO to conduct a thorough evidence review of the effectiveness of RAS. We look forward to their balanced assessment and opinion.”

Basically they refused to share the study protocol. They also admit to reading the previous commentary which discussed the lack of an analysis plan. I didn’t accuse them of not writing a study protocol or analysis plan, but of not posting it with the paper (which is a fact). Most medical journals make you post the protocol with the publication.

Why this is important: pushback from various people has made the WHO convene an expert group to see whether the moratorium on rectal artesunate deployment was justified. They will be making a decision in February. The underlying study that caused this mess is poorly thought out and poorly analysed. It’s quite urgent that this gets reversed.

This seems pretty wack that they say they have a study protocol and an analysis plan (perhaps not a preanalysis plan, though) but they’re not making it public. What’s the big secret?

To put it another way, if they want to keep key parts of their research secret until the panel of international experts gives “their balanced assessment and opinion,” fine, but then what does it mean for them to be publishing those strong causal claims? Once the big claims have been published, I can’t see a good reason for keeping the protocol and analysis plans secret.

Also what about the issues of temporal confounding, confounding by indication, and mechanistic implausibility? These all seem like a big deal. It always seems suspicious when researchers get all defensive about post-publication criticism and then don’t even bother to address the key concerns. Kids are dying here, and they’re all upset about the “style of interaction”???
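To see how confounding by indication alone can make a helpful treatment look harmful, here's a toy simulation. Every number in it is hypothetical and chosen only for illustration (30% of children very sick, health workers treating 80% of the sick and 20% of the rest, baseline death risks of 20% and 2%, and a treatment that cuts risk by 30%); none of it comes from CARAMAL, and the function name is made up.

```python
import random
from collections import defaultdict


def simulate_confounding_by_indication(n=200_000, seed=7):
    """Toy simulation: a treatment that helps can still look harmful.

    All parameters are hypothetical, not CARAMAL estimates. Returns crude
    death rates by treatment and death rates within severity strata.
    """
    rng = random.Random(seed)
    tally = defaultdict(lambda: [0, 0])  # (severe, treated) -> [deaths, children]
    for _ in range(n):
        severe = rng.random() < 0.3                        # 30% are very sick
        treated = rng.random() < (0.8 if severe else 0.2)  # sicker kids treated more
        baseline = 0.20 if severe else 0.02                # risk without treatment
        risk = baseline * (0.7 if treated else 1.0)        # treatment cuts risk 30%
        tally[(severe, treated)][0] += rng.random() < risk
        tally[(severe, treated)][1] += 1

    def rate(cells):
        deaths = sum(tally[c][0] for c in cells)
        children = sum(tally[c][1] for c in cells)
        return deaths / children

    crude = {t: rate([(True, t), (False, t)]) for t in (True, False)}
    stratified = {(s, t): rate([(s, t)]) for s in (True, False) for t in (True, False)}
    return crude, stratified


crude, stratified = simulate_confounding_by_indication()
print(f"crude death rate, treated:   {crude[True]:.3f}")
print(f"crude death rate, untreated: {crude[False]:.3f}")
for severe in (True, False):
    label = "severe" if severe else "mild"
    print(f"{label}: treated {stratified[(severe, True)]:.3f} "
          f"vs untreated {stratified[(severe, False)]:.3f}")
```

With these made-up inputs the expected crude death rates are roughly 9% among treated children versus 4% among untreated ones, even though within each severity group the treated children do better. The simulation can't say how big this bias was in the actual study; it just shows why a treated-versus-untreated comparison after roll-out can't carry a causal claim by itself.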

Or maybe there’s more to the story. The authors of the published article or anyone else should feel free to provide additional background in the comments. If anyone has the study protocol and analysis plan (or pre-analysis plan), feel free to post that too!

What continues to stun me is how something can be clear and unambiguous, and it still takes years or even decades to resolve

OK, remember Wile E. Coyote when he makes the all-too-common mistake of stepping off a cliff, or standing on a cliff edge that breaks? He stands there in the air, unsupported by anything, until he realizes what’s happening—and only then does he fall.

We see this a lot with science scandals and political scandals: the problem is there for ever and ever, people point it out, they holler and scream, but that damn coyote just stands there, oblivious.

One example we talked about a couple years ago was the scam medical-device company Theranos, which fell apart around 2015, nearly a decade after it faked a test in 2006, an incident that caused one of its chief executives to leave.

And Cornell’s Food and Brand Lab (“Pizzagate”), which fell apart around 2017, years after their work had been publicly criticized several times, with the lab never offering any reasonable rebuttal to these points.

Let me be clear here. It’s not a surprise to me that people whose bad research has been called out will continue to publish bad research: there are lots of journals out there that are looking for publishable articles, so if you have the knack for writing an article in that publishable style, you can keep getting published. And then people will read your article and take it seriously, because the default mode of reading a published article is to take it seriously, so you can get citations etc. Similarly, it’s no surprise that companies that are in proximity to rich people or some other source of suckers can get funding even if they have a track record of lies.

So that’s no surprise. What surprises me is the high-profile cases. Theranos didn’t just pull in some money; it got a lot of publicity. The Food and Brand Lab people were all over the major media. In both cases, it took many years for the crash to come.

OK, here are a couple more examples that came up recently. I read them in Defector, the sports website that never sticks to sports:

Meet Richard Fritz, America’s Most Unelectable Elected Official

Cops Are Still Fainting When They Touch Fentanyl

Both of these are scandals that have been going on for years—and people have been screaming about them for years—but they just keep on going. The perpetrators have that chutzpah which is one of the strongest forces of nature.

And then, the most horrible story. I just read Nickel Boys—I’ll read anything by Colson Whitehead, as long as it’s not about poker—and then I went to wikipedia to read up on the true story that it’s based on. OK, here’s the deal. The Florida School for Boys in Marianna, Florida, had all these scandals, starting shortly after it was founded in 1900 (“In 1903, an inspection reported that children at the school were commonly kept in leg irons. . . . A fire in a dormitory at the school in 1914 killed six students and two staff members. . . . A 13-year-old boy sent to the school in 1934 for ‘trespassing’ died 38 days after arriving there. . . . there were 81 school-related deaths of students from 1911 to 1973 . . . In 1968, Florida Governor Claude Kirk said, after a visit to the school where he found overcrowding and poor conditions, that ‘somebody should have blown the whistle a long time ago.’ . . .”) OK, horrible things going on for nearly 70 years, all happening in a racist environment (the “School” separately housed whites and blacks), then they finally blew the whistle—1968 was way too late, but that was a time of reform.

But, no, it didn’t stop in 1968. The wikipedia entry continues: “In 1982, an inspection revealed that boys at the school were ‘hogtied and kept in isolation for weeks at a time.’ The ACLU filed a lawsuit over this and similar mistreatment at a total of three juvenile facilities in Florida. . . . In 1985, the media reported that young ex-students of the school, sentenced to jail terms for crimes committed at Dozier, had subsequently been the victims of torture by guards at the Jackson County jail. The prison guards typically handcuffed the teenagers and hanged them from the bars of their cells, sometimes for over an hour. The guards said their superiors approved the practice and that it was routine. . . . In 1994, the school was placed under the management of the newly created Florida Department of Juvenile Justice . . . On September 16, 1998, a resident of the school lost his right arm in a washing machine. . . . In April 2007, the acting superintendent of the school and one other employee were fired following allegations of abuse of inmates.”

That’s 2007: over 100 years after the first scandal (the inspection in 1903 with the leg irons), nearly 40 years after the governor talked about blowing the whistle, and over 20 years after the reports of torture. And then, “In late 2009, the school failed its annual inspection. . . .” The “School” was finally closed in 2011.

Everything took so long. It “failed its annual inspection” in 2009! What did they think about all the annual inspections that came before? At some point, this history should cast doubt on the inspection process itself, no?

How does it happen?

My point here is not to stir up indignation about a past scandal, which is part of the whole “New Jim Crow” thing that’s been doing its part to destroy our country for a long time. This is just the most striking example I’ve come across of a general phenomenon of the truth being out there, available to all, but nothing happens. This is not a new thing—consider, for example, the Dreyfus affair, where it was clear that the evidence was fabricated, but this didn’t stop it from being a live issue for years and years after that. But, that was the 1800s! We should realize that this sort of thing continues to happen today, in so many different domains, as discussed above.

Important questions

This post is by Lizzie.

I have been robbed twice in my life: first as a grad student when I was traveling and most of my belongings were in a rental car, and second when some teenagers crowbarred their way into the apartment I was staying in. The first time I was robbed I learned some useful information that made getting robbed the second time easier: when you tell people you’ve been robbed, they often say the most useless things. People who purported to be my friends would reply immediately with, ‘What type of car was it?’ or ‘Where exactly did you park?’ While there are definitely predictors that increased the probability of my being robbed, I didn’t fully see the point of these questions.

Eventually my Mum or some other soul wiser than me explained this to me. They’re asking these questions to construct a model of the world in which they don’t get robbed. I sort of get this: it can be traumatic to have most of your worldly possessions stolen. I hope they didn’t also know that it can be painful to have your friends be so lame and self-centered.

I was trying to explain something similar the other night when a friend and colleague caught me after a group zoom call to ask me something. She had met someone who was a graduate student when I interviewed at their graduate institution for a faculty job. I always enjoyed meeting grad students on interviews; they were so much fun, asked interesting questions and made me hopeful about the world. I interviewed at many places and have run into some of these students since and always really enjoy it, so I thought this was where the conversation was going. No.

The student had recounted that at the end of my job talk a senior white faculty member got up and made an obnoxious and gendered comment that in no way related to my science.

I don’t remember this. I suspect I don’t because it was the least of my troubles back then. I remembered the guy, though. In our one-on-one interview he asked me if I was planning to freeze my eggs. And in case you as a reader want to chalk this up to a one-off, my notes from asking a female faculty member in the department are: “Everyone has had totally weird interactions with him, when she interviewed he told her all about his relationship with his wife. Someone else interviewing told him to **** off. And nothing slows him down, he’s gotten worse with age.”

What I told my colleague on the Zoom call, though, was that what hurt was how people replied to it, and how many tried to brush it off in one way or another (even when I couched it in my reality: I liked this guy in many ways, and people are multifaceted and complex, rarely all good or all evil).

I was trying to recount this an hour later over beers but I didn’t get to finish the thought. One woman worked to come up with witty quips that I should have said back. And the one guy said, ‘oh, I know him. He likes to shock people. You know how you should handle this guy is ….’ which led onward to ‘I wanted to invite him a few years back and my colleague said ‘I dunno’ so then we invited him and someone known to be even crazier! And let me tell you about him. He got into a physical altercation with someone ….’

So, I just wanted to make a public service announcement to folks who want to reply in similar ways to people telling you about harassment they received — consider keeping these thoughts to yourself. And then maybe take some time to figure out why you so desperately jump at saying them.

In other news, look what I saw on my hike on Wednesday!

“Rankings offer no sense of scale”

Laura “Namerology” Wattenberg writes:

Rankings offer no sense of scale. There is always a #1 name, even if that name is only a fraction as popular as #1s of past eras. That’s a problem when scale itself—the changing level of consensus—is part of the story.

In the past, popularity charts served as a solid snapshot of everyday name style, especially for boys. America’s top 10 names accounted for about a third of all boys born. But the popularity curve has flattened. Today’s top 10 accounts for just one-fourteenth of American boys.

Here’s how we put it in Regression and Other Stories:

Wattenberg summarizes:

America’s top-10 list of popular baby names always looks comforting. Or, depending on your perspective, boring. The names change slowly, and their style remains mostly traditional. The overall impression is of gradual evolution and cultural continuity. And it’s a lie.

Change in 21st-century naming has been anything but gradual. It’s been revolutionary, a splintering of style that breaks with the past in dramatic ways. But the top of the popularity charts cannot tell this story. A top-10 list is a perfect instrument to create the illusion of stability in the midst of a cyclone. . . .

By consistently reporting name data in scaleless rankings, news reports obscure the fact that the top 10 is losing relevance as a portrait of how we name children.

Last year, an incredible 31,538 different names registered in U.S. baby name stats. Yet 8% of babies received a name that was too rare to even register in the count—more than received a top-10 name. That’s the real story of American names today. But as long as we continue to focus on the top of the popularity charts, we’ll continue to see only the most traditional sliver of our increasingly freewheeling name culture.

Good point, as sociology, statistics, and statistical communication.
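The scale point can be made concrete with toy numbers. In this hypothetical sketch (made-up counts, not Social Security data, and placeholder names like `name_1`), two eras have identical top-10 lists in identical order, but the top 10 covers a third of births in one and one-fourteenth in the other, echoing the fractions Wattenberg cites. The ranking alone can't tell the eras apart; the share can.

```python
def top_k_summary(counts, k=10):
    """Return the k highest-ranked names and the share of births they cover."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:k]
    share = sum(n for _, n in ranked) / total
    return [name for name, _ in ranked], share


# Hypothetical toy data: the same ten names with the same counts in both eras.
TOP_COUNTS = [145, 135, 125, 115, 105, 95, 85, 75, 65, 55]  # sums to 1,000
TOP_NAMES = [f"name_{i}" for i in range(1, 11)]


def make_era(n_rare_names, births_per_rare_name=2):
    """Same top 10 plus a hypothetical long tail of rare names."""
    counts = dict(zip(TOP_NAMES, TOP_COUNTS))
    for j in range(n_rare_names):
        counts[f"rare_{j}"] = births_per_rare_name
    return counts


concentrated = make_era(1_000)  # 1,000 top-10 births out of 3,000 total
splintered = make_era(6_500)    # 1,000 top-10 births out of 14,000 total

for label, era in [("concentrated", concentrated), ("splintered", splintered)]:
    names, share = top_k_summary(era)
    print(label, names[0], f"top-10 share = {share:.3f}")
```

Both eras report `name_1` at #1 with the same top 10, while the top-10 share drops from 0.333 to 0.071: exactly the information a ranking throws away.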

Distrust in science

Gary Smith is coming out with a new book, “Distrust: Big Data, Data Torturing, and the Assault on Science.” He has a lot of examples of overblown claims in science—some of these have appeared on this blog, and Smith takes pretty much the same position that I take on these things, so I won’t talk about that part further.

Rather, I want to talk about the big picture that Smith paints, which is the idea that science is very important to our lives, and bad science degrades our trust in it. Ironically, I think this sort of attitude is also behind some of the anti-reform movement by leading academics: they think science is just wonderful and they’d prefer if people just kept quiet about scientific errors or chalked things up to “the self-correcting nature of science.” Smith and I are more in the clear-the-rotten-apples-out-of-the-barrel camp.

Smith has a broader perspective than just talking about junk science. He also talks about misplaced technology enthusiasm, from bitcoin-as-savior to chatbots-as-AI. One thing he doesn’t really get into is the political angle, the idea that bad actors are sowing distrust as a tactic to reduce respect for serious science. Familiar examples are industry-funded junk science on cigarettes and climate change, which then gets picked up by ideologues. And then there are the HIV/AIDS denialists, who seem driven more by contrarianism, and the covid denialists, where there is a political angle. I’m not saying that Smith should’ve discussed all that (there’s a limit to how much can fit in one book); I’m just bringing it up to emphasize that distrust in science is not just an unfortunate byproduct of frauds like the disgraced primatologist and confused people like the ESP guy; it’s also something that a lot of people are fostering on purpose.

When we write about himmicanes and forking paths and multilevel modeling and all the rest, that’s just a small part of the story. And one of the challenges is that we can’t simply root for “science” as an institution. We root for the good stuff but not the bad stuff.

Wandering through Sforza castle

Weekend before last I spent a day in Milan to see an old colleague (I am on leave in Zurich just now so miraculous things like a dayhop to Milan are feasible). He used to be a research technician in my lab where he did the classical ecology lab tasks of things like identifying and tagging trees in forests, making thousands of observations of leaf unfolding in a growth chamber experiment (on paper! Though that part was neither my idea nor his) and nudged the lab forward through working on automating photo capture of leaf unfolding. The resulting images and time lapse videos were gorgeous, but they didn’t change how I did science.

But since then my Bayesian models have become far more generative (many thanks to my Stan collaborators), and I have started to realize some sad, hard truths about my data and my science. The first is that you need a lot more data than I often have to fit some of the models I think underlie the processes I am studying. I work on winegrapes because they have way more data than other systems I am interested in (such as forests, which critically store carbon), and when I don’t have enough data to fit a temperature response curve for winegrapes, I can skip trying it on more ‘natural systems.’ The second is that we also need way better data. A stats-department colleague said to me this year, ‘it’s not like you don’t have a lot of data, it’s like the data you have are quarters [and you need higher denominations].’ Later in the day, when he picked up the metaphor again, he was calling my data pennies. (Sigh.)

He’s not wrong. Ecology has a history of valuing what you can learn from staring intently at a backyard pond or an artificial pool near Palermo full of water bugs. We need to understand the ‘natural history’ well to understand systems. Very true, but I sometimes wonder how we advance. NEON, a massive NSF project to collect lots of large-scale data, has not revolutionized much. For my own research, I think we need to cover greater temperature variation and work harder to know what happens at the extremes. You have to wait a long time for plants to do something at 5 C, but maybe we need to wait for that; at the other end, many warming chambers don’t function well above 30 C, which is well below what’s too hot for most plants. We also need more replicates, and we need to collect better data, at finer scales.

Which is part of why I went to Milan: to pick my colleague’s brain about how my lab can best break out. He’s now building a new FabLab for his company’s new North American headquarters in Chicago, and he had some useful ideas.

At the end of our meeting he asked if he should go to grad school, which struck me for a lot of reasons. One is that some undergrads in my lab, who I think could bridge new technologies to ecology, are getting scooped up by start-ups digitizing agriculture and putting their undergrad degrees on a possibly never-ending pause.

The same weekend I saw a colleague from my long-lost NCEAS days. In between nerd-crushing/raving about our colleague Jim Regetz, we discussed the apparent disconnect between the number of PhDs being awarded (erm, not sure about that verb) and the number of job openings where a PhD is critical. Some of his former postdocs were starting a company, trying to make theoretical ecological models useful enough to underpin a for-profit business.

I imagine academia often feels like it’s falling behind, but this weekend I felt it a little more acutely. We’re supposed to have the freedom and metaphorical space to be racing ahead. But it doesn’t feel that way when I can easily see why students in my lab would ‘pause’ undergrad to race around North America improving how we harvest wheat; when we churn out publications faster and faster at the expense of the time it takes to really advance science (it’s so much quicker to grab a p-value than to develop a model with parameters you care about, then step back and gape at the uncertainty around the estimates; p-values are so happy to hide your meaningful parameters and their uncertainty from you); or when we similarly churn out PhDs without a clear idea of their job prospects (hello, Canada’s ‘HQP’). I am not so worried about folks in my lab: we train strongly in computational methods and in how to design and answer useful questions, skills that industry and beyond need. But I worry about the future, and I could certainly train in this area better if there were more pressure, recognition, and support for it in ecology.

On the good news side, I enjoyed my take-out pizza from Milan for two glorious dinners!

Best letter-to-the-editor ever (Tocqueville and Algeria)

From the London Review of Books, 7 July 2022:

William Davies writes that Alexis de Tocqueville ‘paid little attention to the French colonisation of Algeria’. In fact, Tocqueville was regarded as the National Assembly’s leading expert on Algeria and made two visits to the country in 1841 and 1846, during the army’s counterinsurgency against a rebellion led by the Emir Abdel-Kader. He was also an impassioned advocate for Algeria’s violent colonisation. In his 1841 report on Algeria, he wrote: ‘I believe that the right of war authorises us to ravage the country and that we must do it, either by destroying harvests during the harvest season, or year-round by making those rapid incursions called razzias, whose purpose is to seize men or herds.’ While many of his fellow liberals ‘find it wrong that we burn harvests, that we empty silos, and finally that we seize unarmed men, women and children’, he felt that ‘these, in my view, are unfortunate necessities, but ones to which any people who want to wage war on the Arabs are obliged to submit.’ He advocated total war to defeat the insurrection, followed by the creation of a settler-colony, because ‘until we have a European population in Algeria, we shall never establish ourselves there [in Africa] but shall remain camped on the African coast. Colonisation and war, therefore, must proceed together.’ His brazen imperialism went hand in hand with his liberalism.

Adam Shatz

Wow. Davies really got that one wrong!

History is full of twists and turns, and we shouldn’t be too harsh on Davies for making that one mistake in this long review: he was writing about a topic he knew nothing about, and he didn’t think to google *Tocqueville Algeria*. Understandable: you can’t get around to googling everything.

I do think, though, that it would be gracious of him to respond to that letter with a brief, “Sorry, my bad,” and for the LRB to publish this response. Just publishing the letter without an acknowledgment . . . that’s the kind of thing you’d do when there’s a difference of opinion, not a blatant factual error.

Last post on that $100,000 Uber paper

We recently had two posts (here and here) on the controversy involving Alan Krueger, the economist who was paid $100,000 in 2015 by the taxi company Uber to coauthor a paper for them. As I wrote, I’ve done lots of business consulting myself, so I don’t really see any general problem here, but at this point it seems that nobody really trusts that particular article anyway. I guess what Uber got from the payment was a short-term benefit to its reputation: the article came out at a time when much of the economics profession was in love with Uber, and this paper was just one more part of it.

The discussion was kicked off a couple of months ago by an economist who sent me an email on the topic and who wishes to remain anonymous. After seeing the two posts, my correspondent writes:

I think I largely agree with you that almost everyone is now discounting the paper, pretty much completely. And after I read what the Uber executives were doing (or at least thinking of doing), it’s hard not to come to the same conclusion. Even if nothing worse was done, the possibility that some data might not be provided means that there’s an inherent bias that falls right in line with the typical issues you’ve discussed before of forking paths, except that instead of happening due to choices made by the researcher, it’s happening at the data provider, and in an even more blatantly manipulative way. Barring some new revelation from Hall or the other economists engaged by Uber that they somehow managed to stop Uber from acting this way (or unless Uber somehow convinces us that it didn’t), scepticism is more than justified.

Note that I’m emphasising here the core issue of what the Uber executives and administrative staff did, much more than the researchers’ actions; to me, that’s where the biggest problem lies.

There’s one last issue that I feel remains, and I won’t be surprised if you’re still thinking of addressing it: how do we move forward from all this? The inherent tension between the costs and benefits of using proprietary data is now even more sharply delineated. Should researchers no longer accept payment from companies they engage with if they want to publish in academic journals and working paper series? I know that you’ve written that you’ve received payment in such situations in the past; does that make you feel you need to reevaluate things? Essentially, I’m asking what actions researchers can take that would give themselves (and their peers) the assurance that companies are not trying to use them by omitting data and the like.

Don’t get me wrong, the potential benefits of using this sort of data are high, and not just from a getting-published-in-a-great-journal perspective. I’ve come up with several research ideas that would only really be possible with such data, and I would leap at such opportunities; to be honest, I have a big research project that, in my estimation, is quite important and is currently stalled precisely because of this issue! So I would certainly not advocate ceasing these types of collaborations. But it does seem to me that, to best benefit society and science, there needs to be a discussion about the best way forward…

Anyway, I’ve clearly written too much as it stands, I have teaching prep and research to do! I’ve really enjoyed seeing the debates and I’m happy that these questions are being addressed openly. We, as in economists, could use some more careful thinking about all of this…

My response: I think open display of funding is a good idea. Some journals expect this sort of list and some don’t. I include a list of past and present sponsors here. Also, people have lots of non-financial conflicts of interest relating to friends, political ideologies, fear of pissing off the wrong people, etc. I bring these up not to say that conflict-of-interest declarations are a bad idea, just to say that things are complicated.

I guess that right now I assume that undeclared conflicts of one sort or another are inevitable in many cases. In the Uber example, the conflict was declared right away on the front page of the article, so that was pretty clear, and I think the only reason the news media didn’t make a bigger deal of the conflict at the time was that the article was telling them something they wanted to hear.

The other issue brought up by my correspondent was proprietary data. I’m not sure what to say here except that lots of data isn’t proprietary or confidential at all, but researchers still aren’t sharing. An extreme case was that disgraced primatologist who wouldn’t even share his videotapes with his own research collaborators. Another example would be those surveys you sometimes hear about where they don’t tell you how they did the sampling or what questions were asked (here’s an example).

But, yeah, if as a reader of the article you don’t have full access to the data or the data-collection mechanism, then that limits the trust you have in the results, whether or not the funding source has been clarified.

Newton’s Third Law of Reputations

This news article by Tiffany Hsu explains how the big bucks earned by Matt Damon, Larry David, LeBron James, etc., from endorsing crypto did not come for free. These celebs are now paying in terms of their reputation.

That’s all fine. After all, what’s the point of reputation if you can’t convert it into something else, in this case more money for people who are already unimaginably rich? (Actually, my guess is that these celebs endorsed crypto not so much for the money as because they felt they were getting in on the ground floor of the next big thing; that is, they were more conned-upon than conning, although conned-upon in a very mild way, as I guess they made some $$$ from it all.) And a positive social benefit of these stars being considered clueless at best and fraudsters at worst is that maybe for a while we’ll be spared the advice of Matt Damon, Larry David, LeBron James, etc., on politics, culture, health, etc. So good news all around: these stars made money, and we’ll be less subject to their influence.

The episode reminded me of what’s been happening with the medical journal Lancet over the years: they keep lending their reputation to fraudulent or politically-slanted research, and now it’s reached the point where when we see that something’s published in Lancet, we get a little bit suspicious. Not completely so, as they publish lots of good stuff too, but a little bit.

Or Dr. Oz, who cashed in on the reputation of the medical establishment (and Columbia University!) to get money and fame. When it turns sour, that reduces the future value of the “Dr.” label and the Ivy League connection. On the other hand, if he gets elected to the Senate, then maybe the presidency, then eventually is elected king of the world, then maybe they’ll change the rules on who gets to be called Dr.: maybe it will be an absolute requirement that if you want the “doctor” label you have to endorse at least two fraudulent cures.

Or the Wall Street Journal, which published a regular column by someone associated with fraudulent research?

Reputation is a two-way street.

Cheating in sports vs. cheating in journalism vs. cheating in science

Sports cheating has been in the news lately. Nothing about the Astros, but the chess-cheating scandal that people keep talking about—or, at least, people keep sending me emails asking me to blog about it—and the cheating scandals in poker and fishing. All of this, though, is nothing compared to the juiced elephant in the room: the drug-assisted home run totals of 1998-2001, which kept coming up during the past few months as Aaron Judge approached and then eventually reached the record-breaking total of 62 home runs during a season.

On this blog we haven’t talked much about cheating in sports (there was this post, though, and also something a few years back about one of those runners who wasn’t really finishing the races), but we’ve occasionally talked about cheating in journalism (for example here, here, here, and here—hey, those last two are about cheating in journalism about chess!), and we’ve talked lots and lots about cheating in science.

So this got me thinking: What are the differences between cheating in sports, journalism, and science?

1. The biggest difference that I see is that in sports, when you cheat, you’re actually doing what you claim to do, you’re just doing it using an unauthorized method. With cheating in journalism and science, typically you’re not doing what you claimed.

Let me put it this way: Barry Bonds may have juiced, but he really did hit 7 zillion home runs. Lance Armstrong doped, but he really did pedal around France faster than anyone else. Jose Altuve really did hit the ball out of the park. Stockfish-aided or not, that dude really did checkmate the other dude’s king. Etc. The only cases I can think of where the cheaters didn’t actually do what they claimed to do are the Minecraft guy, Rosie Ruiz, and those guys who did a “Mark Twain” on their fish. Usually, what sports cheaters do is use unapproved methods to achieve real ends.

But when journalism cheaters cheat, the usual way they do it is by making stuff up. That is, they put things in the newspaper that didn’t really happen. The problem with Stephen Glass or Janet Cooke or Jonah Lehrer is not that they achieved drug-enhanced scoops or even that they broke some laws in order to break some stories. No, the problem was that they reported things that weren’t true. I’m not saying that journalism cheats are worse than sports cheats, just that it’s a different thing. Sometimes cheating writers cheat by copying others’ work without attribution, and that alone doesn’t necessarily lead to falsehoods getting published, but it often does, which makes sense: once you start copying without attribution, it becomes harder for readers to track down your sources and find your errors, which in turn makes it easier to be sloppy and reduces the incentives for accuracy.

When scientists cheat, sometimes it’s by just making things up, or presenting claims with no empirical support—for example, there’s no evidence that the Irrationality guy ever had that custom-made shredder, or that the Pizzagate guy ever really ran a “masterpiece” of an experiment with a bottomless soup bowl or had people lift an 80-pound rock, or that Mary Rosh ever did that survey. Other times they just say things that aren’t true, for example describing a 3-day study as “long-term”. In that latter case you might say that the scientist in question is just an idiot, not a cheater—but, ultimately, I do think it’s a form of cheating to publish a scientific paper with a title that doesn’t describe its contents.

But I think the main way scientists cheat is by being loose enough with their reasoning that they can make strong claims that aren’t supported by the data. Is this “cheating,” exactly? I’m not sure. Take something like that ESP guy or the beauty-and-sex-ratio guy, who manage to find statistical methods that give them the answers they want. At some level, the boundary between incompetence and cheating doesn’t really matter; recall Clarke’s Law.

The real point here, though, is that, whatever you want to call it, the problem with bad science is that it comes up with false or unsupported claims. To put it another way: it’s not that Mark Hauser or whoever is taking some drugs that allow him to make a discovery that nobody else could make; the problem is that he’s claiming something’s a discovery but it isn’t. To put it yet another way: there is no perpetual motion machine.

The scientific analogy to sports cheating would be something like . . . Scientist B breaks into Scientist A’s lab, steals his compounds, and uses them to make a big discovery. Or, Scientist X cuts corners by using some forbidden technique, for example violating some rule regarding safe disposal of chemical waste, and this allows him to work faster and make some discovery. But I don’t get a sense that this happens much, or at least I don’t really hear about it. There was the Robert Gallo story, but even there the outcome was not a new discovery, it was just a matter of credit.

And the journalistic analogy to sports cheating would be something like that hacked phone scandal in Britain a few years back . . . OK, I guess that does happen sometimes. But my guess is that the kinds of journalists who’d hack phones are also the kind of journalists who’d make stuff up or suppress key parts of a story or otherwise manipulate evidence in a way to mislead. In which case, again, they can end up publishing something that didn’t happen, or polluting the scientific and popular literature.

2. Another difference is that sports have a more clearly defined goal than journalism or science. An extreme example is bicycle racing: if the top cyclists are doping and you want to compete on their level, then you have to dope also; there’s simply no other option. But in journalism, no matter how successful Mike Barnicle was, other journalists didn’t have to fabricate to keep up with him. There are enough true stories to report that honest journalists can compete. Yes, restricting yourself to the truth can put you at a disadvantage, but it doesn’t crowd you out entirely. Similarly, if you’re a social scientist who’s not willing to fabricate surveys or report hyped-up conclusions based on forking paths, yes, your job is a bit harder, but you can still survive in the publication jungle. There are enough paths to success that cheating is not a necessity, even if it’s a viable option.

3. The main similarity I see among misbehavior in sports, journalism, and science is that the boundary between cheating and legitimate behavior is blurry. When “everybody does it,” is it cheating? With science there’s also the unclear distinction between cheating and simple incompetence, with the twist that incompetence at scientific reasoning could represent a sort of super-competence at scientific self-promotion. Only a fool would say that the replication rate in psychology is “statistically indistinguishable from 100%,” but being that sort of fool can be a step toward success in our Ted/Freakonomics/NPR media environment. You’d think that professional athletes would be more aware of what drugs they put in their bodies than scientists would be aware of what numbers they put into their t-tests, but sports figures have sometimes claimed that they took banned drugs without their knowledge. The point is that a lot is happening at once, and there are people who will do what it takes to win.

4. Finally, it can be entertaining to talk about cheating in science, but as I’ve said before, I think the much bigger problem is scientists who are not trying to cheat but are just using bad methods with noisy data. Indeed, the focus on cheaters can let incompetent but sincere scientists off the hook. Recall our discussion from a few years ago, The flashy crooks get the headlines, but the bigger problem is everyday routine bad science done by non-crooks. Similarly, with journalism, I’d say the bigger problem is not the fabricators so much as the everyday corruption of slanted journalism, and public relations presented in journalistic form. To me, the biggest concern with journalistic cheating is not so much the cases of fraud as the way the establishment closes ranks to defend the fraudster, just as in academia there’s no real mechanism to do anything about bad science.

Cheating in sports feels different, maybe in part because a sport is defined by its rules in a way that we would not say about journalism or science.

P.S. After posting the above, I got to thinking about cheating in business, politics, and war, which seem to me to have a different flavor than cheating in sports, journalism, or science. I have personal experience in sports, journalism, and science, but little to no experience in business, politics, and war. So I’m just speculating, but here goes:
To me, what’s characteristic about cheating in business, politics, and war is that some flexible line is pushed to the breaking point. For example, losing candidates will often try to sow doubt about the legitimacy of an election, but they rarely take it to the next level and get on the phone with election officials and demand they add votes to their total. Similarly with business cheating such as creative accounting, dumping of waste, etc.: it’s standard practice to work at the edge of what’s acceptable, but cheaters such as the Theranos gang go beyond hype to flat-out lying. Same thing for war crimes: there’s no sharp line, and cheating or violation arises when armies go far beyond what is currently considered standard behavior. This all seems different than cheating in sports, journalism, or science, all of which are more clearly defined relative to objective truth.

I think there’s more to be said on all this.

How should Bayesian methods for the social sciences differ from Bayesian methods for the physical sciences? (my talk at the Bayesian Methods for the Social Sciences workshop, 21 Oct 2022, Paris)

The title of the workshop is Bayesian Methods for the Social Sciences, so it makes sense to ask what’s so special about the social sciences.

At first I was thinking of comparing the social sciences to the natural sciences, but I think that social sciences such as political science, economics, and sociology have a lot in common with biological sciences and medicine, along with in-between sciences such as psychology: all of these are characterized by high variability, measurement challenges, and difficulty in matching scientific theories to mathematical or statistical models.

So I decided that the more interesting comparison is to the physical sciences, where variability tends to be lower, or at least better behaved (for example, shot noise or Brownian motion). In the social sciences, statistical models—Bayesian or otherwise—have a lot more subjectivity, a lot more researcher degrees of freedom. In theory, Bayesian inference should work for any problem, but it has a different flavor when our models can be way off and there can be big gaps between actual measurements and the goals of measurements.

So I think there will be lots to say on this topic. I’m hoping the conf will be in French so that I’m forced to speak slowly.

“Depressingly unbothered”

I just finished listening to the Trojan Horse Affair podcast, and . . . ok, you might have heard about it already, it’s the story of a ridiculous hoax leading to a horrible miscarriage of justice . . . I agree with Brian Reed, the co-host of the show, when he says this near the end of the final episode:

Rarely is there one big revelation that undoes years of misinformation and untruth. Most decent journalism is an exercise in incremental understanding. The Trojan Horse letter though, even with my [Reed’s] tempered expectations, I was surprised by how willing people have been to let it stand unchallenged. People are depressingly unbothered that this harmful myth about Muslims persists.

This reminds me of something we’ve discussed before, that an important aspect of being a scientist is the capacity for being upset. We learn so much through the recognition and resolution of anomalies.

Like Reed, I’m upset by people not getting upset by clear anomalies. This annoyed me when New York Times columnist David Brooks promoted anti-Jewish propaganda and nobody seemed to care. (I wasn’t saying Brooks should be fired or fined or anything like that, just that the newspaper should run a goddam correction notice.) And it annoys me when the British national and local government promotes anti-Muslim propaganda and nobody in charge seems to care. I guess I can say that I’m also bothered when Fox News pushes lies, but that’s a little different because they’re just in the propaganda business. We get worked up in a different way about Brooks and the U.K. government because it doesn’t seem like their original goal is to lie; rather, they act as a sort of flypaper, attracting stories that fit their preconceptions and then sticking with them even after they’ve been refuted.

Anyway, to return to the title of this post: in addition to being bothered by lies, I’m bothered by how unbothered people are about them.

Quantitative science is (indirectly) relevant to the real world, also some comments on a book called The Case Against Education

Joe Campbell points to this post by economist Bryan Caplan, who writes:

The most painful part of writing The Case Against Education was calculating the return to education. I spent fifteen months working on the spreadsheets. I came up with the baseline case, did scores of “variations on a theme,” noticed a small mistake or blind alley, then started over. . . . About half a dozen friends gave up whole days of their lives to sit next to me while I gave them a guided tour of the reasoning behind my number-crunching. . . . When the book finally came out, I published final versions of all the spreadsheets . . .

Now guess what? Since the 2018 publication of The Case Against Education, precisely zero people have emailed me about those spreadsheets. . . . Don’t get me wrong; The Case Against Education drew plenty of criticism. Almost none of it, however, was quantitative. . . .

It’s hard to avoid a disheartening conclusion: Quantitative social science is barely relevant in the real world – and almost every social scientist covertly agrees. The complex math that researchers use is disposable. You deploy it to get a publication, then move on with your career. When it comes time to give policy advice, the math is AWOL. If you’re lucky, researchers default to common sense. Otherwise, they go with their ideology and status-quo bias, using the latest prestigious papers as fig leaves.

Regarding the specifics, I suspect that commenter Andrew (no relation) has a point when he responded:

You didn’t waste your time. If you had made your arguments without the spreadsheets—just guesstimating & eyeballing, you would’ve gotten quantitative criticism. A man who successfully deters burglars didn’t waste his money on a security system just because it never got used.

But then there’s the general question about quantitative social science. I actually wrote a post on this topic last year, The social sciences are useless. So why do we study them? Here’s a good reason. Here was my summary:

The utilitarian motivation for the natural sciences is that they can make us healthier, happier, and more comfortable. The utilitarian motivation for the social sciences is that they can protect us from bad social-science reasoning. It’s a lesser thing, but that’s what we’ve got, and it’s not nothing.

That post stirred some people up, as it sounded like I was making some techbro-type argument that society didn’t matter. But I wasn’t saying that society was useless, I was saying that social science was useless, at least relative to the natural sciences. Some social science research is really cool, but it’s nothing compared to natural-science breakthroughs such as transistors, plastics, vaccines, etc.

Anyway, my point is that quantitative social science has value in that it can displace empty default social science arguments. Caplan is disappointed that people didn’t engage with his spreadsheets, but I think that’s partly because he was presenting his ideas in book form. My colleagues and I had a similar experience with our Red State Blue State book a few years ago: our general point got out there, but people didn’t seem to really engage with the details. We had lots of quantitative analyses in there, but it was a book, so people weren’t expecting to engage in that way. Frustrating, but it would be a mistake to generalize from that experience to all of social science. If you want people to engage with your spreadsheets, I think you’re better off publishing an article rather than a book.

Caplan’s “move on with your career” statement is all too true, but that’s a separate issue. Biology, physics, electrical engineering, etc., are all undeniably useful, but researchers in these fields also move on with their careers. That just tells us that research doesn’t have 100% efficiency, which is a product of the decentralized system that we have. It’s not like WW2, when the government was assigning people to projects.

Comments on The Case Against Education

This discussion reminded me that six years ago Caplan sent me a draft of his book, and I sent him comments. I might as well share them here:

1. Your intro is fine; it’s good to tell the reader where you’re coming from. But . . . the way it’s framed, it looks a bit like the “professors are pampered” attack on higher education. I don’t think this is the tack you want to be taking, for two reasons. First, most teaching jobs are not like yours: most teaching jobs are at the elementary or secondary level, and even at the college level, much of the teaching is done by adjuncts. So, while your presentation of _your_ experience is valid, it’s misleading if it is taken as a description of the education system in general. Second (and I know you’re aware of this too), if education were useful, there’d be no good reason to complain that some of its practitioners have good working conditions. Again, this does not affect your main argument, but I think you want to avoid sounding like the proudly-overpaid guy discussed here.

This comes up again in your next chapter where you say you have very few skills and that “The stereotype of the head-in-the-clouds Ivory Tower professor is funny because it’s true.” Ummm, I don’t know about that. The stereotype of the head-in-the-clouds Ivory Tower professor is not so true, in the statistical sense. The better stereotype might be that adjunct working 5 jobs.

2. You write, “Junior high and high schools add higher mathematics, classic literature, and foreign languages – vital for a handful of budding scientists, authors, and translators, irrelevant for everyone else.” This seems pretty extreme. One point of teaching math—even the “higher mathematics” that is taught in high school—is to give people the opportunity to find out that they are “budding scientists” or even budding accountants. As to “authors,” millions of people are authors: you’ve heard of blogs, right? It can be useful to understand how sentences, paragraphs, and chapters are put together, even if you’re not planning to be Joyce Carol Oates. As to foreign languages: millions of people speak multiple languages; it’s a way of understanding the world that I think is very valuable. If _you_ want to say that you’re happy only speaking one language, or that many other people are happy speaking just one language, that’s fine—but I think it’s a real plus to give kids the opportunity to learn to speak and read in other languages. Now, at this point you might argue that most education in math, literature, and foreign language is crappy—that’s a case you can make, but I think you’re way overdoing it by minimizing the value of these subjects.

3. Regarding signaling: Suppose I take a biology course at a good college and get an A, but I don’t go into biology. Still, the A contributes to my GPA and to my graduation from the good college, which is a job signal. You might count this as part of the one-third signaling. But that would be a mistake! You’re engaging in retrospective reasoning. Even if I never use that biology in my life, I didn’t know that when I took the course. Taking that bio course was an investment. I invest the time and effort to learn some biology in order to decide whether to do more of it. And even if I don’t become a biologist I might end up working in some area that uses biology. I won’t know ahead of time. This is not a new idea; it’s the general principle of a “well-rounded education,” which is popular in the U.S. (maybe not so much in Europe, where post-secondary education is more focused on a student’s major). Also relevant on this “signaling” point is this comment:

4. Also, signaling is complicated and even non-monotonic! Consider this example, which I wrote up here:
“My senior year I applied to some grad schools (in physics and in statistics) and to some jobs. I got into all the grad schools and got zero job interviews. Not just zero jobs. Zero interviews. And these were not at McKinsey, Goldman Sachs, etc. (none of which I’d heard of). They were places like TRW, etc. The kind of places that were interviewing MIT physics grads (which is how I thought of applying for these jobs in the first place). And after all, what could a company like that do with a kid with perfect physics grades from MIT? Probably not enough of a conformist, eh?”
This is not to say your signaling story is wrong, just that I think it’s much more complicated than you’re portraying.

5. This is a minor point, but you write, “If the point of education is certifying the quality of labor, society would be better off if we all got less.” This is not so clear. From psychometric principles, more information will allow better discrimination. It’s naive to think of all students as being ranked on a single dimension so that employers just need to pick from the “top third.” There are many dimensions of abilities and it could take a lot of courses at different levels to make the necessary distinctions. Again, this isn’t central to your argument but you just have to be careful here because you’re saying something that’s not quite correct, statistically speaking.

6. You write, “Consider the typical high school curriculum. English is the international language of business, but American high school students spend years studying Spanish, or even French. During English, many students spend more time deciphering the archaic language of Shakespeare than practicing business writing. Few jobs require knowledge of higher mathematics, but over 80% of high school grads suffer through geometry.” I think all these topics could be taught better, but my real issue here is that this argument contradicts what you said back on page 6, that you were not going to just “complain we aren’t spending our money in the right way.”

To put it another way:

7. You write, “The Ivory Tower ignores the real world.” I think you need to define your terms. Is “Ivory Tower” all of education? All college education? All education at certain departments at certain colleges? Lots of teachers of economics are engaged with the real world, no? Actually it’s not so clear to me what you mean by the real world. I guess it does not include the world of teaching and learning. So what parts of the economy do count as real? I’m not saying you can’t make a point here, but I think you need to define your terms in some way to keep your statements from being meaningless!

And a couple things you didn’t talk about in your book, but I think you should:

– Side effects of Big Education: Big Ed provides jobs for a bunch of politically left-wing profs and grad students, it also gives them influence. For example, Paul Krugman without the existence of a big economics educational establishment would, presumably, not be as influential as the actual Paul Krugman. One could say the same thing about, say, Greg Mankiw, but the point is that academia as a whole, and prestigious academia in particular, contains lots more liberal Krugmans than conservative Mankiws. Setting aside one’s personal political preferences, one might consider this side effect of Big Ed to be bad (in that it biases the political system) or good (in that it provides a counterweight to the unavoidable conservative biases of Big Business) or neutral. Another side effect of Big Ed is powerful teachers unions. Which, once again, could be considered a plus, a minus, or neutral, depending on your political perspective. Yet another side effect of Big Ed is that it funds various things associated with schools, such as high school sports (they’re a big deal in Texas, or so I’ve heard!), college sports, and research in areas ranging from Shakespeare to statistics. Again, one can think of these extracurricular activities as a net benefit, a net cost, or a washout.

In any case, I think much of the debate over the value of education and the structuring of education is driven by attitudes toward its side effects. This is not something you discuss in your book but I think it’s worth mentioning. Where you stand on the side effects can well affect your attitude toward the efficacy of the education establishment. There’s a political dimension here. You’re a forthright guy and I think your book will be strengthened if you openly acknowledge the political dimension rather than leaving it implicit.
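Comment 5 above makes a psychometric claim: more measurements allow better discrimination among students. A quick simulation can make that concrete. What follows is a minimal Python sketch under a deliberately simplified, hypothetical model (one latent ability per student, each course grade equal to that ability plus independent noise with standard deviation 2); none of these numbers come from Caplan’s book, and the function names are made up for illustration. With several ability dimensions, as the comment suggests is more realistic, the case for more measurements would if anything be stronger, but the sketch doesn’t model that.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation, computed directly so the sketch needs only the standard library."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def grade_ability_correlation(n_courses, n_students=5000, noise_sd=2.0, seed=1):
    """Correlation between true ability and a student's average grade over n_courses.

    Hypothetical model: one latent ability ~ N(0, 1) per student, and each course
    grade is that ability plus independent noise ~ N(0, noise_sd).
    """
    rng = random.Random(seed)
    abilities, averages = [], []
    for _ in range(n_students):
        ability = rng.gauss(0.0, 1.0)
        grades = [ability + rng.gauss(0.0, noise_sd) for _ in range(n_courses)]
        abilities.append(ability)
        averages.append(sum(grades) / n_courses)
    return correlation(abilities, averages)


def theoretical_correlation(n_courses, noise_sd=2.0):
    """Closed form under the same model: 1 / sqrt(1 + noise_sd**2 / n_courses)."""
    return 1.0 / math.sqrt(1.0 + noise_sd ** 2 / n_courses)


if __name__ == "__main__":
    # Compare simulated and closed-form correlations as the number of courses grows.
    for k in (1, 4, 16):
        print(k, round(grade_ability_correlation(k), 3), round(theoretical_correlation(k), 3))
```

Under this model the correlation between average grade and true ability is 1/√(1 + σ²/k) for k courses with noise standard deviation σ, so the simulated values should track the closed form and rise as k grows.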

(Now that faculty aren’t coming into the office anymore) Will universities ever recover?

A few years ago I taught a course at Sciences Po in Paris. The classes were fine, the students were fine, but there was almost no academic community. I had an office in some weird building where they stuck visitors. The place was mostly pretty empty. Sometimes I’d go over to the department of economics, which was hosting me (but had no office space)—it was in a fancy building on, I think it was the 4th floor but maybe it was the 2nd floor and they were just very long flights of stairs—and most of the faculty were never there. So I didn’t bother to come by very often: what’s the point if there’s no office for you and no colleagues to talk with? I don’t know what it was like for the students who wanted intellectual experiences outside of class: maybe there were places where the grad students hung out? I don’t know.

Anyway, the American universities that I’ve attended and taught at have been nothing like that. Buzzing with faculty and grad students, lots of opportunities for spontaneous conversations.

Then came covid. Classes were moved online, then we weren’t allowed to come into the office or teach in person. At some point they started allowing in-person teaching, but they were still discouraging us from showing up to the office or having in-person meetings outside of class. Eventually everything was allowed again, but by then there was a new norm of Zoom meetings, faculty who didn’t want to come into work if they didn’t have to, students who wanted to avoid the commute to school, etc. And then, as with Sciences Po those many years ago, I was less motivated to show up to work myself, which resulted in fewer spontaneous interactions with students and colleagues. Online can be convenient—hey, look at this blog!—but I still think something is missing.

So here’s the question: Will universities ever recover?

Sadly, I suspect the answer is no. It’s just too easy not to show up; also, this is just the continuation of a decades-long trend of fewer weeks in the semester, fewer days of class in the week, much less need for the physical library, etc. Also, the people at Sciences Po back in 2009 seemed just fine with closed doors and empty corridors. So that arid academic environment seems like a stable equilibrium. It makes me sad. Obv it’s the least of our problems in the world today, but still.

Progress in post-publication review: error is found, author will correct it without getting mad at the people who pointed out the error

Valentin Amrhein points to this quote from a recent paper in the journal Physical and Engineering Sciences in Medicine:

Some might argue that there is confusion over the interpretation and usage of p-values. More likely, its’ value is misplaced and subsequently misused. The p-value is the probability of an effect or association (hereafter collectively referred to as ‘effect’), and is most often computed through testing a (null) hypothesis like “the mean values from two samples have been obtained by random sampling from the same normal populations”. The p-value is the probability of this hypothesis. Full-stop.

Hey, they mispunctuated “its”! But that’s the least of their problems. The p-value mistake is a classic example of people not knowing what they don’t know, expressing 100% confidence in something that’s completely garbled. Or, as they say in Nudgeland, they’ve discovered a new continent.
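To see concretely why the quoted definition is wrong: a p-value is the probability, computed assuming the null hypothesis is true, of a test statistic at least as extreme as the one observed. It is a statement about the data given the hypothesis, not about the hypothesis given the data. The minimal Python sketch below simulates many hypothetical studies with assumed numbers chosen only for illustration (90% of tested effects exactly zero, the rest shifting the z statistic by 2) and asks what fraction of the “significant” results came from true nulls. The function names and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sd 1


def two_sided_p(z):
    """P(|Z| >= |z|) when the null is true: a statement about data given the hypothesis."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def share_of_nulls_among_significant(prior_null=0.9, true_effect=2.0, alpha=0.05,
                                     n_studies=100_000, seed=2):
    """Among simulated studies with p < alpha, the fraction where the null was actually true.

    Hypothetical setup: a share prior_null of studied effects are exactly zero, the
    rest shift the z statistic by true_effect, and each study yields one z ~ N(mean, 1).
    """
    rng = random.Random(seed)
    significant = 0
    null_and_significant = 0
    for _ in range(n_studies):
        null_is_true = rng.random() < prior_null
        z = rng.gauss(0.0 if null_is_true else true_effect, 1.0)
        if two_sided_p(z) < alpha:
            significant += 1
            null_and_significant += null_is_true
    return null_and_significant / significant


if __name__ == "__main__":
    print(round(two_sided_p(1.96), 3))
    print(round(share_of_nulls_among_significant(), 2))
```

Under these assumptions the answer is close to one half (about 0.47 analytically), nowhere near 0.05, and it would change again with a different share of true nulls or a different effect size. That dependence on things the p-value knows nothing about is why the p-value can’t be the probability of the hypothesis.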

But everybody makes mistakes. And it’s the nature of ignorance that it can be hard to notice your ignorance. It can happen to any of us.

The real test is what happens next: when people point out your error. The Twitter people were kind enough to point out the mistake in the above article, and Sander Greenland informs me that the author will be sending a correction to the journal. So that’s good. It’s post-publication review working just the way it should. No complaining about Stasi or terrorists or anything, just a correction. Good to see this.

P.S. Confusion about p-values appears to have a long history. Greenland gives some references here.

What’s the origin of “Two truths and a lie”?

Someone saw my recently-published article, “‘Two truths and a lie’ as a class-participation activity,” and wanted to know what is the origin of that popular game.

I have no idea when the game first appeared, who invented it, or what were its precursors. I don’t remember it being around when I was a kid, so at the very least I think its popularity has increased in recent decades, but I’d be interested in the full story.

Can anyone help?

“The distinction between exploratory and confirmatory research cannot be important per se, because it implies that the time at which things are said is important”

This is Jessica. Andrew recently blogged in response to an article by McDermott arguing that pre-registration has costs like being unfair to junior scholars. I agree with his view that pre-registration can be a pre-condition for good science but not a panacea, and was not convinced by many of the reasons presented in the McDermott article for being skeptical about pre-registration. For example, maybe it’s true that requiring pre-registration would favor those with more resources, but the argument given seemed quite speculative. I kind of doubt the hypothesis that many researchers are trying out a whole bunch of studies and then pre-registering and publishing the ones where things work out as expected. If anything, I suspect pre-pre-registration experimentation looks more like researchers starting with some idea of what they want to see and then tweaking their study design or definition of the problem until they get data they can frame as consistent with some preferred interpretation (a.k.a. design freedoms). Whether this is resource-intensive in a divisive way seems hard to comment on without more context. Anyway, my point in this post is not to further pile on the arguments in the McDermott critique, but to bring up some more nuanced critiques of pre-registration that I have found useful for getting a wider perspective, and which all this reminded me of.

In particular, arguments that Chris Donkin gave in a talk in 2020 about work with Aba Szollosi on pre-registration (related papers here and here) caught my interest when I first saw the talk and have stuck with me. Among several points the talk makes, one is that pre-registration doesn’t deserve privileged status among proposed reforms because there’s no single strong argument for what problem it solves. The argument he makes is NOT that pre-registration isn’t often useful, both for transparency and for encouraging thinking. Instead, it’s that bundling up a bunch of reasons why preregistration is helpful (e.g., p-hacking, HARKing, blurred boundary between EDA and CDA) misdiagnoses the issues in some cases, and risks losing the nuance in the various ways that pre-registration can help. 

Donkin starts by pointing out how common arguments for pre-registration don’t establish privileged status. For example, if we buy the “nudge” argument that pre-registration encourages more thinking, which ultimately leads to better research, then we have to assume that researchers by and large have all the important knowledge or wisdom they need to do good research inside of them, and are just somehow too rushed to make use of it. Another example: the argument that we need controlled error rates in confirmatory data analysis, and thus a clear distinction between exploratory and confirmatory research, implies that the time at which things are said is important. But if we take that seriously, we’re implying there’s somehow a causal effect of saying what we will find ahead of time that makes it more true later. In other domains, however, like criminal law, it would seem silly to argue that because an explanation was proposed after the evidence came in, it can’t be taken seriously.
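For readers who haven’t seen the error-rate argument spelled out, here is a minimal Python sketch of what it is about; it illustrates the argument Donkin is questioning, not a rebuttal of him. Assume, hypothetically, that every study measures five independent outcomes and none of them has any real effect. If the outcome to report is fixed in advance, the false-positive rate should stay near the nominal 5%; if the analyst reports whichever outcome looks best after seeing the data, the rate climbs to about 1 − 0.95⁵ ≈ 0.23. The function names, the independence assumption, and the number of outcomes are all illustrative choices.

```python
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, pick_after_seeing, alpha=0.05, n_sims=50_000, seed=3):
    """Rate of p < alpha when every null is true.

    Hypothetical setup: each simulated study measures n_outcomes independent outcomes,
    none with any real effect. If pick_after_seeing is False, the analyst reports the
    outcome fixed in advance (the first one); if True, the analyst reports whichever
    outcome has the smallest p-value once the data are in.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        p_values = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_outcomes)]
        reported = min(p_values) if pick_after_seeing else p_values[0]
        hits += reported < alpha
    return hits / n_sims


if __name__ == "__main__":
    print(false_positive_rate(5, pick_after_seeing=False))
    print(false_positive_rate(5, pick_after_seeing=True), 1 - 0.95 ** 5)
```

What drives the inflation in this toy setup is that the reported outcome depends on the data, not the date on which the choice was written down.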

The problem, Donkin argues, is that the role of theory is often overlooked in strong arguments for pre-registration. In particular, the idea that we need a sharp contrast between exploratory versus confirmatory data analysis doesn’t really make sense when it comes to testing theory. 

For instance, Donkin argues that we regularly pretend that we have a random sample in CDA, because that’s what gives it its validity, and the barebones statistical argument for pre-registration is that with EDA we no longer have a random sample, invalidating our inferences. However, given how much rides on the assumption that we have a random sample in CDA, post-hoc analysis is critical to confirming that we do. We should be poking the data in whatever ways we can think up to see if we can find any evidence that the assumptions required of CDA don’t hold. If they don’t hold, we shouldn’t trust any tests we run anyway. (Of course, one could preregister a bunch of preliminary randomization checks. But the point seems to be that there are essentially EDA-ish activities that can be done only when the data come in, challenging the default.)
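As one concrete example of the kind of post-hoc poking described above, here is a minimal Python sketch of a covariate balance check: comparing treatment and control groups on a pre-treatment variable using the standardized mean difference (difference in means over the pooled standard deviation). The data are simulated and hypothetical: a single covariate, age, with either coin-flip assignment or a made-up self-selection rule in which older people are more likely to end up treated. A commonly cited rule of thumb treats an absolute standardized difference above about 0.1 as a sign of meaningful imbalance, though that threshold is a convention rather than a test.

```python
import math
import random


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    def mean(values):
        return sum(values) / len(values)

    def variance(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    return (mean(treated) - mean(control)) / pooled_sd


def simulate_groups(randomized, n_people=10_000, seed=4):
    """Split hypothetical participants (covariate: age) into treated and control groups.

    randomized=True: a fair coin assigns treatment.
    randomized=False: hypothetical self-selection, where people over 40 opt in
    with probability 0.7 and everyone else with probability 0.3.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_people):
        age = rng.gauss(40.0, 10.0)
        p_treat = 0.5 if randomized else (0.7 if age > 40.0 else 0.3)
        (treated if rng.random() < p_treat else control).append(age)
    return treated, control


if __name__ == "__main__":
    for randomized in (True, False):
        smd = standardized_mean_difference(*simulate_groups(randomized))
        print("randomized" if randomized else "self-selected", round(smd, 3))
```

With coin-flip assignment the standardized difference should land near zero, while the self-selection rule produces a large one. A check like this can’t prove the randomization assumption holds, but it can turn up evidence that it doesn’t.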

When we see pre-registration as “solving” the problem of EDA/CDA overlap, we invert an important distinction related to why we expect something that happened before to happen again. The reason it’s okay for us to rely on inductive reasoning like this is that we embed the inference in theory: the explanation motivates why we expect the thing to happen again. Strong arguments for pre-registration as a fix for “bad” overlap imply that this inductive reasoning is the fundamental first principle, rather than a tool embedded in our pursuit of better theory. In other words, taking preregistration too seriously as a solution implies we should put our faith in the general principle that the past repeats itself. But we don’t use statistics because they create valid inferences; we use them because they are a tool for creating good theories.

Overall, what Donkin seems to be emphasizing in this is that there’s a rhetorical risk to too easily accepting that pre-registration is the solution to a clear problem (namely, that EDA and CDA aren’t well separated). Despite the obvious p-hacking examples we may think of when we think about the value of pre-registration, buying too heavily into this characterization isn’t necessarily doing pre-registration a favor, because it hides a lot of nuance about the ways pre-registration can help. For example, if you ask people why pre-registration is useful, different people may stress different reasons. If you give preregistration an elevated status for the supposed reason that it “solves” the problem of EDA and CDA not being well distinguished, then, similar to how any nuance in intended usage of NHST has been lost, you may lose the nuance of preregistration as an approach that can improve science, and increase preoccupation with a certain way of (mis)diagnosing the problems. Devezer et al. (and perhaps others I’m missing) have also pointed out the slipperiness of placing too much faith in the EDA/CDA distinction. Ultimately, we need to be a lot more careful in stating what problems we’re solving with reforms like pre-registration.

Again, none of this is to take away from the value of pre-registration in many practical settings, but to point out some of the interesting philosophical questions thinking about it critically can bring up.

What’s the difference between Derek Jeter and preregistration?

There are probably lots of clever answers to this one, but I’ll go with: One of them was hyped in the media as a clean-cut fresh face that would restore fan confidence in a tired, scandal-plagued entertainment cartel—and the other is a retired baseball player.

Let me put it another way. Derek Jeter had three salient attributes:

1. He was an excellent baseball player, rated by one source at the time of his retirement as the 58th best position player of all time.

2. He was famously overrated.

3. He was a symbol of integrity.

The challenge is to hold 1 and 2 together in your mind.

I was thinking about this after Palko pointed me to a recent article by Rose McDermott that begins:

Pre-registration has become an increasingly popular proposal to address concerns regarding questionable research practices. Yet preregistration does not necessarily solve these problems. It also causes additional problems, including raising costs for more junior and less resourced scholars. In addition, pre-registration restricts creativity and diminishes the broader scientific enterprise. In this way, pre-registration neither solves the problems it is intended to address, nor does it come without costs. Pre-registration is neither necessary nor sufficient for producing novel or ethical work. In short, pre-registration represents a form of virtue signaling that is more performative than actual.

I think this is like saying, “Derek Jeter is no Cal Ripken, he’s overrated, gets too much credit for being in the right place at the right time, he made the Yankees worse, his fans don’t understand how the game of baseball really works, and it was a bad idea to promote him as the ethical savior of the sport.”

Here’s what I think of preregistration: It’s a great idea. It’s also not the solution to problems of science. I have found preregistration to be useful in my own work. I’ve seen lots of great work that is not preregistered.

I disagree with the claim in the above-linked paper that “Under the guidelines of preregistration, scholars are expected to know what they will find before they run the study; if they get findings they do not expect, they cannot publish them because the study will not be considered legitimate if it was not preregistered.” I disagree with that statement in part for the straight-up empirical reason that it’s false; there are counterexamples; indeed a couple years ago we discussed a political science study that was preregistered and yielded unexpected findings which were published and were considered legitimate by the journal and the political science profession.

More generally, I think of preregistration as a floor, not a ceiling. The preregistered data collection and analysis is what you need to do. In addition, you can do whatever else you want.

Preregistration remains overrated if you think it’s gonna fix science. Preregistration facilitates the conditions for better science, but if you preregister a bad design, it’s still a bad design. Suppose you could go back in time and preregister the collected work of the beauty-and-sex-ratio guy, the ESP guy, and the Cornell Food and Brand Lab guy, and then do all those studies. The result wouldn’t be a spate of scientific discoveries; it would just be a bunch of inconclusive results, pretty much no different than the inconclusive results we actually got from that crowd but with the improvement that the inconclusiveness would have been more apparent. As we’ve discussed before, the benefits of procedural reforms such as preregistration are indirect—making it harder for scientists to fool themselves and others with bad designs—but not direct. Are these indirect benefits greater than the costs? I don’t know; maybe McDermott is correct that they’re not. I guess it depends on the context.

I think preregistration can be valuable, and I say that while recognizing that it’s been overrated and inappropriately sold as a miracle cure for scientific corruption. As I wrote a few years ago:

In the long term, I believe we as social scientists need to move beyond the paradigm in which a single study can establish a definitive result. In addition to the procedural innovations [of preregistration and mock reports], I think we have to more seriously consider the integration of new studies with the existing literature, going beyond the simple (and wrong) dichotomy in which statistically significant findings are considered as true and nonsignificant results are taken to be zero. But registration of studies seems like a useful step in any case.

Derek Jeter was overrated. He was at times a drag on the Yankees’ performance. He was still an excellent player and overall was very much a net positive.