
Reflections on a talk gone wrong

The first talk I ever gave was at a conference in 1988. (This isn’t the one that went wrong.) I spoke on Constrained maximum entropy methods in an image reconstruction problem. The conference was in England, and I learned about it from a wall poster. They had travel funding for students. I sent in my abstract, and they liked it. All excited, I walked over to a travel agent office, reserved a round-trip to London, walked over to the bank, withdrew $300, and walked back to the travel agent to buy a ticket. Then I went to the Coop and spent another $100 or so on a suit. When I was preparing my talk, Hal Stern gave me some excellent advice. He said, don’t try to impress them or blow them away. Just explain what you did and say why you think it’s interesting. I flew to London and took the bus to Cambridge and went to the conference. I was the only statistician there so I ended up being the statistics expert. My talk was supposed to be 15 minutes long and was scheduled for the session before lunch on the last day. By the time my turn came up, the session was already running 20 minutes late. My talk went OK but I think that most of the audience was eager for lunch by then. (Recall this research finding.)

I’ve given a few zillion talks in the 30 years since, and they’ve mostly gone well. Usually when it doesn’t go well it’s because I don’t know the audience. Once I spoke at an applied math seminar. I was talking all about logistic regression but they didn’t know what logistic regression was! It wouldn’t have been hard for me to cover the background in 5 minutes, but I hadn’t thought to do that, so they didn’t see the point of what I was talking about. Another time I spoke to a group of people who worked in numerical analysis and, again, I didn’t motivate the statistical problems, so it wasn’t clear to the audience what were the difficult parts in our research. Another time I was giving a remote talk—this was a couple years ago, well before coronavirus—and there had been some miscommunication. I’d been hoping for an open conversation with lots of questions, but the organizers were expecting a 45-minute talk. What ended up happening was that I spoke for about 15 minutes and then asked for questions and discussion topics. Nobody had anything to say, so that was that!

Also, sometimes some people will like a talk and some won’t. It’s hard to satisfy everybody, and sometimes it’s hard to know how things went. People will tell you the positive feedback, but you don’t usually hear the negative reviews to your face. One time I got a warning a couple days before my talk: the organizer contacted me and told me that there were certain topics I should not bring up because they might upset someone in the audience! The punch line is that that person didn’t end up showing up.

The most recent time I gave a disaster of a presentation was a few months ago: it was in a session with many other speakers, I had a 20-minute slot and it was running 5 minutes late. 15 minutes should be enough time for a talk, but I somehow wasn’t in the right frame of mind for giving a short presentation. I spent the first 5 or 10 minutes getting warmed up—this can work fine in a long talk, see for example here—but in a short talk it just doesn’t work. I never really had a chance to get to the interesting part. My bad.


– Remote talk. No energy of the crowd. Limited opportunity for audience feedback via laughter and facial expressions. (You’d think that seeing faces on zoom would work. But, no, I’ve found that people on zoom have blank faces compared to a live audience.)

– I didn’t lead off with my target message. The audience didn’t know what my talk was about or where I was going. This can be a particular issue when your talk is one of several in a long session.

– Failing to engage the audience from the beginning. Again, a slow warmup can work well for an hourlong stand-alone talk, but not for something shorter.

– Not conveying the importance of the problem. This is related to not making the theme of the talk clearer.

– Poor flow. I didn’t have a plan for how the 15 minutes would go.

Sketching the distribution of data vs. sketching the imagined distribution of data

Elliot Marsden writes:

I was reading the recently published UK review of food and eating habits. The above figure caught my eye as it looked like the distribution of weight had radically changed, beyond just its mean shifting, over past decades. This would really change my beliefs!

But in fact the distributional data wasn’t available at earlier times, so a normal distribution was assumed, despite the fact that when the distributional data is in fact available, it looks little like a normal distribution.

This reminded me of your recent posts on scientific method as ritual: a table of the category shares for each decade would probably make the point better, but a modelled shifting distribution sends a signal of ability. Enough method to impress, but not enough method to inform.

Yes, there are lots of examples like this, even in textbooks. See the section “The graph that wasn’t there” in this article.

Routine hospital-based SARS-CoV-2 testing outperforms state-based data in predicting clinical burden.

Len Covello, Yajuan Si, Siquan Wang, and I write:

Throughout the COVID-19 pandemic, government policy and healthcare implementation responses have been guided by reported positivity rates and counts of positive cases in the community. The selection bias of these data calls into question their validity as measures of the actual viral incidence in the community and as predictors of clinical burden. In the absence of any successful public or academic campaign for comprehensive or random testing, we have developed a proxy method for synthetic random sampling, based on viral RNA testing of patients who present for elective procedures within a hospital system. We present here an approach under multilevel regression and poststratification (MRP) to collecting and analyzing data on viral exposure among patients in a hospital system and performing publicly available statistical adjustment to estimate true viral incidence and trends in the community. We apply our MRP method to track viral behavior in a mixed urban-suburban-rural setting in Indiana. This method can be easily implemented in a wide variety of hospital settings. Finally, we provide evidence that this model predicts the clinical burden of SARS-CoV-2 earlier and more accurately than currently accepted metrics.

This is a really cool project. Len Covello is a friend from junior high and high school who I hadn’t seen for decades, Yajuan is at the University of Michigan and collaborates with me on various survey research projects, and Siquan is a graduate student in biostatistics at Columbia. Len is a doctor in Indiana, and he contacted me because his hospital was performing coronavirus tests on all their incoming patients. He did some internet research and had the thought that they could use multilevel regression and poststratification (MRP) to adjust the sample to be representative of (a) the population of people who go to that hospital, and (b) the general population of the intake area. Both (a) and (b) are of interest, and they both involve smoothing over time. It should be possible to do better than the raw data by performing this adjustment, as it should fix at least some of the problems of the patient mix varying over time. We did the analysis in Stan, adapting the model from my recent paper with Bob Carpenter.

This work is potentially important not just because of whatever we found in this particular study but because it could be done at any hospital or hospital system that’s doing these tests. Instead of just tracking raw positivity rates, you can track adjusted rates. Not perfect, but a step forward, I think.

“They adjusted for three hundred confounders.”

Alexey Guzey points to this post by Scott Alexander and this research article by Elisabetta Patorno, Robert Glynn, Raisa Levin, Moa Lee, and Krista Huybrechts, and writes:

I [Guzey] am extremely skeptical of anything that relies on adjusting for confounders and have no idea what to think about this. My intuition would be that because control variables are usually noisy that controlling for 300 confounders would just result in complete garbage but somehow the effect they estimate ends up being 0?

My reply: It’s a disaster to adjust (I say “adjust,” not “control”) for 300 confounders with least squares or whatever; can be ok if you use a regularized method such as multilevel modeling, regularized prediction, etc. In some sense, there’s no alternative to adjusting for the 300 confounders: once they’re there, if you _don’t_ adjust for them, you’re just adjusting for them using a super-regularized adjustment that sets certain adjustments all the way to zero.
Do I believe the published result that does some particular adjustment? That, I have no idea. What I’d like to see is a graph with the estimate on the y-axis and the amount of adjustment on the x-axis, thus getting a sense of what the adjustment is doing.
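That estimate-versus-amount-of-adjustment picture is easy to compute with ridge regression, where a single penalty parameter controls how strongly the confounder adjustments are regularized. Here is a minimal sketch on simulated data (all names and numbers are made up for illustration; this is not the method used in the paper under discussion):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 500, 300                              # 500 observations, 300 noisy confounders
X = rng.normal(size=(n, p))
treat = 0.5 * X[:, 0] + rng.normal(size=n)   # treatment correlated with one confounder
y = X[:, 0] + rng.normal(size=n)             # true treatment effect is zero

D = np.column_stack([treat, X])              # treatment coefficient comes first

def ridge(D, y, lam):
    """Ridge solution; lam = 0 is ordinary least squares."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)

# Trace the treatment estimate along the regularization path:
# x-axis = amount of adjustment (penalty), y-axis = estimated effect.
for lam in (0.0, 1.0, 10.0, 100.0, 1e6):
    print(lam, ridge(D, y, lam)[0])
```

At one end of the path is least squares with all 300 adjustments; at the other end the adjustments are shrunk to zero, which is the "don't adjust at all" analysis. Plotting the treatment estimate against the penalty is the graph described above.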

P.S. In googling, I found that I already posted on this topic. But I had something different to say this time, so I guess it’s good to ask me the same question twice.

“A Headline That Will Make Global-Warming Activists Apoplectic” . . . how’s that one going, Freakonomics team?

I saw this article in the newspaper today, “2020 Ties 2016 as Hottest Yet, European Analysis Shows,” and accompanied by the above graph, and this reminded me of something.

A few years ago there was a cottage industry among some contrarian journalists, making use of the fact that 1998 was a particularly hot year (by the standards of its period) to cast doubt on the global warming trend. Ummmm, where did I see this? . . . Here, I found it! It was a post by Stephen Dubner on the Freakonomics blog, entitled, “A Headline That Will Make Global-Warming Activists Apoplectic,” and continuing:

The BBC is responsible. The article, by the climate correspondent Paul Hudson, is called “What Happened to Global Warming?” Highlights:

For the last 11 years we have not observed any increase in global temperatures. And our climate models did not forecast it, even though man-made carbon dioxide, the gas thought to be responsible for warming our planet, has continued to rise. So what on Earth is going on?


According to research conducted by Professor Don Easterbrook from Western Washington University last November, the oceans and global temperatures are correlated. . . . Professor Easterbrook says: “The PDO cool mode has replaced the warm mode in the Pacific Ocean, virtually assuring us of about 30 years of global cooling.”

Let the shouting begin. Will Paul Hudson be drummed out of the circle of environmental journalists? Look what happened here, when Al Gore was challenged by a particularly feisty questioner at a conference of environmental journalists.

We have a chapter in SuperFreakonomics about global warming and it too will likely produce a lot of shouting, name-calling, and accusations ranging from idiocy to venality. It is curious that the global-warming arena is so rife with shrillness and ridicule. Where does this shrillness come from? . . .

No shrillness here. Professor Don Easterbrook from Western Washington University seems to have screwed up his calculations somewhere, but that happens. And Dubner did not make this claim himself; he merely featured a news article that featured this particular guy and treated him like an expert. Actually, Dubner and his co-author Levitt also wrote, “we believe that rising global temperatures are a man-made phenomenon and that global warming is an important issue to solve,” so I could never quite figure out why, in their blog, he was highlighting an obscure scientist who was claiming that we were virtually assured of 30 years of cooling.

Anyway, we all make mistakes; what’s important is to learn from them. I hope Dubner and his Freakonomics colleagues learn from this particular prediction that went awry. Remember, back in 2009 when Dubner was writing about “A Headline That Will Make Global-Warming Activists Apoplectic,” and Don Easterbrook was “virtually assuring us of about 30 years of global cooling,” the actual climate-science experts were telling us that things would be getting hotter. The experts were pointing out that oft-repeated claims such as “For the last 11 years we have not observed any increase in global temperatures . . .” were pivoting off the single data point of 1998, but Dubner and Levitt didn’t want to hear it. Fiddling while the planet burns, one might say.

It’s not that the experts are always right, but it can make sense to listen to their reasoning instead of going on about apoplectic activists, feisty questioners, and shrillness.

Cute contrarian takes and “repugnant ideas” can be Freakonomically fun, but they don’t always have much to do with reality.

P.S. Yes, this has come up before.

P.P.S. At this point you might ask why are we picking on Freakonomics? Nobody cares about them anymore! Shouldn’t we be writing about Al Sharpton or Ted Cruz or whoever else happens to be polluting the public discourse? Or recent Ted talk sensations? Sleep is your superpower! Or we could see what’s been published lately in Perspectives on Psychological Science . . . that’s always good for a laugh, or a cry. Or maybe the pizzagate guy or the disgraced primatologist are up to no good again? Well, we do pick on all those people too. But I don’t want to forget Freakonomics, as it’s been a model for much of the coverage of science and economics in the prestige media during the past fifteen years. And, yeah, I’m angry when they unleash their corporate-populist shtick to promote fringe science and when they don’t take the opportunity to confront their past errors (a problem that is not unique to them). I hate that this sort of drive-by commentary is a template for so much of our science and economics reporting, and one reason I pick on the Freakonomics people is that They. Could. Easily. Do. Much. Better. If. They. Only. Felt. Like. Doing. So. But, hey, instead they can “make global-warming activists apoplectic”! Why promote science when you can instead be a “rogue” and own the libs? Let’s keep our priorities straight here, guys.

P.P.P.S. There’s some discussion in the comments about climate change denial. Let me emphasize that the Freakonomics crew are not climate change deniers. They’ve explicitly stated their belief in climate change. For example, as noted above, they wrote, “we believe that rising global temperatures are a man-made phenomenon and that global warming is an important issue to solve.” If Dubner and Levitt were climate change deniers, I’d say that’s too bad, but it makes sense that they will promote whatever headlines they can find that push their agenda. But, no, they’re not deniers and they’re still promoting the junk. That’s what makes all this so sad. For them, triggering the libs appears to be more important than science or policy.

Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.

David Kaplan writes:

I have a question that comes up often when working with people who are analyzing large scale educational assessments such as NAEP or PISA. They want to do some kind of multilevel analysis of an achievement outcome such as mathematics ability predicted by individual and school level variables. The files contain the actual weights based on the clustered sampling design. There is a strong interest in using Bayesian hierarchical models, and they have asked me how to handle the weights. What I tell them is that if the data set contains the actual weighting variables themselves (the variables that went into creating the weights to begin with), then having those variables in the model along with their interactions is probably sufficient. The only thing is that they would not want an interaction if one or more of the main effects is null. I am not aware of a way to actually use the weights that are provided in a hierarchical Bayesian analysis. I am hoping that the advice I am giving is somewhat sound, because as with all surveys, the sampling design must be accounted for. What are your thoughts on this?

My reply: As you say, I think the right thing to do is to include the design factors as predictors in the analysis, and then poststratify if there is interest in population-level averages. In that case it is better not to use the weights. Or, to put it another way, the information that goes into the weights will be used in any poststratification that is done.
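The equivalence is easy to check numerically. In the simplest case, where the classical weights are just inverse sampling rates within design cells, the design-weighted estimate and the poststratified cell-mean estimate coincide exactly. A toy example (all numbers made up for illustration):

```python
# Population counts and sampled test scores by design cell (made-up numbers)
pop = {"urban": 6000, "rural": 4000}
sample = {"urban": [520.0, 480.0, 500.0, 510.0],
          "rural": [450.0, 430.0]}

# Poststratified estimate: cell means combined with population shares
N = sum(pop.values())
cell_mean = {c: sum(v) / len(v) for c, v in sample.items()}
poststrat = sum(pop[c] / N * cell_mean[c] for c in pop)

# Classical design-weighted estimate, with weights w = N_cell / n_cell
w = {c: pop[c] / len(sample[c]) for c in sample}
weighted = (sum(w[c] * x for c in sample for x in sample[c])
            / sum(w[c] * len(sample[c]) for c in sample))

print(poststrat, weighted)  # identical: the weights carry no extra information
```

With a multilevel model, the cell means would be partially pooled toward each other rather than taken as raw averages, but the poststratification step at the end is the same.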

“Enhancing Academic Freedom and Transparency in Publishing Through Post-Publication Debate”: Some examples in the study of political conflict

Mike Spagat writes:

You’ll definitely want to see this interesting paper by Kristian Gleditsch.

Research and Politics, a journal for which Kristian Gleditsch is one of the editors, has hosted several valuable rounds of post-publication peer review.

One instance starts with a paper of mine and Stijn van Weezel which replicated, critiqued and improved earlier work on excess deaths in Iraq. This was followed by a critique of our critique by the authors of the original work and a rejoinder from us.

Another is a paper by Silvio Rendon that critiqued war-death estimates for Peru made by Peru’s Truth and Reconciliation Commission and proposed alternative estimates that substantially change our understanding of the war. This was followed by a response from two authors of the Commission’s statistical work and a rejoinder from Rendon. I’ve blogged rather extensively on this discussion.

An interesting aspect of the Peru debate is that the rejoinder is probably the most interesting and penetrating piece in the series. It was only with this piece that the really low quality of the Commission’s work was properly exposed. They were, for example, using perfectly fitting models to extrapolate beyond their data. It took the whole back and forth before we got to this.

I have not followed all the links, but I appreciate the general point of having thorough discussions with multiple exchanges.

Also, the bit about the Truth and Reconciliation Commission reminds me that we shouldn’t automatically accept conclusions just because they come from what sounds like an official source. Remember the problems with that Organization of American States report on the Bolivian election? The OAS sounds so respectable. But, no, they’re just people like any of us.

Weakliem on air rage and himmicanes

Weakliem writes:

I think I see where the [air rage] analysis went wrong.

The dependent variable was whether or not an “air rage” incident happened on the flight. Two important influences on the chance of an incident are the number of passengers and how long the flight was (their data apparently don’t include the number of passengers or duration of the flight, but they do include number of seats and the distance of the flight). As a starting point, let’s suppose that every passenger has a given chance of causing an incident for every mile he or she flies. Then the chance of an incident on a particular flight is approximately

p = knd,

where p is the probability of an incident, k is the chance per passenger-mile, n is the number of passengers, and d is the distance. It’s approximate because some incidents might be the second, third, etc. on a flight, but the approximation is good when the probabilities are small, which they are (a rate of about 2 incidents per thousand flights). When you take logarithms, you get

log(p)=log(k) + log(n) + log(d)

DeCelles and Norton used logit models–that is, log(p/(1-p)) was a linear function of some predictors. (When p is small, the logit is approximately log(p)). So while they included the number of seats and distance as predictors, it would have been more reasonable to include the logarithms of those variables.

What if the true relationship is the one I’ve given above, but you fit a logit using the number of seats as a predictor? . . . there are systematic discrepancies between the predicted and actual values. That’s relevant to the estimates of the other predictors. . . . a model that adds variables for those qualities will find that first class with front boarding has higher rates than expected given the number of seats, which is exactly what DeCelles and Norton appeared to find. . . .

This is the same problem that led to a spurious result in the hurricane names study.

In some sense this doesn’t matter to me because the air rage and himmicane studies were so bad anyway—and it won’t matter to NPR, Ted, PNAS, etc., either, because they follow the rule of publishing well-packaged work by well-connected authors that makes novel claims. Also, given that these researchers had the goals of publishing something dramatic from these research projects, I have no doubt that, even had they followed Weakliem’s particular advice here, they still had enough researcher degrees of freedom to pull some rabbits out of these particular hats.

I’m sharing this because (a) it’s always fun to hear about air rage and himmicanes, and (b) Weakliem’s making a good general point about regression models. Usually what I say is that the most important thing is not how you model your data, but rather what data you include in your model. But Weakliem is illustrating here that sometimes it does matter how you do it.
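Weakliem’s approximation step, that logit(p) ≈ log(p) when p is small, is easy to verify numerically. The rate k below is a made-up number chosen to give roughly 2 incidents per thousand typical flights:

```python
import math

k = 1.3e-8  # made-up incidents per passenger-mile

for n, d in [(150, 1000), (150, 3000), (400, 3000)]:
    p = k * n * d                  # Weakliem's model: p = k*n*d
    logit = math.log(p / (1 - p))
    # the gap between logit(p) and log(p) is exactly -log(1-p), roughly p itself
    print(n, d, p, logit - math.log(p))
```

So a logit model with log(seats) and log(distance) as predictors recovers Weakliem’s equation almost exactly, whereas entering seats and distance untransformed imposes a different, misspecified functional form, and the misfit leaks into the coefficients of the other predictors.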

What is/are bad data?

This post is by Lizzie, I also took the picture of the cats.

I was talking to a colleague about a recent paper, which has some issues, but I was a bit surprised by her response that one of the real issues was that it ‘just uses bad data.’ I snapped back reflexively, ‘it’s not bad data, it just needs a better analysis.’

But it got me wondering, what is bad data?

I think ‘bad data’ is fake or falsified data that you don’t know is fake/falsified. Fake data is great, and I think real data is a fine thing. I don’t know that data can be good or bad, can it? It can be more or less accurate or precise. I can think of things that make it higher or lower quality, but I don’t think we should be assigning data as ‘bad’ or ‘good,’ and labeling it as such to me suggests a misguided relationship to data (which I do think we have in ecology).

The paper uses data from the Living Planet Index (LPI):

…is a measure of the state of the world’s biological diversity based on population trends of vertebrate species from terrestrial, freshwater and marine habitats. The LPI has been adopted by the Convention of Biological Diversity (CBD) as an indicator of progress towards its 2011-2020 target to ‘take effective and urgent action to halt the loss of biodiversity’.

These data have been used a lot to show wild species populations are declining. The new paper purports to show that most wild populations are not declining. The authors use a mixture model to find three parts to their mixture (I don’t know mixture models so I encourage anyone who does to take a look and correct me, or just comment generally on the approach, but as best I can tell they looked a priori for three parts to the mixture) and then they took out the ‘extreme’ declining ones and find the rest don’t decline. Okay. Then they wrote a press release saying most populations aren’t declining and we should all be hopeful. Yay.
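The paper’s exact specification isn’t given here, so purely as a generic illustration, here is what fitting a three-component Gaussian mixture to a batch of population trend slopes might look like, using a plain EM algorithm on synthetic data (the component locations, spreads, and sample sizes are all made up):

```python
import math
import random

def fit_gmm_1d(xs, k=3, iters=200):
    """Plain EM for a k-component one-dimensional Gaussian mixture."""
    xs_sorted = sorted(xs)
    # initialize means at spread-out quantiles, common sd, equal weights
    mus = [xs_sorted[int((j + 0.5) / k * len(xs))] for j in range(k)]
    mean = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    sds, pis = [sd] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [pis[j] / (sds[j] * math.sqrt(2 * math.pi))
                    * math.exp(-((x - mus[j]) ** 2) / (2 * sds[j] ** 2))
                    for j in range(k)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: reweighted means, sds, and mixing proportions
        for j in range(k):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            sds[j] = math.sqrt(sum(r[j] * (x - mus[j]) ** 2
                                   for r, x in zip(resp, xs)) / nj) + 1e-9
            pis[j] = nj / len(xs)
    return sorted(zip(mus, sds, pis))  # (mean, sd, weight) per component

# synthetic "trend slopes": declining, flat, and increasing groups
rng = random.Random(1)
slopes = ([rng.gauss(-3, 0.5) for _ in range(200)]
          + [rng.gauss(0, 0.5) for _ in range(500)]
          + [rng.gauss(3, 0.5) for _ in range(300)])
components = fit_gmm_1d(slopes, k=3)
```

Note that the a priori choice of three components is doing real work here: EM will happily carve even a unimodal distribution into three pieces, so “we found an extreme declining component” is only as meaningful as the model checks behind it.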

But the LPI data are best described (by Jonathan Davies) as polling data. We can’t measure most wild species populations and we tend to measure ones from well monitored areas, which are often less biodiverse (I could start saying which animal populations are like which human groups that answer polls a lot, but I will resist). The LPI knows this and they generally do a weighting scheme to try to correct for it (which isn’t great either), but this paper doesn’t seem to try to do much to correct the data. One author wrote a blog about it, noting:

We also note that these declines are more likely in regions that have a larger number of species. This is why the Living Planet Index uses a weighting system, otherwise it would be heavily weighted towards well-monitored locations.

I wish these researchers and others in biodiversity science would use some of the thoughtful-stratification and other approaches used on polling data to try to give us a useful estimate of the state of global biodiversity, instead of press-release-friendly estimates. If anyone needs a project, the data are public!

xkcd: “Curve-fitting methods and the messages they send”

We can’t go around linking to xkcd all the time or it would just fill up the blog, but this one is absolutely brilliant. You could use it as the basis for a statistics Ph.D.

I came across it in this post from Palko, which is on the topic of that Dow 36,000 guy who keeps falling up and up. But that’s another story, related to the idea, which we’ve discussed many times, that Gresham’s law applies to science.

But really what I like about Munroe’s cartoon above is not the topical relevance to some stupid thing that some powerful person happens to be doing, but rather the amazing range of curves that look like reasonable fits to the exact same data points! I’m reminded of the examples discussed here.
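That range of plausible-looking fits is easy to reproduce: nested polynomial fits to the same handful of points all drive the residuals down, and nothing in the least-squares machinery itself tells you which story to believe. A small sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(scale=0.3, size=10)   # ten noisy points, truly linear

# Fit polynomials of increasing degree to the *same* data
for deg in (1, 3, 5):
    coef = np.polyfit(x, y, deg)
    resid = np.sum((np.polyval(coef, x) - y) ** 2)
    print(deg, resid)   # residuals shrink with degree, whatever the truth is
```

Each of these curves would look perfectly respectable drawn through the scatterplot, which is exactly the cartoon’s point.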

NYT editor described columnists as “people who are paid to have very, very strong convictions, and to believe that they’re right.”

Enrico Schaar points out this news article from 2018 by Ashley Feinberg about the New York Times editorial page. Feinberg writes:

In the December meeting, [New York Times editorial page editor James] Bennet described columnists as “people who are paid to have very, very strong convictions, and to believe that they’re right.”

[A.G.] Sulzberger [now publisher of the Times] emphasized the need for columnists, who aren’t heavily edited, to “have everything buttoned up,” though he allowed that “we are not an organization that [has] fact-checkers.”

In the meeting with Bennet, an employee asked how he makes sure his writers aren’t misrepresenting facts. Bennet replied:

You know we do, I mean at a very basic level, we fact-check our work. So, there is a kind of layer there of having a — but the harder question is representation of fact. And that’s where we’re really, you know, are instilling rules of the road and kind of values for how we approach argumentation and hiring for people. And you know, the first-order value is intellectual honesty. And that means — and God knows we don’t succeed at this every day — but the goal is, you’re supposed to take on the hard arguments on the other side, not the easy arguments. Not the straw men, but the actual substantive, kind of toughest arguments and acknowledge when the other side has a point.

Try to make some sense of this word salad! Columnists are paid to believe that they are right, but intellectual honesty is also a first-order value …

But now you know: the NYT pays David Brooks to think that he is right! Nice work, if you can get it.

Actually, I don’t envy David Brooks at all. To make mistake after mistake, and be too insecure to ever admit it . . . that’s gotta be a horrible life. Every time he writes a column with a factual claim, he’s gotta worry that he’ll screw up and never be able to correct himself.

But, yeah, it seems like he’s simpatico with the boss, and that’s important. If the Times job weren’t available, he’d have to get a job with the Hoover Institution, and that would be kinda horrible.

In all seriousness, I strongly doubt there’s anything special about the Times here except that it has a larger influence than most other publications. It’s rare to see a magazine such as Scientific American whose editors really seem to care about correcting their errors.

Authors retract the Nature Communications paper on female mentors

The paper “The association between early career informal mentorship in academic collaborations and junior author performance” that I (Jessica) previously blogged about has been retracted from Nature Communications. 

Here’s the authors’ statement:

The Authors are retracting this Article in response to criticisms about the assumptions underpinning the Article in terms of the identification of mentorship relationships and the measure of mentorship quality, challenging the interpretation of the conclusions. These criticisms were raised by readers and confirmed by three experts post-publication as part of a journal-led investigation.

In this Article, we analysed publication records to identify pairs of junior and senior researchers working in the same discipline, at the same institution, who are co-authors on papers with no more than 20 authors. We use co-authorship, as defined above, as a proxy of mentorship by senior researchers, with the support of a survey that was targeted at a random sample of a recent cohort of researchers. We measure the quality of mentorship using the number of citations and the connectedness of the senior investigators.

The three independent experts commented on the validity of the approaches and the soundness of the interpretation in the Article. They supported previous criticisms in relation to the use of co-authorship as a measure of mentorship. Thus, any conclusions that might be drawn on the basis of co-authorship may not be automatically extended to mentorship. The experts also noted that the operationalisation of mentorship quality was not validated in the paper.

Although we believe that all the key findings of the paper with regards to co-authorship between junior and senior researchers are still valid, given the issues identified by reviewers about the validation of key measures, we have concluded that the most appropriate course of action is to retract the Article.

We are an interdisciplinary team of scientists with an unwavering commitment to gender equity, and a dedication to scientific integrity. Our work was designed to understand factors that influence the scientific impact of those who advance in research careers. We feel deep regret that the publication of our research has both caused pain on an individual level and triggered such a profound response among many in the scientific community. Many women have personally been extremely influential in our own careers, and we express our steadfast solidarity with and support of the countless women who have been a driving force in scientific advancement. We hope the academic debate continues on how to achieve true equity in science–a debate that thrives on robust and vivid scientific exchange.

All Authors agree with the retraction.


When I read the original study, I balked at how most of the implications suggested by the authors assumed a causal link between female mentors and poor career outcomes, at how the authors failed to mention the existing evidence that female authors are associated with lower citations, and at the label “mentorship” being applied so loosely to coauthorship. 

If I’d reviewed the paper, I would have wanted, at the very least, for the authors to soften the claims about mentorship to better acknowledge the limitations of the measurements. But, given that it was published, as I said in comments on the previous post, I didn’t think the paper needed to be retracted, since reading it, it wasn’t hard to tell how they were operationalizing “mentoring” and even the egregious interpretations about women being worse mentors at the end were hedged enough to be identifiable as speculation. I tend to think retraction should be reserved for cases where the discerning reader cannot tell they are being duped, because of fraud, plagiarism, or errors in analyses that they couldn’t be expected to find. 

I guess I’m comfortable with the idea of retraction having high precision but low recall. Otherwise there would be much more to retract than would seem possible. This would probably also open the door for all sorts of heuristic decisions based on political views, etc. in the less clear cut cases, which, at the extreme, could end up disincentivizing work on any controversial topics. I’m not sure how much retraction adds over issuing a correction attached to the original article in many cases, but if we’re going to do it, it makes sense to me to try to keep it relatively rare and our definition as simple as we can. 

Perhaps what makes me most uncomfortable with retractions for reasons like bad speculation is that retracting some, but not all, science with misleading labels or conclusions seems to play into a myth that the scientific record is a reflection of truth and we have to keep it pure. In cases like this one, where a paper makes some seemingly bogus claims but the analysis itself seems to hold, if that paper made it through peer review I’d rather trust the reader to figure out which claims or conclusions are exaggerated than try to protect them from bad speculation. Realistically, we should expect to see sloppy unsubstantiated claims in much of the science we encounter. Corrections should happen a lot more than they do, and we should not treat them like a huge earthshaking deal when authors, or journals, make them. 

So initially, I was glad to see that the official retraction notice is from the authors themselves, not the journal. I assume that they will still publish their analysis elsewhere, without those problematic claims. If they chose to hold themselves to a higher standard, good for them. 

But here’s where the impetus behind this retraction gets slightly confusing. While the retraction notice on the website is written by the authors as though they made the decision, Nature Communications did appoint a three-person committee to do an additional review, after posting a notice on the paper in November saying they were examining it more closely. They describe this process here:

We followed our established editorial processes, which involved recruiting three additional independent experts to evaluate the validity of the approaches and the soundness of the interpretation. They supported previous criticisms and identified further shortcomings in relation to the use of co-authorship as a measure of informal mentorship. They also noted that the operationalisation of mentorship quality, based on the number of citations and network centrality of mentors, was not validated.

According to these criticisms, any conclusions that might be drawn on biases in citations in the context of co-authorship cannot be extended to informal mentorship. As such, the paper’s conclusions in their current form do not stand, and the authors have retracted the paper.

During the investigation, we also received further communications from readers highlighting issues with the paper and are grateful to all the researchers who have contacted us and who have invested their time in reviewing the work.

Simply being uncomfortable with the conclusions of a published paper, would and should not lead to retraction on this basis alone. If the research question is important, and the conclusions sound and valid, however controversial, there can be merit in sharing them with the research community so that a debate can ensue and a range of possible solutions be proposed. In this case, the conclusions turned out not to be supported, and we apologise to the research community for any unintended harm derived from the publication of this paper.

So maybe Nature Communications would have retracted the paper even if the authors didn’t. Or maybe they brought to the authors’ attention that they were considering retracting it, and the authors then felt pressed to do so. I’m not sure.

At any rate, the backlash against the Nature Communications paper made clear to me how ambiguous language can be in cases like this, and also how different people’s views can be about the responsibility that readers should have. I found the liberal use of the terms mentor and mentorship quality throughout the paper very annoying but not fatal, since the details of what was measured were there for the reader to see, and I read enough social science to be accustomed to authors at times adopting shorthand for what they measure, for better or worse. But if you read many of the sentences in the paper without the mental substitution, they are pretty problematic statements, so there is some space for judgment. Perhaps because of this ambiguity in language, the idea of protecting the reader from misleading claims by retracting every paper that someone later points out as making misleading claims seems fraught with challenges to me. 

I’ve also seen some people suggesting it’s unfair for papers like the Nature Comm one to be subject to so much public criticism based on their topics striking a political chord. It seems naive to expect that papers that make strong claims counter to things that many people firmly believe wouldn’t get some extra scrutiny. People simply cared about the topic. I don’t think the paper warranted retraction in this case; something like a statement warning that the measures of mentorship quality weren’t validated would have sufficed. But suggesting that it shouldn’t have gotten so much scrutiny in the first place contributes to a belief that the published record should be venerated. I can sympathize with authors who feel singled out based in part on the topic of the work; I had one of my first published papers ever critiqued by a well-known practitioner in my field. It had won an award, and it made a controversial argument (that sometimes making graphs harder to read is better because it stimulates active cognitive processing on the part of the viewer). Getting critiqued for a paper that I suspected was singled out partly based on the topic wasn’t fun at all. As Andrew recently blogged about, there often seem to be worries in such cases about how public criticism will affect the academic career, and so if the process feels somewhat random it can be unsettling. But there were some valid points made in the critique, as there are in most critiques, and so I learned something, and I assume others did too. The idea that it should have been withheld out of respect or fairness didn’t make sense to me then or now. We shouldn’t be trying to reserve public scrutiny for only the most horrible papers.

Typo of the day


Megan Higgs (statistician) and Anna Dreber (economist) on how to judge the success of a replication

The discussion started with this comment from Megan Higgs regarding a recent science replication initiative:

I [Higgs] was immediately curious about their criteria for declaring a study replicated. In a quick skim of the info in the google form, here it is:

In the survey of beliefs, you will be asked for (a) the probability that each of the 20 studies using each method will be successfully replicated (defined by the finding of a statistically significant result, defined by p < 0.05, in the same direction as in the original study in a meta-analysis of the three replications) and (b) the expected effect size for the replication.

Hmmm….Maybe a first step before going too far with this should be a deeper discussion of how to define “successfully replicated?”
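To make the quoted criterion concrete, here is a minimal sketch in Python (standard library only) of the binary definition: pool the three replications with a fixed-effect inverse-variance meta-analysis, then declare “successful replication” if the pooled effect is in the same direction as the original and significant at p < 0.05. The function names and numbers are hypothetical, and the normal-approximation p-value is my assumption — this is an illustration of the definition, not the project’s actual analysis code.

```python
import math

def fixed_effect_meta(estimates, ses):
    """Inverse-variance-weighted (fixed-effect) pooled estimate and its SE."""
    weights = [1.0 / se**2 for se in ses]
    est = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return est, se

def two_sided_p(est, se):
    """Two-sided p-value for H0: effect = 0, using the normal approximation."""
    z = est / se
    return math.erfc(abs(z) / math.sqrt(2))

def binary_replication(original_estimate, rep_estimates, rep_ses, alpha=0.05):
    """The binary criterion: pooled replication effect is significant at
    alpha AND has the same sign as the original estimate."""
    est, se = fixed_effect_meta(rep_estimates, rep_ses)
    same_direction = (est > 0) == (original_estimate > 0)
    return same_direction and two_sided_p(est, se) < alpha

# Three precise replications with consistent positive effects: "replicated"
print(binary_replication(0.4, [0.35, 0.30, 0.45], [0.10, 0.12, 0.11]))  # True

# Same point estimates but much noisier: p > 0.05, so "not replicated"
print(binary_replication(0.4, [0.35, 0.30, 0.45], [0.6, 0.7, 0.65]))   # False
```

The two calls share identical point estimates and differ only in their standard errors, which illustrates Higgs’s point below: the criterion collapses a continuous question about consistency of results into a single bit.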

Higgs had further discussion in comments with Olavo Amaral (one of the organizers of the project), and then I brought in Anna Dreber, another organizer, who wrote:

In the past when we have set up prediction markets on this binary outcome (effect in the same direction as original result and p<0.05 vs not) or effect sizes (for example in terms of Cohen's d), participants have shied away from the effect size ones and wanted to put their money in the binary ones. What do you think would be a better alternative to the binary one, if not effect sizes? The Brazilian team discussed the “95 percent prediction interval” approach (Patil, Peng and Leek, 2016) but I think that's even more problematic. Or what do you think?

I replied that I’m not sure what the best thing to do is. Maybe make a more complicated bet with a non-binary outcome? People could still bet on “replicate” vs. “non-replicate,” but the range of payoffs could be continuous? I think it’s worth thinking about this—maybe figuring something out and writing a theoretical paper on it—before doing the next replication study.

Dreber responded:

We just closed a new prediction market on effect sizes where we gave participants separate endowments from the binary markets – will see if that encouraged trading. Another thing we saw in the Social Science Replication Project is that various proposed replication indicators led to almost the same outcomes when we had pretty high power. For example, for the studies that did not replicate according to the binary “effect in the same direction as the original result and p<0.05", average relative effect sizes were around 0, whereas for studies that did replicate according to this criterion, average relative effect sizes were around 75%. "We" (this was something Wagenmakers worked on) had similar results with Bayesian mixture models, for example.

And then Higgs wrote:

I really haven’t thought about the market, betting, and crowd-sourcing aspects. I’m coming at this purely from a statistical inference perspective and based on my frustrations and worries around the misuse of statistical summaries (like p-values and point estimates) motivated by trying to make life simpler.

At the risk of preaching to the choir, I’ll just give a little more of my view on the concept of replication. The goal of replicating studies is not to result in a nice, clean binary outcome indicating whether the results match, or not. After a study is replicated (by this I just mean the design and analysis repeated), it is then a serious (and probably difficult in most cases) job to assess the consistency of the results between the first study and the second (taking into account things that changed). This checking of degree of consistency does not necessarily result in a clean binary outcome of either “successful” replication or “not successful” replication. We humans love to force dichotomization on things that are not inherently dichotomous — it brings comfort in terms of making things clearly tractable and simple, and we rarely acknowledge or discuss what we are losing in the process. I am very worried about the implications of tying great efforts, like yours, at promoting the practice of replicating studies to arbitrary p-value thresholds and signs of point estimates — as statisticians continue to argue for fundamental flaws in these overly-simplified approaches. Do we really want to adopt a structure that assumes the results of a repeated study match the first study, or not, with nothing in between? In reality, most situations will take effort to critically assess and come to a reasonable conclusion. I also think we should encourage healthy argument about the degree to which the results are consistent depending on their analysis and interpretation — and such argument would be great for science. Okay — I’ll get off my soap box now, but just wanted to give you a better sense of where I’m coming from.

For your project, I definitely see why the binary distinction seems necessary and why having a well-defined and understood criterion (or set of criteria) is attractive. Maybe the simplest solution for now is simply not to attach the misleading labels of “successful replication” and “unsuccessful replication” to the criterion. We know the criteria you are proposing have serious faults related to assessing whether the results of a repeated study adequately match the results from an original study — and I think wording is incredibly important in this situation. I can get myself to the point where I see the fun in having people predict whether the p-value will be <0.05 and/or whether the sign of some point estimate will be the same. But, I see this more as a potentially interesting look into our misunderstandings and misuses of those criteria, assuming the overly-simplistic binary predictions could be shown against the backdrop of a more holistic and continuous approach to assess the degree to which the results of the two studies are consistent. So, I guess my proposed easy-fix solution is to change the wording. You are not having people predict whether a study will be “successfully replicated” or not, you are having them predict whether the p-values will fall on the same side of an arbitrary threshold and/or whether the point estimate will have the same sign. [emphasis added] This can provide interesting information into the behavior of humans relative to these “standard” criteria that have been used in the past, but will not provide information about how well researchers can predict “successful replication” in general. You might be able to come up with another phrase to capture these criteria? 
With different wording, you are not sending as strong of a message that you believe these are reasonable criteria to judge “successful replication.” I suspect my suggestion is not overly attractive, but I’m trying to throw something out there that is definitely doable and gets around some of the potential negative unintended consequences that I’m worried about. I haven’t thought a lot about the issues with using a percentage summary for average relative effect sizes either, but it would be interesting to look into more. In my opinion, the assessment should really be tied to practical implications of the work and an understanding of the practical meaning represented by different values of the parameter of interest — and this takes knowledge of the research context and potential future implications of the work.

On a related note, I’m not sure why getting consistent results is a “success” in the process of replicating a study — the goal is to repeat a study to gain more information, so it’s not clear to me what degree of consistency, or lack thereof, would qualify as more success or more failure. It seems to me that any outcome is actually a success if the process of repeating the study is undertaken rigorously. I don’t mean to complicate things, but I think it is another something to think about in terms of the subtle messages being sent to researchers all over the world. I think this relates to Andrew’s blog post on replication from yesterday as well. We should try to set up an attitude of gaining knowledge through replicating studies, rather than a structure of success or failure that can seem like an attack on the original study.

I fear that I’m coming across in an annoying lecturing kind of way — it really just stems from years of frustrations working with researchers to try to get them to let go of arbitrary thresholds, etc. and seeing how hard it is for them to push back against how it is now built into the scientific culture in which they are trying to survive. I can vividly picture the meeting where I might discuss how I could work with a researcher to assess consistency of results between two similar studies, and the researcher pushes back with wanting to do something simpler like use your criteria and cites your project as the reason for doing so. This is the reality of what I see happening and my motivation for bringing it up.

A few weeks later, Dreber updated:

The blog post had an effect – the Brazilian team suggested that we rephrase the questions on the markets as “Will the replication of Study X obtain a statistically significant effect in the same direction as the original study?” – so we are avoiding the term “successfully replicated” and making it clear what we are asking for.

And Higgs replied:

I’m really glad it motivated some discussion and change in wording. Given how far they have already come with changing wording, I’m going to throw another suggestion out there. The term “statistically significant” suffers from many of the same issues as “successful replication.” Given our past discussions, I know “p-value < 0.05” is being used as the definition of “statistically significant”, but that is not (and should not be) a broadly accepted universal definition. It’s a slippery slope categorizing results as either significant or not significant (another false dichotomy) — and using a summary statistic that most statisticians agree should not be used in that way. So, my suggestion would be to go further and just replace “statistically significant” with what is meant by it -- “p-value < 0.05”, thus avoiding another big and related issue. So, here is my suggestion: "Will the replication of Study X obtain a [two-sided?] p-value of <0.05 and a point estimate with the same sign as in the original study?” In my mind, the goal is to make it very clear exactly what is being done — and then other people can judge it for what it’s worth without unnecessary jargon that generally has the effect of making things sound more meaningful than they are. I realize it doesn’t sound as attractive, but that’s the reality. This whole exercise feels a little like giving the ingredients for the icing on a cake without divulging the ingredients in the cake itself, but I do think it’s much better than the original version.

One great thing about blogs is we can have these thoughtful discussions. Let’s recall that the topic of replication in science is also full of bad-faith or ignorant Harvard-style arguments such as, “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%.” I’ll keep pounding on that one because (a) it reminds us what we’re having to deal with here, and (b) it’s hard to figure out what the replication rate is if we haven’t even defined what a successful replication is.

Our ridiculous health care system, part 734

I went to get a coronavirus test today. We had to get the test for work, and I had no problem with that. What I did have a problem with was that, to get this test, I needed to make an appointment, fill out three forms and take an online “course” (clicking through a set of slides), print out two receipts, show up at the specified time and go through five different people checking these forms—first the security person at the campus building, then he points me to the person who asks me the time of my appointment and checks my form, then she sends me through the gate (I need my ID for that), then I walk down the hall and someone checks my form again and points me to another person who scans my form and sends me into a large empty room where another person is there to show me a form and give me a sticker, which I then bring to yet another person who takes the sticker and gives me a swab which I move around in my nostrils and put in a test tube. Then I can go.

Here’s the point. If it takes this much paperwork just to get a goddam test, then no wonder we as a country are having problems getting people the vaccine. The paperwork is out of control.

We’ve discussed this before.

P.S. My test results arrived the next day by email. To get to the results, I needed to create an account on some hospital system, enter my name, address, sex, and birthdate, supply a password and two ID questions, then click on two more links to get the actual results.

And here’s the jargon-laden lab report that I received:
Continue reading ‘Our ridiculous health care system, part 734’ »

Most controversial posts of 2020

Last year we posted 635 entries on this blog. Above is a histogram of the number of comments on each of the posts. The bars are each of width 5, except that I made a special bar just for the posts with zero comments. There’s nothing special about zero here; some posts get only 1 or 2 comments, and some happen to get 0. Also, number of comments is not the same as number of views. I don’t have easy access to that sort of blog statistic, which is just as well or I might end up wasting time looking at it.
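The binning scheme described above — bars of width 5, plus a separate bar for posts with exactly zero comments — can be sketched as follows. This is an illustrative reconstruction with made-up comment counts, not the code behind the actual figure; the function name and bin labels are my own.

```python
from collections import Counter

def comment_histogram(counts, width=5):
    """Bin comment counts into bars of the given width, with a separate
    bar for posts that received exactly zero comments."""
    bins = Counter()
    for c in counts:
        if c == 0:
            bins["0"] += 1  # zero gets its own bar, not folded into 1-5
        else:
            lo = ((c - 1) // width) * width + 1  # bins 1-5, 6-10, 11-15, ...
            bins[f"{lo}-{lo + width - 1}"] += 1
    return dict(bins)

print(comment_histogram([0, 0, 2, 5, 6, 11, 3]))
# → {'0': 2, '1-5': 3, '6-10': 1, '11-15': 1}
```

Splitting zero out of the first bar is the design choice in question here: it distinguishes posts that drew no discussion at all from posts that drew just a little.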

In any case, I wasn’t so thrilled with the histogram. I usually am not such a fan of histograms for displaying data, as the histogram involves this extra level of abstraction. Each bar is a category, not a data point. So I tried a time series plot. The posts are time-stamped but I was kind of lazy so I just plotted the number of comments in time order and then labeled the beginning, middle, and end of the time period on the x-axis:

And here’s a list of all of last year’s posts, in decreasing order of number of comments. You can draw your own conclusions from all this.

  • “So the real scandal is: Why did anyone ever listen to this guy?” (636 comments)
  • Concerns with that Stanford study of coronavirus prevalence (431 comments)
  • Coronavirus in Sweden, what’s the story? (315 comments)
  • (Some) forecasting for COVID-19 has failed: a discussion of Taleb and Ioannidis et al. (224 comments)
  • So much of academia is about connections and reputation laundering (224 comments)
  • “I don’t want ‘crowd peer review’ or whatever you want to call it,” he said. “It’s just too burdensome and I’d rather have a more formal peer review process.” (203 comments)
  • Don’t kid yourself. The polls messed up—and that would be the case even if we’d forecasted Biden losing Florida and only barely winning the electoral college (197 comments)
  • Coronavirus Quickies (194 comments)
  • Reverse-engineering the problematic tail behavior of the Fivethirtyeight presidential election forecast (184 comments)
  • Coronavirus Grab Bag: deaths vs qalys, safety vs safety theater, ‘all in this together’, and more. (181 comments)
  • Math error in herd immunity calculation from CNN epidemiology expert (178 comments)
  • Years of Life Lost due to coronavirus (178 comments)
  • Unfair to James Watson? (177 comments)
  • 10 on corona (175 comments)
  • What would it mean to really take seriously the idea that our forecast probabilities were too far from 50%? (174 comments)
  • Facemasks in Germany (170 comments)
  • Coronavirus age-specific fatality ratio, estimated using Stan, and (attempting) to account for underreporting of cases and the time delay to death. Now with data and code. And now a link to another paper (also with data and code). (166 comments)
  • More coronavirus testing results, this time from Los Angeles (165 comments)
  • Hydroxychloroquine update (156 comments)
  • “RA Fisher and the science of hatred” (154 comments)
  • The p-value is 4.76×10^−264 1 in a quadrillion (153 comments)
  • I’m frustrated by the politicization of the coronavirus discussion. Here’s an example: (152 comments)
  • What’s the American Statistical Association gonna say in their Task Force on Statistical Significance and Replicability? (152 comments)
  • Given that 30% of Americans believe in astrology, it’s no surprise that some nontrivial percentage of influential American psychology professors are going to have the sort of attitude toward scientific theory and evidence that would lead them to have strong belief in weak theories supported by no good evidence. (151 comments)
  • Are GWAS studies of IQ/educational attainment problematic? (142 comments)
  • Association for Psychological Science takes a hard stand against criminal justice reform (141 comments)
  • In this particular battle between physicists and economists, I’m taking the economists’ side. (140 comments)
  • Male bisexuality gets Big PNAS Energy (137 comments)
  • “The Evidence and Tradeoffs for a ‘Stay-at-Home’ Pandemic Response: A multidisciplinary review examining the medical, psychological, economic and political impact of ‘Stay-at-Home’ implementation in America” (133 comments)
  • What are the best scientific papers ever written? (132 comments)
  • Would we be better off if randomized clinical trials had never been born? (131 comments)
  • Reverse-engineering priors in coronavirus discourse (131 comments)
  • New report on coronavirus trends: “the epidemic is not under control in much of the US . . . factors modulating transmission such as rapid testing, contact tracing and behavioural precautions are crucial to offset the rise of transmission associated with loosening of social distancing . . .” (130 comments)
  • “For the cost of running 96 wells you can test 960 people and accurate assess the prevalence in the population to within about 1%. Do this at 100 locations around the country and you’d have a spatial map of the extent of this epidemic today. . . and have this data by Monday.” (128 comments)
  • More on martingale property of probabilistic forecasts and some other issues with our election model (124 comments)
  • What can we learn from super-wide uncertainty intervals? (123 comments)
  • Retired computer science professor argues that decisions are being made by “algorithms that are mathematically incapable of bias.” What does this mean? (122 comments)
  • Don’t Hate Undecided Voters (122 comments)
  • “America is used to blaming individuals for systemic problems. Let’s try to avoid that this time.” (122 comments)
  • Association for Psychological Science claims that they can “add our voice and expertise to bring about positive change and to stand against injustice and racism in all forms” . . . but I’m skeptical. (121 comments)
  • University of Washington biostatistician unhappy with ever-changing University of Washington coronavirus projections (118 comments)
  • “What is the conclusion of a clinical trial where p=0.6?” (116 comments)
  • Discussion of uncertainties in the coronavirus mask study leads us to think about some issues . . . (115 comments)
  • The second derivative of the time trend on the log scale (also see P.S.) (115 comments)
  • Literally a textbook problem: if you get a positive COVID test, how likely is it that it’s a false positive? (114 comments)
  • NPR’s gonna NPR (special coronavirus junk science edition) (113 comments)
  • Do we really believe the Democrats have an 88% chance of winning the presidential election? (107 comments)
  • This one’s important: Designing clinical trials for coronavirus treatments and vaccines (106 comments)
  • Hilda Bastian and John Ioannidis on coronavirus decision making; Jon Zelner on virus progression models (106 comments)
  • Holes in Bayesian Statistics (106 comments)
  • Vaccine development as a decision problem (104 comments)
  • Problem of the between-state correlations in the Fivethirtyeight election forecast (103 comments)
  • The Economist not hedging the predictions (102 comments)
  • What about this idea of rapid antigen testing? (101 comments)
  • They want “statistical proof”—whatever that is! (101 comments)
  • No, I don’t believe that claim based on regression discontinuity analysis that . . . (101 comments)
  • Coronavirus: the cathedral or the bazaar, or the cathedral and the bazaar? (101 comments)
  • The seventy two percent solution (to police violence) (99 comments)
  • Coronavirus “hits all the hot buttons” for promoting the scientist-as-hero narrative (cognitive psychology edition) (99 comments)
  • Am I missing something here? This estimate seems off by several orders of magnitude! (98 comments)
  • What is the probability that someone you know will die from COVID-19 this year? (97 comments)
  • Flaxman et al. respond to criticisms of their estimates of effects of anti-coronavirus policies (96 comments)
  • Comparing election outcomes to our forecast and to the previous election (96 comments)
  • 17 state attorney generals, 100 congressmembers, and the Association for Psychological Science walk into a bar (95 comments)
  • Is vs. ought in the study of public opinion: Coronavirus “opening up” edition (95 comments)
  • Big trouble coming with the 2020 Census (94 comments)
  • Regression and Other Stories is available! (92 comments)
  • Where are the famous dogs? Where are the famous animals? (89 comments)
  • Where are the collaborative novels? (87 comments)
  • Thinking about election forecast uncertainty (87 comments)
  • Updates of bad forecasts: Let’s follow them up and see what happened! (85 comments)
  • Coronavirus PANIC news (85 comments)
  • Resolving the cathedral/bazaar problem in coronavirus research (and science more generally): Could we follow the model of genetics research (as suggested by some psychology researchers)? (84 comments)
  • Understanding Janet Yellen (83 comments)
  • Concerns with our Economist election forecast (83 comments)
  • Presidents as saviors vs. presidents as being hired to do a job (83 comments)
  • This controversial hydroxychloroquine paper: What’s Lancet gonna do about it? (83 comments)
  • Hey, I think something’s wrong with this graph! Free copy of Regression and Other Stories to the first commenter who comes up with a plausible innocent explanation of this one. (83 comments)
  • Debate involving a bad analysis of GRE scores (81 comments)
  • “Stay-at-home” behavior: A pretty graph but I have some questions (81 comments)
  • New analysis of excess coronavirus mortality; also a question about poststratification (81 comments)
  • Some wrong lessons people will learn from the president’s illness, hospitalization, and expected recovery (80 comments)
  • What happens to the median voter when the electoral median is at 52/48 rather than 50/50? (79 comments)
  • Moving blog to twitter (79 comments)
  • Estimating efficacy of the vaccine from 95 true infections (78 comments)
  • Is there a middle ground in communicating uncertainty in election forecasts? (77 comments)
  • Know your data, recode missing data codes (77 comments)
  • Conflicting public attitudes on redistribution (77 comments)
  • What’s Google’s definition of retractable? (76 comments)
  • OK, here’s a hierarchical Bayesian analysis for the Santa Clara study (and other prevalence studies in the presence of uncertainty in the specificity and sensitivity of the test) (75 comments)
  • Causal inference in AI: Expressing potential outcomes in a graphical-modeling framework that can be fit using Stan (74 comments)
  • Which experts should we trust? (73 comments)
  • “No one is going to force you to write badly. In the long run, you won’t even be rewarded for it. But, unfortunately, it is true that they’ll often let you get away with it.” (73 comments)
  • A better way to visualize the spread of coronavirus in different countries? (73 comments)
  • What went wrong with the polls in 2020? Another example. (72 comments)
  • The Pfizer-Biontech Vaccine May Be A Lot More Effective Than You Think? (72 comments)
  • So, what’s with that claim that Biden has a 96% chance of winning? (some thoughts with Josh Miller) (72 comments)
  • More on that Fivethirtyeight prediction that Biden might only get 42% of the vote in Florida (72 comments)
  • FDA statistics scandal update (71 comments)
  • Who were the business superstars of the 1970s? (71 comments)
  • Imperial College report on Italy is now up (71 comments)
  • Blog about a column about the Harper’s letter: Here’s some discourse about a discourse about what happens when the discourse takes precedence over reality (70 comments)
  • Comments on the new election forecast (69 comments)
  • The history of low-hanging intellectual fruit (69 comments)
  • New York coronavirus antibody study: Why I had nothing to say to the press on this one. (69 comments)
  • Why it can be rational to vote (68 comments)
  • That “not a real doctor” thing . . . It’s kind of silly for people to think that going to medical school for a few years will give you the skills necessary to be able to evaluate research claims in medicine or anything else. (68 comments)
  • Parking lot statistics—a story in three parts (68 comments)
  • RCT on use of cloth vs surgical masks (68 comments)
  • “How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From Coronavirus Perspective” (68 comments)
  • One dose or two? This epidemiologist suggests we should follow Bugs Bunny and go for two. (67 comments)
  • Calibration and recalibration. And more recalibration. IHME forecasts by publication date (67 comments)
  • New coronavirus forecasting model (67 comments)
  • Thomas Basbøll will like this post (analogy between common—indeed, inevitable—mistakes in drawing, and inevitable mistakes in statistical reasoning). (66 comments)
  • The War on Data: Now we play the price (66 comments)
  • Statistical controversy on estimating racial bias in the criminal justice system (66 comments)
  • Why X’s think they’re the best (66 comments)
  • How scientists perceive advancement of knowledge from conflicting review reports (66 comments)
  • The best coronavirus summary so far (66 comments)
  • Hey! Let’s check the calibration of some coronavirus forecasts. (66 comments)
  • It’s kinda like phrenology but worse. Not so good for the “Nature” brand name, huh? Measurement, baby, measurement. (65 comments)
  • Randomized but unblinded experiment on vitamin D as a coronavirus treatment. Let’s talk about what comes next. (Hint: it involves multilevel models.) (65 comments)
  • No, they won’t share their data. (65 comments)
  • Are female scientists worse mentors? This study pretends to know (64 comments)
  • “Stop me if you’ve heard this one before: Ivy League law professor writes a deepthoughts think piece explaining a seemingly irrational behavior that doesn’t actually exist.” (64 comments)
  • We taught a class using Zoom yesterday. Here’s what we learned. (64 comments)
  • Are we constantly chasing after these population-level effects of these non-pharmaceutical interventions that are hard to isolate when there are many good reasons to believe in their efficacy in the first instance? (62 comments)
  • Decision-making under uncertainty: heuristics vs models (62 comments)
  • So . . . what about that claim that probabilistic election forecasts depress voter turnout? (62 comments)
  • MIT’s science magazine misrepresents critics of Stanford study (61 comments)
  • bla bla bla PEER REVIEW bla bla bla (61 comments)
  • Are informative priors “[in]compatible with standards of research integrity”? Click to find out!! (61 comments)
  • Coronavirus model update: Background, assumptions, and room for improvement (61 comments)
  • The Paterno Defence: Gladwell’s Tipping Point? (61 comments)
  • “Psychology’s Zombie Ideas” (60 comments)
  • Junk Science Then and Now (60 comments)
  • Cops’ views (59 comments)
  • Expert writes op-ed in NYT recommending that we trust the experts (59 comments)
  • Against overly restrictive definitions: No, I don’t think it helps to describe Bayes as “the analysis of subjective beliefs” (nor, for that matter, does it help to characterize the statements of Krugman or Mankiw as not being “economics”) (58 comments)
  • This one’s for the Lancet editorial board: A trolley problem for our times (involving a plate of delicious cookies and a steaming pile of poop) (58 comments)
  • Understanding the “average treatment effect” number (57 comments)
  • Fake MIT journalists misrepresent real Buzzfeed journalist. (Maybe we shouldn’t be so surprised?) (57 comments)
  • Information or Misinformation During a Pandemic: Comparing the effects of following Nassim Taleb, Richard Epstein, or Cass Sunstein on twitter. (57 comments)
  • Putting Megan Higgs and Thomas Basbøll in the room together (57 comments)
  • Steven Pinker on torture (57 comments)
  • No, I don’t think that this study offers good evidence that installing air filters in classrooms has surprisingly large educational benefits. (57 comments)
  • How to think about extremely unlikely events (such as Biden winning Alabama, Trump winning California, or Biden winning Ohio but losing the election)? (56 comments)
  • In case you’re wondering . . . this is why the U.S. health care system is the most expensive in the world (56 comments)
  • This is not a post about remdesivir. (56 comments)
  • Do these data suggest that UPS, Amazon, etc., should be quarantining packages? (56 comments)
  • Does this fallacy have a name? (55 comments)
  • The challenge of fitting “good advice” into a coherent course on statistics (55 comments)
  • “Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe” (55 comments)
  • Updated Santa Clara coronavirus report (55 comments)
  • Somethings do not seem to spread easily – the role of simulation in statistical practice and perhaps theory. (54 comments)
  • 2 perspectives on the relevance of social science to our current predicament: (1) social scientists should back off, or (2) social science has a lot to offer (54 comments)
  • How to get out of the credulity rut (regression discontinuity edition): Getting beyond whack-a-mole (54 comments)
  • BMJ update: authors reply to our concerns (but I’m not persuaded) (53 comments)
  • “Sorry, there is no peer review to display for this article.” Huh? Whassup, BMJ? (53 comments)
  • No, It’s Not a Prisoner’s Dilemma (the second in a continuing series): (52 comments)
  • “Inferring the effectiveness of government interventions against COVID-19” (52 comments)
  • Coding and drawing (52 comments)
  • Resolving confusions over that study of “Teacher Effects on Student Achievement and Height” (52 comments)
  • An open letter expressing concerns regarding the statistical analysis and data integrity of a recently published and publicized paper (52 comments)
  • Career advice for a future statistician (52 comments)
  • Quine’s be Quining (51 comments)
  • Political polarization of professions (51 comments)
  • Econ grad student asks, “why is the government paying us money, instead of just firing us all?” (51 comments)
  • “As a girl, she’d been very gullible, but she had always learned more that way.” (51 comments)
  • Four projects in the intellectual history of quantitative social science (51 comments)
  • “Dream Investigation Results: Official Report by the Minecraft Speedrunning Team” (50 comments)
  • Probabilistic forecasts cause general misunderstanding. What to do about this? (50 comments)
  • But the top graph looked like such strong evidence! (50 comments)
  • “Curing Coronavirus Isn’t a Job for Social Scientists” (50 comments)
  • 2 econ Nobel prizes, 1 error (49 comments)
  • No, average statistical power is not as high as you think: Tracing a statistical error as it spreads through the literature (49 comments)
  • What a difference a month makes (polynomial extrapolation edition) (49 comments)
  • “In the world of educational technology, the future actually is what it used to be” (48 comments)
  • Post-election post (48 comments)
  • Simple Bayesian analysis inference of coronavirus infection rate from the Stanford study in Santa Clara county (48 comments)
  • Negativity (when applied with rigor) requires more care than positivity. (47 comments)
  • Mike Pence and Rush Limbaugh on smoking, cancer, and the coronavirus (47 comments)
  • Linear or logistic regression with binary outcomes (47 comments)
  • The importance of descriptive social science and its relation to causal inference and substantive theories (46 comments)
  • Drunk-under-the-lamppost testing (46 comments)
  • Can we stop talking about how we’re better off without election forecasting? (45 comments)
  • Calibration problem in tails of our election forecast (45 comments)
  • How should those Lancet/Surgisphere/Harvard data have been analyzed? (45 comments)
  • Alexey Guzey’s sleep deprivation self-experiment (45 comments)
  • Some recommendations for design and analysis of clinical trials, with application to coronavirus (45 comments)
  • The NeurIPS 2020 broader impacts experiment (44 comments)
  • UX issues around voting (44 comments)
  • Thank you, James Watson. Thank you, Peter Ellis. (Lancet: You should do the right thing and credit them for your retraction. Actually, do one better and invite them to write a joint editorial in your journal.) (44 comments)
  • Who are you gonna believe, me or your lying eyes? (43 comments)
  • Don’t say your data “reveal quantum nature of human judgments.” Be precise and say your data are “consistent with a quantum-inspired model of survey responses.” Yes, then your paper might not appear in PNAS, but you’ll feel better about yourself in the morning. (43 comments)
  • “The Generalizability Crisis” in the human sciences (43 comments)
  • Statistics is hard, especially if you don’t know any statistics (FDA edition) (42 comments)
  • Shortest posterior intervals (42 comments)
  • Election 2020 is coming: Our poll aggregation model with Elliott Morris of the Economist (42 comments)
  • “Banishing ‘Black/White Thinking’: A Trio of Teaching Tricks” (42 comments)
  • Stan pedantic mode (42 comments)
  • I’m still struggling to understand hypothesis testing . . . leading to a more general discussion of the role of assumptions in statistics (42 comments)
  • Just some numbers from Canada (42 comments)
  • Bishops of the Holy Church of Embodied Cognition and editors of the Proceedings of the National Academy of Christ (41 comments)
  • (1) The misplaced burden of proof, and (2) selection bias: Two reasons for the persistence of hype in tech and science reporting (41 comments)
  • Coronavirus disparities in Palestine and in Michigan (41 comments)
  • Some thoughts inspired by Lee Cronbach (1975), “Beyond the two disciplines of scientific psychology” (41 comments)
  • Some thoughts on another failed replication in psychology (41 comments)
  • Some of you must have an idea of the answer to this one. (41 comments)
  • Interesting y-axis (41 comments)
  • Some questions from high school students about college and future careers (40 comments)
  • “The Intellectuals and the Masses” (40 comments)
  • “Positive Claims get Publicity, Refutations do Not: Evidence from the 2020 Flu” (40 comments)
  • New dataset: coronavirus tracking using data from smart thermometers (40 comments)
  • Is there any scientific evidence that humans don’t like uncertainty? (40 comments)
  • Votes vs. $ (40 comments)
  • Does regression discontinuity (or, more generally, causal identification + statistical significance) make you gullible? (39 comments)
  • Risk aversion is not a thing (39 comments)
  • Getting all negative about so-called average power (39 comments)
  • More than one, always more than one to address the real uncertainty. (39 comments)
  • Pandemic cats following social distancing (39 comments)
  • A new hot hand paradox (38 comments)
  • “Postmortem of a replication drama in computer science” (38 comments)
  • Low rate of positive coronavirus tests (38 comments)
  • Can someone build a Bayesian tool that takes into account your symptoms and where you live to estimate your probability of having coronavirus? (38 comments)
  • The intellectual explosion that didn’t happen (38 comments)
  • What are my statistical principles? (37 comments)
  • BMJ FAIL: The system is broken. (Some reflections on bad research, scientism, the importance of description, and the challenge of negativity) (37 comments)
  • The value of thinking about varying treatment effects: coronavirus example (37 comments)
  • “The good news about this episode is that it’s kinda shut up those people who were criticizing that Stanford antibody study because it was an un-peer-reviewed preprint. . . .” and a P.P.P.S. with Paul Alper’s line about the dead horse (37 comments)
  • Last post on hydroxychloroquine (perhaps) (37 comments)
  • Doubts about that article claiming that hydroxychloroquine/chloroquine is killing people (37 comments)
  • If the outbreak ended, does that mean the interventions worked? (Jon Zelner talk tomorrow) (37 comments)
  • “1919 vs. 2020” (37 comments)
  • Vaping statistics controversy update: A retraction and some dispute (37 comments)
  • Researcher offers ridiculous reasons for refusing to reassess work in light of serious criticism (37 comments)
  • Forget about multiple testing corrections. Actually, forget about hypothesis testing entirely. (37 comments)
  • Advice for a yoga studio that wants to reopen? (36 comments)
  • Harvard-laundering (the next stage of the Lancet scandal) (36 comments)
  • Breaking the feedback loop: When people don’t correct their errors (36 comments)
  • Woof! for descriptive statistics (36 comments)
  • “New research suggests Sanders would drive swing voters to Trump — and need a youth turnout miracle to compensate.” (36 comments)
  • Evidence-based medicine eats itself (36 comments)
  • “We’ve got to look at the analyses, the real granular data. It’s always tough when you’re looking at a press release to figure out what’s going on.” (35 comments)
  • My proposal is to place criticism within the scientific, or social-scientific, enterprise, rather than thinking about it as something coming from outside, or as something that is tacked on at the end. (35 comments)
  • Reasoning under uncertainty (35 comments)
  • Considerate Swedes only die during the week. (35 comments)
  • “Older Americans are more worried about coronavirus — unless they’re Republican” (35 comments)
  • What do Americans think about coronavirus restrictions? Let’s see what the data say . . . (34 comments)
  • “I Can’t Believe It’s Not Better” (34 comments)
  • The view that the scientific process is “red tape,” just a bunch of hoops you need to jump through so you can move on with your life (34 comments)
  • Coronavirus jailbreak (34 comments)
  • The value (or lack of value) of preregistration in the absence of scientific theory (34 comments)
  • Deterministic thinking meets the fallacy of the one-sided bet (33 comments)
  • Derived quantities and generative models (33 comments)
  • Update on social science debate about measurement of discrimination (33 comments)
  • Surgisphere scandal: Lancet still doesn’t get it (33 comments)
  • You don’t want a criminal journal… you want a criminal journal (33 comments)
  • Is causality as explicit in fake data simulation as it should be? (32 comments)
  • How much of public health work “involves not technology but methodicalness and record keeping”? (32 comments)
  • Challenges to the Reproducibility of Machine Learning Models in Health Care; also a brief discussion about not overrating randomized clinical trials (32 comments)
  • Baby alligators: Adorable, deadly, or endangered? You decide. (32 comments)
  • Updated Imperial College coronavirus model, including estimated effects on transmissibility of lockdown, social distancing, etc. (32 comments)
  • Are we ready to move to the “post p < 0.05 world”? (32 comments)
  • No, I don’t believe etc etc., even though they did a bunch of robustness checks. (31 comments)
  • Coronavirus dreams (31 comments)
  • “The Taboo Against Explicit Causal Inference in Nonexperimental Psychology” (31 comments)
  • Please socially distance me from this regression model! (31 comments)
  • Is JAMA potentially guilty of manslaughter? (31 comments)
  • Is data science a discipline? (31 comments)
  • Abuse of expectation notation (31 comments)
  • The latest Perry Preschool analysis: Noisy data + noisy methods + flexible summarizing = Big claims (31 comments)
  • What are the most important statistical ideas of the past 50 years? (30 comments)
  • How to think about correlation? It’s the slope of the regression when x and y have been standardized. (30 comments)
  • “Fake Facts in Covid-19 Science: Kentucky vs. Tennessee.” (30 comments)
  • Their findings don’t replicate, but they refuse to admit they might’ve messed up. (We’ve seen this movie before.) (30 comments)
  • The typical set and its relevance to Bayesian computation (30 comments)
  • In Bayesian inference, do people cheat by rigging the prior? (30 comments)
  • My theory of why TV sports have become less popular (29 comments)
  • An example of a parallel dot plot: a great way to display many properties of a list of items (29 comments)
  • “MIT Built a Theranos for Plants” (29 comments)
  • Who was the first literary schlub? (29 comments)
  • “Frequentism-as-model” (29 comments)
  • The “scientist as hero” narrative (29 comments)
  • In Bayesian priors, why do we use soft rather than hard constraints? (29 comments)
  • They added a hierarchical structure to their model and their parameter estimate changed a lot: How to think about this? (29 comments)
  • Merlin did some analysis of possible electoral effects of rejections of vote-by-mail ballots . . . (28 comments)
  • Everything that can be said can be said clearly. (28 comments)
  • Statistics controversies from the perspective of industrial statistics (28 comments)
  • The return of the red state blue state fallacy (28 comments)
  • How the election might have looked in a world without polls (27 comments)
  • All maps of parameter estimates are (still) misleading (27 comments)
  • Reference for the claim that you need 16 times as much data to estimate interactions as to estimate main effects (27 comments)
  • Florida. Comparing Economist and Fivethirtyeight forecasts. (27 comments)
  • Kafka comes to the visa office (27 comments)
  • Super-duper online matrix derivative calculator vs. the matrix normal (for Stan) (27 comments)
  • We need better default plots for regression. (27 comments)
  • Hey, you. Yeah, you! Stop what you’re doing RIGHT NOW and read this Stigler article on the history of robust statistics (27 comments)
  • Number of deaths or number of deaths per capita (27 comments)
  • Stasi’s back in town. (My last post on Cass Sunstein and Richard Epstein.) (27 comments)
  • Basbøll’s Audenesque paragraph on science writing, followed by a resurrection of a 10-year-old debate on Gladwell (26 comments)
  • Between-state correlations and weird conditional forecasts: the correlation depends on where you are in the distribution (26 comments)
  • His data came out in the opposite direction of his hypothesis. How to report this in the publication? (26 comments)
  • Here’s a question for the historians of science out there: How modern is the idea of a scientific “anomaly”? (26 comments)
  • “Why do the results of immigrant students depend so much on their country of origin and so little on their country of destination?” (26 comments)
  • Uncertainty and variation as distinct concepts (26 comments)
  • An article in a statistics or medical journal, “Using Simulations to Convince People of the Importance of Random Variation When Interpreting Statistics.” (26 comments)
  • Is it really true that candidates who are perceived as ideologically extreme do even worse if “they actually pose as more radical than they really are”? (26 comments)
  • I ain’t the lotus (25 comments)
  • “A better way to roll out Covid-19 vaccines: Vaccinate everyone in several hot zones”? (25 comments)
  • Here’s why rot13 text looks so cool. (25 comments)
  • Whassup with the dots on our graph? (25 comments)
  • The NBA strike and what does it take to keep stories in the news (25 comments)
  • Can the science community help journalists avoid science hype? It won’t be easy. (25 comments)
  • Probabilities for action and resistance in Blades in the Dark (25 comments)
  • How good is the Bayes posterior for prediction really? (25 comments)
  • How to describe Pfizer’s beta(0.7, 1) prior on vaccine effect? (24 comments)
  • I like this way of mapping electoral college votes (24 comments)
  • Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph? (24 comments)
  • Association Between Universal Curve Fitting in a Health Care Journal and Journal Acceptance Among Health Care Researchers (24 comments)
  • “In any case, we have a headline optimizer that A/B tests different headlines . . .” (24 comments)
  • Marc Hauser: Victim of statistics? (24 comments)
  • How many patients do doctors kill by accident? (24 comments)
  • Why I Rant (24 comments)
  • IEEE’s Refusal to Issue Corrections (23 comments)
  • 53 fever! (23 comments)
  • Tracking R of COVID-19 & assessing public interventions; also some general thoughts on science (23 comments)
  • The Shrinkage Trilogy: How to be Bayesian when analyzing simple experiments (22 comments)
  • Follow-up on yesterday’s posts: some maps are less misleading than others. (22 comments)
  • A question of experimental design (more precisely, design of data collection) (22 comments)
  • “Figure 1 looks like random variation to me” . . . indeed, so it does. And Figure 2 as well! But statistical significance was found, so this bit of randomness was published in a top journal. Business as usual in the statistical-industrial complex. Still, I’d hope the BMJ could’ve done better. (22 comments)
  • “To Change the World, Behavioral Intervention Research Will Need to Get Serious About Heterogeneity” (22 comments)
  • Election odds update (Biden still undervalued but not by so much) (22 comments)
  • Computer-generated writing that looks real; real writing that looks computer-generated (22 comments)
  • Three unblinded mice (21 comments)
  • We want certainty even when it’s not appropriate (21 comments)
  • The flashy crooks get the headlines, but the bigger problem is everyday routine bad science done by non-crooks (21 comments)
  • Priors on effect size in A/B testing (21 comments)
  • We need to practice our best science hygiene. (21 comments)
  • Naming conventions for variables, functions, etc. (21 comments)
  • Different challenges in replication in biomedical vs. social sciences (21 comments)
  • Intended consequences are the worst (21 comments)
  • The fallacy of the excluded rationality (21 comments)
  • No, this senatorial stock-picking study does not address concerns about insider trading: (20 comments)
  • “It’s turtles for quite a way down, but at some point it’s solid bedrock.” (20 comments)
  • The rise and fall and rise of randomized controlled trials (RCTs) in international development (20 comments)
  • Why is this graph actually ok? It’s the journey, not just the destination. (20 comments)
  • Body language and machine learning (20 comments)
  • Estimated “house effects” (biases of pre-election surveys from different pollsters) and here’s why you have to be careful not to overinterpret them: (20 comments)
  • “Congressional Representation: Accountability from the Constituent’s Perspective” (20 comments)
  • From monthly return rate to importance sampling to path sampling to the second law of thermodynamics to metastable sampling in Stan (20 comments)
  • Bolivia election fraud fraud update (20 comments)
  • The New Yorker fiction podcast: how it’s great and how it could be improved (20 comments)
  • David Leavitt and Meg Wolitzer (20 comments)
  • Top 5 literary descriptions of poker (20 comments)
  • Open forensic science, and some general comments on the problems of legalistic thinking when discussing open science (20 comments)
  • Bayesian Workflow (19 comments)
  • “this large reduction in response rats” (19 comments)
  • Varimax: Sure, it’s always worked but now there’s maths! (19 comments)
  • New England Journal of Medicine engages in typical academic corporate ass-covering behavior (19 comments)
  • Roll Over Mercator: Awesome map shows the unreasonable effectiveness of mixture models (19 comments)
  • Retraction of racial essentialist article that appeared in Psychological Science (19 comments)
  • How unpredictable is the 2020 election? (19 comments)
  • Fit nonlinear regressions in R using stan_nlmer (19 comments)
  • Birthdays! (19 comments)
  • “Repeating the experiment” as general advice on data collection (19 comments)
  • Of Manhattan Projects and Moonshots (19 comments)
  • Greek statistician is in trouble for . . . telling the truth! (18 comments)
  • Stephen Wolfram invented a time machine but has been too busy to tell us about it (18 comments)
  • Sleep injury spineplot (18 comments)
  • Battle of the open-science asymmetries (18 comments)
  • Do we trust this regression? (18 comments)
  • The checklist manifesto and beyond (18 comments)
  • This study could be just fine, or not. Maybe I’ll believe it if there’s an independent preregistered replication. (18 comments)
  • BREAKING: MasterClass Announces NEW Class on Science of Sleep by Neuroscientist & Sleep Expert Matthew Walker – Available NOW (17 comments)
  • More on the Heckman curve (17 comments)
  • Pre-register post-election analyses? (17 comments)
  • She’s wary of the consensus based transparency checklist, and here’s a paragraph we should’ve added to that zillion-authored paper (17 comments)
  • Covid-19 -> Kovit-17 (following the himmicanes principle) (17 comments)
  • Automatic data reweighting! (17 comments)
  • Theorizing, thought experiments, fake-data simulation (17 comments)
  • Further debate over mindset interventions (17 comments)
  • Ugly code is buggy code (17 comments)
  • The two most important formulas in statistics (17 comments)
  • Create your own community (if you need to) (17 comments)
  • Estimating the mortality rate from corona? (17 comments)
  • How much of Trump’s rising approval numbers can be attributed to differential nonresponse? P.S. With more analysis of recent polls from Jacob Long (17 comments)
  • Royal Society spam & more (17 comments)
  • Response to a question about a reference in one of our papers (16 comments)
  • What does it take to be omniscient? (16 comments)
  • The turtles stop here. Why we meta-science: a meta-meta-science manifesto (16 comments)
  • “The Moral Economy of Science” (16 comments)
  • Here’s what academic social, behavioral, and economic scientists should be working on right now. (16 comments)
  • Make Andrew happy with one simple ggplot trick (16 comments)
  • Conference on Mister P online tomorrow and Saturday, 3-4 Apr 2020 (16 comments)
  • My best thoughts on priors (16 comments)
  • MRP Carmelo Anthony update . . . Trash-talking’s fine. But you gotta give details, or links, or something! (16 comments)
  • Which teams have fewer fans than their namesake? I pretty much like this person’s reasoning except when we get to the chargers and raiders. (16 comments)
  • Is it accurate to say, “Politicians Don’t Actually Care What Voters Want”? (16 comments)
  • Of book reviews and selection bias (16 comments)
  • More limitations of cross-validation and actionable recommendations (15 comments)
  • “Time Travel in the Brain” (15 comments)
  • One more Bolivia election fraud fraud thing (15 comments)
  • Using the rank-based inverse normal transformation (15 comments)
  • Come up with a logo for causal inference! (15 comments)
  • How to “cut” using Stan, if you must (15 comments)
  • This graduate student wants to learn statistics to be a better policy analyst (15 comments)
  • Don’t ever change, social psychology! You’re perfect just the way you are (14 comments)
  • Rob Kass: “The truth of a theory is contingent on both our state of knowledge and the purposes to which it will be put.” (14 comments)
  • David Spiegelhalter wants a checklist for quality control of statistical models? (14 comments)
  • Heckman Curve Update Update (14 comments)
  • Dispelling confusion about MRP (multilevel regression and poststratification) for survey analysis (14 comments)
  • If variation in effects is so damn important and so damn obvious, why do we hear so little about it? (14 comments)
  • Conway II (14 comments)
  • Webinar on approximate Bayesian computation (14 comments)
  • Noise-mining as standard practice in social science (14 comments)
  • He’s annoyed that PNAS desk-rejected his article. (14 comments)
  • A factor of 40 speed improvement . . . that’s not something that happens every day! (14 comments)
  • Advice for a Young Economist at Heart (14 comments)
  • Unlike MIT, Scientific American does the right thing and flags an inaccurate and irresponsible article that they mistakenly published. Here’s the story: (13 comments)
  • Today in spam (13 comments)
  • Bayesian Workflow (my talk this Wed at Criteo) (13 comments)
  • What can be our goals, and what is too much to hope for, regarding robust statistical procedures? (13 comments)
  • Age-period-cohort analysis. (13 comments)
  • Let’s do preregistered replication studies of the cognitive effects of air pollution—not because we think existing studies are bad, but because we think the topic is important and we want to understand it better. (13 comments)
  • “Lessons from First Online Teaching Experience after COVID-19 Regulations” (13 comments)
  • Monte Carlo and winning the lottery (13 comments)
  • Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond (12 comments)
  • The 200-year-old mentor (12 comments)
  • As a forecaster, how important is it to “have a few elections under your belt”? (12 comments)
  • Fiction as a window into other cultures (12 comments)
  • Quino y Mafalda (12 comments)
  • “Pictures represent facts, stories represent acts, and models represent concepts.” (12 comments)
  • Rethinking Rob Kass’ recent talk on science in a less statistics-centric way. (12 comments)
  • Hilarious reply-all loop (12 comments)
  • Hey, this was an unusual media request (12 comments)
  • Be careful when estimating years of life lost: quick-and-dirty estimates of attributable risk are, well, quick and dirty. (12 comments)
  • Best econ story evah (12 comments)
  • John Conway (12 comments)
  • Model building is Lego, not Playmobil. (toward understanding statistical workflow) (12 comments)
  • Bernie electability update (12 comments)
  • As usual, I agree with Paul Meehl: “It is not a reform of significance testing as currently practiced in soft-psych. We are making a more heretical point than any of these: We are attacking the whole tradition of null-hypothesis refutation as a way of appraising theories.” (12 comments)
  • What can we do with complex numbers in Stan? (12 comments)
  • Calling all cats (12 comments)
  • When I was asked, Who do you think is most likely to win the Democratic nomination?, this is how I responded . . . (12 comments)
  • The importance of measurement in psychology (12 comments)
  • How many infectious people are likely to show up at an event? (11 comments)
  • The likelihood principle in model check and model evaluation (11 comments)
  • From the Archives of Psychological Science (11 comments)
  • Sh*ttin brix in the tail… (11 comments)
  • Social science and the replication crisis (my talk this Thurs 8 Oct) (11 comments)
  • The U.S. high school math olympiad champions of the 1970s and 1980s: Where were they then? (11 comments)
  • Getting negative about the critical positivity ratio: when you talk about throwing out the bathwater, really throw out the bathwater! Don’t try to pretend it has some value. Give it up. Let it go. You can do this and still hold on to the baby at the same time! (11 comments)
  • This one quick trick will allow you to become a star forecaster (11 comments)
  • They want open peer review for their paper, and they want it now. Any suggestions? (11 comments)
  • Bayesian analysis of Santa Clara study: Run it yourself in Google Collab, play around with the model, etc! (11 comments)
  • BDA FREE (Bayesian Data Analysis now available online as pdf) (11 comments)
  • “Estimating Covid-19 prevalence using Singapore and Taiwan” (11 comments)
  • Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification (11 comments)
  • The Road Back (11 comments)
  • Don’t talk about hypotheses as being “either confirmed, partially confirmed, or rejected” (11 comments)
  • My review of Ian Stewart’s review of my review of his book (11 comments)
  • “End of novel. Beginning of job.”: That point at which you make the decision to stop thinking and start finishing (10 comments)
  • If— (10 comments)
  • You can figure out the approximate length of our blog lag now. (10 comments)
  • A very short statistical consulting story (10 comments)
  • Further formalization of the “multiverse” idea in statistical modeling (10 comments)
  • Bees have five eyes (10 comments)
  • “Everybody wants to be Jared Diamond” (10 comments)
  • Information, incentives, and goals in election forecasts (10 comments)
  • Taking the bus (10 comments)
  • Why we kept the trig in golf: Mathematical simplicity is not always the same as conceptual simplicity (10 comments)
  • “I just wanted to say that for the first time in three (4!?) years of efforts, I have a way to estimate my model. . . .” (10 comments)
  • More on absolute error vs. relative error in Monte Carlo methods (10 comments)
  • Himmicanes again (10 comments)
  • “Which, in your personal judgment, is worse, if you could only choose ONE? — (a) A homosexual (b) A doctor who refuses to make a house call to someone seriously ill?” (10 comments)
  • This one’s important: Bayesian workflow for disease transmission modeling in Stan (10 comments)
  • New Within-Chain Parallelisation in Stan 2.23: This One’s Easy for Everyone! (10 comments)
  • And the band played on: Low quality studies being published on Covid19 prediction. (10 comments)
  • Why We Sleep—a tale of non-replication. (9 comments)
  • Update on IEEE’s refusal to issue corrections (9 comments)
  • Publishing in Antarctica (9 comments)
  • An odds ratio of 30, which they (sensibly) don’t believe (9 comments)
  • If something is being advertised as “incredible,” it probably is. (9 comments)
  • Bill James is back (9 comments)
  • “100 Stories of Causal Inference”: My talk tomorrow at the Online Causal Inference Seminar (9 comments)
  • On deck through Jan 2021 (9 comments)
  • No, there is no “tension between getting it fast and getting it right” (9 comments)
  • Improving our election poll aggregation model (9 comments)
  • Two good news articles on trends in baseball analytics (9 comments)
  • Faster than ever before: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation (9 comments)
  • Statistical Workflow and the Fractal Nature of Scientific Revolutions (my talk this Wed at the Santa Fe Institute) (9 comments)
  • Blast from the past (9 comments)
  • Standard deviation, standard error, whatever! (9 comments)
  • It’s “a single arena-based heap allocation” . . . whatever that is! (9 comments)
  • Rodman (9 comments)
  • Controversy regarding the effectiveness of Remdesivir (9 comments)
  • Himmicanes! (9 comments)
  • Upholding the patriarchy, one blog post at a time (9 comments)
  • “Non-disclosure is not just an unfortunate, but unfixable, accident. A methodology can be disclosed at any time.” (9 comments)
  • Conditioning on a statistical method as a “meta” version of conditioning on a statistical model (9 comments)
  • The hot hand fallacy fallacy rears its ugly ugly head (9 comments)
  • Rao-Blackwellization and discrete parameters in Stan (9 comments)
  • Are the tabloids better than we give them credit for? (9 comments)
  • Graphs of school shootings in the U.S. (9 comments)
  • How science and science communication really work: coronavirus edition (8 comments)
  • Stop-and-frisk data (8 comments)
  • “Day science” and “Night science” are the same thing—if done right! (8 comments)
  • Coronavirus corrections, data sources, and issues. (8 comments)
  • A Collection of Word Oddities and Trivia (8 comments)
  • Misleading vote reporting (8 comments)
  • Fugitive and cloistered virtues (8 comments)
  • Correctness (8 comments)
  • Progress in the past decade (8 comments)
  • Prediction markets and election forecasts (7 comments)
  • “Model takes many hours to fit and chains don’t converge”: What to do? My advice on first steps. (7 comments)
  • Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors (7 comments)
  • The point here is not the face masks; it’s the impossibility of assumption-free causal inference when the different treatments are entangled in this way. (7 comments)
  • “Worthwhile content in PNAS” (7 comments)
  • The Fall Guy, by James Lasdun (7 comments)
  • Structural equation modeling and Stan (7 comments)
  • “Sometimes research just has to start somewhere, and subject itself to criticism and potential improvement.” (7 comments)
  • “It just happens to be in the nature of knowledge that it cannot be conserved if it does not grow.” (7 comments)
  • Pocket Kings by Ted Heller (7 comments)
  • On deck for the first half of 2020 (7 comments)
  • Red Team prepublication review update (6 comments)
  • What George Michael’s song Freedom! was really about (6 comments)
  • Mister P for the 2020 presidential election in Belarus (6 comments)
  • Lying with statistics (6 comments)
  • Public health researchers explain: “Death by despair” is a thing, but not the biggest thing (6 comments)
  • Interactive analysis needs theories of inference (6 comments)
  • Stan receives its second Nobel prize. (6 comments)
  • Misrepresenting data from a published source . . . it happens all the time! (6 comments)
  • Post-stratified longitudinal item response model for trust in state institutions in Europe (6 comments)
  • Aki’s talk about reference models in model selection in Laplace’s demon series (6 comments)
  • Get your research project reviewed by The Red Team: this seems like a good idea! (6 comments)
  • “Then the flaming sheet, with the whirr of a liberated phoenix, would fly up the chimney to join the stars.” (6 comments)
  • More coronavirus research: Using Stan to fit differential equation models in epidemiology (6 comments)
  • OHDSI COVID-19 study-a-thon. (6 comments)
  • 100 Things to Know, from Lane Kenworthy (6 comments)
  • Nonparametric Bayes webinar (5 comments)
  • Piranhas in the rain: Why instrumental variables are not as clean as you might have thought (5 comments)
  • Election Scenario Explorer using Economist Election Model (5 comments)
  • Election forecasts: The math, the goals, and the incentives (my talk this Friday afternoon at Cornell University) (5 comments)
  • Parallel in Stan (5 comments)
  • My talk this Wed 7:30pm (NY time) / Thurs 9:30am (Australian time) at the Victorian Centre for Biostatistics (5 comments)
  • Usual channels of clinical research dissemination getting somewhat clogged: What can go wrong – does. (5 comments)
  • Corona virus presentation by the Dutch CDC, also some thoughts on the audience for these sorts of presentations (5 comments)
  • Prior predictive, posterior predictive, and cross-validation as graphical models (5 comments)
  • Smoothness, or lack thereof, in MRP estimates over time (5 comments)
  • Le Detection Club (4 comments)
  • Authors repeat same error in 2019 that they acknowledged and admitted was wrong in 2015 (4 comments)
  • “Valid t-ratio Inference for instrumental variables” (4 comments)
  • Uri Simonsohn’s Small Telescopes (4 comments)
  • Regression and Other Stories translated into Python! (4 comments)
  • StanCon 2020 program is now online! (4 comments)
  • Adjusting for Type M error (4 comments)
  • Inference for coronavirus prevalence by inverting hypothesis tests (4 comments)
  • Embracing Variation and Accepting Uncertainty (my talk this Wed/Tues at a symposium on communicating uncertainty) (4 comments)
  • Validating Bayesian model comparison using fake data (4 comments)
  • “Young Lions: How Jewish Authors Reinvented the American War Novel” (4 comments)
  • My talk Wednesday at the Columbia coronavirus seminar (4 comments)
  • Online Causal Inference Seminar starts next Tues! (4 comments)
  • The Great Society, Reagan’s revolution, and generations of presidential voting (4 comments)
  • Making differential equation models in Stan more computationally efficient via some analytic integration (4 comments)
  • Will decentralised collaboration increase the robustness of scientific findings in biomedical research? Some data and some causal questions. (4 comments)
  • Merlin and me talk on the Bayesian podcast about forecasting the election (3 comments)
  • “Election Forecasting: How We Succeeded Brilliantly, Failed Miserably, or Landed Somewhere in Between” (3 comments)
  • Stan’s Within-Chain Parallelization now available with brms (3 comments)
  • Recently in the sister blog (3 comments)
  • Nooooooooooooo! (3 comments)
  • “Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale” and my answers to their questions (3 comments)
  • StanCon 2020. A 24h Global Event. (More details, new talk deadline: July 1) (3 comments)
  • Making fun of Ted talks (3 comments)
  • “Partially Identified Stan Model of COVID-19 Spread” (3 comments)
  • “A Path Forward for Stan,” from Sean Talts, former director of Stan’s Technical Working Group (3 comments)
  • Recent unpublished papers (3 comments)
  • What up with red state blue state? (3 comments)
  • American Causal Inference May 2020 Austin Texas (3 comments)
  • You don’t need a retina specialist to know which way the wind blows (2 comments)
  • Hiring at all levels at Flatiron Institute’s Center for Computational Mathematics (2 comments)
  • “Statistical Models of Election Outcomes”: My talk this evening at the University of Michigan (2 comments)
  • Korean translation of BDA3! (2 comments)
  • Laplace’s Theories of Cognitive Illusions, Heuristics and Biases (2 comments)
  • “Note sure what the lesson for data analysis quality control is here is here, but interesting to wonder about how that mistake was not caught pre-publication.” (2 comments)
  • A COVID-19 collaboration platform. (2 comments)
  • How to embrace variation and accept uncertainty in linguistic and psycholinguistic data analysis (2 comments)
  • MRP Conference registration now open! (2 comments)
  • MRP Conference at Columbia April 3rd – April 4th 2020 (2 comments)
  • Some Westlake quotes (2 comments)
  • Covid crowdsourcing (1 comment)
  • Best comics of 2010-2019? (1 comment)
  • My scheduled talks this week (1 comment)
  • We are stat professors with the American Statistical Association, and we’re thrilled to talk to you about the statistics behind voting. Ask us anything! (1 comment)
  • They’re looking for Stan and R programmers, and they’re willing to pay. (1 comment)
  • Postdoc in Bayesian spatiotemporal modeling at Imperial College London! (1 comment)
  • Cmdstan 2.24.1 is released! (1 comment)
  • Some possibly different experiences of being a statistician working with an international collaborative research group like OHDSI. (1 comment)
  • This is your chance to comment on the U.S. government’s review of evidence on the effectiveness of home visiting. Comments are due by 1 Sept. (1 comment)
  • StanCon 2020 registration is live! (1 comment)
  • Update on OHDSI Covid19 Activities. (1 comment)
  • COVID19 Global Forecasting Kaggle (1 comment)
  • Sponsor a Stan postdoc or programmer! (1 comment)
  • Deep learning workflow (1 comment)
  • A normalizing flow by any other name (1 comment)
  • Summer training in statistical sampling at University of Michigan (1 comment)
  • StanCon 2020: August 11-14. Registration now open! (1 comment)
  • Exciting postdoc opening in spatial statistics at Michigan: Coccidioides is coming, and only you can stop it! (1 comment)
  • The Generalizer (1 comment)
  • To all the reviewers we’ve loved before (0 comments)
  • Postdoc at the Polarization and Social Change Lab (0 comments)
  • 2 PhD student positions on Bayesian workflow! With Paul Bürkner! (0 comments)
  • Postdoc in Ann Arbor to work with clinical and cohort studies! (0 comments)
  • Birthday data! (0 comments)
  • epidemia: An R package for Bayesian epidemiological modeling (0 comments)
  • The EpiBayes research group at the University of Michigan has a postdoc opening! (0 comments)
  • StanCon 2020 is on Thursday! (0 comments)
  • Jobzzzzzz! (0 comments)
  • Job opportunity: statistician for carbon credits in agriculture (0 comments)
  • Children’s Variety Seeking in Food Choices (0 comments)
  • Sequential Bayesian Designs for Rapid Learning in COVID-19 Clinical Trials (0 comments)
  • Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale (0 comments)
  • MRP with R and Stan; MRP with Python and Tensorflow (0 comments)
  • Coming in 6 months or so (0 comments)
  • Update: OHDSI COVID-19 study-a-thon. (0 comments)
  • Another Bayesian model of coronavirus progression (0 comments)
  • “Are Relational Inferences from Crowdsourced and Opt-in Samples Generalizable? Comparing Criminal Justice Attitudes in the GSS and Five Online Samples” (0 comments)
  • Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC (0 comments)
  • The 100-day writing challenge (0 comments)
  • “It’s not just that the emperor has no clothes, it’s more like the emperor has been standing in the public square for fifteen years screaming, I’m naked! I’m naked! Look at me! And the scientific establishment is like, Wow, what a beautiful outfit.” (0 comments)
  • Call for proposals for a State Department project on estimating the prevalence of human trafficking (0 comments)
  • Hey—the New York Times is hiring an election forecaster! (0 comments)

“Maybe the better analogy is that these people are museum curators and we’re telling them that their precious collection of Leonardos, which they have been augmenting at a rate of about one per month, includes some fakes.”

Someone sent me a link to a recently published research paper and wrote:

As far as any possible coverage on your blog goes, this one didn’t come from me, please. It just looks… baffling in a lot of different ways.

OK, so it didn’t come from that person. I read the paper and replied:

Oh, yes, the paper is ridiculous. For a paper like that to be published by a scientific society . . . you could pretty much call it corruption. Or scientism. Or numerology. Or reification. Or something like that. I also made the mistake of doing a google search and finding a credulous news report on it.

Remember that thing I said a few years ago: In journals, it’s all about the wedding, never about the marriage.

For the authors and the journal and the journal editor and the twitter crowd, it’s all just happy news. The paper got published! The good guys won! Publication makes it true.

And, after more reflection:

I keep thinking about the coauthors on the project and the journal editors and the reviewers . . . didn’t anyone want to call Emperor’s New Clothes on it? But then I think that I’ve seen some crappy PhD theses, really bad stuff where everyone on the committee is under pressure to pass the person, just to get the damn thing over with. And of course if you give the thesis a passing grade, you’re a hero. Indeed, the worse the thesis, the more grateful the student and the other people on the committee will be! [Just to be clear, most of the Ph.D. theses I’ve seen have been excellent. But, yes, there are some crappy ones too. That’s just the way it is! It’s not just a problem with students. I’ve taught some crappy classes too. — ed.]

So in this case I guess it goes like this: A couple of researchers have a clever, interesting, and potentially important idea. I’ll grant them that. Then they think about how to study it. It’s hard to study social science processes, where so much is hidden! So you need to find some proxy; they come up with some ideas that might be a little offbeat, but maybe they’ll work. . . . Then they get the million data points, they do lots of hard work, they get a couple more coauthors and write a flashy paper–that’s not easy either!–and maybe it gets rejected by a couple of journals before it gets sent to this journal.

Once it gets there, ok, there are a couple possibilities here. One possibility is that one of the authors has a personal or professional connection to someone on the editorial board and so it gets published. I’m not saying it’s straight baksheesh here: they’re friends, they like the same sort of research, they recognize the difficulty of doing this sort of work and even if it’s not perfect it’s a step forward etc etc. The other possibility is they send the paper in cold and they just get lucky: they get an editor and reviewers who like this sort of high-tech social science stuff–actually it all seems a bit 2010-era to me, but, hey, if that’s what floats their boat, whatever.

Then, once the paper’s accepted, it’s happy time! How wonderful for the authors’ careers! How good for justice! How wonderful of the journal, how great for science, etc.

It’s like, ummm, I dunno, let’s say we’re all kinda sad that there have been over 50 Super Bowls and the Detroit Lions have never won it. They’ve never even been in the Super Bowl. But if they were, if they had some Cinderella story of an inspiring young QB and some exciting receivers, a defense that never gives up, a quirky kicker, a tough-but-lovable head coach, and an owner who wasn’t too evil, then, hey, wouldn’t that be a great story! Well, if you’re a journal editor, you not only get to tell the story, you get to write it too! So I guess maybe the NBA would be a better analogy, given that they say it’s scripted . . .

My anonymous correspondent replied:

I just have no idea where to start with this stuff. I find it to be profoundly confused, conceptually. For one thing, the idea that we should take seriously [the particular model posited in the article] is deeply essentialist. I can imagine situations in which it is the case, but I can also imagine situations in which it isn’t the case because of interacting factors from people’s life history. . . . That’s how social processes work! But people do this weird move where they assume any discrepant outcome like that must be the result of one particular stage in the process rather than entrenched structures, which, to my mind, really misses the point of how this stuff works.

So I’m just so skeptical of that idea in the first place. And to then claim to have found evidence for it just because of these very indirect analyses?

I responded:

I’m actually less interested in the scientific claims of this paper than in the “sociology” of how it gets accepted etc. One thing that I was thinking of is that, to much of the scientific establishment, the fundamental unit of science is the career. And a paper in a solid journal written by a young scholar . . . what a great way to start a career. The establishment people [yes, I’m “establishment” too, just a different establishment — ed.] can’t imagine why someone like you or me would criticize a published scientific paper—it’s so destructive! Not destructive toward the research hypothesis. Destructive to the career. For us to criticize, this could only be from envy or racism or because we’re losers or whatever. Of course, they don’t seem to recognize the zero-sumness of all this: someone else’s career never gets going because they don’t get the publication, etc.

Anyway, that’s my take on it. To the Susan Fiskes of the world, what we are doing is plain and simple vandalism, terrorism even. A career is a precious vase, taking years to build, and then we just smash it. From that perspective, you can see that criticisms are particularly annoying when they are scientifically valid. After all, a weak criticism can be brushed aside. But a serious criticism . . . that could break the damn vase.

Maybe the better analogy is that these people are museum curators and we’re telling them that their precious collection of Leonardos, which they have been augmenting at a rate of about one per month, includes some fakes. Or, maybe one or two of the Leonardos might be of somewhat questionable authenticity. But, don’t worry, the vast majority of their hundreds of Leonardos are just fine. Nothing to see here, move along. Anyway, such a curator could well be more annoyed, the more careful and serious the criticism is.

P.S. The story’s also interesting because the problems with this research have nothing to do with p-hacking, forking paths, etc. Really no “questionable research practices” at all—unless you want to count the following: creating a measurement that has just about nothing to do with what you’re claiming to measure, setting up a social science model that makes no sense, and making broad claims from weak evidence. Just the usual stuff. I don’t think anyone was doing anything wrong on purpose. More than anyone else, I blame the people who encourage, reward, and promote this sort of work. I mean, don’t get me wrong, speculation is fine. Here’s a paper of mine that’s an unstable combination of simple math and fevered speculation. The problem is when the speculation is taken as empirical science. Gresham, baby, Gresham.

“Translation Plagiarism”

Michael Dougherty writes:

Disguised plagiarism often goes undetected. An especially subtle type of disguised plagiarism is translation plagiarism, which occurs when the work of one author is republished in a different language with authorship credit taken by someone else.

I’ve seen this done, where the original language is statistics and the translated language is political science.

Translating ideas into another field can be useful, and I think there can be value in the acts of translation and rediscovery. For example, many, probably most, of the ideas in our path sampling paper already existed in the physics literature, but the connection to statistical problems was not always clear. We didn’t plagiarize anything in our paper—we worked it all out ourselves and cited everything relevant that we knew about—but there was still some translation going on, if only indirectly.

The problem is when the translator doesn’t acknowledge the original source, and then garbles the translation. Garbling the translation happens a lot: ideas that are worth translating, and that aren’t already known in the secondary field, can be complicated. And the bestest scholars aren’t the ones who plagiarize: plagiarism is, among other things, a shortcut to scholarly renown.

Here’s how Dougherty puts it:

Through no fault of their own, the authors of articles that engage [the translated article, without knowing about the original] fundamentally misidentify the author they are addressing, creating a fundamental corruption of scholarly communication. This point is often overlooked in discussions about the harm of plagiarism.

Basbøll and I discuss this in our article, “To throw away data: Plagiarism as a statistical crime”:

Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work. . . .

A statistical perspective on plagiarism might seem relevant only to cases in which raw data are unceremoniously and secretively transferred from one urn to another. But statistical consequences also result from plagiarism of a very different kind of material: stories. To underestimate the importance of contextual information, even when it does not concern numbers, is dangerous.

This point is key. Copying with appropriate citation is fine—it’s a way to get ideas out there to new audiences. Copying without appropriate citation—plagiarism—does get the ideas out there, but often in scrambled, contextless, and misleading form. See here and here for a couple of notorious examples done by a much-decorated statistician.

What we did in 2020, and thanks to all our collaborators and many more

Published or to be published articles:

Unpublished articles:


Thank you so much, Aki, Aileen, Alex, Alexey, Alexis, Balazs, Ben, Bob, Charles, Chris, Chris, Christian, Collin, Dan, Daniel, David, Dominik, Dustin, Elliott, Erik, Guido, Guillaume, Helen, Jann, Jessica, Jon, Jonah, Jonathan, Jeff, Jennifer, John, Josh, Julien, Kate, Koen, Lauren, Lex, Lizzie, Martin, Matthijs, Merlin, Michael, Pasi, Paul, Paul, Philip, Qixuan, Rob, Ruth, Sean, Shira, Shravan, Sonia, Swupnil, Tuomas, Yair, Yajuan, Yuling, Yutao, and journal editors and reviewers—and lots more collaborators including Ben, Bryn, Erik, Gustavo, Hagai, Len, Lu, Margaret, Manu, Michael, Mitzi, Rachael, Shiro, Siquan, Shiro, Steve, Steve, Susan, Tamara, Tom, Vivienne, Witold, Yotam, and others on ongoing projects that we haven’t yet written up or published. And thanks to all our collaborators of the past decade and to our blog authors and commenters—especially the ones who disagreed with us. And to all you lurkers out there who read the blog faithfully but haven’t found the need to comment yet. And even to the annoying people out there who misrepresented us or presented flat-out bad analyses: you kept us on our toes! And to the developers of the internet and maintainers of the WordPress software and the IT team at Columbia University. We also thank our closest collaborator, and all the 200-year-old mentors out there. And our predecessors and contemporaries who changed statistics in so many ways during the past half century. And thanks to Laura Dickson’s 8th grade students at Sky Ranch Middle School in Sparks, Nevada, who interviewed me for their class project. And to everyone who wrote everything we read this year. And to the developers of vaccines, the poll workers, the growers and developers of food, and everyone else who kept the world going for all of us. And to our loved ones, and to the people who intersected our lives in less pleasant ways as well. 
And to Mary Gray who was so supportive when I took that class many years ago, and to Grace Yang who taught me stochastic processes a few years after that. I couldn’t follow half of what was going on—I’d just keep scribbling in my notebook beyond whatever I could understand—but she conveyed a sense of the mathematical excitement of the topic.

To all the reviewers we’ve loved before

This post is by Lizzie (I might forget to say that again; when I forget, you can see it in the little blue text under the title, or you might just notice it as out of form).

For the end of the year I am saluting the favorite review I received in 2020.

This comes from a paper that included a hierarchical model where we partially pooled by plant species. We had a low and variable replicate number per species, but a pretty good sample size across all species, and we wanted to estimate effects of experimental treatments (things like ‘warm temperature’ and ‘cool temperature’) across species.
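To make the partial-pooling idea concrete, here is a toy simulation of that setup: many species, each with only a few replicates, where the species-level estimates are shrunk toward the grand mean. All numbers, the empirical-Bayes shrinkage formula, and the variable names are illustrative assumptions of mine, not the paper's actual model (which would presumably be fit with full MCMC in Stan or similar):

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up setup loosely mirroring the study: many species, each with a
# small, variable number of replicates, and a shared treatment effect scale.
n_species = 20
true_effects = rng.normal(0.0, 1.0, size=n_species)  # per-species effects
n_reps = rng.integers(2, 8, size=n_species)          # low, variable replication
sigma = 2.0                                          # within-species noise

# Raw (no-pooling) estimate: each species' own sample mean
raw_means = np.array([
    rng.normal(mu, sigma, size=n).mean()
    for mu, n in zip(true_effects, n_reps)
])

# Partial pooling via empirical-Bayes shrinkage: pull each species mean
# toward the grand mean, more strongly when its sample size is small
grand_mean = raw_means.mean()
se2 = sigma**2 / n_reps                              # sampling variance per species
tau2 = max(raw_means.var() - se2.mean(), 0.01)       # crude between-species variance
shrink = tau2 / (tau2 + se2)                         # shrinkage weight in (0, 1)
pooled = grand_mean + shrink * (raw_means - grand_mean)

for n, raw, pp in zip(n_reps, raw_means, pooled):
    print(f"n={n}  raw={raw:+.2f}  pooled={pp:+.2f}")
```

The point of the exercise: the species with the fewest replicates get pulled hardest toward the overall mean, which is exactly why an apparent "effect for some species" in the raw-data plots can evaporate in the multilevel estimates.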

We were working on invasive species (species native to somewhere else, in this case Europe, that have been introduced and grow quite well in that somewhere new — in this case North America). There’s a lot of interest in whether evolution post-introduction happens, more specifically evolution that helps the plants do so well somewhere new. We were looking for it in their germination response, by growing seeds we collected in North America or Europe (the seeds’ ‘origin’) in different conditions.

We didn’t find much of an effect of origin. We found effects of our treatments and a few other things, but we didn’t find an origin effect. Maybe because there isn’t a big origin effect, maybe because of the species we picked, or maybe because of lots of things.

In the main text we showed our model estimates, and showed raw data plots in the supplement. I generally think you should try to show raw data in the paper when possible, but this is the first time I got this response from doing it. The reviewer writes:

From [your main text model estimates figure] the main takeaway point that I could garner is that increasing temperatures impact growth and germination speed across species… I find Figure [in the supplement showing the raw data] to be interesting, because the raw data [often for the species with really low sample sizes] suggests to me that you may actually have some origin differences for some species in certain environmental contexts, which may not have come through so clearly in your global, many-leveled models [then details on what specific treatments and for which species this reviewer has discovered some trends].

It’s great to have robust models, but I think it’s worth taking a look at your data and ask whether some more interesting or nuanced stories might come out.

This is the first time I recall when I felt a reviewer was actually taking me by the hand and leading me back to the center of the garden and saying, ‘might you please consider this alternative path? Instead of going left at the fork, perhaps if you go right … you could get the origin effect we all so want to see.’

And to all the reviewers who’ve shared their thoughts
Who now are someone’s else’s reviewer
For helping me to grow
I owe a lot I know….