
Should we judge pundits based on their demonstrated willingness to learn from their mistakes?

Palko writes:

Track records matter. Like it or not, unless you’re actually working with the numbers, you have to rely to some degree on the credibility of the analysts you’re reading. Three of the best ways to build credibility are:

1. Be right a lot.

2. When you’re wrong, admit it and seriously examine where you went off track, then…

3. Correct those mistakes.

I [Palko] have been hard on Nate Silver in the past, but after a bad start in 2016, he did a lot to earn our trust. By the time we got to the general election, I’d pretty much give him straight A’s. By comparison, there were plenty of analysts who got straight F’s, and a lot of them are playing a prominent role in the discussion this time around. . . .

This seems like good advice from Palko. The difficulty is in applying it. With the exception of the admirable Nate Silver (no, Nate’s not perfect; nobody is; like Bill James he sometimes has fallen into the trap of falling in love with his own voice and making overly strong pronouncements; sometimes he just mouths off about things he knows nothing about; but, hey, so do we all: overall Nate is sane and self-correcting, even if recently he’s been less than open about recognizing where his work has problems, perhaps not realizing that everyone’s work has problems and there’s no shame in admitting and correcting them), I don’t know that there are any pundits who regularly assess their past errors.

Palko picks on NYT political analyst Nate Cohn, but really he could name just about anyone who writes regularly on politics who’s ever published an error. It’s not like Gregg Easterbrook is any better. So I’m not quite sure what can possibly be done here.

What is the relevance of “bad science” to our understanding of “good science”?

We spend some time talking about junk science, or possible junk science, most recently that book about sleep, but we have lots of other examples such as himmicanes, air rage, ages ending in 9, pizzagate, weggy, the disgraced primatologist, regression discontinuity disasters, beauty and sex ratio, the critical positivity ratio, slaves and serfs, gremlins, and lots more examples that I don’t happen to recall at this moment.

Why do I keep writing about this? Why do we care?

Here’s a quick reminder of several reasons that we care:

1. Some junk science is consequential. For example, Weggy was advising a congressional committee when he was making stuff up about research, and it seems that the Gremlins dude is, as the saying goes, active in the environmental economics movement.

2. The crowd-out, or Gresham, effect. Junk science appears in journals, careful science doesn’t. Junk science appears in PNAS, gets promoted by science celebrities and science journalists. The prominent path to success offered by junk science motivates young scientists to pursue work in that direction. Etc. There must be lots of people doing junk science who think they’re doing the good stuff, who follow all ethical principles and avoid so-called questionable research practices, but are still doing nothing in their empirical work but finding patterns in noise. Remember, honesty and transparency are not enough.

3. There’s no sharp dividing line between junk science and careful science, or between junk scientists and careful scientists. Some researchers such as Kanazawa and Wansink are purists and only seem to do junk science (which in their case is open-ended theory plus noisy experiments with inconclusive results), other people have mixed careers, and others of us try our best to do careful science but still can fall prey to errors of statistics and data collection. Recall the 50 shades of gray.

In short, we care about junk science because of its own malign consequence, because people are doing junk science without even realizing it—people who think that if they increase N and don’t “p-hack,” they’re doing things right—and because even those of us who are aware of the perils of junk science can still mess up.

A Ted talkin’ sleep researcher misrepresenting the literature or just plain making things up; a controversial sociologist drawing sexist conclusions from surveys of N=3000 where N=300,000 would be needed; a disgraced primatologist who wouldn’t share his data; a celebrity researcher in eating behavior who published purportedly empirical papers corresponding to no possible empirical data; an Excel error that may have influenced national economic policy; an iffy study that claimed to find that North Korea was more democratic than North Carolina; a claim, unsupported by data, that subliminal smiley faces could massively shift attitudes on immigration; various noise-shuffling statistical methods that just won’t go away—all of these, and more, represent different extremes of junk science.

None of us do all these things, and many of us try to do none of these things—but I think that most of us do some of these things much of the time. We’re sloppy with the literature, making claims that support our stories without checking carefully; we draw premature conclusions from so-called statistically significant patterns in noisy data; we keep sloppy workflows and can’t reconstruct our analyses; we process data without being clear on what’s being measured; we draw conclusions from elaborate models that we don’t fully understand.

The lesson to take away from extreme cases of scientific and scholarly mispractice is not, “Hey, these dudes are horrible. Me and my friends aren’t like that!”, but rather, “Hey, these are extreme versions of things that me and my friends might do. So let’s look more carefully at our own practices!”

P.S. Above cat picture courtesy of Zad Chow.

Postdoc in precision medicine at Johns Hopkins using Bayesian methods

Aki Nishimura writes:

My colleague Scott Zeger and I have a postdoc position for our precision medicine initiative at Johns Hopkins and we are looking for expertise in Bayesian methods, statistical computation, or software development.

Expertise in Stan would be a plus!

Rapid prepublication peer review

The following came in the email last week from Gordon Shotwell:

You posted about an earlier pilot trial of calcifidiol, so I wanted to send you this larger study. The randomization is a bit funky and if you were interested it would be great to hear what sorts of inferences we can make about this data.

And here’s all the vitamin D evidence to date.

I had a few spare minutes so I clicked on the first link, and this is what I saw:
Continue reading ‘Rapid prepublication peer review’ »

Luc Sante reviews books by Nick Hornby and Geoffrey O’Brien on pop music

From 2004. Worth a read, if you like this sort of thing, which I do, but I guess most of you don’t.

Ethics washing, ethics bashing

This is Jessica. Google continues to have a moment among those interested in tech ethics, after firing the other half (with Timnit Gebru) of their ethical AI leadership, Margaret Mitchell, who had founded the ethical AI team. Previously I commented on potential problems behind the review process that led to a paper that Gebru and Mitchell authored with researchers at University of Washington, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, which played the role of an official reason for the disagreement with Gebru.

A while back, when it became public, I read the paper, which is interesting and, in my opinion, definitely not worth the sort of scrutiny and uproar it seems to have caused. It’s basically a list of points made under the pretty non-threatening guise of putting the limitations of large scale language models (like BERT or GPT-3) in context, to “reduce hype which can mislead the public and researchers themselves regarding the capabilities of these LMs, but might encourage new research directions that do not necessarily depend on having larger LMs”. The main problems described are the environmental impact of training very large models, where they cite some prior work discussing the environmental cost of achieving deep learning’s accuracy gains; the fact that training models on data that enforce stereotypes or biases or don’t represent the deployment population accurately can perpetuate biases; the opportunity costs of researchers heavily investing in large scale LMs given that they don’t necessarily advance natural language processing toward long-term goals like general language understanding; and the risks of text generated by LMs being mistaken for “meaningful and corresponding to the communicative intent of some individual or group of individuals.” I personally found the last point the most interesting. In combination with the second point about bias, it made me think maybe we should repurpose the now hokey term “singularity” to refer instead to a kind of choking-on-our-own-vomit-without-realizing-it that recent research in algorithmic bias points to.

However, it seems doubtful that the paper was the real reason behind Gebru’s and now Mitchell’s dismissals, so I’m not going to discuss it in detail. Instead I’ll just make a few non-remarkable observations about attempts to change the make-up and values of computer scientists and big tech.

First, given how devoted Gebru and Mitchell seem to their causes of diversity and ethical oversight in AI and tech more broadly, I can’t help but wonder what Google executives had in mind when the Ethical AI team was being created, or when Mitchell or Gebru were hired. It strikes me as somewhat ironic that while many big tech companies are known for highly data driven hiring practices, they couldn’t necessarily foresee the seemingly irreconcilable differences they had with the Ethical AI leadership. I recently came across this paper by Elettra Bietti which uses the term “ethics washing” to describe tech companies’ attempts to self-regulate ethics, and “ethics bashing” to refer to how these efforts get trivialized so that ethics comes to be seen as something as simple as having an ethics board or a plan for self-governance rather than an intrinsically valuable mode of seeking knowledge. These descriptions seem relevant to the apparent breakdown in the Google Ethical AI situation. From a pessimistic view that assumes any altruistic goals a big tech company has will always take a back seat to their business strategy, Google would seem to be struggling to find the right balance of investing enough in ethics to appear responsible and trustworthy while avoiding the consequences when those they employ to help them with these initiatives become frustrated by the limitations put on their ability to change things.

Personally I question whether it’s possible to make real changes to the makeup of tech without it coming from the very top. My experience at least in academia has generally been that for initiatives like changing the makeup of the faculty or student body you need either the majority of faculty to be pushing for it (and a chair or dean who won’t overrule) or you need the chair or dean to be the one pushing for it. Diversity committees, which often tend to be composed mostly of women and others from marginalized backgrounds, can make an environment more supportive for others like them, but have trouble winning anyone over to their cause without those in positions of power to reinforce the message and lend the resources.

At any rate, what’s happening at Google has brought a lot more attention to the question of how to make big tech corporations accountable for the impact their systems have. I don’t know enough about policy to have a strong view on this, but I can understand why many researchers in algorithmic bias might think that regulation is the only way to overcome conflicts of interest between ethics advocates and businesses who make money off of algorithms. Though it’s clear from the existing work that’s been done on algorithmic bias that there are some significant challenges when it comes to defining bias, so this may not be right around the corner. I’m reminded of debates around methodological reform in psych, where there’s similarly a desire to prevent problems by putting in place top-down requirements, but how we define the problems and how we evaluate the proposed solutions are no small matters. 

Maybe requiring more transparency around all new Google products would be a reasonable first step. I really don’t know. I’m also not sure how regulation could address the lack of diversity in the company, especially in subdivisions like Google Brain, though some researchers in algorithmic bias including Gebru have argued that having a more diverse set of perspectives involved in developing algorithms is part of the solution to ethics problems in AI and ML. So I expect struggles like this will continue to play out. 

Statisticians don’t use statistical evidence to decide what statistical methods to use. Also, The Way of the Physicist.

David Bailey, a physicist at the University of Toronto, writes:

I thought you’d be pleased to hear that a student in our Advanced Physics Lab spontaneously used Stan to analyze data with significant uncertainties in both x and y. We’d normally expect students to use python and orthogonal distance regression, and STAN is never mentioned in our course, but evidently it is creeping into areas I wasn’t expecting.

The other reason for this email is that I’d be curious to (eventually) read your comments on this new article by Anne‐Laure Boulesteix, Sabine Hoffmann, Alethea Charlton, and Heidi Seibold in Significance magazine: “A replication crisis in methodological research?” I [Bailey] have done a bit of research on scientific reproducibility, and I am always curious to understand better how statistical methods are validated in the messy real world. I often say that physics is much easier to do than medical and social sciences because “electrons are all identical, cheap, and unchanging – people are not.”

I read the linked article by Boulesteix et al., and I agree with their general points, although they perhaps underestimate the difficulty of these evaluations. Here are three relevant articles from 2012-2014:

1. In a 2012 article, Eric Loken and I criticize statisticians for “not practicing what we preach,” regarding teaching rather than methodology, but a similar issue.

2. I ask how we choose our methods; see section 26.2 of this article from 2014.

3. Another take on what we consider as convincing evidence is this article from 2014 with Keith O’Rourke, which begins as follows:

The rules of evidence as presented in statistics textbooks are not the same as the informal criteria that statisticians and practitioners use in deciding what methods to use.

According to the official rules, statistical decisions should be based on careful design of data collection, reliable and valid measurement, and something approximating unbiased or calibrated estimation. The first allows both some choice of the assumptions and an opportunity to increase their credibility, the second tries to avoid avoidable noise and error and third tries to restrict to methods that are seemingly fair. This may be fine for evaluating psychological experiments, or medical treatments, or economic policies, but we as statisticians do not generally follow these rules when considering improvements in our teaching nor when deciding what statistical methods to use.

So, yes, it’s kind of embarrassing that statisticians are always getting on everybody else’s case for not using random sampling, controlled experimentation, and reliable and valid measurements—but then we don’t use these tools in our own decision making.

P.S. On Bailey’s webpage there’s a link to this page on “The Way of the Physicist”:

Our students should be able to
– construct mathematical models of physical systems
– solve the models analytically or computationally
– make physical measurements of the systems
– compare the measurements with the expectations
– communicate the results, both verbally and in writing
– improve and iterate
and to apply these skills to both fundamental and applied problems.

I like that! It’s kind of like The Way of the Statistician. Bailey also wrote this article about how we learn from anomalies, in a spirit similar to our God is in Every Leaf of Every Tree principle.

P.P.S. I sent the above comments to Boulesteix et al. (the authors of the above-linked article), who replied:

We agree with your discussion about the shortcomings of the sources of evidence on the effectiveness of statistical methods in Gelman (2013) and Gelman and O’Rourke (2014). Indeed, we generally expect methodological papers to provide evidence on the effectiveness of the proposed methods and this evidence is often given through mathematical theory or computer simulations “which are only as good as their assumptions” (Gelman, 2013). When evidence of improved performance is given in benchmarking studies, the number of benchmark data sets is often very limited and the process used to select datasets is unclear. In this situation, researchers are incentivized to use their researcher degrees of freedom to show their methods from the most appealing angle. We do not suggest any intentional cheating here, but in the fine-tuning of method settings, data sets and pre-processing steps, researchers (ourselves included) are masters of self-deception. If we do not at least encourage authors to minimize cherry-picking when providing this evidence, then it is just silly to ask for evidence.

We also agree with you that the difficulties of becoming more evidence-based should not be underestimated. At the same time, we are not arguing that it is an easy task, we say that it is a necessary one. As you say in Gelman and O’Rourke (2014), statistics is a young discipline and, in the current statistical crisis in science, we feel that we can learn from the progress that is being made in other disciplines (while keeping in mind the differences between statistics and other fields). Currently, “statisticians and statistical practitioners seem to rely on a sense of anecdotal evidence based on personal experience and on the attitudes of trusted colleagues” (Gelman, 2013). For much of its history, medicine was more of an art than a science and relied on this type of anecdotal evidence, which has only come to be considered insufficient in the last century. We might be able to learn important lessons from experiences in this field when it comes to evidence-based methodological research.

Starting from the list of evidence that you present in Gelman (2013) and Gelman and O’Rourke (2014), we might for instance establish a pyramid of evidence. The highest level of evidence in this pyramid could be systematic methodological reviews and pre-registered neutral comparison studies on a number of benchmark data sets determined through careful sample size calculation or based on neutral and carefully designed simulation studies. We have to admit that evidence on real data sets is easier to establish for methods where the principal aim is prediction rather than parameter estimation and we do not pretend to have the answer to all open questions. At the same time, it is encouraging to see that progress is being made on many fronts, ranging from the pre-registration of machine learning research (e.g. NeurIPS2020 pre-registration experiment), to the replication of methodological studies, and to the elaboration of standardized reporting guidelines for simulation studies (see, e.g., De Bin et al., Briefings in Bioinformatics 2020 for first thoughts on this issue).

We appreciate your argument in Gelman and Loken (2012) that we should be more research based in our teaching practice. Exactly as applied statisticians should, in an ideal world, select statistical methods based on evidence generated by methodological statisticians, teaching statisticians should, in an ideal world, select teaching methods based on evidence generated by statisticians committed to didactics research. In both fields – teaching and research, it seems deplorable that we do not practice what we preach for other disciplines. In the long run, we also hope that our teaching will benefit from more evidence-based methodological research. Our students are often curious to know which method they should apply on their own data and, in the absence of neutral, high-quality studies on the comparative effectiveness of statistical methods, it is difficult to give a satisfactory answer to this question. All in all, we hope that we can stop being “cheeseburger-snarfing diet gurus who are unethical in charging for advice we wouldn’t ourselves follow” (Gelman and Loken (2012)) regarding both teaching and research.

I responded that I think they are more optimistic than I am about evidence-based medicine and evidence-based education research. It’s not so easy to select teaching methods based on evidence-based research. And for some concerns about evidence based medicine, see here.

Is sqrt(2) a normal number?

In a paper from 2018, Pierpaolo Uberti writes:

In this paper we study the property of normality of a number in base 2. A simple rule that associates a vector to a number is presented and the property of normality is stated for the vector associated to the number. The problem of testing a number for normality is shown to be equivalent to the test of geometrical properties of the associated vector. The paper provides a general approach for normality testing and then applies the proposed methodology to the study of particular numbers. The main result of the paper is to prove that an infinite class of numbers is normal in base 2. As a further result we prove that the irrational number √2 is normal in base 2.

And here’s the background:

Given an integer b ≥ 2, a b-normal number (or a normal number) is a number whose b-ary expansion is such that any preassigned sequence of length k ≥ 1 occurs at the expected frequency 1/b^k. . . .

The interest in studying normal numbers lies not only in their randomness but also in the fact that they are extremely difficult to identify and obscure in many other aspects. Despite the appeal of the concept, its trivial interpretation and the proof that almost all real numbers are normal, the proof for given irrational numbers to be normal in some base is still elusive. . . .

For many years, ever since I heard about the idea of the distribution of the decimal (or, in this case, binary) expansion, I’ve wondered if sqrt(2) is a normal number. Indeed, I’ve always felt that if there was one mathematical theorem more than any other I’d like to prove, it’s this.

I was reminded of this topic yesterday when reading this comment by X, so I googled *Is sqrt(2) a normal number?* and came across the above-linked paper.

The question is, is Uberti correct? I guess the answer is no, he’s just bullshitting. Or, to put it another way, he’s probably correct that sqrt(2) is a normal number, but he’s incorrect that he’s proved it. I say this because:

1. The article is from 2018 on Arxiv but it still hasn’t been published anywhere, which would seem unlikely if it’s really a proof of this longstanding conjecture.

2. Another quick google search led to this short online discussion with complete skepticism. Nobody there goes to the trouble of completely dismantling Uberti’s argument but only because they don’t seem to feel it’s worth the time (in contrast to, say, really bad statistical arguments which are worth shooting down because of their real-world implications).

3. The author is not a mathematician and he seems to have published nothing in pure math.

So . . . too bad! We still don’t know if sqrt(2) is a normal number.

I’m bummed. But, I guess, my bad for getting fooled by that just for a moment. We always have to remember: dead-on-arrival papers don’t just get published by Statistical Science, Journal of Personality and Social Psychology, and SSRN. They also appear on Arxiv.

As to the answer to the question posed in the title of this post: I think it’s gotta be Yes! But it sounds like the result may never be proved.
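
For what it’s worth, the definition quoted above is easy to poke at numerically. Here’s a minimal Python sketch (mine, not Uberti’s method; the function names sqrt2_bits and block_frequencies are made up for illustration) that computes the leading binary digits of sqrt(2) exactly using integer arithmetic and then tabulates how often each block of length k appears. If sqrt(2) is normal in base 2, each block’s frequency should approach 1/2^k as the number of digits grows. To be clear, looking at finitely many digits proves nothing about normality; this is just a way to see what the claim means.

```python
from collections import Counter
from math import isqrt


def sqrt2_bits(n_bits):
    """Return the first n_bits binary digits of the fractional part of sqrt(2)."""
    # isqrt(2 * 4**n_bits) equals floor(sqrt(2) * 2**n_bits), computed exactly.
    scaled = isqrt(2 << (2 * n_bits))
    # bin() gives '0b1...': drop '0b' and the integer-part digit '1'.
    return bin(scaled)[3:]


def block_frequencies(bits, k):
    """Relative frequency of each length-k block, using overlapping windows."""
    windows = len(bits) - k + 1
    counts = Counter(bits[i:i + k] for i in range(windows))
    return {block: counts[block] / windows for block in sorted(counts)}


# Illustrative run: 10,000 digits is an arbitrary choice.
bits = sqrt2_bits(10_000)
for k in (1, 2, 3):
    print(k, 1 / 2**k, block_frequencies(bits, k))
```

The overlapping-window count matches the usual way of stating normality (every position in the expansion is a possible starting point for a block). Swapping in a different number only requires changing the integer passed to isqrt.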

Simulated-data experimentation: Why does it work so well?

Someone sent me a long question about a complicated social science problem involving intermediate outcomes, errors in predictors, latent class analysis, path analysis, and unobserved confounders. I got the gist of the question but it didn’t quite seem worth chasing down all the details involving certain conclusions to be made if certain effects disappeared in the statistical analysis . . .

So I responded as follows:

I think you have to be careful about such conclusions. For one thing, a statement such as saying that effects “disappear” . . . they’re not really 0, they’re just statistically indistinguishable from 0, which is a different thing.
One way to advance on this could be to simulate fake data under different underlying models and then apply your statistical procedures and see what happens.

That’s the real point. No matter how tangled your inferential model is, you can always just step back and simulate your system from scratch. Construct several simulated datasets, apply your statistical procedure to each, and see what comes up. Then you can see what your method is doing.

Why does fake-data simulation work so well?

The key point is that simulating from a model requires different information from fitting a model. Fitting a model can be hard work, or it can just require pressing a button, but in any case it can be difficult to interpret the inferences. But when you simulate fake data, you kinda have to have some sense of what’s going on. You’re starting from a position of understanding.
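
Here’s a minimal sketch of the idea in Python, assuming a much simpler setup than the one in the question: one predictor, a linear model, and normal noise. Everything specific here is my own illustrative assumption, not something from the original email: the function names (simulate_and_fit, share_indistinguishable), the sample size of 50, the noise level, and the rule of calling an estimate “indistinguishable from 0” when it is within 2 standard errors of zero. The point is to simulate under several known true effects, run the same procedure each time, and see how often an effect that is really there “disappears.”

```python
import numpy as np


def simulate_and_fit(true_effect, n, noise_sd, rng):
    """Simulate one fake dataset and return the OLS slope and its standard error."""
    x = rng.normal(size=n)
    y = true_effect * x + rng.normal(scale=noise_sd, size=n)
    X = np.column_stack([np.ones(n), x])          # intercept + predictor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)              # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    return beta[1], np.sqrt(cov[1, 1])


def share_indistinguishable(true_effect, n=50, noise_sd=1.0, n_sims=2000, seed=0):
    """Fraction of simulations where the slope is within 2 standard errors of 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        est, se = simulate_and_fit(true_effect, n, noise_sd, rng)
        if abs(est) < 2 * se:
            hits += 1
    return hits / n_sims


# Same procedure, three different assumed truths.
for effect in (0.0, 0.2, 1.0):
    print(effect, share_indistinguishable(effect))
```

In a real problem you’d replace simulate_and_fit with a simulation of your actual system and your actual fitting procedure, however tangled. The structure stays the same: pick the truth, generate data, fit, repeat.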

P.S. The first three words of the above title were originally “Fake-data simulation,” but I changed them to “Simulated-data experimentation” after various blog discussions.

Postdoc in Paris for Bayesian models in genetics . . . differential equation models in Stan

Julie Bertrand writes:

The BIPID team in the IAME UMR1137 INSERM Université de Paris is opening a one-year postdoctoral position to develop Bayesian approaches to high throughput genetic analyses using nonlinear mixed effect models.

The candidate will analyse longitudinal phenotype data using differential equation models on clinical trial data with Stan and perform simulation studies to compare different approaches to test for a genetic association (e.g. Laplace or horseshoe priors and prediction projections).

At BIPID, we design, perform and analyse clinical trials and cohorts in order to better understand variability in response to antimicrobial agents and for epidemiological description and prognosis assessment of infectious diseases. The group is based in the north of Paris, next to the Bichât Hospital.

A successful candidate has a PhD in Statistics/Biostatistics/Biomathematics, with a strong academic capacity in the form of publications and/or other scientific outputs. You are experienced in Bayesian modeling and inference. It would be desirable if you have experience in pharmacology or infectious disease. The gross salary ranges between 2600 and 3000 euros per month.

Applications for this vacancy are to be sent to You will be required to provide a CV and a supporting statement. Only applications received before the 30th of March 2021 will be considered.

Cool! I’ve heard there’s all sorts of new developments coming with Stan’s differential equation modeling, and it’s always good to work on live applications.

Jordana Cepelewicz on “The Hard Lessons of Modeling the Coronavirus Pandemic”

Here’s a long and thoughtful article on issues that have come up with Covid modeling.

Jordana’s a staff writer for Quanta, a popular science magazine funded by the Simons Foundation, which also funds the Flatiron Institute, where I now work. She’s a science reporter, not a statistician or machine learning specialist. A lot of Simons Foundation funding goes to community outreach and math and science education. Quanta aims to be more like the old Scientific American than the new Scientific American; but it also has the more personal angle of a New Yorker article on science (my favorite source of articles on science because the writing is so darn good).

There’s also a film that goes along with Jordana’s article:

I found the comments on YouTube fascinating. Not as off the wall as replies to newspaper articles, but not the informed stats readership of this (Andrew’s) blog, either.

“Smell the Data”

Mike Maltz writes the following on ethnography and statistics:

I got interested in ethnographic studies because of a concern for people analyzing data without an understanding of its origins and the way it was collected. An ethnographer collects stories, and too many statisticians disparage them, calling them “anecdotes” instead of real data. But stories are important; although they only give you single data points, if you have a number of them you can see how different they seem to be. That helps you determine if there is more than one process generating the data. In fact, I’ve noted that the New York Times now often augments an article with a bunch of stories about how different people were affected by the focus of the article.

Here’s the way I [Maltz] described the benefit of ethnography in the introduction to a book (Doing Ethnography in Criminology, Springer 2018) that Steve Rice and I edited:

I’m best known for quantitative, not qualitative, research. An engineer by training, I had never taken any courses in social science when I began teaching in a criminal justice program in 1972. My entire criminal justice experience up to that point was based on my having been a staff member of the National Institute of Justice from 1969 to 1972, and I was hired by NIJ because of my engineering background and my experience in police communications—see below.

My introduction to social science and to the research techniques that were then used by its practitioners began when I joined the criminal justice faculty of the University of Illinois at Chicago. I was put to work teaching social science statistics and, not knowing much about it, used the books that others used before me. But even then I was mystified by the common practice of looking to achieve a low p-value as the be-all and end-all of such research. Untenured and with no experience in the field, I taught what others thought was important. But I soon wised up, and described my concern about the methods used, some 30 years ago (Maltz, 1984, p. 3):

“When I was an undergraduate in engineering school there was a saying: An engineer measures it with a micrometer, marks it with a piece of chalk, and cuts it with an axe. This expression described the imbalance in precision one sometimes sees in engineering projects. A similar phenomenon holds true for social scientists, although the imbalance is in the opposite direction. It sometimes seems that a social scientist measures it with a series of ambiguous questions, marks it with a bunch of inconsistent coders, and cuts it to within three decimal places. Some balance in precision is needed, from the initial measurement process to the final preparation of results.”

And I further expressed my concern about the focus on “statistical significance” in a subsequent article (Maltz, 1994). Ethnography is a welcome and much-needed departure from that type of research. It deals with individual and group behavior that doesn’t conform to statistical or spreadsheet analysis. Yes, an ethnography may just be a single data point, but it often serves as a marker of importance, an exploration of what additional factors should be considered beyond the usual statistics, or as a counterexample to some of the more positivist studies.

In this regard, three examples provide additional context to my strong belief in the need for a qualitative orientation. The first was my initial experience while consulting on police communication systems (true electrical engineering!) for the Boston Police Department from 1966 to 1969. To satisfy my curiosity about the ways of the police, I requested, and was granted, permission to conduct an “experiment” on police patrol. The number of patrol cars in one police district was doubled for a few weeks to see if it had any effect on crime. And it did: compared to the “control” district, which had no arrests, the “experimental” district had six arrests. Moreover, there were no arrests at all for the same time period in either district in the previous year, so I could calculate that p = 0.016, much less than 0.05. What a finding! Police patrol really works!

On debriefing one of the arresting officers, one of the first lessons I learned was that police officers are not fungible. There are no extra police officers hanging around the station that can be assigned to the experimental district: they have to be drawn from somewhere else. The additional officers, who made all of the arrests, were from the BPD’s Tactical Patrol Force—the Marines of the department—who were normally assigned to deal with known trouble spots, and the two districts selected for the study were generally low-crime areas.

In fact, the TPF officers already knew that a gang of car thieves/strippers was active in the experimental district and decided to take them out, which resulted in all of the arrests they made. They couldn’t wait to get back to working citywide, going after real crime, but took the opportunity to clean up what they considered to be a minor problem. So after that experience, I realized that you have to get under the numbers to see how they are generated or, as I used to explain to students, to “smell” the data.

Another example: Some years ago I was asked to be an expert (plaintiff’s) witness in a case in the Chicago suburbs, in which the defendant suburb’s police department was accused of targeting Latino drivers for DUI arrests to fill their arrest quotas. My job was to look at the statistical evidence prepared by another statistician (the suburb’s expert witness) and evaluate its merits. I was able to show that there were no merits to the analysis (the data set was hopelessly corrupted), and the case was settled before I had a chance to testify.

What struck me after the settlement, however, was the geography and timing of the arrests. Most of them occurred on weekend nights on the road between the bars where most of the Latinos went to drink and the areas where they lived. None were located on the roads near the Elks or Lions clubs, where the “good people” bent their elbows.

I blame myself for not seeing this immediately, but it helped me to see the necessity of going beyond the given data and looking for other clues and cues that motivate those actions that are officially recorded. While it may not be as necessary in some fields of study, in criminology it certainly is.

A third example was actually experienced by my wife, who carried out a long-term ethnographic study of Mexican families in Chicago (Farr, 2006), all of whom came from a small village in Michoacán, Mexico. Numerous studies, primarily based on surveys, had concluded that these people were by and large not literate. One Saturday morning in the early 1990s, she was in one of their homes when various children began to arrive, along with two high school students. One of the students then announced (in Spanish, of course), “Ok, let’s get to work on the doctrina (catechism),” and slid open the doors on the side of the coffee table, revealing workbooks and pencils, which she distributed to the kids.

On another occasion, my wife was drinking coffee in the kitchen when all of the women (mothers and daughters) suddenly gathered at the entrance to the kitchen as someone arrived with a plastic supermarket bag full of something—which turned out to be religious books (in Spanish) on topics such as Getting Engaged and After the Children Come Along. Each woman eagerly picked out a book, and one of them said, “I am going to read this with my daughter.”

Clearly these instances indicate that children in the catechism class and the women in the kitchen were literate. The then-current questionnaires that evaluated literacy practices, however, asked questions such as “Do you subscribe to a newspaper? Do you have a library card? Do you have to read material at work?” In other words, the questionnaires (rightly so) didn’t just ask people outright “Can you read?” but rather focused on the domains they thought required reading. Yet no questions dealt with religious literacy, since literacy researchers at the time did not include a focus on religion. The result? The literacy practices of these families were “invisible” to research.

These anecdotes are but three among many that turned me off the then-current methods of learning about social activity, in these cases via (unexamined) data and (impersonal) questionnaires. Perhaps this has to do with my engineering (rather than scientific) background, since engineers deal with reality and scientists propound theories. To translate to the current topic, it conditioned me to take into consideration the social context, a recognition that context matters and that not all attributes of a situation or person can be seen as quantifiable “variables.” This means, for example, that a crime should be characterized by more than just victim characteristics, offender characteristics, time of day, etc., and that an individual should be characterized by more than just age, race, ethnicity, education, etc. or “so-so” (same-old, same-old) statistics. Both require a deeper understanding of the situation, which ethnography is best suited, albeit imperfectly, to provide—to put oneself in the position, the mindset, of the persons whose actions are under study.

Computation+Journalism 2021 this Friday

This post is by Jessica. Last year I was program chair for Computation+Journalism, a conference that brings together computer scientists and other researchers with journalists to brainstorm about the future of journalism. I spent a bunch of time organizing a program around the theme of uncertainty communication, and then massive uncertainty due to covid-19 hit a few weeks before it was scheduled, so we canceled it.

But this Friday Feb 19 we’re back with the same program plus more, including:  

Keynote by Amanda Cox (editor at NYT Upshot), 10 am ET

Keynote by Deen Freelon (Assoc. Prof at UNC), 11:30 am ET

Keynote by David Rothschild (economist at Microsoft Research NYC), 3:30 pm ET

A panel on election forecasting and coverage with David Byler (WaPo), Micah Cohen (FiveThirtyEight), Natalie Jackson (PRRI), and Nick Diakopoulos (Northwestern, moderator) 1:30 pm ET

A panel on visualizing data while acknowledging uncertainty with Jen Christiansen (SciAm), Catherine D’Ignazio (MIT), Alberto Cairo (UMiami, moderator), and me 5pm ET

Plus contributed papers and sessions on computational social science and politics, algorithmic bias and fairness, uncertainty communication, data privacy, and reporting on covid, among others.

Register here (free), virtually hosted by Northeastern University.

COVID and Vitamin D…and some other things too.

This post is by Phil Price, not Andrew.

Way back in November I started writing a post about my Vitamin D experience. My doctor says I need more, in spite of the fact that I spend lots of time outdoors in the sun. I looked into the research and concluded that nobody really knows how much I need, but on the other hand the downside of taking a supplement is small. Anyway I started to write all of this up, thinking this blog’s readers might be interested in both the specifics (where do the Vitamin D recommendations come from, for instance) and the general approach (how one can, and perhaps should, consider the pros and cons of medical advice). But I never got around to finishing that post and thus it never appeared, and it’s not going to now because someone else has written a Vitamin D post that is much more topical, interesting, and current than mine was going to be: it looks at the question of whether Vitamin D protects against COVID. It is also, I think, a great example of how to think when faced with different sources of information that suggest different things. Some studies say this, some say that, common sense suggests X, but on the other hand it also suggests Y. We all face this kind of situation all the time.

So I’m just going to link to the blog post, later in this post, and recommend that you all go read it. But I want you to read the rest of my post first, so please do that. As a teaser I’m going to post the conclusions from the post I’m sending you to, but the real value of the post isn’t in these conclusions, it’s in the reasoning and research.

Does Vitamin D significantly decrease the risk of getting COVID?: 25% chance this is true. The Biobank and Mendelian randomization studies are strong arguments against this; the latitude, seasonal, and racial differences are only weak evidence in favor.

Does Vitamin D use at a hospital significantly improve your chances?: 25% chance this is true. I trust the large Brazilian study more than the smaller Spanish one, but aside from size and a general bias towards skepticism I can’t justify this very well.

Do the benefits of taking a Vitamin D supplement at a normal dose equal or outweigh the costs for most people?: 75% chance this is true. The risks are pretty low, and it will probably bring you closer to rather than further from a natural range if you’re a modern indoor worker (side effects are few; the most serious is probably kidney stones, so don’t take it if you have any tendency towards that). And maybe some day, after countless false leads and stupid red herrings, one of the claims people make about this substance will actually pan out. Who knows?


Those are the assessments of the blog’s author, Scott Siskind; they aren’t from me. But I think, given what he says in his post, that they’re quite reasonable.

I’m going to say a few words about the blog I’m sending you to, because there’s an interesting story there. Siskind is the guy who used to have the blog called Slate Star Codex. Here is a sample post from that blog that I think might interest the readership of this blog. I was late ‘discovering’ SSC: a friend turned me onto it about a year ago. It is entertaining and informative, and the author (who wrote under the pseudonym Scott Alexander) is great at both thinking about a wide range of topics and explaining how he thinks. But several months ago a New York Times reporter contacted ‘Scott Alexander’ and said the NYT was going to publish a piece about the blog and its readership, and would give Scott’s real name (Scott Siskind). Siskind objected, saying he used a pseudonym because he sometimes wrote about controversial topics and/or said controversial things and that revealing his real name would expose him to repercussions such as losing clients at his business. The NYT did not relent, so Siskind took down his blog, hoping that that would make the story sufficiently irrelevant that the Times wouldn’t run it. And indeed that seems to have happened, although it’s also possible that the editors of the Times took Siskind’s feelings into account. But now Siskind is back, with a new blog published under his own name. And the New York Times has run their article.

That NYT article is…strange. If you read the article, the impression you get about Slate Star Codex is nothing like the impression you get by actually reading Slate Star Codex. The friend who suggested SSC to me a year ago thinks this is an example of the biases of NYT journalists being reflected in their reporting, a suggestion I would have dismissed a couple of years ago but which I now give a fair amount of credence: there is some unhealthy Political Correctness in the Times’s newsroom and my friend has me pretty much convinced that it is having too much influence on the stories they write and the way they write them. Specifically, I suspect that the fact that Siskind wrote about some controversial topics in ways that the Times reporter didn’t like may have led to the odd description of Slate Star Codex in the Times article.

Be that as it may, Siskind has a new blog called Astral Codex Ten, and you should all go read this piece about the evidence about how much Vitamin D does or doesn’t protect against COVID. 


Is the right brain hemisphere more analog and Bayesian?

Oliver Schultheiss writes:

I recently commented on one of your posts (I forgot which one) with a reference to evidence suggesting that the right brain hemisphere may be in a better position to handle numbers and probabilistic predictions. Yesterday I came across the attached paper by Filipowicz, Anderson, & Danckert (2016) that may be of some interest to you. It suggests at least 2 things:

First, there is actually a lot more research than I knew about that shows that the right hemisphere is better at intuitive statistics. If parts of it are missing, people have severe problems adapting to probability (changes) and making the right guesses. You don’t get that when the left hemisphere is compromised. In fact, the right hemisphere appears to work like a real Bayesian, with prior beliefs and updating as the data come in (see Figures 7, 8, and 9). In general, it adapts in a graded, analog fashion to incoming information. This is in stark contrast to the left hemisphere, which of course is also capable of predicting what happens next, but does so in a much more digital, either/or, black/white manner (or should I say: significant/non-significant manner?). The approach to statistics that you espouse on your blog and in your books (e.g., Regression & other stories) is decidedly one that is more closely aligned to how the right hemisphere deals with probability and uncertainty than the approach of the left hemisphere.

Second, the authors provide a wonderful demonstration of how we deal behaviorally with probability and uncertainty using the game Plinko (for a demo please see here: — but you need PsychoPy to run it). It’s illustrated in Figure 7 and requires players to first state their prior beliefs about how a ball will fall through a grid of pins and how often it will end up in a variety of bins underneath the grid. You can then study how people update their beliefs as the data from the first test runs come in. The beauty of this example is, of course, that the actual probability distribution that emerges over time is close to a Gaussian. But that’s not what everybody expects. Some people’s priors are bimodal, some believe in a rather jagged kind of distribution, and I guess other priors are possible too. This might be a nice teaching tool for the kind of intuitive Bayesianism our right hemisphere engages in (and which vanishes or becomes distorted after right-hemisphere damage).

Perhaps you’ve already seen either the paper or the Plinko game before. I was very impressed by this review paper, because I hadn’t been aware how much is already known about hemispheric differences in statistical reasoning.

I know nothing about this! But it seems interesting, so I’ll share it. I hadn’t thought about Regression and Other Stories as being a right-brain-style book!
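The Plinko setup described in the letter can be sketched in a few lines. This is a toy version under stated assumptions, not the task from the paper: the 10 rows of pins, the 50/50 bounce at each pin, the bimodal prior, and the pseudo-count (Dirichlet-style) updating rule are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def drop_ball(n_rows: int, rng: random.Random) -> int:
    """Final bin index = number of rightward bounces over n_rows pins."""
    return sum(rng.random() < 0.5 for _ in range(n_rows))

def simulate(n_balls: int, n_rows: int, seed: int = 0) -> list:
    """Drop n_balls and return the count landing in each of the n_rows + 1 bins."""
    rng = random.Random(seed)
    landed = Counter(drop_ball(n_rows, rng) for _ in range(n_balls))
    return [landed.get(b, 0) for b in range(n_rows + 1)]

def update_beliefs(prior_weights: list, counts: list) -> list:
    """Add observed counts to prior pseudo-counts, then renormalize."""
    post = [w + c for w, c in zip(prior_weights, counts)]
    total = sum(post)
    return [p / total for p in post]

N_ROWS = 10
# Hypothetical player who expects balls to pile up at the two edges.
bimodal_prior = [5, 4, 2, 1, 0.5, 0.5, 0.5, 1, 2, 4, 5]
counts = simulate(2000, N_ROWS)
posterior = update_beliefs(bimodal_prior, counts)

for b, p in enumerate(posterior):
    print(f"bin {b:2d} {'#' * round(60 * p)}")
```

With enough drops the bell shape of the data swamps the bimodal prior; running `simulate` with only a handful of balls shows the intermediate stage where the prior still dominates.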

Who are the culture heroes of today?

When I was a kid, the culture heroes were Hollywood and TV actors, pop musicians, athletes and coaches, historical political and military figures, then I guess you could go down the list of fame and consider authors, artists, scientists and inventors . . . . that’s about it, I think.

Nowadays, we still have actors, athletes, and some historical figures—but it’s my impression that musicians are no longer “rock stars,” as it were. Sure, there are a few famous pop musicians and rappers at any given time, along with legacy figures like Bruce etc., but I don’t feel like musicians are culture heroes the way they used to be. To put it another way: there are individual pop stars, but just a few, not a whole galaxy of them as there used to be.

The big replacement is business executives. 40 or 50 years ago, you’d be hard pressed to name more than two or three of these guys. Lee Iacocca, Steve Jobs, . . . . that was about it. Maybe the guy who owned McDonald’s, and some people like Colonel Sanders and Crazy Eddie who advertised on TV. Nowadays, though, there’s a whole pantheon of superstar executives, kind of parallel to the pantheon of Hollywood actors or sports stars. Cuddly executives in the Tom Hanks mode, no-nonsense get-the-job-done Tom Brady types, trailblazing Navratilovas, goofballs, heroes, heels, the whole story. We have executives who some people worship and others think are ridiculously overrated.

That’s part of the point, I guess. Culture heroes and villains don’t stand alone; they’re part of a pantheon of characters as with Olympian gods or a superhero universe, each with his or her unique characteristics. We love (or love to hate) Bill Gates or Elon Musk, not just for their own accomplishments and riches but also for how they fit into this larger story. We can define ourselves in part with who we root for in the Tesla/GM/Volkswagen struggle, or where we fall on the space bounded by corporate normies like Bill Gates, outlaws like John McAfee, and idealists like Richard Stallman. And people like Martin Shkreli and Elizabeth Holmes are not just failed businesspeople; they’re “heels” who we can root against or root for in the latest business throwdown. The particular examples you care about might differ, but in whatever arena you care about, the ever-changing pantheon of execs at the top make for a set of story arcs comparable to those of Joan Crawford and other movie stars from the 1950s.

As noted above, we also still have actors, athletes, and historical figures. There have been some changes here. The “actors” category used to be some mix of movie stars, TV stars, talk show hosts, and sex symbols. These are still there, but I feel like it’s blurred into a more general “celebrity” category. The “athletes” category seems similar to before, even if it’s not always the same sports being represented. Similarly with the historical figures: we’re now more multicultural about it, but I think it’s the same general feeling as before.

Also, I feel like we hear more about politicians than we used to. Back in the 1970s you’d hear about whoever was the current president, and some charismatic others such as Ronald Reagan, and . . . that was about it. I don’t recall the Speaker of the House or the Senate majority or minority leader being household names. I guess that part of this was that congress had one-party control back then, which made the party leaders less important as individuals.

P.S. The above could use some systematic social science thought and measurement, but I thought there’d be some value in starting by throwing these ideas out there.

P.P.S. Carlos reminds us that we had a related discussion a few months ago. I guess it really is time for me to move from the speculation to the social-science stage of the investigation already.

Webinar: Some Outstanding Challenges when Solving ODEs in a Bayesian context

This post is by Eric.

This Wednesday, at 12 pm ET, Charles Margossian is stopping by to talk to us about solving ODEs using Bayesian methods. You can register here.

If you want to get a feel for the types of issues he will be discussing, take a look at his (and Andrew’s) recent case study: “Bayesian Model of Planetary Motion: exploring ideas for a modeling workflow when dealing with ordinary differential equations and multimodality.”


Many scientific models rely on differential equation-based likelihoods. Some unique challenges arise when fitting such models with full Bayesian inference. Indeed as our algorithm (e.g. Markov chain Monte Carlo) explores the parameter space, we must solve, not one, but a range of ODEs whose behaviors can change dramatically with different parameter values. I’ll present two examples where this phenomenon occurs: a classical mechanics problem where the speed of the solver differs for different parameter values; and a pharmacology example, wherein the ODE behaves in a stiff manner during the warmup phase but becomes non-stiff during the sampling phase. We’ll then have a candid discussion about the difficulties that arise when developing a modeling workflow with ODE-based models and brainstorm some ideas on how to move forward.

The video is now available here.
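The abstract’s point that solver cost depends on where the sampler happens to be in parameter space can be illustrated with a toy problem. The sketch below is an assumption-laden stand-in, not the solvers or models from the talk: it uses a hand-rolled adaptive Euler scheme with step-doubling error control on the made-up equation y' = -k (y - cos t), and simply counts accepted steps for a few values of the rate parameter k, as if an MCMC chain had visited them.

```python
import math

def rhs(t: float, y: float, k: float) -> float:
    """Toy ODE y' = -k * (y - cos t); larger k makes the problem stiffer."""
    return -k * (y - math.cos(t))

def adaptive_euler_steps(k: float, t_end: float = 2.0, tol: float = 1e-4) -> int:
    """Integrate from t=0, y=0 with step-doubling error control.

    Returns the number of accepted steps, a rough proxy for solver cost.
    """
    t, y, h, steps = 0.0, 0.0, 1e-3, 0
    while t_end - t > 1e-12:
        h = min(h, t_end - t)
        full = y + h * rhs(t, y, k)                    # one Euler step of size h
        half = y + 0.5 * h * rhs(t, y, k)              # two Euler steps of size h/2
        two_half = half + 0.5 * h * rhs(t + 0.5 * h, half, k)
        err = abs(two_half - full)                     # local error estimate
        if err <= tol:                                 # accept the step
            t, y, steps = t + h, two_half, steps + 1
        # Standard step-size controller for a first-order method.
        h *= min(2.0, max(0.2, 0.9 * math.sqrt(tol / max(err, 1e-16))))
    return steps

# Hypothetical parameter values a sampler might propose.
step_counts = {k: adaptive_euler_steps(k) for k in (1.0, 100.0, 10000.0)}
for k, n in step_counts.items():
    print(f"k = {k:>8.0f}: {n} accepted steps")
```

The same model, tolerance, and time span cost very different amounts of work depending on k, so a chain wandering into the stiff region during warmup can be much slower per iteration than one sampling in the well-behaved region.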

Creatures of their time: Shirley Jackson and John Campbell

I recently read two excellent biographies of literary figures:

“Shirley Jackson: A Rather Haunted Life,” by Ruth Franklin,

“Astounding: John W. Campbell, Isaac Asimov, Robert A. Heinlein, L. Ron Hubbard, and the Golden Age of Science Fiction,” by Alec Nevala-Lee.

Franklin’s is a traditional literary biography, going through Jackson’s life in fine detail and focusing on her writing, while Nevala-Lee offers more of a view from 30,000 feet, telling lots of great stories but in some places skipping quickly over decades of his subjects’ lives—unavoidable, I guess, given that he’s writing about four authors, not just one.

Both of these are books about cult figures in literature, and the biographers handle this in different ways. Franklin’s particularly interested in Jackson’s literary output; she writes a lot about Jackson’s style, content, and influences; and she’s a partisan, arguing that Jackson deserves respect and should not simply be considered as an upmarket horror writer. For me to understand this argument better, I’d like to see comparisons to some other authors such as V. C. Andrews or Stephen King who have more of a mass-market feel, not to mention modern young-adult novels such as Twilight, Gone, etc. I’m curious if Franklin thinks that Jackson’s novels are serious and these others are mere pulp, or if she (Franklin) would argue in favor of the literary merits of the entire genre.

Nevala-Lee goes in the opposite direction, almost never considering the literary quality, or even the experience of reading, the short stories and novels that come up in his narrative. Nevala-Lee’s book is very readable and has lots of fascinating material on the life and times of his subjects, but I was kinda disappointed not to hear more about the science fiction stories themselves—what made them work or not work, how readable are they today, etc. I’m not just talking here about discussions of literary style; also I’d like to see more on the actual content of these stories. There was lots of fascinating stuff on the collaboration between editor and authors, just not so much on the final product. The other difference compared to Franklin is that Nevala-Lee is not a partisan of the authors he writes about; indeed, he spends a lot of time on their various personal and political flaws. Of course Nevala-Lee values these writers—otherwise he wouldn’t have written a book about them—but he doesn’t spend much time trying to bolster their status.

I recommend both books, even though they’re very different. We’ve talked before about the lack of overlap in the communities of literary and genre fiction, and you see that complete lack of overlap in these two books as well.

Shirley Jackson actually published a story in Fantasy & Science Fiction magazine, so I guess some connection could be made, but Nevala-Lee doesn’t really discuss the non-SF world at all, while Franklin mentions science fiction on only one page of her biography, as a lead-in to the non-realistic elements of Jackson’s novel, The Sundial.

The most interesting thing I noticed when reading these two biographies, though, was something not explicitly mentioned in either book, and this is how much Jackson and Campbell et al. were people of their time.

Campbell was born in 1910, Jackson in 1916, and they both had success in their thirties and forties, smack in the middle of the twentieth century. And what charmingly mid-century people they were! They drank like John Cheever characters. Jackson and her husband were bohemians who listened to jazz records. As for Campbell et al. . . . I didn’t take notes when reading the book, and I can’t pick out any particular bits, but let me assure you that, when I was reading it, I kept thinking about Shirley Jackson. No similarities between the people, but they were just so “of their time” in how they lived and expressed themselves. I guess this struck me because, as authors, Jackson and Campbell etc. were writing stories that were not particularly time-bound. If you tell me John O’Hara was a man of his time, I’d say, sure, that’s what I’d expect, given that he was a sort of literary sociologist. But writers of parables or science fiction, that’s different.

One other thing. In the second half of his life, Campbell became an enthusiast for all sorts of pseudoscience. Regarding one particularly ridiculous idea, Campbell wrote, “I have a Campbell Machine, derived from the Hieronymus Machine, that works, too. Only it’s based on something so insane that it makes the Hieronymus Machine look as conventional as a shovel.”

“Something so insane,” indeed.

But here’s the kicker. According to Nevala-Lee, “There were inquiries from Bell Aircraft and the RAND Corporation, and Claude Shannon offered to test it, although the timing never worked out.”

People were such suckers back then! Now I understand why Martin Gardner felt the need to write that book. Back in the 1950s, educated people believed all sorts of ridiculous things that they wouldn’t believe today, unless they had some sort of political motivation.

The whole thing gives me a new take on those Heinlein stories where a genius builds a time machine in his basement. It’s like they thought this was a realistic scenario.

“Our underpowered trial provides no indication that X has a positive or negative effect on Y”

It’s rare to see researchers say flat-out that an experimental result leaves them uncertain. There seems to be such a temptation to either declare victory with statistical significance (setting the significance level to 0.1 if necessary to clear the bar) or to claim that weak and noisy results are “suggestive” or, conversely, to declare non-significance as evidence of no effect.

But . . . hey! . . . check this out:

Under the heading, “The one med paper in existence that was somewhat ok with uncertainty?,” Zad points to this article, “Randomized Trial of Nocturnal Oxygen in Chronic Obstructive Pulmonary Disease,” by Yves Lacasse in the New England Journal of Medicine:

Abstract Conclusions: “Our underpowered trial provides no indication that nocturnal oxygen has a positive or negative effect on survival or progression to long-term oxygen therapy . . .”

Full-text Discussion: “Our trial did not show evidence of an effect of nocturnal oxygen therapy on survival or progression to long-term oxygen therapy in patients with COPD with isolated nocturnal oxygen desaturation. Because enrollment in the trial was stopped before we had reached our proposed sample size, the trial was underpowered, with the consequence of a wide confidence interval around the point estimate of the absolute difference in risk between the trial groups at 3 years of follow-up. The data that were accrued could not rule out benefit or harm from nocturnal oxygen and included the minimal clinically important difference determined before the trial. However, nocturnal oxygen had no observed effect on secondary outcomes, including exacerbation and hospitalization rates and quality of life. Furthermore, the duration of exposure to nocturnal oxygen did not modify the overall effect of therapy. Because our trial did not reach full enrollment, it makes sense to view our results in the context of other results in the field. A systematic review of the effect of home oxygen therapy in patients with COPD with isolated nocturnal desaturation identified two published trials that examined the effect of nocturnal oxygen on survival and progression to long-term oxygen therapy.”
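To make “a wide confidence interval . . . could not rule out benefit or harm” concrete, here is a minimal sketch of a Wald interval for a difference in risk. The counts and the minimal clinically important difference (MCID) below are invented purely for illustration; they are not the trial’s data, and the paper may well have used a different interval method.

```python
import math

def risk_difference_ci(events_a: int, n_a: int, events_b: int, n_b: int,
                       z: float = 1.96):
    """Wald 95% interval for the risk difference (group A minus group B)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# HYPOTHETICAL numbers: 40/120 events on treatment vs. 44/120 on placebo.
diff, lower, upper = risk_difference_ci(40, 120, 44, 120)
MCID = 0.10  # hypothetical: a 10-point drop in risk would matter clinically

includes_zero = lower < 0 < upper          # cannot rule out "no effect"
includes_mcid_benefit = lower <= -MCID     # cannot rule out a meaningful benefit

print(f"difference = {diff:+.3f}, 95% CI = ({lower:+.3f}, {upper:+.3f})")
print("includes zero:", includes_zero)
print("includes a clinically important benefit:", includes_mcid_benefit)
```

An interval that contains both zero and the clinically important difference is the numerical version of the authors’ sentence: the data are compatible with a useful treatment and with a useless one.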


What’s your “Mathematical Comfort Food”?

Darren Glass, editor of the book review section of the American Mathematical Monthly, writes,

For this month’s column, I [Glass] thought that, rather than provide an in-depth review of a new monograph, I would ask a number of members of our community about some of the “mathematical comfort food” that they have turned to or that they might recommend people seek out.

A couple days ago I shared my recommendation, Proofs and Refutations.

Here are the books and other products recommended by the math teachers who were asked to contribute to this review:

Number Theory, by Edit Gyarmati, Paul Turán, and Róbert Freud

The Symmetries of Things by John H. Conway, Heidi Burgiel, and Chaim Goodman-Strauss

Black Mathematicians and Their Works, edited by Virginia K. Newell, Joella H. Gipson, L. Waldo Rich, and Beauregard Stubblefield

The movies Hidden Figures and October Sky

The article Computing Machinery and Intelligence, by Alan Turing, and the book Alan Turing: The Enigma, by Andrew Hodges

Origami Journey: Into the Fascinating World of Geometric Origami, by Dasa Severova.

Crocheting Adventures With Hyperbolic Planes, by Daina Taimina.

The game Prime Climb, by Dan Finkel and friends

An Imaginary Tale: The Story of √−1, by Paul Nahin

A Topological Picturebook, by George Francis

Proofs and Refutations by Imre Lakatos.

These were not what I would’ve expected! I thought my suggestion of Lakatos was a little bit offbeat, but the others are even more surprising.

If you’d asked me what a bunch of mathematicians would’ve selected as their mathematical comfort food, I’d’ve thought the list would have included some of these:

Mathematical Snapshots, by Hugo Steinhaus

Just about anything by Martin Gardner

Maybe something by Smullyan too

Can one hear the shape of a drum, by Mark Kac

How long is the coast of Britain, by Benoit Mandelbrot.

I’d add Eric Temple Bell’s Men of Mathematics but maybe that’s too old-fashioned to get a recommendation.

I don’t have any problem with the items in the first list above; they’re just not what first comes to mind when I think of mathematical comfort food. But it could be that the others were just like me and were trying to avoid the obvious choices. Maybe next month the journal can run a set of reviews, Canonical Mathematical Comfort Food.
