
Pittsburgh by Frank Santoro

Last year we discussed a silly study, and that led us to this interesting blog by Chris Gavaler, which pointed me to a recent picture storybook, Pittsburgh, by Frank Santoro. The book was excellent. I don’t have any insights to share here; I just wanted to thank Santoro for writing the book and Gavaler for his thoughtful review.

P.S. Gavaler reviews Imbattable! His discussion is reasonable, but there’s something disorienting about seeing the words translated into English.

Meta-meta-science studies

August Wartin asks:

Are you familiar with any (economic) literature that attempts to model academia or the labor market for researchers (or similar), incorporating stuff like e.g. publication bias, researcher degrees of freedom, the garden of forking paths etcetera (and that perhaps also discusses possible proposals/mechanisms to mitigate these problems)? And perhaps you might know any empirical (economic) literature evaluating the effect of some policy measures to mitigate these problems?

I sent this along to Paul Smaldino, who wrote a paper, The Natural Selection of Bad Science, a few years ago, and who has more papers on the topic on his website. Smaldino added:

There are an increasing number of models of these processes. Here’s a recent paper I like that I think is a nice, simple model.

This article, by George Akerlof and Pascal Michaillat, seems similar to another paper by those authors that we discussed here a few years ago.

Now that there’s a whole subfield of meta-science studies where researchers construct and compute theoretical models of the scientific process, the next step is meta-meta-science, where we have models of the incentive and communication structure of meta-science. And then meta-meta-meta-science . . .

Also I’m interested in any answers to the question that Wartin asked in the last sentence of his above-quoted email.
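To make concrete the kind of mechanism these models formalize, here is a minimal simulation sketch (a toy example of mine, not taken from Smaldino’s or Akerlof and Michaillat’s papers): when many low-powered studies of a small true effect are filtered through a publish-only-if-significant rule, the published estimates systematically exaggerate the effect.

```python
import numpy as np

# Toy model of publication bias: many noisy studies of a small true effect,
# with publication conditional on statistical significance at p < .05.
# Numbers are illustrative assumptions, not estimates from any real field.
rng = np.random.default_rng(1)
true_effect, se, n_studies = 0.10, 0.10, 10_000

estimates = rng.normal(true_effect, se, size=n_studies)
published = estimates[np.abs(estimates / se) > 1.96]   # selection on "significance"

print(f"share of studies published: {len(published) / n_studies:.2f}")
print(f"true effect:                {true_effect:.2f}")
print(f"mean published estimate:    {published.mean():.2f}")  # noticeably exaggerated
```

Smaldino and McElreath’s model adds evolutionary dynamics on top of this kind of filter (roughly, labs whose methods yield more publishable results are preferentially copied), but selection on significance is the basic ingredient.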

She’s thinking of buying a house, but it has a high radon measurement. What should she do?

Someone wrote in with a question:

My Mom, who has health issues, is about to close on a new house in **, NJ. We just saw that ** generally is listed as an area with high radon. If the house has a radon measurement over 4 and the seller puts vents to bring it into compliance, how likely is it that the level will return to 4 shortly thereafter?

Also is 4 a safe amount? I understand that is the EPA guideline while the World Health Organization suggests 2.7. Which level do you consider appropriate?

I forwarded this to my colleague Phil Price (known as “Phil” to you blog readers), who worked for many years as a scientist in the Indoor Environment Department at Lawrence Berkeley National Laboratory. Phil replied:

Unfortunately nobody knows exactly how dangerous it is to live in a house with a radon concentration around 4 pCi/L. Different countries have standards as low as 2 and as high as 10. It’s clear that at levels as high as 12 or 20 pCi/L the residents are at substantially increased risk of lung cancer, but at lower concentrations the danger is not high enough to stick out definitively from the noise, so it’s really hard to know how to extrapolate.

Here are a few facts and observations that might help you decide for yourself, and then I’ll follow up with my personal recommendation which is based on what I [Phil] would do and is not necessarily a recommendation that is good for you or your mom.

Actually, let me start with an answer to your question: if the house has acceptable radon concentrations now (averaged over a full year) then that is unlikely to change spontaneously over time; however, changes to the house or its operation could change things, e.g. if your mom starts heating or cooling parts of the house that have previously been unoccupied. On shorter timescales, the radon concentration is always varying, so a measurement made at one moment may not reflect the long-term average.

Now, on to those facts I mentioned:
1. The relevant number is the radon concentration in the living areas of the home. That seems obvious, but a lot of measurements are made in unfinished basements or in other areas where people don’t spend much time, such as a crawlspace under the house.
2. Also, the relevant number is the radon concentration averaged over a long period of time — months or years — not the concentration at a given moment.
3. The radon concentration in a basement is normally substantially higher than on the ground floor or above.
4. The indoor radon concentration can vary a lot with time: it might be twice as high in winter as in summer, on average, and it might be four times as high in the highest winter week as it is in the lowest.
5. For most homes, radon mitigation can be performed for under $2500 and will keep working for many years; you can google “radon sub slab depressurization” to read about this.

Finally, I’ll tell you what I would do. But my choice would not just depend on facts about radon, but also on my personality, so this might not be what your mom should do.

I would buy the house and move in. I would perform a year-long radon test on the lowest level of the home in which I spend time (for instance, the basement if I spent time in a hobby room down there or something, otherwise the ground floor) and after a year I would check the results and hire a radon mitigation company if the result was higher than I thought was safe. I think the 4 pCi/L level is reasonable for making that decision, but if it came back at 3.6 pCi/L or something, maybe I would mitigate even though that is below the ‘action level’.

The idea of waiting that long, knowing that for the whole year you’re being exposed at above the recommended concentration, would freak some people out. If I felt that way, then in addition to the long-term test I might do a short-term test every few months, and if any of the results were really high, I’d hire a mitigation company. But even if the long-term average is fine, the measurement over a short period might be pretty high, so I wouldn’t be too bothered by a single short-term test coming in at 4 or 6 pCi/L, I’d just wait until I had the long-term result. Again, that’s just me. Here is a company that offers various combinations of short- and long-term tests. I have no relationship with them whatsoever, I just looked them up on the internet like anyone could do.

Finally: your mom could consider calling a radon mitigation company and seeing what they say. Most likely they will say they can mitigate just about any house to below 4 pCi/L in the living area; they might even offer a guarantee. Then she could buy the house and go ahead and have a sub-slab depressurization system installed. Odds are pretty good that that would be a waste of a few thousand dollars, but it would give her peace of mind and if it lets her buy a house she likes, it might well be worth it.
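To put some toy numbers on Phil’s points 2 and 4 about short-term versus long-term measurements, here is a rough simulation sketch. The seasonal pattern and noise level are assumptions chosen for illustration, not real radon data or anything Phil provided:

```python
import numpy as np

# Illustration of Phil's points 2 and 4: weekly radon readings fluctuating
# around a seasonal pattern (higher in winter), in a house whose annual
# average is below the 4 pCi/L action level. All numbers are made up.
rng = np.random.default_rng(0)
weeks = np.arange(52)
seasonal = 3.0 + 1.5 * np.cos(2 * np.pi * weeks / 52)       # peaks in "winter"
weekly = rng.lognormal(mean=np.log(seasonal), sigma=0.3)    # week-to-week noise

print(f"annual average:       {weekly.mean():.1f} pCi/L")
print(f"highest weekly value: {weekly.max():.1f} pCi/L")
print(f"weeks above 4 pCi/L:  {(weekly > 4).sum()} of 52")
```

The point is simply that a single short-term test can come in well above 4 pCi/L even when the long-term average, which is what matters for risk, is below it.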

Evidence-based medicine eats itself in real time

Robert Matthews writes:

This has just appeared in BMJ Evidence Based Medicine. It addresses the controversial question of whether lowering LDL using statins leads to reduced mortality and CVD rates.

The researchers pull together 35 published studies, and then assess the evidence of benefit – but say a meta-analysis is inappropriate, given the heterogeneity of the studies. Hmmm, ok, interesting….But what they do instead is astounding. They just tot up the numbers of studies that failed to reach statistical significance, and interpret each as a “negative” study – ie evidence that the LDL reduction had no effect. They ignore the point estimates completely.

They then find that only 1 of the 13 studies (8%) that achieved an LDL target showed statistically significant benefit in mortality risk, and 5 of the 13 (39%) showed the same for CVD. They do the same for those studies that failed to reach the LDL target.

Incredibly, this leads them to conclude: “The negative results [sic] of numerous cholesterol lowering randomised controlled trials call into question the validity of using low density lipoprotein cholesterol as a surrogate target for the prevention of cardiovascular disease”.

The paper has several other major flaws (as others have pointed out). But surely the stand-out blunder is the “Absence of evidence/evidence of absence” fallacy. I cannot recall seeing a more shocking example in a “serious” medical journal. Whatever the paper does call into question, top of the list must be how this came to be published – seemingly without irony – in BMJ Evidence Based Medicine.

If nothing else, the paper might serve as useful teaching material.

P.S. For what it’s worth, even a simple re-analysis focusing solely on the point estimates produces a radically different outcome.

BTW, for some reason, the authors have included studies whose mortality benefit is stated as “NR” (Not Reported) in their tables.

Also, this means the percentages used to calculate their bar charts are also incorrect (eg of those studies that met their LDL reduction targets, there are only 10, not 13, studies that allow the mortality benefit to be calculated).

“British Medical Journal Evidence Based Medicine,” huh? Remember our earlier post, Evidence-based medicine eats itself.

Evidence Based Medicine is more than a slogan. It’s also a way to destroy information!
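Matthews’s complaint about counting non-significant studies as “negative” evidence is easy to demonstrate with a simulation. Here is a minimal sketch (invented numbers, not the 35 trials in the BMJ EBM paper): even when a benefit is real, a collection of modestly powered trials will mostly fail to reach significance, so tallying significance votes throws away exactly the information that the point estimates carry.

```python
import numpy as np

# Vote-counting vs. looking at point estimates, when the benefit is real.
# Illustrative numbers only; not the trials reviewed in the BMJ EBM paper.
rng = np.random.default_rng(2)
n_trials, true_log_rr, se = 35, -0.10, 0.08   # ~10% relative risk reduction

est = rng.normal(true_log_rr, se, size=n_trials)   # each trial's estimate
significant = np.abs(est / se) > 1.96

print(f"trials reaching significance: {significant.sum()} of {n_trials}")
print(f"mean point estimate:          {est.mean():.3f} (truth {true_log_rr})")
```

Most of these simulated trials are “negative” in the vote-counting sense, yet most of their point estimates sit on the beneficial side of zero, which is Matthews’s point about absence of evidence versus evidence of absence.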

Stan short course in July

Jonah Gabry is teaching a Stan short course! He’s done it before and I’ve heard that it’s excellent. Here’s the information:

Dates: Wed Jul 14 – Fri Jul 16
Location: online

Learn Bayesian Data Analysis and Stan with Stan Developer Jonah Gabry

The course consists of three main themes: Bayesian inference and computation; the Stan modeling language; applied statistics/Bayesian data analysis in practice. There will be some lectures to cover important concepts, but the course will also be heavily interactive, with much of the time dedicated to hands-on examples. We will be interfacing with Stan from R, but users of Python and other languages/platforms can still benefit from the course as all of the code we write in the Stan language (and all of the modeling techniques and concepts covered in the course) can be used with any of the Stan interfaces.

Lumley on the Alzheimer’s drug approval

Last week we discussed the FDA’s controversial approval of a new Alzheimer’s drug. Here’s more on the topic from statistician Thomas Lumley, who knows more about all of this than I do:

[Cautious optimism is] a very sensible attitude, in the abstract: if the drug turns out to be effective it could be valuable, but it’s too early to know if that will be the case. What’s surprising is that this is the situation we’re in after the drug has been approved, and when its manufacturer is planning to charge US$56000/year for it.

The drug (or, technically, the ‘biologic’ since it’s an antibody) has been through a lot of ups and downs in its clinical trial history. There were two main trials that were supposed to show it was effective. They failed. A re-analysis of one of them suggested that it might actually work, at least for some patients. Normally, this would be the cue to do a confirmatory trial to see if it does actually help an identifiable group of people. And the FDA did mandate this trial — but they will let the manufacturer sell and promote the medication for nine years while the trial goes on. Given that the market for aducanumab is conservatively estimated at tens of millions of dollars per day, and there’s only a possible downside to getting trial results, the trial is unlikely to end a day sooner than it has to; it’s not unheard of for these post-approval trials to just never recruit enough participants and drag on longer than ‘allowed’.

The FDA takes external expert advice on drug approvals. In this case, there were 11 people on the panel. Exactly none of them thought there was good enough evidence for approval; one was uncertain, ten were against. Three of the panel members have since resigned. It’s not unprecedented for the FDA to disagree with the panel when the panel vote is split, but it’s pretty bloody unusual for them to disagree with a unanimous panel. It’s notable that the FDA approval does not say they think there’s evidence drug improves memory or cognition or ability to live independently or anything like that . . .

I saw the link from Joseph Delaney, who adds some thoughts of his own.

Guttman points out another problem with null hypothesis significance testing: It falls apart when considering replications.

Michael Nelson writes:

Re-reading a classic from Louis Guttman, What is not what in statistics, I saw his “Problem 2” with new eyes given the modern replication debate:

Both estimation and the testing of hypotheses have usually been restricted as if to one-time experiments, both in theory and in practice. But the essence of science is replication: a scientist should always be concerned about what will happen when he or another scientist repeats his experiment. For example, suppose a confidence interval for the population mean is established on the basis of a single experiment: what is the probability that the sample mean of the next experiment will fall in this interval? The level of confidence of the first experiment does not tell this. … The same kind of issue, with a different twist, holds for the testing of hypotheses. Suppose a scientist rejects a null hypothesis in favour of a given alternative: what is the probability that the next scientist’s experiment will do the same? Merely knowing probabilities for type I and type II errors of the first experiment is not sufficient for answering this question. … Here are some of the most realistic problems of inference, awaiting an answer. The matter is not purely mathematical, for the actual behaviour of scientists must be taken into account. (p.84)

This statement, literally as old as me [Nelson], both having been “issued” in 1977, is more succinct and more authoritative than most summaries of the current controversy. Guttman is also remarkably prescient in his intro as to the community’s reaction to this and other problems he highlights with conventional approaches:

An initial reaction of some readers may be that this paper is intended to be contentious. That is not at all the purpose. Pointing out that the emperor is not wearing any clothes is in the nature of the case somewhat upsetting. … Practitioners…would like to continue to believe that “since everybody is doing it, it can’t be wrong”. Experience has shown that contentiousness may come more from the opposite direction, from firm believers in unfounded practices. Such devotees often serve as scientific referees and judges, and do not refrain from heaping irrelevant criticisms and negative decisions on new developments which are free of their favourite misconceptions. (p. 84)

Guttman also makes a point I hadn’t really considered, nor seen made (or refuted) in contemporary arguments:

Furthermore, the next scientist’s experiment will generally not be independent of the first’s since the repetition would not ordinarily have been undertaken had the first retained the null hypothesis. Logically, should not the original alternative hypothesis become the null hypothesis for the second experiment?

He also makes the following, almost parenthetical statement, cryptic to me perhaps because of my own unfamiliarity with the historical arguments against Bayes:

Facing such real problems of replication may lead to doubts about the so-called Bayesian approach to statistical inference.

No one is perfect!

My reaction: Before receiving this email, I’d never known anything about Guttman, I’d just heard of Guttman scaling, that’s all. The above-linked article is interesting, and I guess I should read more by him.

Regarding the Bayes stuff: yes, there’s a tradition of anti-Bayesianism (see my discussions with X here and here), and I don’t know where Guttman fits into that. The specific issue he raises may have to do with problems with the coherence of Bayesian inference in practice. If science works forward from prior_1 to posterior_1 which becomes prior_2, then is combined with data to yield posterior_2 which becomes prior_3, and so forth, then this could create problems for analysis of an individual study, as we’d have to be very careful about what we’re including in the prior. I think these problems can be directly resolved using hierarchical models for meta-analysis, but perhaps Guttman wasn’t aware of then-recent work in that area by Lindley, Novick, and others.

Regarding the problems with significance testing: I think Guttman got it right, but he didn’t go far enough, in my opinion. In particular, he wrote, “Logically, should not the original alternative hypothesis become the null hypothesis for the second experiment?”, but this wouldn’t really work, as null hypotheses tend to be specific and alternatives tend to be general. I think the whole hypothesis-testing framework is pointless, and the practical problems where it’s used can be addressed using other methods based on estimation and decision analysis.
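Guttman’s question about whether the next experiment’s sample mean will land in the first experiment’s confidence interval has a concrete answer that is easy to check by simulation: for a 95% interval with known sigma and equal sample sizes, the probability is about 0.83, not 0.95. A quick sketch:

```python
import numpy as np

# Guttman's question: form a 95% CI for the mean from experiment 1;
# what is the chance that experiment 2's sample mean lands inside it?
# With known sigma and equal n, the answer is about 0.83, not 0.95.
rng = np.random.default_rng(3)
n, sigma, n_sims = 50, 1.0, 200_000

xbar1 = rng.normal(0.0, sigma / np.sqrt(n), size=n_sims)   # first sample mean
xbar2 = rng.normal(0.0, sigma / np.sqrt(n), size=n_sims)   # replication's sample mean
half_width = 1.96 * sigma / np.sqrt(n)

capture = np.mean(np.abs(xbar2 - xbar1) < half_width)
print(f"P(next sample mean falls inside first 95% CI) ≈ {capture:.3f}")
```

The 0.83 comes from the difference of two sample means having a standard error sqrt(2) times that of one mean, which is exactly the kind of gap between confidence statements and replication probabilities that Guttman was pointing to.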

Neel Shah: modeling skewed and heavy-tailed data as approximate normals in Stan using the LambertW function

Neel Shah, one of Stan’s Google Summer of Code (GSOC) interns, writes:

Over the summer, I will add LambertW transforms to Stan which enable us to model skewed and heavy-tailed data as approximate normals. This post motivates the idea and describes the theory of LambertW × Z random variables.

Though the normal distribution is one of our go-to tools for modeling, the real world often generates observations that are inconsistent with it. The data might appear asymmetric around a central value as opposed to bell-shaped or have extreme values that would be discounted under a normality assumption. When we can’t assume normality, we often have to roll up our sleeves and delve into a more complex model. But, by using LambertW × Z random variables it is possible for us to model the skewness and kurtosis from the data. Then, we continue with our model as if we had a normal distribution. Later, we can back-transform predictions to account for our skewness and kurtosis.

In the first part, we introduce the LambertW function, also known as the product logarithm. Next, we discuss skewness and kurtosis (measures of asymmetry and heavy-tailedness), define the LambertW × Z random variables, and share our implementation plans. Finally, we demonstrate how LambertW transforms can be used for location-hypothesis testing with Cauchy-simulated data.

To simplify matters, we are focusing on the case of skewed and/or heavy-tailed probabilistic systems driven by Gaussian random variables. However, the LambertW function can also be used to back-transform non-Gaussian latent input. Because Stan allows us to sample from arbitrary distributions, we anticipate that LambertW transforms would naturally fit into many workflows.

The full story, with formulas and graphs, is here. It’s great to see what’s being done in the Stan community!
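For readers who want a feel for the transform before clicking through, here is a rough back-transform sketch for the heavy-tailed case, using scipy’s lambertw. The parameterization follows my reading of Goerg’s LambertW × Gaussian setup; it is an illustration of the idea, not Neel’s Stan implementation.

```python
import numpy as np
from scipy.special import lambertw

# Heavy-tail LambertW x Gaussian sketch: a latent normal U is made heavy-tailed
# via Y = mu + sigma * U * exp(delta/2 * U^2); the Lambert W function inverts
# this, giving back an (approximately) normal variable we can model as usual.
# Parameter values and parameterization are illustrative assumptions.
rng = np.random.default_rng(4)
delta, mu, sigma = 0.1, 0.0, 1.0

u = rng.normal(size=50_000)                        # latent Gaussian input
y = mu + sigma * u * np.exp(0.5 * delta * u**2)    # observed heavy-tailed data

# Back-transform: recover the latent normal scale from the observed data.
z = (y - mu) / sigma
u_hat = np.sign(z) * np.sqrt(lambertw(delta * z**2).real / delta)

def kurtosis(x):
    return np.mean(((x - x.mean()) / x.std()) ** 4)

print(f"sample kurtosis of y:     {kurtosis(y):.1f}   (heavy-tailed, > 3)")
print(f"sample kurtosis of u_hat: {kurtosis(u_hat):.1f}   (normal is about 3)")
print(f"max |u_hat - u|:          {np.abs(u_hat - u).max():.1e}")  # exact inverse up to rounding
```

In a modeling workflow the idea runs the same way: estimate the transform parameters along with the rest of the model, work with the Gaussianized values, and back-transform predictions at the end.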

Wow, just wow. If you think Psychological Science was bad in the 2010-2015 era, you can’t imagine how bad it was back in 1999

Shane Frederick points us to this article from 1999, “Stereotype susceptibility: Identity salience and shifts in quantitative performance,” about which he writes:

This is one of the worst papers ever published in Psych Science (which is a big claim, I recognize). It is old, but really worth a look if you have never read it. It’s famous (like 1400 citations). And, mercifully, only 3 pages long.

I [Frederick] assign the paper to students each year to review. They almost all review it glowingly (i.e., uncritically).

That continues to surprise and disappoint me, but I don’t know if they think they are supposed to (a politeness norm that actually hurts them given that I’m the evaluator) or if they just lack the skills to “do” anything with the data and/or the many silly things reported in the paper? Both?

I took a look at this paper and, yeah, it’s bad. Their design doesn’t seem so bad (low sample size aside):

Forty-six Asian-American female undergraduates were run individually in a laboratory session. First, an experimenter blind to the manipulation asked them to fill out the appropriate manipulation questionnaire. In the female-identity-salient condition, participants (n = 14) were asked [some questions regarding living on single-sex or mixed floors in the dorm]. In the Asian-identity-salient condition, participants (n = 16) were asked [some questions about foreign languages and immigration]. In the control condition, participants (n = 16) were asked [various neutral questions]. After the questionnaire, participants were given a quantitative test that consisted of 12 math questions . . .

The main dependent variable was accuracy, which was the number of mathematical questions a participant answered correctly divided by the number of questions that the participant attempted to answer.

And here were the key results:

Participants in the Asian-identity-salient condition answered an average of 54% of the questions that they attempted correctly, participants in the control condition answered an average of 49% correctly, and participants in the female-identity-salient condition answered an average of 43% correctly. A linear contrast analysis testing our prediction that participants in the Asian-identity-salient condition scored the highest, participants in the control condition scored in the middle, and participants in the female-identity-salient condition scored the lowest revealed that this pattern was significant, t(43) = 1.86, p < .05, r = .27. . . .

The first thing you might notice is that a t-score of 1.86 is not usually associated with “p < .05"--in standard practice you'd need the t-score to be at least 1.96 to get that level of statistical significance--but that's really the least of our worries here. If you read through the paper, you'll see lots and lots of researcher degrees of freedom, also lots of comparisons of statistical significance to non-significance, which is a mistake, and even more so here, given that they’re giving themselves license to decide on an ad hoc basis whether to count each particular comparison as “significant” (t = 1.86), “the same, albeit less statistically significant” (t = 0.89), or “no significant differences” (they don’t give the t or F score on this one). This is perhaps the first time I’ve ever seen a t score less than 1 included in the nearly-statistically-significant category. This is stone-cold Calvinball, of which it’s been said, “There is only one permanent rule in Calvinball: players cannot play it the same way twice.”
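For what it’s worth, the tail areas for the reported contrast are easy to check (the t value and degrees of freedom are taken from the quote above; this is just the arithmetic behind the 1.96 remark):

```python
from scipy import stats

# Tail probabilities for the paper's reported linear contrast, t(43) = 1.86.
t_value, df = 1.86, 43
print(f"one-sided p: {stats.t.sf(t_value, df):.3f}")      # about .035
print(f"two-sided p: {2 * stats.t.sf(t_value, df):.3f}")  # about .07, not below .05
```

So the “p < .05” only holds for a one-sided test; by the usual two-sided convention the result does not clear the threshold, which is the point of the 1.96 remark above.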

Here’s the final sentence of the paper:

The results presented here clearly indicate that test performance is both malleable and surprisingly susceptible to implicit sociocultural pressures.

Huh? They could’ve saved themselves a few bucks and not run any people at all in the study, just rolled some dice 46 times and come up with some stories.

But the authors were from Harvard. I guess you can get away with lots of things if you’re from Harvard.

Why do we say this paper is so bad?

There’s no reason to suspect the authors are bad people, and there’s no reason to think that the hypothesis they’re testing is wrong. If they could do a careful replication study with a few thousand students at multiple universities, the results could very well turn out to be consistent with their theories. Except for the narrow ambit of the study and the strong generalizations made from just two small groups of students, the design seems reasonable. I assume the experiments were described accurately, the data are real, and there were no pizzagate-style shenanigans going on.

But that’s my point. This paper is notably bad because nothing about it is notable. It’s everyday bad science, performed by researchers at a top university, supported by national research grants, published in a top journal, cited 1069 times when I last checked—and with conclusions that are unsupported by the data. (As I often say, if the theory is so great that it stands on its own, fine: just present the theory and perhaps some preliminary data representing a pilot study, but don’t do the mathematical equivalent of flipping a bunch of coins and then using the pattern of heads and tails to tell a story.)

Routine bad science using routine bad methods, the kind that fools Harvard scholars, journal reviewers, and 1600 or so later researchers.

From a scientific standpoint, things like pizzagate or that Cornell ESP study or that voodoo doll study (really) or Why We Sleep or beauty and sex ratio or ovulation and voting or air rage or himmicanes or ages ending in 9 or the critical positivity ratio or the collected works of Michael Lacour—these miss the point, as each of these stories has some special notable feature that makes it newsworthy. Each has some interesting story, but from a scientific standpoint each of these cases is boring, involving some ridiculous theory or some implausible overreach or some flat-out scientific misconduct.

The case described above, though, is fascinating in its utter ordinariness. Scientists just going about their job. Cargo cult at its purest, the blind peer-reviewing and citing the blind.

I guess the Platonic ideal of this would be a paper publishing two studies with two participants each, and still managing to squeeze out some claims of statistical significance. But two studies with N=46 and N=19, that’s pretty close to the no-data ideal.

Again, I’m sure these researchers were doing their best to apply the statistical tools they learned—and I can only assume that they took publication in this top journal as a signal that they were doing things right. Don’t hate the player, hate the game.

P.S. One more thing. I can see the temptation to say something nice about this paper. It’s on an important topic, their results are statistically significant in some way, three referees and a journal editor thought it was worth publishing in a top journal . . . how can we be so quick to dismiss it?

The short answer is that the methods used in this paper are the same methods used to prove that Cornell students have ESP, or that beautiful people have more girls, or embodied cognition, or all sorts of other silly things that the experts used to tell us we "have no choice but to accept that the major conclusions of these studies are true."

To say that the statistical methods in this paper are worse than useless (useless would be making no claims at all; worse than useless is fooling yourself and others into believing strong and baseless claims) does not mean that the substantive theories in the paper are wrong. What it means is that the paper is providing no real evidence for its theories. Recall the all-important distinction between truth and evidence. Also recall the social pressure to say nice things, the attitude that by default we should believe a published or publicized study.

No. This can’t be the way to do science: coming up with theories and then purportedly testing them by coming up with random numbers and weaving a story based on statistical significance. It’s bad when this approach is used on purpose (“p-hacking”) and it’s bad when done in good faith. Not morally bad, just bad science, not a good way of learning about external reality.

Job opening at the U.S. Government Accountability Office

Sam Portnow writes:

I am a statistician at the U.S. Government Accountability Office, and we are hiring for a statistician. The full job announcement is below. Personally, I think our office is a really great place to do social science research within the federal government.

———————————————————————-

The U.S. Government Accountability Office (GAO) has two vacancies for senior statisticians / mathematical statisticians to support teams on the selection and implementation of appropriate statistical methodologies for engagements in accordance with GAO and professional standards. There are separate links for federal employees and the general public.

Current/former Federal Employees:
https://www.usajobs.gov/GetJob/ViewDetails/603773100

Open to the Public:
https://www.usajobs.gov/GetJob/ViewDetails/603773300

This position is located in the Applied Research and Methods (ARM), Center for Statistics and Data Analysis (CSDA). ARM is made up of professionals with expertise in designing and executing appropriate methodologies that help GAO improve the performance and ensure the accountability of the federal government. ARM’s Centers offer expertise in many areas including engagement design, qualitative and quantitative social science research methods, economics, data analysis, evaluation, statistics, surveys, and future-oriented analyses. In addition, technical chiefs provide expertise in economics, actuarial science and accounting. For more see https://www.gao.gov/about/careers/our-teams

We strive for a workplace that values, respects, and treats all employees fairly and a culture that fosters diversity and inclusion. Read more about why GAO is a great place to work: https://www.gao.gov/about/careers

Please note the closing date of June 23.

Questions? Please contact: Jared Smith, CSDA Director at:
email:SmithJB@gao.gov;
phone: (202) 512-3572

Cool!

Progress!

This came in a mass email:

Statistical Horizons is excited to present Applied Bayesian Data Analysis taught by Dr. Roy Levy on Thursday, February 18–Saturday, February 20. In this seminar, you will get both a practical and theoretical introduction to Bayesian methods in just 3 days.

Topics include:
Model construction
Specifying prior distributions
Graphical representation of models
Practical aspects of Markov chain Monte Carlo (MCMC) estimation
Model comparisons
Evaluating hypotheses and data-model fit
Each day will follow this schedule:
10:00am-2:00pm ET: Live lecture via Zoom
4:00pm-5:00pm ET: Live “office hour” via Zoom (Thursday and Friday only)

I don’t know Statistical Horizons or Dr. Roy Levy, and I have no idea whether this course is any good. It could be wonderful, or not; I just don’t know.

But what makes me happy is the list of topics that they’re covering. Back when I was a student, if there were a short course on Bayesian statistics, it would’ve been all about the theory of optimal estimators, Jeffreys priors, crap like that. But now it’s all so . . . applied. And relevant. They even have something on evaluating fit of model to data, a topic that never would’ve come up in the bad old days. I’m so happy to see this! Even more so because I don’t know this group, which implies that this Bayesian data analysis perspective has spread to places I hadn’t even been aware of.

Webinar: Theories of Inference for Data Interactions

This post is by Eric.

This Thursday, at 12 pm ET, Jessica Hullman is stopping by to talk to us about theories of inference for data interactions. You can register here.

Abstract

Research and development in computer science and statistics have produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But design philosophies that emphasize exploration over other phases of analysis risk confusing a need for flexibility with a conclusion that exploratory visual analysis is inherently “model free” and cannot be formalized. I will motivate how without a grounding in theories of human statistical inference, research in exploratory visual analysis can lead to contradictory interface objectives and representations of uncertainty that can discourage users from drawing valid inferences. I will discuss how the concept of a model check in a Bayesian statistical framework unites exploratory and confirmatory analysis, and how this understanding relates to other proposed theories of graphical inference. Viewing interactive analysis as driven by model checks suggests new directions for software and empirical research around exploratory and visual analysis, as well as important questions about what class of problems visual analysis is suited to answer.

About the speaker

Jessica Hullman is an Associate Professor of Computer Science at Northwestern University. Her research looks at how to design, evaluate, coordinate, and theorize representations for data-driven decision making. She co-directs the Midwest Uncertainty Collective, an interdisciplinary group of researchers working on topics in visualization, uncertainty communication and human-in-the-loop data analysis, with Matt Kay. Jessica is the recipient of a Microsoft Faculty Fellowship, NSF CAREER Award, and multiple best papers at top visualization and human-computer interaction conferences.

The University of California statistics department paid at least $329,619.84 to an adjunct professor who did no research, was a terrible teacher, and engaged in sexual harassment

I have one of the easy jobs at the university, well paid with pleasant working conditions. It’s not so easy for adjuncts. Ideally, an adjunct professor has a main job and teaches a course on the side, to stay connected to academia and give back something to the next generation. But in an all-too-common non-ideal setting, adjuncts are paid something like $3000 per course and have to run around among multiple institutions to teach a bunch of classes each semester and pay the bills.

But not at the University of California, as I recently learned from a comment thread where somebody wrote:

One amusing case involved a statistics professor emailing a student with an invitation to Hawaii for a “dirty smoke-filled weekend of unadulterated guilty pleasure and sins.” I [my correspondent] personally knew someone who had received some inappropriate advances from said professor, but simply rejected his proposal and decided to move on with their life and not deal with the messiness. I had also heard of quite a few incidents that were not tied to any later public reckoning.

I did some googling and found the story:

OPHD found Howard D’abrera repeatedly emailed a student with invitations to Hawaii and other destinations, mentioning orgies and threatening to lower the student’s grade if they did not accept the invitation. Though D’abrera denies any sexual intent to his emails and communications, OPHD found D’abrera had more likely than not violated UC sexual harassment policy. He was placed on administrative leave the day the complaint was filed and later resigned in January.

I love the bit about how he “denies any sexual intent to his emails and communications.” Actually, I don’t love it at all. I think it’s horrible.

He was about 70 years old when he was planning this “dirty smoke-filled weekend of unadulterated guilty pleasure and sins.” I don’t think it’s good for a statistics teacher to promote smoking. It’s really bad for you!

At this point I was curious and I looked up his salary history at the University of California. Here it is:

2010 30,099.96
2011 60,199.92
2012 30,099.96
2013 55,183.00
2014 60,200.00
2015 80,121.00
2016 13,716.00

The database doesn’t go back before 2010.

2016 was the year he was fired, er, resigned, so that’s why he didn’t pull in another 80K that year.

To pay $80,000 per year for an adjunct to teach introductory statistics . . . what’s that all about??? It’s just not so hard to find someone who can teach introductory statistics.

On the downside, dude had no research record and was a terrible teacher. But, on the upside, he was a well-connected member of the old boy’s network. I’m still trying to picture in my mind this 70-year-old guy trying to get his 19-year-old students to party with him in Hawaii. None of it makes sense to me—except the bit about the University of California paying hundreds of thousands of dollars to an unqualified creep who happens to have the right friends. That part I can believe. It’s not like the administrators were spending their own money, after all!

The UC Berkeley statistics department has had two famous cases of sexual harassment by tenured professors, and that’s even worse, in the sense that the tenured faculty have more power, making it harder for the harassed students to say no. But at least in those cases it was understandable why these professors were making the big bucks, as they were renowned researchers, world-class in their fields, so up until the moment it was revealed that they were abusing their power and breaking the rules, there was a clear rationale to keep them on the payroll. That’s a bit different from the case here, where the teacher in question’s only qualification seems to have been his personal connections. I dunno, perhaps his mediocrity was a sort of qualification too, in that someone who doesn’t know anything will be pliable and teach exactly the way he’s told to teach.

When it comes to unqualified people getting paid big bucks, I guess things are much worse in industry, at least the sorts of industries that have money just sloshing around.

Still, I’d heard so much about low-paid adjuncts that I was a little surprised to hear they were paying $80,000 for this unqualified person to teach intro statistics. I do think they could’ve found many people in town who would’ve done a much better job for a lot less money. Think of all the local businesses that hire statisticians: wouldn’t it be cool for the students to take a class from one of these people?

And then, with the money that was freed up by not paying $80,000 to someone’s pal, the university could’ve done something useful, like a personal trainer for the upper administrators on campus.

P.S. There was confusion in comments so let me clarify one point. I’m very angry when I hear about adjuncts being paid $2000 a class, running around going broke teaching in 8 different places at once. It’s a real problem! That’s what makes me really mad about this unqualified guy from the old boy’s network getting paid $80,000 a year. Actual qualified hard-working teachers who in many cases also have a research record, are getting paid peanuts; meanwhile this guy who had nothing going for himself except some well-positioned friends gets paid a bundle. To me these are two sides of the same coin, which is that teaching is not being taken seriously at the university, so that actual teachers get the shaft while the money that is being made available for teaching is wasted by paying it to people with personal connections.

P.P.S. Above I described that professor as unqualified to teach introductory statistics. Let me change that to “he had no apparent qualifications . . .” It’s possible he had qualifications that I’m not aware of.

MRP and Missing Data Question

Andy Timm writes:

I’m curious if you have any suggestions for dealing with item nonresponse when using MRP. I haven’t seen anything particularly compelling in a literature review, but it seems like this has to have come up. It seems like a surprisingly large number of papers just go for a complete cases analysis, or don’t mention clearly how missingness in predictors was handled. I can generally see how that could make sense, but I’m not sure if that does in my case.

Specifically, I’m dealing with a set of polls on support for a border wall with some refusals/unknowns on demographic questions like race, income, and so forth. The refusals seem very informative—those who refused at least 1 demographic question are ~5 points less likely to oppose a border wall, driven mostly by people who refused the income question.

If I was poststratifying to the voter file, I could just do what you did with Yair Ghitza recently, and model the unknowns as their own category, and poststratify to the unknowns in the voter file. But for poststratification to PUMS or similar where we don’t have poststratification data for the unknowns, this indicator variable strategy wouldn’t be helpful I believe.

The alternative would be something like multiple imputation, but imputation for data that is all of 1) likely missing not at random, 2) multilevel, 3) coming from surveys with weights seems like a challenge. I found some work that has imputation strategies for multilevel survey data with weights when you have most of the design information, but that’s not really the case with a large set of polls from iPoll—the construction of weights isn’t fully explained.

My thought is some sort of multiple imputation where the uncertainty could be propagated forward would be the optimal solution (and most conceptually elegant one), so for now I’m continuing to try various imputation models to see what produces sensible imputations.

My reply:

1. Yeah, this comes up a lot! The right approach has got to be to jointly model the predictors and the outcome and then do the imputations using this joint model. The multilevel structure would be included in this model. Liu, King, and I actually did this in 1998 using a hierarchical multivariate normal model, and it seemed to work well in our application (see also here). The model and code for this example are so old that it would be best just to redo it from scratch.

2. You mention survey weights. This should present no problem at all: just include in your regression model all the variables used in the survey weights, and then it’s appropriate to do an unweighted analysis. The information in the weights is encoded in population totals that you would use for poststratification for your ultimate inferences for your population of interest.

3. You also mention selection bias (missing not at random). If this missingness only depends on the variables in the model, you’ll be ok with the joint modeling approach described above. If the missingness can depend on unobserved variables, then you’d want to include these as latent variables in your multivariate model.

4. Another issue that can arise is imputation of discrete missing variables. This can be done using a conditional imputation approach (as we discuss here) or using a joint model with latent continuous variables.

It’s been a long time since I’ve done this sort of hierarchical multivariate imputation modeling, but it seems like the right thing to do, actually!

Considering this as a research project, I’d proceed along two tracks:

– A generic multilevel multivariate normal model for imputing missing values from multiple polls, using latent continuous variables for discrete responses. This modeling-based approach can be compared with off-the-shelf multiple imputation programs that don’t include the multilevel structure.

– A Bayesian latent-variable model for informative nonresponse, focusing on this border wall question.

Then, when each of these two parts is working, you can put them together. It should be possible to do this all in Stan.

At this point, you can approach the questions of interest (distribution of survey responses given demographics, poststratified across the population) directly, by including this regression in your big Stan model that you’re fitting; or you can use the above procedure as a method for multiple imputation, then construct some imputed datasets, and go on and do MRP with those.

I sent the above note to Timm, and he responded:

Here are some initial results from testing out a few imputation models based on your suggestions and some more recent multilevel imputation literature.

Your point about the weights makes perfect sense, and simplifies things quite a bit. Since the imputation models use a superset of the variables that were used to build the weights, that part should be resolved for the most part.

In addition to your MIMS paper, Stef Van Buuren’s chapter and recommendations on multilevel imputation strategies and this simulation study paper from Grund, Lüdtke, and Robitzsch (2018) were helpful, particularly in suggesting FCS approaches with passive imputation of group means and interaction terms. Grund, Lüdtke, and Robitzsch mention that estimates of interaction terms from JM imputation can be biased, hence the preference for FCS in this context. I’m not sure JM vs. FCS would make a huge difference given similar models otherwise, but I’m trusting the above authors’ recommendation on that. So I took their suggestion for model form, but also roughly followed what you did in your MIMS paper, pooling over surveys.

The 4 models I tried were:
1. A simple FCS model in mice with no interaction terms (to start with a baseline model that should have problems)
2. Similar to your paper’s idea, a FCS model with a random intercept at the survey level, and lots of two way interactions using passive imputation.
3. Similar to 2, but also with a random intercept on state.
4. A random forest based imputation with predictive mean matching.

As you might expect, the 1st model didn’t work too well. For example, it struggled to impute a larger proportion of hispanics in polls with hispanic oversamples. The other three imputation models all performed fairly well, suggesting that a major gain in imputation reasonableness came from including the interactions. Building 50 such imputed datasets, fitting MRP models in brms on them, and then mixing the draws across imputations appears to have worked well for making the final inferences.

Since there’s no population level ground truth for “support for a border wall” though, unlike say vote share, it’s hard to rigorously compare the quality of the final predictions from brms models built on top of each type of imputation. Thus, I’m currently presenting them in a sort of “robustness check” framework, where the final predictions I’m most interested in are fairly robust to different imputation models.
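To make the pooling step concrete, here is a minimal sketch of “fit the model to each imputed dataset, then mix the posterior draws.” A toy conjugate-normal model stands in for the brms/Stan MRP fit, and everything is simulated; it only illustrates the mechanics Timm describes, not his actual analysis.

```python
import numpy as np

# Mix posterior draws across multiply imputed datasets.
# A conjugate-normal mean (flat prior, known sigma = 1) stands in for the
# real MRP model; data and imputations are simulated for illustration.
rng = np.random.default_rng(5)
m_imputations, n_draws = 50, 1000

y_obs = rng.normal(1.0, 1.0, size=80)            # observed values (fixed across imputations)
pooled_draws = []
for m in range(m_imputations):
    y_imp = rng.normal(0.8, 1.2, size=20)        # one plausible fill-in of the missing values
    y = np.concatenate([y_obs, y_imp])
    post_mean, post_sd = y.mean(), 1.0 / np.sqrt(len(y))   # "fitted model" for this dataset
    pooled_draws.append(rng.normal(post_mean, post_sd, size=n_draws))

pooled_draws = np.concatenate(pooled_draws)      # 50 x 1000 draws, mixed over imputations
print(f"pooled posterior mean: {pooled_draws.mean():.2f}")
print(f"pooled posterior sd:   {pooled_draws.std():.2f}  (includes between-imputation uncertainty)")
```

Mixing the draws this way propagates the imputation uncertainty into the final inferences, which is the conceptual advantage of this approach relative to a single complete-cases fit.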

Whassup with the FDA approval of that Alzheimer’s drug? A “disgraceful decision” or a good idea?

Andrew Klaassen writes:

Any chance you’ll be weighing in on your blog on the apparently wobbly studies supporting the FDA’s approval of Aduhelm? I’m hearing angry things being said about it by the random people I know in medical research, but don’t know much beyond that.

Here’s the one link on the story [by Beth Mole] that I’ve read so far.

And Deborah Mayo points to this discussion by Geoff Stuart that is critical of the FDA’s reasoning.

Beyond all that, I talked with someone who works more on the policy side who said she thought the drug approval was a terrible idea and asked me how the FDA could ever have made this decision.

Here are my two thoughts.

First, I defer to the experts on this. If everybody thinks the FDA approval was a bad idea, it probably was. I base this conclusion not on the statistical details I’ve seen but rather on a more general impression of rules and fairness, that if other drugs with similar trial results wouldn’t get approved, that some better justification would be needed. Again, I say this not based on any analyses of mine, just based on my respect for all the people who expressed this view. Also the usual concerns about burdening the taxpayer (if this drug is approved for Medicare payments), diverting resources from other treatments, etc.

Second, I can see a rationale for approving a drug in this case even if there’s no evidence that it works. It goes like this. If you approve the drug, some people will try it. If some people try it, we’ll get some data. Not randomized data—but it’s not clear that randomized data are really what we need. If the treatment is approved now, then we’ll get real-life data, and in 2022 we’ll have one year of real-life data, in 2026 we’ll have five years, etc. This treatment probably won’t do much to help people right now, but some real-world longitudinal data could help us understand what it does do, and that could be valuable in helping to develop future treatments.

If this argument is correct, then the rationale for approval is not about whether this particular drug does the job, but rather whether this line of research might ultimately be successful. Approve the first crude attempt now, and then this will put us on the escalator to developing the thing that really works. Don’t approve, and you delay this future development.

Again, I’m not saying I think the FDA made the right decision in approving the drug—I’ll defer to the experts who say otherwise—I’m just saying I can see a justification in general terms.

It is perhaps helpful to consider this future-looking justification when evaluating the arguments for and against approval. For example, the above-linked news article shared this post from the director of the FDA’s Center for Drug Evaluation and Research:

We ultimately decided to use the Accelerated Approval pathway—a pathway intended to provide earlier access to potentially valuable therapies for patients with serious diseases where there is an unmet need, and where there is an expectation of clinical benefit despite some residual uncertainty regarding that benefit… [T]reatment with Aduhelm was clearly shown in all trials to substantially reduce amyloid beta plaques. This reduction in plaques is reasonably likely to result in clinical benefit.

From the other side is this statement from Mark Dallas, a neuroscientist at University of Reading:

This sets a dangerous precedent for future drugs in the fight to combat Alzheimer’s and other complex diseases. In many ways the clinical trials undertaken do not present a clear picture that this medicine will be of tangible benefit to individuals living with dementia.

And this from Robert Howard, a psychiatry professor at University College London:

As a dementia clinician and researcher with personal family experience of Alzheimer’s disease, I want to see effective dementia treatments as much as anyone. I consider the approval of aducanumab represents a grave error that will have only negative impact on patients and their families and that could derail the ongoing search for meaningful dementia treatments for a decade.

This last quote is interesting because he offers a forward-looking argument in the No direction. If I’m interpreting his statement correctly, Howard is saying not just that the new drug has not been proven effective but also that he suspects this line of research is a dead end.

This to me connects to a general issue of statistics and policy analysis:

Decisions are made based on immediate questions that are (partially) resolvable by available data: Did this particular drug work on this particular group of people? But the best reasons for the decision are long-term: Will approving this drug take us in a useful direction going forward? It’s tricky because government decisions should be based on some principles, and once we move away from the hard numbers there is the risk of bad decisions and political interference.

But still. When considering, say, a construction project, the government will do some cost-benefit analysis and this will be forward-looking: some estimate, for example, of the number of people who in five years will be driving over this hypothetical bridge or riding this particular train or whatever. These projections will be based on assumptions, and these assumptions should be stated clearly, but at least the accepted framing is about the future. I am bothered that much of the discussion of drug approval is backward-looking, all about details of a particular study that’s already done. This seems related to the general problem of statistical studies being analyzed in isolation and a focus on noisy summaries such as statistical significance.

Just to be clear: I believe that the FDA and its critics are ultimately thinking about long-term benefits. I just feel that much of the available language for this discussion is focused on static analyses of a single study, so that the long-term questions are implicit and not foregrounded.

P.S. Further discussion here by Gary Schwitzer who quotes some hypey headlines (“Aducanumab offers Alzheimer’s patients a new lease on life,” “A breakthrough drug, Aducanumab,” “Game-changing new dementia drug”) and writes, “The proposed cost for this low-evidence drug could bankrupt Medicare.” This is a good point. I guess that approval is a separate decision from whether Medicare would cover it, but there must be some connection.

Bayesian forecasting challenge!

EJ writes:

A student of mine—Maximilian Maier—has designed a brief Bayesian forecasting challenge. I think it’s a nice idea and we’re looking for people that will complete the task.

The relevant information is here.

The link to the study is here.

Feel free to try it out. If you don’t like the survey, that’s fine too; I’m sure they’d be interested in your feedback!

This one is for fans of George V. Higgins

I don’t think there are many remaining fans of George V. Higgins: he died 20 years ago, his popularity had been in decline for decades, and his only bestseller was his first book, in 1970, which was also made into a well-received but not particularly popular or well-remembered movie. His writing was extremely mannered, and he was a follower of the once-huge-but-now-nearly-forgotten John O’Hara. Along similar lines was John Marquand, who, way back when, was so successful that various critics felt they had to go to the trouble of explaining to the world that he wasn’t all that. I like Marquand, but now he’s down there with O’Hara in the forgotten bin.

Nonetheless, for those of you who share this niche interest, I have two short books (130 and 200 pages) to recommend to you. Neither book is new; I just happen to be recommending them now.

Peter Wolfe, Havoc in the Hub: A Rereading of George V. Higgins. “What enables him to evoke a setting, convey the essence of a situation, and glimpse the inmost heart of a character with such economy, is his mastery of the intimate detail . . . And it’s spun by Higgins’s command of colloquial speech. Slang and cliche in Higgins poeticize the prosaic. . . . A Higgins novel isn’t so much a substitute for good literature as an important aspect of it.” But, also, “As they stand, many of his descriptions . . . distract as much as they illuminate. . . . his passion for inclusiveness and explicitness frequently cloud his writing. . . . Despite its virtues, a Higgins novel can say too much about itself and not enough about the reality we’re all struggling to make sense of.”

Erwin Ford, George V. Higgins: The Life and Writings. I learned a lot about Higgins from reading this book. Much of his personal and professional life during the 1980s was distorted by his getting conned into a tax-evasion scam to protect his profits from his celebrated first novel so he could continue to live large. Check this out:

He [Higgins] needed to find a way to keep some of his money. He consulted a specialist in retirement and tax shelters named Carmen Elio who had a scheme to save the large sums of money from books and movies. Elio suggested a way to hide several hundred thousand dollars from the sale of movie rights to The Digger’s Game. He bought several mainframe computers with Higgins’s movie profits. The computers were to be leased out, Higgins would claim depreciation on them for a number of years, and later sell them in South America for a profit. It seemed a perfect way to keep the small fortune Higgins would otherwise lose to the Internal Revenue Service. Higgins agreed and signed over the money for Elio to invest.

“Sell them in South America for a profit” . . . what could possibly go wrong???

It did not go well, not just financially—and, yes, he ultimately did have to pay back the IRS, and he felt pressure to keep churning out books no matter what the quality—but, also, the whole long-drawn-out episode seemed to instill in him a bitterness and feeling of victimization. If Higgins had more of a sense of humor about himself, he maybe could’ve written a great novel about an author whose greed and delusions of grandeur led him to be swindled and then embittered—he was very angry at the Internal Revenue Service, but they were just doing their job!—but, no, that never happens. Higgins wrote some excellent books, but never that one.

This system too often rewards cronyism rather than hard work or creativity — and perpetuates the gross inequalities in representation …

This post is by Lizzie. I started this a while ago, but Andrew’s Doll House post pushed me to finally get it up on the blog.

The above quote comes from a recent article on the revelation that the person Philip Roth decided should write his authorized biography has a history of sexual harassment accusations (I mean, the irony…). It reads more fully, “this system too often rewards cronyism rather than hard work or creativity — and perpetuates the gross inequalities in representation that disfigure the American literary landscape”, but I think it certainly applies to lots of other ‘landscapes’ including the one within which I exist much of the time: ecology and evolutionary biology (or EEB).

It relates to a quote that has been rolling around in my head for many months now, from a student in my lab: “It feels like the message is that the careers of these [double-digit] people were worth less than the career of this one [purportedly] brilliant man.”

And I really did not have a good response to this student other than ‘ummm.’ I have asked my colleagues for a better response, and I am not sure any of us have one.

It’s a relevant question that could be posed for many publicized and less publicized events in EEB. Two recent publicized examples: Jonathan Pruitt, a spider behavioral biologist and just-risen star, has been accused of fabricating most of his high-profile data and of passing out said data for junior folk to write up with him as senior author, with multiple papers now retracted or carrying expressions of concern. On the rising-star front is Denon Start, who has faced recent accusations of suspicious data; before those, the Canadian Society for Ecology and Evolutionary Biology (CSEE) abruptly (to me at least) rescinded two of his awards. You can fall down the Twittersphere to try to figure out why, but CSEE never said.

There are similarities and differences in these cases galore, but the one similarity that has grabbed me is how much those around me, especially my faculty colleagues, seem to pin as much blame as possible on each man — I get the feeling 99.9% would be a good amount to many, leaving just enough room to acknowledge ‘we could all and should all do better.’ I agree a lot of blame falls on the perpetrators, but I also think that, by not leaving enough room to blame ourselves and the community we create, we open the gates for the behavior to continue.

Both of my examples had remarkable publication records (potentially falling under The Armstrong Principle), but they also were promoted by many people to get where they were (and perhaps still are; Pruitt at least seems to still be employed by McMaster). Academia seems good at passing along and promoting people about whom we should perhaps have cause for concern. This to me means that either we are too out of it to know what is happening, or we look the other way because it seems easier and it protects the careers of the rising star and all those connected to their star. There’s always some talk about the former — how can we build a better community where senior people know what is going on and can intervene? But what I want to talk about is: how do we hold a community of researchers accountable when they may have known there were concerns?

I suggest a step forward is to hold the letter writers more accountable. Academia does function on reputation, and a big part of your reputation is formalized in your letters of reference. American letters are often so positive as to feel almost useless. But they’re not. When they are short or otherwise feel perfunctory, they can say a lot. Here’s a letter that might raise eyebrows:

Dear committee of special award or position:

I have known [X] in [this way] for [this long]. S/he/they have published [Y] papers, taught (or TA-ed) Z classes.

If you have more questions about this applicant, I encourage you to call me at the number below.

Sincerely,
Dr. Especially-eminent

So, that’s something we could start to do if you ask me. I also suggest the following after-the-fact potential actions:

(1) If you read a glowing letter for someone about whom, shortly after, you hear there were major concerns, go look at that letter. Did you miss something? If it seems like you didn’t, why not call up the letter writers and ask about the disconnect? Letter writers feel more okay writing these letters because there is generally no consequence for their careers.

(2) We could formalize some of this. Department chairs who receive glowing letters for someone who later turns out to have issues could be expected to contact the department that employs the letter writer and express some concern. We effectively write reference letters as part of our service, so it’s part of our job; if we’re doing that part of our job poorly, shouldn’t it count against us somehow?

(3) The other thing I suggest we all do, after the fact, is be very cautious about how much we push back on calls to change the system by focusing on the perpetrator or on fears that someone or some organization will be sued. Focusing so strongly on the individual perpetrator, or asking, as many did to me, ‘what if [insert some society or senior person] is sued?’, may not feel like saying you don’t want the system to change. These are both important things to consider, but when they are most of what you do, you have effectively traded an actual closer look at the system for these action items of fear and blame.

Which brings me back to my student. Was my student reading the message correctly? ‘That the careers of these [double-digit] people were worth less* than the career of this one brilliant man’ in the particular case we were discussing.

It’s a good question to ask, if you ask me, as there are some implicit weighting and numerical assumptions here. Every time we don’t question the letter writers, or worry about what will happen to some established person or society, I think we effectively do send the message my student felt they had received. I even heard a colleague recently say we should not scrutinize too strongly the high fliers with the many, many publications, because ‘what if we discourage them? What if we lost [insert name of new NAS member] to that extra scrutiny?’ I have two replies to this. One is that if they are that great they will stand up to a little extra scrutiny.

And the other is to think more about how much we implicitly undervalue those we lose when we say this. Ask people to explicitly count up those we lost along the way, the people who drop out, leave, or otherwise are valued less, and to consider the creativity and exciting science we never got to see from them. We either think we aren’t losing many, or that they’re not worth the one great man we saved.

*In earlier version of this post I mistakenly wrote “worth more than the career of this one brilliant man,” which led to much understandable confusion.

When MCMC fails: The advice we’re giving is wrong. Here’s what you should be doing instead. (Hint: it’s all about the folk theorem.)

In applied Bayesian statistics we often use Markov chain Monte Carlo: a family of iterative algorithms that yield approximate draws from the posterior distribution. For example, Stan uses Hamiltonian Monte Carlo.

One annoying thing about these iterative algorithms is that they can take a while, but on the plus side the iterations spin off all sorts of auxiliary information that can be used to identify and diagnose problems.

That’s what today’s post is about. What to do when your MCMC has problems.

I think much of current default practice is wrong. What do I mean by that? What I mean is that the typical default behavior with MCMC issues is to just throw more computational resources at the problem. And that’s wrong, because problems with your computation often imply problems with your model (that’s the folk theorem of statistical computing).

Whassup?

When you have problems with MCMC mixing, the usual first step is to run your chains longer. 10,000 iterations, 100,000 iterations, etc. Or to tweak parameters in your HMC such as adapt_delta and max_treedepth to make it explore more carefully. Or to get into heavy math things like reparameterizing your model. I’m not saying these steps are never useful, but they should not be the first thing you try. Or the second thing. Or the third.

Rather, when you have mixing problems you should immediately try to figure out what went wrong. Some common possibilities:

1. Priors on some parameters are weak or nonexistent, data are too weak to identify all the parameters in the model, and the MCMC drifts all over the place. This could be a flat-out improper posterior, or it could just be so poorly constrained that it includes all sorts of weird regions that you don’t really care about but which make it harder for the chains to mix. You should be able to notice this sort of problem because the posterior simulations go to some really weird places, things like elasticity parameters of -20 or people with 8 kg livers.

2. Bugs, plain and simple. For example, coding a Poisson regression with poisson rather than poisson_log, so the log link gets dropped (a minimal Stan sketch of this one appears after this list). Coding the standard deviation as sigma^2 rather than sigma. Forgetting a line of code. Screwing up your array indexing. All sorts of things. You should be able to notice this sort of problem by debugging, going through the code line by line and saving intermediate parameters.

3. Minor modes in the tails of the distribution. An example is in section 11 of our Bayesian workflow paper. You should be able to notice this sort of problem because different chains will cluster in different places, and the target function (“lp__”) will be much lower in some modes than others. The problem can sometimes be fixed by just using starting points near the main mode, or else you can use stacking to discard minor modes. Soon we should have Pathfinder implemented and this will get rid of lots of these minor modes automatically.

4. Sometimes an apparent problem isn’t a problem at all. You can get computational overflows in warmup, during the period where the chains are exploring all sorts of faraway places that they then wisely give up on. Overflows, divergences, etc., during warmup are typically nothing to be concerned about. Dan Simpson refers to this as the “awkward adolescence” period of the simulation, and all we need to worry about is the “sensible adult” stage. (To continue the analogy, there’s no need to run these simulations all the way to senility.)

5. The model can be reparameterized. The two big ways of doing this are: (a) rescaling so parameters are roughly on unit scale. For example, don’t have a country-level regression where one of the predictors is the country’s population, so that you’d have coefficients like 0.000002 or whatever. Instead, use log population or population in millions or whatever. In general try to have parameters be scale-free, introducing a multiplier in the model if necessary. And (b) rescaling hierarchical models so that the group-level errors are unit-scale error terms, the so-called non-centered parameterization (a sketch of this also appears after this list). Most of the time when you have mixing problems, it’s not a parameterization issue, but these do arise so I’m mentioning them here. In any case it’s good practice to model your parameters on unit scale.

6. And, of course, sometimes you have legit slow mixing. But, even in those cases, often you can change your model (not just “reparameterizing” the model while keeping it the same, but actually changing it) and the chains will mix better. I’m not saying you should make your model worse just to improve computation. What I’m saying is that typically there’s a lot of arbitrariness to your model, and if some arbitrary feature is making your model run slowly, what’s the point of that? This loops back to point 1 above. An improved model can often be thought of as a (softly-) constrained version of what you’ve been fitting before. You can always go back and relax the constraints later to see what’s going on.

And that brings us to our final point:

7. You’re not fitting just one model, nor should you want to fit just one model. Statistical workflow involves fitting a sequence of models in order to understand a problem. You’re gonna be fitting lots of models already, so don’t obsess on this one model that’s mixing poorly. Rather, take the poor mixing as a signal that something’s going on that you don’t understand, and go figure it out—using the ideas of multiple modeling that you should be doing anyway!
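To make point 2 concrete, here is a minimal Stan sketch of the Poisson-link bug. The data block and variable names are hypothetical, not taken from any particular model; it’s just the shape of the mistake:

// Minimal sketch of the bug in point 2 (hypothetical data and names).
data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  // Buggy version: treats alpha + beta * x as the Poisson rate itself,
  // silently dropping the log link (and the rate can go negative):
  //   y ~ poisson(alpha + beta * x);
  // Intended version: alpha + beta * x is the log of the rate:
  y ~ poisson_log(alpha + beta * x);
}

The sigma-versus-sigma^2 mistake in the same item is the same flavor: Stan’s normal() takes a standard deviation, not a variance, so squaring the scale parameter quietly changes the model.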
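And here is a minimal sketch, again with hypothetical names, of the non-centered parameterization from point 5 for a simple one-way hierarchical model: the group effects are built from unit-scale errors multiplied by a scale parameter instead of being drawn directly from normal(mu, tau):

// Minimal sketch of the non-centered parameterization in point 5.
data {
  int<lower=0> N;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> group;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> tau;
  real<lower=0> sigma;
  vector[J] eta;                      // unit-scale group-level errors
}
transformed parameters {
  vector[J] alpha = mu + tau * eta;   // group effects; the centered version
                                      // would declare alpha as a parameter
}
model {
  mu ~ normal(0, 1);
  tau ~ normal(0, 1);                 // half-normal, given the lower bound
  sigma ~ normal(0, 1);
  eta ~ normal(0, 1);                 // implies alpha ~ normal(mu, tau)
  y ~ normal(alpha[group], sigma);
}

The normal(0, 1) priors here assume the data and predictors have already been rescaled to roughly unit scale, in the spirit of point 5(a); they also do some of the soft constraining discussed in points 1 and 6, and you can relax them later once the model is mixing.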

Why the current default behavior is bad

You might ask, What’s so bad about running longer and more slowly? What’s bad about it is two things:

– It’s a waste of time. Time spent running and running and running a model that you don’t want to be running at all—if it is the case, as it typically is, that your model has major problems.

– By running longer and more slowly, you’re just putting off the inevitable moment where you have to go back and debug your model. Better to go to it right away. Get into the workflow already.

Leaving a Doll’s House, by Claire Bloom

I read Leaving a Doll’s House, the autobiography of actress Claire Bloom, and, as promised (see P.P.P.S. here), here are my reactions.

Bloom’s book is famous because of its chapters on her relationship with author Philip Roth. Actually, though, it thoroughly covers her whole life, with a bit more than half of the book taking us from her birth to the mid-1970s, and the rest going to the mid-1990s and focusing on the Roth relationship. She had an interesting life, and I don’t think the Roth material would’ve worked so well without the background on her earlier experiences. Sometimes when people write autobiographies they skip entire chunks of their lives (that’s what Bertrand Russell did, for example), so I appreciated that she straightforwardly went through it all.

The book is well written. Her style is more analytical than storytelling. Her goal seems to be, first to get down the facts of her life and second to try to figure out the motivations of herself and the people close to her. Telling stories or entertaining the reader is not really the goal. Don’t get me wrong: I found the book readable and interesting. It just was clearly an autobiography in the sense of a biography written for herself, rather than a journalistic or novelistic telling of a life.

You won’t be surprised that, after finishing Bloom’s book, I felt pretty angry at Roth. Not so much for the affairs and the prenup—I guess all that comes with the territory if you marry “Portnoy”—but for the way he was so cold to Bloom, and especially the way he was so destructive of her relationship with her teenage daughter. That part was really horrible! Of course this then pushes the question back one step, to why Bloom stayed with Roth for so long, which is a question asked by many people who read her book; see for example this thoughtful review by Jonathan Yardley. All I can say is that this is not unique to this particular couple. People staying with each other too long . . . that happens all the time.

I do get the impression from everything I’ve read about Roth (including Bloom’s book) that he could be a very charming person when he wanted to be. Indeed, when he wasn’t handing Bloom creepy letters, he’d say all these nice things about how he loved her, needed her, etc. I guess that part of this was him being manipulative, telling people what they wanted to hear to get things out of them, but I wonder if part of it was a kind of surfeit of sincerity. All of us have a mixture of positive and negative thoughts within us, but we usually perform some operations of averaging or truncation: with our friends we will typically average over time, suppressing our annoyances and masking our peak emotions in order to present an equanimous, positive aspect; with our romantic partners we will truncate the troughs so we can present some mix of emotions ranging from acceptance to joy. Roth, it seems, was not much of an averager or a truncater. The result was that his friends got those peaks, and I guess they learned to handle the troughs. Being married to someone who acts this way, though . . . that’s another story. But then I think that Roth, knowing this about himself, could use these peaks and troughs manipulatively. I guess if I’d known Roth personally I would’ve liked him because I would’ve focused on the good stuff, the things he could deliver in a friendship that nobody else could, and maybe the bad stuff wouldn’t seem so relevant to me because I could just ignore him when he was acting like an asshole.

I wanted to get some other takes on Bloom and Roth, so I did some googling and found this entertaining and thoughtful review by James Wolcott. He mentions that Nobel prize thing that we discussed in our earlier post.

I also came across this excerpt from the recent Roth biography by the now-disgraced Blake Bailey. This was useful in getting Roth’s side of the story.

Bailey starts by describing Bloom’s autobiography as “scurrilous,” which at first I thought was unfair, but then I looked up the word, and it literally means, “making or spreading scandalous claims about someone with the intention of damaging their reputation.” That seems fair enough! The claims can all be true and it’s still scurrilous. So I learned something. I always thought of “scurrilous” as a bad thing, but it doesn’t have to be bad at all. Someone could write a fair-minded biography of some villain like Mao Zedong or Jack Welch or whatever, and to the extent that these people really did bad things, the biography could be scurrilous, without any implication that this is a bad thing.

Anyway, I guess the reviews of Bailey’s book were accurate in that it really doesn’t make Roth look good. He gave Roth just enough rope to hang himself. For example he quotes Roth as describing Bloom’s bedroom in London as “slightly whorish.” I think the “slightly” makes it even worse! If Roth had just called the room whorish, you could say it’s old-school hyperbole, but “slightly” makes it sound like he really thought about this one! Also Bailey mentions that Bloom had stuffed animals on her bed, and I’m like, OK, so what’s your point here? Anyway, when it comes to the facts, Bailey/Roth are pretty much on the same page as Bloom: Roth was only rarely committed to the relationship, Bloom put up with his antics (the troughs were worth it because the peaks were so great), thus Roth had little motivation to be reasonable with her—so he wasn’t.

I also went back to Claudia Roth Pierpont’s book on Roth to see what she said about all this. I agreed with Pierpont when she wrote that Bloom in her memoir was “contending against herself—struggling to be less passive, more independent, a better mother—as much as against the frustrating men in her life.” Indeed, that struggle is a lot of the reason that I found Bloom’s book interesting to read—including all the pre-Roth chapters. But I disagree with Pierpont when she says, “None of the men come off well, with the possible exception of Yul Brynner.” I thought several of the men came off well! Richard Burton, Rod Steiger, and, yes, Brynner—they’re all portrayed positively. None are portrayed as heroes, but they’re portrayed as good, if complicated, men. Pierpont also reports that Roth “thought of bringing a lawsuit” against Bloom. That sounds pretty piggy of him, especially considering how easy he got off with that prenup. But I guess that brings us to the mysterious bit that Bloom still expresses many positive emotions about Roth, even at the end of her memoir, after the way he treated her, after the way he treated her daughter. It’s that charm again.

The thing that keeps bugging me is, when thinking about Roth, should I add the charm or subtract it? What I mean is, after reading all this material, I think a lot worse of Roth as a person. But then I can go in two directions. One direction is to add the charm, to say that despite his nasty behavior, he had many loyal friends—Veronica Geng, even!—and that implies that he was not such a bad guy after all. The other direction is to subtract the charm and say that, without it, nobody would’ve put up with him at all.

This sort of moral calculus is nothing unique to Roth, of course. It comes up with any celebrity, whether it be Albert Einstein or Michael Jackson or Joan Crawford or LeBron James or whoever. When someone is famous and talented, you hear all sorts of things about them. Or, to look at it the other direction, I’m being manipulated by Bloom to take her side and get angry at Roth—and I do get angry, even though I know I’m being manipulated. Of course, it’s possible to get manipulated to legitimate ends.

Anyway, I’m glad Roth wrote his books. Learning about his good and bad sides as a person can help us understand his work. And I’m glad Bloom wrote her book: I can’t imagine it was easy to open up like that, she had an interesting life, and I feel like a lot of the criticism of her book came because people just didn’t want to hear bad things about a literary lion. A charming guy who couldn’t keep it in his pants, fine; a creepy dude who wrote creepy notes and tried to manipulate the best friend of his wife’s daughter, that’s something that many people just don’t want to hear about.

P.S. How is this relevant to statistical modeling, causal inference, and social science? In many ways. First, we’re making inference from incomplete and noisy data. Second, we’re embracing variation (of people’s behavior over time) and uncertainty. Third, we’re distinguishing between evidence and interpretation: in particular, Roth/Bailey take issue with Bloom’s interpretations but they don’t really contest her facts.

P.P.S. As I wrote earlier, I think Philip Roth’s writing is ok, but it doesn’t move me that much. I’d rather be writing about George V. Higgins or Veronica Geng or John Updike (sorry!) or just about anyone discussed in that book by Anthony West, or Anthony West himself, for that matter. But Philip Roth gets lots of discussion, I notice it, and it sucks me in. So here we are.

P.P.P.S. In the meantime since I first wrote this post, I read Blake Bailey’s biography of Roth—I got it out of the library, just like with Bloom’s book! OK, I didn’t read the biography from cover to cover. It had lots of interesting detail but overall I didn’t find it as thoughtful or moving as Leaving a Doll’s House, so I jumped around a bit. The biography was OK for what it was. I agree with the reviews that it was awkward how Bailey was always promoting Roth and taking Roth’s side in every dispute, but, hey, usually you want a biographer to like his subject, so that’s OK.

The one thing I really didn’t like about Bailey’s book is how he played both sides of the street regarding the autobiographical nature of Roth’s fiction. Sometimes he took the line that fiction is fiction, not autobiography; other times he used the fiction as a kind of defense of Roth’s behavior, for example writing that Roth “was rarely less than forthcoming (except with whoever happened to be his main female companion at the time) on the subject of straying: ‘God, I’m fond of adultery,’ he and Mickey Sabbath liked to say. ‘Aren’t you?’” So these characters aren’t Roth, except when they are. I was also creeped out that Roth and Bailey just didn’t seem to understand what was wrong about interfering in the relationship between Bloom and her 18-year-old daughter. To Bailey’s credit, though, in many places he lets Roth hang himself with his own words, for example recounting the story of how Roth tried to have an affair with a friend of Bloom’s daughter who was living in their house: after the young woman angrily turned him down, Roth says he left a note on her bed saying something like, “This is pure sexual hysteria.” Doesn’t quite fit Roth’s lusty irreverent image. Grabbing a young woman without her consent, sure, but acting nasty and not accepting rejection, not so much. I agree with Bailey that Roth seems well summarized by the famous passage from Portnoy’s Complaint: “A disorder in which strongly felt ethical and altruistic impulses are perpetually warring with extreme sexual longings, often of a perverse nature …” It does seem that Roth tried to be ethical much of the time—and then when he did things that he shouldn’t have done, he’d feel bad about it and this would be transformed into anger, which he’d target at Bloom. I can well imagine that seeing her caused a sort of self-reproach for Roth, motivating him to lash out at her, making her an enemy, which could then retroactively justify his behavior. Not a mellow adulterer like Updike, Roth would have these emotional swings. I could see how he could be loyal to his friends but then nasty to his loved ones when he was angry at himself. Good thing, perhaps, that he had no kids. And, yes, I’m doing amateur psychoanalysis here—but if you can’t do amateur psychoanalysis of Philip Roth, who can you do it for? It’s interesting that Roth, the introspective novelist, had difficulty expressing and understanding his own contradictions, whereas Bloom, the actress, was much more searching when writing about her life. But perhaps Roth’s blinders when looking at himself were necessary for him to succeed as an author; maybe too much self-knowledge would’ve blinded him. Similarly, it may be that Bailey’s habit of staying on the surface and accepting Roth’s justifications at face value allowed him to write a biography that captured the perspective of his subject.
