Do doctors get too little respect nowadays? Or too much?

This news article laments that doctors don’t get enough respect:

‘Kind of Awkward’: Doctors Find Themselves on a First-Name Basis . . . Female doctors were more than twice as likely as male doctors to be addressed by their first names . . . Patients were more likely to address general practitioners by their first names than specialists. . . . “Use of formal titles in medicine and many other professions is a linguistic signal of respect and professionalism” . . . Studying this issue, which they refer to as “untitling,” poses a number of challenges . . . The changing behavior they saw in the emails differs from even the recent past when it was all but unheard-of to call doctors by their first names . . .

The article, by science reporter Gina Kolata, continues:

Doctors may not enjoy the real world’s tilt toward informality. The survey in 2000 showed that 61 percent were annoyed when patients addressed them by their first name.

OK, so that’s one take. Here’s another perspective, from Martha Smith commenting recently on our blog:

Self-policing in medicine is one that particularly irks me — I’ve had too many experiences with physicians who do ridiculous things — things like prescribing a medication for ulcers, when the problem is a pulled rectus abdominis; or trying to remove a growth in the nose by freezing it off with liquid nitrogen.

I have mixed feelings on all this. On one hand, I call my doctor Doctor So-and-so, not using his first name, and as a student I called all my teachers Mister and Missus and Professor. On the other hand, I much prefer it when students call me Andy or Andrew or even Mister rather than Professor, and I agree with Martha that doctors get too much deference in our society.

What bugged me about the above-linked news article is that it just seemed to accept the idea of deference to doctors as a matter of course, without presenting any countervailing arguments or even a quotation from someone on the other side making the point that calling doctors by formal titles is a distancing move that can reinforce the idea that doctors are infallible. Or even the more basic point that this sort of deference contradicts the democratic ideal of American society. Much of the article discusses discrepancies between how men and women are treated. Maybe the right way to solve this is not to give more deference to female doctors but rather to treat male doctors more like everyday people.

Successful randomization and covariate “imbalance” in a survey experiment in Nature

Last year I wrote about the value of testing observable consequences of a randomized experiment having occurred as planned. For example, if the randomization was supposedly Bernoulli(1/2), you can check that the number of units in treatment and control in the analytical sample isn’t so inconsistent with that; such tests are quite common in the tech industry. If you have pre-treatment covariates, then it can also make sense to test that they are not wildly inconsistent with randomization having occurred as planned. The point here is that things can go wrong in the treatment assignment itself or in how data is recorded and processed downstream. We are not checking whether our randomization perfectly balanced all of the covariates. We are checking our mundane null hypothesis that, yes, the treatment really was randomized as planned. Even if there is just a small difference in proportion treated or a small imbalance in observable covariates, if this is highly statistically significant (say, p < 1e-5), then we should likely revise our beliefs. We might be able to salvage the experiment if, say, some observations were incorrectly dropped (one can also think of this as harmless attrition not being so harmless after all).
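To make this concrete, here's a minimal sketch in Python of the kind of mundane check I have in mind for a Bernoulli(1/2) design (the counts are made up):

```python
# Minimal sketch of checking that a supposedly Bernoulli(1/2) assignment
# looks like a Bernoulli(1/2) assignment. The counts are made up.
from scipy.stats import binomtest

n_treated = 10_212   # units assigned to treatment in the analytical sample
n_total = 20_119     # total units in the analytical sample

# Two-sided test of the mundane null that assignment was Bernoulli(1/2).
result = binomtest(n_treated, n_total, p=0.5)
print(f"proportion treated = {n_treated / n_total:.4f}, p = {result.pvalue:.3g}")

# A tiny p-value (say p < 1e-5) suggests something went wrong in assignment
# or in downstream data processing, not that randomization "failed to
# balance" the covariates.
```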

The argument against doing these tests, or at least against prominently reporting them, is that they can confuse readers and can also motivate “garden of forking paths” analyses with different sets of covariates than planned. I recently encountered some of these challenges in the wild. Because of the open peer review process, I can give a view into the review of the paper where this came up.

I was a peer reviewer for this paper, “Communicating doctors’ consensus persistently increases COVID-19 vaccinations”, now published in Nature. It is an impressive experiment embedded in a multi-wave survey in the Czech Republic. The intervention provides accurate information about doctors’ trust in COVID-19 vaccines, which people perceived to be lower than it really was. (This is related to some of our own work on people’s beliefs about others’ vaccination intentions.) The paper presents evidence that this increased vaccination:

[Fig. 4 from the published paper]

This figure (Figure 4 from the published version of the paper) shows the effects by wave of the survey. Not all respondents participated in each wave, so there is a “full sample,” which includes a varying set of people over time, and a “fixed sample,” which includes only those who are in all waves. More immediately relevant, there are two sets of covariates used here: a pre-registered set and a set selected using L1-penalized regression.

This differs from a prior version of the paper, which didn’t report the preregistered set at all, motivated by concerns about imbalance in covariates that had been left out of that set. In my first peer review report, I wrote:

Contrary to the pre-analysis plan, the main analyses include adjustment for some additional covariates: “a non-pre-specified variable for being vaccinated in Wave0 and Wave0 beliefs about the views of doctors. We added the non-specified variables due to a detected imbalance in randomization.” (SI p. 32)

These indeed seem like relevant covariates to adjust for. However, this kind of data-contingent adjustment is potentially worrying. If there were indeed a problem with randomization, one would want to get to the bottom of that. But I don’t see much evidence that anything was wrong; it is simply the case that there is a marginally significant imbalance (.05 < p < .1) in two covariates and a non-significant (p > .1) imbalance in another — without any correction for multiple hypothesis testing. This kind of data-contingent adjustment can increase error rates (e.g., Mutz et al. 2019), especially if no particular rule is followed, creating a “garden of forking paths” (Gelman & Loken 2014). Thus, unless the authors actually think randomization did not occur as planned (in which case perhaps more investigation is needed), I don’t see why these variables should be adjusted for in all main analyses. (Note also that there is no single obvious way to adjust for these covariates. The beliefs about doctors are often discussed in a dichotomous way, e.g., “Underestimating” vs “Overestimating” trust, so one could imagine the adjustment being for that dichotomized version additionally or instead. This helps to create many possible specifications, and only one is reported.) … More generally, I would suggest reporting a joint test of all of these covariates being randomized; presumably this retains the null.
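For readers who want to see what such a joint test might look like, here is a rough sketch (not code from the paper; it assumes a 0/1 treatment indicator and a data frame of pre-treatment covariates):

```python
# Sketch of a joint balance test: regress treatment on all pre-treatment
# covariates and compare to an intercept-only model with a likelihood-ratio
# test. Under the mundane null that treatment was randomized as planned,
# the resulting p-value is (approximately) uniform on (0, 1).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2

def joint_balance_test(treat: pd.Series, covariates: pd.DataFrame) -> float:
    X = sm.add_constant(covariates.astype(float))
    full = sm.Logit(treat, X).fit(disp=0)
    null = sm.Logit(treat, np.ones((len(treat), 1))).fit(disp=0)
    lr = 2 * (full.llf - null.llf)
    return chi2.sf(lr, df=covariates.shape[1])  # joint p-value

# Only an extreme p-value here would be a reason to investigate the
# assignment mechanism or the data pipeline.
```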

This caused the authors to include the pre-registered analyses (which gave similar results) and to note, based on a joint test, that there weren’t “systematic” differences between treatment and control. Still, I remained worried that the way they wrote about the differences in covariates between treatment and control invited misplaced skepticism about the randomization:

Nevertheless, we note that three potentially important but not pre-registered variables are not perfectly balanced. Since these three variables are highly predictive of vaccination take-up, not controlling for them could potentially bias the estimation of treatment effects, as is also indicated by the LASSO procedure, which selects these variables among a set of variables that should be controlled for in our estimates.

In my next report, while recommending acceptance, I wrote:

First, what does “not perfectly balanced” mean here? My guess is that all of the variables are not perfectly balanced, as perfect balance would mean having identical numbers of subjects with each value in treatment and control, and would typically only be achieved with blocked/stratified randomization.

Second, in what sense does this “bias the estimation of treatment effects”? On typical theoretical analyses of randomized experiments, as long as we believe randomization occurred as planned, error due to random differences between groups is not bias; it is *variance* and is correctly accounted for in statistical inference.

This is also related to Reviewer 3’s review [who in the first round wrote “There seems to be an error of randomization on key variables”]. I think it is important for the authors to avoid the incorrect interpretation that something went wrong with their randomization. All indications are that it occurred exactly as planned. However, there can be substantial precision gains from adjusting for covariates, so this provides a reason to prefer the covariate-adjusted estimates.

If I were going to write this paragraph, I would say something like: Nevertheless, because the randomization was not stratified (i.e., blocked) on baseline covariates, there are random imbalances in covariates, as expected. Some of the larger differences are in variables that were not specified in the pre-registered set of covariates to use for regression adjustment: (state the covariates; I might suggest reporting standardized differences, not p-values, here).

Of course, the paper is the authors’ to write, but I would just advise that unless they have a reason to believe the randomization did not occur as expected (not just that there were random differences in some covariates), they should avoid giving readers this impression.
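To be concrete about the standardized-differences suggestion, here is a sketch of the kind of computation I had in mind (again, not the authors' code; it assumes a data frame of covariates and a 0/1 treatment indicator):

```python
# Sketch: standardized mean differences between treatment and control for
# each covariate. These describe the size of chance imbalances without
# inviting the misreading that the randomization "failed."
import numpy as np
import pandas as pd

def standardized_differences(treat: pd.Series, covariates: pd.DataFrame) -> pd.Series:
    t = covariates[treat == 1]
    c = covariates[treat == 0]
    pooled_sd = np.sqrt((t.var(ddof=1) + c.var(ddof=1)) / 2)
    return (t.mean() - c.mean()) / pooled_sd

# One might report these in a table or dot plot; differences of a few
# hundredths of a standard deviation are what chance alone produces.
```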

I hope this wasn’t too much of a pain for the authors, but I think the final version of the paper is much improved in both (a) reporting the pre-registered analyses (as well as a bit of a multiverse analysis) and (b) not giving readers the incorrect impression there is any substantial evidence that something was wrong in the randomization.

So overall this experience helped me fully appreciate the perspective of Stephen Senn and other methodologists in epidemiology, medicine, and public health that reporting these per-covariate tests can lead to confusion and even worse analytical choices. But I think this is still consistent with what I proposed last time.

I wonder what you all think of this example. It’s also an interesting chance to get other perspectives on how this review and revision process unfolded and on my reviews.

P.S. Just to clarify, it will often make sense to prefer analyses of experiments that adjust for covariates to increase precision. I certainly use those analyses in much of my own work. My point here was more that finding noisy differences in covariates between conditions is not a good reason to change the set of adjusted-for variables. And, even if many readers might reasonably ex ante prefer an analysis that adjusts for more covariates, reporting such an analysis and not reporting the pre-registered analysis is likely to trigger some appropriate skepticism from readers. Furthermore, citing very noisy differences in covariates between conditions is liable to confuse readers and make them think something is wrong with the experiment. Of course, if there is strong evidence against randomization having occurred as planned, that’s notable, but simply adjusting for observables is not a good fix.
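As a small illustration of the precision point, here is a toy simulation (nothing from the paper; everything here is made up) showing that covariate adjustment in a randomized experiment targets the same effect but with less variance:

```python
# Toy simulation: adjusting for a predictive pre-treatment covariate does not
# change what is being estimated, but it can shrink the variability of the
# estimate a lot. All numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n, tau, n_reps = 1000, 0.2, 2000
est_unadj, est_adj = [], []

for _ in range(n_reps):
    z = rng.binomial(1, 0.5, n)        # randomized treatment
    x = rng.normal(size=n)             # predictive pre-treatment covariate
    y = tau * z + 2.0 * x + rng.normal(size=n)
    est_unadj.append(y[z == 1].mean() - y[z == 0].mean())
    X = np.column_stack([np.ones(n), z, x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    est_adj.append(beta[1])            # coefficient on treatment

print("SD of unadjusted estimates:", np.std(est_unadj))
print("SD of adjusted estimates:  ", np.std(est_adj))
# Both are centered on the true effect of 0.2; the adjusted one varies less.
```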

[This post is by Dean Eckles.]

When conclusions are unverifiable (multilevel data example)

A. B. Siddique, Y. Jamshidi-Naeini, L. Golzarri-Arroyo, and D. B. Allison write:

Ignoring Clustering and Nesting in Cluster Randomized Trials Renders Conclusions Unverifiable

Siraneh et al conducted a clustered randomized controlled trial (cRCT) to test the effectiveness of additional counseling and social support provided by women identified as “positive deviants” to promote exclusive breastfeeding (EBF) within a community. However, their statistical methods did not account for clustering and nesting effects and thus are not valid.

In the study, randomization occurred at the cluster level (ie, kebeles), and mothers were nested within clusters. . . . Because this is a hierarchical modeling environment and individuals within a cluster are typically positively correlated, an individual-level analysis that does not address clustering effects will generate underestimated standard errors and unduly narrow confidence intervals. That is, the results will overstate statistical significance.

That’s right! They continue:

One alternative is calculating the mean observation by cluster and analyzing the data at the cluster level. . . . A valid alternative would be to use multi-level hierarchical modeling, which recognizes the hierarchy in the data and accounts for both lower and higher levels as distinct levels simultaneously.

Right again.
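To see how much this matters, here is a toy simulation in Python (made-up numbers, nothing to do with the actual breastfeeding study) comparing the naive individual-level standard error to the standard error from the cluster-mean analysis:

```python
# Toy cluster-randomized trial: analyzing individuals as if they were
# independent understates the standard error. All numbers are made up.
import numpy as np

rng = np.random.default_rng(1)
n_clusters, n_per_cluster = 30, 50
treat = rng.binomial(1, 0.5, n_clusters)            # randomize whole clusters
cluster_effect = rng.normal(0, 1.0, n_clusters)     # shared within each cluster
y = cluster_effect[:, None] + rng.normal(0, 1.0, (n_clusters, n_per_cluster))
z = np.repeat(treat, n_per_cluster)
y_flat = y.ravel()

def se_diff(a, b):
    return np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))

# Naive analysis: all 1500 observations treated as independent.
naive_se = se_diff(y_flat[z == 1], y_flat[z == 0])

# One valid alternative from the letter: average within clusters, then compare.
means = y.mean(axis=1)
cluster_se = se_diff(means[treat == 1], means[treat == 0])

print(f"naive SE: {naive_se:.3f}   cluster-mean SE: {cluster_se:.3f}")
# The naive SE is far too small, which is exactly how statistical
# significance gets overstated.
```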

So what happened in this particular case? Siddique et al. tell the sad story:

We requested the deidentified raw data and statistical code from the authors to reproduce their analyses. Even though we pledged to limit our analysis to testing the hypotheses tested in the article, and the Editor-in-Chief deemed our request “appropriate and reasonable”, the authors were unwilling to share their deidentified raw data and statistical code.

Unwilling to share their deidentified raw data and statistical code, that’s not good! What was the reason?

They said they needed time to analyze the “remaining data” for publication and that the dataset contained identifiers.

Whaaaa? They specifically asked for “deidentified data,” dude. In any case, the authors could’ve taken about 5 minutes and reanalyzed the data themselves. But they didn’t. And one of the authors on that paper is at Harvard! So it’s not like they don’t have the resources.

Siddique et al. conclude:

Given the analytical methods used, the evidence presented by Siraneh et al1 neither supports nor refutes whether a positive deviance intervention affects EBF. The analytical methods were incorrect. All authors have an ethical and professional scientific responsibility to correct non-trivial reported errors in published papers.

Indeed. Also if the authors in question have any Wall Street Journal columns, now’s the time to pull the plug.

My reason for posting this article

Why did I post this run-of-the-mill story of statistical incompetence followed by scientific misbehavior? There must be millions of such cases every year. The reason is that I was intrigued by the word “verifiable” in the title of Siddique et al.’s article. It reminds me of the general connection between replicability and generalizability of results. For a result to be “verifiable,” ultimately it has to replicate, and if there’s no evidence to distinguish the statistical data from noise, then there’s no reason we should expect it to replicate. Also, when the data are hidden, that’s one more way things can’t be verified. We’ve seen too many cases of incompetence, fraud, and just plain bumbling to trust claims that are made without evidence. Even if they’re published in august journals such as Psychological Science, the Proceedings of the National Academy of Sciences, or Risk Management and Healthcare Policy.

P.S. The paper by Siddique et al. concludes with this awesome disclosure statement:

In the last thirty-six months, DBA has received personal payments or promises for same from: Alkermes, Inc.; American Society for Nutrition; Amin Talati Wasserman for KSF Acquisition Corp (Glanbia); Big Sky Health, Inc.; Biofortis Innovation Services (Merieux NutriSciences), Clark Hill PLC; Kaleido Biosciences; Law Offices of Ronald Marron; Medpace/Gelesis; Novo Nordisk Fonden; Reckitt Benckiser Group, PLC; Law Offices of Ronald Marron; Soleno Therapeutics; Sports Research Corp; and WW (formerly Weight Watchers). Donations to a foundation have been made on his behalf by the Northarvest Bean Growers Association. Dr. Allison is an unpaid consultant to the USDA Agricultural Research Service. In the last thirty-six months, Dr. Jamshidi-Naeini has received honoraria from The Alliance for Potato Research and Education. The institution of DBA, ABS, LGA, and YJ-N, Indiana University, and the Indiana University Foundation have received funds or donations to support their research or educational activities from: Alliance for Potato Research and Education; Almond Board; American Egg Board; Arnold Ventures; Eli Lilly and Company; Haas Avocado Board; Gordon and Betty Moore Foundation; Mars, Inc.; National Cattlemen’s Beef Association; USDA; and numerous other for-profit and non-profit organizations to support the work of the School of Public Health and the university more broadly. The authors report no other conflicts of interest in this communication.

Big Avocado strikes again!

WikiGuidelines: A new way to set up guidelines for medical practices to better acknowledge uncertainty

John Williams points to this post by Brad Spellberg, Jaimo Ahn, and Robert Centor, who write:

Clinical guidelines greatly influence how doctors care for their patients. . . . Regulators, insurance payers and lawyers can also use guidelines to manage a doctor’s performance, or as evidence in malpractice cases. Often, guidelines compel doctors to provide care in specific ways.

We are physicians who share a common frustration with guidelines based on weak or no evidence. We wanted to create a new approach to medical guidelines built around the humility of uncertainty, in which care recommendations are only made when data is available to support the care. In the absence of such data, guidelines could instead present the pros and cons of various care options.

This sounds like a great idea.

Spellberg et al. continue:

The clinical guidelines movement first began to gain steam in the 1960s. Guideline committees, usually composed of subspecialty experts from academic medical centers, would base care criteria on randomized clinical trials, considered the gold standard of empirical evidence.

So far so good, as long as we recognize that the so-called gold standard isn’t always so good: what with dropout etc., randomized trials have biases too; there are also the usual problems with summarizing studies based on statistical significance or the lack thereof; and meta-analysis can be a disaster when averaging experiments of varying quality.

In any case, misinterpretation of randomized trials can be the least of our problems. Spellberg et al. write:

Unfortunately, many committees have since started providing answers to clinical questions even without data from high-quality clinical trials. Instead, they have based recommendations primarily on anecdotal experiences or low-quality data.

Interesting point. If doctors, payers, and regulators want or expect “guidelines,” then what sort of guidelines do you get when the available knowledge is insufficient to support strong guidelines? You can see the problem.

Spellberg et al. give several examples where rough-and-ready guidelines led to bad outcomes, and then they give their proposal, which is a more open-ended, and open, approach to creating consensus-based guidelines that recognize uncertainty:

To create a new form of medical guideline that takes the strength of available evidence for a particular practice into account, we gathered 60 other physicians and pharmacists from eight countries on Twitter to draft the first WikiGuideline. Bone infections were voted as the conditions most in need of new guidelines.

We all voted on seven questions about bone infection diagnosis and management to include in the guideline, then broke into teams to generate answers. Each volunteer searched the medical literature and drafted answers to a clinical question based on the data. These answers were repeatedly revised in open dialogue with the group.

These efforts ultimately generated a document with more than 500 references and provided clarity to how providers currently manage bone infections. Of the seven questions we posed, only two had sufficient high-quality data to make a “clear recommendation” on how providers should treat bone infection. The remaining five questions were answered with reviews that provided pros and cons of various care options.

They contrast to the existing standard approach to medical guidelines:

The recommendations WikiGuidelines arrived at differ from current bone infection guidelines by professional group for medical specialists. For example, WikiGuidelines makes a clear recommendation to use oral antibiotics for bone infections based on numerous randomized controlled trials. Current standard guidelines, however, recommend giving intravenous antibiotics, despite the evidence that giving treatment orally is not only just as effective as giving it intravenously, but is also safer and results in fewer side effects.

Interesting. I recognize that I’m hearing only one side of the story. That said, their argument sounds persuasive to me, and of course I’m generally receptive to statements about the importance of recognizing uncertainty.

So I hope this WikiGuidelines for medicine works out and that similar procedures can be done in other fields. The next question will be, who’s on the committee? It could be that it’s easier to get a nonaligned committee for a guideline on bone infections than for something more controversial such as nudge interventions.

Flood/cyclone risk and the all-coastal-cities-are-equal narrative

Palko writes:

A common, perhaps even the standard framing of rising sea levels is that it’s an existential threat for all coastal cities, and while I understand the desire not to downplay the crisis, this isn’t true. For cities with relatively high elevations like Los Angeles (a few low-lying neighborhoods, but most of it hundreds and some of it thousands of feet above sea level) or cities with at least moderate elevations and little danger from tropical cyclones (like almost all major cities on the West Coast), we are talking about a problem but not a catastrophe . . . the real tragedy of this framing is not that it overstates the threat to the West Coast, but that it dangerously understates the immediate and genuinely existential threat to many cities on the East and Gulf Coasts. While New York City is not in danger of total oblivion the way Miami or Jacksonville are, it is far from safe from the threats associated with rising sea levels. . . . This is one of the things that makes the following New York Times article from a while back so strange.

An estimated 600 million people live directly on the world’s coastlines, among the most hazardous places to be in the era of climate change. . . . Many people face the risks right now. Two sprawling metropolitan areas offer a glimpse of the future. One rich, one poor, they sit on opposite sides of the Pacific Ocean: the San Francisco Bay Area (population 7 million) and metropolitan Manila (almost 14 million).

Their history, their wealth, and the political and personal choices they make today will shape how they fare as the water inevitably comes to their doorsteps.

Palko continues:

The New York Times felt the need to go all the way to San Francisco to do the story despite the fact that New York City has more people, lower elevation, and faces a far, far greater risk from tropical cyclones. This is not quite as bad as the San Francisco Chronicle doing features on earthquakes and wildfire smoke and using NYC as one of the two examples, but it’s close. . . .

The different elevations of Manila and San Francisco and how they affect the impact of rising sea levels is largely undiscussed. There is exactly one mention of tropical storms, none whatsoever of tropical cyclones, and the fact that certain areas are more vulnerable than others is almost completely ignored. All coastal cities are treated as effectively interchangeable. . . .

The all-coastal-cities-are-equal narrative embraced by the New York Times is extraordinarily dangerous. It inevitably underplays the threat to cities from New York all the way to Houston along the coast, particularly in Florida . . .

I continue to think that Mark Palko and David Weakliem should have columns in the Times. Not that they’re perfect, but both of them seem to have the ability to regularly see through the fog of news media B.S. that surrounds us.

Where does the news media B.S. come from? Some of this is bias, some is corruption, but I think a lot is simple recycling of narratives that for one reason or another are appealing to our tale-spinners.

A proposal for the Ingram Olkin Forum on Statistics Serving Society

This email came in from the National Institute of Statistical Sciences:

Ingram Olkin (S3) Forums: Call for Proposals

The Statistics Serving Society (S3) is a series of forums to honor the memory of Professor Ingram Olkin. Each forum focuses on a current societal issue that might benefit from new or renewed attention from the statistical community. The S3 Forums aim to bring the latest innovations in statistical methodology and data science into new research and public policy collaborations, working to accelerate the development of innovative approaches that impact societal problems. As the Forum will be the first time a particular group of experts will be gathered together to consider an issue, new energy and synergy is expected to produce a flurry of new ideas and approaches.

S3 Forums aim to develop an agenda of statistical action items that are needed to better inform public policy and to generate reliable evidence that can be used to mitigate the problem.

Upcoming Forum

Advancing Demographic Equity with Privacy Preserving Methodologies

Previous Forums Included

Police Use of Force
COVID and the Schools: Modeling Openings, Closings and Learning Loss
Algorithmic Fairness and Social Justice
Unplanned Clinical Trial Disruptions
Gun Violence – The Statistical Issues

Here’s my proposal. A forum on the use of academic researchers to confuse people about societal harms. The canonical example is the cigarette industry hiring statisticians and medical researchers to muddy the waters regarding the smoking-cancer link.

Doing this for the Ingram Olkin Forum is perfect, because . . . well, here’s the story from historian Robert Proctor about an episode from the 1970s:

Ingram Olkin, chairman of Stanford’s Department of Statistics, received $12,000 to do a similar job (SP-82) on the Framingham Heart Study . . . Lorillard’s chief of research okayed Olkin’s contract, commenting that he was to be funded using “considerations other than practical scientific merit.”

The National Institute of Statistical Sciences is located at Research Triangle Park in North Carolina, in the heart of cigarette country, so it would be a perfect venue for such a discussion, especially with this connection to Olkin.

I’m not much of a conference organizer myself, so I’m putting this out there so that maybe one of you can propose this for an Ingram Olkin Forum. Could be a joint Olkin/Fisher/Zellner/Rubin forum. Lots of statistical heavyweights were involved in this one.

A forum on the use of academic researchers to confuse people about societal harms. What better way for statistics to serve society, no?

Dying children and post-publication review (rectal suppositories edition)

James Watson (this guy, not the guy who said cancer would be cured in minus 22 years) writes:

Here’s something that you may find of interest for your blog: it involves causal inference, bad studies, and post publication review!

Background: rectal artesunate suppositories were designed for the treatment of children with suspected severe malaria. They can be given by community health care workers who cannot give IV treatment (gold standard treatment). They provide temporary treatment whilst the kid gets referred to hospital. Their use is supported by (i) 1 moderately large RCT; (ii) our understanding of how severe malaria works; (iii) good pharmacological evidence that the suppositories do what they are supposed to do (kill parasites very quickly); and (iv) multiple large hospital based RCTs showing that artesunate is the best available drug.

The story: A group at Swiss TPH got a very large grant (19 million USD) to do a `deployment study’: basically look at what happened when the suppositories were rolled out in three countries in sub-Saharan Africa. This study (CARAMAL) asked the question: “Can the introduction of pre-referral QA RAS [quality-assured rectal artesunate] reduce severe malaria case fatality ratio over time under real-world operational circumstances in three distinct settings?” (see clinicaltrials.gov: NCT03568344). But they saw increases in mortality after roll-out, not decreases! In Nigeria, mortality went up 4-fold (16.1% versus 4.2%)! In addition, the children who got the suppositories were more likely to die than those who didn’t. These results led to the WHO stopping deployment of these suppositories in Africa earlier this year. This is a really serious decision, which will probably result in preventable childhood deaths.

The authors put their findings online last year as 10 pre-prints. In July we wrote a commentary on the overall decision to stop deployment and the reported analyses in the pre-prints. The main points were:
– No pre-specified analysis plan (lots of degrees of freedom of exact comparisons to make, and what to include as baseline variables). This is unusual for a big clinical study.
– Temporal confounding, COVID-19 being one small difference in post versus pre roll-out world….!
– Confounding by indication: comparisons of who got the suppositories versus who didn’t post roll-out are probably due to health workers giving them to the sicker children.
– Mechanistic implausibility of having a massive increase in mortality within 48 hours of giving the drug compared with not giving the drug (this just doesn’t fit with our model of the disease and the pharmacology).

Unsurprisingly (?) the authors did not comment on our piece, and their main paper was published in October with no apparent changes from the pre-print version…

The now published paper (BMC Med) uses pretty strong causal language in the abstract: “pre-referral RAS had no beneficial effect on child survival in three highly malaria-endemic settings”. Given the design and data available in the study, this causal language is clearly not justified. I emailed the authors to ask them for their study protocol (some things are unclear in the paper, like how they exactly recorded who got the suppositories and whether there was a possibility of recall bias). I also wrote my main concerns as a thread on twitter.

They answered:

“We saw that you have just publicly on Twitter implied that we conducted the CARAMAL study without a study protocol nor an analysis plan. You are essentially publicly accusing us of conducting unethical and illegal research. This was less than 24 hours after sending this email requesting us to share the study protocol.

Your Tweet adds to the tendentious and poorly-informed commentary in BMJ Global Health. We fail to see how this style of interaction can lead to a constructive discussion of the issues at the heart of the CARAMAL project.

We have provided all necessary information including the study protocol and the analysis plan to a panel of international experts gathered by the WHO to conduct a thorough evidence review of the effectiveness of RAS. We look forward to their balanced assessment and opinion.”

Basically they refused to share the study protocol. They also admit to reading the previous commentary, which discussed the lack of an analysis plan. I didn’t accuse them of not writing a study protocol or analysis plan, but of not posting it with the paper (which is a fact). Most medical journals make you post the protocol with the publication.

Why this is important: pushback from various people has made the WHO convene an expert group to see whether the moratorium on rectal artesunate deployment was justified. They will be making a decision in February. The underlying study that caused this mess is poorly thought out and poorly analysed. It’s quite urgent that this gets reversed.

This seems pretty wack that they say they have a study protocol and an analysis plan (perhaps not a preanalysis plan, though) but they’re not making it public. What’s the big secret?

To put it another way, if they want to keep key parts of their research secret until the panel of international experts gives “their balanced assessment and opinion,” fine, but then what does it mean for them to be publishing those strong causal claims? Once the big claims have been published, I can’t see a good reason for keeping the protocol and analysis plans secret.

Also what about the issues of temporal confounding, confounding by indication, and mechanistic implausibility? These all seem like a big deal. It always seems suspicious when researchers get all defensive about post-publication criticism and then don’t even bother to address the key concerns. Kids are dying here, and they’re all upset about the “style of interaction”???

Or maybe there’s more to the story. The authors of the published article or anyone else should feel free to provide additional background in the comments. If anyone has the study protocol and analysis plan (or pre-analysis plan), feel free to post that too!

The discount cost paradox?

There must be some econ literature on this one . . .

Joseph Delaney writes that telehealth has been proposed as a solution to ER wait times, but it hasn’t worked so well in practice. Among other things, there are problems with call-center culture:

It made me [Delaney] think of the time this month that there were reports of the 911 number in Toronto asking for call back numbers after a medical emergency. Even if overstated, it really does bring to light the key problem with telehealth—that call center culture is famously customer-hostile.

A number of years back I had a problem with my cable company. Like many foolish persons, I called the cable company and spent 2 hours on hold. After I was told that nothing could be done, I asked if there was anybody I could speak with that had more authority to deal with the issues. I was then placed on hold again. Several hours later a message played saying that the call center was closing and disconnected me. This was an infuriating experience and there was simply no accountability even possible. So the next day I made the long trek to the customer service center, waited in line for about an hour, and then had the problem actually fixed. No part of this experience made me like the company more, but the call experience was terrible.

Recently, I have been constantly hearing “call volumes are unexpectedly high” recordings every time I call a place like a bank or the University travel agent. As a person who once worked in customer service telemarketing call volume forecasting, I even tried times and days that are notoriously light for call volume. No luck.

So the central challenge of telehealth is how to break with the cost-cutting culture that values customer wait times at zero (or even seems to see them as a good thing). You can only redirect from the Emergency Room via telehealth triage if it is relatively quick (let us say an hour, maximum). You get no triage credit at the ER for having called telehealth, so if the answer is “go to the ER” but you have lost 4 hours on the phone, that is going to quickly teach everyone not to call telehealth lines.

With pediatric ERs reporting wait times as long as 15 hours, you can see the value of telehealth if it can keep children out of the queue and free up capacity. But that really requires that it be agile (why wait 4 hours as a prelude to waiting 15?) and able to do things like prescribe. I know that this RSV wave is atypically severe, but at some point the default needs to be that there are a lot of respiratory viruses running around and we should plan around that.

That gets me to my last pet peeve about telemedicine, which is that you need to be able to provide helpful interventions. In a recent covid burst, I had a family member use a telemedicine provider to ask about paxlovid, only to be told that it could not be prescribed by phone but required an in-person visit. Yes, the plan really was for the infectious person to sit in the waiting room of a walk-in clinic for hours so that the prescription could be written by a person able to see the patient. Now, whether or not treating covid with paxlovid was a good idea is a different question, but the issue was that these policies make calling first seem like a bad plan: you wait hours for an appointment, which makes it much less likely that you can successfully get seen at a walk-in clinic with a time-sensitive health issue.

Which is the opposite of what you want people to do, frankly.

Without solving these cultural issues of how we treat in-calls and how we treat patients, we are not going to be able to really move the demand side for ERs.

Beyond the special difficulties of medicine, this all reminded me of a well-known problem with parole and probation. Given modern technology, these should be more effective than prison and much much cheaper, but they get such low funding that parole and probation officers are overwhelmed. It’s a disturbing paradox that when an alternative solution is cheaper, it gets underfunded so as to not be effective.

That all said, my own health plan’s telehealth system has been working well for me. So I think it can work well if the financial incentives are in the right place.

With the prison thing, I guess the problem is that there are political incentives to spend lots of $ on prisons, not such incentives for parole/probation. Also there may be legal reasons why it’s easy to lock people up but not so easy to implement effective parole/probation.

What do economists say about this?

So here’s my question. Is this a general thing? The idea that when there’s a cheaper and more effective solution out there, it gets done too cheaply and then it’s no longer effective? This would seem to violate some Pareto-like principles of economics, but it happens often enough that I’m thinking there must be some general explanation for it?

“Banned from presenting research to the organization for the next two years” . . . how does that happen?

Paul Alper points us to a news article that reports:

Oz research rejected from 2003 surgery conference, resulted in 2-year ban

In May 2003, Mehmet Oz was the senior author on a study that explored a hot topic at the time . . . But Oz was forced to withdraw his work and was banned from presenting research to the organization for the next two years, according to seven people familiar with the events, whose account of his ban was confirmed by the Oz campaign. Oz is now the Republican nominee for U.S. Senate in Pennsylvania.

He was also prohibited from publishing his work in the society’s medical journal for the same period of time, according to the people familiar with the events, four of whom recalled details of the controversy on the record.

“My understanding is it was the lack of really solid statistical analysis that called everything into question,” said one person who was sympathetic to Oz, saying he simply ran into funding and deadline problems.

My reply: Sure, why would anyone think that just cos someone’s a good surgeon, that they’d have any idea how to conduct research? I know lots of people who are really good at research but it’s not like they could do surgery.

The interesting thing to me is that last paragraph of the above quote, as it seems to reflect the attitude that statistics is just a bunch of paperwork, or a set of hoops to jump through. It’s that attitude some researchers have, that their ideas are evidently correct and the statistics is just some rubber stamp.

With Oz, though, there’s another twist, given his later record of promoting miracle cures. It’s hard to imagine he really believes in all the things that he promotes—or maybe “belief” isn’t really the point; perhaps he just has a very general belief in the power of placebo, in which case anything he recommends would be automatically legit. Indeed, the more people pay for it, the more effective the placebo power is. Kinda like academic funding!

Pathfinder, Causality, and SARS-CoV-2 talks in Paris this week

Tuesday, 11 October 2022

Andrew and I both have talks tomorrow, so I’m bummed I have to miss his. I’ll be talking about Pathfinder at the Polytechnique:

Hi! PARIS Seminar – Bob Carpenter, 11 October 2022

As you may recall from his earlier post, Andrew’s giving a talk on causal inference the same day at Marie Curie in the center of Paris.

Thursday, 13 October 2022

On Thursday, I’ll be giving a talk on SARS-CoV-2 diagnostic testing to France Mentre’s group at Paris Diderot / INSERM. Here’s the title and abstract:

Test Site Variation and Sensitivity Time Trends in Covid Diagnostic Tests

I’ll present two relatively simple models of Covid test sensitivity and specificity. In the first, sensitivity and specificity are allowed to vary by test site. We use calibration tests at sites for known positive or known negative cases to develop a hierarchical model to predict accuracy at a new site. I’ll present results for the data from Santa Clara early in the pandemic. This first model can act as a component in any diagnostic study, much like a PK model for a drug can be coupled with a range of PD models. In the second model, sensitivity is allowed to grow from infection to a peak at symptom onset and decay after that. I’ll provide several parameterizations and show results for different cohorts of patients. I’ll discuss coupling the second model with time trends of infection fatality and hospitalization in order to infer population infection levels. Theoretically, these models are easy to combine, but practically speaking, we do not have a good data set to fit a joint model without very strong assumptions.

The first model is joint work with Andrew Gelman (Columbia University). The second model is joint work with Tom Ward (UK Health Security Agency).
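In case it helps to see the structure, here is a toy simulation of the data-generating process behind the first model (made-up numbers; the actual analysis fits a hierarchical Bayesian model to the calibration data):

```python
# Toy version of the first model's structure: sensitivity and specificity
# vary by test site, drawn from population distributions on the log-odds
# scale. All numbers are made up.
import numpy as np

rng = np.random.default_rng(2)
inv_logit = lambda u: 1 / (1 + np.exp(-u))

n_sites = 10
n_pos_per_site = 50    # known-positive calibration samples per site
n_neg_per_site = 200   # known-negative calibration samples per site

# Site-level sensitivity and specificity around population means.
sens = inv_logit(rng.normal(loc=1.4, scale=0.5, size=n_sites))  # ~80% on average
spec = inv_logit(rng.normal(loc=3.0, scale=0.3, size=n_sites))  # ~95% on average

# Calibration results: positives detected and negatives correctly cleared.
y_pos = rng.binomial(n_pos_per_site, sens)
y_neg = rng.binomial(n_neg_per_site, spec)

print("observed site sensitivities:", np.round(y_pos / n_pos_per_site, 2))
print("observed site specificities:", np.round(y_neg / n_neg_per_site, 2))
# A hierarchical model partially pools these site-level rates and yields a
# predictive distribution for accuracy at a new, untested site.
```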

The location is

2.30 pm, Oct 13, University Paris Diderot, 16 rue Henri Huchard, room 342

and apparently you have to leave ID with security in order to access floor 3 (it’s in the medical school).

Stock prices, a notorious New York Times article, and that method from 1998 that was going to cure cancer in 2 years

Gur Huberman writes:

Apropos your blogpost today, here’s a piece from 2001 that (according to a colleague) shows that I can write an empirical paper based on a single observation.

Gur’s article, with Tomer Regev, is called “Contagious Speculation and a Cure for Cancer: A Nonevent that Made Stock Prices Soar” and begins:

A Sunday New York Times article on a potential development of new cancer-curing drugs caused EntreMed’s stock price to rise from 12.063 at the Friday close, to open at 85 and close near 52 on Monday. It closed above 30 in the three following weeks. The enthusiasm spilled over to other biotechnology stocks. The potential breakthrough in cancer research already had been reported, however, in the journal Nature, and in various popular newspapers—including the Times—more than five months earlier. Thus, enthusiastic public attention induced a permanent rise in share prices, even though no genuinely new information had been presented.

They argue that this contradicts certain theories of finance:

A fundamentals-based approach to stock pricing calls for a price revision when relevant news comes out. Within this framework, it is the experts who identify the biotechnology companies whose pricing should be most closely tied to such news who do the price revision. These experts follow Nature closely, and therefore the main price reaction of shares of biotechnology firms should have taken place in late November 1997, and not been delayed until May 1998.

I’m not going to disagree with their general point, which is reminiscent of Keynes’s famous analogy of stock pricing to a beauty contest.

Huberman and Regev quote from the Sunday New York Times article that was followed by the stock rise:

Kolata’s (1998) Times article of Sunday, May 3, 1998, presents virtually the same information that the newspaper had reported in November, but much more prominently; namely, the article appeared in the upper left corner of the front page, accompanied by the label “A special report.” The article had comments from various experts, some very hopeful and others quite restrained (of the “this is interesting, but let’s wait and see” variety). The article’s most enthusiastic paragraph was “. . . ‘Judah is going to cure cancer in two years,’ said Dr. James D. Watson, a Nobel Laureate . . . Dr. Watson said Dr. Folkman would be remembered along with scientists like Charles Darwin as someone who permanently altered civilization.” (p. 1) (Watson, of The Double Helix fame, was later reported to have denied the quotes.)

And more:

In the May 10 issue of the Times, Abelson (1998) essentially acknowledges that its May 3 article contained no new news, noting that “[p]rofessional investors have long been familiar with [ENMD’s] cancer-therapy research and had reflected it in the pre-runup price of about $12 a share.” . . . On November 12, King (1998), in a front page article in the Wall Street Journal, reports that other laboratories had failed to replicate Dr. Folkman’s results. ENMD’s stock price plunged 24 percent to close at 24.875 on that day. But that price was still twice the closing price prior to the Times article of May 4!

They conclude:

To the skeptical reader we offer the following hypothetical question: What would have been the price of ENMD in late May 1998 if the editor of the Times had chosen to kill the May 3 story?

I feel like the whole Nobel prize thing just makes everything worse (see here, here, here, and here), but I just wanted to make two comments regarding the effect of the news story on the stock price.

First, the article appearing more prominently in the newspaper does provide some information, in that it represents the judgment of the New York Times editors that the result is important, beyond the earlier judgment of the researchers to write the paper in the first place, the journal editors to publish the article, and the Times to run their first story on the topic. Now, you might say that the judgment of a bunch of newspaper editors should count as nothing compared to the judgment of the journal, but (a) journals do make mistakes (search this blog on PNAS), and (b) Nature and comparable journals publish thousands of articles on biomedical research each year, and only some of these make it to prime slots in a national newspaper. So some judgment is necessary there.

The second point is that, yeah, James Watson is kind of a joke now, but back in 1998 he was still widely respected as a scientist, so his “Judah is going to cure cancer in two years” line, whether or not it was reported accurately, again represents additional information. Even professional investors might take this quote as some sort of evidence.

So I think Huberman and Regev are leaning a bit too hard on their assumption that no new information was conveyed, conditional on the Nature article and the earlier NYT report.

Is Martha (Smith) still with us?

This post is by Phil Price, not Andrew.

It occurred to me a few weeks ago that I haven’t seen a comment by Martha (Smith) in quite a while…several months, possibly many months? She’s a long-time reader and commenter and often had interesting things to say. At times she also alluded to the fact that she was getting on in years. Perhaps she has simply lost interest in the blog, or in commenting on the blog…I hope that’s what it is. Martha, if you’re still out there, please let us know!

Calling all epidemiology-ish methodology-ish folks!

I just wanted to share that my department, Epidemiology at the University of Michigan School of Public Health, has just opened up a search for a tenure-track Assistant Professor position.

We are looking in particular for folks who are pushing forward innovative epidemiological methodology, from causal inference and infectious disease transmission modeling to the ever-expanding world of “-omics”.

We’ll be reviewing applications starting October 12th; don’t hesitate to reach out to me ([email protected]) or the search committee ([email protected]) if you have any questions!

Also – you can find the posting here.

Programmer position with our research group—at the University of Michigan!

Hey, midwesterners—here’s a chance for you to join the big team! It’s for our research project with Yajuan Si, Len Covello, Mitzi Morris, Jonah Gabry, and others on building models and software for epidemic tracking (see this paper, this paper, or, for the short version, this op-ed):

Summary

The Survey Research Center (SRC) at the University of Michigan’s Institute for Social Research (ISR) invites applications for a Full Stack Programmer with the Survey Methodology Program, in collaboration with the team of Stan developers at Columbia University.

Our multidisciplinary research team is involved in cutting-edge statistical methodology research and the development of computational algorithms, software, and web-based interfaces for application and visualization. We are looking for a Full-Stack Programmer to work on an innovative infrastructure to enable user-friendly implementation and reproducibility of statistical methods for population generalizability. The position provides an opportunity to work in an exciting and rewarding research area that constantly poses new technical and computational problems.

Responsibilities*

Develop, test, maintain, document, and deploy an interface for statistical inferences with sample data and result visualization, specifically the implementation of multilevel regression and poststratification.

Provide timely technical support during research deployments.
Create documentation and tutorials about the developed applications for use by interdisciplinary research teams.

Required Qualifications*

Bachelor’s or Master’s degree in Statistics/Computer Science/Information Science/Informatics or related technical discipline or a combination of education and software-development experience in a research or corporate environment to equal three (3) years.

Skills in R/Python programming.

Experience in data visualization and dashboard construction.

Experience with databases. Direct experience with demographic or geospatial data a plus.

Familiarity with C++ or Java. Knowledge of Stan programming a plus.

Track record of successful application development a plus.

Desired Qualifications*

Dashboard development experience preferred.

Work Locations

This position will be on-site at the University of Michigan offices in Ann Arbor, with flexible scheduling and remote opportunities made available within our overall Center policies. If relocation is required, we will allow for reasonable time and flexibility to make necessary arrangements.

This is an exciting research project and we’re hoping to find someone who can take a lead on the programming. Click through to the link for more details and to apply.

The Course of the Pandemic: What’s the story with Excess Deaths?

This post is by Phil Price, not Andrew.

A commenter who goes by “Anoneuoid” has pointed out that ‘excess deaths’ in the U.S. have been about as high in the past year as they were in the year before that. If vaccines work, shouldn’t excess deaths decrease?

Well, maybe not. Anoneuoid seems to think vaccines offer protection against COVID but increase the risk of deaths from other causes. Or something. I don’t much care about Anon’s belief system, but I do think it’s interesting to take a look at excess deaths. So let’s do that.

I went to https://stats.oecd.org and searched for ‘excess’ in the search field, which led me to a downloadable table of ‘excess deaths by week’ for OECD countries. “Excess deaths” means the number of deaths above a baseline (which I believe is the average over the previous ten years or something, perhaps adjusted for population; I don’t know the exact definition used for these data). “Excess deaths” over the past couple of years have been dominated by COVID deaths but that’s not the only effect: at least in the first year of the pandemic people were avoiding doctors and hospitals and thus missing out on being diagnosed or treated for cancer and heart disease and so on, suicides and car accident numbers have changed, and so on.

Below is a plot of excess deaths, by week since the beginning of 2020, in nine OECD countries that I selected somewhat haphazardly. You can download the data yourself and make more plots if you like.

“Excess Deaths” by week, as a percent of baseline deaths, in nine OECD countries, including the US.
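In case anyone wants to make a similar plot, here is roughly what the data munging looks like in Python. I'm guessing at the column names in the OECD export, so adjust these to whatever the downloaded file actually contains:

```python
# Rough sketch of plotting the OECD "excess deaths by week" table.
# The filename and column names are guesses; check the actual CSV you
# download from stats.oecd.org.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("oecd_excess_deaths_by_week.csv")  # hypothetical filename
# Eight countries named in the post, plus one more as a placeholder.
countries = ["United States", "Canada", "Denmark", "Italy", "Belgium",
             "France", "Sweden", "United Kingdom", "Germany"]

fig, axes = plt.subplots(3, 3, figsize=(10, 8), sharex=True, sharey=True)
for ax, country in zip(axes.ravel(), countries):
    d = df[df["Country"] == country].sort_values("Week")
    ax.plot(d["Week"], d["Excess deaths (% of baseline)"])
    ax.axhline(0, linewidth=0.5)
    ax.set_title(country)
plt.tight_layout()
plt.show()
```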

If you had asked me a year or so ago, “what do you think will happen with US COVID deaths now that we have vaccines” I probably would have guessed something like what has happened in Italy or the UK or Belgium or France: there would be some ups and downs, but at substantially decreased magnitude. Instead, the US really stands out as being the only country that had high excess mortality prior to the vaccines and also has high mortality now.

But then, I also expected that just about everyone in the US would get vaccinated, which isn’t even close to being the case (about 20% of US residents haven’t gotten any COVID vaccination, and about 30% are not ‘fully vaccinated’…a term that is a bit misleading, perhaps, as the effects of the vaccines wear off for those of us who got our booster several months ago).

Also, there are competing factors — competing in the sense that some tend to make excess deaths increase while some make it decrease. Vaccines provide substantial protection, and doctors have gotten better at treating COVID, so those tend to lead to lower COVID deaths. But most people seem to have resumed normal life without many COVID precautions, presumably leading to higher infection rates than there would otherwise be. And of course there are still traffic accidents and suicides and drug overdoses and so on that could either increase or decrease compared to baseline.

I find the figure above really interesting. Here are a few things that stand out to me, in no particular order:

  • Denmark had no excess mortality through early 2021! That’s remarkable, they saved a lot of lives compared to the other countries.
  • Canada looks like the US in temporal pattern, which kinda makes sense, but with mortality at about half the US level.
  • I knew Italy got hit very hard early on, northern Italy especially, but hadn’t realized Belgium had it so bad. Jeez they had a terrible first year.
  • The time series in the U.S. is much smoother than in the other countries. Belgium, France, Sweden, the UK, Italy…they all had a big initial spike and then dropped all the way back to 0 excess deaths for a few weeks before the next spike. The U.S. went up and never came all the way back down, even briefly, until a few weeks ago. The U.S. has a much larger population and much larger geographic area than any single European country; perhaps if we looked at data on just some small part of the U.S., like New England or Florida, it would look more like one of these other countries.
  • If the U.S. had matched the average excess mortality of the rest of the OECD countries, hundreds of thousands of Americans would be alive who are now dead.

I guess I’ll leave it to commenters to provide insights on all of this. Go to it!

This post is by Phil.

Some project opportunities for Ph.D. students!

Hey! My collaborators and I are working on a bunch of interesting, important projects where we could use some help. If you’d like to work with us, send me an email with your CV and any relevant information and we can see if there’s a way to fit you into the project. Any of these could be part of a Ph.D. thesis. And with remote collaboration, this is open to anyone—you don’t have to be at Columbia University. It would help if you’re a Ph.D. student in statistics or related field with some background in Bayesian inference.

There are other projects going on, but here are a few where we could use some collaboration right away. Please specify in your email which project you’d like to work on.

1. Survey on social conditions and health:

We’re looking for help adjusting to the methodological hit that a study suffered when COVID shut down data collection in the middle of implementing a sampling design to refresh the cohort, as well as completing interviews for the ongoing cohort members. We will need to consider sampling adjustments. Also, they experimented with conducting some interviews via phone or zoom during the pandemic, as these were shorter than their regular 2-hour in-person interview, so it would be good to examine item missingness and an imputation strategy for key variables that are important for the planned analyses.

2. Measurement-error models:

This is something that I’m interested in for causal inference in general. A recent example came up in climate science, where an existing Bayesian approach has problems, and I think we could do something good here by thinking more carefully about priors. In addition to the technical challenges, climate change is a very important problem to be working on.

3. Bayesian curve fitting for drug development:

This is a project with a pharmaceutical company to use hierarchical Bayesian methods to fit concentration-response curves in drug discovery. The cool thing here is that they have a pipeline with thousands of experiments, and so we want an automated approach. This relates to our work on scalable computing, diagnostics, and model understanding, as well as specific issues of nonlinear hierarchical models. (A minimal illustrative sketch of the basic curve-fitting step appears after the project list.)

4. Causal inference with latent data:

There are a few examples here of survey data where we’d like to adjust for pre-treatment variables that are either unmeasured or measured with error. This is of interest for Bayesian modeling and causal inference, in particular the idea that we can improve upon the existing literature by using stronger priors, and also for the particular public health applications.

5. Inference for models identifying spatial locations:

This is for a political science project where we will be conducting a survey and asking people questions about nationalities and ethnic groups and using this to estimate latent positions of groups. Beyond the political science interest (for example, comparing mental maps of Democrats and Republicans), this relates to some research in computational neuroscience. It would be helpful to have a statistics student on the project because there are some challenging modeling and computational issues and it would be good for the political science student to be able to focus on the political science aspects of the project.
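
To give a concrete sense of the basic curve-fitting step in project 3, here is a minimal, non-hierarchical sketch: a four-parameter logistic (Hill-type) concentration-response curve fit to one simulated experiment by least squares. All the numbers are simulated and the functional form is just the standard one for this kind of assay; the actual project would pool thousands of such experiments with a hierarchical Bayesian model rather than fitting them one at a time.

```python
import numpy as np
from scipy.optimize import curve_fit

# Four-parameter logistic (Hill) curve on the log10-concentration scale.
def hill(log_conc, bottom, top, log_ec50, slope):
    return bottom + (top - bottom) / (1 + 10 ** (slope * (log_ec50 - log_conc)))

# Simulate one experiment (all values made up for illustration).
rng = np.random.default_rng(1)
log_conc = np.linspace(-3, 2, 10)                    # log10 concentrations
true_resp = hill(log_conc, bottom=5, top=95, log_ec50=-0.3, slope=1.2)
resp = true_resp + rng.normal(0, 5, size=log_conc.size)  # noisy measured response

# Least-squares fit with rough starting values.
p0 = [resp.min(), resp.max(), np.median(log_conc), 1.0]
params, _ = curve_fit(hill, log_conc, resp, p0=p0)
bottom, top, log_ec50, slope = params
print(f"estimated EC50 = {10 ** log_ec50:.3f}, slope = {slope:.2f}")
```

In the hierarchical version, the per-experiment parameters (especially the EC50s and slopes) would get priors that pool information across experiments, which is where the scalable-computation and diagnostics issues mentioned above come in.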

Coffee study, lower risk of dying?

Paul Alper points to this news article reporting on an observational study of 170,000 British people, finding that “Those who drank 1.5 to 3.5 cups of coffee per day, even with a teaspoon of sugar, were up to 30 percent less likely to die during the study period [approximately seven years] than those who didn’t drink coffee.”

A reduction of mortality by 30% seems implausibly high; on the other hand, it’s not impossible. Indeed, the news article quotes the journal editor who says, “It’s huge. There are very few things that reduce your mortality by 30 percent.” Alternatively, the news article states, “there may be other lifestyle factors contributing to that lower mortality risk among people who drink coffee, like a healthy diet or a consistent exercise routine.” I guess that smoking and drinking would be the biggest factors. I took a quick look at the research paper, and they do adjust for smoking and drinking, along with many other factors, but I don’t know exactly how that adjustment was done.

I’m not saying they did anything wrong; indeed, the authors of the research paper are in a damned-if-they-do, damned-if-they-don’t position, where if they find a small difference, people can say it’s no big deal, and if they find a big difference, people can say they don’t believe it. Still, they found a big difference, and I don’t know what to believe.

Reporting this sort of observational pattern seems like a good start, in any case. I’d like to reduce my own risk of dying in the next seven years by 30%, but I can’t quite bring myself to start drinking coffee every day.
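
Just to put that headline number in some perspective, here’s a quick sketch of the difference between a relative and an absolute risk reduction. The baseline seven-year mortality risk below is a number I made up for illustration, not anything from the study.

```python
# Relative vs. absolute risk reduction (illustrative numbers only).

baseline_risk = 0.05         # made-up 7-year mortality risk for a non-coffee-drinker
relative_reduction = 0.30    # the headline 30% relative reduction

coffee_risk = baseline_risk * (1 - relative_reduction)
absolute_reduction = baseline_risk - coffee_risk

print(f"Baseline risk:           {baseline_risk:.1%}")
print(f"Risk with coffee:        {coffee_risk:.1%}")
print(f"Absolute risk reduction: {absolute_reduction:.1%}")
# A 30% relative reduction on a 5% baseline is a 1.5-percentage-point absolute
# reduction, which sounds quite a bit less dramatic than the headline.
```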

So, yeah, I have nothing useful to add here. What do you expect?—It’s just a blog post! Seriously, I think it can be valuable for me to post sometimes just to express my ignorance and uncertainty. It’s a big world out there, and I have no insight, intuition, or expertise on this one.

Does having kids really protect you from serious COVID‑19 symptoms?

Aleks pointed me to this article, which reports:

Epidemiologic data consistently show strong protection for young children against severe COVID-19 illness. . . . We identified 3,126,427 adults (24% [N = 743,814] with children ≤18, and 8.8% [N = 274,316] with youngest child 0–5 years) to assess whether parents of young children—who have high exposure to non-SARS-CoV-2 coronaviruses—may also benefit from potential cross-immunity. In a large, real-world population, exposure to young children was strongly associated with less severe COVID-19 illness, after balancing known COVID-19 risk factors. . . .

My first thought was that parents are more careful than non-parents so they’re avoiding covid exposure entirely. But it’s not that: non-parents in the matched comparison had a lower rate of infections but a higher rate of severe cases; see Comparison 3 in Table 2 of the linked article.

One complicating factor is that they didn’t seem to have adjusted for whether the adults were vaccinated–that’s a big deal, right? But maybe not such an issue given that the study ended on 31 Jan 2021, and by then it seems that only 9% of Americans were vaccinated. It’s hard for me to know if this would be enough to explain the difference found in the article–for that it would be helpful to have the raw data, including the dates of these symptoms.

Are the data available? It says, “This article contains supporting information online at http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.2204141119/-/DCSupplemental” but when I click on that link it just takes me to the main page of the article (https://www.pnas.org/doi/abs/10.1073/pnas.2204141119) so I don’t know whassup with that.

Here’s another thing. Given that the parents in the study were infected at a higher rate than the nonparents, it would seem that the results can’t simply be explained by parents being more careful. But could it be a measurement issue? Maybe parents were more likely to get themselves tested.

The article has a one-paragraph section on Limitations, but it does not consider any of the above issues.

I sent the above to Aleks, who added:

My thought is that the population of parents probably lives differently than non-parents: less urban, perhaps biologically healthier. They did match, but just doing matching doesn’t guarantee that the relevant confounders have truly been handled.

This paper is a big deal 1) because it’s used to support herd immunity, 2) because it is used to argue against vaccination, and 3) because it doesn’t incorporate long Covid risks.

For #3, it might be possible to model out the impact, based on what we know about the likelihood of long-term issues, e.g. https://www.clinicalmicrobiologyandinfection.com/article/S1198-743X(22)00321-4/fulltext

Your point about the testing bias could be picked up by the number of symptomatic vs. asymptomatic cases, which would reveal a potential bias.

My only response here is that if the study ends in Jan 2021, I can’t see how it can be taken as an argument against vaccination. Even taking the numbers in Table 2 at face value, we’re talking about a risk reduction for severe covid from having kids of a factor of 1.5. Vaccines are much more effective than that, no? So even if having Grandpa sleep on the couch and be exposed to the grandchildren’s colds is a solution that works for your family, it’s not nearly as effective as getting the shot–and it’s a lot less convenient.

Aleks responds:

Looking at the Israeli age-stratified hospitalization dashboard, the hospitalization rates for unvaccinated 30-39-year-olds are almost 5x greater than for vaccinated & boosted ones. However, the hospitalization rates for unvaccinated 80+ are only about 30% higher.
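
To put these multiplicative factors side by side, here’s a quick back-of-the-envelope sketch. The baseline risk below is a placeholder I made up; the 1.5x comes from Table 2 of the paper, and the roughly 5x is Aleks’s reading of the Israeli dashboard for the 30-39 age group.

```python
# Comparing multiplicative reductions in the risk of severe covid.
# baseline_severe_risk is a made-up placeholder, for illustration only.

baseline_severe_risk = 0.02  # hypothetical risk if unvaccinated with no young kids

kids_factor = 1.5            # protection factor associated with having kids (Table 2)
vaccine_factor = 5.0         # rough unvaccinated/vaccinated hospitalization ratio,
                             # ages 30-39 (Israeli dashboard, per Aleks)

print(f"No kids, unvaccinated: {baseline_severe_risk:.2%}")
print(f"Kids, unvaccinated:    {baseline_severe_risk / kids_factor:.2%}")
print(f"No kids, vaccinated:   {baseline_severe_risk / vaccine_factor:.2%}")
# Even taking both associations at face value, the vaccination factor dwarfs
# the having-kids factor, which is the point of my reply above.
```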

If the outcome is that rare, then nothing much can be learned from pure statistics.

Alain Fourmigue writes:

You may have heard of this recent controversial study on the efficacy of colchicine to reduce the number of hospitalisations/deaths due to covid.

It seems to be the opposite of the pattern usually reported on your blog.

Here, we have a researcher making a bold claim despite the lack of statistical significance,
and the scientific community expressing skepticism after the manuscript is released.

This study raises an interesting issue: how to analyse very rare outcomes (prevalence < 1%)? The sample is big (n>4400), but the outcome (death) is rare (y=14).
The SE of the log OR is ~ sqrt(1/5+1/9+1/2230+1/2244).
Because of the small number of deaths, there will inevitably be a lot of uncertainty.
Very frustrating…

Is there nothing we could do?
Is there nothing better than logistic regression / odds ratios for this situation?
I’m not sure the researcher could have afforded a (credible) informative prior.

I replied that, yes, if the outcome is that rare then nothing much can be learned from pure statistics. You’d need a model that connects more directly to the mechanism of the treatment.
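
To spell out the arithmetic behind Alain’s point, here’s a minimal sketch of the odds-ratio calculation using the cell counts implied by his standard-error formula: 5 and 9 deaths, and roughly 2230 and 2244 non-deaths, in the two arms. (I’m guessing at which count goes with which arm, but that doesn’t change the standard error.)

```python
import math

# 2x2 table implied by the SE formula sqrt(1/5 + 1/9 + 1/2230 + 1/2244):
# deaths and non-deaths in the two arms (arm labels are my guess).
deaths_a, alive_a = 5, 2230
deaths_b, alive_b = 9, 2244

odds_ratio = (deaths_a / alive_a) / (deaths_b / alive_b)
log_or = math.log(odds_ratio)

# Woolf standard error of the log odds ratio: one term per cell of the table.
se_log_or = math.sqrt(1/deaths_a + 1/deaths_b + 1/alive_a + 1/alive_b)

ci_low = math.exp(log_or - 1.96 * se_log_or)
ci_high = math.exp(log_or + 1.96 * se_log_or)
print(f"OR = {odds_ratio:.2f}, 95% CI roughly ({ci_low:.2f}, {ci_high:.2f})")
# With only 14 deaths, the interval runs from far below 1 to above 1: that is
# the frustration Alain describes, and no reparameterization of a pure
# logistic regression will manufacture the missing information.
```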

After-work socializing . . . alcohol . . . car crashes?

Tony Williams writes:

I thought of this while reading a complaint on LinkedIn about after-work socialization always involving alcohol (and tobacco, for those so inclined).

There seem to be a lot of neat things here that could be used to address a nasty issue, DUI. I think it could make sense as an undergrad/MS applied paper idea or thesis.

Spatial differences (probably at the state level) in the relaxation of social distancing policies. Companies (my guess is mostly bigger companies located in or near major metropolitan areas) going to hybrid work schedules. Those companies also having different work-from-office days (though, honestly, Wed and Fri is the common combination I’ve seen, and that’s obviously intended in part to not allow employees to live farther away and “only” commute for two consecutive days).

Feel free to run with it if you’d like. Seems like a lot of natural variation that could be exploited for causal inference. I’d be interested in any results.

My reply: I have no idea, but my guess is that someone has already looked into any such idea, and maybe there’s even a literature on it. That literature would probably be full of the usual problems of selection on statistical significance, but anyone interested in this idea could check out whatever literature exists and then start from scratch with the raw data. Could be something interesting there if the signal is large enough compared to the variation.