Megan said,

“The ease and availability of statistical models for means, as well as a general expectation of their use, has created a culture where we don’t even expect justification of the assumption that means are a reasonable parameter of interest in the first place. In my experience, there is a deep-seated belief among scientists that means/averages are inherently meaningful and useful and that there is no need to justify the choice of models based on them.”

+1 This has become “That’s the way we’ve always done it”, “This is what’s standard in the field” — don’t think; just follow the standard procedures.

]]>Consistently, RBCs from COVID-19 patients had increased oxidized glutathione (GSSG), but not reduced glutathione (GSH);… In the light of this apparent oxidant stress-related signature, we hypothesized that RBCs from COVID-19 patients may suffer from impaired antioxidant enzyme machinery, perhaps triggered by degradation of redox enzymes in the context of ablated de novo protein synthesis capacity in mature RBCs

https://www.medrxiv.org/content/10.1101/2020.06.29.20142703v1

A great tragedy is that not a single report of vitamin C levels in a COVID-19 patient has been published. They will be deficient, and will benefit from correcting that deficiency.

But if you just give some random amount for a few days at a random point in the disease process, instead of giving enough of it before massive oxidative damage is done and continuing until the patient is healthy again, then it will probably look like vitamin C doesn’t do anything.

]]>I think we all agree that more information is preferable to less information. You can go all the way there and say that the number needed to treat is one when you treat only those who would die without treatment and survive when treated. If that doctor knew who they were, he would probably design the trial differently. But he doesn’t, so it’s not clear to me what the problem with his approach is.

]]>1) Metal already painted, so additional paint does nothing

2) Metal already rusted, so paint was applied too late

3) Pieces too large for the amount of paint applied, so part of it is uncovered and rusts anyway

4) Metal that gets dinged up a lot, so the paint chips off and it will rust unless you repaint it often.

Just taking an average including all these scenarios, or, even worse, only studying scenarios like that, is going to lead to the conclusion “paint doesn’t work”.

That basically sums up 60 years of vitamin C research; we are seeing it with HCQ for COVID-19, etc.

The average person does not exist, so why is almost all medical research directed at treating them?

]]>And not just generation. Whether or not it is made in large enough quantities.

]]>@Adam: thanks, I will have to read up on that.

]]>@Navigator: there may be all kinds of random processes in how the infection proceeds. Does your body start making a particularly effective antibody? What happens to your cytokine levels at what times? Do you get a bacterial co-infection? Et cetera, et cetera.

]]>> There is no basis in science for a stochastic action here.

I disagree. It is easy to think of such bases. To name one: whether or not your immune system hits on a particularly effective antibody. The generation of such is effectively random.

]]>Number needed to treat is indeed a common metric, and one with some advantages in terms of interpretation. But it suffers from exactly the same issues as average treatment effects. The number needed to treat (as you point out with the words “in expectation”) is only an average. It may take 10 on average, but that might mean 2 on average in a particular subgroup and 25 in another.
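To make the subgroup point concrete, here is a hypothetical split using the 2 and 25 subgroup NNTs mentioned above; the 13%/87% population shares are my own choice, picked so that the overall NNT works out to about 10:

```python
# Overall NNT can mask very different subgroup NNTs.
# Hypothetical numbers: overall NNT ~ 10, one subgroup with NNT 2,
# another with NNT 25 (shares chosen to make the overall figure ~10).
arr_a, share_a = 1 / 2, 0.13    # responsive subgroup: ARR = 0.50
arr_b, share_b = 1 / 25, 0.87   # unresponsive subgroup: ARR = 0.04

# The overall ARR is the population-weighted average of subgroup ARRs;
# the overall NNT is its reciprocal (NNTs themselves don't average linearly).
arr_overall = share_a * arr_a + share_b * arr_b
print(f"overall NNT ~ {1 / arr_overall:.1f}")  # about 10
```

Note that the overall NNT is the reciprocal of the averaged risk reductions, not the average of the subgroup NNTs, which is exactly why the headline number can hide the heterogeneity.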

]]>What innumeracy? I don’t know where you see it in the story, if that’s what you’re referring to.

I don’t understand either how “25% of the treated patients see a 100% effect (surviving instead of dying)” is more “real world” than “survival increases from 35% to 60% when patients are treated”.

Surely the doctor understands that the desired outcome, “keep alive a patient who would otherwise have died”, appears in some fraction of the treated population.

In medicine this is often put in “number needed to treat” terms: a 25 percentage point increase in survival is equivalent to saying that four patients have to be treated to prevent one death (in expectation).
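That arithmetic can be sketched in a couple of lines (the NNT is the reciprocal of the absolute risk reduction; the 35%/60% survival figures are taken from the example discussed in this thread):

```python
# Number needed to treat (NNT) = 1 / absolute risk reduction (ARR).
survival_control = 0.35   # survival without treatment
survival_treated = 0.60   # survival with treatment (25 pp higher)

arr = survival_treated - survival_control   # 0.25
nnt = 1 / arr                               # 4 patients treated per death prevented
print(f"ARR = {arr:.2f}, NNT = {nnt:.0f}")  # ARR = 0.25, NNT = 4
```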

]]>I think the future for estimating heterogeneous treatment effects is to focus on observational datasets. That’s the only place we can easily find effective samples sizes of 10,000,000 or more.

]]>Jonathan:

In that case, there’s a separate treatment effect for each event, not just one per person.

]]>I think you have the shoe on the wrong foot: if the mechanisms and outcomes of a process are entirely deterministic but literally unknowable, and the central limit theorem means the distributions of outcomes will be identical to those of a stochastic process, then parsing whether it is, in fact, a stochastic process is philosophical, if not metaphysical. After all, a blind draw of a ball from an urn is entirely deterministic but also perfectly modeled as random.

]]>We would (or should) never claim that a mathematically describable distribution like N(0,1) is sufficiently accurate in describing most real-life phenomena that we can meaningfully interpret our estimate of the eighth moment. Nor would we claim that our effect size estimate is sufficiently precise that we can meaningfully interpret the value of its eighth significant digit. Isn’t it equally absurd to claim the inverse is true, that we can justify our model of a joint distribution of dozens of phenomena we barely understand, so long as we only use the first and second moments? Or that we can rely upon an effect size estimate that summarizes outcomes that depend on a dizzying array of factors not in the model, so long as we only interpret the first and second significant digits?

If the answer is no, then the standardized effect estimate is not an estimate of a drug’s true (or even probable) effect on the virus in the human body. It’s a socially-constructed quantity, the mathematical basis of which provides a means of forming consensus among scientists about when we should conclude that a drug is worth prescribing to the public. One that feels more objective than our choice of an acceptable Type I error rate but is really just as arbitrary.

It’s not just that all models are wrong: all models of systems as complex as the human body are “not even wrong.”

Now that’s a nihilistic alternative. And while I don’t buy it in practice–at a minimum, the law of large numbers ensures that the top-line results hold up most of the time for most of the people, which is better than waiting until we understand the mechanisms involved–it must be incontrovertibly true for SOME level of complexity, no?

]]>Yes, exactly. But putting the statistically equivalent threshold in the starkest “real world” terms may serve to break through the physician’s innumeracy.

]]>What do you do with a +1 -1 0 causal effect when there are multiple observations per subject? Joe got sick five times, and the drug cured him twice and he stayed sick three times.

]]>I’m really not understanding your objections either. People do things for a “percentage benefit” all the time. Seatbelts and bike helmets don’t stop you from dying in crashes, they make it less likely. If my doctor says to get my cholesterol down, it might not help me avoid heart trouble (and maybe I would never have had any either way), but it may lower my probability of it. (Not sure I believe this, as an aside.) People like reliable cars even though a “less reliable model” might never break down. The idea of raising my survival odds by a certain percentage seems perfectly lucid to me, entirely apart from whether I would think it was worth the effort and expense.

]]>Carlos,

The 10% in your example has been derived over many attempts on a population, not an individual.

However, one patient at a time doesn’t really benefit from that 10% (I mean it’s comforting to hear, sounds official, but the outcome is binary; it’s good for comparison to other numbers only, with 20% being worse).

]]>Jonathan:

In Rubin’s potential outcome framework, the causal effect for each person exists and is simply y^treatment – y^control, which indeed is either +1, -1, or 0 in this example. It’s the same idea as saying that the length of your life exists; we just don’t know the number yet. But you make a good point that it can make sense in modeling to define some sort of intermediate quantity that includes some but not all sources of variation. I wonder if Imbens and Rubin discuss this in their book on causal inference.
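A toy simulation of this framing (illustrative numbers only; the survival rates roughly match the 35%/60% example elsewhere in the thread):

```python
import random

random.seed(1)

# Rubin's potential outcomes: each person has two fixed values,
# y0 (outcome if untreated) and y1 (outcome if treated); the individual
# causal effect is y1 - y0, but only one outcome is ever observed.
n = 1000
effects = []
for _ in range(n):
    y0 = int(random.random() < 0.35)          # survives even untreated
    y1 = int(y0 or random.random() < 0.385)   # treatment saves some of the rest
    effects.append(y1 - y0)                   # here 0 or +1 (no one is harmed)

ate = sum(effects) / n
print(f"average treatment effect ~ {ate:.2f}")  # near 0.25
```

Each individual effect here is 0 or +1 (the sketch assumes the treatment harms no one); the average treatment effect of about 0.25 is just the fraction of people whose individual effect is +1.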

]]>I was being facetious about quantum effects, but what about a drug which interacts with, say, your blood glucose level, which might vary by a factor of two in a normal person depending on when their last meal was? Person A would have been cured if he had taken the drug on an empty stomach but is not on a full stomach. OK, maybe you monitored that particular timing, but what about some other neurotransmitter that you don’t even know about whose levels routinely vary? There is science to who gets cured, but if you’re not measuring the relevant variable, it looks stochastic.

]]>Dr: We could change the valve in your heart, but there is a 10% probability that you die during the operation.

Nr: That’s not a helpful figure, doc. I will either die or not, just tell me what it is!

]]>No it doesn’t, but chemicals do not need meaning to interact with each other.

You may have a philosophical question here. Just don’t mistake it for a scientific one.

]]>@Ben

But couldn’t it also mean, “your survival is stochastic, and this drug will increase your personal p(survival) by 0.25”?

How does one survive ‘better’ by 0.25? You either survive or not. There is no ‘increase’ part, as it is a one-off for each patient. Basically, I know what you meant, but it’s really not a helpful figure on an individual basis.

Unfortunately, some people also interpret these sorts of numbers as:

“My symptoms will be 25% less severe”.

]]>This goes back to the Rubin/Neyman causal model. You can do inference by modeling the potential outcomes as fixed (or, equivalently, conditioning on some subset/function of them) even if some stochastic process brings them about.

]]>I think (many of) those who compute and use these sorts of numbers understand the limitations. But the seeming alternatives are 1) nihilism, or 2) sample sizes of 10,000,000 or more for every study.

]]>“The action of the drug” doesn’t mean anything outside of our model.

]]>Isn’t that what he provided? He stated that he wanted to detect a 25 percentage point increase in survival. Say, for the sake of the example, that survival is 35% without treatment; it would be 60% with treatment. You could say that the treatment has a 100% effect (saves people who would have died otherwise) in 25% of the treated population.

Unless I’m missing something, that’s precisely what the 25pp increase in survival means: an additional 25% of the treated population survives, i.e. 25% of them see the 100% treatment effect of their death being prevented.

(The preceding discussion assumes the treatment doesn’t kill anyone. That would complicate slightly the argument but not substantially, I think.)

]]>There is no basis in science for a stochastic action here. The action of the drug occurs without reference to our model of its action. If it acts differently in two situations, we infer there is a difference in the situations, not that the drug plays bingo.

The stars are indifferent to astronomy.

]]>Perhaps a more effective framing would be to ask the doctor for what fraction of the population a 100% treatment effect would be worth detecting.

]]>My take was that the sentence loosely translated to “historically, the standard for clinical trials is to show a clear ATE”. I don’t think Andrew was necessarily saying it was good enough scientifically speaking. Given the probable noise and the number of possible factors (many of them latent), obsessing over the difference between a 10% and a 25% effect will provide few conclusive answers (though it is important to mine correlations as potential indicators). Hence, I can understand why significance in the ATE is a “good enough” standard to move forward in a clinical setting while remaining unsatisfactory in a scientific one.

]]>Is that an example of “making optimistic assumptions in order to claim high power”? He could have said “yes” and claimed high power anyway. Maybe then the example would be about claiming high power for a 10pp effect while not being interested in a 2pp effect.

]]>Although rereading the post, I guess that may be the point? The “effect” is big or it isn’t. However, it’s unknown and unknowable (it depends on a non-existing counterfactual). Once we start estimating things, one can be “a little bit pregnant”.

I don’t think that the fact that treatments won’t have the same effect in all patients is lost on anyone doing clinical trials, though.

]]>Tell that to Schrödinger’s cat.

]]>I agree, it seems that some variability is not being embraced or some uncertainty is not being accepted. :-)

]]>Doesn’t this ultimately depend on how deterministic of a process you think the infection is in a particular individual? Saying the “the treatment effect on any given person is +1, -1, or 0” seems to suggest that if you reran the tape, so to speak, everyone would have the same outcome. I’m not saying that’s wrong, just saying it seems to be an assumption here. But is it the only possible assumption?

In other words, “I am giving you a drug that increases your probability of survival by 25%” could mean “there’s a 25% chance that you are in the group of patients who would die without the drug and survive with it”. But couldn’t it also mean “your survival is stochastic, and this drug will increase your personal p(survival) by 0.25”?

]]>This makes sense.

> There is no underlying number representing the effect of the drug.

This makes sense (though I like the previous sentence better)

> Ideally one would like to know what sorts of patients the treatment would help, but in a clinical trial it is enough to show that there is some clear average effect.

What I don’t get is this. It’s like giving up the thought process that got us this far.

There’s lots of variation in sick patients -> we should recognize that or we’ll mess up with our ATEs -> but it’s enough to show a clear ATE

What do you mean by this? Is it just that we shouldn’t obsess over the difference in a 10% and a 25% effect if we suspect there’s some sort of population sensitivity? Or is it something else?
