An alternative is to rely heavily on a parametric model like the one you wrote down, which, as Andrew notes, is clearly misspecified.

For a fuller treatment of these issues in spatial RD settings like this one, see https://arxiv.org/abs/1705.01677

And a corollary: patient and physician beware! Reporting, publishing, and paying attention to data on the incidence of side effects is important.

I accept that the sentence was much too loose. Thank you for pointing this out. Cross-over and ‘n of 1’ trials also need randomisation, but simply of the order of intervention, of course.

When the result of any RCT (cross-over or otherwise) shows efficacy, there will be a reluctance to accept placebo in future, which is when my suggested approach becomes relevant.

Note that in medicine the severity of ‘disease’ is very important. Mild conditions usually resolve spontaneously due to the body’s self-restorative mechanisms, and very severe conditions may be beyond rescue. Probability curves of outcomes conditional on baseline measurements tend to be sigmoid, being flat in the very mild and very severe regions. The best therapeutic opportunity, with the largest differences between treatment and control, tends to lie around the steep segment (see https://onlinelibrary.wiley.com/doi/abs/10.1111/jep.12981). The steeper the middle segment of the curve, the better the ‘diagnostic’ test used to select patients for the treatment. The greater the distance between the treatment and control curves, the more effective the treatment (modelling with a constant odds ratio).

In order to estimate such probabilities of outcome, the outcome values are dichotomised and the distribution of baseline results is estimated in those with and without the ‘dichotomised outcome’. This is done for the treatment and control groups to provide pairs of sigmoid curves. The cut-off for dichotomising the outcomes can also be varied to create ‘families’ of curves.

This differs from the usual approach of using RCTs to assess mean outcome values, with effect size expressed as the distribution of possible mean differences between the outcomes on treatment and control.
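For what it’s worth, the procedure described above can be sketched in a few lines of Python. Everything here is invented for illustration (the 0–10 severity scale, the logistic outcome model, the log-odds treatment effect of −1); it only shows the mechanics of dichotomising an outcome and estimating a sigmoid curve per arm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Illustrative assumptions (not from the comment): a 0-10 baseline
# severity score, a logistic model for the 'bad' dichotomised outcome,
# and a constant odds-ratio treatment benefit (log-OR = -1).
severity = rng.uniform(0, 10, n)
treated = rng.binomial(1, 0.5, n)
p_bad = 1 / (1 + np.exp(-(-5 + severity - treated)))
bad_outcome = rng.binomial(1, p_bad)

# Estimate P(bad outcome | baseline bin) separately in each arm;
# each arm traces out a sigmoid curve over baseline severity.
bins = np.digitize(severity, np.arange(1, 10))  # ten unit-wide bins
curves = {}
for arm in (0, 1):
    mask = treated == arm
    curves[arm] = np.array([bad_outcome[mask & (bins == b)].mean()
                            for b in range(10)])
    print("treated" if arm else "control", np.round(curves[arm], 2))
```

The gap between the two printed curves is widest over the steep middle bins, which is the ‘best therapeutic opportunity’ region described above.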

See my recommended wording in the above comment: “A randomised controlled trial (RCT) is accepted as the best way of assessing the _average_ efficacy of a treatment in settings _where it is not feasible to apply both treatments to each person_.”

If I were to replace ‘accepted’ with ‘widely regarded’, would this be sufficient to allow for different opinions about how to assess efficacy, or do you have more fundamental objections?

Huw

I disagree with the very first sentence of your abstract: “A randomised controlled trial (RCT) is accepted as the best way of assessing the efficacy of a treatment.”

https://arxiv.org/abs/1808.09169

prestige (n.)

1650s, “trick,” from French prestige (16c.) “deceit, imposture, illusion” (in Modern French, “illusion, magic, glamour”), from Latin praestigium “delusion, illusion” (see prestigious). Derogatory until 19c.; sense of “dazzling influence” first applied 1815, to Napoleon.

“Prestigious Science Journals Struggle to Reach Even Average Reliability”

https://www.frontiersin.org/articles/10.3389/fnhum.2018.00037/full

Babies expected to be poor, ill, or unwanted are most likely to be aborted. So abortion lowers infant mortality, and increases life expectancy. Contraceptive use has the same effect. So does anything that will increase miscarriage rates.

So my guess is that the variance we see is mostly noise. Take that, and fit 6 or 7 parameters to 25 or so data points…

> ages = runif(10e6, 0, 90)
> assignments = rbinom(10e6, 1, prob = .5)
> mean(ages[assignments == 1])
[1] 45.00768
> mean(ages[assignments == 0])
[1] 45.00057
> sd(ages[assignments == 1])
[1] 25.98991
> sd(ages[assignments == 0])
[1] 25.98389


Everything is identical to 4 significant figures. That’d be true for virtually ANYTHING you measure… mean height of girls age 9 to 13, number of people with toe fetishes, whatever…

It’s one thing if you have like 100 or 1000 people and you say that RNG assignment only “approximately” balances things… but with a group this size everything would be balanced, full stop.

You write:

Sure, because if they had all that, then “nothing changes rapidly about these people (that can affect their longevity) except the air they breathe, and all the other things we measured and controlled for because we had experimental control and we randomly assigned them” would be true, right?

I mean that’s what a controlled experiment with randomized assignment and a large sample size ensures is true…

No. The controlled experiment with randomized assignment and large sample size assures that there will be approximate pre-treatment balance between exposed and control groups. At that point, it’s ok if all sorts of things change rapidly about these people.

If you spend a career’s worth of effort, as George Marsaglia did, designing and validating random number generators, then actually run one of these validated generators to assign 30 million people to live on one or the other side of the river… you guarantee, through the action of a validated random number generator, that the average of any quantity you like will be within epsilon on the two sides of the river…

Contrast that with “I don’t know anything about these two groups of people and therefore I treat them as if they were randomly drawn from the same population”. There is nothing about the action of “treating them as if random” that actually makes it true… There is everything about running that RNG and forcing people to move that physically makes the assumption true.

I mean that’s what a controlled experiment with randomized assignment and a large sample size ensures is true…

I think your argument is a bit too strong here. After all, if they really had a controlled experiment with randomized assignment and a large sample size and no interference between units and a plausible measure of life expectancy, then I’d be inclined to believe the result.

One assumption is logically equivalent to the other. No one really thought it out…

In some ways this is Jaynes’s mind projection fallacy: “because I don’t know anything that’s dramatically different… therefore nothing is dramatically different.”

I think that’s a bit too cynical. People care. They just don’t know where to focus their attention. They end up focusing on heteroscedasticity or discreteness in test statistics or all sorts of peripheral things, but they don’t look hard at identification strategies or statistical significance because they’ve been taught that these are rock-solid bastions of rigor.

Or we could generalize to most of science:

“because mostly no one cares if the method assumptions are met or not, mostly no one bothers to assess them at all, except possibly a little lip service to perfunctorily conclude that all assumptions are met”

I don’t think the assumption was “Chinese people are all the same.” I think the assumption was “Regression discontinuity gives causal identification and statistical significance implies that you can treat a data-based claim as representing a larger truth.”

The authors and journal editors are wrong on both counts, but it’s hard to blame them, at least when the paper was written, given that this is how they were taught. I blame the authors more, years later, for not accepting the problem now that it’s been pointed out to them. Then again, Satoshi Kanazawa has never accepted that his sex-ratio statistics are essentially pure noise, etc etc etc.

So, just to clarify, Anoneuoid: you wouldn’t recommend doing anything Andrew lays out in his book with Jennifer? Because I believe most of that material is not derived from first principles like your cell growth example. In social science you aren’t going to get anywhere without making some assumptions. Are you against Bayesian modelling in general, insofar as it involves interpreting the coefficients in any causal way? Because certainly nearly every Bayesian model will have assumptions that we don’t “all agree are true”. Again, it seems like your standard will basically never be met.

I haven’t read it, but if they are attempting to interpret the coefficients of arbitrary models then that is a waste of time. As I said, such models can still be used to make predictions. ML is an extreme example of this.

Anyone with experience in ML knows how much the coefficients (or other measures like feature importance) can change by adding or dropping features.

Now, like I said, I don’t know what is in that book, but you can see the basic idea mentioned on the blog here:

https://statmodeling.stat.columbia.edu/2017/01/04/30805/

Also, it is pretty much just a logical conclusion if you accept the multiverse and garden of forking paths concepts as shown in this paper:

https://statmodeling.stat.columbia.edu/2019/08/01/the-garden-of-forking-paths/

In that paper they discover there are hundreds of millions of different linear model specifications that could be deemed plausible, with coefficients of interest ranging from positive to negative. Later on they say the correct model is probably not even linear anyway…

So if he does it in the book, it is inconsistent with what I read on this blog.

As for “it is too hard so your standard will never be met”, I was told the same thing regarding biomed and can verify that is completely false. Here are some good example papers:

https://statmodeling.stat.columbia.edu/2017/07/20/nobel-prize-winning-economist-become-victim-bog-standard-selection-bias/#comment-530272

The problem is more that the data being collected isn’t the right type to learn anything (instead it is meant to check whether two groups are different), and the researchers have no idea how to formally derive predictions from a set of assumptions so they have something to test. They often are not even trained in the basic tools needed to study dynamic systems, such as running simulations and calculus. I certainly needed to teach myself.

It feels like the only reason this got published is because “Chinese people are all basically the same except for which side of the river they live on” was basically an assumption that everyone involved, authors and reviewers, thought was ok.

Seriously.

Regarding “high standards”: I think it’s fine to present inferences that depend strongly on unverified assumptions. It’s just important to make clear what these assumptions are. The problem comes when there are strong assumptions that are not understood by the authors, reviewers, and promoters of a paper.

This is just an unreasonably high standard, Anoneuoid.

Well that is the standard for doing science. And it only seems high to people who have been getting away with BS.

Read the literature on your topic before the 1940s or so (whenever NHST was adopted) to see what to do.

There is nothing vacuous about doing science instead of wasting your time trying to interpret arbitrary numbers. It is on the level of astrology.

Andrew’s own research on demographic corrections to death rates was good stuff: https://statmodeling.stat.columbia.edu/2017/07/11/criticism-economists-journalists-jumping-conclusions-based-mortality-trends/

I don’t sit around reading the social science literature looking for good examples. If you want to discuss examples of what you think of as good social sciences, I suggest you send examples to Andrew.

I.e., if I assume cells always undergo binary division at a certain rate, I may find that the number of cells after time t is:

N(t) = N_0*2^(r*t)

All the parameters have well defined meaning that could be checked in other ways (which, along with the N(t) vs t curve, tells us how good our assumptions are collectively). Not the case for an arbitrary regression coefficient.
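For illustration, that model in runnable form (the function name and default values here are mine, not from the comment):

```python
# A sketch of the comment's model: N(t) = N_0 * 2^(r*t), where N_0 is
# the starting cell count and r is the division rate (doublings per
# unit time). Both parameters have meanings checkable in other ways.
def n_cells(t, n0=100, r=1.0):
    return n0 * 2 ** (r * t)

# Halving r should exactly double the time to reach any given size:
print([n_cells(t) for t in range(4)])   # doubles each step
print(n_cells(2, r=0.5) == n_cells(1))  # True
```

Compare that with a regression coefficient: there is no independent experiment you can run to check its “value” the way you can check a division rate against a microscope count.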

Anyone can easily prove it to themselves by adding/dropping variables or interactions and examining the results. Unless your model is correctly specified, the coefficients are arbitrary.
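That claim is easy to check with a simulation. Here is a minimal sketch in Python with an invented data-generating process (two correlated predictors, true coefficients 1.0 and 2.0):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Invented example: x2 is correlated with x1, and both affect y.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

def ols(y, *xs):
    """Least-squares fit with intercept; returns coefficients on xs."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

print(ols(y, x1, x2)[0])  # ~1.0: the true coefficient on x1
print(ols(y, x1)[0])      # ~2.6: x1 absorbs the omitted x2 (1.0 + 2.0*0.8)
```

Drop x2 and the “effect of x1” jumps from about 1.0 to about 2.6: the coefficient’s meaning depends entirely on which other variables are in the model.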

If I do a regression discontinuity on “distance north of the 10 freeway” I will find out how the discontinuity in policy created by building the 10 freeway in the 1930s caused the difference between South Central LA and Glendale…

This is something I’ve thought a lot about. It seems that a lot of this sloppy science is driven by a push for rigor. Same with all those p-values: they’re supposed to represent rigorous Popperian reasoning.

Relatedly, econometricians are trained not to trust probability models. They want things to be nonparametric. When they fit the 5th-degree polynomial, it’s not because they think this is a good model, it’s because they think they’re following a robust procedure with good statistical properties.

And they’re trained not to trust simple observational studies. They demand causal identification.

The result is sometimes a careful vetting of assumptions and models, but other times it’s the “I got causal identification and I got statistical significance and I’m outta here” mentality. It’s sad: the goal of rigor leading to anti-rigor in practice.

Along with all that you have the social incentives: publication in top journals, grants, awards, professorships, etc. These people mostly didn’t get where they were by admitting they’ve ever been wrong.
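The 5th-degree-polynomial point can be seen in a toy simulation (mine, not from any paper): generate a smooth outcome with no true discontinuity, then compare the “jump” at a cutoff estimated by linear versus 5th-degree polynomial fits on each side.

```python
import numpy as np

rng = np.random.default_rng(2)

def jump_at_zero(x, y, degree):
    # Separate polynomial fits on each side of the cutoff; the RD
    # "effect" is the gap between the two fitted values at x = 0.
    left = np.polyfit(x[x < 0], y[x < 0], degree)
    right = np.polyfit(x[x >= 0], y[x >= 0], degree)
    return np.polyval(right, 0.0) - np.polyval(left, 0.0)

# Monte Carlo: smooth outcome, NO true discontinuity at x = 0.
jumps = {1: [], 5: []}
for _ in range(200):
    x = rng.uniform(-1, 1, 300)
    y = np.sin(2 * x) + rng.normal(scale=0.5, size=300)
    for d in jumps:
        jumps[d].append(jump_at_zero(x, y, d))

for d in jumps:
    print(d, round(float(np.std(jumps[d])), 3))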

To me, this kind of analysis just screams “we have the answer because Science(tm)”, and it doesn’t feel like the answer actually matters; it’s the fact of having done the right process. As far as the answer goes, it’s kind of “heads I win, tails you lose”.

I mean, if they find “on average 5 years lost to pollution in the river region”, then they can say “because Science(tm)” and claim that everyone may have known it all along, but now they have a specific number that no one else could have calculated! Heroes!

And if they find “coal pollution doesn’t cause reduced life expectancy”, then they can say “because Science(tm)”: this may be unexpected, but that’s what happens when you make a discovery, and now we know we’re free to burn dirty coal to power human development!

This plays into your narrative about Econ sometimes showing how people are irrational for doing something that seems “obviously” good, and sometimes rational for doing something “obviously” bad.

All the good scientists I know are constantly asking questions like “what else might have caused this? what else should we control for? What additional data could we collect that would answer those questions? What kind of modifications to our model would let us account for the effect of X…”. Those are literally like the content of group meetings.

Anyone who asks those kinds of questions quickly discovers “well, we’re never going to be able to actually answer this question here” and doesn’t publish… Or if they care enough about the problem they spend multiple years building up the evidentiary base and making sure they get as close to the right answer as possible…. Suckers… spending 20 years of their life studying the effects of pollution, they could have just run an RD regression and had a paper in a couple of months!

:-(

So, if you’re right that this is ignorance rather than rent-seeking, where does that point the finger? I mean, how did they get so ignorant?

During the 1950–1980 period of central planning, the Chinese government established free winter heating of homes and offices via the provision of free coal for fuel boilers as a basic right. The combustion of coal in boilers is associated with the release of air pollutants, and in particular emission of particulate matter that can be extremely harmful to human health (4, 5). Due to budgetary limitations, however, this right was only extended to areas located in North China, which is defined by the line formed by the Huai River and Qinling Mountain range (Fig. 1). […]

This paper’s RD design exploits the discrete increase in the availability of free indoor heating as one crosses the Huai River line (with no availability to the south and, in principle, complete availability north of the line). Specifically, we separately test whether the Huai River policy caused a discontinuous change in TSPs at the river and a discontinuous change in life expectancy. The respective necessary assumptions are that any unobserved determinants of TSPs or mortality change smoothly as they cross the river. If the relevant assumption is valid, adjustment for a sufficiently flexible polynomial in distance from the river will remove all potential sources of bias and allow for causal inference.

https://www.pnas.org/content/early/2013/07/03/1300018110.abstract

I mean, I don’t even understand why they would expect a discontinuity instead of the pollution spreading out due to the wind. It makes no sense to me…

And I see they assume no other important factor is discontinuous at the river. That is a highly questionable assumption since the Chinese government decided to use that as a geographical boundary. You really want us to believe that happened without political considerations?

Also, even if that assumption were correct, how in the world does this method “remove all potential sources of bias”? Maybe driving is more common as you go north for some reason, or the equipment used to measure pollution (TSP) is more sensitive for some reason because it was deployed in a north-south fashion, etc., etc.

I don’t think this data is capable of answering their questions.

I would just like for there to be a little less cheap talk from you and Daniel. You both criticize everything that comes out of social science, yet you have little quality research in the area to show for yourselves. Just saying. It gets tiring.

Do you ever pause to think that maybe, just MAYBE, not every single academic is an imbecile? Have a little humility, Jesus Christ.

I was definitely taught to act like an imbecile* by academics.

* Test a strawman hypothesis, interpret arbitrary regression coefficients, etc
