For non-parametric identification, the typical assumption is continuity: taking limits from both sides of the cutoff, the differences in potential outcomes vanish. I would say this is the standard assumption in recent empirical economics.

An alternative is to rely heavily on a parametric model like the one you wrote down, which, as Andrew notes, is clearly misspecified.

For an elucidation of all this in spatial RD settings like this one, see https://arxiv.org/abs/1705.01677

+1

And a corollary: patient and physician beware! Reporting, publishing, and paying attention to data on the incidence of side effects is important.

Sorry! The first sentence of the above second paragraph should have been ‘The result of any CT (cross-over or randomised) …’

Andrew

I accept that the sentence was much too loose. Thank you for pointing this out. Cross-over and ‘n of 1’ trials also need randomisation, but simply in terms of the order of intervention, of course.

When the result of any RCT (cross-over or alternative) shows efficacy, there will be a reluctance to accept placebo in future, which is when my suggested approach becomes relevant.

Note that in medicine the severity of ‘disease’ is very important. Mild conditions usually resolve spontaneously due to the body’s self-restorative mechanisms, and very severe conditions may be beyond rescue. Probability curves of outcomes conditional on baseline measurements tend to be sigmoid, being flat in the very mild and very severe regions. The best therapeutic opportunity, with larger differences between treatment and control, tends to be around the steep segment (see https://onlinelibrary.wiley.com/doi/abs/10.1111/jep.12981). The steeper the middle segment of the curve, the better the ‘diagnostic’ test used to select patients for the treatment. The greater the distance between the treatment and control curves, the more effective the treatment (by modelling with a constant odds ratio).

In order to estimate such probabilities of outcome, the outcome values are dichotomised and the distribution of baseline results estimated in those with and without the ‘dichotomised outcome’. This is done for the treatment and control groups to provide pairs of sigmoid curves. The cut-off for dichotomising the outcomes can also be varied to create ‘families’ of curves.
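The constant-odds-ratio relationship between the paired curves can be sketched in a few lines of code. This is only an illustration with hypothetical numbers (the sigmoid parameters and the odds ratio of 3 are invented, not from the cited paper): the treatment curve is generated from the control curve by multiplying the odds at every baseline value by a fixed odds ratio, and the treatment-control gap is largest on the steep middle segment.

```python
import math

def control_prob(x, a=-4.0, b=1.0):
    """Hypothetical sigmoid: probability of the dichotomised outcome
    in the control group, given a baseline measurement x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

def treatment_prob(x, odds_ratio=3.0):
    """Treatment-group curve under a constant odds ratio:
    odds_treatment(x) = odds_ratio * odds_control(x)."""
    p = control_prob(x)
    odds = odds_ratio * p / (1.0 - p)
    return odds / (1.0 + odds)

# The treatment-control gap peaks on the steep middle segment of the
# control curve and shrinks in the flat (very mild / very severe) tails.
gaps = {x: treatment_prob(x) - control_prob(x) for x in [0, 4, 8]}
```

At the midpoint of this hypothetical curve (x = 4, control probability 0.5) the gap is 0.25, while in both tails it is under 0.04, matching the point about where the therapeutic opportunity lies.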

This is a different approach to using RCTs to assess mean outcome values and effect size in the form of the distribution of possible mean differences between the outcomes on treatment and control.

This is, in my experience, a key point missed by a lot of trial investigators and interpreters, personalized medicine notwithstanding. If there is a lot of treatment-effect heterogeneity, which is often the case, clinical recommendations based on an estimate of average efficacy will correspondingly often be bad predictions of clinical effect.

Huw:

See my recommended wording in the above comment: “A randomised controlled trial (RCT) is accepted as the best way of assessing the _average_ efficacy of a treatment in settings _where it is not feasible to apply both treatments to each person_.”

Don’t listen to him, Huw – he’s an extremist.

A randomised controlled trial (RCT) is accepted as the best way of assessing the _average_ efficacy of a treatment in settings where it is not feasible to apply both treatments to each person.

If I were to replace ‘accepted’ with ‘widely regarded’ would this be sufficient to allow for different opinions about how to assess efficacy or do you have more fundamental objections?

Huw

Not the best start! What should I have said?

Huw:

I disagree with the very first sentence of your abstract: “A randomised controlled trial (RCT) is accepted as the best way of assessing the efficacy of a treatment.”

https://arxiv.org/abs/1808.09169

prestige (n.)

1650s, “trick,” from French prestige (16c.) “deceit, imposture, illusion” (in Modern French, “illusion, magic, glamour”), from Latin praestigium “delusion, illusion” (see prestigious). Derogatory until 19c.; sense of “dazzling influence” first applied 1815, to Napoleon.

“Prestigious Science Journals Struggle to Reach Even Average Reliability”

https://www.frontiersin.org/articles/10.3389/fnhum.2018.00037/full

It would actually not be unreasonable to assume there are people moving between the cities close to the river if the cutoff point (i.e. using the river as the cutoff) was determined based on factors such as political power or economic importance (cities could differ in these characteristics). I haven’t read the paper so I can’t comment on that.

Precisely. Abortions are not counted.

Babies expected to be poor, ill, or unwanted are most likely to be aborted. So abortion lowers infant mortality, and increases life expectancy. Contraceptive use has the same effect. So does anything that will increase miscarriage rates.

Thanks. I guess things add up – crime, drug use, lack of access to care, moving away as soon as you can afford it… Still, 30 is a lot; I wonder if that gradient replicates in other US cities. Small-area analyses can be fraught.

The biggest modifiable factor affecting life expectancy at birth is probably the abortion rate.

This recent article in The Economist (https://www.economist.com/united-states/2019/10/10/a-ride-along-chicagos-red-line) claims that life expectancy varies by 30 years (!) from one end of a Chicago metro line to the other.

So my guess is that the variance we see is mostly noise. Take that, and fit 6 or 7 parameters to 25 or so data points…

dang it, blog ate the code… here using the = instead of left arrow assignment

> ages = runif(10e6,0,90)
> assignments = rbinom(10e6,1,p=.5)
> mean(ages[assignments==1])
[1] 45.00768
> mean(ages[assignments==0])
[1] 45.00057
> sd(ages[assignments==1])
[1] 25.98991
> sd(ages[assignments==0])
[1] 25.98389

like just imagine ages are uniformly between 0 and 90 years, there are 10 million people… we split them randomly in two groups… and calculate the mean and sd of each sub-group:

> ages <- runif(10e6,0,90)
> assignments <- rbinom(10e6,1,p=.5)
> mean(ages[assignments==1])
[1] 45.00768
> mean(ages[assignments==0])
[1] 45.00057
> sd(ages[assignments==1])
[1] 25.98991
> sd(ages[assignments==0])
[1] 25.98389

everything is identical to 4 sig figs. That’d be true for virtually ANYTHING you measure…. mean height of girls age 9 to 13, number of people with toe fetishes, whatever…

Obviously there would be all sorts of person-to-person variation, which isn’t what I meant. I don’t know how many people live in this region, but I’m going to guess it’s at least 10 million. If you took a random number generator and assigned them to be on one side of the river vs the other, any measure you like, the average would likely be the same to 3 or 4 significant figures.

It’s one thing if you have like 100 or 1000 people and you say that RNG assignment only “approximately” balances things… but with this size group everything would be balanced full stop.

Daniel:

You write:

Sure because if they had all that “nothing changes rapidly about these people (that can affect their longevity) except the air they breathe, and all the other things we measured and controlled for because we had experimental control and we randomly assigned them” would be true right?

I mean that’s what a controlled experiment with randomized assignment and a large sample size ensures is true…

No. The controlled experiment with randomized assignment and large sample size assures that there will be approximate pre-treatment balance between exposed and control groups. At that point, it’s ok if all sorts of things change rapidly about these people.

In many ways I think this is a confusion of Bayesian vs Frequentist notions of random.

If you spend a lot of effort, like George Marsaglia’s career, designing and validating random number generators, and then actually run one of these validated random number generators to assign 30 million people to live on one or the other side of the river, you will guarantee, through the actions of a validated random number generator, that the average of any quantity you like will be within epsilon on either side of the river…

Contrast that with “I don’t know anything about these two groups of people and therefore I treat them as if they were randomly drawn from the same population”. There is nothing about the action of “treating them as if random” that actually makes it true… There is everything about running that RNG and forcing people to move that physically makes the assumption true.

Sure because if they had all that “nothing changes rapidly about these people (that can affect their longevity) except the air they breathe, and all the other things we measured and controlled for because we had experimental control and we randomly assigned them” would be true right?

I mean that’s what a controlled experiment with randomized assignment and a large sample size ensures is true…

Daniel:

I think your argument is a bit too strong here. After all, if they really had a controlled experiment with randomized assignment and a large sample size and no interference between units and a plausible measure of life expectancy, then I’d be inclined to believe the result.

But “causal identification” holds if and only if “nothing changes rapidly about these people (that can affect their longevity) except the air they breathe.”

One assumption is logically equivalent to the other. No one really thought it out…

In some ways this is Jaynes’s mind projection fallacy: “because I don’t know anything that’s dramatically different… therefore nothing is dramatically different.”

Jim,

I think that’s a bit too cynical. People care. They just don’t know where to focus their attention. They end up focusing on heteroscedasticity or discreteness in test statistics or all sorts of peripheral things, but they don’t look hard at identification strategies or statistical significance because they’ve been taught that these are rock-solid bastions of rigor.

“because RD is an identification strategy, there was no need to assess its strong assumptions.”

Or we could generalize to most of science:

“because mostly no one cares if the method assumptions are met or not, mostly no one bothers to assess them at all, except possibly a little lip service to perfunctorily conclude that all assumptions are met”

Daniel:

I don’t think the assumption was “Chinese people are all the same.” I think the assumption was “Regression discontinuity gives causal identification and statistical significance implies that you can treat a data-based claim as representing a larger truth.”

The authors and journal editors are wrong on both counts, but it’s hard to blame them, at least when the paper was written, given that this is how they were taught. I blame the authors more, years later, for not accepting the problem now that it’s been pointed out to them. Then again, Satoshi Kanazawa has never accepted that his sex-ratio statistics are essentially pure noise, etc etc etc.

So, just to clarify Anoneuoid.. you wouldn’t recommend doing anything Andrew lays out in his book with Jennifer? Because I believe most of that material is not derived from first principles like your cell growth example. In social science you aren’t going to get anywhere without making some assumptions. Are you against Bayesian modelling in general insofar as it involves interpreting the coefficients in any causal way? Because certainly nearly every Bayesian model will have assumptions that we don’t “all agree are true”. Again, it seems like your standard will basically never be met.

I haven’t read it but if they are attempting to interpret the coefficients of arbitrary models then that is a waste of time. As I said, such models can still be used to make predictions. ML is an extreme example of this.

Anyone with experience in ML knows how much the coefficients (or other measures like feature importance) can change by adding/dropping features.

Now, like I said I don’t know what is in that book but you can see the basic idea mentioned on the blog here:

https://statmodeling.stat.columbia.edu/2017/01/04/30805/

Also, it is pretty much just a logical conclusion if you accept the multiverse and garden of forking paths concepts as shown in this paper:

https://statmodeling.stat.columbia.edu/2019/08/01/the-garden-of-forking-paths/

In that paper they discover there are hundreds of millions of different linear model specifications that could be deemed plausible, with coefficients of interest ranging from positive to negative. Later on they say the correct model is probably not even linear anyway…

So if he does it in the book, it is inconsistent with what I read on this blog.

As for “it is too hard so your standard will never be met”, I was told the same thing regarding biomed and can verify that is completely false. Here are some good example papers:

https://statmodeling.stat.columbia.edu/2017/07/20/nobel-prize-winning-economist-become-victim-bog-standard-selection-bias/#comment-530272

The problem is more that the data being collected isn’t the right type to learn anything (instead it is meant to check whether two groups are different), and the researchers have no idea how to formally derive predictions from a set of assumptions so that they have something to test. They are often not even trained in the basic tools needed to study dynamic systems, like running simulations and calculus. I certainly needed to teach myself.

Andrew, I think it’s fine to present some assumptions and their consequences too, but some studies are better than other studies, and this study is no different than the South Central LA vs Glendale study I made up here:

It feels like the only reason this got published is because “Chinese people are all basically the same except for which side of the river they live on” was basically an assumption that everyone involved, authors and reviewers, thought was ok.

Seriously.

So, just to clarify Anoneuoid.. you wouldn’t recommend doing anything Andrew lays out in his book with Jennifer? Because I believe most of that material is not derived from first principles like your cell growth example. In social science you aren’t going to get anywhere without making some assumptions. Are you against Bayesian modelling in general insofar as it involves interpreting the coefficients in any causal way? Because certainly nearly every Bayesian model will have assumptions that we don’t “all agree are true”. Again, it seems like your standard will basically never be met.

Mino:

Regarding “high standards”: I think it’s fine to present inferences that depend strongly on unverified assumptions. It’s just important to make clear what these assumptions are. The problem comes when there are strong assumptions that are not understood by the authors, reviewers, and promoters of a paper.

This is just an unreasonably high standard, Anoneuoid.

Well that is the standard for doing science. And it only seems high to people who have been getting away with BS.

Read the literature on your topic before the 1940s or so (whenever NHST was adopted) to see what to do.

There is nothing vacuous about doing science instead of wasting your time trying to interpret arbitrary numbers. It is on the level of astrology.

I don’t criticize “everything that comes out of social science”; it’s unfortunate that this blog mostly discusses bad examples. I did send that study of diet, where they did a very careful job of measurement… I mean, maybe not perfect, but about two orders of magnitude better than what you see in most diet research. That was good stuff, and I’m glad they’re following up with a replication and some other kinds of follow-ups.

Andrew’s own research on demographic corrections to death rates was good stuff: https://statmodeling.stat.columbia.edu/2017/07/11/criticism-economists-journalists-jumping-conclusions-based-mortality-trends/

I don’t sit around reading the social science literature looking for good examples. If you want to discuss examples of what you think of as good social sciences, I suggest you send examples to Andrew.

This is just an unreasonably high standard, Anoneuoid. The estimates from any statistical model will be conditional on the variables that are included in it. Even if none of the omitted variables *should* be in the model, there will be some noise that gets picked up and affects things slightly anyways due to finite samples. This is a remarkably vacuous point, the more I think about it. Which I’ve come to expect from you.

You can use any model for prediction, but it is a waste of time to attempt interpreting the coefficients/parameters unless they were derived from some principles that you are willing to accept as more or less true.

I.e., if I assume cells always undergo binary division at a certain rate, I may conclude that the number of cells after time t is:

N(t) = N_0*2^(r*t)

All the parameters have well-defined meanings that could be checked in other ways (which, along with the N(t) vs t curve, tells us how good our assumptions are collectively). Not the case for an arbitrary regression coefficient.
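A minimal sketch of why mechanistic parameters are checkable (hypothetical numbers, not tied to any real experiment): because N_0 and r mean something physical, the initial cell count and the division rate, r can be recovered from any two points on the curve and compared against an independent measurement of how often the cells divide.

```python
import math

def n_cells(t, n0=100.0, r=0.5):
    """Cells after time t, assuming binary division at rate r
    (divisions per unit time): N(t) = N0 * 2**(r*t)."""
    return n0 * 2.0 ** (r * t)

# Mechanistic parameters are checkable: recover r from any two
# observations on the curve and compare it with an independently
# measured division rate.
t1, t2 = 2.0, 6.0
r_est = (math.log2(n_cells(t2)) - math.log2(n_cells(t1))) / (t2 - t1)
```

An arbitrary regression coefficient offers no analogous external check: there is no separate experiment that measures "the coefficient on x2" directly.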

What’s your point though? This is true of any model — if the model is misspecified then ya, the coefficients won’t map to reality. I don’t see why regression is any different. That doesn’t mean we shouldn’t use it, it just means we need to defend the model we are using based on theory or common sense. I ask again, what is your point good sir?

Yes, essentially it doesn’t stand up to the slightest bit of scrutiny. The only difference between this and something like the standard biomed paper is the methods are easier to understand and the jargon is easier to parse for outsiders.

*mino

Sorry matt, but you seem incapable of understanding that regression coefficients are conditional on what is included in the model. Change the model, change the coefficients. This has been explained to you before, but you choose to argue with a strawman.

Anyone can easily prove it to themselves by adding/dropping variables or interactions and examining the results. Unless your model is correctly specified, the coefficients are arbitrary.
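That exercise can be run in a few lines. This is a hypothetical simulation (the data-generating process and numbers are invented for illustration): two correlated predictors where the outcome depends only on the first. In the full model the second variable's coefficient is near zero; drop the true driver and that same coefficient jumps to strongly positive.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1
y = 2.0 * x1 + rng.normal(size=n)          # y depends only on x1

def ols(X, y):
    """Least-squares coefficients (no intercept; all variables are mean zero)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_full = ols(np.column_stack([x1, x2]), y)  # coefficient on x2 is ~0
b_dropped = ols(x2.reshape(-1, 1), y)       # coefficient on x2 is ~1.6
```

Same variable, same data, very different coefficient, purely because the specification changed. That is the sense in which a coefficient from a misspecified model is arbitrary.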

]]>Maybe people don’t stay in one place for 50 years. Maybe north of the river there are more paint factories adding to ozone smog. Maybe there are systematic racial or regional biases that affect many other political policies, giving favoritism to the north. Maybe the access to free coal makes the north region more attractive to elderly people moving in from rural areas. Maybe a factory pollutes the ground water north of the river. Maybe south of the river there are a lot more textile factories with high levels of dust and fiber pollution on the sewing floors….

If I do a regression discontinuity on “distance north of the 10 freeway” I will find out how the discontinuity in policy created by building the 10 freeway in the 1930’s caused the difference between South Central LA and Glendale…

Daniel:

This is something I’ve thought a lot about. It seems that a lot of this sloppy science is driven by a push for rigor. Same with all those p-values: they’re supposed to represent rigorous Popperian reasoning.

Relatedly, econometricians are trained to not trust probability models. They want things to be nonparametric. When they fit the 5th-degree polynomial, it’s not because they think this is a good model, it’s because they think they’re following a robust procedure with good statistical properties.

And they’re trained not to trust simple observational studies. They demand causal identification.

The result is sometimes a careful vetting of assumptions and models, but other times it’s the “I got causal identification and I got statistical significance and I’m outta here” mentality. It’s sad: the goal of rigor leading to anti-rigor in practice.

Along with all that you have the social incentives: publication in top journals, grants, awards, professorships, etc. These people mostly didn’t get where they were by admitting they’ve ever been wrong.

I guess that’s something. I mean, I’ve been in meetings where a proposed collaborator said something along the lines of “as long as we get to publish in Nature without any competitors to worry about we are on board”… didn’t matter what it was really…

To me, this kind of analysis just screams “we have the answer because Science(tm)” and it doesn’t feel like the answer actually matters, it’s the fact of having done the right process. As far as the answer goes, it’s kind of “heads I win, tails you lose”

I mean, if they find “on average 5 years lost to pollution in river region” then they can say “because Science(tm)” and claim that everyone may have known all along, but now they have a specific number that no-one else could have calculated! Heroes!

And if they find “coal pollution doesn’t cause reduced life expectancy” then they can say “because Science(tm)” this may be unexpected, but that’s what happens when you make a discovery, and now we know we’re free to burn dirty coal to power human development!

This plays into your narrative about Econ sometimes showing how people are irrational for doing something that seems “obviously” good, and sometimes rational for doing something “obviously” bad.

All the good scientists I know are constantly asking questions like “what else might have caused this? what else should we control for? What additional data could we collect that would answer those questions? What kind of modifications to our model would let us account for the effect of X…”. Those are literally like the content of group meetings.

Anyone who asks those kinds of questions quickly discovers “well, we’re never going to be able to actually answer this question here” and doesn’t publish… Or if they care enough about the problem they spend multiple years building up the evidentiary base and making sure they get as close to the right answer as possible…. Suckers… spending 20 years of their life studying the effects of pollution, they could have just run an RD regression and had a paper in a couple of months!

:-(

So, if you’re right that this is ignorance rather than rent-seeking, where does that point the finger? I mean, how did they get so ignorant?

I finally looked at the pollution vs latitude plot in figure 2, the curve doesn’t even fit. The pollution increases for ~5-10 degrees as you go north of the river. It isn’t a discontinuity as they predicted:

During the 1950–1980 period of central planning, the Chinese government established free winter heating of homes and offices via the provision of free coal for fuel boilers as a basic right. The combustion of coal in boilers is associated with the release of air pollutants, and in particular emission of particulate matter that can be extremely harmful to human health (4, 5). Due to budgetary limitations, however, this right was only extended to areas located in North China, which is defined by the line formed by the Huai River and Qinling Mountain range (Fig. 1). E[…]

This paper’s RD design exploits the discrete increase in the availability of free indoor heating as one crosses the Huai River line (with no availability to the south and, in principle, complete availability north of the line). Specifically, we separately test whether the Huai River policy caused a discontinuous change in TSPs at the river and a discontinuous change in life expectancy. The respective necessary assumptions are that any unobserved determinants of TSPs or mortality change smoothly as they cross the river. If the relevant assumption is valid, adjustment for a sufficiently flexible polynomial in distance from the river will remove all potential sources of bias and allow for causal inference.

https://www.pnas.org/content/early/2013/07/03/1300018110.abstract
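The estimator the quoted passage describes can be sketched on synthetic data (this is an illustration with invented numbers, not the paper's data or analysis): fit a polynomial in distance from the cutoff separately on each side and read off the jump between the two fits at the river.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2_000
dist = rng.uniform(-5, 5, n)   # distance north of the cutoff (the river)
north = dist > 0
true_jump = 2.0                # synthetic discontinuity at the cutoff
y = 0.3 * dist + true_jump * north + rng.normal(scale=1.0, size=n)

def rd_jump(dist, y, degree):
    """Fit a polynomial in distance on each side of the cutoff;
    return the difference between the two fitted values at distance 0."""
    north = dist > 0
    left = np.polynomial.polynomial.polyfit(dist[~north], y[~north], degree)
    right = np.polynomial.polynomial.polyfit(dist[north], y[north], degree)
    return right[0] - left[0]  # intercepts = fitted values at the cutoff

jump = rd_jump(dist, y, degree=1)  # close to the synthetic jump of 2.0
```

The entire causal claim rides on the smoothness assumption baked into this construction: if anything else changes at the cutoff, or if the pollution gradient is smooth rather than discontinuous (as the figure-2 comment above suggests), the fitted jump measures something other than the policy effect.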

I mean I don’t even understand why they would expect a discontinuity instead of the pollution spreading out due to the wind. It makes no sense to me…

And I see they assume no other important factor is discontinuous at the river. That is a highly questionable assumption since the Chinese government decided to use that as a geographical boundary. You really want us to believe that happened without political considerations?

Also, even if that assumption was correct how in the world does this method “remove all potential sources of bias”? Maybe driving is more common as you go north for some reason, or the equipment used to measure pollution (TSP) is more sensitive for some reason because it was deployed in a north-south fashion, etc, etc.

I don’t think this data is capable of answering their questions.

ah the “arbitrary regression coefficients” hypothesis. That one never gets old. Anoneuoid can’t handle the fact that which group you omit changes the interpretation of the regression coefficient, even though the information it contains stays the same.

I would just like for there to be a little less cheap talk from you and Daniel. You both criticize everything that comes out of social science, yet you have little quality research in the area to show for yourselves. Just saying. Gets tiring.

Do you ever pause to think that maybe, just MAYBE, not every single academic is an imbecile? Have a little humility, jesus christ.

I was definitely taught to act like an imbecile* by academics.

* Test a strawman hypothesis, interpret arbitrary regression coefficients, etc
