I’ll review them. Thank you very much.

Sameera:

The Edlin factor is biggest for a one-time study. In our research we try to use internal replication where possible. You can see various examples on the published papers section of my website. (There are millions of examples from other researchers too; I just point to my own work first because that’s what I’m most familiar with.)
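To make the idea concrete, here is a minimal sketch of what applying an Edlin factor amounts to arithmetically: scaling a published effect estimate toward zero before taking it seriously, with a heavier discount for a one-time study than for an internally replicated one. The specific effect sizes and factor values below are invented for illustration, not taken from any actual analysis.

```python
def shrink_estimate(published_effect, edlin_factor):
    """Scale a published effect size by a shrinkage factor in (0, 1]."""
    return published_effect * edlin_factor

# A one-time study with no internal replication gets heavy shrinkage...
one_time = shrink_estimate(0.80, 0.10)
# ...while an internally replicated result might warrant a milder discount.
# (Both factor values here are hypothetical.)
replicated = shrink_estimate(0.80, 0.50)

print(one_time, replicated)
```

The point is not the particular numbers but the asymmetry: the less internal replication a study has, the smaller the factor you would multiply its headline estimate by.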

*Statistical models will be useful (probably necessary) to evaluate how closely a particular data set and theoretical prediction correspond to one another, but the quantitative theory is prior to any confirmatory statistical modeling, though it may be usefully (though probably only partially) the product of exploratory statistical modeling.*

I don’t see the relevance of that quote right here; can you explain?

“All models are wrong; some models are useful.” George E.P. Box :-)

You would.

*Another thing is that coming up with such predictions is not a statistical problem. The people looking for an answer there will not succeed.*

This is a very important point, I believe. Statistical models will be useful (probably necessary) to evaluate how closely a particular data set and theoretical prediction correspond to one another, but the quantitative theory is prior to any confirmatory statistical modeling, though it may be usefully (though probably only partially) the product of exploratory statistical modeling.

I’ve been thinking the same for a while; in fact, the “replication crisis” aspect is only the most obvious, very tiny tip of the iceberg of problems. The vast majority of the problem is misinterpreting measurements.

Eventually I came to see that there are too many alternative possibilities that can explain something vague like an “increase/decrease in walking speed”. While in principle you could perhaps tell them apart by running different conditions, it is infeasible to run experiments that can distinguish between them all.

To check the explanation you simply need to get a quantitative prediction out of it (e.g., “primed oldness theory says people in group A should walk ~2/3 as fast as group B”; “primed oldness theory says the variance of walking speed from day to day should be at least x%”; etc.).

Put another way: observing results consistent with a theory that allows half of all possible values… is not a severe enough test of the theory. Another thing is that coming up with such predictions is not a statistical problem. The people looking for an answer there will not succeed.
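One way to see the severity point is to ask what fraction of possible outcomes a prediction would count as consistent with the theory. The sketch below simulates this for the walking-speed example; the plausible outcome range and the prediction windows are hypothetical numbers chosen only to illustrate the contrast.

```python
import random

random.seed(0)

# Suppose the walking-speed ratio (group A / group B) could plausibly land
# anywhere in [0.3, 1.7]; simulate possible outcomes uniformly over that range.
# (Range and prediction windows are made up for illustration.)
outcomes = [random.uniform(0.3, 1.7) for _ in range(100_000)]

# Vague directional prediction: "group A walks slower" (ratio < 1).
vague_pass = sum(r < 1.0 for r in outcomes) / len(outcomes)

# Quantitative prediction: "the ratio is about 2/3, say within 0.62-0.72".
quant_pass = sum(0.62 <= r <= 0.72 for r in outcomes) / len(outcomes)

print(f"vague prediction consistent with {vague_pass:.0%} of outcomes")
print(f"quantitative prediction consistent with {quant_pass:.0%} of outcomes")
```

The vague prediction passes for roughly half of all possible outcomes, so observing a result consistent with it tells you very little; the quantitative prediction passes for only a narrow slice, so surviving that check is a much more severe test.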

The more I think about the myriad problems with modern (social and medical) science, the more convinced I am that measurement is crucially important. As you say, it relates directly to design, but it is not only a design issue. Measurement is also related to reliability and validity in well-documented ways, as well as research degrees of freedom/the garden of forking paths. For example, is walking speed a valid and reliable measure of “primed oldness”? Is it the only dependent variable that was measured as part of studies of “primed oldness”?

Measurement also has direct implications for model construction issues that get discussed here a lot. You have rightly pointed out that choices about a model’s likelihood function are, like priors, at least partially subjective and very important for a model’s performance. Decisions about measurement also play a key role in determining what likelihood function does or does not make sense in a particular model.
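A small sketch of how the measurement itself constrains the likelihood: if what you record are non-negative integer counts (say, a hypothetical count of slow steps per trial, invented here for illustration), a Poisson likelihood respects that structure while a normal likelihood does not. The comparison below just evaluates both log-likelihoods at their moment-based parameter estimates.

```python
import math

# Invented count data (e.g., number of slow steps per trial).
counts = [0, 1, 1, 2, 3, 0, 2, 4, 1, 2]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)

def poisson_loglik(data, lam):
    # log P(c | lam) = c*log(lam) - lam - log(c!)
    return sum(c * math.log(lam) - lam - math.lgamma(c + 1) for c in data)

def normal_loglik(data, mu, sigma2):
    # log N(x | mu, sigma2)
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

print("Poisson log-likelihood:", poisson_loglik(counts, mean))
print("Normal  log-likelihood:", normal_loglik(counts, mean, var))
```

For this particular invented dataset the count-respecting Poisson model scores higher, but the general point is the one in the comment above: what you chose to measure, and how, narrows which likelihood functions are even sensible candidates.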

I just read an interesting paper (pdf) that argues pretty compellingly that other (related) aspects of measurement are central to how results get (mis-)interpreted. The point in this paper is related to but, I think, distinct from validity, reliability, researcher df, and how measurement (partially) determines likelihood functions.

An open-ended discussion is welcome.
