An interesting point came up in a comment thread the other day and you might have missed it, so I’ll repeat it here.

Dan Goldstein wrote to me:

Many times I’ve heard you say people should improve the quality of their measurements. Have you considered that people may be quite close to the best quality of measurement they can achieve?

Have you thought about the degree of measurement improvement that might actually be achievable?

And what that would mean for the quality of statistical inferences?

Competent psychophysicists are getting measurements that are close to the best they can reasonably achieve. Equipment that costs ten times more might only reduce error by one thousandth. It’s the variation between people that gets ya.

I replied:

There are subfields where measurement is taken seriously. You mention psychophysics; other examples include psychometrics and old-fashioned physics and chemistry. In those fields, I agree that there can be diminishing returns from improved measurement.

What I was talking about are the many, many fields of social research where measurement is sloppy and noisy. I think the source of much of this is a statistical ideology that measurement doesn’t really matter.

The reasoning, I think, goes like this:

1. Measurement has bias and variance.

2. If you’re doing a randomized experiment, you don’t need to worry about bias because it cancels out in the two groups.

3. Variance matters because if your variance is higher, your standard errors will be higher and so you’ll be less likely to achieve statistical significance.

4. If your findings are statistically significant, then retroactively you can say that your standard error was not too high, hence measurement variance did not materially affect your results.

5. Another concern is that you were not measuring quite what you thought you were measuring. But that’s ok because you’ve still discovered something. If you claimed that Y is predicted from X, but you were actually measuring Z rather than X, then you just change the interpretation of your finding: you’ve now discovered that Y is predicted from Z, and you still have a finding.

Put the above 5 steps together and you can conclude that as long as you achieve statistical significance from a randomized experiment, you don’t have to worry about measurement. And, indeed, I’ve seen lots and lots of papers in top journals, written by respected researchers, that don’t seem to take measurement at all seriously (again, with exceptions, especially in fields such as psychometrics that are particularly focused on measurement).

I’ve never seen steps 1-5 listed explicitly in the above form, but it’s my impression that this is the implicit reasoning that allows many, many researchers to go about their work without concern about measurement error. Their reasoning is, I think, that if measurement error were a problem, it would show up in the form of big standard errors. So when standard errors are big and results are not statistically significant, then they might start to worry about measurement error. But not before.

I think the apparent syllogism of steps 1-5 above is wrong. As Eric Loken and I have discussed, when you have noisy data, a statistically significant finding doesn’t tell you so much. The fact that a result is statistically significant does not imply that your measurement error was so low that your statistically significant finding can be trusted.
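This point can be illustrated with a quick simulation (a sketch only; the true effect size, noise level, and sample size below are invented for illustration): in a noisy study, conditioning on statistical significance selects for estimates that overstate the true effect.

```python
import math
import random
import statistics

random.seed(0)

TRUE_EFFECT = 0.1   # small true effect (hypothetical)
NOISE_SD = 1.0      # large measurement/sampling noise (hypothetical)
N = 50              # per-group sample size
SIMS = 2000

sig_estimates = []
for _ in range(SIMS):
    control = [random.gauss(0, NOISE_SD) for _ in range(N)]
    treated = [random.gauss(TRUE_EFFECT, NOISE_SD) for _ in range(N)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / N + statistics.variance(control) / N)
    if abs(diff / se) > 1.96:  # "statistically significant"
        sig_estimates.append(diff)

# Among the significant results, the average estimate is a multiple of
# the true effect: significance plus noise implies exaggeration.
exaggeration = statistics.mean(abs(d) for d in sig_estimates) / TRUE_EFFECT
print(f"significant in {len(sig_estimates)}/{SIMS} runs; "
      f"mean |estimate| is {exaggeration:.1f}x the true effect")
```

The mechanism is simple: with noise this large, an estimate can only clear the significance threshold by being far bigger than the true effect, so the published-if-significant estimates are systematically exaggerated.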

If all of social and behavioral science were like psychometrics and psychophysics, I’d still have a lot to talk about, but I don’t think I’d need to talk so much about measurement error.

tl;dr: Measurement is always important and should always be emphasized, but in some fields there is already a broad recognition of the importance of measurement, and researchers in those fields don’t need me to harangue them about it. But even they often don’t mind that I talk about measurement so much, because they recognize that researchers in other subfields are not always aware of its importance, with the unawareness arising perhaps from a misunderstanding of statistical significance and evidence.

Ummm, I guess I just violated the “tl;dr” principle by writing a tl;dr summary that itself was a long paragraph. That’s academic writing for ya! Whatever.

The biggest issue with measurement is the difficulty in getting any funding at all to develop and validate (truly validate) measurement instruments. Most of the measures out there in the fields I work in were developed as sidelines from the “real” research the investigators were doing that needed an instrument. As such, they tend to be rather under-developed and under-tested.

Josiah Stamp, via Peter Kennedy: The Government are very keen on amassing statistics – they collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But what you must never forget is that every one of those figures comes in the first instance from the village watchman, who just puts down what he damn pleases.

There is, in a lot of economics work, the notion that “I found the data in a reputable source… now I can analyze them as if they are accurate.”

Statistics literally means the science of the state.

While it is correct to say that measurement in chemistry can be very precise, it is often the case that the values of interest are not single values but are the results of calculations from several measured values. What this leads to is a propagation of errors so that the eventual number will not be anywhere near as precise as the individual measurements that are made. At this point it becomes important to understand where the highest level of variability comes from in order to target improvements in that. I find that the arguments about the importance of measurement retain value, even in fields where this might not obviously be the case.
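A minimal sketch of this propagation (with invented measurements): for a quantity computed as a quotient of two measured values, the relative uncertainties combine in quadrature, so the derived value is less precise than either input, and the calculation also shows which measurement dominates the error budget.

```python
import math

# Hypothetical measurements: (value, standard uncertainty). All numbers invented.
mass = (12.50, 0.05)    # grams
volume = (4.80, 0.04)   # millilitres

# Density = mass / volume. For products and quotients, first-order error
# propagation combines relative uncertainties in quadrature.
density = mass[0] / volume[0]
rel_err = math.sqrt((mass[1] / mass[0]) ** 2 + (volume[1] / volume[0]) ** 2)
print(f"density = {density:.3f} ± {density * rel_err:.3f} g/mL")

# The result's relative error exceeds that of either input, and the volume
# term dominates -- so improving the volume measurement helps most.
```

This is the targeting point made above: comparing the terms under the square root identifies where the highest level of variability comes from.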

Good points. Thanks.

“it is often the case that the values of interest are not single values but are the results of calculations from several measured values”

Perhaps, but not necessarily. I’ve worked extensively with mass-spec data for major and trace elements, assay and microprobe data, and isotope ratios, and each data type has similar errors at similar concentrations.

There are lots of ways to reduce error. Splitting samples and running multiple analyses is a common approach. In the mining industry it’s common to split samples and run multiples, as well as to insert standards and blanks into the sample stream, then to analyze drill core in downhole sequence to ensure large values don’t bleed across samples.
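The gain from splitting and re-running can be sketched with a simulation (grade and error values invented for illustration): averaging n independent analyses shrinks the spread of the reported value roughly as 1/sqrt(n).

```python
import random
import statistics

random.seed(1)

TRUE_GRADE = 2.5   # hypothetical true assay value
ASSAY_SD = 0.3     # per-analysis measurement error (invented)

def assay(n_splits):
    """Report the mean of n independent analyses of splits from one sample."""
    return statistics.mean(random.gauss(TRUE_GRADE, ASSAY_SD) for _ in range(n_splits))

# Empirical spread of the reported grade, with and without replication:
for n in (1, 4, 16):
    reported = [assay(n) for _ in range(5000)]
    print(f"{n:2d} splits: sd of reported grade = {statistics.stdev(reported):.3f}")

# The sd shrinks roughly as 1/sqrt(n): about 0.30, 0.15, and 0.075.
```

Standards and blanks catch bias and carryover rather than random error, which is why the two practices are used together.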

Isotopic geochemists use a wide variety of checks on their data, commonly using multiple aliquots from a sample and obtaining data from different systems. The U-Pb system, for example, has two independent decay chains with different half-lives that generate highly accurate ages for rocks as old as 4.5 billion years.

jim – that all makes sense, thanks. Where I see the combination of errors is in aspects such as the calculation of a reaction yield, which combines weighings and analysis. Each weighing in itself combines two measures (the high and low value), so there is an error in each. There will be the weighings of the material inputs and outputs as well as the weighings of the sample and standard for the analysis. The peak size of the analytical peaks is estimated and then all of this is fed into a set of calculations. All of that ignores the sampling variability, which can be important (particularly if the material has an appreciable moisture content, for example). What this means is that it is not unusual to have a standard deviation on a yield of 2-3%. This may not sound like a lot, but a financially important difference could be less than that. I don’t believe that the discussion of the importance of measurement is restricted to social and psychological sciences, but is relevant to the more ‘old fashioned’ sciences as well.
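A rough sketch of how such small per-step errors build up (all uncertainty values invented for illustration): first-order propagation over a chain of weighings, a peak-area estimate, and sampling variability combines the relative errors in quadrature.

```python
import math

# Hypothetical relative standard uncertainties for each step of a yield
# calculation (all numbers invented for illustration):
rel_errs = {
    "input weighing":     0.005,
    "output weighing":    0.005,
    "sample weighing":    0.005,
    "standard weighing":  0.005,
    "peak-area estimate": 0.015,
    "sampling/moisture":  0.010,
}

# For a chain of multiplications and divisions, first-order propagation
# combines relative uncertainties in quadrature.
total = math.sqrt(sum(r ** 2 for r in rel_errs.values()))
print(f"combined relative uncertainty on the yield: {total:.1%}")
# About 2%, even though no single step is worse than 1.5% -- consistent
# with the 2-3% standard deviations on yields mentioned above.
```

No single step looks alarming on its own; it is the accumulation across the chain that produces a financially meaningful uncertainty.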

Tom – good points. I don’t have any experience with industrial chemistry. I’m sure a small error can be financially relevant. Actually though I have had experience doing a similar thing: calculating grade / tonnage / total resource recovery in a mining block. In that case though the variation far outweighs the measurement error.