Sketching the distribution of data vs. sketching the imagined distribution of data

Elliot Marsden writes:

I was reading the recently published UK review of food and eating habits. The above figure caught my eye as it looked like the distribution of weight had radically changed, beyond just its mean shifting, over past decades. This would really change my beliefs!

But in fact the distributional data wasn’t available at earlier times, so a normal distribution was assumed, despite the fact that when the distributional data is in fact available, it looks little like a normal distribution.

This reminded me of your recent posts on scientific method as ritual: a table of the category shares for each decade would probably make the point better, but a modelled shifting distribution sends a signal of ability. Enough method to impress, but not enough method to inform.

Yes, there are lots of examples like this, even in textbooks. See the section “The graph that wasn’t there” in this article.

21 thoughts on “Sketching the distribution of data vs. sketching the imagined distribution of data”

  1. How does anyone come to think weight can be Normally distributed??? I mean it’s got a hard cutoff at zero (actually somewhere above zero.)

    Maybe it can be approximated by a Skewed Normal distribution, it looks like the graph on the right might be a good fit.

    • People also think that intelligence is normally distributed, using the evidence of IQ scores inferred from SAT or military aptitude tests. Of course, test scores are zero bounded and the raw scores don’t actually look normally distributed, so researchers define a transformation from test scores to IQ scores that makes them into a Normal(100, 15) — because intelligence is supposed to be normally distributed.
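
The transformation described in this comment can be sketched as a rank-based mapping onto a normal scale. This is a minimal illustration, not the procedure of any particular test publisher: the function name `to_iq_scale`, the raw scores, and the `(rank - 0.5) / n` plotting position are all assumptions made for the example.

```python
from statistics import NormalDist

def to_iq_scale(raw_scores, mean=100.0, sd=15.0):
    """Map raw scores onto a Normal(mean, sd) scale using only their ranks."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at sorted position i.
        j = i
        while j + 1 < n and raw_scores[order[j + 1]] == raw_scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    target = NormalDist(mean, sd)
    # (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [target.inv_cdf((r - 0.5) / n) for r in ranks]

# Made-up raw scores: zero-bounded and strongly right-skewed.
raw = [0, 1, 1, 2, 3, 5, 8, 13, 21, 55]
iq = to_iq_scale(raw)
for r, q in zip(raw, iq):
    print(f"raw={r:3d}  scaled={q:6.1f}")
```

Whatever shape the raw scores have, the output is close to normal by construction, so the normality of the scaled scores says nothing about the shape of the underlying trait.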

      • > Everyone believes in it [the law of gaussian errors]: experimentalists believing that it is a mathematical theorem, mathematicians believing that it is an empirical fact

      • yup, in this case it is far worse. the models used to “measure” IQ (psychometricians’ beliefs about measurement have nothing to do with measurement in, say, physics) assume intelligence is normally distributed, but because it is impossible to observe this construct (i.e. intelligence), let alone quantify it, this remains and will forever remain an assumption that people may believe or not. i believe it is BS

        • I don’t know the details of how IQ is measured or the degree to which it is valid, but if you think it’s not valid at all you can’t be living in the real world. Video footage from the capitol last week should give you more than enough evidence.

        • We all know what is meant by saying a person is “really smart” or “rather dumb”; I doubt ‘another somebody’ is claiming that everyone is equal in cognitive ability.

          But you can agree that people vary in cognitive ability without believing a single number can summarize this, whether you call it IQ or g or whatever. I’m in the camp that thinks IQ is not “real” in the sense of something that can be precisely measured, even in principle, but I think it is still sometimes useful as both a concept and a measurement. I think of it like “athleticism”. We could have everyone compete in the decathlon and define the results as measuring “AQ”, but some different scoring scheme or different set of events would result in different numbers that would be just as correct (or incorrect). Really great athletes would all be in the upper ranks and really bad athletes would all be in the lower ranks, for any reasonable AQ test, but there would be no point to arguing that the exact numbers or the exact order of athletes mean anything.

          As a friend points out, IQ measurements don’t have to be accurate measurements of a real ‘thing’ in order to be useful. For instance, childhood exposure to a certain amount of lead might decrease IQ by 3-6 points, which is a useful metric for judging just how bad the situation is. Analogously, if we knew that some dietary change increased decathlon scores by 50 points on average, that would be a useful thing to know.

        • Nothing to do with physics?

          In fact, the same measurement problems with latent psychological constructs arise in the context of electrons, quarks, dark matter, etc. These are all unobservable constructs, but we can build mathematical models that predict what measurements would look like if those constructs had a certain form. We can then check to see whether observed measurements are in accord with either the qualitative or quantitative features of those predictions.

          This is where there are some differences between physics and psychometrics, because generally the experimental situations in physics are so tightly controlled that they enable us to make enough assumptions to test quantitative, rather than purely qualitative (e.g., orderings) features of predictions.

        • wow! hold that one right there. particle physics and psychometrics? could you elaborate more? INSERT GREAT PHYSICIST NAME HERE must be turning in their grave

    • Dogen said, “How does anyone come to think weight can be Normally distributed??? I mean it’s got a hard cutoff at zero (actually somewhere above zero.)”

      The problem here is that the “population” of entities whose weight is being considered isn’t specified. If the population is “all things”, you are correct. But in most contexts where weight is studied, it is restricted to a population such as “adult males in country X”, or “female infants under age 2 months”, where weights have a lower bound greater than 0.

  2. Have any of you followed Michael Yeadon’s Twitter? He and his circles claim that the PCR test is wholly inappropriate to use for mass testing for COVID19, primarily b/c it yields a high rate of false positives. Michael Yeadon is also skeptical of the claims of asymptomatic spread.

    https://twitter.com/MichaelYeadon3/status/1349327314140749825

    I posted this specific tweet as it contains a link to a probability blog too.

    If any of you have any comments, great.

    • I feel like there’s a lot of confused thinking on that blog. This really comes down to the definition of a true positive. The PCR test’s false positive rate is near zero if you consider the presence of the virus to be the target you’re testing for. If the target variable is “symptoms manifest eventually,” then there’s a very high false positive rate, and also the very notion of “asymptomatic” is incoherent—you either have it or you don’t.

      • Yes the content in that blog raises a lot of questions. I do infer that the PCR test is not an optimal public screening tool. But Michael Yeadon and others go further and suggest that it is a sham test. I haven’t yet fully absorbed the arguments they posit.

        The rapid antigen test looks to be a better contagiousness detection tool of asymptomatics & recovering symptomatics.

  3. Obviously the left and middle graphs should have been guesstimated with a positive skew, like the real (most recent) data. But I’d expect that such data would be log-normally distributed or have some other smooth but modestly skewed shape.

    The “lumpiness” of the right-hand graph may be an artifact of:

    1. the rounding of height and weight measurements for the Health Survey for England 2018, not all of which were taken by a nurse (I think many were self-reports, which may have been based on memory or guess or wishful thinking); and

    2. the fact that there were under 5000 BMIs graphed.
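
The rounding explanation can be checked with a small simulation. This is a sketch under stated assumptions, not a reanalysis of the Health Survey for England: the log-normal BMI shape, the height distribution, and the rounding steps (5 kg and 5 cm) are all hypothetical choices made for illustration, and the sample size of 5000 only echoes the figure mentioned above.

```python
import random
from collections import Counter

def simulate_people(n, seed=0):
    """Return (weight_kg, height_m) pairs whose true BMI is smooth and skewed."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        height_m = rng.gauss(1.68, 0.09)        # assumed height distribution
        bmi = rng.lognormvariate(3.3, 0.18)     # assumed BMI shape, median near 27
        people.append((bmi * height_m ** 2, height_m))
    return people

def bmis(people, weight_step_kg=None, height_step_m=None):
    """Compute BMI, optionally rounding each measurement to a reporting step."""
    values = []
    for weight_kg, height_m in people:
        if weight_step_kg:
            weight_kg = round(weight_kg / weight_step_kg) * weight_step_kg
        if height_step_m:
            height_m = round(height_m / height_step_m) * height_step_m
        values.append(weight_kg / height_m ** 2)
    return values

people = simulate_people(5000)
exact = bmis(people)
rounded = bmis(people, weight_step_kg=5, height_step_m=0.05)

print("distinct BMI values, exact:  ", len(set(exact)))
print("distinct BMI values, rounded:", len(set(rounded)))
print("most common rounded values:  ", Counter(rounded).most_common(3))
```

With rounded inputs, the same smooth population collapses onto a small lattice of BMI values, so a fine-grained histogram shows spikes and gaps that are not present in the underlying distribution.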

  4. SleepGuy left out a category on his chart, FoodOrg made up a distribution on theirs. We owe a lot to these folks. It really takes a special person to handle the pressures of science. Sometimes you gotta take the risk and make up some bullshit to burnish your rep as a leading authority so you can get them to stop questioning you. Really it’s only then that you can foist your personal beliefs on the unwitting populace under the guise of science and really help them.

  5. This is such insanity. If I am reading correctly, they go from a US guesstimate (1955), to an English guesstimate (1980), to the actual UK distribution of self-reported BMI. Embarrassing.

    And they could have thrown in the actual distribution of US BMI in 1923, which is reported here. But that would have messed up the story, because even 100 years ago the median US weight was nearly at the overweight cutoff.

  6. Sketching the distribution is always difficult even with full data: it requires density estimation, and the density is generally an infinite-dimensional object. Presumably a related issue is whether users should be allowed to sketch the marginal distribution of posterior simulations from Stan at all.

    • > whether users should be allowed to sketch the marginal distribution of posterior simulations from Stan at all.
      But is that not just part of trying to understand the posterior or any distribution by using simulation (ideally repeatedly)? So repeatedly draw a sample and estimate the density.
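
The "repeatedly draw a sample and estimate the density" idea can be sketched with a hand-rolled Gaussian kernel density estimate. Everything here is an assumption for illustration: the draws come from a log-normal stand-in rather than from Stan, the helper `gaussian_kde` is written for this example, and the bandwidth uses the common normal-reference rule of thumb.

```python
import math
import random
import statistics

def gaussian_kde(draws, bandwidth=None):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(draws)
    if bandwidth is None:
        # Normal-reference rule of thumb; other bandwidths give other sketches.
        bandwidth = 1.06 * statistics.stdev(draws) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in draws
        )

    return density

rng = random.Random(1)
densities = []
for rep in range(3):
    # Stand-in for one batch of posterior draws of a positive, skewed quantity.
    draws = [rng.lognormvariate(0.0, 0.5) for _ in range(400)]
    densities.append(gaussian_kde(draws))

# The same point gets a somewhat different estimate on each repetition.
for rep, f in enumerate(densities):
    print(f"rep {rep}: estimated density at x=1.0 is {f(1.0):.3f}")
```

Comparing the curves across repetitions gives a rough sense of how much of a sketched shape is signal and how much comes from the finite sample and the bandwidth choice.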
