Sketching the distribution of data vs. sketching the imagined distribution of data

Posted on January 14, 2021 9:32 AM by Andrew

Elliot Marsden writes:

I was reading the recently published UK review of food and eating habits. The above figure caught my eye as it looked like the distribution of weight had radically changed, beyond just its mean shifting, over past decades. This would really change my beliefs!

But the distributional data weren’t available for the earlier years, so a normal distribution was assumed, even though the distribution looks little like a normal in the years where the data are actually available.

This reminded me of your recent posts on scientific method as ritual: a table of the category shares for each decade would probably make the point better, but a modelled shifting distribution sends a signal of ability. Enough method to impress, but not enough method to inform.

Yes, there are lots of examples like this, even in textbooks. See the section “The graph that wasn’t there” in this article.

21 thoughts on “Sketching the distribution of data vs. sketching the imagined distribution of data”

DCE on January 14, 2021 at 10:01 am said:

[Squinting to read grey type]: “…’1955 mean BMI interpolated from US historic BMI trends’…wut?”

Dogen on January 14, 2021 at 11:59 am said:

How does anyone come to think weight can be Normally distributed??? I mean it’s got a hard cutoff at zero (actually somewhere above zero.)

Maybe it can be approximated by a skew-normal distribution; the graph on the right looks like it might be a good fit.

- somebody on January 14, 2021 at 12:27 pm said:
  
  People also think that intelligence is normally distributed, using the evidence of IQ scores inferred from SAT or military aptitude tests. Of course, test scores are zero bounded and the raw scores don’t actually look normally distributed, so researchers define a transformation from test scores to IQ scores that makes them into a Normal(100, 15) — because intelligence is supposed to be normally distributed.
  
  - somebody on January 14, 2021 at 12:34 pm said:
    
    > Everyone believes in it [the law of gaussian errors]: experimentalists believing that it is a mathematical theorem, mathematicians believing that it is an empirical fact
    
  - another somebody on January 14, 2021 at 2:06 pm said:
    
    yup, in this case it is far worse. the models used to “measure” IQ (psychometricians’ beliefs about measurement have nothing to do with measurement in, say, physics) assumed intelligence is normally distributed, but because it is actually impossible to observe this unobservable construct (i.e. intelligence), even less to quantify it, it remains and will forever remain an assumption that people may believe or not. i believe it is BS
    
    - jim on January 14, 2021 at 11:36 pm said:
      
      I don’t know the details of how IQ is measured or the degree to which it is valid, but if you think it’s not valid at all you can’t be living in the real world. Video footage from the capitol last week should give you more than enough evidence.
    - Phil on January 17, 2021 at 12:48 am said:
      
      We all know what is meant by saying a person is “really smart” or “rather dumb”; I doubt ‘another somebody’ is claiming that everyone is equal in cognitive ability.
      
      But you can agree that people vary in cognitive ability without believing a single number can summarize this, whether you call it IQ or g or whatever. I’m in the camp that thinks IQ is not “real” in the sense of something that can be precisely measured, even in principle, but I think it is still sometimes useful as both a concept and a measurement. I think of it like “athleticism”. We could have everyone compete in the decathlon and define the results as measuring “AQ”, but some different scoring scheme or different set of events would result in different numbers that would be just as correct (or incorrect). Really great athletes would all be in the upper ranks and really bad athletes would all be in the lower ranks, for any reasonable AQ test, but there would be no point to arguing that the exact numbers or the exact order of athletes mean anything.
      
      As a friend points out, IQ measurements don’t have to be accurate measurements of a real ‘thing’ in order to be useful. For instance, childhood exposure to a certain amount of lead might decrease IQ by 3-6 points, which is a useful metric for judging just how bad the situation is. Analogously, if we knew that some dietary change increased decathlon scores by 50 points on average, that would be a useful thing to know.
    - gec on January 15, 2021 at 8:57 am said:
      
      Nothing to do with physics?
      
      In fact, the same measurement problems with latent psychological constructs arise in the context of electrons, quarks, dark matter, etc. These are all unobservable constructs, but we can build mathematical models that predict what measurements would look like if those constructs had a certain form. We can then check to see whether observed measurements are in accord with either the qualitative or quantitative features of those predictions.
      
      This is where there are some differences between physics and psychometrics, because generally the experimental situations in physics are so tightly controlled that they enable us to make enough assumptions to test quantitative, rather than purely qualitative (e.g., orderings) features of predictions.
    - Anonymous on January 15, 2021 at 9:42 am said:
      
      wow! hold that one right there. particle physics and psychometrics? could you elaborate more? INSERT GREAT PHYSICIST NAME HERE must be turning in their graves
- Martha (Smith) on January 14, 2021 at 9:48 pm said:
  
  Dogen said, “How does anyone come to think weight can be Normally distributed??? I mean it’s got a hard cutoff at zero (actually somewhere above zero.)”
  
  The problem here is that the “population” of entities whose weight is being considered isn’t specified. If the population is “all things”, you are correct. But in most contexts where weight is studied, it is restricted to a population such as “adult males in country X”, or “female infants under age 2 months”, where weights have a lower bound greater than 0.
  
  - jim on January 14, 2021 at 11:27 pm said:
    
    Good point; also this is BMI, not weight. BMI is weight divided by height squared, so by definition nonzero.
    
    - Dogen on January 17, 2021 at 5:15 pm said:
      
      Um, that’s why I said the cutoff is somewhere above zero. Whether weight or BMI there is no conceivable tail to the left that looks Gaussian.
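
The transformation that somebody describes in this thread, from raw test scores to a Normal(100, 15) scale, can be sketched as a rank-based mapping. This is only a minimal illustration under stated assumptions: the function name `scores_to_iq` is hypothetical, the raw scores are synthetic gamma draws made up for the example, and nothing here claims to be how any actual test is normed.

```python
import random
from statistics import NormalDist, mean, stdev

def scores_to_iq(raw_scores, mu=100.0, sigma=15.0):
    """Map raw scores onto a Normal(mu, sigma) scale by rank (hypothetical helper)."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of tied scores so that ties share one average rank
        j = i
        while j + 1 < n and raw_scores[order[j + 1]] == raw_scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    target = NormalDist(mu, sigma)
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1)
    return [target.inv_cdf((r - 0.5) / n) for r in ranks]

random.seed(0)
# Synthetic, zero-bounded, right-skewed "raw scores" (invented for illustration)
raw = [random.gammavariate(2.0, 10.0) for _ in range(2000)]
iq = scores_to_iq(raw)
print(round(mean(iq), 1), round(stdev(iq), 1))
```

Whatever shape the raw scores have, the transformed scores look normal by construction, so their normality says nothing about the distribution of the underlying trait.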
Sameera Daniels on January 14, 2021 at 12:43 pm said:

Have any of you followed Michael Yeadon’s Twitter? He and his circles claim that the PCR test is wholly inappropriate to use for mass testing for COVID-19, primarily because it yields a high rate of false positives. Michael Yeadon is also skeptical of the claims of asymptomatic spread.

https://twitter.com/MichaelYeadon3/status/1349327314140749825

I posted this specific tweet as it contains a link to a probability blog too.

If any of you have any comments, great.

- Sameera Daniels on January 14, 2021 at 12:45 pm said:
  
  Oops! I posted it to the wrong topic. I meant to post it to the Routine Hospital-based SARS-CoV-2 testing…
  
  Apologies.
  
- somebody on January 14, 2021 at 1:06 pm said:
  
  I feel like there’s a lot of confused thinking on that blog. This really comes down to the definition of a true positive. The PCR test’s false positive rate is near zero if you consider the presence of the virus to be the target you’re testing for. If the target variable is “symptoms manifest eventually,” then there’s a very high false positive rate, and also the very notion of “asymptomatic” is incoherent—you either have it or you don’t.
  
  - Sameera Daniels on January 14, 2021 at 1:53 pm said:
    
    Yes the content in that blog raises a lot of questions. I do infer that the PCR test is not an optimal public screening tool. But Michael Yeadon and others go further and suggest that it is a sham test. I haven’t yet fully absorbed the arguments they posit.
    
    The rapid antigen test looks to be a better contagiousness detection tool of asymptomatics & recovering symptomatics.
    
David Pittelli on January 14, 2021 at 1:13 pm said:

Obviously the left and middle graphs should have been guesstimated with a positive skew, like the real (most recent) data. But I’d expect that such data would be log-normally distributed or have some other smooth but modestly skewed shape.

The “lumpiness” of the right-hand graph may be an artifact of:

1. the rounding of height and weight measurements for the Health Survey for England 2018, not all of which were taken by a nurse (I think many were self-reports, which may have been based on memory or guess or wishful thinking); and

2. the fact that there were under 5000 BMIs graphed.
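
Both suggested causes of lumpiness can be checked with a small simulation. The sketch below is an illustration only: every parameter (the height distribution, the log-normal BMI, the rounding to the nearest inch and nearest 5 kg) is an assumption invented for the example and is not taken from the Health Survey for England. It draws from a smooth, right-skewed population and compares how bumpy the histogram looks with few observations, with many observations, and with many observations computed from rounded heights and weights.

```python
import math
import random

def simulate_bmi(n, rng, round_inputs):
    """Draw n BMI values from a smooth skewed population (all parameters made up)."""
    values = []
    for _ in range(n):
        height_m = rng.gauss(1.68, 0.09)
        true_bmi = rng.lognormvariate(math.log(27.0), 0.18)
        weight_kg = true_bmi * height_m ** 2
        if round_inputs:
            # Assumed self-report style rounding: nearest inch, nearest 5 kg
            height_m = round(height_m / 0.0254) * 0.0254
            weight_kg = round(weight_kg / 5.0) * 5.0
        values.append(weight_kg / height_m ** 2)
    return values

def bin_shares(values, lo=15.0, hi=45.0, width=0.5):
    """Share of all observations falling in each histogram bin on [lo, hi)."""
    n_bins = int((hi - lo) / width)
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]

def roughness(shares):
    """Sum of squared second differences; larger means a bumpier histogram."""
    return sum(
        (shares[i - 1] - 2 * shares[i] + shares[i + 1]) ** 2
        for i in range(1, len(shares) - 1)
    )

rng = random.Random(1)
small_smooth = roughness(bin_shares(simulate_bmi(5_000, rng, False)))
large_smooth = roughness(bin_shares(simulate_bmi(200_000, rng, False)))
large_rounded = roughness(bin_shares(simulate_bmi(200_000, rng, True)))
print(small_smooth, large_smooth, large_rounded)
```

Comparing the three roughness values separates the two effects: sampling noise shrinks as the sample grows, while rounding the inputs leaves lumps that more data will not remove.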

jim on January 14, 2021 at 2:17 pm said:

SleepGuy left out a category on his chart, FoodOrg made up a distribution on theirs. We owe a lot to these folks. It really takes a special person to handle the pressures of science. Sometimes you gotta take the risk and make up some bullshit to burnish your rep as a leading authority so you can get them to stop questioning you. Really it’s only then that you can foist your personal beliefs on the unwitting populace under the guise of science and really help them.

Fafa on January 14, 2021 at 3:50 pm said:

This is such insanity. If I am reading correctly, they go from a US guesstimate (1955), to an English guesstimate (1980), to the actual UK distribution of self-reported BMI. Embarrassing.

And they could have thrown in the actual distribution of US BMI in 1923, which is reported here. But that would have messed up the story, because even 100 years ago the median US weight was nearly at the overweight cutoff.

Yuling on January 14, 2021 at 11:34 pm said:

Sketching the distribution is always difficult even with full data: it requires density estimation, and the density is generally an infinite-dimensional object. Presumably a related issue is whether users should be allowed to sketch the marginal distribution of posterior simulations from Stan at all.

- Keith O'Rourke on January 15, 2021 at 7:26 am said:
  
  > whether users should be allowed to sketch the marginal distribution of posterior simulations from Stan at all.
  
  But is that not just part of trying to understand the posterior, or any distribution, by using simulation (ideally repeatedly)? So repeatedly draw a sample and estimate the density.
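
The suggestion to repeatedly draw a sample and estimate the density can be tried on a toy case. In the sketch below, gamma draws stand in for posterior simulations (an assumption; nothing here is actual Stan output), the density is estimated with a hand-written Gaussian kernel estimator using a normal-reference bandwidth rule (also an assumption, not a Stan default), and the helper names `gaussian_kde` and `sketch_spread` are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def gaussian_kde(draws, grid):
    """Gaussian kernel density estimate of `draws`, evaluated at each grid point."""
    n = len(draws)
    # Normal-reference rule of thumb for the bandwidth (an assumed choice)
    bandwidth = 1.06 * stdev(draws) * n ** (-0.2)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in draws)
        for x in grid
    ]

def sketch_spread(n_draws, n_repeats, rng, grid):
    """Average disagreement between repeated density sketches of the same target."""
    estimates = []
    for _ in range(n_repeats):
        # Stand-in for one set of posterior simulations (synthetic gamma draws)
        draws = [rng.gammavariate(3.0, 1.0) for _ in range(n_draws)]
        estimates.append(gaussian_kde(draws, grid))
    # Standard deviation across repeats at each grid point, averaged over the grid
    return mean(stdev(column) for column in zip(*estimates))

rng = random.Random(2)
grid = [0.4 * i for i in range(26)]
spread_small = sketch_spread(100, 20, rng, grid)
spread_large = sketch_spread(2000, 20, rng, grid)
print(spread_small, spread_large)
```

The spread across repeats gives a rough sense of how much of a sketched density is simulation noise rather than a feature of the distribution.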
  
Statistical Modeling, Causal Inference, and Social Science
