Perhaps the most contextless email I’ve ever received

Date: February 3, 2015 at 12:55:59 PM EST
Subject: Sample Stats Question
From: ** <**@gmail.com>

Hello,
I hope all is well and trust that you are having a great day so far. I hate to bother you but I have a stats question that I need help with: How can you tell which group has the best readers when they have the following information: Group A-130, 140, 170, 170, 190, 200, 215, 225, 240, 250
Group B- 188, 189, 193, 193, 193, 194, 194, 195, 195, 196
Group A-mean (193), median (195), mode (170)
Group B- mean (193), median (193.5), mode (193)
Why?
This is for my own personal use and understanding of this subject matter, so anything you could say or redirect me to would be greatly appreciated.
Any feedback that you could give me to help understand this better would be greatly appreciated.
Thanks,
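
Mean, median and mode alone can't settle this question. The minimal Python sketch below is my own illustration, not anything from the emailer. It recomputes the email's summary statistics and adds two measures of spread the email leaves out: the sample standard deviation, and the upper hinge of a box plot (the median of the upper half of the sorted data). The helper names `summarize` and `upper_hinge` are hypothetical.

```python
import statistics

# Scores exactly as listed in the email.
group_a = [130, 140, 170, 170, 190, 200, 215, 225, 240, 250]
group_b = [188, 189, 193, 193, 193, 194, 194, 195, 195, 196]


def upper_hinge(xs):
    """Tukey's upper hinge: the median of the upper half of the sorted data."""
    s = sorted(xs)
    return statistics.median(s[len(s) // 2:])


def summarize(xs):
    return {
        "mean": statistics.mean(xs),
        "median": statistics.median(xs),
        "mode": statistics.mode(xs),
        "sd": statistics.stdev(xs),          # sample SD (n - 1 denominator)
        "range": max(xs) - min(xs),
        "upper_hinge": upper_hinge(xs),
    }


for name, xs in [("A", group_a), ("B", group_b)]:
    print(name, {k: round(v, 2) for k, v in summarize(xs).items()})
```

By my arithmetic, this reproduces the email's means (193 for both groups), medians (195 and 193.5) and modes (170 and 193). The sample SDs, however, come out near 40.6 for Group A and 2.6 for Group B. The centers match; the spreads do not.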

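"Which group has the best readers" is really a question about who sits at the extremes. Pooling and ranking the twenty scores makes that visible. This is a minimal Python sketch that assumes higher scores mean better reading, which the email never actually says.

```python
# Scores as listed in the email.
group_a = [130, 140, 170, 170, 190, 200, 215, 225, 240, 250]
group_b = [188, 189, 193, 193, 193, 194, 194, 195, 195, 196]

# Pool the scores, remembering each one's group, then sort by score.
pooled = sorted([(x, "A") for x in group_a] + [(x, "B") for x in group_b])
labels = [group for _, group in pooled]

print("lowest five :", labels[:5])
print("middle ten  :", labels[5:15])
print("highest five:", labels[-5:])
```

With these data, the top five ranks all belong to Group A. The bottom five are four Group A scores plus Group B's 188, and Group B fills nine of the ten middle ranks.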
35 thoughts on “Perhaps the most contextless email I’ve ever received”

  1. After reading another great addition to the continuing saga of weird emails, I received this email the instant I clicked over to my inbox. Are you playing a trick on us? When we click the link, do you mail us a random message? That aside, if you’ve got game, I’ll forward the message.

    “Hello,

    We are Forestville Basketball team in Australia, We are looking for Point guard, Shooting guard, Small forward, Power forward and Center with small to mid sized contracts ($1200-$6000/per month) Interested players should forward game highlight or film.

    We are interested to hear from any players who can add to the strength of our Men and Women. The Australian Basketball office will be pleased to assist players and parents with the complexities. Please get back to us. We will contact you on receipt.

    Regards
    Mr R Dikel.
    Team Manager.”

  2. Ignoring the point of this post and focusing just on the question, the question itself seems really stupid. Assuming the numbers represent reading scores, the best five readers (out of twenty across both groups) are all in Group A. So, “which group has the best readers”? Group A has _all_ of the best readers.

    • Doesn’t look like a random sample. Rank ordered, group A has the five best scores and four of the five worst, while group B holds nine of the middle ten. You can even drop the assumption: whether higher scores mean better or poorer reading, group A still has nearly all of the extremes, no context needed.

  3. If ever there was a situation that calls for a KS test, this is it (though I suspect the samples are too small to tell the groups apart; maybe an exact KS test would, and there must be such a thing).

      • For the good of my soul I thought I would run it through BEST. There was no credible difference. The “Difference of Means” distribution graph was roughly symmetric (zero was inside the “95% Highest Density Interval”, so I guess zero is “credible”). Also, the “hairy caterpillar” graph was “hairy” (by my definition; I guess once you start allowing anyone to impose their private beliefs on a statistical procedure, Rorschach interpretation of vague shapes is acceptable).

        I also ran a KS test:

        Two-sample Kolmogorov-Smirnov test

        data: d1 and d2
        D = 0.5, p-value = 0.1641
        alternative hypothesis: two-sided

        Warning message:
        In ks.test(d1, d2) : cannot compute exact p-value with ties

        This confirmed my suspicion that there would not be enough power to distinguish the groups at the .05 level. A more exact test might, however; that was my original thought. BEST, while not clearly WORST, appears to have no real advantages for this problem.

        • The difference of SDs is well within the credible interval, though the distribution does have a somewhat longer right-hand tail. The dynamic of this exchange has become typical of discussions I’ve had with Bayesians: have you tried this (yes, didn’t work); well, what about this (yes, it didn’t work). But there are indications (there always are; Andrew’s blog is devoted to researchers finding indications and running with them in an almost reductio ad absurdum manner).

          My overall point is that while Andrew’s email was without context, it really asks a very fundamental statistical question: when are two distributions different? Is BEST really better than a KS test? This simple example suggests not.

        • Maybe you’re having trouble reading the BEST graphs? My run yielded

          SD Group 1:
          Mean: 46.6, 95% HDI (23.9, 75.0)

          SD Group 2:
          Mean: 2.87, 95% HDI (1.27, 4.70)

          Difference of SDs:
          Mean: 43.8, 95% HDI (21.3, 72.2)

          BEST clearly indicates that the two groups have very different dispersions. Not sure how you missed that…?

        • You’re right about the difference of SDs; I read it correctly but misinterpreted it (for some reason I didn’t think that the mean of differences should be zero; my mind is going). Obviously simulation techniques give insight into the problem, but I also ran Bartlett’s test and got:

          Bartlett test of homogeneity of variances

          data: x and g
          Bartlett’s K-squared = 35.2465, df = 1, p-value = 2.905e-09

          As I recall the ks test does have low power but this is a pretty stunning display of that.

          which also shows the

        • (Of course I ran BEST before I pointed you to it; it’s trivial to do thanks to the fully awesome Rasmus Bååth. I was laconic because I thought it would be obvious that the information one gets from BEST is much richer than what one gets from the KS test.)

        • numeric:
          > As I recall the ks test does have low power but this is a pretty stunning display of that.
          I am always unsure how folks get type I error and power without knowing whether the groups were randomized.

          That is, selective samples from the same population won’t have the same distribution, so that can’t be the null?

          If it’s taken as the null, the distribution of the p-values will be very far from Uniform(0,1).

          If the null is defined on the basis that both samples are from the same population (essentially taken as, or deemed, random samples from the same distribution), the amount of actual selection impacts the alternative. So what is the alternative?

          If it’s just the impact of selective sampling, why would one assume anything remotely additive, as might be represented by a shift in the mean?

          Given that one is usually interested in an effect of something after selection of the sample, how would the impact of selective sampling be _backed out_ (i.e., it could increase or decrease power)?

          But without a well defined alternative, one can’t calculate power.

          (OK now I am waiting for Andrew to post a section from the NY telephone directory to see how many comments that generates.)

          We all agree that the centers are about the same but the spreads are different. And clearly the maximum and upper ranks are in group A. And the questioner most likely understands that. But they are in a situation where only measures of central tendency are asked for. I would present the 75th percentile, which in this case (taken as the upper hinge, the median of the upper half of the data) is 225 for group A and 195 for group B. This is the upper line of the box on the box plot and gives a sense of the ‘top of the middle.’

          This problem is not without context. People in education and healthcare are now asked to describe things quantitatively with few legacy tools or little personal understanding. Clearly these are from two different types of classes or methods. I had to do this once with a biology problem, and the scientist found it the best way to describe his specimens of particulates.

    • I feel like this ignores the question. The question is, “which group has the best readers.” The best readers are in Group A. There is nothing to be gained by trying to characterize the distributions the groups are drawn from.

  4. Suppose the ‘scores’ represent the number of words the reader recalled reading 10 minutes after completing a passage. If the correct answer is 193, then group B has the ‘best’ readers.

    It’s pretty likely this was the process used to generate this data…

  5. Anscombe, anyone?

    R code

    library(ggplot2)

    # Scores from the email: group "a" is Group A, group "b" is Group B
    dd1 <- data.frame(
      group = factor(rep(c("a", "b"), each = 10)),
      score = c(130, 140, 170, 170, 190, 200, 215, 225, 240, 250,
                188, 189, 193, 193, 193, 194, 194, 195, 195, 196)
    )

    # One point per reader: score on the x axis, group on the y axis
    ggplot(dd1, aes(score, group, colour = group)) +
      geom_point()

    • jrkrideaku:

      That just spoils the fun and we would have missed out on the insightful comments.

      When folks start running named tests and recalling power rumors, I just want to reach for a graph or a simulation.

      (I’ve actually run Anscombe’s quartet of ‘identical’ simple linear regressions (it’s built into R) and distributed it to folks to inoculate them against being bullied by output from fancy procedures.)

  6. How can you tell which group has the best readers? As is, you can’t. Without the context and more information, you can’t tell a thing. For instance, we’re all assuming that the scores are actually reading scores. They could be lat/lon sets for all we know :-)

    I think this is a great question for an advanced course, with the right answer being “what the hell is going on here?”
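
The third comment wonders whether an exact KS test exists for these data. R's `ks.test` won't give an exact p-value when there are ties. One swapped-in alternative is a permutation test on the same statistic. You enumerate all C(20, 10) = 184,756 ways to split the pooled scores into two groups of ten and count how often the KS distance is at least as large as the observed D = 0.5. This Python sketch illustrates that idea; it is not what `ks.test` does internally. It assumes equal group sizes, so that D is a count difference divided by 10. `gap_for_mask` is a hypothetical helper name.

```python
from itertools import combinations

# Scores as listed in the email.
group_a = [130, 140, 170, 170, 190, 200, 215, 225, 240, 250]
group_b = [188, 189, 193, 193, 193, 194, 194, 195, 195, 196]
assert len(group_a) == len(group_b), "sketch assumes equal group sizes"
n = len(group_a)

# Sort the pooled scores once; True marks a Group A score.
pooled = sorted([(x, True) for x in group_a] + [(x, False) for x in group_b])
values = [v for v, _ in pooled]
observed_mask = [is_a for _, is_a in pooled]
N = len(values)

# Compare the ECDFs only after the last copy of each tied value.
tie_end = [i == N - 1 or values[i] != values[i + 1] for i in range(N)]


def gap_for_mask(in_a, tie_end):
    """Max over distinct values v of |#A <= v - #B <= v|; this equals n * D."""
    count_a = count_b = best = 0
    for flag, end in zip(in_a, tie_end):
        if flag:
            count_a += 1
        else:
            count_b += 1
        if end:
            best = max(best, abs(count_a - count_b))
    return best


observed = gap_for_mask(observed_mask, tie_end)

# Every way of choosing which n sorted positions are labeled "group A".
extreme = total = 0
for chosen in combinations(range(N), n):
    mask = [False] * N
    for i in chosen:
        mask[i] = True
    total += 1
    if gap_for_mask(mask, tie_end) >= observed:
        extreme += 1

p_exact = extreme / total
print(f"D = {observed / n}, exact permutation p-value = {p_exact:.4f}")
```

Full enumeration in pure Python should take only a second or so at this size. For larger samples, a random subset of permutations would be the usual substitute.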

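The Bartlett result quoted in the thread can be checked by hand. Below is a Python sketch of the standard Bartlett statistic. It computes the chi-square upper tail in closed form, which only works for the one-degree-of-freedom (two-group) case. The function name `bartlett` is my own. If my arithmetic is right, it should reproduce the K-squared of about 35.2465 and the p-value of about 2.9e-09 reported in the comment.

```python
import math
import statistics

# Scores as listed in the email.
group_a = [130, 140, 170, 170, 190, 200, 215, 225, 240, 250]
group_b = [188, 189, 193, 193, 193, 194, 194, 195, 195, 196]


def bartlett(*groups):
    """Bartlett's K-squared test of equal variances (two-group sketch)."""
    if len(groups) != 2:
        raise ValueError("this sketch only handles exactly two groups")
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]                     # n_i - 1
    variances = [statistics.variance(g) for g in groups]   # sample variances S_i^2
    df_total = sum(dfs)                                    # N - k
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances)
    )
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    k_squared = numerator / correction
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(k_squared / 2))
    return k_squared, k - 1, p_value


k_squared, df, p_value = bartlett(group_a, group_b)
print(f"Bartlett's K-squared = {k_squared:.4f}, df = {df}, p-value = {p_value:.3e}")
```

Set this beside the KS p-value of 0.1641 quoted earlier in the thread. A test aimed specifically at spread picks up the difference that the omnibus distribution comparison misses here.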