International Journal of Epidemiology versus Hivemind and the Datagoround

Posted on October 2, 2014 9:15 AM by Andrew

The Hivemind wins (see the comment thread here, which is full of detective work from various commenters).

As I wrote as a postscript to that earlier post, maybe we should call this the “stone soup” or “Bem” phenomenon, when a highly flawed work stimulates interesting, thoughtful discussion.

21 thoughts on “International Journal of Epidemiology versus Hivemind and the Datagoround”

B. C. on October 2, 2014 at 10:34 am said:

Hi.
It seems that your RSS feed ( http://feeds.feedburner.com/StatisticalModeling ) is not up to date (it is stuck on September 27th).
Best,

B.C. on October 2, 2014 at 12:19 pm said:

Everything is fine now… maybe it was my RSS reader…

Daniel Lakeland on October 2, 2014 at 1:09 pm said:

I think “question” has a very plausible mechanism for the researchers having mechanically created their seasonal effect. The “data generating process” he uses (simulating how he thinks the researchers collected, entered and analyzed the data) produces a fit to the seasonal trend that is just TOO good.

“question” obviously is a pseudonym for someone who has wished to remain anonymous, but in an ideal world, this issue should be addressed, and it should be retracted if his analysis is right. Bringing this to the attention of journal editors seems to be potentially career damaging for anyone not well established. Andrew, how do you feel about leading a push-back on the journal based on the analysis done here on your blog?

- Keith O'Rourke on October 3, 2014 at 7:50 am said:
  
  > potentially career damaging for anyone not well established.
  Unlikely unless you were/are somehow involved with authors or their institution.
  
  Why not
  
  A. Write a short letter to the editor (half page) with an appendix going more into the details of “question”’s suspected mechanism and simulation, and send both concurrently to the editor and all authors.
  
  B. Spin out and develop a paper using this paper as a lead-in example of the problem (many epidemiologists might really benefit from such a paper, even if the subject matter seems trivial to statisticians).
  
  You have acknowledged “question” here and likely only need repeat that in A or B.
  
  Both A and B can be career positive.
  An example of an A. from my career is http://onlinelibrary.wiley.com/doi/10.1002/sim.2115/abstract
  
  Now the positive was not the large citation count of 1, but rather one of the authors of the paper actually thanked me for providing them a reason to revisit their paper. Additionally, afterwards Altman and others told me “You should have done B – as a letter to the editor it will have no impact at all”.
  
  If A and/or B fails, then yes it might get messy to do something.
  
  - question on October 3, 2014 at 8:21 am said:
    
    The best thing is to frame it in a positive light. Get the raw data and do “Reanalysis of Swedish CMM data” or something like that. I suspect that if we had the actual ages of each person to assess age distribution by birth month, the fit would become nearly perfect (e.g., the distribution of ages for April/May births was bimodal with more older people, thus allowing for the observed average age with increased cancer rate). If this were the case it would constrain the “risk factors” for CMM greatly, possibly invalidating much of the research on this. The real question appears to be what mechanism would explain this (pseudo-)sigmoid age vs prevalence curve:
    
    http://s4.postimg.org/ypou2507x/CMM4.png
    
    Compare that to the 2001 US data shown in figure 10 of the paper below. It looks like the same pattern:
    http://www.davidrasnick.com/Cancer_files/Duesberg%202011%20speciation.pdf
    http://www.ncbi.nlm.nih.gov/pubmed/21666415
    
    - Keith O'Rourke on October 3, 2014 at 11:07 am said:
      
      Agree, but for figure 10 in the link, just getting near the end of the bridge?
      (See Figure 2 of http://www.medicine.mcgill.ca/epidemiology/hanley/Reprints/Turner-Hanley-JRSS-A.pdf)
- question on October 3, 2014 at 2:32 pm said:
  
  So I was thinking: what else usually has a sigmoid shape? Growth. What is growing that plateaus around 25 years? People, specifically their skin. I found this paper providing some rough estimates of surface size:
  
  “Age changes in absolute and relative surface areas of the human body”
  http://www.sciencedirect.com/science/article/pii/S0007122657800228
  http://www.ncbi.nlm.nih.gov/pubmed/?term=13510573
  
  Using that data along with this from Wikipedia, I got some estimates of the number of melanocytes by age: “The average square inch (6.5 cm²) of skin holds… 60,000 melanocytes”
  https://en.wikipedia.org/wiki/Human_skin
  
  The number of melanocytes by age looks much like the CMM cases *5 years earlier*. So possibly it takes ~ 5 years from initiation of the cancer until it is detectable. If we plot one vs the other it is consistent with a constant rate of conversion from normal to cancer cell of 1.2E-11 per cell per year. Also, the intercept is very close to the maximal rate per person at adulthood, which is probably not a coincidence.
  http://s30.postimg.org/6v2dvcz8h/CMM7.png
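
The arithmetic in the comment above can be sketched in a few lines of Python. The melanocyte density (60,000 per 6.5 cm²), the conversion rate (1.2E-11 per cell per year) and the 5-year lag are the figures quoted in the comment. The surface areas by age are placeholder values invented for illustration, not the estimates from the cited paper, and the function names are hypothetical.

```python
# Figures quoted in the comment above.
MELANOCYTES_PER_CM2 = 60_000 / 6.5   # 60,000 melanocytes per 6.5 cm^2 of skin
CONVERSION_RATE = 1.2e-11            # conversions per cell per year
LAG_YEARS = 5                        # delay between initiation and detection

# ASSUMPTION: placeholder skin surface areas (cm^2) by age, for illustration
# only. These are NOT the estimates from the cited surface-area paper.
SURFACE_AREA_CM2 = {0: 2_500, 5: 7_500, 10: 11_000, 15: 15_000, 20: 17_500, 25: 18_000}


def melanocyte_count(area_cm2: float) -> float:
    """Number of melanocytes implied by a given skin surface area."""
    return area_cm2 * MELANOCYTES_PER_CM2


def detection_rate(age_at_detection: int) -> float:
    """Expected detections per person-year, using the cell count LAG_YEARS earlier."""
    age_at_initiation = age_at_detection - LAG_YEARS
    cells = melanocyte_count(SURFACE_AREA_CM2[age_at_initiation])
    return CONVERSION_RATE * cells


for age in (10, 20, 30):
    print(age, f"{detection_rate(age):.2e}")
```

Swapping the placeholder dictionary for the surface-area estimates in the cited paper would be the first step toward reproducing the plot linked above; the units of the result depend on how the case counts in that plot were normalized, which the comment does not state.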
  
  - question on October 3, 2014 at 5:01 pm said:
    
    Time for a sanity check. In this paper they say:
    “In 2003, 1,889 new cases of cutaneous malignant melanoma were reported in Sweden”
    http://www.onk.ns.ac.rs/archive/vol13/PDFVol13/V13s1p69.pdf
    
    The current paper reports 1,595 total cases for 1973-2009. So something appears severely wrong.
    
    - jrc on October 3, 2014 at 5:22 pm said:
      
      Yeah but the average age of diagnosis is 52 in the U.S.
      
      http://mfne.org/learn-about-melanoma/facts-about-melanoma-and-skin-cancer/
      
      They might well have missed some cases, but we don’t expect a lot of those 1,889 from 2003 to be in people below the age of 36 (or 37).
    - question on October 3, 2014 at 6:39 pm said:
      
      Well now the last datapoint in their age vs cases data does not make sense…
    - Daniel Lakeland on October 3, 2014 at 7:11 pm said:
      
      Right, this population is a young population relative to typical melanoma onset. I really like your point about growth and number of skin cells though. That could be included in a model like my Bayesian sketch from the other post, using a constant conversion rate per cell times a sigmoidal type growth curve of cell number to get a varying rate of conversion per unit time etc.
      
      If only that Swedish 3-million-record dataset were generally available, I think we’d have a great medical paper in another couple weeks.
    - Daniel Lakeland on October 3, 2014 at 7:16 pm said:
      
      Note that we don’t need any identifying information, at least as a first pass; we would just need birth day and diagnosis day. It might be nice, as Elin pointed out, to have “death day” for those who died before the end of the study and before diagnosis with melanoma, but as I said earlier, because it’s a young population that’s going to be a really small fraction of people.
    - Elin on October 3, 2014 at 8:05 pm said:
      
      Well, the question for me with their data is whether they are using death or not; it would explain some of the reason that the mean ages are a bit lower than what you’d expect. However, it is not enough to explain the big discrepancies, and that’s why we need to know the size of the annual month-specific cohorts: knowing just the number born in March without the distribution by year means we have been assuming that the number of births was uniform.
    - question on October 4, 2014 at 9:44 am said:
      
      I found this nearly complete data (missing 2003 and 1991, for some reason). While I think I was right to defend the uniform birthyears vs month and births vs birthyear in the absence of data, that assumption is spectacularly violated:
      http://s22.postimg.org/rldbv6s7l/CMM8.png
      Source: http://data.un.org/Data.aspx?d=POP&f=tableCode:55
      R format: http://pastebin.com/VWsmKcAh
    - Elin on October 4, 2014 at 7:22 pm said:
      
      Really frustrating about the missing years because this is really helpful.
    - Elin on October 5, 2014 at 8:50 am said:
      
      2003 (in order of month January to December): 8049 7826 8619 8768 8654 8528 9186 8481 8356 8231 7215 7244
      
      From
      http://www.scb.se/sv_/Hitta-statistik/Statistik-efter-amne/Befolkning/Befolkningens-sammansattning/Befolkningsstatistik/25788/25795/
      
      There were 88173 births in 1999 (if reading the instructions in Google translate worked).
    - Elin on October 5, 2014 at 9:12 am said:
      
      births_1999 <- c(7188, 7080, 7981, 7967, 8071, 7738, 7832, 7595, 7207, 7007, 6139, 6368)
    - question on October 5, 2014 at 9:46 am said:
      
      The UN data includes 1999; it’s missing 1991. Anyway, I wonder what is going on that leads to these minor differences of one birth counted every other month or so.
      
      UN 1999:
      January 7187
      February 7080
      March 7981
      April 7967
      May 8072
      June 7738
      July 7832
      August 7595
      September 7206
      October 7008
      November 6139
      December 6368
    - question on October 7, 2014 at 9:17 am said:
      
      To conclude this chapter: Elin appears to have been correct about their methodology; there really was just a crazy age-by-birthmonth interaction. We need the birthmonth and birth year for the CMM cases to proceed further with this. I am wary of making any more assumptions about how these are distributed.
    - Andrew on October 7, 2014 at 9:40 am said:
      
      Letter to the editor to the journal, anyone?
Elin on October 7, 2014 at 2:53 pm said:

This 1991 issue is really a pain. If nothing else they need to share their data, especially for the CMM … though I wonder if there is some privacy issue? No idea at all about the mysterious extra babies between Statistics Sweden and the UN; I’m not seeing any obvious explanation.
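
The size of the discrepancy between the two sources can be checked directly from the 1999 figures quoted earlier in this thread. The sketch below is a minimal comparison in Python, assuming both series were transcribed correctly in the comments; the variable and function names are made up for illustration.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Monthly births in Sweden for 1999, as quoted in the comments above.
SCB_1999 = [7188, 7080, 7981, 7967, 8071, 7738, 7832, 7595, 7207, 7007, 6139, 6368]
UN_1999 = [7187, 7080, 7981, 7967, 8072, 7738, 7832, 7595, 7206, 7008, 6139, 6368]


def monthly_differences(first, second):
    """Return {month: second - first} for the months where the two series disagree."""
    return {
        month: b - a
        for month, a, b in zip(MONTHS, first, second)
        if a != b
    }


differences = monthly_differences(SCB_1999, UN_1999)
print(differences)
print(sum(SCB_1999), sum(UN_1999))
```

With the figures as quoted, the two series differ by a single birth in four months (January, May, September and October), the differences cancel, and both series sum to 88,173, the annual total given above for 1999. So the sources seem to disagree about which month a few births fall in rather than about how many babies there were.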


Statistical Modeling, Causal Inference, and Social Science
