Data-share this, pal:
As the man said, you have no obligation to share any of your data and I have no obligation to believe anything you say.
I recently taught an undergraduate class aimed at helping students to become more adept at critically reading published research; the basic format is that they pick a recent paper each week and we go through it in detail, with a focus on the interpretation and analysis of the data. One of the papers the students picked was from a trial of psilocybin for the treatment of depression, also published in NEJM (link here: https://www.nejm.org/doi/pdf/10.1056/NEJMoa2206443?articleTools=true). I’m not sure whether this paper was published before the one you refer to in your post, but looking at the data sharing statement those authors provided (link here: https://www.nejm.org/doi/suppl/10.1056/NEJMoa2206443/suppl_file/nejmoa2206443_data-sharing.pdf), it looks like we have a clear case of plagiarism on our hands…
(One of the questions I had students discuss in the relevant class was what they thought of the responses given in the data sharing statement. The general tenor of that discussion was one of disbelief on students’ behalf that researchers could get away with something like this. I wonder at what point the transition from “undergraduate student who finds a stupid data sharing statement ridiculous” to “medical researcher who is comfortable submitting a stupid data sharing statement” takes place?)
Is it a longitudinal or a cohort thing?
Also, it may be a selection thing. If 99% think the former, but only the 1% who think the latter become professors, there you go.
I may not have understood your comment, but if I did then an example is provided by the panel discussions after the NEJM held the SPRINT competition exploring the value of open data for clinical trials. The panel of trial participants expressed shock that the data from trials are not generally shared openly – they said one of their motivations for participating was the belief that the data would be shared. The panel of researchers was much more concerned about data sharing, citing concerns about bad replication attempts, loss of credit for their work, and methodological terrorists (of course, they did not use the term but couched it in nicer-sounding professional language). For the latter group, limiting access to such data was the norm, while the former group did not understand their concerns.
Is that along the lines of your 99% and 1% comment?
I was referring to your statement “(One of the questions I had students discuss in the relevant class was what they thought of the responses given in the data sharing statement. The general tenor of that discussion was one of disbelief on students’ behalf that researchers could get away with something like this. I wonder at what point the transition from “undergraduate student who finds a stupid data sharing statement ridiculous” to “medical researcher who is comfortable submitting a stupid data sharing statement” takes place?)”
The phrase “general tenor” indicates a broad, but not universal, consensus. So, if 99% of undergrads are shocked that researchers can get away with it and 1% would be comfortable doing it, but only that 1% of students become professors, that would explain the discrepancy.
Dale, I did a very small amount of digging to try to find out how the SPRINT data was distributed, and supposedly there were some definite preconditions: you had to apply, get IRB approval, answer some questions to prove that you knew some basic stuff about data analysis, and be approved by a board…
F*** that, when the trial participants themselves largely wanted their data shared. That data should be on a free download site. IMHO the NIH should simply have a policy requiring anything funded by them to have anonymized individual downloadable data with NO exceptions. Make the participants sign a form saying they’re ok with releasing the data, or they can’t participate. This data is the real product that NIH is producing, not the papers with the private thoughts of the researchers who have exclusive access.
My understanding is that NIH does have a data sharing requirement for non-clinical data these days, but it needs to be for clinical as well. I’d suggest **perhaps** an exclusion for diseases or conditions that are extremely uncommon, like less than 5000 people in the US. Anonymizing that would be hard.
Daniel
I have a somewhat lengthy presentation about my experience in that competition. Yes, there were hoops to jump through in order to participate – with the result that only half of the participants made it through those hurdles. Then I requested some data that was relevant to my analysis and was denied because the authors of the original paper had not used that data (the data was on an assessment of likely adherence to the treatment protocol – and my finding was that people who missed their scheduled visits were worse off than the control group). I then discovered that the same authors had used the data I was denied in a publication in a different journal. To this day, the NEJM seems confused about what to do about data sharing – they now require the statement (as shown above) but say they have not established a policy yet. How many years will it take them to decide where they stand?
As a young, naive recent PhD graduate from USC in about 2014, I was interested in decompression sickness. I filed a FOIA request with the Navy to get them to release the computer-readable datasets that went into creating the graphs in a completely OPEN publication they had from the late 1990s… (a publication even someone in North Korea could download off the web today, a scan of a paper printout from which, with enough work, you could work out the values from the graph to reasonable accuracy). They gave me the runaround and quietly closed my FOIA request after a year. There are people in the Navy whose whole job is to stonewall the public on FOIA.
Then I also made a request to some researchers at USC, my actual “alma mater” so to speak (I hate institutions, but anyway they’re who I did my PhD with), who were studying **elephant seals**, not even humans. They published an article in which they actually signed a paper saying their data was open and would be provided on request. They basically politely said “f** you if there’s nothing in it for us” and I never got anything from them. I have also requested data on percentage body fat and body measurements from researchers who had a paper saying their data was open on request… Nothing.
Open on request means “F*** YOU”, that’s what it means. I have never ONCE gotten open-on-request data. It’s either on a repository with a link to download, or it’s not available, period.
Oof, yea, it’s really an awful reflection of the incentive structures in academia in our age isn’t it? I aim for transparency and reproducibility in everything that comes out of my group, but we’re not perfect either. From the standpoint of advising graduate students, there’s definitely extra cognitive load when you insist on trying out more rigorous analyses, workflows, etc. Students look around at Mr and Mrs Everybody Else having an easier time of it by just following the herd on whatever subgroup ‘professional norms’ exist in different fields. “Well if some canned ANOVA out of SAS plus post-hoc NHST story time and irreproducible data are good enough for them why not for me?”
Careful work ends up as one line in your CV the same way as sloppy work. While thoughtful prospective employers might sift through that at *later stages* of an interview, to even get in the door you have to play the quantity game first.
It’s really a shame, but I don’t know what else to do other than insist on somewhat higher standards myself, and then wait for a critical mass of us to reach positions of power, hopefully without losing our way first…
To learn the answer to that, I may have to follow the next 25 cohorts of students I teach for the next 50 years!
Seriously though, I suspect it’s a mixture of all the factors you mention. There are probably a lot of researchers around now to whom data sharing is a totally foreign idea (i.e., a cohort effect), and it also can be damaging to the length of one’s list of publications to be both rigorous and transparent (i.e., a selection effect).
I recently had the joy of finding that one of the major publishing firms had “lost” all copies of the supplemental information for a paper from the early 2000s that I was interested in. They claimed that the publisher had changed, but that was just dishonest: the original paper specified by name where it could be found, and that very same publisher then said it had been lost and that I should ask the authors. The authors did a good job of publishing all of the relevant data in the supplemental information. The multi-million (billion?) dollar company that took their paper (and copyright) then proceeded to lose that information within a couple of decades.
Why would we expect researchers to keep their data around and accessible when the journals themselves (whose sole job is to publish and distribute the results) can lose it?
When I review manuscripts for publication, I often state that the de-identified data should be included in the supplemental information or otherwise be immediately accessible (along with the statistical scripts that they used to do the analysis). If I saw them just say “no” to data sharing, I’d feel completely justified in just saying “reject” to their manuscript (and all revisions/resubmissions).