Proposed new EPA rules requiring open data and reproducibility

Tom Daula points to this news article by Heidi Vogt, “EPA Wants New Rules to Rely Solely on Public Data,” with subtitle, “Agency says proposal means transparency; scientists see public-health risk.” Vogt writes:

The Environmental Protection Agency plans to restrict research used in developing regulations, the agency said Tuesday . . . The new proposal would exclude the many research studies that don’t make their raw data public and limit the use of findings that can’t be reproduced by others. The EPA said this would boost transparency. . . .

The move prompted an uproar from scientists who say it would exclude so much research that the resulting rules would endanger Americans’ health. Ahead of the announcement, a coalition of 985 scientists issued a statement decrying the plan.

“This proposal would greatly weaken EPA’s ability to comprehensively consider the scientific evidence,” they said in a letter issued Monday. The group said the EPA has long been very transparent in explaining the scientific basis for decisions and that requiring public data would exclude essential studies that involve proprietary information or confidential personal data. . . .

The administrator made his announcement flanked by two lawmakers who introduced that legislation: Sen. Mike Rounds (R., S.D.) and Rep. Lamar Smith (R., Texas).

Mr. Smith has argued that confidential data such as patient records could be redacted or given only to those who agree to keep it confidential.

Scientists have said this sort of process would still exclude many studies and make others costly to use in regulation. Gretchen Goldman, research director for the Center for Science and Democracy, has said studies are already rigorously reviewed by scientific journals and that those peer reviews rarely require raw data to assess the science.

Richard Denison, lead scientist at the Environmental Defense Fund, said the rule could exclude studies that track real-life situations that it would be unethical to reproduce. He gave as an example the monitoring of the Deepwater Horizon oil spill in the Gulf of Mexico in 2010.

“The only way to reproduce that work would be to stage another such oil spill, clearly nonsensical,” he said in a statement.

As for providing all the raw data, Mr. Denison said that would prevent the use of medical records that must be kept confidential by law.

The American Association for the Advancement of Science, the world’s largest general scientific society and the publisher of the journal Science, said the rule would also exclude many studies that rely on outside funders, because they sometimes limit access to the underlying data.

Daula expressed this view:

If journals required data and code to replicate then it wouldn’t matter. Having a big player demand such transparency may spur journals to adopt such a policy. Thoughts? Controversial politically, but seems in line with ideas advanced on your blog.

I have mixed feelings about this proposal. Overall it seems like a good idea, as long as exceptions for special cases are carved out.

1. Going forward, I strongly support the idea that decisions should be made based on open data and reproducible studies.

2. That said, there are lots of decisions that need to be made based on existing, imperfect studies. So in practice some compromises need to be made.

3. Regarding the example given by the guy from the Environmental Defense Fund, I don’t know how the monitoring was done of the Deepwater Horizon oil spill. But why can’t these data be open, and why can’t the analysis be reproducible?

4. There seems to be some confusion over the nature of “reproducibility,” which has different meanings in different contexts. A simple psychology experiment can actually be reproduced (although there’s never such a thing as an exact replication, given that any attempted replication will include new people and a new context). In some examples of environmental science, you can re-run a lab or field experiment; in other cases (as when studying global warming or massive oil spills), there’s no way to replicate. But the data processing and analysis should still be replicable. I haven’t seen the proposed EPA rules, so I’m not sure what’s meant by “limit the use of findings that can’t be reproduced by others.”

I’d hope that for a study such as the Deepwater Horizon monitoring, there’d be no requirement that a new oil spill be reproduced, but it does seem reasonable for the data to be fully available and for the data processing and analysis to be replicable.

5. I’m disappointed to see the research director for the Center for Science and Democracy saying that studies are already rigorously reviewed by scientific journals and that those peer reviews rarely require raw data to assess the science.

No kidding, peer reviews rarely require raw data to assess the science! And that’s a big problem. So, no, I don’t think the existence of purportedly rigorous peer review (if you want an environmental science example, see here) is any reason to dismiss a call for open data and reproducibility.

Also, I’d think that any organization called the “Center for Science and Democracy” would favor openness.

6. I can understand the reasoning by which these science organizations are opposing this EPA plan: The current EPA administrator is notorious for secrecy, and from newspaper reports it seems pretty clear that the EPA is making a lot of decisions based on closed-door meetings with industry. But, if the problem is a closed, secretive government, I don’t think the solution is to defend closed, secretive science.

7. Specific objections raised by the scientists were: (a) “requiring public data would exclude essential studies that involve proprietary information or confidential personal data,” and (b) “rule would also exclude many studies that rely on outside funders, because they sometimes limit access to the underlying data.” I suppose exceptions would have to be made in these cases, but I do think that lots of scrutiny should be applied to claims based on unshared data and unreplicable experiments.

58 thoughts on “Proposed new EPA rules requiring open data and reproducibility”

  1. You’re not seriously this naive, right?

    The goal here is not to promote sound science, it’s to give the EPA an excuse to ignore science. This is being done deftly, so that “useful idiots” in the scientific community will support the regulations, but have no doubt about the underlying motivation. This is straight from a long-running Republican playbook to insist on “sound science” (who could be against sound science, that’s obviously good right?) as a way to discard uncomfortable truths. See, for example: https://www.washingtonpost.com/archive/opinions/2004/02/29/beware-sound-science-its-doublespeak-for-trouble/8e4aaeed-f918-4cc1-8508-3e38b3cfb613/?utm_term=.af08563c865b

    • Suppose what you’re saying is true, it could easily be remedied by the researchers making their data public.
      The science is funded by the public, the EPA regulates the public, yet the data aren’t publicly available. How does this make sense?

      • The data gathering may or may not have been government funded but if it was, what government?

        Public release of some data sets might be illegal in the country where it was gathered.

      • 1) The EPA funds almost nothing. It’s perfectly reasonable for the NSF/NIH/etc. to require release of data in return for funding. But the EPA has little to no influence over studies before they’re conducted. The EPA is just responding *after the fact* to studies funded and conducted by others. Maybe there’s some pressure on scientists to do their work differently so the EPA won’t ignore them, but I doubt it.

        2) Many of these studies were conducted in the past. The authors may well be dead. So we just ignore that?

      • If the data in question is health-care data that’s been de-identified to HIPAA standards, then is that data ‘raw’; or if the data contains personal health data that is used under an IRB and not possible to release publicly, then is this data usable?

        These are details worthy of discussion.

    • Joe:

      See my point 6 above. There could well be good reasons to oppose this EPA policy. But, if so, I think the opposition should be on the substance of environmental policy, not on a general defense of peer review.

      • It’s a general defense of allowing the scientific community to police itself, not of peer review per se. The EPA should not be in the business of setting its own standards on these matters.

        The same sort of blanket seizure of authority by bureaucrats could be used to, say, ignore Bayesian studies (“The EPA does not wish to rely on studies that are heavily influenced by researcher preconceptions, known in scientific jargon as “informative priors.” As such, the EPA will only use frequentist studies.”) or studies done by non-American researchers (“The EPA does not wish to allow Russia and Iran to dominate our environmental policy…”).

        • Alan:

          If a government official (I prefer to avoid the loaded term “bureaucrat”) proposes a regulation opposing informative priors, or proposes to ignore studies performed outside of the United States, I would oppose such policies directly. If a government official proposes a regulation encouraging open data and replicability, I would support that. As noted in my post above, much depends on the details.

        • But that’s the problem. This is a process issue not an outcome issue. Scientists, rather than government officials, ought to be setting the standards for good science.

        • “Scientists, rather than government officials, ought to be setting the standards for good science.”

          Why?

          1) They do not own science
          2) They get paid by the government/tax-payer, if I am not mistaken, a lot of the time
          3) They perhaps have shown not to be able to set the standards

          It seems to me scientists always blame everyone but themselves. And to me they seem to feel little responsibility or take accountability. I don’t like that. don’t like that at all.

        • Anonymous, I empathize with your three points, especially the third. I read a brief analysis by Frank von Hippel of the 1975 Reactor Safety Study [RSS], in which it took a protracted process [4 years] for its errors, initially in the Executive Summary, to come to light and be acknowledged by its authors. Von Hippel highlights the challenge of getting scientists’ commitment and the review process aligned. The question is what the institutional safeguards and oversight mechanisms are. Non-governmental organizations play a significant role.

          Of course NAS and AAAS organizations attempt to discuss the accountability question. There has always seemed some subset that try to raise critical scientific questions. It does seem that some strides have been possible. It takes sustained public awareness and education that includes public participation.

        • Agreed. That’s what the Founders thought. Congress would hold public hearings and scientists would be invited by competing sides to address the issue. Congressmen as representatives of the people would then decide what the law should be. If the people didn’t like it or if the science became more compelling they voted in new congressmen and got a new law. This was too much work for congressmen and entirely too much risk. They’d rather spend their days getting free meals, being pelted with lavish donations and stumbling upon naughty girls in the woods after being directed that way to look for an errant ball by the lobbyist driving the cart at an afternoon golf outing (true story). So, they created the agencies not to improve the decision-making process but rather to hide it from the people.

        • Government officials should be setting the standard for how policy is made; that’s the issue, not whether the science is “good”.

        • Yes, especially since most of the gov’t officials will not have the expertise to evaluate the quality of the research.

        • That this is being introduced by Rep. Lamar Smith sets a very strong “prior” for me that the motivation is malevolent and that the details will indeed be such that the outcome is opposite to what I think you would want.

          For anyone not familiar – Lamar Smith is this piece of work:

          http://www.latimes.com/business/hiltzik/la-fi-hiltzik-lamar-smith-20171103-story.html

          http://www.sciencemag.org/news/2017/11/lamar-smith-departing-head-house-science-panel-will-leave-controversial-and-complicated

  2. This is a very unfortunate example regarding open data. The EPA (more specifically, its leadership) has a clear political agenda which is served by requiring “open data” as a means to, as Joe says, disregard science as much as possible. Much data the government collects is protected (for any number of reasons, both good and bad, mostly bad in my opinion) and so could not be used if open data is required. It handcuffs EPA researchers. It also puts many of us in the position of arguing against open data policies in order to support better policy-making. As I said, it is unfortunate that this issue has been set up this way. I’d prefer to support open data AND support EPA research based on the best data they have available, even if it cannot be publicly released. That option is being taken away – just another in the current administration policies that I happen not to support. Just a reminder that I am a pawn.

  3. My reading of this is that the elephant in the room is the large mass of rules based on old data. The study of effects of lead, or ground-level ozone, or particulates, or radon, or effects of pollution on wetlands, or – well, just look at the major regulations, and just about all of the big ones were passed years ago, usually based on lots of data. Ultimately, Pruitt or some other person might want to use this requirement to kill or weaken these old rules, since almost all the data supporting them is not available, for a variety of reasons. It would be redundant and wasteful to do these studies again. And some of them, perhaps most, are impossible to do, because they would be immoral, given what we know.

    Will we really discourage people from keeping data private by retroactively denying ourselves the value of studies done long ago?

    Going forward, I can see where EPA could encourage open data, but I think they need to continue to use whatever is available, weighing it appropriately if it looks like the data are suspect. But sometimes that just isn’t an issue. And, as has been pointed out, they do not support much of the research. Those who support it should be the ones to enforce openness requirements.

  4. > It also puts many of us in the position of arguing against open data policies in order to support better policy-making.

    Hmmm. That would suggest that you don’t really support open data policies.

    1) Most readers of this blog think that data/code should be more open/reproducible.

    2) Most readers recognize that policy changes are one way to accomplish this. So, for example, we all (?) think that journals like JASA should require open data/code for publication (with some allowance for special cases).

    3) We realize that, even the most desirable policy will have trade-offs. Miranda warnings are good (I think) even though they might, on occasion, result in a guilty criminal going free.

    4) You seem to be against open code/data if such a rule leads (always? sometime? just once?) to a public policy outcome you don’t like. Is that really your position?

  5. One might wish to consider the consequences of requiring an agency to hold off on regulating potentially toxic substances until after they have achieved a quality of scientific research that none of the Ivy League universities or likely any other institute seemed to have yet achieved. Especially an agency which I believe has had its funding reduced. Increasing transparency usually requires more funding.

    Often, the firm to be regulated was obligated to do the studies and they often do not want their competitors or even university researchers to have access. Other agencies and other countries’ agencies sometimes do share such data and reproduce and critique each other’s analysis.

    Some agencies will not use, or will only very cautiously use, academic studies, sometimes even when they do have access to the raw data (as there may not have been adequate data quality control and there is now no way to ensure its accuracy). In such cases, I would argue, the quality of scientific research is much higher than in academia.

    One might also wish to recall the success of the Tobacco Lobby’s response to scientific evidence – we are not going to disregard it but rather ask for better research, in fact we will fund some of it. What could have gone wrong with that?

  6. For patient privacy reasons, “raw” individual health records will never be routinely available. So, by this policy the EPA effectively excludes all research that includes impacts on human health. Which the petroleum companies currently running the EPA will love.

    • Have you ever been involved in navigating the privacy hurdles of doing a huge epi study of a refinery workforce spanning decades? I have. It has been done many times and several are currently underway. I assure you that they get far more scrutiny than the studies routinely discussed here. Going in you know that multiple state and federal agencies are going to be looking over your shoulder and you know that all of your employees with important roles in the study will be deposed by lawyers incentivized by the possibility of a huge payday to uncover methodological flaws or shenanigans.

      The reason many people don’t like petroleum worker studies is that they don’t like the results. They always wind up having to argue that the petroleum industry has developed a magical formula whereby its members can screen out all but the super-duper healthy workers with unusual innate defenses against carcinogens and cardiovascular disease. The real answer – high paying jobs, on-site health interventions including smoking cessation programs begun in the 1960s, social interventions to elevate SES, work involving lots of getting up and moving around, a benign (or, if you’re a hormesis enthusiast https://www.nature.com/articles/srep43361 beneficial) environment – is one that many people just can’t stomach.

      • You could not have possibly missed my point by a farther mark. I’m not at all talking about the health of petroleum industry workers – but rest assured that the current administration will be reducing their regulatory oversight on that front!

        Rather, by requiring completely open data, the new EPA policy COMPLETELY eliminates human health as something that can be considered in EPA policy decisions. It eliminates health consequences at every level and as potentially affected by EVERY industry. For privacy reasons, individual human health data will NEVER be open. For this administration, anonymized aggregate data will fail the open raw data test as well (because the underlying raw data for the aggregates are not available), so health will no longer be admissible in EPA decisions.

        Andrew has totally fallen into the trap that Pruitt laid for him and other open data advocates who do not routinely work with human health data.

        • There was discussion here not too long ago indicating most patients who participated in such studies did not at all mind their data being “open”, but in fact assumed that was the case and once they learned the truth were horrified that it wasn’t.

          I can’t find the thread at the moment unfortunately. Anyway, in that case the problem with regards to health data lies elsewhere.

        • Maybe there will be a HUGE regulatory change regarding patient data privacy – but at the moment, all the changes are towards GREATER patient privacy. And Pruitt knows this – that’s why he’s pushed this change to EPA policies. If the EPA cannot base its policies on data that is not completely open, then human health is completely excluded, which is what his puppet-masters in polluting industries want.

        • I really don’t think it would require a huge change. Vast databases of medical data already exist somehow (see the other comment about SEER).

        • So you’re saying that somehow the EPA will no longer be able to rely on the results of studies examining anonymized data (anonymization being something the Census Bureau does routinely and w/o inducing outrage), e.g. industrial hygiene monitoring, years’ worth of PFTs (pulmonary function tests), X-rays and in some cases paraffin blocks and IHC-stained slides used to justify ultimate diagnoses, drawn from my client’s employees’ data? What language can you cite for such a claim? If it is true then everyone, industry included, would oppose the change. By the way, are they shutting down the National Death Index too? And will there be no more SEER reports? And what of the CDC? Is there some nefarious plan to end reports of listeriosis outbreaks because they are, by necessity, drawn from human health records?

        • > And will there be no more SEER reports?

          This is a great point. How exactly does SEER exist in the universe these people think they live in?

          Obviously the real obstacle with open medical research is not legal, it is that for some reason the researchers are not getting waivers to release the information or something like that.

  7. Riffing off a somewhat recent topic, the real debate is whether EPA should be required to justify its alpha or even say what it is. Recall that the EPA was born at a time when the reigning paradigm was that the key to defeating infectious diseases had been found and that man-made diseases, especially cancer which was widely assumed to be “100% environmental”, would going forward be the driver of human morbidity/mortality. Couple that with the LNT model that assumes even one atom of arsenic or one fiber of asbestos carries certain death and it’s hardly a surprise that EPA began its assessment of evidence with a very high tolerance for false positives.

    Once it became clear that EPA would accept almost anything as evidence and would overweight anything that even suggested a risk, EPA became a potent political tool. I’ll give you one instance in which I was a spectator (with names and facts modified to protect the guilty and the innocent). Paper company “A” was competing with paper company “B”. Paper company “B” had a new process and was beating “A” with it. “A”, trying to figure out the new process, sampled the waste water “B” discharged into the local river. They couldn’t reverse engineer the new process but they did detect a chemical fingerprint different from their own. Then “A” had a commercial tox lab expose two dozen rats to the chemicals specific to “B” at very high concentrations. Thanks to the magical Zymbal gland a couple of the rats developed tumors. “A” then got their congressman to sic EPA on “B”’s process. “B” hired the state university’s health sciences group to do a tox study on its effluent and to review the literature re: the chemicals at issue. The money came with no strings attached, no right to oversight, method selection, paper editing or publication veto. The outcome was very good news for “B” on the science front.

    The EPA however weighs evidence in secret on scales calibrated by politicians and it refuses to state its alpha. The result was victory for “A” and an enormous expense imposed on “B” which happily bought “A” time to make its process more efficient.

    Now, if everything is political and it’s all just will-to-power then EPA is functioning precisely as intended. But if there is objective truth to be approximated and lives saved or dollars not wasted thereby then it might be worth pondering the words of a former EPA head that came some time after an independent group of scientists had audited the EPA’s “science”: “Scientific data have not always been featured prominently in environmental efforts and have sometimes been ignored even when available.” From that you might well infer that the wails and laments of “science” being somehow outlawed at EPA are cynical at best.

    • It seems to me that neither many politicians nor public have sufficient expertise, time, nor, more broadly, learning disposition that can address the challenges listed here.

      Subsets of scientists who may have the time and inclination have made and can make valuable contributions. My observation is that, at least in the Obama administration, that was more likely, with John Holdren and Richard Meserve as two examples, who advised the Executive branch and appeared before Congressional committees.

      There is a need for an oversight body. I thought that the Office of Science & Technology functioned as a conduit for further deliberation of environmental and technological risks. I gather there is a vacancy still in this administration.

      • We live in a finite world of finite resources that is populated by people with a seemingly infinite variety of needs, desires and ambitions. The market mediates some of the resulting conflicts but it creates others. You don’t want a rock knocked into your windshield to turn the glass into flying daggers and so are happy for the plastic that lies between its two glass sheets. However, someone has to find and refine the oil that is sent to the chemical plant that synthesizes the plastic that is sent to the glass factory where the windshield is made. If the rig, refinery, chemical plant or factory sits just beyond your back fence you might be forgiven for forgetting about the blessings of modern glass given the smells and noise – especially if you fear they might be impacting your health.

        The varying competing interests must be balanced and doing so is always a political act. Even assuming arguendo that “science” could accurately measure the risks and estimate outcomes (e.g. 10 fewer eyes lost with modern glass versus 10 fewer cases of hearing loss without the glass factory) deciding which one is less bad and by how much will depend on a populace that varies considerably more than Student’s barley.

        • Thanatos,

          Great points. That doesn’t mean that we shouldn’t advance the best thinking and processes rather than just uphold the status quo. As I quip often, this is a battle between good and better thinkers. And in this age of AI and machine learning, I have hope that the better thinkers’ time has come, despite the kabuki theater that substitutes for substance.

  8. This rule is essentially stating a study without fully open data provides zero information relevant to regulatory decision making. This is an incredibly strong statement to make. The availability of the data could (and probably should) be one factor to consider when weighing the value of a study, but it should not be THE factor.

    Also, a lot of environmental research is based on sensitive data that should not be made public. Many wildlife biologists do not release location data over concerns that poachers, pet trade collectors, or other bad actors could take the information and wipe out entire populations of rare species. (This sort of research is more often the purview of the US FWS than the EPA, but not always).

  9. These comments are very interesting. Open data is a goal to be achieved, not a reason to throw out previous work in total. This sounds like a clever way for an awful person to abuse current buzzwords like reproducibility.

    Much of the EPA research is done with standardized protocols for field, in-vivo and in-vitro toxicology responses. Other research is chemical detection with other protocols. The concerns of peer review are methodological (use of reagents, containers, preservatives, etc.), not just the end result. Also summary data is usually available when raw data isn’t. This is not high-variability social science research.

  10. I think you are missing the point Andrew. Decisions are going to be made. Regulate this, don’t regulate that. Do this, don’t do that. The EPA has to make a call of what to do, even if that is nothing. The alternative to using imperfect data is to use no data. Imagine you have a disease with two possible treatments. One has imperfect evidence of good effect, and the other is homeopathy. Are you going to flip a coin between the two just because the data on the first treatment is not public?

    Even ignoring the obvious bad faith that the administrator is operating under, I think it is bad science to not take all evidence into account when you have to make a call. The place to fix open access / data is on the research generation / funding / publication side.

    • Ian:

      Yes, perhaps better would be to say that they are grading the evidence on different scales. Open data and replicable is level 1, all else is level 2. Level 2 evidence can be relevant when there is insufficient level 1 evidence.

    • > Imagine you have a disease with two possible treatments. One has imperfect evidence of good effect, and the other is homeopathy. Are you going to flip a coin between the two just because the data on the first treatment is not public?

      Most treatments come with pretty severe yet unstudied long term side effects, the medical literature is extremely biased in terms of “discovering” an effect, the treatment is probably very expensive, etc. So I would go with homeopathy (treating this as placebo), or more likely something like exercises and dietary changes. From what I’ve seen, that is also where most people are turning these days as their faith in the medical establishment fails.

      Here is some interesting reading about coughing after taking an ACE inhibitor (blood pressure med):
      https://www.peoplespharmacy.com/2010/04/26/when-will-doctors-pay-attention-to-an-ace-cough/

      The next obvious step is to look up whether coughing may lower blood pressure… Apparently it does: https://www.ncbi.nlm.nih.gov/pubmed/16051114

    • Ian:
      > The place to fix open access / data is on the research generation / funding / publication side.
      Agree, why put the onus on an agency that is supposed to protect the health of Americans (that is currently underfunded?) rather than on institutions that are supposed to generate new knowledge? That was part of what my comment above was about.

      Now it probably is harder (legally?) to do that – but I believe less risky with more benefit.

  11. A lot of the comments about “but there are often good reasons not to release your data!” strike me as unconvincing, with the exception of actual medical studies on individuals (and even there surely something could be released).

    Scientists have proven to be very bad at policing themselves; given the EPA’s reliance on science, it shouldn’t just shrug its shoulders at “well we don’t have any better studies available”, it should actively work to improve things. Telling scientists their studies will go in the trash without open data seems a pretty good kick in that direction.

    • > Telling scientists their studies will go in the trash without open data seems a pretty good kick in that direction.
      Many agencies have taken that position or at least without access to the data the findings of your study will be set aside or at most treated as just supportive (i.e. won’t stand on its own). I believe this is what should be done.

      I think if you look through enough EPA public documentation you may see that they have done that on some occasions. Certainly the firms being regulated often argue for that (as they should).

      I have yet to hear many academic groups suggesting such action would provide strong incentives for them to change.

  12. Article in The Chronicle of Higher Education (online, Wednesday, April 25, 2018)

    Why Researchers Shouldn’t Share All Their Data

    By Nathan Schneider

    Too much self-exposure might compromise a career. It might also muddle one’s message.

    It is pay-walled and I have not read it but some people seem to think it is a good idea not to share data.

    > Telling scientists their studies will go in the trash without open data seems a pretty good kick in that direction.
    A point I made earlier. Not all science is done in the USA.

    To make up an extreme example: Telling a Nobel prize winner who lives in France and who may be legally bound by EU privacy law that his or her study is being trashed by the EPA is not likely to make the data available.

    • I can’t find any sign of the Chronicle article – can you provide a link? I don’t have access behind a paywall either, but the quoted line is exactly the type of reason I would reject. The exposure of data that might threaten a career is a symptom of the poor incentives in the current system. Careers should be enhanced by providing open data, not threatened by it. Of course, as I and others have said, this is not really the issue with the current EPA thinking. I have little doubt that the EPA is pursuing a political agenda rather than trying to establish a principled policy for scientific research.

    • I had a quick read, it seems thoughtful and addresses creative work and journalism as much or more than scientific studies.

      I liked this excerpt – “for those accustomed to harassment online, the call for openness feels like a call to invite more harassment. “The only way to handle this sort of problem properly”, Dash contends, “is by explicitly placing consent and safety over openness and transparency”. Dash also questions whether dumping vast amounts of information online counts as transparency in the first place: “What you wind up with is a company that produces so much unorganized, uninteresting and irrelevant data that you can’t find meaningful information.””

      • I finally located the article – it was published on April 8, not April 25. But, I don’t have access, so I can only comment on the excerpt you liked. Providing data is certainly not a panacea. Yes, it can be meaningless and overwhelming and it can promote harassment (methodological terrorists, anyone?). But these are poor excuses to justify keeping data secret – and that path rapidly leads to justifying the status quo, with peer review, esteemed editors, “reputable” institutions, citation counts, etc. as the best we can do. After all, all of these institutions were established in part to elevate scientific research onto the pedestal upon which it sits. I think the replication crisis should call for some radical (i.e., non-incremental) changes to that system. Open data is far from a sufficient solution, but I think it is an important step that is necessary, given the difficulties of changing the existing power structures we have established.

    • The US is a very litigious country. Nearly everything is vetted by lawyers. As it is, in the sciences and social sciences we turn many debates into fights, which also complicates the regulatory environment. We need to consider better collaboration too. There appeared to be greater effort in this regard during the Obama and Bush administrations. Of course the Replication Crisis, so billed, has accelerated transparency efforts. From my experience, Congress is surely aware of it.

  13. Andrew – as many have noted, this is an attempt to make it harder for the EPA to regulate various toxic substances, and it is often either unethical or illegal to make replication data related to health available in raw form, even if deidentified. I clearly support openness but it can be impossible in practice. Also note that the EPA’s proposed rule is the child of the Honest Act of Rep. Lamar Smith (R., Tex.). Would I be cynical to note that Rep. Smith is a leading climate change denier in the House? Science Magazine ran what is, I think, a very sane and short discussion of the Honest Act; those interested can check https://preview.tinyurl.com/ya86qpxo (which is a “safe” tiny url). Thanks.

    • Thanks for the link, I raised the following as a possible concern above – “Polluters and manufacturers of dangerous products have taken a page from the tobacco industry playbook, magnifying those uncertainties to prolong the review of scientific data, slow the regulatory process, and evade liability.”

  14. I think that everyone is missing an enormous point. One big source of data that the EPA depends on is industry data, which is proprietary. (This is true for the FDA as well.) There are a lot of questions related to cost/benefit, how to clean up sites, etc., that only the industry has the data to answer. And they are not going to generate that data if it has to be open. Currently, the EPA can see the data and have their scientists review it, without the data being open. After Pruitt announced this rule, he realized that industry data would also be excluded. It has been reported that in response, the rule was revised to give him the power to make exceptions. But the power to make exceptions will essentially give the EPA director a blank check to take into account whatever industry data he wants to. That is an inherently corrupt, unscientific process. If you think that we should just ignore industry data if it is not open, that is hopelessly naive. Who is going to know the best, safest way to cap a well? It is going to be some scientist at ExxonMobil, not an academic. What if you are a company that has developed a new system to remove pollutants from factory smoke stacks? Your data shows that your system works and produces more benefits than costs, but the industry doesn’t want the added cost. Under these new regs, Pruitt can just choose to ignore the data since it is proprietary.

  15. John Ioannidis has a different perspective on this rule, which is also worth reading: http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1002576

    “Making scientific data, methods, protocols, software, and scripts widely available is an exciting, worthy aspiration. Government-based regulatory and funding incentives can be instrumental in making this happen at large scale. However, we should recognize that most of the raw data from past studies are not publicly available. In a random sample of the biomedical literature (2000–2014), none of 268 papers shared all of their raw data. Only one shared a full research protocol. The proportion of studies that have had all their raw data independently re-analyzed is probably less than one in a thousand. The number of studies that have been exactly replicated in new investigations is quite larger, but still a minority in most fields. A new standard currently proposed for the Environmental Protection Agency aims to ban the use of scientific studies for regulatory purposes unless all their raw data are widely available in public and can be reproduced. If the proposed rule is approved, science will be practically eliminated from all decision-making processes. Regulation would then depend uniquely on opinion and whim.”

    • If the EPA can basically outlaw use of the “scientific” literature by saying there must be replication studies performed, I think it’s just revealing of how farcical most of what was presented as science has been. There is a totally undeserved authority there. Also, if this scares you, NHST should scare you at least as much:

      “If [NHST is adopted], science will be practically eliminated from all decision-making processes. Regulation would then depend uniquely on opinion and whim.”

      NHST is actually measuring the collective opinion of a field, that is the only thing keeping it from generating obvious nonsense for the most part. You are guaranteed to generate “evidence” for anything if enough time/money is spent on it. If enough people don’t like the results they always have an excuse to reject the evidence, since it is always based on faulty reasoning (eg Bem’s studies were following standard procedure).

    • > If the proposed rule is approved, science will be practically eliminated from all decision-making processes.
      Not sure John is all that familiar with what happens within a regulatory agency like the EPA.

      Most of the studies are conducted by industry under legal obligations and/or requests from the agency. There are requirements that all the records and raw data are forwarded or at least subject to audit. Those who conducted the study and analysis are often questioned in person and asked to clarify and even do new, different analyses.

      The vested interests are strong here but they are well understood and mitigated with legal penalties. Sure, there are always going to be some bad apples. So I believe it’s actually much better science than what happens in academia, except it’s confidential.

      I don’t believe journal publications are a science anymore, or at least they are a very weak science. Many agencies already disregard published journal articles if at all possible, and when it is not possible they are often considered supportive rather than _real_. I believe that is usually sensible.

      Now, confidential does not mean impossible for others to verify but rather that _the_ others are limited. Sometimes other agencies will review and reproduce the assessment to ensure its quality. I see nothing wrong with a court subpoena to obtain study data as long as those reviewing it are required to prevent personal disclosures. I am a supporter of random audits of studies, and some universities are starting to do that (e.g. University of Toronto for compliance with ethical requirements, which includes adequate data management).
