Provisional draft of the Neurips code of ethics

This is Jessica. Recently a team associated with the machine learning conference Neurips released an interesting document detailing a code of ethics, essentially listing concerns considered fair game for critiquing (and possibly even rejecting) submitted papers. Samy Bengio, Alina Beygelzimer, Kate Crawford, Jeanne Fromer, Iason Gabriel, Amanda Levendowski, Inioluwa Deborah Raji, and Marc’Aurelio Ranzato write:

Abstract: Over the past few decades, research in machine learning and AI has had a tremendous impact in our society. The number of deployed applications has greatly increased, particularly in recent years. As a result, the NeurIPS Conference has received an increased number of submissions (approaching 10,000 in the past two years), with several papers describing research that has foreseeable deployment scenarios. With such great opportunity to impact the life of people comes also a great responsibility to ensure research has an overall positive effect in our society.  

Before 2020, the NeurIPS program chairs had the arduous task of assessing papers not only for their scientific merit, but also in terms of their ethical implications. As the number of submissions increased and as it became clear that ethical considerations were also becoming more complex, such ad hoc process had to be redesigned in order to properly support our community of submitting authors.

The program chairs of NeurIPS 2020 established an ethics review process, which was chaired by Iason Gabriel. As part of that process, papers flagged by technical reviewers could undergo an additional review process handled by reviewers with expertise at the intersection of AI and Ethics. These ethical reviewers based their assessment on guidelines that Iason Gabriel, in collaboration with the program chairs, drafted for the occasion. 

This pilot experiment was overall successful because it surfaced early on several papers that needed additional discussion and provided authors with additional feedback on how to improve their work. Extended and improved versions of such ethics review process were later adopted by the NeurIPS 2021 program chairs as well as by other leading machine learning conferences. 

One outstanding issue with such a process was the lack of transparency in the process and lack of guidelines for the authors. Early in 2021, the NeurIPS Board gave Marc’Aurelio Ranzato the mandate to form a committee to draft a code of Ethics to remedy this. The committee, which corresponds to the author list of this document, includes individuals with diverse background and views who have been working or have been involved in Ethics in AI as part of their research or professional service. The first outcome of the committee was the Ethics guidelines, which was published in May 2021.

The committee has worked for over a year to draft a Code of Ethics. This document is their current draft, which has not been approved by the NeurIPS Board as of yet. The Board decided to first engage with the community to gather feedback. We therefore invite reviews and comments on this document. We welcome your encouragement as well as your critical feedback. We will then revise this document accordingly and finalize the draft, hoping that this will become a useful resource for submitting authors, reviewers and presenters.

The idea that AI/ML researchers need to take certain values (fairness, environmental sustainability, privacy, etc.) seriously so as to avoid real-world harms if their tools are deployed has been pretty visible in recent years, but these conversations have been mostly “opt-in” until recently. Documents like the code of ethics and requirements like broader impacts statements attempt to introduce values that haven’t been baked into the domain the way that caring about scalability and efficiency is. Naturally things get interesting. It’s hard to deny that knowledge of how to evaluate or deliberate about ethics is sparse (not to mention knowledge of how to institute guidelines). Some adamantly oppose any ethics regulation on this ground, or for other reasons, e.g., arguing for instance that tech is neutral or that the future is too unpredictable to evaluate work on downstream consequences. Documenting what constitutes an ethics concern at the organization level requires deciding where to draw lines and what to leave out, in a way that individual papers about ethical concerns don’t have to contend with. I’ve heard some people refer to all this as a power struggle between the more activism-minded critics and those who are content with the status quo they’ve benefitted from.

At any rate, among ML conferences, Neurips organizers have been pretty brave in stepping up to experiment with processes meant to encourage more responsible research, whether everyone in the community wants them or not. The authors of this doc took on a difficult challenge in agreeing to write it.

My initial reaction from reading the code of ethics is that if there is a power struggle around how big the role of ethical deliberation should be in AI/ML, this is a rather small step. It’s an interesting mix of a) what seem like basic research integrity suggestions that should not ruffle any feathers and b) Western progressive politics made explicit. I have more questions than strong opinions. 

The biggest question, as some of the public comments on the draft get at, is to what extent the guidelines can be interpreted as suggestions versus mandates. It seems reasonable to think that if Neurips is going to do ethics review, there should be some record somewhere of what kinds of things might lead to an ethics flag being raised and taken seriously in review, and the abstract above makes clear this was part of the motivation for the code of ethics. But being transparent about the content of the guidelines is not the same as being transparent about the intended use of the guidelines. I think the authors do attempt to clarify – they write for instance that “this document aims at supporting the comprehensive evaluation of ethics in AI research by offering material of reflection and further reading, as opposed to providing a strict set of rules.” At the same time, most Neurips authors know by now that it is within the realm of possibility that their paper submission could be rejected if there are ethics concerns that aren’t addressed to satisfaction, so naturally they are going to pay attention to the specific content.

From my read I expect that this code is not meant to have the same power as the ethics reviewers themselves; it’s a reflection of stuff they care about that can change over time but which will probably always be incomplete and secondary to their authority. Language used throughout the document makes it read like a tentative set of suggestions. There’s a lot of “encourage,” “emphasize,” and “to the extent possible,” implying it’s more like a wish list on the part of the Neurips organization. Though there are a few sentences that use slightly stronger terms, e.g., saying that the document “outlines conference expectations about the broad ethical practices that must be adopted by participants and by the wider community, including submitting authors, members of the program committee and speakers.” There’s also a reference to the code of ethics being meant to complement the Neurips code of conduct, which as far as I know is treated more like a set of rules that can get one reported, asked to leave events, etc. So it’s not surprising to see some confusion about what exactly its role will be.

When it comes to the specific content, many of the suggestions about general research ethics seem like fairly well-accepted best practices: paying participants (like in dataset creation) a fair wage, involving the IRB in data collection, trying to protect the privacy of individuals represented in a dataset, documenting model and dataset details (though I’m not sure templates are necessary, as they suggest, so much as making the effort to record everything that might be important for readers to know). 
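
On the documentation point, here’s a minimal sketch of the kind of record I have in mind (recording what readers might need to know rather than filling in a prescribed template). The fields and example values are hypothetical, not anything the draft code actually mandates:

```python
# A minimal, hypothetical dataset record -- illustrative only, not a NeurIPS-prescribed template.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    name: str
    collection_method: str          # e.g., how the raw data and labels were gathered
    consent_and_irb: str            # how consent was obtained / IRB status, if applicable
    compensation: str               # how annotators or participants were paid
    privacy_measures: str           # anonymization, access restrictions, etc.
    known_limitations: list = field(default_factory=list)  # populations or settings not covered

record = DatasetRecord(
    name="toy-face-attributes",
    collection_method="scraped public images, hand-labeled by paid annotators",
    consent_and_irb="no IRB review; images were publicly posted (a known limitation)",
    compensation="annotators paid above local minimum wage",
    privacy_measures="faces blurred in released thumbnails; raw images access-controlled",
    known_limitations=["skewed toward lighter-skinned faces", "limited age range"],
)
print(record)
```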

Suggestions about research ethics that struck me as slightly bolder include aiming for datasets that are representative of a diverse population and using disaggregated (by sub-population) test sets when reporting model performance. I guess fairness has graduated from being a sub-area of AI/ML to being an expectation of all applications. I wonder about a weaker suggestion that could have been made: that authors simply be more mindful in making claims about what their contributions are capable of (e.g., don’t claim it can detect some attribute from images of people’s faces if your training and test data only included images of white people’s faces). The idea that it’s not enough to ask for transparency and accurate statements about limitations, that instead we should evaluate the potential to do good for the world, is admittedly where I struggle the most with ethics reform in CS. The less value-laden premise that researchers should be transparent and rigorous in their claims seems at least as important to express as a value. Are those expectations stated somewhere else? I glanced at the code of conduct but that seems more about barring harassment or fraud.
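
The disaggregated-reporting suggestion is concrete enough to sketch. Here’s a minimal, hypothetical example of what reporting per-subgroup accuracy alongside the aggregate might look like; the group labels and numbers are made up for illustration, and real reporting would of course cover whatever subgroups and metrics matter for the application:

```python
# Minimal sketch of disaggregated (per-subgroup) accuracy reporting.
# Assumes y_true, y_pred, and groups are parallel lists; all names here are illustrative.
from collections import defaultdict

def disaggregated_accuracy(y_true, y_pred, groups):
    """Return overall accuracy plus accuracy computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Toy data: the aggregate number looks passable while one subgroup does much worse.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
overall, per_group = disaggregated_accuracy(y_true, y_pred, groups)
print(overall)    # 0.625
print(per_group)  # {'A': 1.0, 'B': 0.25}
```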

The second part of the code provides guidance on how authors should reflect on societal impact. It says that all authors are expected to include reflection on societal impacts in a separate paper section, so that reads like a requirement. The high-level idea that authors should, to the best of their ability, try to imagine ways that the technology might be used to cause harm does not seem contentious to me. Requiring sections in all papers on societal impacts does, however, raise the question of who they are for – readers, authors, both? When I first heard proposals a few years ago that AI and ML researchers need to be stating broader impacts of what they build, I wondered why anyone would expect such reflection to be useful, given how much uncertainty exists in the pipeline from research paper to deployment, and the fact that computer scientists have often taken very few humanities classes. Having paid attention to attempts to codify ethics over the last few years, I now have the impression that the motivation behind requiring these statements is driven more by a desire to normalize the idea of reflecting on ethics, i.e., to encourage awareness in a general way rather than to produce concrete factual statements. Still, it’s often hard in reading these proposals to tell what the ethics advocates have in mind, so how the expectations about downstream impacts are phrased seems important. You don’t want to imply predictability so much as an expectation that researchers will not intentionally produce tools for societal evil.

Related to this, early on the doc states:

In general, what is legally permissible might not be considered ethically acceptable in AI research. There are two guiding principles that underlie the NeurIPS Code of Ethics. First, AI research is worthwhile only to the extent that it can have a positive net outcome for our society. Second, since the assessment of whether a research practice or outcome is ethical might be difficult to establish unequivocally, transparent and truthful communication is of paramount importance.

“Positive net outcome” is a strong statement if you try to take it literally as a criterion we should use to judge all research in the near term. The second principle seems to acknowledge this, so I doubt the authors intend this phrase to be taken very seriously. Still, the wording does make me cringe a little, since it implies that there is some ultimate “objective” perspective on what is positive versus negative. And the word “net” makes it seem like something that’s at least theoretically measurable, like we can calculate the definitive goodness of a project, it’s just that we can’t observe it perfectly as humans. My mind inevitably goes to wondering how many research projects submitted to Neurips explore techniques that don’t feed directly into real-world deployments, but maybe help the researchers learn important lessons they apply in later projects, and could still be said to have opportunity costs in terms of taking effort away from what seem like more obviously beneficial applications (say, more efficient methods for diagnosing certain illnesses)… etc. Researchers can be pretty good at making up problems, or exaggerating problems we think we can solve, and we can learn from these projects, but the impact is sort of meh. I feel that way about some of my own work, actually! I would hate to have to weigh in on whether it was “net positive” or not. I don’t think the authors are advocating for thinking like this, but I guess I would prefer to avoid the positive/negative labels altogether and frame it more generally as an expectation that researchers will be careful in considering and acknowledging risks.

One guideline in the societal impacts section that stood out to me is worded less like a suggestion:

Research should not be designed for incorporation into weapons or weapons systems, either as a direct aim or primary outcome. We encourage researchers to consider ways in which their contribution could be used to develop, support, or expand weapons systems and to take measures to forestall this outcome.

It would seem from this that authors should expect mentioning weapons as an application to get their paper flagged for ethics review. I wonder here how the authors decided that weapons systems were off limits but not other uses of tech. I’m not disagreeing, but I’ve seen people call for bans on other types of applications, carceral technology for instance, so I wonder why only weapons systems are ruled out. In line with transparency, it would be great for Neurips to also publish more on the process for producing this document: what criteria did the group have in mind in thinking through how to write these guidelines, both in terms of what types of concerns to include and how to phrase the directives? I imagine there might be some lessons to be learned from reflecting on the process for creating guidelines like this, where there may have been disagreements, etc.

It seems like it could also be an interesting exercise to collect many versions of a document like this, representing what different members of the Neurips community think is worth explicitly citing as community guidelines that could inform issues flagged for ethics review. There are some comments about how the vision in the document doesn’t really capture an international perspective as much as a Western one, for instance. It makes me wonder more about the part of the abstract that says the committee includes “individuals with diverse background and views” – along what dimensions did the views differ?

15 thoughts on “Provisional draft of the Neurips code of ethics”

  1. I note criticism of the NIH on the basis that committee allocation of grants tended to disadvantage more risky proposals. I wonder what the effect will be on new ideas about applications of ML, or hypotheses about society inspired by the output of ML, if all papers outlining these ideas have to pass committees who score their fairness and societal impact? I wonder if people with a sideline in political activism would like to sit on these committees?

  2. Jessica:

    This indeed raises interesting questions. Doing an uncontroversial ethics review seems impossible, but ethics are important, so I can see why it makes sense to have some guidelines and to do this through an open process. If this all happens and papers start to get rejected for ethical reasons, I think these ethics review reports should be open so anyone can see the grounds on which a paper was rejected. I feel the same way about regular review reports.

    Regarding the bit about research on weapons systems, it seems reasonable to me for Neurips to restrict itself to peaceful applications. Not everyone will agree, though. In any case, research on weapons systems is going to happen anyway. I could see the logic of the Department of Defense organizing its own annual conference on the topic—or maybe such a conference exists already.

    • The ethics reviews last Neurips were open (I think all Neurips reviews are open; wish that were the case in the venues I publish in). Having read through some of the exchanges between the ethics reviewers and authors from last year, they seemed pretty productive/congenial. There did seem to be some confusion among the regular reviewers, though, about what could be flagged as an ethics concern (e.g., it wasn’t really meant for things like plagiarism).

  3. I was so sure this post would eventually describe a “code of ethics” that is actually code: an ML/AI approach for unbiased and transparent screening of 10,000 conference submissions for problematic ethics.

  4. I’m sitting here trying to figure out why dumb weapons systems are less harmful to society on balance than smart weapons systems. Can anyone help?

    I can think of scenarios where dumb weapons systems would be worse, and I can think of scenarios where smart weapons systems would be worse. But it’s far from clear to me which one is worse in general, let alone being so much worse that pursuing a shift from dumb to smart is unethical.

      • I’m not sure what AI research on dumb weapons would be. But to me there is some confusion here. If they want to focus on the consequences of the research, the research that creates some AI guidance system will have the consequence of making certain kinds of weapons better, e.g., self-guided killer drones, along with making other nonweapons better, e.g., self-guided emergency rescue drones. The applications of the research may have ethical implications, but that does not make the research immoral. We normally think of scientific or research ethics as being about honesty, transparency, and respect for research subjects. Mengele’s research was immoral, consequences be damned. Your political research could be used to help an evil politician get in power. How can we evaluate whether research will have a net positive effect, when it is impossible to ascertain all of its applications and which applications society will pursue?

    • John N-G said: “I’m sitting here trying to figure out why dumb weapons systems are less harmful to society on balance than smart weapons systems. Can anyone help?”

      That’s a great point, and that’s likely to be true for almost everything our ethics-flogging researchers want to address. Outside of the general scientific ethics that Steve mentions below, usually the point of bringing “ethics” into the conversation on research is to bend the research to “my ethics” as the case may be. Do we all remember how the scientific community threw a cow when Little George wanted to circumscribe certain kinds of medical research (I think it was on fetuses or something, I can’t recall the whole debate). The people wanting to bring “ethics” into AI research are wanting to bring only *their* – debatable – “ethics” to prevent what they don’t like. The research community should be having a fit about this just as they had about GW and his restrictions on research, but apparently – we should be used to this by now – how the shoe fits depends on which foot it’s on.

      Just the same, as Jessica points out (“Some adamantly oppose any ethics regulation on this ground, or for other reasons, e.g., arguing for instance that tech is neutral or that the future is too unpredictable to evaluate work on downstream consequences.”) it’s heartening that there is resistance to those who would prefer that AI research not uncover facts that contradict their beliefs or politics.

      • You really don’t think it’s possible to create a technology that makes the world worse? If you came up with a set of glasses that can see under people’s clothes, you would have no qualms about releasing them for sale immediately? Hackers that spot zero day vulnerabilities—they should just publish them on the internet as soon as they discover them? Vulnerability disclosure policy is a load of crap?

        To be clear, I’m not sitting here in support of any particular code of ethics, but rather responding to your hostility towards seemingly even the idea of having a code of ethics, or thinking through the ethical implications of engineering work.

        Not to mention that machine learning has a correctness problem that is also an ethics problem in practice.

        • “You really don’t think it’s possible to create a technology that makes the world worse? ”

          There is almost certainly no technology that has **only** negative implications! :) C’mon Somebody. Nuclear weapons, nuclear power.

          “rather responding to your hostility towards seemingly even the idea of having a code of ethics”

          Like Dale, you’re putting words in my mouth. The question is not about having a “code of ethics”. The question is knowing the outcome of technological development. If you don’t know / can’t predict the outcome (you can’t), then what good will a code of ethics do? And again the question is: whose code of ethics? I’m happy to let the pointy-nosed pocket gopher go the way of the dodo if that’s a benefit to humans, but to some people saving the PNPG is worth tearing down hospitals and preventing people from having housing. What’s the “ethical” answer, and how is it universal, rather than just your personal opinion?

        • I object to the idea that we HAD to take the path

          relativity -> nuclear weapons -> nuclear power

          It’s merely an artifact of funding and historical urgency. There’s no reason you couldn’t jump straight to nuclear power and never produce a nuclear weapon. I’d argue the world would have been much better off that way.

          Like Dale, you’re putting words in my mouth. The question is not about having a “code of ethics”.

          In response to “some adamantly oppose any ethics regulation on this ground” you said “it’s heartening”. So I’m really not sure what I’m supposed to conclude here.

          The question is knowing the outcome of technological development. If you don’t know / can’t predict the outcome (you can’t), then what good will a code of ethics do?

          If person A engineers a 3D-printed assault rifle and posts it on the internet with assembly instructions, and someone else prints it out to kill a bunch of people, that outcome was predictable to person A. I would say person A and whatever 3D model library they posted the gun on bear some moral responsibility there. I think the world would be better if person A had a personal moral code that stopped them from posting it, and if the 3D model library had a code of conduct that prohibited them from hosting it.

          Good outcomes result from bad intentions and bad outcomes from good intentions and both result by accident from completely neutral intentions. Gregor Mendel probably falsified some data, but obviously the world has benefitted from his findings. Does that mean journals should not bar falsified data?

          Every person, institution, journal, conference already has a code of ethics that presumes some ethical frame. If you didn’t agree that there should be institutional standards, you would be happily reading and publishing on vixra and not caring about what NeurIPS does or doesn’t do. You already agree there should be standards, you just have a specific disagreement with specific proposals. Your comment would be of positive value if you voiced what those are and why, rather than couching it in some nihilistic “unpredictable future” “ethics are relative” bullshit.

      • Anonymous
        While there are several Anonymi (is that a word?) on this blog, this post makes me think you are the same one that posted recently on the thread about journals and non-reproducible research. There you dismiss virtually all clinical trials as worthless – completely worthless. Here you seem to dismiss any discussion of ethics about technology as inappropriate, misguided attempts to impose one’s ethical beliefs on others.

        While I certainly think there are seeds of truth in both complaints, I don’t find these extreme and blanket rejections of such efforts to be productive. Clinical trials do serve a purpose and ethical considerations of technology are also important. Both are imperfect and both can (and should and need to) be improved. But I don’t think total rejection of the efforts is a constructive step. In the absence of conducting clinical trials or ethical analysis, what exactly are you proposing?

        • “There you dismiss virtually all clinical trials as worthless – completely worthless.”

          That’s laughable! I did no such thing!!! I did not make reference specifically to clinical trials and did not claim that “all clinical trials are worthless”. Nonetheless, I do generally go along with the claim that a high percentage of medical research – that is, statistically based work – is useless and unproductive.

          “In the absence of conducting clinical trials or ethical analysis, what exactly are you proposing?”

          Two separate questions.

          With respect to clinical trials, I have no objection to the method and in fact I support it when there is a clear objective and a well-designed study. My objection is that most studies are doomed to failure before they start because they don’t actually have the components they need to meet the objective. My answer to that is to cut the funding. People will be more selective about the projects they pursue.

          With respect to “ethics”: Did you miss John N-G’s point? The point is no one knows what the outcome of most research will be, so you can’t even get to the point of judging the actual “ethics”. What you’re judging is some half-baked emotionally charged anticipation of the “ethics”. You’ll have to explain to me how you can foresee with certainty the outcome of various forms of AI research, then have a full enough understanding of its implications to judge the ethics. Your claim that such efforts are “imperfect” is a massive understatement – in my view a dishonest assessment to serve your efforts to insert your own ethics – when in fact efforts to understand the ethics are almost certain to be wrong, since the number of wrong answers about the future is very large but there is only one right answer.

          Taking fossil fuels as an example – here we are >150yrs after the widespread commercialization of fossil fuels and we **still** can’t agree on the ethical implications of fossil fuel use. No one could have anticipated in 1880 the society we have today, much less assess the “ethics” of fossil fuel production. So again your claim that such efforts are “imperfect” is ridiculous. It’s demonstrably wrong.

        • reply to Anonymous
          Now, you’ve gone and put words in my mouth and misrepresented my beliefs. Ethics is complex – your example about fossil fuels is a good one. You take the 150 years of indecision about its ethics as evidence that such efforts are doomed (and actually counterproductive as they are only platforms to promote one’s own ethical beliefs). I take it as evidence of the difficulty of ethics and human inability to really create sociopolitical systems that can keep up with our technological progress.

          Given the way the world is evolving, I can think of no more important problem than our need to learn how to live with each other (“other” to include people, animals, our planet, etc.). In a world where almost all nations appear to be headed toward being run by megalomaniacs, I would think that we should take difficult problems as a necessary challenge to confront. You appear to take complexity as a good reason to cut funding and eliminate lines of inquiry. Let’s not slow down progress – after all, it has served us so well (yes it has, but only in some ways, and I’m not willing to ignore the issues we don’t know how to resolve).

          Regarding clinical trials: we may be in agreement, but it is hard to tell. It depends on what you deem to be studies doomed to fail to begin with. Clinical trials are hard to conduct, expensive in many dimensions, and there are many reasons why they achieve less than we want. “Cut funding” would be easy to agree with, but it will depend on what exactly you would cut. I’m not as sure we would find agreement there.
