Who owns your code and text and who can use it legally? Copyright and licensing basics for open-source

I am not a lawyer (“IANAL” in web-speak); but even if I were, you should take this with a grain of salt (same way you take everything you hear from anyone). If you want the straight dope for U.S. law, see the U.S. government Copyright FAQ; it’s surprisingly clear for government legalese.

What is copyrighted?

Computer code and written material, such as books, journals, and web pages, are subject to copyright law. Copyright is for the expression of an idea, not the idea itself. If you want to protect your ideas, you’ll need a patent (or to be good at keeping secrets).

Who owns copyrighted material?

In the U.S., copyright is automatically assigned to the author of any text or computer code. But if you want to sue someone for infringing your copyright, the government recommends registering the copyright. Most of the rest of the world respects U.S. copyright, largely through international treaties such as the Berne Convention.

Most employers require as part of their employment contract that copyright for works created by their employees be assigned to the employer. Although many people don’t know this, most universities require the assignment of copyright for code written by university research employees (including faculty and research scientists) to the university. Typically, universities allow the author to retain copyright for books, articles, tutorials, and other traditional written material. Web sites (especially with code) and syllabuses for courses are in a grey area.

The copyright holder may assign copyright to others. This is what authors do for non-open-access journals and books—they assign the copyright to the publisher. That means that even they may not be able to legally distribute copies of the work to other people; some journals allow crippled (non-official) versions of the works to be distributed. The National Institutes of Health require all research to be distributed openly, but they don’t require the official version to be so, so you can usually find two versions (pre-publication and official published version) of most work done under the auspices of the NIH.

What protections does copyright give you?

You can dictate who can use your work and for what. There are fair use exceptions, but I don’t understand the line between fair use and infringement (like other legal definitions, it’s all very fuzzy and subject to past and future court decisions).

Licensing

For others to be able to use copyrighted text or code legally, the copyrighted material must be explicitly licensed for such use by the copyright holder. Just saying “common domain” or “this is trivial” isn’t enough. Just saying “do whatever you want with it” is in a grey area again, because it’s not a recognized license and presumably that “whatever you want” doesn’t involve claiming copyright ownership. The actual copyright holder needs to explicitly license the material.

There is a frightening degree of non-conformance among open-source contributors, largely, I suspect, due to misunderstandings of the authors’ employment contracts and of copyright law.

Derived works

Most of the complication from software licensing comes from so-called derived works. For example, I download open-source package A, then extend it to produce open-source package B that includes open-source package A. That’s why most licenses explicitly state what happens in these cases. The reason we don’t like the GNU General Public Licenses (GPL) is that they restrict derived works with copyleft (forcing package B to adopt the same license, or at best one that’s compatible). That’s why I insisted on the BSD license for Stan: it’s maximally open in terms of what it allows others to do with the code, and it’s compatible with the GPL. R is licensed under the GPL, so we released RStan under the GPL so that users don’t have to deal with both the GPL and a second license to use RStan.
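For the programmers in the audience, the copyleft rule for derived works can be sketched as a toy lookup. This is only an illustration under loud assumptions: the function name combined_license and the two license sets are made up for this example, the model knows only GPL, BSD, and MIT, and real compatibility is a legal question that no table settles.

```python
# Toy model only: real license compatibility is a legal question, not a lookup.
# The sets and the function name below are hypothetical, chosen for illustration.
COPYLEFT = {"GPL"}            # derived works must carry the same license
PERMISSIVE = {"BSD", "MIT"}   # derived works may be released under other terms


def combined_license(own_license, dependency_licenses):
    """Return the license a combined work would carry under this toy model."""
    all_licenses = {own_license, *dependency_licenses}
    unknown = all_licenses - COPYLEFT - PERMISSIVE
    if unknown:
        raise ValueError(f"no rule for: {sorted(unknown)}")
    copyleft = all_licenses & COPYLEFT
    if copyleft:
        # Only one copyleft license exists in this model, so it simply wins.
        return sorted(copyleft)[0]
    # Nothing copyleft is involved, so the author's own choice stands.
    return own_license


# Package B (BSD) that includes GPL-licensed package A ends up under the GPL.
print(combined_license("BSD", ["GPL"]))
# BSD code built on MIT code can stay BSD.
print(combined_license("BSD", ["MIT"]))
```

In this sketch the copyleft license propagates to the combined work, which is the same reason the combination of RStan and R is released under the GPL.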

Where does Stan stand?

Columbia owns the copyright for all code written by Columbia research staff (research faculty, postdocs, and research scientists). It’s less clear (from our reading of the faculty handbook) who owns works created by Ph.D. students and teaching faculty. For non-Columbia contributions, the author (or their assignee) retains copyright for their contribution. The advantage of this distributed copyright is that ownership isn’t concentrated with one company or person; the disadvantage is that we’ll never be able to contact everyone to change licenses, etc.

The good news is that Columbia’s Tech Ventures office (the controller of software copyrights at Columbia) has given the Stan project a signed waiver that allows us to release all past and future work on Stan under open-source licenses. Columbia retains the copyright, though, under our employment contracts (at least for the research faculty and research scientists).

For other contributors, we now require them to explicitly state who owns the copyrighted contribution and to agree that the copyright holder gives permission to license the material under the relevant license (BSD for most of Stan, GPL or MIT for some of the interfaces).

The other good news is that most universities and companies are coming around and allowing their employees to contribute to open-source projects. The GNU General Public License (GPL) is often an exception for companies, because they are afraid of its copyleft properties.

C.Y.A.

The Stan project is trying to cover our asses from being sued in the future by a putative copyright holder, though we don’t like having to deal with all this crap (pun intended).

Luckily, most universities these days seem to be opening up to open source (no, that wasn’t intended to continue the metaphor of the previous paragraph).

But what about patents?

Don’t get me started on software patents. Or patent trolls. Like copyrights, patents protect the owner of intellectual property against its illegal use by others. Unlike copyright, which is about the realization of an idea (such as a way of writing a recipe for chocolate chip cookies), patents are more abstract and are about the right to realize ideas (such as making a chocolate chip cookie in any fashion). If you need to remember one thing about patent law, it’s that a patent lets you stop others from using your patented technology—it doesn’t let you use it (your patent B may depend on some other patent A).

Or trademarks?

Like patents, trademarks prevent other people from (legally) using your intellectual property without your permission, such as building a knockoff logo or brand. Trademarks can involve names (including similar names), font choices, color schemes, etc. But they tend to be limited to particular areas of business, so we could register a trademark for Stan (which we’re considering doing), without running afoul of the down-under Stan.

There are also unregistered trademarks, but I don’t know all the subtleties about what rights registered trademarks grant you over the unregistered ones. Hopefully, we’ll never be writing that little R in a circle above the Stan name, Stan®; even if you do register a trademark, you don’t have to use that annoying mark—it’s just there to remind people that the item in question is trademarked.

What do CERN, the ISS, and Stephen Fry have in common?

You’ll have to read the New Yorker article on Richard M. Stallman and The GNU Manifesto by Maria Bustillos to find out!

And what’s up with Tim O’Reilly’s comments about the Old Testament vs. New Testament? That’s an ad hominem attack of the highest order, guaranteed to get the Judeo-Christians even more riled up than computer scientists debating GPL vs. BSD. On the plus side, it did remind me of Dana Fradon’s side-splitting New Yorker cartoon about the God of the Old Testament.

This is all strong evidence that Andrew missed an opportunity by not putting Stallman in the “founders of religions” bracket along with Freud.

I knew we’d hit critical mass with Stan when rms (I hear that’s what the cool kids call him) wrote to us about the Stan license. I pretty much steamrolled the BSD license to maximize our user base. Allen Riddell, on the other hand, decided to copyleft PyStan. Reasonable people can disagree. Of course, R is GPL-ed, so the combination of RStan and R has to be copylefted, too.

So let’s take a little poll, in the spirit of recent posts by Andrew and all this focus on seminar speakers.

1. Which is tastier, free beer or free speech?

  • [ ] Beer
  • [ ] Speech
  • [ ] It’s political season and I need a drink with my speech.
  • [ ] Beer and speeches are both too bitter.