
What if it’s never decorative gourd season?

If it rains, now we’ll change
We’ll hold and save all of what came
We won’t let it run away
If it rains — Robert Forster

I’ve been working recently as part of a team of statisticians based in Toronto on a big, complicated applied problem. One of the things about working on this project is that, in a first for me, we know that we need to release all code and data once the project is done. And, I mean, I’ve bolted on open practices to the end of an analysis, or just released a git repo at the end (sometimes the wrong one!). But this has been my first real opportunity to be part of a team that is weaving open practices all the way through an analysis. And it is certainly a challenge.

It’s worth saying that, notoriously “science adjacent” as I am, the project is not really a science project. It is a descriptive, explorative, and predictive study, rather than one that is focussed on discovery or confirmation. So I get to work my way through open and reproducible science practices without, say, trying desperately to make Neyman-Pearson theory work.

A slight opening

Elisabeth Kübler-Ross taught us that there are five stages in the transition to more open and reproducible scientific practices: 

  • Denial (I don’t need to do that!)
  • Anger (How dare they not do it!)
  • Bargaining (A double whammy of “Please let this be good enough” and “Please let other people do this as well”)
  • Depression (Open and reproducible practices are so hard and no one wants to do them properly)
  • Acceptance (Open and reproducible science is not a single destination, but a journey and an exercise in reflective practice)

And, really, we’re often on many parts of the journey simultaneously. (Although, like, we could probably stop spending so long on Anger, because it’s not that much fun for anyone.)  

And a part of this journey is to carefully and critically consider the shibboleths and touchstones of open and reproducible practice. Not because everyone else is wrong, but because these things are complex and subtle, and we have to work out how to weave them into our idiosyncratic research practices.

So I’ve found myself asking the following question.

Should we release code with our papers?

Now to friends and family who are also working their way through the Kübler-Ross stages of Open Science, I’m very sorry but you’re probably not going to love where I land on this. Because I think most code that is released is next to useless. And that it would be better to release nothing than release something that is useless. Less digital pollution.

It’s decorative gourd season!

A fairly well known (and operatically sweary) piece in McSweeney’s Internet Tendency celebrates the Autumn every year by declaring It’s decorative gourd season, m**********rs! And that’s the piece. A catalogue of profane excitement at the chance to display decorative gourds. Why? Because displaying them is enough!

But is that really true for code? While I do have some sympathy for the sort of “it’s been a looonnng day and if you just take one bite of the broccoli we can go watch Frozen again”-school of getting reluctant people into open science, it’s a desperation move and at best a stop-gap measure. It’s the type of thing that just invites malicious compliance or, perhaps worse, indifferent compliance.

Moreover, making a policy (even informally) that “any code release is better than no code release” is in opposition to our usual insistence that manuscripts reach a certain level of (academic) clarity and that our analyses are reported clearly and conscientiously. It’s not enough that a manuscript or a results section or a graph simply exist. We have much higher standards than that.

So what should the standard for code be?

The gourd’s got potential! Even if it’s only decorative, it can still be useful.

One potential use of purely decorative code is that it can be read to help us understand what the paper is actually doing.

This is potentially true, but it definitely isn’t automatically true. Most code is too hard to read to be useful for this purpose. Just like most gourds aren’t the type of decorative gourd you’d write a rapturous essay about.

So unless code meets a certain standard, it’s going to need to do something more than just sit there and look pretty, which means we will need our code to be at least slightly functional.

A minimally functional gourd?

This is actually really hard to work out. Why? Well there are just so many things we can look at. So let’s look at some possibilities. 

Good code “runs”. Why the scare quotes? Well, because there are always some caveats here. Code can be good even if it takes some setup or a particular operating system to run. Or you might need a Matlab license. To some extent, whether “the code runs” is an ill-defined target that may vary from person to person. But in most fields there are common computing setups, and if your code runs on one of those systems it’s probably fine.

Good code takes meaningful input and produces meaningful output: It should be possible to, for example, run good code on similar-but-different data.  This means it shouldn’t require too much wrangling to get data into the code. There are some obvious questions here about what is “similar” data. 

Good code should be somewhat generalizable. A simple example of this: good code for a regression-type problem should not assume you have exactly 7 covariates, making it impossible to use when the data has 8 covariates. This is vital for dealing with, for instance, the reviewer who asks for an extra covariate to be added, or for a graph to change.

How limited can code be while still being good? Well that depends on the justification. Good code should have justifiable limitations.
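
The generalizability point above can be made concrete with a minimal sketch (in Python, with hypothetical column names — this is not the project’s actual code) contrasting hard-coded covariates with covariates passed in as a parameter:

```python
# Hypothetical illustration of "don't assume exactly 7 covariates".

def build_design_matrix_hardcoded(rows):
    # Bad: silently assumes exactly these three named covariates exist.
    return [[r["age"], r["weight"], r["dose"]] for r in rows]

def build_design_matrix(rows, covariates):
    # Better: the covariate list is a parameter, so the reviewer's
    # "please add a covariate" becomes a one-argument change.
    missing = set(covariates) - set(rows[0])
    if missing:
        raise ValueError(f"covariates not in data: {sorted(missing)}")
    return [[r[c] for c in covariates] for r in rows]

rows = [{"age": 30, "weight": 70, "dose": 1.5, "smoker": 0},
        {"age": 41, "weight": 82, "dose": 2.0, "smoker": 1}]

X = build_design_matrix(rows, ["age", "weight", "dose", "smoker"])
```

The second version also fails loudly when the data doesn’t match the request, rather than crashing somewhere deep inside the analysis.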

Code with these four properties is no longer decorative! It might not be good, but it at least does something. Can we come up with some similar targets for how the code is written, to make it more useful? It turns out that this is much harder, because judging the quality of code is much more subjective.

Good gourd! What is that smell?

The chances that a stranger can pick up your code and, without running it, understand what the method is doing are greatly increased with good coding practice. Basically, if it’s code you can come back to a year later and modify as if you’d never put it down, then your code is possibly readable. 

This is not an easy skill to master. And there’s no agreed upon way to write this type of code. Like clearly written prose, there are any number of ways that code can be understandable. But like writing clear prose, there are a pile of methods, techniques, and procedures to help you write better code.

Simple things like consistent spacing, and doing whatever RStudio’s auto-format does (like adding spaces on each side of “+”), can make your code much easier to read. But it’s basically impossible to list a set of rules that would guarantee good code. Kinda like it’s impossible to list a set of rules that would make good prose.

So instead, let’s work out what is bad about code. Again, this is a subjective thing, but we are looking for code that smells.

If you want to really learn what this means (with a focus on R), you should listen to Jenny Bryan’s excellent keynote presentation on code smell (slides etc here). But let’s summarize.

How can you tell if code smells? Well, if you open a file and are immediately moved not just to light a votive candle but to realize in your soul that without intercessory prayer you will never be able to modify even a corner of the code, then the code smells. If you can look at it and at a glance see basically what the code is supposed to do, then your code smells nice and clean.

If this sounds subjective, it’s because it is. Jenny’s talk gives some really good advice about how to make less whiffy code, but her most important piece of advice is not about a specific piece of bad code. It’s the following:

Your taste develops faster than your ability. 

To say it differently, as you code more you learn what works and what doesn’t. But a true frustration is that (just like with writing) you tend to know what you want to do before you necessarily have the skills to pull it off. 

The good thing is that code for academic work is iterative. We do all of our stuff, send it off for review, and then have to change things. So we have a strong incentive to make our code better and we have multiple opportunities to make it so.

Because what do you do when you have to add a multilevel component to a model? Can you do that by just changing your code in a single place? Or do you have to change the code in a pile of different places? Because good smelling code is often code that is modular and modifiable.
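
As a hedged illustration of “change it in a single place” (hypothetical names, in Python rather than any particular stats language), the trick is to keep the model specification in one spec that every downstream step reads from:

```python
# Hypothetical sketch: one model spec, read by every downstream step.

MODEL = {
    "formula": "score ~ age + dose",  # change the model here, once
    "group": None,                    # e.g. set to "clinic" for a
}                                     # multilevel version

def describe(model):
    # Fitting, plotting, and reporting steps would all derive what
    # they need from the single spec instead of restating the formula.
    desc = model["formula"]
    if model["group"]:
        desc += f" + (1 | {model['group']})"
    return desc
```

Adding the multilevel component then means editing MODEL once, instead of hunting through every script that restates the formula.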

But because we build our code over the full lifecycle of a project (rather than just once after which it is never touched again), we can learn the types of structures we need to build into our code and we can share these insights with our friends, colleagues, and students.

A gourd-supportive lab environment is vital to success

The frustration we feel when we want to be able to code better than our skills allow is awful. I think everyone has experienced a version of it. And this is where peers and colleagues and supervisors have their chance to shine. Because just as people need to learn how to write scientific reports and people need to learn how to build posters and people need to learn how to deliver talks, people need to learn how to write good code.

Really, the only teacher is experience. But you can help experience along. Work through good code with your group. Ask for draft code. Review it. Just like the way you’ll say “the intro needs more “Piff! Pop! Woo!” because right now I’m getting “*Sad trombone*” and you’ve done amazing work so this should reflect that”, you need to say the same thing about the code. Fix one smell at a time. Be kind. Be present. Be curious. And because you most likely were also not trained in programming, be open and humble.

Get members of your lab to swap code and explain it back to the author. This takes time. But this time is won back when reviews come or when follow up work happens and modifications need to be made. Clean, nice code is easy to modify, easy to change, and easy to use.

But trainees who are new at programming are nervous about programming.

They’re usually nervous about giving talks too. Or writing. Same type of strategy.

But none of us are professional programmers

Sadly, in the year of our lord two thousand and nineteen if you work in a vaguely quantitative field in science, social science, or the vast mire that surrounds them, you are probably being paid to program. That makes you a professional programmer.  You might just be less good at that aspect of your job than others.

I am a deeply mediocre part-time professional programmer. I’ve been doing it long enough to learn how code smells, to have decent practices, and to have a bank of people I can learn from. But I’m not good at it. And it does not bring me joy. But neither does spending a day doing forensic accounting on the university’s bizarre finance system. But it’s a thing that needs to be done as part of my job and for the most part I’m a professional who tries to do my best even if I’m not naturally gifted at the task.

Lust for gourds that are more than just decorative

In Norwegian, the construct “to want” renders “I want a gourd” as “Jeg har lyst på en kalebas” and it’s really hard, as an English speaker, not to translate that to “I have lust for a gourd”. And like that’s the panicking Norwegian 101 answer (where we can’t talk about the past because it’s linguistically complex or the future because it’s hard, so our only verbs can be instantaneous. One of the first things I was taught was “Finn er sjalu.” (Finn is jealous.) I assume because jealousy has no past or future).

But it also really covers the aspect of desiring a better future. Learning to program is learning how to fail to program perfectly. Just like learning to write is learning to be clunky and inelegant. To some extent you just have to be ok with that. But you shouldn’t be ok with where you are now being the end of your journey.

So did I answer my question? Should we release code with our papers?

I think I have an answer that I’m happy with. No in general; yes under certain circumstances.

We should absolutely release code that someone has tried to make good. Even though they will have failed. We should carry each other forward even in our imperfection. Because the reality is that science doesn’t get more open by erecting arbitrary barriers. Arbitrary barriers just encourage malicious compliance.

When I lived in Norway as a newly minted gay (so shiny) I remember once taking a side trip to Gay’s The Word, the LGBTQIA+ bookshop in London and buying (among many many others) a book called Queering Anarchism. And I can’t refer to it because it definitely got lost somewhere in the nine times I’ve moved house since then.

The thing I remember most about this book (other than being introduced to the basics of intersectional trans-feminism) was its idea of anarchism as a creative force. Because after tearing down existing structures, anarchists need to have a vision of a new reality that isn’t simply an inversion of the existing hierarchy (you know. Reducing the significance threshold. Using Bayes Factors instead of p-values. Pre-registering without substantive theory.) A true anarchist, the book suggested, needs to queer rather than invert the existing structures and build a more equitable version of the world.

So let’s build open and reproducible science as a queer reimagining of science and not a small perturbation of the world that is. Such a system will never be perfect. Just lusting to be better.


33 Comments

  1. Isaac says:

    I like this a lot! The only thing that I maybe disagree with is the need for good code to be *meaningfully generalizable* in the context of code being published alongside an (applied) analysis. Most often when I’m looking for code accompanying a paper, it’s not because I want to apply it to a new problem, it’s because I *literally have no idea how the authors did what they did based on just reading the methodology section* (usually not the authors’ fault! It’s hard to write clearly, and realistically, not all analysis decisions fit within journal word counts). The code (as long as it runs, and reproduces the results) gives me a chance to understand what the authors did, even if I have to struggle a bit.

    Of course if someone’s publishing a method, I’d want the code to be generalizable and abstracted – but sometimes abstraction (putting everything into clearly named functions, for example, but making the internals hard to find or hard to read) makes it harder to follow code that’s designed to really just document a single analysis. I guess what I’m trying to say is that what makes code smell good for software engineering or package development feels different from what makes code clear when it’s really only meant as a more precise documentation of the methodology section of an applied paper.

    • Dan Simpson says:

      So this was definitely a draft that I thought was going to be published tomorrow, so I did a quick add of the stuff I want. And you’re right. The real motivation to make your code good is that you have to keep touching it throughout the publication process. So learning how to make your code cleaner, clearer, and less smelly can really help in the process of doing the work.

      • Isaac says:

        Yes, I completely agree — when I refactor my code, I often find mistakes that I wouldn’t have caught otherwise, and a lot of times it can clarify my thinking around what I’m actually trying to do (and how to communicate it). I also like your point about making your code flexible so that it’s easier to make changes in response to reviewers. I just sometimes struggle with a lot of the guidance around good coding practices being oriented towards a situation that’s parallel to software development (because of course software developers have a lot more experience in what constitutes good coding), even when the guidance doesn’t completely apply.

        • Absolutely, the lifetime of typical research code is much much shorter than the lifetime of, say, the jQuery library, and the purposes are very different. If you’re building a gazillion functions and lots of object oriented data types to do your research code… you’re probably doing it wrong… But if you’re just writing every single command one after another in a long list…. you’re also probably doing it wrong.

          The right level of abstraction is somewhere in the middle… things you do repeatedly you should probably convert to a function, things you do in a big batch once (like querying a big database to form your data sample) you might hide in a second file you can “source” etc. But if you’re building a special object with a bunch of methods to represent a graph that you are producing exactly once…. you’re doing it wrong.
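
          As a quick, hypothetical sketch of that middle ground (in Python; these names are made up, not from the thread): repeated steps earn a function, one-off batch work gets a single entry point of its own.

```python
# Hypothetical illustration of the "right level of abstraction".

def standardize(values):
    # Done repeatedly (for every covariate), so it earns a function.
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

def load_analysis_sample():
    # Done once, in a big batch: keep it in one place, call it once.
    # (Here it just returns toy data instead of querying a database.)
    return {"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]}

data = load_analysis_sample()
z = standardize(data["x"])
```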

          • Dan Simpson says:

            The barrier for OO depends on the language. It can be quite useful when there’s a moderately lightweight system (like R’s S3 classes). A bit of wrapping can make benchmarking, for example, a whole lot easier. The tidymodels universe in R is an attempt at constructing that type of thing.

            But yeah. Everything about coding standards is context dependent. But the lifetime for my research code is usually about a year (writing and getting through review). This is long enough for me to forget what kludged-together code was supposed to do, and has forced me to waste a lot of time re-writing code that I should’ve been able to re-run.

  2. This is a long post and I didn’t make it through to the bottom, but my take on it was, sure, getting people to do a good job of coding and then releasing that would be fantastic, but you can’t overestimate the importance of a *forensic evidence trail*.

    Suppose you hold an election in Ecuador, and the election is basically done with fancy new iPad-based election software provided by Republican Election Machines Inc…. The way the thing works is, millions of people come into voting booths, they click on screens with candidates, at the end they get a confirmation screen, they press confirm… and magically at the end of the election the software spits out: Republican Candidate: 93%, Liberal Candidate: 7%

    no software, no printout ballots put in boxes to be recounted, no nothing… just take our word for it.

    This is almost all of science today.

    So releasing the code is *absolutely necessary* even if it’s a total monster of spaghetti, because it provides SOME KIND of audit trail.

  3. Andrew says:

    Dan (Simpson):

    What about Markdown? Doesn’t that resolve many of these issues? I agree that Markdown code isn’t perfect, it can run on some computers and not others, etc. Still, it seems like a reasonable step forward.

    That said, in many cases, I’d be happy if someone were to send me raw data plus codebook plus an R script to read in the data, do the analyses, and make the graphs. Even ugly code, if it runs and reproduces the results in the report, that’s a big big something.

    • Dan Simpson says:

      It can, but it doesn’t necessarily. Good markdown code has to be good in the same way as any other code.

      • Andrew says:

        Dan:

        I still think ugly code is better than no code. Consider Reinhart-Rogoff (the “Excel error” paper). Some runnable code would’ve made it lots easier for people to find this problem sooner.

        • Colin Wyers says:

          I also feel like if your code isn’t in a state where sharing it is possible, it’s also more likely that you’ve made a mistake. One problem I see with people who are used to doing REPL/notebook style coding is that they never once have all of the code they ran in order in a document at once. If you can’t reproduce your own results, how could anybody else?

    • Dan Simpson says:

      Also, I’d argue that that’s a small small something. It’s kinda the least possible version of something and we can and should ask for a whole lot more.

      • We can and should ask for more, like we can and should ask for electronic voting machines to spit out paper printouts we can put in an envelope and stuff into a box. Sure it’s fine for them to also provide summary counts right away, but that audit trail is critical. In some ways you are saying the audit trail should be a well organized printout where it’s easy to see by hand exactly what the counts should be… I agree, but even if it’s an ugly mess, having it and being able to comb through it to figure out what happened is the minimum. We shouldn’t reject the minimum; at the same time we should encourage more.

  4. Brent Hutto says:

    My earliest training was as a software engineer where that sort of accountability for every detail of the entire software life cycle is expected. I’d love nothing better than to cut back to only, say, 15 or 20% as many projects and do them using a Ready For Prime Time workflow in which everything is justified (in writing) and documented (in standardized form) including test procedures and ongoing quality checks. Following through on one project and getting it polished to a Fare Thee Well before showing it to the world is infinitely more rewarding than doing half a dozen projects, each just well enough to get by before moving on to the next.

    But where’s the budget for it? Heck, I’m not sure even professional coders in this day and age operate the way we were taught in engineering school back in the 70’s. And a mere statistical analyst working as a member of various wide-ranging research teams isn’t afforded the luxury of even coding like a guy writing apps for some Internet start-up.

    Maybe there’s a magic bullet out there which makes some reasonable compromise on documentation, transparency and reproducibility much more time efficient than I have any idea. But having spent a few hundred hours (on my own time) learning to use RStudio notebooks and combined code/text/graphics using Markdown and so forth, I know that’s not the magic bullet for anything larger than toy projects or small, self-contained analyses.

    • Brent Hutto says:

      I know it’s bad form to immediately “reply” to one’s own post but I left this out the first time…

      The form “show your code” takes in my world is I get a request to bundle up a few SAS programs along with a dataset and a basic codebook/list of variables and send it to someone who wants to repeat an analysis as a jumping-off point for a follow-on paper or maybe a meta-analysis. Easy enough to do and generally that’s what is being requested.

      But that’s just “showing” the final step in the process. I get this perfectionistic anxiety that without somehow being able to document the Forking Paths stretching back two to four years previously to that final dataset+code I just supplied I’m really just asking someone to take my word that 90% of the work done on the data is OK and letting them “reproduce” the easy part, that final 10%.

      • Ideally the code reads in the raw data you got from some other source, and spits out any intermediate “cleaned data” you use for your final analysis. Lots of times that isn’t the case, back in the depths of time someone sent you some stuff, you cleaned it, put it on the shelf, and then later all your analysis is on the cleaned form… discouraging that is a good thing in my opinion, though it’s not always possible to work from the rawest available data.

        Also, if wordpress had an “edit comment” function like in discourse, it’d go a long way to avoiding nested replies.

  5. Andrew says:

    Dan:

    I like this bit near the end of your post:

    Anarchists need to have a vision of a new reality that isn’t simply an inversion of the existing hierarchy (you know. Reducing the significance threshold. Using Bayes Factors instead of p-values. Pre-registering without substantive theory.) A true anarchist, the book suggested, needs to queer rather than invert the existing structures and build a more equitable version of the world.

    But there’s no reason to restrict this advice to “anarchists”! I mean, sure, I understand you got the quote from a book about anarchism, but we should be able to apply the principle more generally, even for those of us who don’t consider ourselves as anarchists.

    • Dan Simpson says:

      Kinda like racism (but hopefully not at all like racism), anarchism is a framework and you don’t need to consider yourself to be an anarchist to actually be one :p

      (Also – can we put this in the workflow book? Maybe with fewer gourds? :p)

      • Andrew says:

        Dan:

        Yes to including this in workflow book. I like the connections between communication workflow, computing workflow, statistical workflow, and science workflow. It’s related to Bob’s chapter on modeling as software development.

        Unrelatedly, I don’t think anarchism is just a framework. It’s also a political program of zero government (or something like that). Like Thomas Hobbes and a bunch of other people, I think that some government is a good thing!

  6. Ben Goodrich says:

    I feel a bit bad for precipitating all this, but perhaps some good will come of it. In the case of MATLAB code, would you say

    (A) No MATLAB code should be released because many do not have a license to run it
    (B) Non-trivial MATLAB code should not be released because the MATLAB language is esoteric and hard to follow for people who are not very experienced in MATLAB
    (C) MATLAB code should be released only if it is well written by MATLAB standards

    ?

    I, too, was disappointed to recently click on a link in a footnote of a paper to a GitHub repo only to find that the code was written in MATLAB, but my reaction was “OK, if necessary, Aki can explain / run this.” Perhaps that was insufficiently ideological. But there is a sizable collection of academics who get access to MATLAB through their university, and they write papers that are written for that audience and reviewed by people from that community. And they may be furiously trying to get promoted or get tenure and I don’t really fault them for writing code in whatever language is quickest for them, even if they could have reached a broader audience by writing it in another language or including more comments.

    But that isn’t a good state of affairs for science. I have been a (largely passive) supporter of Sage, but I would agree with its founder that Sage has largely failed in its mission statement of “Creating a viable free open source alternative to Magma, Maple, Mathematica and Matlab.” So, we keep getting pulled back into this cycle where every piece of code written in the 4Ms makes it harder and harder for others to avoid writing code in the 4Ms.

    • Dan Simpson says:

      I’ll stick with what I said in the post, which is that code may be good even if it can only be run under some circumstances (eg something that doesn’t work on Windows or something that requires a software licence). This basically limits the audience you’re talking to, but it may not limit it very much. It’s context dependent.

      But I was definitely in the situation with the PSIS paper that I couldn’t modify the old experiments or graphs because they were done in Matlab. So that was *deeply* annoying and increased everyone’s workload.

      My bigger view is that if you’re adding value to a commercial project by developing for it and expanding its userbase, you should be paid. I’m not against proprietary software per se, just the way they interact with their communities (kinda like I’m not against Elsevier etc., I just think that their model as it currently exists is unfair).

      As for the 4Ms, Matlab is the easiest M to ditch for a lot of people. The others all have core functionalities that I’ve not seen replicated anywhere.

      • Dan Simpson says:

        On a personal note though, if you want *me* to run code, it would be better to print it onto punch cards and bury it in an unmarked time capsule beneath a cathedral than to release it in Matlab. I can’t run it. I can’t read more than simple code in it (partly because it’s not a language that promotes good code practices). And so I’m defined out of the community that the code is being released for. People are welcome to be ok with that.

        • Ben Goodrich says:

          I’m ok with that. I was more hoping that someone had already spent time on it and could tell us whether it was or was not a good candidate to implement in Stan. But I’m kinda ok with the first author releasing the MATLAB code too. It looked to me reasonably well organized, and they needed some derivatives, which the MATLAB symbolic toolbox (that people have to pay extra for) can give you. Of course, auto-differentiation could have given them that too, and I wish that everyone in that situation would make a little bit of effort to get something working with Stan instead of defaulting to one of the 4Ms.

  7. Noam Ross says:

    I gave a talk some time ago called “Reproducibility from a mostly selfish point of view,” in which I argued that one should release data and code as a mechanism to increase the impact, audience, or perceived quality of your research, just as one uses writing and good figures to do so. One can be a bad writer and publish an adequate but unimpressively written paper containing good science, or one can strive to improve one’s writing so more people like your paper. It’s similar for code – one can choose to invest in publishing code that is readable/runnable/generalizable so as to make more people aware of and impressed by your work. Or one can choose to invest elsewhere.

    • Andrew says:

      Noam:

      Sure, for you or me, replication is a good thing. We want more people to replicate our studies, run our code, etc., go through our data to learn more, etc. But for Brian Wansink, Marc Hauser, etc. . . . not so much. They are aware that sharing their data could cause them to lose the impact, audience, and perceived quality of their research. That could be called “Avoiding reproducibility from a mostly selfish point of view.”

    • jim says:

      “…argued that one should release data and code as a mechanism to increase the impact, audience, or perceived quality of your research, just as one uses writing and good figures to do so.”

      Superb!!! Clap clap clap!!

      As an undergrad, I went to a conference and went to the talk of a person who’d written one of my class textbooks. It was a great book. And guess what? His talk was in a big room and it was packed. And he gave a great talk.

      Good communication – well organized writing, presentations, tables, figures / charts, photos and – yep – code, will take your career a long, long way.

  8. Carl says:

    Here’s one way I find it useful to have code of any quality released.

    – If code isn’t released with a paper, I don’t know whether the code that produced the paper was “good code” or “bad code.”
    – Even if I can’t run the code, or don’t know the language, I can usually still tell whether the code smells.
    – In my experience, any non-trivial amount of code written poorly and organized sloppily almost certainly has bugs. (Whether they’re fatal or not is less certain.)
    – So seeing that a paper was produced with crap code helps me discount my trust of the paper’s results.

    I’m sympathetic to the point that letting people release unreadable, unwritable code doesn’t really provide any benefits to the community. But I don’t know how you incentivize people to write useful code unless there are strong norms expecting code to be released. You’re right that we should also develop similar community expectations about quality. But I don’t see how a policy of “don’t bother to release code if it’s crappy” gets you there.

    • jim says:

      “But I don’t know how you incentivize people to write useful code unless there are strong norms expecting code to be released. “

      Excellent! There’s no telling how much mistake-ridden stinky code is out there, producing mistake-ridden stinky results.

      Releasing code has two very important side benefits:

      1) Ensure better research – people test and clean code that has to be released. This will reduce errors.
      2) Reward good research practices overall – people who have efficient, bug-minimized understandable code will be recognized for this work.

      People should release code as a matter of general practice.

  9. I agree that we shouldn’t settle for just releasing code and we should really help and incentivize people to make their code smell less. But I think you are ignoring one more variable: the ability to read and understand code also differs wildly, so what might be unusable for some may be useful for others. With a bit of bragging, I’ve gained useful insights from loose piles of MATLAB scripts written by very novice programmers, with filenames like “my_f1.m”, “my_f2.m”, “my_f2_better.m”. Did it take more effort and time than it should have? Yes! Was it better than not seeing the code? Very much! So, for me, if you are finishing a project at the last minute and wondering whether adding a zip file with your code folder as-is is worth it (because you don’t have energy for much more), please do!

  10. ojm says:

    This just comes across as trying to be clever and putting people down for not doing things exactly how you want them done. Not a very encouraging or supportive attitude imo.

    Also, having ‘work on Windows’ as a key feature hardly seems ‘open’.

    My advice is:
    Release your code and your data as necessary SI. Areas like computational biology are way better since everyone started sharing even shitty code.

    The incentive for better, open-source code is that the better the code, the more likely people are to build on and cite your work; but at a minimum *someone*, though not necessarily everyone, should be able to verify your analysis.

    Small steps, don’t be a snob about language choices, don’t try to be too clever and just encourage people from wherever they’re starting.

  11. Rheophile says:

    I think the discussion about what code is valuable, and why, is important. It’s making me think about my own code more. But from the point of view of my field, “open terrible code” would be a huge, valuable step forward.

    For instance, it would probably have prevented this giant waste of everyone’s time:
    https://physicstoday.scitation.org/do/10.1063/PT.6.1.20180822a/full/
    “The war over supercooled water – How a hidden coding error fueled a seven-year dispute between two of condensed matter’s top theorists.”

  12. Dan F. says:

    There is nothing more useless than papers (in whatever genre) that tell the reader that A and B have been coded and the code does C but give only superficial indications of how the code works, and no indication of where to examine it. Too many papers are reports of claims whose justification is known only to the authors and their gods.
