Another way of saying this is that you should treat inline code comments as a last resort when there is no other way to make your intentions clear.
I used to teach a session of Andrew’s statistical communication class once a year and I’d focus on communicating a computational API. Most of the students hated it because they signed up for the class to hear Andrew talk about stats, not me talk about API design. At least one student just up and walked out every year! So if you’re that student, now’s your chance to bail.
Comments considered harmful
Most academics, before they will share code with me, tell me they have to “clean it up.” I invariably tell them not to bother, and at best, they will dilly dally and shilly shally and apologize for lack of comments. What they don’t realize is that they were on the right track in the first place. The best number of inline code comments is zero. Nada. Zilch. Nil. Naught.
Why are comments so harmful? They lie! Even with the best of intent, they might not match the actual implementation. They often go stale over time. You can write whatever you want in a comment and there’s no consistency checking with the code.
You know what doesn’t lie? Code. Code doesn’t lie. So what do professional programmers do? They don’t trust comments and read the code instead. At this point, comments just get in the way.
What’s a bajillion times better than comments?
Readable code. Why? It’s self documenting. To be self documenting, code needs to be relatively simple and modular. The biggest mistake beginners make in writing code is lack of modularity. Without modularity, it’s impossible to build code bottom up, testing as you go.
It’s really hard to debug a huge program. It’s really easy to debug modules built up piece by piece on top of already-tested modules. So design top down, but build code bottom up. This is why we again and again stress in our writing on Bayesian workflow and in our replies to user questions on forums, that it helps immensely to scaffold up a complicated model one piece at a time. This lets you know when you add something and it causes a failure.
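As a minimal sketch of building bottom up, here is a Python example in which each small function is tested before the next one is layered on top of it. The function names and the choice of statistics are illustrative assumptions, not code from the post.

```python
import math


def mean(xs):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def variance(xs):
    """Return the sample variance (n - 1 denominator) of xs."""
    m = mean(xs)  # builds on the already-tested mean()
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standardize(xs):
    """Return xs shifted to mean 0 and scaled to sample variance 1."""
    m = mean(xs)
    sd = math.sqrt(variance(xs))
    return [(x - m) / sd for x in xs]


# Test each layer before building on it.
assert mean([1.0, 2.0, 3.0]) == 2.0
assert variance([1.0, 2.0, 3.0]) == 1.0
assert standardize([1.0, 2.0, 3.0]) == [-1.0, 0.0, 1.0]
```

If a later change breaks standardize() while the tests for mean() and variance() still pass, the failure is localized to the newest layer.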
Knowing where to draw lines between modules is, unfortunately, a matter of experience. The best way to get that experience? Read code. In the production coding world, code is read much more often than it’s written. That means much more effort typically goes into production code to make it readable. This is very unlike research code, which might be written once and never read again.
There is a tradeoff here. Code is more readable with short variable names and short function names. It’s easier to apprehend the structure of the expression a * b + c**2 than meeting_time * number_of_meetings + participants**2. We need to strike a balance with not too long, but still informative variable names.
And why are beginners so afraid of wasting horizontal space while being spendthrifts on the much more valuable vertical space? I have no explanation. But I see a lot of code from math-oriented researchers that looks like this, ((a*b)/c)+3*9**2+cos(x-y). Please use spaces around operators and no more parens than are necessary to disambiguate given attachment binding.
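A quick check in Python, with arbitrary illustrative values for the variables, suggests the cramped expression above and a spaced version with only the necessary grouping compute the same thing, because multiplication and division already bind tighter than addition and associate left to right.

```python
from math import cos

a, b, c, x, y = 2.0, 3.0, 4.0, 1.0, 0.5  # arbitrary illustrative values

cramped = ((a*b)/c)+3*9**2+cos(x-y)
spaced = a * b / c + 3 * 9**2 + cos(x - y)

assert cramped == spaced
```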
When should I comment?
Sometimes you’re left with no choice and have to drop in a comment as a last resort. This should be done if you’re doing something non-idiomatic with the language or coding an unusual algorithm or something very involved. In this case, a little note inline about intent and/or algebra can be helpful. That’s why commenting is sometimes called a method of last resort.
But whatever you do, comment for people who know the language better than you. Don’t write a comment that explains what a NumPy function does—that’s what the NumPy doc is for. Nobody wants to see this:
int num_observations = 513; // declare num_observations as an integer and set equal to 513
But people who feel compelled to comment will write just this kind of thing, thinking it makes their code more professional. If you think this is a caricature, you don’t read enough code.
The other thing you don’t want to do is this:
#####################################################
################## INFERENCE CODE ###################
#####################################################
...
...
...
This is what functions are for. Write a function called inference() and call it. It will also help prevent accidental reuse of global variables, which is always a problem in scripting languages like R and Python. Don’t try to fix hundreds or thousands of lines of unstructured code with structured comments.
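A minimal Python sketch of the alternative to banner comments follows. The function bodies are toy stand-ins (a sample mean in place of real inference), and simulate_data() and main() are hypothetical names added for illustration.

```python
def simulate_data(num_observations):
    """Return a deterministic toy data set (stand-in for real data)."""
    return [float(n % 5) for n in range(num_observations)]


def inference(y):
    """Return the sample mean of y (stand-in for a real inference step)."""
    total = sum(y)  # local to inference(); cannot collide with a global
    return total / len(y)


def main():
    y = simulate_data(10)
    estimate = inference(y)
    print(f"estimate = {estimate}")


if __name__ == "__main__":
    main()
```

Because total and estimate live inside functions, a later section of the script cannot accidentally pick up a stale value of either one.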
Another thing to keep in mind is that vertical space is very precious in coding, because we want to be able to see as much of the code as we can at a time without scrolling. Don’t waste vertical space with useless or even harmful comments.
Do not, and I repeat, do not use /* ... */ style comments inline with code. It’s too easy to get confused when it’s a lot of lines and it’s doubly confusing when nested. Instead, use line comments (// in C++ and Stan, # in Python and R). Use the comment-region command in emacs or whatever does the same in your IDE. With line comments, the commented out code will be very visible, as in the following example.
for (int n = 0; n < N; ++n) {
  // int x = 5;
  // int y = x * x * 3;
  // int z = normal_rng(y, 1);
  z = n * 3;
}
Compare that to what I often see, which is some version of the following.
for (int n = 0; n < N; ++n) {
  /* int x = 5;
  int y = x * x * 3;
  int z = normal_rng(y, 1); */
  z = n * 3;
}
In the first case, it's easy to just scan down the margin and see what's commented out.
After commenting out and fixing everything, please be a good and respectful citizen and just delete all the edited out code before merging or releasing. Dead code makes the live code hard to find and one always wonders why it's still there---was it a mistake or some future plan or what? When I first showed up at Bell Labs in the mid 1990s, I was handed a 100+ page Tcl/Tk script for running a speech recognizer and told only a few lines were active, but I'd have to figure out which ones. Don't do that!
The golden triangle
What I stressed in Andrew's class is the tight interconnection between three aspects of production code:
$latex \textrm{API Documentation} \leftrightarrow \textrm{Unit tests} \leftrightarrow \textrm{Code}$
The API documentation should be functionally oriented and say what the code does. It might include a note as to how it does it if that is relevant to its use. An example might be different algorithms to compute the same thing that are widely known by name and useful in different situations. The API doc should ideally be specific enough to be the basis of both unit testing and coding. So I'm not saying don't document. I'm saying don't document how inline code works, document your API's intent.
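To make the three corners concrete, here is a small Python sketch in which the docstring plays the role of the API doc, the assertions are written directly from that doc, and the body is the code. The function log_sum_exp() is an illustrative choice, not an example from the class.

```python
import math


def log_sum_exp(xs):
    """Return log(sum(exp(x) for x in xs)) for a non-empty sequence xs.

    The maximum of xs is subtracted before exponentiating, so large
    inputs do not overflow. Returns -inf if every element is -inf.
    """
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


# Unit tests written straight from the doc above.
assert log_sum_exp([0.0]) == 0.0
assert math.isclose(log_sum_exp([0.0, 0.0]), math.log(2.0))
assert math.isclose(log_sum_exp([1000.0, 1000.0]), 1000.0 + math.log(2.0))
assert log_sum_exp([-math.inf, -math.inf]) == -math.inf
```

The doc says what the function returns and mentions the how only where it affects use (no overflow on large inputs); each sentence of the doc maps to at least one test.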
The reason I call this the "golden" triangle is the virtuous cycle it imposes. If the API doc is hard to write, you know there's a problem with the way the function has been specified or modularized. With R and Python programmers, that's often because the code is trying to do too many things for a single function and the input types and output types become a mess of dependencies. This leads to what programmers identify as a "bad smell" in the code. If the code or the unit tests are hard to write, you know there's a problem with the API specification.
Clients (human and computational) are going to see and "feel" the API. That's where the "touch" is that designers like to talk about in physical object design. Things need to feel natural for the application, or in the words of UI/UX designers, the API needs to offer affordances (in the past, we might have said it should be intuitive). Design the API first from the client perspective. Sometimes you have to suffer on the testing side to maintain a clean and documentable API, but that clean API is your objective.
What about research code?
Research code is different. It doesn't have to be robust. It doesn't have to be written to be read by multiple people in the future. You're usually writing end-to-end tests rather than unit tests, though that can be dangerous. It still helps to develop bottom-up with testing.
What research code should be is reproducible. There should be a single script to run that generates all the output for a paper. That way, even if the code's ugly, at least the output's reproducible and someone with enough interest can work through it.
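One way such a single entry point might look in Python is sketched below. The seed value, the function names, and the toy simulation are all hypothetical. The point is only that one call regenerates every number from a fixed seed.

```python
import random

SEED = 20240101  # hypothetical fixed seed so reruns give identical output


def simulate(rng, n):
    """Return n standard normal draws from the supplied generator."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def summarize(draws):
    """Return the summary statistics that would be reported in the paper."""
    return {"n": len(draws), "mean": sum(draws) / len(draws)}


def run_all(seed=SEED):
    """Regenerate every reported result from scratch."""
    rng = random.Random(seed)
    draws = simulate(rng, 1000)
    return summarize(draws)


if __name__ == "__main__":
    print(run_all())
```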
And of course, research code needs to be tested that it's doing what it's supposed to be doing. And it needs to be audited to make sure it's not "cheating" (like cross-validating a time-series, etc.).
Notebooks, Quarto, and other things that get in the way of coding and documenting
With all due respect to Donald Knuth (never a good start), literate programming is a terrible way to develop code. (On a related topic, I would totally recommend at least the beginning part of Knuth's notes on how to write math.)
I don't love them, but I use Quarto and Jupyter (née IPython) notebooks for writing reproducible tutorial material. But only after I've sorted out the code. These tools mix text and code and make too many compromises along the way to be good at either task. Arguably the worst sin is that it winds up obfuscating the code with a bunch of text. Jupyter also makes it possible to get into inconsistent states because it doesn't automatically re-run everything. Quarto is just a terrible typesetting platform, inheriting all the flaws of pandoc and citeproc, with the added joy of HTML and LaTeX interoperability and R and Python interoperability. We use it for Stan docs so that we can easily generate HTML and LaTeX, but it always feels like there should be a better way to do this as it's a lot of trial and error due to the lack of specs for markdown.
Well, I will never be as good a coder as Bob Carpenter (not just because I use comments, but in general!), but I like comments for certain situations. Maybe I shouldn’t. Most of the code that I write will never be read by anyone other than myself. I often write comments to help me remember certain chains of thought – yes, I could just read the code, but I prefer comments sometimes. They don’t explain how the code is working, they just remind me why I am writing that piece. I often do this in lengthy data wrangling tasks. I also like notes in my Stan model to explain my reasoning about why I am specifying the model in a particular way. Sometimes I will have a separate document that contains this commented code that is separate from the clean file with all code and no comments. The comments aren’t intended to explain the code so much as the line of reasoning for it.
It seems to me that use of comments in code depends on both the kind of code you are writing and to whom you are writing it.
I do agree that writing clear, modular code is better than trying to explain opaque, poorly written code with comments. However, not all comments have the same purpose, and maybe the types of comments I mention aren’t really what Bob is talking about anyway. In some respects, a markdown file like you might see for case studies published on the web is heavily commented code.
I’m not a top tier programmer by industrial coding standards. I know people in industry who can code circles around me and even people in semi-academia like Brian Ward and Steve Bronder, who I work with regularly, who are both way better programmers than me. Being a great programmer is a full-time job! It’s really hard.
I didn’t make it clear enough in the post that I think different standards should be applied to different kinds of code. Research code isn’t anything like production code and most of this advice is for production code that lots of people are going to read. In that context, it’s worth making the code readable, which means organizing it modularly into well-separated and well-documented components. At that point, you can usually read the “what” out of the code and the “why” will also often be clear.
I’m not sure what you mean by the line of reasoning for the code. Do you mean something like the following, where you might tell them that if you square a standard normal you get a chisquare distribution?
This kind of comment can be useful if you don’t think someone can understand the math. Want to know a better way to do this?
Now you have documented what the function does by giving it a name and you have created a unit that can be tested and applied elsewhere.
For research code, you have a choice here. I would recommend writing the function here.
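The code from this reply is not preserved here, so the following Python sketch is only a guess at the contrast being drawn, with hypothetical names: first a commented line, then the same step wrapped in a named function.

```python
import random


def chi_square_rng(rng):
    """Return one draw from a chi-square distribution with 1 degree of freedom."""
    z = rng.gauss(0.0, 1.0)
    return z * z


rng = random.Random(1234)

# Commented version: the explanation lives in a comment that nothing checks.
z = rng.gauss(0.0, 1.0)
y = z * z  # squaring a standard normal gives a chi-square(1) variate

# Named version: the function name carries the intent and can be unit tested.
y2 = chi_square_rng(rng)
```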
Yes, markdown files are heavily commented code in some ways. That’s why they’re a difficult medium in which to develop code. They mix concerns.
I completely agree that separate documents for algorithms, etc. are useful. There’s not enough space in code comments to define your new sampling algorithm and prove it satisfies detailed balance.
Another problem is that once you start down the path of pedagogical comments, it’s never clear where to stop. Does the reader know how rngs work in Python? That the numpy ones can be used in scipy? Do you have to explain each obscure tidyverse call with doc on what it does? If I use “melt”, do I have to explain what melting a matrix means?
>most of this advice is for production code that lots of people are going to read.
Ok, that’s what I thought after I wrote my response. I never write that kind of code! I’m sure your advice is great for production code.
>I’m not sure what you mean by the line of reasoning for the code.
Assume, as I mention in my response, that you are writing code for an analysis of some study, that this code will only be read by you, and that a few months later you will need to go back to this code and modify it for some other experiment. A comment you might write is something like:
target += normal_lpdf(lambda_mu | 7.5, 0.25); // laboratory stock preparation methods insure at least 7 and no more than 8 log10 number of bacteria per 10ul
>Another problem is that once you start down the path of pedagogical comments, it’s never clear where to stop.
Yes, that is true, but I am mostly writing for myself. I suspect a lot of analysts mostly write for themselves. A situation where someone else might read my code is when the study is published and the code is uploaded as a supplemental file. In that case, I always assume that non-coders are also reading the code! So I think it is a good idea to comment out basic steps, because readers might want to get a gist of what was done but are actually incapable of reading the code itself!
That is why I think that your advice really depends on the kind of code you are writing and to whom you are writing. In general though, I think your advice is excellent, and I do try to write clean and modular code with minimum comments.
100% with jd here on code comments that bring external information into the “why” of code. Why do you want a chi-squared distribution? Are there some research assumptions that ensure the distribution must be nearly chi-squared? Stick those in a comment. That kind of comment doesn’t “lie about what the code does” it says “why did we choose to do this kind of computation?”. It might still lie, but reading the code isn’t going to get you the truth.
I really failed to emphasize enough in the post that I agree with everyone saying commenting depends on the use. It’s that “I always assume that non-coders are also reading the code” which I have trouble with. Does a non-coder know what the += operator does?
This is precisely the kind of comment that can cause trouble:
The doc seems to say that lambda_mu is constrained to fall between 7 and 8 but the code makes no such guarantee. There’s a 95% chance that a random draw from a normal(7.5, 0.25) is in that range, it does not “ensure at least 7 and no more than 8”. In the context of one statement in a larger model, it’s not even making this probabilistic assertion that there’s a 95% chance the value is between 7 and 8. It’s just applying that as a prior.
@Daniel Lakeland: There’s rarely enough room in the code to explain why I’m coding things as I am. Sometimes that will go in an API doc if the what isn’t clear enough, but it’s usually just a pointer to a paper. For example, I just wrote the R-hat and ESS code for BayesKit, and there are a ton of statistical assumptions in the way the functions are defined and also the heuristics used in the estimator. If I tried to explain the why, it would be much longer than the code.
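The 95% figure quoted in this comment for a normal(7.5, 0.25) draw landing between 7 and 8 can be checked with a few lines of Python, treating 0.25 as the standard deviation (as in Stan's normal_lpdf). The helper normal_cdf() is a hypothetical name written for this check.

```python
import math


def normal_cdf(x, mu, sigma):
    """Return P(X <= x) for X ~ normal(mu, sigma), with sigma a standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


prob_in_range = normal_cdf(8.0, 7.5, 0.25) - normal_cdf(7.0, 7.5, 0.25)
print(f"P(7 < lambda_mu < 8) = {prob_in_range:.4f}")
```

The interval is two standard deviations either side of the mean, so the result is roughly 0.954, which leaves about a 4.6% chance of a draw outside the range.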
>This is precisely the kind of comment that can cause trouble:
Bob, if you read my response carefully, you will clearly see that this is an example of a comment *to myself* and not to anyone else (see the paragraph just before the example code line: “Assume…this code will only be read by you”). It is *not* an example of a comment to readers who are non-coders. A comment on that line to readers who are not coders might simply say something like, “this is a normal(7.5, 0.25) prior probability distribution on the mean of the laboratory stock solution”. But since the comment was to *me*, it simply states something about the *laboratory stock preparation methods* (not what the prior is ensuring) to remind me why I would use this prior. Notice that the comment does not state that the prior ensures any kind of constraint. It simply talks about the lab methods. It’s simply a reminder as to why I wanted an informative prior there. Again, the comment isn’t about how the code works, it’s simply a note to myself in the margin.
Bob. I think a pointer to a paper is a legit “why” comment. But if that pointer to the paper isn’t there, how do people not intimately familiar with the topic figure out whether you’re doing the “right” calculation or not? They don’t. That’s exactly the kind of thing I’m talking about.
In other instances it might be something like:
a ~ normal(0,3) // although we have no information about the sign of the coefficient a, if it were overall bigger than about 10 it would imply an unstable situation in which frogs replicate out of control and consume all mass in the universe.
or whatever. Just give me some hint at reasons why you’re doing what you’re doing.
Outside of models of scientific processes, this kind of comment would be more likely to discuss something like a choice of algorithm or a choice of initial condition when running an iterative scheme, or discussions about stability of a numerical calculation:
// in order to preserve numerical accuracy we split this into two algebraically equivalent
// cases so the argument to exp is always negative:
invlogt = x > 0 ? (1/(1+exp(-x))) : exp(x)/(1+exp(x))
Why did you do this? Oh, to preserve numerical accuracy and to keep the exponent negative… makes sense.
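The same two-case split can be sketched in Python, where the issue is easy to see: a naive 1 / (1 + math.exp(-x)) raises OverflowError for a large negative x such as -1000. The function name inv_logit() is an illustrative choice.

```python
import math


def inv_logit(x):
    """Return 1 / (1 + exp(-x)), evaluated so that exp never overflows."""
    if x > 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x <= 0 here, so e is in [0, 1]
    return e / (1.0 + e)


assert inv_logit(0.0) == 0.5
assert inv_logit(1000.0) == 1.0
assert inv_logit(-1000.0) == 0.0
```

In both branches the argument passed to exp is at most zero, so the worst case is a harmless underflow to 0.0.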
I defer to Bob on all aspects of coding, but, like commenter Jd above, I find comments to be helpful in my own code. After each class, I post to the students the code that we covered that day, and I don’t think it hurts to have labeled paragraphs, even labels such as
or
I agree that it’s stupid to have obvious comments of the form,
But the only time I ever see that sort of comment is in documentation telling people not to do that sort of thing.
I also agree that it’s better to encapsulate code into functions rather than putting it in as labeled paragraphs. Maybe I should do that more for the code that I present to students and for the code that I use in my own research. One thing I’ll definitely take from the above post is a reminder to express more of my code as functions.
Andrew says, “But the only time I ever see that sort of comment is in documentation telling people not to do that sort of thing.” Andrew doesn’t read a lot of other people’s code. Every time I tell someone this they say the same thing, that people don’t do this.
Without naming names, the following is an excerpt from the top post at this very second on Stan’s Discourse. Almost every line of code is documented with something. Here’s an excerpt, with original vertical and horizontal spacing:
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
  // add more terms to the linear predictor
  mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_3_1[J_3[n]] * Z_3_1[n] + r_4_1[J_4[n]] * Z_4_1[n];
}
It’s comments like “initialize linear predictor” and “add more terms to the linear predictor” that I’m talking about. These just get in the way. If there wasn’t so much vertical commenting in the code cluttering things up, it’d be clear visually that mu was being used as a linear predictor at that point.
Also, we can replace this loop with a one-liner in Stan:
Bob – At a glance, I am 99% sure that this is code taken from Paul Burkner’s brms package using the make_stancode() or stancode() function. This function allows users to specify a model in the typical brms (or lme4 or rstanarm) regression style format and then generate Stan code for a Stan model. Those notes are there for readers of the Stan code who use that function. Also, I think the loop is because this code is sort of standardized to be used in many different ways depending on what is specified by the user in the brm() call – i.e. it is meant for maximum flexibility.
If quarto is terrible then what is a not-terrible alternative?
I wish someone would develop one! The fundamental problem is that everyone’s relying on pandoc under the hood to process markdown. Yet they add their own quirks on top, so Quarto is subtly different than R markdown in a way that’s taking us forever to update our doc.
Comments should only be used for ‘why’ or to refer to requirements, specifications, design documents, etc., to indicate their place in the overall picture.
They should never be used to describe what is being done or how; code is a precise and accurate language for doing that (provided the function and variable names are well chosen).
> The best number of inline code comments is zero.
The Tidyverse people disagree. They write:
> In code, use comments to explain the “why” not the “what” or “how”. Each line of a comment should begin with the comment symbol and a single space: #.
Indeed, I don’t think I have ever met a professional (non-academic?) programmer who would agree with your view. (Contrary opinions welcome.) The easiest way to distinguish, at a glance, professional from academic code is that the latter has way fewer comments.
I think that commenting is so important, and so under-utilized, by beginners that I require students to have as many lines of comments, approximately, as they have lines of code.
Your views on bad/useless comments are, absolutely, correct.
But there is always so much to say about the “why!” Why did you do this approach? Which other approaches did you consider? Which approaches failed? What are the biggest remaining weaknesses? What would you have tried next, if you had more time?
Good comments are notes to future You. When you re-visit this code a year or two from now, what do you need to know? Write it down now!
As I tried to say, you can use a comment like this as a last resort if the code isn’t clear itself. Sometimes that goes for the what and how, too, if they’re unusual.
dkane says, “I don’t think I have ever met a professional (non-academic?) programmer who would agree with your view.” We must know a very different circle of professional programmers (by which I roughly mean someone with experience writing production code in a team setting that gets used by a larger group than that team—it doesn’t have to do with academia vs. industry, though it’s very hard to get the right experience in academia because academics don’t generally write production code unless they work on open source systems like Stan).
I’m actually going to have to jump in and defend Bob on this one. The better professional code I deal with is light on comments. And my career has seen a steady decline in “inline” or “in-code” comments. Exceptions tend to be #TODOs, a quick note on something that is, at the moment, necessary but for what would seem like completely arbitrary reasons, or links to related material. And each of those cases is a hint that I should probably revisit the code for improvements. And, in my experience, this is a common development.
I started as a huge fan of Knuth’s idea of “literate code”. It really seemed like an excellent way to force you to think carefully about what you were setting out to do and how you were going to accomplish it. You would describe the details in words and that would constrain your development. And it just never worked that way. Either you end up in a loop of trying to keep your documentation in line with the changes you inevitably have to make in your implementation or you end up with a beautiful set of docs and no code. I can believe that this might have worked for Knuth, but I am definitely not Donald Knuth and I also suspect he didn’t have to produce functional code with quite the same time constraints. For me and almost everyone I’ve worked with, the conclusion is to focus on the code itself as a document that makes its own meaning clear.
I will say that, even though he does mention it in a comment above, and it’s at least implicit in the post, it’s kind of easy to miss that Bob is making a distinction between inline comments and API documentation. You don’t comment a Numpy function because you can read the docs. But those docs are often voluminous and are auto-generated from extensive and carefully written comments for each function or class. And this I absolutely support and have done this myself whenever I’ve written stand-alone packages. Those are cases where clients need to use your code and need to have a clear idea of how to do that and what to expect. But what they need to know is still at the API level. They don’t need to, or probably want to, know every implementation detail. And if that is actually important for their use case (and it certainly can be, especially in the DS/ML/numerical computing world of trust-but-verify) they’ll read the code, where Bob’s comments apply.
Thanks! I also didn’t do a very good job separating research code from production code. Most of what I wrote was aimed at production code, not research code.
I think a lot of things worked for Knuth that wouldn’t work for ordinary people. If you look at Knuth’s code for Metafont (he published it in like 6 hardcover volumes), you get the feeling Knuth is the only person who could have understood that code. I think this is often a problem for people who are really smart and have large short-term memory stacks—their code can wind up being impenetrable to the rest of us.
On very smart people with large short-term memories:
1) Dennis Ritchie was a superb programmer … but he thought Ken Thompson was way above him.
2) I once was in MH Lab 127 terminal room ~10PM. Ken came in, sat down at terminal having written nothing down.
He started typing. I came back next morning, he’d written ~3,000 lines of code, which worked (sorry I don’t recall the program.)
3) We conjecture he had a bigger short-term memory than anyone else we knew, and his code was very clean. In 1973 the way one learned UNIX was by reading the code.
4) There was nothing to kill an idea faster than bouncing it off Ken and getting back a comment that he didn’t understand it.
But for sure, there was always a real difference between research code and production code, including the problem when some research program over in BTL research would be found to be more generally useful and might need some help from software engineering folks to ruggedize it, make it maintainable.
I’m very glad that designing languages and programming in them are two different skills!
Or maybe he had a smaller one? I find programmers who are smarter in the sense of being able to understand and write complicated code have less of a need to make their code readable. It’s the poor people like me who have to carefully structure code to scaffold up to 3000 lines. Then again, with 3K lines of C, you’re just getting scaffolded :-).
The research on short term memory has mostly concluded that people don’t have better memories, they have more structure in their memory. For example, chess masters aren’t better than amateurs at remembering random chess piece positions, but they are way better at remembering positions that correspond to reasonable games (this goes back to de Groot and Simon’s great experiments in the 1950s, and they’ve been replicated widely since then).
P.S. Just saw they’re finally closing the doors at Murray Hill (MH). I was surprised it was still open!
I write (biostatistics) research code in R notebooks/Quarto. All my output is product of the data + my assumptions. I state (and explain/defend) my assumptions in text in the notebook (not in comments in the code). The code alone can’t contain this “why” information for future readers and reviewers.
I find it very painful to develop code in notebooks or Quarto, but I think they’re the best way we currently have to present code. So when I wrote a getting started tutorial for Stan, I did it in Quarto. There I put in lots of comments in things for pedagogical purposes that I’d never include in actual code in a production code base.
Super interesting. Thanks Bob.
If you’d ever like to present a talk version of this post at Toronto Data Workshop please do let me know. We’d love to host it.
An exception to this is when the code is going to be read by students who are still learning how to program. There’s a certain skill associated with reading several lines of code and understanding how they all fit together. Meanwhile, students can read the code and the comments simultaneously and work on developing that skill.
I agree 100%. Bob said “But whatever you do, comment for people who know the language better than you.”, but for pedagogical and research code, the people most likely to be reading and trying to understand your code do not know the language better than you.
After 50 years of writing and reading research code, I have found that comments in the code are the only documentation I can depend on. In an ideal world maybe in-code comments could be bad practice, but for research code, (with occasional pleasant exceptions) any documentation outside the code is usually inadequate, confusing, or lost.
I also agree. In these cases, as I replied to theCamel above, I like doing this kind of thing in Quarto or Jupyter notebooks. You definitely need to develop the skill of reading code if you’re going to work in a team setting on code other people use.
The problem I find is that I never know where to stop with pedagogical commenting. Does this poor statistician understand recursion? Do they know that Stan lets you apply an array to an array of indices to vectorize? Should I explain how the language works? Do I say that the transformed parameters block in Stan is where transforms of parameters go that get output and should I comment every local variable saying I made it local instead of a transformed parameter so it doesn’t get printed out? I just never know where to stop, so I have to fix some particular user in mind and write to that level as best I can.
My main experience with comments is that they lie. I can depend on the intent statement often matching the intent, but the code often doesn’t match the comments.
Also, as I said above, I think research code shared among a bunch of non-professional programmers is a different beast. Still, a lot of professional coding practices can help, but you have to judge whether they’re worth the effort. I find that for me, this often leads to decision paralysis around how solid and robust to make research code.
“do as I say, not as I do”? I think that the only way to teach good coding skills is to write good code – what Bob says above:
“Readable code. Why? It’s self documenting. To be self documenting, code needs to be relatively simple and modular. The biggest mistake beginners make in writing code is lack of modularity. Without modularity, it’s impossible to build code bottom up, testing as you go.”
It’s appalling that any student would walk out of a guest lecture! I found the post very interesting, though I don’t completely agree — I like dkane’s note that “Good comments are notes to future You.” I do agree that Jupyter notebooks foster bad habits — I can’t stand them. When teaching image analysis, I’ve spent a lot of time with students trying to untangle the mess of cells.
At the very least I would expect to see references to specifications or perhaps academic papers where these explain and justify the code. I believe that classics such as “The Elements of Programming Style” provide information on the use and abuse of comments which is still valid. What I have seen change since this time is the use of automated testing frameworks. I note that where these can serve as examples of the use of the programs they test, they are providing information which, if it had nowhere else to go, might have been preserved as comments within the code.
That’s a really good point about unit tests (or something less rigorous like vignettes). They can serve as excellent doc and I sometimes hear programmers say they don’t want to write API doc about things like edge conditions because it’s all implied by the unit tests. Especially at the level of how to put a bunch of low-level tools together.
My coding days are long in the past, but I’m reminded of the most famous comment (well there weren’t many, which was OK, given clarity of code) in Sixth Edition UNIX:
“you are not expected to understand this.”
https://thenewstack.io/not-expected-understand-explainer/
Thanks for this… a nice bit of time travel for me. Bob’s post rings (mostly) true for me. Comments age poorly and often lie in my experience. I like “#Shirley, there is a better way to do this.” once in a while to remind me that it might be worth reconsidering. In R I still feel a sense of panic recalling the comment in the code for model.matrix:
“# Now, recurse your brains out.”
Interesting post. Intriguing and motivating. As others have commented above, I also disagree with some of the points made. But I am always a fan of reading other people’s code to learn better ways to do stuff. So I went to your github to read some code and see your approach in action. Not sure if I got the github account wrong (github.com/bob-carpenter, isn’t it?) but I did find some examples of what you suggest not to do. Like this file for example (https://github.com/bob-carpenter/anno/blob/master/R/em-dawid-skene.R)
```
##### EM ALGORITHM #####
### INITIALIZATION
theta_hat <- array(NA,c(J,K,K));
…
### EM ITERATIONS
epoch <- 1;
…
### M step
beta <- 0.01;
pi_hat <- rep(beta,K); # add beta smoothing on pi_hat
```
The example above is not an attempt at shaming, it just illustrates my inability to find your approach in action. I would really love to see it.
Could you point me to some piece of your code that you think is almost flawless or some code that you're particularly proud of?
Check out the Stan code base, such as the math library:
https://github.com/stan-dev/math
For example, here’s a nice stat function:
https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/Phi_approx.hpp
You’ll see that there’s doc at the top about what it does, a reference to where the algorithm came from, but zero code comments. We don’t explain what coeffRef(i, j) does in Eigen (returns reference to value, not a copy), that Eigen::Index is a long int, and we don’t even document the magic numbers in the code other than to reference the paper from which they were drawn. We don’t document the general callback pattern for autodiff. You’ll also see that variable names are kept relatively short, which makes the code easier to read.
As I said above in comments, I didn’t do a good enough job in the post of distinguishing between research code and production code. That EM algorithm for crowdsourcing you cite was written in a tutorial way for colleagues around 20 years ago. It would have been better had I taken the time to break it into simpler functions.
For contrast, check out the code I wrote for the same model in Stan for the Stan User’s Guide:
https://mc-stan.org/docs/stan-users-guide/data-coding-models.html
I remember when I was a grad-student tutor at Edinburgh University (a very, very long time ago – teaching Lisp (or maybe Prolog – I forget)) and I gave students this advice (because I was convinced by the argument – it’s been around for a while), and I was forcefully reprimanded by a senior academic for doing so. In theory I do see the point, but the argument (a) assumes that most programmers can write code like Brian Kernighan, and (b) does not acknowledge that even in code as good as Brian Kernighan’s, commentary that says _what_ is going on is helpful (necessary). A string of composed algebraic transformations may be a platonically clean extensional description of the computation, but it is not much of a guide to the _why_. A comment provides a large intentional hint.
It is not that comments are bad, but comments should focus on the intention. I agree that well written code documents its own extension.
Also, personally, I don’t encounter well-written code all that often.
Hi, Sean. I was a Ph.D. student in Edinburgh, too, in the mid-80s and did both Lisp and Prolog. In fact, I even wrote a book on logic programming type theory back when I was a professor in the early 90s.
I wouldn’t pay too much attention to most academics on coding. They’re generally pretty bad at it. I say this as an academic who wrote a lot of open source code in the 1980s and 1990s, then went into industry in the 00s. It completely changed my attitude on code, and I thought I was a decent programmer before going into industry.
I think I must have buried the pieces of this post where I recommend thoroughly documenting the API to the level where it is sufficient for formulating a test plan. That’s where you document what something is doing. If the algorithm is so complicated that the function name and code aren’t enough, comments are unlikely to save you. For example, let’s look at something like HMM decoding using the forward-backward algorithm. It’s a relatively simple piece of code, but it’s fairly complex as to what all the pieces mean. I don’t see how it’d be possible to document HMMs and the general forward-backward algorithm in the code itself. Do I assume the reader knows sufficient statistics and is really good at marginalizing and at dynamic programming or do I explain the concept of dynamic programming in the comments in the forward-backward algorithm? This is what I meant by it being a slippery slope once you start trying to get pedagogical with comments.
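To make the "document the API well enough to write a test plan" idea concrete, here is a minimal Python sketch of just the forward pass for a discrete HMM. It is not code from Stan or from the post. The function names, argument layout, and the brute-force oracle are all assumptions made for illustration. The docstring states the contract, and the second function exists only so a test can check that contract.

```python
from itertools import product


def hmm_likelihood(init, trans, emit, obs):
    """Return p(obs) for a discrete HMM, marginalizing over hidden states.

    init[k]     -- probability that the first hidden state is k
    trans[j][k] -- probability of moving from state j to state k
    emit[k][y]  -- probability that state k emits observation y
    obs         -- non-empty sequence of observation indices
    """
    n_states = len(init)
    # alpha[k] is the joint probability of the observations so far and
    # of being in state k at the current position.
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    for y in obs[1:]:
        alpha = [
            sum(alpha[j] * trans[j][k] for j in range(n_states)) * emit[k][y]
            for k in range(n_states)
        ]
    return sum(alpha)


def brute_force_likelihood(init, trans, emit, obs):
    """Same quantity, summed over every state path. Test oracle only."""
    n_states = len(init)
    total = 0.0
    for path in product(range(n_states), repeat=len(obs)):
        p = init[path[0]] * emit[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[path[t - 1]][path[t]] * emit[path[t]][obs[t]]
        total += p
    return total
```

A test plan that follows from the docstring alone: the result should agree with the brute-force sum on short sequences, and the likelihoods of all observation sequences of a fixed length should add up to one.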
It’s rare to see well written code in research settings or from academics. I’d say it’s not worthwhile to write research code up to a production standard as that’s not its intended use. Academics aren’t incentivized to write code, they’re incentivized to raise funding and write papers. Given the lack of incentives, they rarely get the experience to develop good coding skills. It’s why the Flatiron Institute, where I work, was founded—to provide a home for people writing academic scientific software and algorithms. It’s also really hard to support a full-time programmer in academia as there’s no way I know of to give them a permanent job—it’s just hand to mouth with grants the way I lived for 10 years at Columbia.
1. Agreed about the incentives. Time is money too, and writing good, production-ready software and doing science are not necessarily comparable. Nor should they be in most cases, I’d argue.
2. I tend to agree also about API documentation, although it must be noted that generally much of API documentation is nowadays pretty useless (generated) garbage too. In this regard, scientific software likely exceeds the quality of other software, given that many people write books about their APIs and implementations, publish tutorials, and there are even journals for showcasing your scientific software implementation.
3. As for code comments: the examples you gave were indeed silly, and many languages (like Stan) but not all (say, R, perhaps) already resemble mathematical notation closely enough. But it all depends on a context: when dealing with huge, highly complex, old, and security-sensitive code bases, such as with those of operating systems, smart code comments are often preferable for fellow and particularly new developers. Someone mentioned referencing relevant (and not only of your own) papers in scientific software; that sounds fairly analogous.
4. A. Then, regarding tests and scientific software, I would very much prefer to see also continuous, automated integration, regression, and deployment tests, as has been common in the industry already for the past decade or so. Particularly regression tests would be important to have, although the widespread use of software repositories should go hand in hand with integration and deployment tests too.
4. B. While most scientific software does not necessarily have to fulfill the many requirements of other fields (such as those related to, e.g., security, system-wise reliability, and customer requirements), it carries its own peculiar requirements. Among these is reproducibility, which in this context includes numerical accuracy, the stability of algorithms already implemented even when new ones are added in new versions, the stability of pre-defined options and tunables across versions, and so forth and so on.
I can only speak to the academic research context, but in my experience, comments in code are quite useful for the QA process. If the coder whose work I’m reviewing has commented “here’s what the following code does” (or better yet, “here’s why it does what it does”), it’s much easier for me to check that the code is doing the right thing (or at least, aligns with their intentions). Thorough documentation fills this role too, but it’s nice to have the comments there when you’re going line by line.
We agree on this, KH. This is what I meant by API documentation. I’m 100% behind writing API doc that is complete enough to form the basis of unit testing. Generally you want to test the API, not line-by-line in the code, because that inline code may change. The beautiful thing about comprehensive unit tests is that they let you completely change the implementation without changing the tests. For instance, I might do something like add caching to an HMM decoder, which makes it more efficient, but doesn’t change behavior.
Every Spring, I use some code to….eh, never mind, doesn’t matter what it’s for and that would just make this post longer. The key point is that I use the code once a year. I wrote the initial version six years ago, and every year a few things change about the code requirements or some details so the code has gradually grown to be able to handle special cases, test what happens if I do things thisaway versus thataway, etc. As a result, at the start of a main program I have a whole bunch of flags I can set. Here’s an example:
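A rough Python sketch of the caching point, under stated assumptions: the two-state HMM parameters, the function names, and the `check_decoder` helper are invented for illustration and are not from any real decoder. The same API-level checks are run against the plain and the cached version.

```python
from functools import lru_cache

# Hypothetical two-state HMM, fixed here to keep the sketch short.
INIT = (0.6, 0.4)
TRANS = ((0.7, 0.3), (0.4, 0.6))
EMIT = ((0.9, 0.1), (0.2, 0.8))


def viterbi(obs):
    """Return the most probable hidden-state sequence for a tuple of observations."""
    n_states = len(INIT)
    best = [INIT[k] * EMIT[k][obs[0]] for k in range(n_states)]
    back = []  # back[t][k] = best predecessor of state k at step t + 1
    for y in obs[1:]:
        step, ptrs = [], []
        for k in range(n_states):
            j = max(range(n_states), key=lambda s: best[s] * TRANS[s][k])
            ptrs.append(j)
            step.append(best[j] * TRANS[j][k] * EMIT[k][y])
        best = step
        back.append(ptrs)
    # Trace the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return tuple(reversed(path))


@lru_cache(maxsize=None)
def viterbi_cached(obs):
    """Same contract as viterbi; repeated inputs are served from a cache."""
    return viterbi(obs)


def check_decoder(decode):
    """API-level checks that do not depend on how decode is implemented."""
    assert decode((0, 0, 0)) == (0, 0, 0)
    assert decode((1, 1, 1)) == (1, 1, 1)
    assert len(decode((0, 1, 0, 1))) == 4


check_decoder(viterbi)
check_decoder(viterbi_cached)
```

Because `check_decoder` only talks to the function's inputs and outputs, swapping in the cached version needs no new tests.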
By the time I do the final production run, I will have set various flags in various ways to test for sensitivity to the results, and I might decide that some case has to be handled differently from the way I’ve done it previously so I might even add another flag or parameter. (I prefer to do it that way, rather than simply changing things and relying on git versions to keep track of the changes between this year and last year, because I want to be able to look at just the current code and see what is different from what I did last year). Code flagged with XXXPNP makes it easy for me to search and make sure I have these critical settings right, before I do the production run.
In most cases, if I want to understand what the flag actually does I will have to go look at the code, but of course that’s easy to do just by searching on the variable name.
I think a lot of your examples of comments getting in the way are quite good, I’m convinced. But I’m not convinced that MY comments are anything other than extremely useful. Indeed it’s hard for me to imagine getting by without them. Since I only look at this code once a year, I can’t necessarily remember what each of the flags does or even why they are there in some cases, and the parameter names alone don’t always do the trick. I’ll put it this way: I agree with just about everything you say, and I’m happy to take just about any coding advice you would give…but I’m not deleting those comments!
Thanks for the example, Phil. Let’s go through them. All the disbelievers who don’t believe people write comments like I said, here you go:
I would argue that comment is harmful because it leaves the user to wonder why there’s a comment there when the variable is clearly named CURRENT_YEAR. If the “program” is so important, put it in the variable name!
The second comment is redundant.
ENROLLMENT_SCENARIO_YEAR = 2022 # Should be CURRENT_YEAR for production, but use earlier year if we don’t have forecasts yet. XXXPNP
if (ENROLLMENT_SCENARIO_YEAR != CURRENT_YEAR) {
  warning(message = 'Enrollment year is not set equal to current year.')
}
I would argue this code would be much better if you made the warning message equal to the comment and deleted the comment. Also, I had no idea what “XXXPNP” meant from looking at the code. I thought it was a typo until I read further into your post.
Put all these “XXXPNP” things together into a struct where they’re localized and easy to inspect. That way you can tell if they’re consistent and they’re completely set much more easily than searching through code for magic strings.
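One way this could look, sketched in Python rather than the language of the original script. The class name, the field names, and the `production_warnings` method are hypothetical and chosen only to mirror the flags mentioned in this thread. The settings live in one place, and the old comment becomes the warning text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """All settings to review before a production run, in one place."""

    current_year: int
    enrollment_scenario_year: int
    winsorize: bool

    def production_warnings(self):
        """Return a list of messages for settings that look non-final."""
        messages = []
        if self.enrollment_scenario_year != self.current_year:
            # The text that used to be a trailing comment is now the message.
            messages.append(
                "Enrollment scenario year should equal the current year for "
                "production; use an earlier year only if forecasts are missing."
            )
        return messages


settings = RunSettings(
    current_year=2023,
    enrollment_scenario_year=2022,
    winsorize=True,
)
for message in settings.production_warnings():
    print("WARNING:", message)
```

Making the dataclass frozen means a setting cannot be changed halfway through a run, and every field has to be supplied when the object is built, so a forgotten flag fails early instead of silently keeping last year's value.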
> Put all these “XXXPNP” things together into a struct
I’m guessing maybe he’s using R, and R doesn’t really *have* structs. Though I suppose it has lists or whatever. I hate R.
Julia is a great language, they’ve made it way faster to start up which was the biggest complaint. I’m able to do all the things I used to do in R in Julia, and it has real structs and real data type libraries and excellent reproducibility features. It compiles to machine code. It’s typically 50-100x faster than R. You can write proper loops and it’s still fast. Worthwhile to use a “real” language.
Bob, the current year is usually _not_ the program year! Right now it is 2024 but I am analyzing program year 2023. (The word ‘program’ in this context does not refer to a computer program, it’s a reference to California’s demand response program). But just recently I reran the analysis for program year 2022, with a slight modification to a statistical model, so I set CURRENT_YEAR to 2022 for that. The real problem here is not the comment, it’s the choice of ‘current’: this should just be ‘PROGRAM_YEAR’. Probably I could change CURRENT_YEAR to PROGRAM_YEAR everywhere, and get rid of the comment. On the other hand, why change anything?
As for XXXPNP, my initials are PNP and I find that the string XXXPNP leaps out at me. This is like TODO except there is not necessarily a TODO item; these are settings I need to check, but may or may not need to change.
I’m not sure what you’re saying about putting all these XXXPNP things into a struct. All those flags are at the top of the main program, and I’ve got XXXPNP to remind me to check the setting of the flags that I need to check before a final run. For instance, I want to make sure I check the WINSORIZE flag. If I can’t remember exactly what that flag does, I search for WINSORIZE and find that somewhere later in the code there’s a line like:
if (WINSORIZE) {
…
}
If this process seems ridiculous to you, because you think I should spend a couple of hours fixing up this code that I use once per year in order to make it theoretically better in some way, even though it works fine for me and it is already easy for me to find what I am looking for, then congratulations, you’re a professional programmer and I am not.
That said, it’s not that I think your advice is bad. Pas du tout. I’m just not going to follow it in this case.
What are you referring to when you mention Knuth’s notes on writing math?
As for the core of the post: Polite but hard disagree. This advice might be useful (albeit still contentious) for competent programmers in certain environments. But most code we read is not written by excellent programmers. And even if it is, we readers are often shoddy programmers.
You mention that research code is often messy, but as long as it’s reproducible it’s okay. Disturbingly, I’ve found that research code is often NOT reproducible. The researchers often don’t care—there’s an attitude of they’ve published the paper and the code is just the appendix stuff, so leave them alone. Given the lack of comments in the code, its messiness, and my lack of expertise, I often don’t know exactly which lines are meant to do what.
In such situations, comments would help a lot. They rarely grow stale, since the code is rarely updated. They don’t lie, because they encode the writers’ intentions, not what the code actually does. And comments are usually written in natural language, which its writers are often more competent with than whatever lines of code they’ve thrown together to output the data visualizations or stats that they need.
I mean the first Google hit when you search for [Knuth’s notes on writing math]:
https://jmlr.csail.mit.edu/reviewing-papers/knuth_mathematical_writing.pdf
I don’t think the best way to help “shoddy readers” is to leave breadcrumbs. If you’re reading code and don’t understand what a function does, look up its API doc, don’t rely on the user to have commented it properly in their code.
I didn’t mean to imply that as long as research code is reproducible it’s OK. It’s easy to write something that reproduces the wrong answer! The code needs to be tested for correctness. This isn’t testing on the boundaries of the API and for portability, etc., like production code, but it still needs to be tested.
I agree that research code is often not reproducible. I think in those cases, the developers have failed in their primary goal, which is recording what they did in a way that can be reproduced. What I meant to say is that if there’s reproducible code, at least there’s a chance of tracing back through what was actually done.
I think this is well intentioned, but misguided.
See the examples above on bounds for a variable that aren’t actually enforced! The problem is that if the programmer doesn’t test that the execution matches the intent, the reported intentions may be nothing more than wishful thinking. You see that in the example I commented on above about bounded parameters that were, in fact, only “softly” nudged toward that range, not constrained to it.
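A small Python sketch of the difference between stating a bound and enforcing one. This is not the code from the example referred to above. The function names and the scaled inverse-logit transform are assumptions chosen for illustration.

```python
import math


def inv_logit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def constrain(u, lower, upper):
    """Map an unconstrained real u into (lower, upper).

    In floating point, very large |u| can round to a bound itself.
    """
    return lower + (upper - lower) * inv_logit(u)


def check_bounds(transform, lower, upper):
    """Return True if transform keeps a grid of inputs strictly inside the bounds."""
    grid = [x / 2.0 for x in range(-40, 41)]  # -20.0 to 20.0 in steps of 0.5
    return all(lower < transform(u) < upper for u in grid)


# Enforced bound: the check passes.
assert check_bounds(lambda u: constrain(u, 0.0, 1.0), 0.0, 1.0)

# "Bound" that exists only in a comment: the value passes through untouched,
# so the same check fails.
assert not check_bounds(lambda u: u, 0.0, 1.0)
```

The check is what tells the two cases apart. A comment saying "in (0, 1)" would read the same above either version.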
The mismatch between comments and code is too common for me to trust comments. When I see poorly structured code, it doesn’t give me a lot of confidence that the comments will be even better.
LLMs are really useful for adding comments to code. I just put code into ChatGPT and say, “please add comments”. It’s just a c+p job back and forth!
You’re doing something very badly wrong. You can easily tell what’s commented out because it will be in a different color than the rest of the code. You mentioned using emacs, but if you’re having trouble recognizing what is and isn’t part of a comment… you can’t actually be using emacs? Syntax highlighting is a feature of absolutely everything you might use to produce any kind of code; it certainly isn’t missing from emacs.
Thanks, Michael—that’s a good point. This isn’t nearly so important now that editors are better. I do use emacs, including with extensions for Stan and all the other languages I use. I still find the color/font signal weak compared to structure/placement and I still find that extensive commenting gets in the way of my understanding the structure of code because it starts separating things if there’s too much of it. But if it’s just on an end-of-line basis and in a different font/color, that’s not nearly so distracting.
>This should be done if you’re doing something non-idiomatic with the language or coding an unusual algorithm or something very involved.
This comment reminded me of the famous Quake III Arena fast inverse square root function definition (link here: https://en.wikipedia.org/wiki/Fast_inverse_square_root).
```
float q_rsqrt(float number)
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;
  x2 = number * 0.5F;
  y = number;
  i = * ( long * ) &y; // evil floating point bit level hacking
  i = 0x5f3759df - ( i >> 1 ); // what the fuck?
  y = * ( float * ) &i;
  y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
  return y;
}
```
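For readers who want to poke at the trick, here is a rough Python transcription. It is a sketch, not the original: Python has no pointer casts, so `struct` is assumed as the way to reinterpret the 32-bit pattern, and the Newton step runs in double precision, so results can differ from the C version in the last digits. It shows the kind of "why" comment that the function name alone cannot carry.

```python
import struct


def q_rsqrt(number):
    """Approximate 1 / sqrt(number) for a positive finite float."""
    x2 = number * 0.5
    # Reinterpret the 32-bit float's bits as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", number))[0]
    # Halving the bit pattern roughly halves the exponent, and subtracting
    # from the magic constant negates it, giving a first guess at x ** -0.5.
    i = (0x5F3759DF - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bits as a 32-bit float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step on f(y) = 1 / y**2 - number refines the guess.
    y = y * (1.5 - x2 * y * y)
    return y


def relative_error(x):
    """Relative error of q_rsqrt against the exact inverse square root."""
    exact = x ** -0.5
    return abs(q_rsqrt(x) - exact) / exact
```

For inputs like 0.25, 2.0, or 100.0, `relative_error` should come out well under one percent after the single refinement step.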
I’m a researcher who’s written code for about 10 years now. I’m starting to write more production-style code these days and I agree with Bob. I agree a lot.