The golden rule of code layout is that code should be written to be readable. And that means readable by others, including you in the future.

Three principles of naming follow:

1. Names should mean something.

2. Names should be as short as possible.

3. Use your judgement to balance (1) and (2).

The third one’s where all the fun arises. Do we use “i” or “n” for integer loop variables by convention? Yes, we do. Do we choose “inv_logit” or “inverse_logit”? Stan chose “inv_logit”. Do we choose “complex” or “complex_number”? C++ chose “complex”, as well as choosing “imag” over “imaginary” for the method to pull the imaginary component out.
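To make the C++ naming choices above concrete, here is a small sketch using the standard `std::complex` type and its `imag()` accessor. The function name `imaginary_part_of_product` is purely illustrative.

```cpp
#include <cassert>
#include <complex>

// C++ names the type std::complex and the accessor imag(),
// rather than complex_number and imaginary().
double imaginary_part_of_product() {
  const std::complex<double> z(1.0, 2.0);   // 1 + 2i
  const std::complex<double> w(3.0, -1.0);  // 3 - i
  const std::complex<double> p = z * w;     // (1 + 2i)(3 - i) = 5 + 5i
  assert(p.real() == 5.0);                  // real() is the matching short accessor
  return p.imag();                          // the free function std::imag(p) also works
}
```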

Do we use names like “run_helper_function”, which is both long and provides zero clue as to what it does? We don’t if we want to do unto others as we’d have them do unto us.

P.S. If the producers of *Silicon Valley* had asked me, Winnie would’ve dumped Richard after a fight about Hungarian notation, not tabs vs. spaces.

fabs

Floating point absolute value for those who aren’t C++ or Stan coders. In retrospect, I see I made a mistake in just following C++ function naming conventions in Stan. I should’ve just gone with:

I still plan to deprecate `fabs` and go back to that. Part of the reason we didn’t do that originally was the C99/C++03 conflicts, which have since been better sorted out in C++11, and the fact that we’ve gotten better at traits metaprogramming.
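A minimal sketch of what a single overloaded `abs` could look like in C++11 using type traits, assuming a hypothetical namespace `sketch`. This illustrates the idea only; it is not Stan's actual implementation.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not Stan's

// Integral arguments: the result keeps the argument's integer type.
template <typename T,
          typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T abs(T x) {
  // Negating the most negative value of a signed type overflows.
  assert(!std::is_signed<T>::value || x != std::numeric_limits<T>::min());
  return x < 0 ? -x : x;
}

// Floating-point arguments: delegate to std::fabs, which has overloads
// for float, double, and long double, so the type is preserved.
template <typename T,
          typename std::enable_if<std::is_floating_point<T>::value, int>::type = 0>
T abs(T x) {
  return std::fabs(x);
}

}  // namespace sketch
```

With this, `sketch::abs(-3)` returns the `int` 3 and `sketch::abs(-2.5)` returns the `double` 2.5, so one name covers both cases and no separate `fabs` is needed.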

Instead, what we have in Stan now is

But the middle one’s not necessary anywhere.

I hadn’t quite anticipated how hard users trained in R would find the distinction between integers and floating-point values, which is fundamentally baked into our CPUs.

It’s baked into the CPU, but it’s mathematically completely artificial. Real numbers contain the integers as a subset. People want real numbers, and the closest they get is float64.
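One concrete way to see that float64 only approximates the reals: an IEEE 754 double has a 53-bit significand, so it represents every integer up to 2^53 in magnitude exactly, but not 2^53 + 1. The helper below is a hypothetical illustration, restricted to inputs below 2^62 in magnitude so that converting back to an integer stays well defined.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if n survives a round trip through double unchanged.
// Assumes |n| < 2^62 so converting the double back to int64_t is defined.
bool round_trips_through_double(std::int64_t n) {
  const std::int64_t limit = std::int64_t{1} << 62;
  assert(n > -limit && n < limit);
  const double d = static_cast<double>(n);  // rounds to nearest if n is not representable
  return static_cast<std::int64_t>(d) == n;
}
```

Under the default round-to-nearest mode, 2^53 + 1 sits exactly halfway between two representable doubles and rounds to 2^53, so the round trip fails.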

The natural numbers (0, 1, …) and integers (…, -1, 0, 1, …) are very natural mathematically. So much so that there’s commonly used mathematical notation for them (ℕ and ℤ). Their algebra is different from that of the real numbers. For example, integers aren’t closed under addition, making the expression “1 / 2” a thorny issue. In C++, integer division truncates, and the result is 0; in R, if you evaluate “1 / 2”, you get 0.5.
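A minimal C++ sketch of the two behaviors for “1 / 2”; the function names are hypothetical. Since C++11, integer division truncates toward zero, while making either operand a `double` gives floating-point division.

```cpp
#include <cassert>

// Integer division: both operands are int, so the quotient is truncated
// toward zero (guaranteed since C++11). Division by zero is undefined
// behavior, hence the precondition check.
int int_divide(int a, int b) {
  assert(b != 0);
  return a / b;
}

// Floating-point division: converting one operand to double makes the
// whole expression use double arithmetic.
double real_divide(int a, int b) {
  return static_cast<double>(a) / b;
}
```

So `int_divide(1, 2)` is 0, and `int_divide(-1, 2)` is also 0 because truncation goes toward zero rather than toward negative infinity, while `real_divide(1, 2)` is 0.5, matching what R gives for `1 / 2`.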

But the real problem here isn’t what the ideal mathematician would do, but rather what a programmer in 2020 has to do on their computer to get code to work efficiently and transparently (and by transparently, I mean according to the IEEE 754 spec, which is where the float64 behavior is defined).

> integers aren’t closed under addition, making the expression “1 / 2” a thorny issue

Pretty sure you meant “aren’t closed under *division*”, but yes.

I guess what I meant was that a typical “modeler” is thinking about real numbers, of which the integers are a subset. So if you already admit the entire real line into your concept of “number”, there’s nothing special about the integers. But the CPU makes this distinction. The distinction is artificial as far as a person working with real numbers is concerned: it doesn’t “help” them in any way mathematically. Computationally, on the other hand, it’s an entirely different set of instructions with different speed.

If someone is programming, they have to learn at least a little bit about these things; that’s just the way it is. I used to prefer interpreted languages, but I despise R so much that I have come around to preferring strongly typed languages.

Type casting can have catastrophic consequences.

https://around.com/ariane.html
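For the C++ crowd, a sketch of the defensive version of such a conversion. The Ariane 5 loss described at the link is usually summarized as a 64-bit floating-point value being converted to a 16-bit signed integer that it didn't fit in. In C++, an out-of-range floating-to-integer conversion is undefined behavior rather than a reported error, so the range check has to be explicit. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <stdexcept>

// True if x, truncated toward zero, is representable as int16_t.
bool fits_in_int16(double x) {
  if (!std::isfinite(x)) return false;  // NaN and infinities never fit
  const double t = std::trunc(x);
  return t >= std::numeric_limits<std::int16_t>::min() &&
         t <= std::numeric_limits<std::int16_t>::max();
}

// Checked conversion: throws instead of invoking undefined behavior.
std::int16_t to_int16_checked(double x) {
  if (!fits_in_int16(x)) {
    throw std::out_of_range("value does not fit in int16_t");
  }
  return static_cast<std::int16_t>(x);  // in range, so truncation is well defined
}
```

The check uses an exception rather than `assert`, since `assert` is compiled out under `NDEBUG` and a release build would silently lose the guard.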

Just be glad for IEEE 754.

1) The SAS folks had to work very hard to get the exact same results on IBM mainframes, DEC VAX machines, and IEEE 754 microprocessors.

2) When we were doing SPEC benchmarks ~1989, we had to use fuzzy comparisons and algorithms whose iteration counts didn’t vary drastically or depend on the least significant bit. One benchmark was dropped because it gave noticeably different results on VAX, Motorola 68K, and RISC (IEEE 754) machines.
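A fuzzy comparison of the kind mentioned above can be sketched as a combined relative and absolute tolerance test, similar in spirit to Python's `math.isclose`. The function name and default tolerances here are assumptions, not what SPEC actually used.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fuzzy equality: |a - b| <= max(rel_tol * max(|a|, |b|), abs_tol).
bool approx_equal(double a, double b, double rel_tol = 1e-9, double abs_tol = 0.0) {
  assert(rel_tol >= 0.0 && abs_tol >= 0.0);
  if (a == b) return true;  // exact matches, including equal infinities
  if (!std::isfinite(a) || !std::isfinite(b)) return false;  // NaN or unequal infinities
  const double diff = std::fabs(a - b);
  const double scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(rel_tol * scale, abs_tol);
}
```

For example, `approx_equal(0.1 + 0.2, 0.3)` is true even though `0.1 + 0.2 == 0.3` is false in IEEE 754 double arithmetic. The absolute tolerance matters near zero, where a purely relative test rejects any nonzero difference.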

You may recall regarding PL/I, “The product exhibits some lovely features but, unfortunately, the expression

25 + 1/3

yields 5.33333333333333”

https://web.archive.org/web/20200201104326/https://plg.uwaterloo.ca/~holt/papers/fatal_disease.html

Obmention of Leopold Kronecker’s quote (or at least attributed quote): “God made the integers; all else is the work of man.”

This is what makes programming in R such a pain: so many different function names for doing the same thing. If I want to change the values of a column, I have to use “transmute”. If I want to change the values of a list, I have to use “map” or one of the other “map_” variants. If I want to change the values of factor levels, I have to use fct_recode. If I want to change the column names, I have to use colnames(df) <- c('a', 'b', 'c'). I’d rather have one poorly named function that works everywhere than five well-named functions that each work only on specific types.

Note that this isn’t the R language or even the standard library, but rather the tidyverse. Base R does most of that via indexing for values and names() for names.

Obmention of Phil Karlton’s quote: “There are only two hard things in computer science: cache invalidation and naming things.”

And the long form of this quote: “There are only two hard things in computer science: cache invalidation and naming things and off-by-one errors.”

Here are the variable names I used in a set of R functions I came up with this past weekend. I think I probably get a failing grade from Bob…

covV, pd, typ, dcv, iir, rwQuant, fsQuant, Qual, covS, InclV, amb, Xamb, x, y, InclH, PRI, InclS, VeriCov, vp, expr, lastrow, res1V, res1H, res1, res2V, res2H, res2, resH, resV, stats, veriplot, whiteout

Although in fairness, the two most egregious ones (x and y) were chosen because they are the de facto standard variables in the literature for the calculations I was implementing.

If you use autocomplete and name “backwards” (specific to general):

subsubgroupname_subgroupname_groupname_supergroupname

autocomplete is your friend and length doesn’t matter that much.

In R, I’ve tended to extend the convention that dataframes are called something.df. So a gam model is something.gam and an nlme model is something.nlme. Since R is all about objects, it’s nice to have the type of object in the name.

In surveys, questionnaires, and medical charts, I was taught the convention of starting the variable name with the question number. So question 4 in section B, which asked about diabetes, would become B4_diab. That was back when SAS allowed only 8 characters in a variable name, but I think the convention is still useful now that names can be longer.

We have one survey dataset in SAS with somewhere upwards of 1,700 variables (it’s a merge of multiple surveys, each repeated at several points in time in a “wide” format). If we still had 8-character variable name restriction in SAS I’m not sure what the heck we’d be using for names.

@mpledger: That’s Hungarian notation for R.

@jim: I should have clarified that I meant for writing programs, not interactive scripting in a REPL environment like R’s or Python’s.

Research code written for exploration is a bit different from code for a submitted paper, but neither demands the kind of naming effort required for a code base with a couple dozen developers.

For reference, it might be easy to type this with autocomplete:

but it’s hard to see the basic structure compared to:

I always have to look up

multiply_lower_tri_self_transpose
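For anyone else who has to look it up: the Stan function takes the lower-triangular part of a matrix (diagonal included) and multiplies it by its own transpose. Below is a plain C++ sketch of that computation, assuming a square input stored as nested `std::vector`s; Stan itself works on Eigen matrices, and this is not its code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Computes L * L^T, where L is the lower-triangular part of x
// (entries above the diagonal are treated as zero). Assumes x is square.
Matrix multiply_lower_tri_self_transpose(const Matrix& x) {
  const std::size_t n = x.size();
  for (const auto& row : x) assert(row.size() == n);
  Matrix result(n, std::vector<double>(n, 0.0));
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {
      // (L L^T)[i][j] = sum_k L[i][k] * L[j][k]; L[j][k] is zero for k > j.
      double sum = 0.0;
      for (std::size_t k = 0; k <= j; ++k) sum += x[i][k] * x[j][k];
      result[i][j] = sum;
      result[j][i] = sum;  // the product is symmetric
    }
  }
  return result;
}
```

This is the product that turns a Cholesky factor L back into the matrix L L^T, for example a covariance matrix.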

Richard dumped Winnie.

D’oh. I think I was fooled by the tone, or at least my recollection of it, in which he stumbles down the stairs in shame. The show really nails that furtive, outsider feeling.

When I was teaching R to my friend, she thought I was just being crass when the function cumsum came up.