Migrating from dot to underscore

My C-oriented Stan collaborators have convinced me to use underscore (_) rather than dot (.) as much as possible in expressions in R. For example, I can name a variable n_years rather than n.years. This is fine. But I’m getting annoyed because I need to press the shift key every time I type the underscore.

What do people do about this? I know that it’s easy enough to reassign keys (I could, for example, assign underscore to backslash, which I never use). I’m just wondering what C programmers actually do. Do they reassign the key or do they just get used to pressing Shift?

P.S. In comments, Ben Hyde points to Google’s R style guide, which recommends that variable names use dots, not underscores or camel case (for example, “avg.clicks” rather than “avg_Clicks” or “avgClicks”). I think they’re recommending this to be consistent with R coding conventions.

I am switching to underscores in R variable names to be consistent with C. Otherwise we were running into difficulties because Stan, following C, does not allow dots in variable names. I don’t want to have a variable that’s called sd.y in R and sd_y in Stan. Much easier to have the same name in both. We don’t want to be changing Stan’s rules (too much of a mess given that Stan is written in C++) so I have to change my R conventions. Then once I switch to underscores for variables that go into Stan models, I’m inclined to be consistent and use underscores throughout.
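For anyone stuck with dotted names in existing data, a mechanical rename is one workaround. Here is a minimal sketch, written in Python purely for illustration; the helper name dots_to_underscores and the sample values are hypothetical, and nothing here is part of Stan or RStan.

```python
def dots_to_underscores(data):
    """Return a copy of `data` with '.' in each key replaced by '_'."""
    renamed = {}
    for name, value in data.items():
        new_name = name.replace(".", "_")
        # Guard against two different names collapsing into one (e.g. "a.b" and "a_b").
        if new_name in renamed:
            raise ValueError(f"name collision after renaming: {new_name!r}")
        renamed[new_name] = value
    return renamed


# Hypothetical R-style names mapped to values.
r_style = {"n.years": 10, "sd.y": 2.5}
print(dots_to_underscores(r_style))
```

The collision check matters because a blind rename could silently overwrite one variable with another; using the same underscore names everywhere avoids the problem in the first place.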

30 thoughts on “Migrating from dot to underscore”

  1. For the most part, we get used to pressing Shift and use an editor with completion support so that we usually never have to type most of the variable name anyways.

  2. I press shift and never think twice about it. Hmm. Maybe I should. (Of course, I don’t usually write C for a living.)

    Chuck

  3. I just press shift, but I hate underscores because they are hard to reach and I have to move my hands. I instead use camel case so n.years becomes nYears. It is still a shift key, but it is almost automatic since I use caps when writing papers. I think I picked camel case when I started programming in Java. (Remapping the backslash key would be death for me because it is used so much in LaTeX and for control characters like \n.)

  4. Could you convince us as well? I mean, why should we use underscores instead of dots? Honestly interested in the answer.

  5. In C, for commonly used functions, use emacs abbreviations, which allow you to type the function name without underscores or spaces, and the abbreviation expands it. Also, we use the shift key. One spends so much time pressing Ctrl that Shift does not seem like a burden.

  6. The advantage of underscores is that your variable names can be used as variable names in a much wider variety of languages. At least, that’s what I can come up with. I’m curious what convinced Andrew.

  7. My biggest keyboard headache (but I have a Latin American keyboard, with ñ) is (mostly in LaTeX) to press
    \ ~ ^ (each needs Alt Gr plus its own key, so I must use two fingers on the same hand). Any good tips for solving that?

  8. The dots really confused me about R when I first saw some code. I naturally assumed these were properties of an object. I guess I’m partial to camel case, coming from the Java world.

  9. This caused me headaches with reading R code, and still does. A dot looks like it is accessing a property or a method. I must know a dozen languages well, and using a dot as part of a variable name is just counter-intuitive for me. I used to use underscores, but now I prefer camel case after writing a lot of Objective-C.

  10. Why can’t Stan support “sd.y”? It’s just a string, after all. If you’re writing code directly in C++ then sure you have to use an underscore, but really you can just represent all of these as some specific data type and then have labels attached to them (thinking in terms of calling Stan from R, for instance). Either way, you’ll soon become comfortable with pressing two keys at once. :)

  11. I got used to using dots when starting R, but really underscores are better. They’re easier for non-R programmers to read, they don’t cause ambiguity with S3 methods, and as you say, they’re better for compatibility with other languages. The only reason underscores aren’t used more in R is that years ago they weren’t allowed. The reason they weren’t allowed is that underscore was a synonym for the assignment operator. The reason underscore was a synonym for the assignment operator is that on old ASR33 teletype machines, the ASCII code for underscore was actually printed as a backarrow. Really, this isn’t a good reason to keep using dots in variable names.

    As for CamelCase, it’s just an abomination. Plus you never can remember whether the first character is in upper or lower case (people do both).

  12. Dots are for object/structure components in C so n.years would be the years component of an n structure. I would say it is most common to eliminate redundant characters entirely and use nyears or nYears and keep them local to the function or subroutine. Global variables are a different story and you might want to lead with some designation where defined or define a structure to hold them, though underscores are often used in system interfaces.

  13. My ears were burning, so I thought I’d clarify our reasoning.

    The main reason we didn’t want to use dots in R parameter identifiers is that we wanted the RStan interface to have variable names for options that matched those in Stan. We didn’t want function names with dots because of the issue with S3 methods (as pointed out to us by Ben Goodrich and above by Radford).

    Because we wrote Stan in C++, we used C++ conventions.

    Python, C++ and Java all forbid dots in variable and function identifiers because of confusibility with method calls on objects. A typical requirement for variables is that they start with a letter or underscore and may be continued with any number of letters, digits or underscores.
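    The identifier rule just described can be written as a regular expression. This is a minimal sketch in Python: the name is_plain_identifier is made up for illustration, and the pattern is ASCII-only and ignores reserved keywords, so it is narrower than what any of these languages actually accepts.

```python
import re

# A letter or underscore first, then any number of letters, digits, or underscores.
# ASCII-only by assumption; real languages also reserve keywords such as "for".
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_plain_identifier(name):
    """True if `name` fits the simple letter/digit/underscore rule."""
    return IDENTIFIER.fullmatch(name) is not None


for candidate in ["sd_y", "sd.y", "nYears", "2fast"]:
    print(candidate, is_plain_identifier(candidate))
```

    Under this rule sd_y and nYears pass, while sd.y fails on the dot and 2fast fails on the leading digit.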

    I’m OK with camel case. It’s the standard in Java. For some reason, lots of people find it jarring. The C++ community doesn’t even like capital letters for class names, though you see it in some packages like Eigen.

    The biggest bummer about underscores is that in emacs’s R mode, it tries to convert them to "<-". Personally, I prefer "=" to "<-" because it matches other programming languages, but it's the convention in R. The top-row keys always do get short shrift in typing class, but underscores become second nature after decades of practice. And who notices typing shift except on the sub-optimal iPad virtual keyboard?

    • People with most European keyboards notice typing shift. The left shift key is usually shorter and requires more finger contortion.

    • R uses arrow assignment because the equals sign is reserved for passing something to a function’s argument.

      For example:

      print(x = "hello world") passes the string to the internal variable x.

      print(x <- "hello world") assigns the string to variable x in the global environment and passes the variable to the function.

      • You can use the equals sign for assignment in R. Even in the print() function.

        > print(y = 7)
        Error in print.default(y = 7) : argument "x" is missing, with no default
        > print((y = 7))
        [1] 7

        I wouldn’t advise it, though.

      • I prefer using = for the assignment operator. But is your comment the rationale for using <- instead of = in R? Can someone comment more on this?

  14. If you’re using ESS in Emacs to edit R code, you can get rid of the confusing underscore behavior as follows:

    1. M-x customize-variable RET ess-S-assign RET
    2. Change the variable’s value from " <- " to "_".
    3. Choose to "Save for Future Sessions" your customization.

    And then you're back to typing underscores as underscores :-)

  15. As a programmer who became a statistician, going from C++ to R (and SAS), I have come to prefer camelCase, followed by underscore. I never use dot in a name — it just feels wrong, for all the reasons mentioned above (what were the R developers thinking when they thought dot was a good idea?). I prefer camelCase to underscore mainly because it yields shorter and more readable variable names, and the variable names work in most any language that I’m aware of. Occasionally I’ll use a blend of camelCase and underscore if it improves readability.

    I also
    {
        like
        {
            to
            {
                indent
                like
            }
        }
        this,
    }
    for vivid clarity of the blocking structure. Unfortunately, R constrains my options in this regard.

    Gosh, remember early versions of languages like BASIC where we were limited to variable names of 8 alphanumeric characters or less, in all-caps? I’m still getting used to arbitrarily long variable names.

    • Yikes. I value my vertical space too much to devote whole lines to open curly braces.

      I hope we can all agree to just say “no” to tabs in code (other than Python and make, of course, speaking of “what were their designers thinking?”).

      I prefer camel case myself to underscores, but it’s not really done in C++, and style of punctuation/spelling is no place to innovate.

      In the R/S developers’ defense, R/S have been around longer than C++ or Java (since 1976, according to Wikipedia).
