I have a problem that comes up a lot in programming, and I’m sure the experts can help me out. It has to do with giving names to objects that I’m working with. Sometimes the naming is natural; for example, if I have a variable for sex in survey data, I can define “male” as 1 for men and 0 for women and then continue with names such as “male00” for data in the 2000 survey, “male04” for the 2004 survey, and so forth, or else use a consistent set of names and define things within dataframes. In either case, this is no problem.
What is more of a hassle is coming up with names for temporary variables. For example, should I use “i” for every looping index, with “j” for the nested loops, “k” for the next level, and so forth? This can get confusing when referring to array indexes (sometimes you end up needing things like a[j,i]). Or should I go for mnemonics, such as “i” to index survey respondents, “s” for states, “e” for ethnic groups, “r” for religious denominations, etc.? Sometimes I’ve tried to reserve “j” for the lowest level of poststratification cell, but that isn’t always convenient.
At other times I’ve tried more descriptive names, for example n.state for the number of states (50 or 51, depending on whether D.C. is included) and i.state for state indexes, thus giving loops such as “for (i.state in 1:n.state)” or “for (i.state in states)”, where “states” has been pre-defined as the vector of state numbers. This approach is currently my favorite (I tried to stick to something like it in my book with Jennifer), but it can also create its own difficulties, as I have to remember the names of all the indexing variables.
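To make the n.state / i.state convention concrete, here is a small sketch. The data are made up (three states rather than 50 or 51), and the names state, y, and y.bar.state are just illustrative assumptions, not anything from a real survey:

```r
# Hypothetical toy data: three "states" instead of 50 or 51
n.state <- 3
states <- 1:n.state                 # pre-defined vector of state numbers
state <- c(1, 1, 2, 3, 3, 3)        # state of each survey respondent
y <- c(1, 0, 1, 0, 0, 1)            # a made-up binary survey response

# The index name says what is being looped over
y.bar.state <- rep(NA, n.state)
for (i.state in states) {
  y.bar.state[i.state] <- mean(y[state == i.state])
}
y.bar.state
```

The loop body reads the same whether the index runs over “1:n.state” or over “states”; the cost, as noted, is one more name per grouping variable to keep in your head.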
Beyond looping indexes, there are all sorts of temporary variables I’m creating all the time; for example, after fitting a multilevel model, pulling out coefficient estimates: “fix <- fixef(M2000.all)”. There’s always a tension in naming these temporary quantities: on one hand, I don’t want meaningless names such as “temp” floating around; on the other, every new variable name is another thing to remember, which is annoying if it’s only being used in two lines of the program. I guess the real solution is to ruthlessly compartmentalize: in R, this essentially means that you turn every paragraph of code into a function and get rid of all globally-defined variables. I haven’t always had the discipline to do this, but maybe I should try.
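Here is roughly what that compartmentalizing might look like. This is only a sketch: it uses lm() and coef() from base R as stand-ins for the multilevel fit and fixef(), and the function name slope.estimate and the data frame survey are hypothetical.

```r
# Hypothetical helper: the temporaries 'fit' and 'est' are local to the
# function, so their names never have to be remembered outside it
slope.estimate <- function(data) {
  fit <- lm(y ~ x, data = data)     # stand-in for the multilevel model
  est <- coef(fit)                  # stand-in for fixef()
  est[["x"]]                        # return only the quantity needed later
}

# Made-up data with an exact linear relationship, y = 2x + 1
survey <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
b <- slope.estimate(survey)
```

After the call, only “b” is left in the workspace; “fit” and “est” disappear when the function returns, so short, generic names for them do no harm.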