Stan uses Nuts!

We interrupt our usual program of Ed Wegman, Gregg Easterbrook, and Niall Ferguson mockery to deliver a serious update on our statistical computing project.

Stan (“Sampling Through Adaptive Neighborhoods”) is our new C++ program (written mostly by Bob Carpenter) that draws samples from Bayesian models. Stan can take different sorts of inputs: you can write the model in a Bugs-like syntax and it goes from there, or you can write the log-posterior directly as a C++ function.

Most of the computation is done using Hamiltonian Monte Carlo. HMC requires some tuning, so Matt Hoffman up and wrote a new algorithm, Nuts (the “No-U-Turn Sampler”) which optimizes HMC adaptively. In many settings, Nuts is actually more computationally efficient than the optimal static HMC!

When the Nuts paper appeared on arXiv, Christian Robert noticed it and had some reactions.

In response to Xian’s comments, Matt writes:

Christian writes:

I wonder about the computing time (and the “unacceptably large amount of memory”, p.12) required by the doubling procedure: 2^j is growing fast with j! (If my intuition is right, the computing time should increase rather quickly with the dimension. And I do not get the argument within the paper that the costly part is the gradient computation: it seems to me the gradient must be computed for all of the 2^j points.)

2^j does grow quickly with j, but so does the length of the trajectory, and it’s impossible to run a Hamiltonian trajectory for a seriously long time without making a U-turn and stopping the doubling process. (Just like it’s impossible to throw a ball out of an infinitely deep pit with a finite amount of energy.) So j never gets very big. As far as memory goes, the “naive” implementation (algorithm 2) has to store all O(2^j) states it visits, but the more sophisticated implementation (algorithm 3) only needs to store O(j) states. Finally, the gradient computations dominate precisely because we must compute 2^j of them—NUTS introduces O(2^j j) non-gradient overhead, which is usually trivial compared to O(2^j) gradient computations.

In summary, if we assume that NUTS generates trajectories that are about the optimal length of the corresponding HMC trajectory (which is hard to prove, but empirically seems not too far off), then NUTS adds a (usually negligible) computational overhead of order O(2^j j) and an (acceptable for non-huge problems) memory overhead of order O(j) compared with HMC. These costs scale linearly with dimension, like HMC’s costs.

Trajectory lengths will generally increase faster than linearly with dimension, i.e., j will grow faster than log_2(number_of_dimensions). But the optimal trajectory length for HMC does as well, and in high dimensions HMC is pretty much the best we’ve got. (Unless you can exploit problem structure in some really really clever ways [or for some specific models with lots of independence structure–ed.].)
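
To make the stopping rule concrete, here is a minimal sketch (an illustration in Python, not Stan’s or Matt’s actual code; the function name and arguments are made up) of the U-turn criterion that ends the doubling:

```python
import numpy as np

def stop_doubling(theta_minus, theta_plus, r_minus, r_plus):
    """Illustrative no-U-turn check: theta_minus/theta_plus are the two
    endpoints of the simulated trajectory, r_minus/r_plus the momenta there.
    Doubling stops once either endpoint starts moving back toward the other,
    which is what keeps j from growing without bound."""
    span = theta_plus - theta_minus
    return np.dot(span, r_minus) < 0 or np.dot(span, r_plus) < 0
```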

Christian writes:

A final grumble: that the code is “only” available in the proprietary language Matlab!

Apologies to non-Matlab users! There’s also a C++ implementation in Stan, but it’s not (yet) well documented. I guess I [Matt] should also write Python and R implementations, although my R expertise doesn’t extend very far beyond what’s needed to use ggplot2. [Matt doesn’t need to translate into Python and R–anyone who wants to do that should be able to do so, easily enough. And in any case, Stan will have the fast C++ version. Matt was just using Matlab as a convenient platform for developing Nuts.–ed.]

See also Bob’s comments which he posted directly on Xian’s blog.

Stan’s nearly ready for release, and we’re also working on the paper. You’ll be able to run it from R just like you can run Bugs or Jags. Compared to Bugs and Jags, Stan should be faster (especially for big models) and should be able to fit a wider range of models (for example, varying-intercept, varying-slope multilevel models with multiple coefficients per group). It’s all open-source and others should feel free to steal the good parts for their own use. The work is partly funded by a U.S. Department of Energy grant.

World Class Speakers and Entertainers

In our discussion of historian Niall Ferguson and piss-poor monocausal social science, commenter Matt W. pointed to Ferguson’s listing at a speakers bureau. One of his talks is entitled “Is This the Chinese Century?” The question mark at the end seems to give him some wiggle room.

I give some paid lectures myself and was curious to learn more about this organization, World Class Speakers and Entertainers, so I clicked through to this list of topics and then searched for Statistics. Amazingly enough, there was a “Statistician” category (right above “Story Teller / Lore / Art / Power of Story Telling,” “Strategist / Strategies / Strategic Planning,” and “Success”).

There I found Gopal C. Dorai, Ph.D., who offers insights such as “vegetarians usually will not eat meat products, no matter how hungry they feel.” And “Cheating or lying for the sake of obtaining favorable treatment from others will be anathema to some people.” And “Life is a one-way-street; we cannot turn the clock back.”

I’d think a bigshot like Niall Ferguson could do better than that!

P.S. Another one of the categories is “Political / President Ford’s Son.”

P.P.S. I searched on “Paterno” but, no, he wasn’t there.

“Tobin’s analysis here is methodologically old-fashioned in the sense that no attempt is made to provide microfoundations for the postulated adjustment processes”

Rajiv Sethi writes the above in a discussion of a misunderstanding of the economics of Keynes.

The discussion is interesting. According to Sethi, Keynes wrote that, in a depression, nominal wages might be sticky but in any case a decline in wages would not do the trick to increase hiring. But many modern economics writers have missed this. For example, Gary Becker writes, “Keynes and many earlier economists emphasized that unemployment rises during recessions because nominal wage rates tend to be inflexible in the downward direction. . . . A fall in price stimulates demand and reduces supply until they are brought back to rough equality.” Whether Becker is empirically correct is another story, but in any case he is misinterpreting Keynes.

But the actual reason I’m posting here is in reaction to Sethi’s remark quoted in the title above, in which he endorses a 1975 paper by James Tobin on wages and employment but remarks that Tobin’s paper did not include the individual-level decision modeling that we’d usually expect today in an academic economics paper.

Not including microfoundations—that’s actually just fine with me! My impression is that microfoundations in economics and political science are typically a version of long-obsolete psychological models of human cognition and behavior. Such folk psychology can be helpful at times—I myself have written on the rationality of voting, using a simple utility model—but I think it’s a mistake of many social scientists to suppose that these models are in any sense necessary. Rather than seeing microfoundations as supportive of empirical or theoretical economics arguments, I think the opposite: if the economics is strong, the value of the microfoundations is to give insight into the real results.

Greece to head statistician: Tell the truth, go to jail

Kjetil Halvorsen writes:

This should be of interest for the blog: the head of Greece’s national statistics agency faces prison charges for telling the truth!

I followed the link, and my initial reaction was: Interesting–but I don’t think something appearing at that “Zero Hedge” site can be trusted! Did they ever apologize for this bit of misinformation?

Halvorsen replied:

I don’t know! But we do not need to trust zerohedge, here is financial times, also with more details.

Unfortunately the FT article has some sort of registration barrier, but I take exception to Tyler Durden’s snide remarks about “Banana Republics.”

Historian and journalist slug it out

Apparently I’m not the only person to question some of the political writing in the London Review of Books.

But, the latest fight between author Niall Ferguson (encountered on this blog several years ago) and reviewer Pankaj Mishra (link from Tyler Cowen) is fascinating.

Usually when I see one of these exchanges of letters, it’s immediately clear that one guy has a point and the other guy’s got nuthin. This time the fight was a little messier.

Richard Stallman and John McCarthy

After blogging on quirky software pioneer Richard Stallman, I thought it appropriate to write something about recently deceased quirky software pioneer John McCarthy, who, with the exception of being bearded, seems like he was the personal and political opposite of Stallman.

Here’s a page I found of Stallman and McCarthy quotes (compiled by Neil Craig). It’s a mixture of the reasonable and the unreasonable (ok, I suppose the same could be said of this blog!).

I wonder if he and Stallman ever met and, if so, whether they had an extended conversation. It would be like matter and anti-matter!

“To Rethink Sprawl, Start With Offices”

According to this op-ed by Louise Mozingo, the fashion for suburban corporate parks is seventy years old:

In 1942 the AT&T Bell Telephone Laboratories moved from its offices in Lower Manhattan to a new, custom-designed facility on 213 acres outside Summit, N.J.

The location provided space for laboratories and quiet for acoustical research, and new features: parking lots that allowed scientists and engineers to drive from their nearby suburban homes, a spacious cafeteria and lounge and, most surprisingly, views from every window of a carefully tended pastoral landscape designed by the Olmsted brothers, sons of the designer of Central Park.

Corporate management never saw the city center in the same way again. Bell Labs initiated a tide of migration of white-collar workers, especially as state and federal governments conveniently extended highways into the rural edge.

Just to throw some Richard Florida in the mix: Back in 1990, I turned down a job offer from Bell Labs, largely because I didn’t want to live in its bucolic suburban location. Otherwise it was my dream job. If they’d been located in NYC, I might still be working there today.

Tenure lets you handle students who cheat

The other day, a friend of mine who is an untenured professor (not in statistics or political science) was telling me about a class where many of the students seemed to be resubmitting papers that they had already written for previous classes. (The supposition was based on internal evidence, namely the topics of the submitted papers.) It would be possible to check this and then kick the cheating students out of the program—but why do it? It would be a lot of work; also, some of the students who are caught might complain, and then word would get around that my friend is a troublemaker. And nobody likes a troublemaker.

Once my friend has tenure it would be possible to do the right thing. But . . . here’s the hitch: most college instructors do not have tenure, and one result, I suspect, is a decline in ethical standards.

This is something I hadn’t thought of in our earlier discussion of job security for teachers: tenure gives you the freedom to kick out cheating students.

Note to student journalists: Google is your friend

A student journalist called me with some questions about when the U.S. would have a female president. At one point she asked if there were any surveys of whether people would vote for a woman. I suggested she try Google. I was by my computer anyway so typed “what percentage of americans would vote for a woman president” (without the quotation marks), and the very first hit was this from Gallup, from 2007:

The Feb. 9-11, 2007, poll asked Americans whether they would vote for “a generally well-qualified” presidential candidate nominated by their party with each of the following characteristics: Jewish, Catholic, Mormon, an atheist, a woman, black, Hispanic, homosexual, 72 years of age, and someone married for the third time.

Between now and the 2008 political conventions, there will be discussion about the qualifications of presidential candidates — their education, age, religion, race, and so on. If your party nominated a generally well-qualified person for president who happened to be …, would you vote for that person?

                             Yes, would    No, would not
                             vote for      vote for
                             %             %
Catholic                     95            4
Black                        94            5
Jewish                       92            7
A woman                      88            11
Hispanic                     87            12
Mormon                       72            24
Married for the third time   67            30
72 years of age              57            42
A homosexual                 55            43
An atheist                   45            53

The Republican frontrunner, former New York City mayor Rudy Giuliani, is Catholic, and Illinois Sen. Barack Obama, currently running second in the Democratic nomination trial heats, is black. Americans express little hesitation about putting a person with either of those backgrounds in the White House — 95% would vote for a Catholic candidate for president and 94% would vote for a black candidate.

Rudy Giuliani as frontrunner, huh? Talk about a blast from the past. America’s mayor, indeed.

To be serious for a moment, though: Numbers like that make it clear how little information is in these survey questions. As cognitive psychologists have learned in their research, people tend to set their general attitudes aside when confronted with particular cases. For example, 42% of respondents said they would not vote for a generally well-qualified 72-year-old. Given that McCain did about as well as might be expected given his party and economic conditions, it’s hard to believe that he was starting off the election 42 points in the hole. Similarly, I don’t take seriously the idea that 24% of Americans would not vote for a Mormon or that 53% would not vote for an atheist.

P.S. Yes, a graph would be better than a table. I copied-and-pasted the table. If you want to pay me to write this blog, I’ll make a graph for you.

P.P.S. I’m not trying to be mean or sarcastic here. If you’re a journalist, it can be great to interview an expert. But you’ll get a lot more out of the interview if you google yourself up to speed first.

Always check your evidence

Logical reasoning typically takes the following form:

1. I know that A is true.
2. I know that A implies B.
3. Therefore, I can conclude that B is true.

I, like Lewis Carroll, have problems with this process sometimes, but it’s pretty standard.

There is also a statistical version in which the above statements are replaced by averages (“A usually happens,” etc.).

But in all these stories, the argument can fall down if you get the facts wrong. Perhaps that’s one reason that statisticians can be obsessed with detail.

Of hypothesis tests and Unitarians

Xian, Judith, and I read this passage in a book by statistician Murray Aitkin, in which he considered the following hypothetical example:

A survey of 100 individuals expressing support (Yes/No) for the president, before and after a presidential address . . . The question of interest is whether there has been a change in support between the surveys . . . We want to assess the evidence for the hypothesis of equality H1 against the alternative hypothesis H2 of a change.

Here is our response:

Based on our experience in public opinion research, this is not a real question. Support for any political position is always changing. The real question is how much the support has changed, or perhaps how this change is distributed across the population.

A defender of Aitkin (and of classical hypothesis testing) might respond at this point that, yes, everybody knows that changes are never exactly zero and that we should take a more “grown-up” view of the null hypothesis, not that the change is zero but that it is nearly zero. Unfortunately, the metaphorical interpretation of hypothesis tests has problems similar to the theological doctrines of the Unitarian church. Once you have abandoned literal belief in the Bible, the question soon arises: why follow it at all? Similarly, once one recognizes the inappropriateness of the point null hypothesis, it makes more sense not to try to rehabilitate it or treat it as a treasured metaphor but rather to attack our statistical problems directly, in this case by performing inference on the change in opinion in the population.
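
To make that last suggestion concrete, here is a minimal sketch of direct inference on the change, with made-up counts, flat priors, and two independent Beta posteriors (treating the surveys as independent ignores the pairing of respondents, which a fuller analysis would model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up counts for illustration: 55/100 support before the address, 61/100 after.
yes_before, n_before = 55, 100
yes_after, n_after = 61, 100

# Flat Beta(1, 1) priors on the support proportions before and after.
p_before = rng.beta(1 + yes_before, 1 + n_before - yes_before, size=100_000)
p_after = rng.beta(1 + yes_after, 1 + n_after - yes_after, size=100_000)
change = p_after - p_before

# Posterior summary of the quantity of interest: how much support changed.
print("posterior mean change:", round(change.mean(), 3))
print("95% interval:", np.round(np.percentile(change, [2.5, 97.5]), 3))
```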

I like the line about the Unitarian Church, also the idea of hypothesis testing as a religion (since people are always describing Bayesianism as a religious doctrine).

Going Beyond the Book: Towards Critical Reading in Statistics Teaching

My article with the above title is appearing in the journal Teaching Statistics. Here’s the introduction:

We can improve our teaching of statistical examples from books by collecting further data, reading cited articles and performing further data analysis. This should not come as a surprise, but what might be new is the realization of how close to the surface these research opportunities are: even influential and celebrated books can have examples where more can be learned with a small amount of additional effort.

We discuss three examples that have arisen in our own teaching: an introductory textbook that motivated us to think more carefully about categorical and continuous variables; a book for the lay reader that misreported a study of menstruation and accidents; and a monograph on the foundations of probability that overinterpreted statistically insignificant fluctuations in sex ratios.

And here’s the conclusion:

Individually, these examples are of little importance. After all, one does not go to a statistics textbook to learn about handedness, menstruation, and sex ratios. It is striking, however, that the very first examples I looked at in the Zeisel and von Mises books – the examples with interesting data patterns – collapsed upon further inspection. In the Zeisel example, we went to the secondary source and found that his sketch was not actually a graph of any data, and that he in fact misinterpreted the results of the study. In the von Mises example, we reanalysed the data and found his result to be not statistically significant, thus casting doubt on his already doubtful story about ethnic differences in sex ratios. In the Utts and Heckard example, we were inspired to collect data on handedness and look at survey questions on religious attendance to find underlying continuous structures.

You can do it yourself!

These are examples that I’ve encountered during the past twenty years of teaching. The real message I want to send, though, is that you can do it yourself. Anything you read, you can check, for example this implausible (and, indeed, false) claim by a public health expert that “Consumption [of chicken] in the US has increased . . . a hundredfold between 1934 and 1994.” (It actually increased by a factor of six.)

Textbooks are commonly written in an authoritative style, but that doesn’t mean everything in them is correct. You can learn a lot by going back to the original source of the data, and even running the occasional chi-squared test of your own!
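
If you want to try that last suggestion yourself, here is a minimal sketch of such a chi-squared check, using made-up counts rather than the data from any of the books discussed above:

```python
from scipy.stats import chi2_contingency

# Made-up 2x2 table of group x (boys, girls) birth counts, just to show the
# shape of the check described above (not von Mises's actual data).
table = [[520, 480],
         [505, 495]]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, df = {dof}, p = {p_value:.2f}")
# A large p-value means the apparent difference in sex ratios is well within
# what chance variation alone could produce.
```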

Progress for the Poor

Lane Kenworthy writes:

The book is full of graphs that support the above claims. One thing I like about Kenworthy’s approach is that he performs a separate analysis to examine each of his hypotheses. A lot of social scientists seem to think that the ideal analysis will conclude with a big regression where each coefficient tells a story and you can address all your hypotheses by looking at which predictors and interactions have statistically significant coefficients. Really, though, I think you need a separate analysis for each causal question (see chapters 9 and 10 of my book with Jennifer, follow this link).

Kenworthy’s overall recommendation is to increase transfer payments to low-income families and to increase overall government spending on social services, and to fund this through general tax increases.

What will it take for this to happen? After a review of the evidence from economic trends and opinion polls, Kenworthy writes, “Americans are potentially receptive to a more generous set of social programs, but their demand for it is far from overwhelming.”

P.S. See Kenworthy’s blog for related material and here for our paper on income inequality and partisan voting.

Don’t judge a book by its title

A correspondent writes:

I just want to spend a few words to point you to this book I have just found on Amazon: “Understanding The New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis” by G. Cumming. I was attracted by the rather unusual and ‘sexy’ title, but it seems to be nothing more than an attempt to alert the psychology community to consider point estimation procedures and confidence intervals in place of hypothesis testing, the latter being ‘a terrible idea!’ in the author’s own words.

Some more quotes here. Then he says: “‘These are hardly new techniques, but I label them ‘The New Statistics’ because using them would for many researchers be quite new, as well as a highly beneficial change!’”

Of course the latter is not stated on the book cover.

That’s about as bad as writing a book with subtitle, “Why Americans vote the way they do,” but not actually telling the reader why Americans vote the way they do.

I guess what I’m saying is: Not everybody can write a good book title. Even the great John Updike and Gore Vidal couldn’t manage it, most of the time.

No no no no no

I enjoy the London Review of Books but I’m not a fan of their policy of hiring English people to write about U.S. politics. In theory it could work just fine but in practice there seem to be problems. Recall the notorious line from a couple years ago, “But viewed in retrospect, it is clear that it has been quite predictable.”

More recently I noticed this, from John Lanchester:

Republicans, egged on by their newly empowered Tea Party wing, didn’t take the deal, and forced the debate on raising the debt ceiling right to the edge of an unprecedented and globally catastrophic US default. The process ended with surrender on the part of President Obama and the Democrats. There is near unanimity among economists that the proposals in the agreed package will at best make recovery from the recession more difficult, and at worst may trigger a second, even more severe downturn. The disturbing thing about the whole process wasn’t so much that the Tea Partiers were irrational as that they were irrationalist: they were consciously pursuing a course of action which made no economic sense, as part of a worldview which is essentially theological [emphasis added]. They know that everyone else knows that they truly don’t care about the consequences of their actions, and the prospect of the Tea Party wing being in government is truly frightening. ‘Sane Republican’ is not an oxymoron, not yet – but we’re heading that way.

Huh? The Tea Party activists have several goals, #1 of which is to unseat Obama in 2012, and one step of that goal is to shoot down any stimulus plans that might juice the economy between now and then. So it’s not at all “irrational” (let alone “irrationalist”) for them to pursue a strategy which, in Lanchester’s words, “will at best make recovery from the recession more difficult, and at worst may trigger a second, even more severe downturn.”

You can also think about it tactically. By refusing to compromise, the conservative Republicans got the Democrats to give in.

Or you can take the long view. Conservative Republicans would like a long-term balanced budget with low inflation and low taxes on the rich. With that as a goal, it’s not unreasonable to fight any expansion of spending on items they do not support.

I’m not saying you have to agree with Republican politicians or Tea Party activists here; it just seems silly to describe them as irrational. They just have goals which are much different from Lanchester’s (and, for that matter, from those of many Americans).

Validation of Software for Bayesian Models Using Posterior Quantiles

I love this stuff:

This article presents a simulation-based method designed to establish the computational correctness of software developed to fit a specific Bayesian model, capitalizing on properties of Bayesian posterior distributions. We illustrate the validation technique with two examples. The validation method is shown to find errors in software when they exist and, moreover, the validation output can be informative about the nature and location of such errors. We also compare our method with that of an earlier approach.

I hope we can put it into Stan.
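
For readers curious how the check works, here is a toy sketch of the idea (my own illustration with a conjugate normal model, not the authors’ code, and using a plain Kolmogorov–Smirnov check where the paper’s actual test statistic is different):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy version with a conjugate normal-normal model (known data variance), so
# the exact posterior is available in closed form. In real use, the posterior
# draws would come from the software being tested, not from the exact formula.
mu0, tau = 0.0, 1.0        # prior: theta ~ N(mu0, tau^2)
sigma, n = 1.0, 20         # data:  y_i ~ N(theta, sigma^2)

quantiles = []
for _ in range(1000):
    theta = rng.normal(mu0, tau)                   # "true" theta drawn from the prior
    y = rng.normal(theta, sigma, size=n)           # data simulated given theta
    post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
    post_mean = post_var * (mu0 / tau**2 + y.sum() / sigma**2)
    draws = rng.normal(post_mean, np.sqrt(post_var), size=1000)
    quantiles.append(np.mean(draws < theta))       # posterior quantile of true theta

# If the fitting software is correct, these quantiles should look uniform(0, 1).
print(stats.kstest(quantiles, "uniform"))
```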

Tempering and modes

Gustavo writes:

Tempering should always be done in the spirit of *searching* for important modes of the distribution. If we assume that we know where they are, then there is no point to tempering. Now, tempering is actually a *bad* way of searching for important modes, it just happens to be easy to program. As always, my [Gustavo’s] prescription is to FIRST find the important modes (as a pre-processing step); THEN sample from each mode independently; and FINALLY weight the samples appropriately, based on the estimated probability mass of each mode, though things might get messy if you end up jumping between modes.

My reply:

1. Parallel tempering has always seemed like a great idea, but I have to admit that the only time I tried it (with Matt2 on the tree-ring example), it didn’t work for us.

2. You say you’d rather sample from the modes and then average over them. But that won’t work if you have a zillion modes. Also, if you know where the modes are, the quickest way to estimate their relative masses might well be an MCMC algorithm that jumps through them.

3. Finally, pre-processing to find modes is fine, but if pre-processing is so important, it probably needs its own serious algorithm too. I think some work has been done here but I’m not up on the latest.
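
As an illustration of Gustavo’s find-then-weight recipe, here is a toy one-dimensional sketch (not anyone’s production code): locate the modes by optimization, estimate each mode’s mass with a Laplace approximation, and draw from the resulting mixture.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(2)

def log_p(x):
    # Unnormalized bimodal target: a mix of N(-3, 1) and N(3, 0.7) bumps.
    return np.logaddexp(-0.5 * (x + 3.0) ** 2,
                        np.log(0.5) - 0.5 * ((x - 3.0) / 0.7) ** 2)

# Step 1: pre-processing, one optimization per suspected mode.
modes = [optimize.minimize(lambda z: -log_p(z[0]), x0=[s]).x[0] for s in (-5.0, 5.0)]

# Step 2: Laplace approximation of each mode's mass (height times width).
sds, masses = [], []
for m in modes:
    h = 1e-4
    curv = -(log_p(m + h) - 2 * log_p(m) + log_p(m - h)) / h ** 2
    sd = 1.0 / np.sqrt(curv)
    sds.append(sd)
    masses.append(np.exp(log_p(m)) * sd * np.sqrt(2 * np.pi))

# Step 3: sample each mode in proportion to its estimated mass. (Here the
# per-mode "sampler" is just the Gaussian approximation; in a real problem
# you would run MCMC within each mode instead.)
weights = np.array(masses) / np.sum(masses)
which = rng.choice(len(modes), size=10_000, p=weights)
samples = rng.normal(np.take(modes, which), np.take(sds, which))
print("estimated mode weights:", np.round(weights, 3))
```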

Lack of complete overlap

Evens Salies writes:

I have a question regarding a randomizing constraint in my current funded electricity experiment.

After elimination of missing data we have 110 voluntary households from a larger population (resource constraints do not allow us to have more households!). I randomly assign them to treated and non-treated groups, where the treatment variable is some ICT that allows the treated to track their electricity consumption in real time. The ICT is made of two devices, one plugged into the household’s modem and the other into the electric meter. A necessary condition for being treated is that the distance between the box and the meter be below some threshold (d), whose value is approximately 20 meters.

50 ICTs can be installed.
60 households will be in the control group.

But I can only assign 6 households to the control group for whom d is less than 20. Therefore, I have only 6 households in the control group who have a counterfactual in the treated group. To put it differently, for 54 households in the control group, the overlap assumption is violated because these 54 households could never have been treated. Am I correct to say this?

Please, could you point me to a paper on Program Evaluation/Causal Inference that addresses such an issue? Should I discard the 54 households who could not be treated (due to the distance constraint)? This would be unfair in such a small trial.

My response:

I don’t know of any references on this (beyond chapters 9 and 10 of my book with Jennifer). My quick answer is that you should model your outcome conditional on the treatment indicator and also this distance variable. If distance doesn’t matter at all, maybe you’re ok, and if it does matter, maybe a reasonable model will correct for the non-overlap. Or maybe you’ll be able to keep most of the data and just discard some cases with extreme values of the predictor.
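
For what it’s worth, here is a minimal sketch of that suggestion with simulated data standing in for the actual experiment (all numbers and variable names are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated stand-in for the 110 households (made-up numbers, not the real data).
n = 110
distance = rng.uniform(1, 60, size=n)            # meters from modem to meter
eligible = distance < 20                         # only these can receive the ICT
treated = np.where(eligible, rng.integers(0, 2, size=n), 0)
consumption = 10 - 1.5 * treated + 0.05 * distance + rng.normal(0, 1, size=n)

df = pd.DataFrame({"consumption": consumption,
                   "treated": treated,
                   "distance": distance})

# Model the outcome given treatment AND the variable that restricted eligibility.
fit = smf.ols("consumption ~ treated + distance", data=df).fit()
print(fit.params)

# Overlap check: how many controls share the treated group's distance range?
print("controls with distance < 20:",
      int(((df.treated == 0) & (df.distance < 20)).sum()))
```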