This post is by Bob Carpenter.
Stan is Turing complete!
There seems to be a persistent misconception that Stan isn’t Turing complete.1, 2 My guess is that it stems from Stan’s (not coincidental) superficial similarity to BUGS and JAGS, which provide directed graphical model specification languages. Stan’s Turing completeness follows from its support of array data structures, while loops, and conditionals.
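For instance, all three ingredients show up in an ordinary Stan program; here’s a toy sketch (Stan 2.x syntax, not a model anyone would fit) that iterates a while loop over an array with a conditional in the transformed data block:

```stan
transformed data {
  int fib[10];                  // array data structure
  int n;
  fib[1] <- 1;
  fib[2] <- 1;
  n <- 3;
  while (n <= 10) {             // while loop
    fib[n] <- fib[n - 1] + fib[n - 2];
    n <- n + 1;
  }
  if (fib[10] > 50)             // conditional
    print("fib[10] = ", fib[10]);
}
model { }
```

Unbounded loops plus mutable arrays plus branching is all it takes; the rest is a matter of how painful the encoding is.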
What is a probabilistic programming language?
A free, early-access chapter of:
Avi Pfeffer. 2014 partial draft. Practical Probabilistic Programming. Manning Publications Co.
neatly sums up (on p. 9) the emerging definition of “probabilistic programming language”:
Representation language: a language for encoding your knowledge about a domain in a model
Probabilistic Programming Language: A probabilistic representation language that uses a Turing-complete programming language to representation [sic] knowledge.
The next paragraph continues with
Although probabilistic representation languages with programming language features have been developed that are not Turing-complete, like BUGS and Stan, this book focuses on Turing-complete languages.
And yes, I’ve let the author know Stan’s Turing complete on the Manning feedback forum, and he’s already responded that he’ll fix the next draft.
I’m also curious why Frank’s tutorial on probabilistic programming lists Stan on page 7 under the heading “Application Driven” rather than the heading “Probabilistic Programming Language”.
Stan’s really an imperative language for specifying a log posterior (or penalized likelihood function if you want to roll that way) up to an additive constant. Bayes’s rule shows that it’s sufficient to specify the log joint up to a constant. We then translate the Stan program to a C++ class that reads in data in the constructor and provides a method computing the log posterior as a function of the parameters, in such a way that it can be differentiated to first (or higher) order. We then use this class for inference with MCMC, for point estimation with L-BFGS, or for letting users explore their own inference algorithms.
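Concretely, a Stan program like this minimal sketch defines nothing but a log posterior up to an additive constant; each sampling statement imperatively adds a term to the (implicit) log density:

```stan
data {
  int<lower=0> N;
  real y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // each statement increments the log posterior,
  // dropping constants that don't involve parameters
  mu ~ normal(0, 10);       // prior
  sigma ~ cauchy(0, 5);     // half-Cauchy prior via the constraint
  y ~ normal(mu, sigma);    // likelihood
}
```

There’s no directed graph anywhere: the `~` statements execute top to bottom like any other imperative code.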
If a taxonomist forced me to split, I’d say Stan is an imperative probabilistic programming language; Pfeffer’s language, Figaro, is an object-oriented probabilistic programming language; and Church is a functional probabilistic programming language.
Expressive in practice or in theory?
I’d also like to discuss the following claim, on page 9 of Pfeffer’s pre-release book version, and echoed in every other intro to probabilistic programming I’ve seen:
Naturally, representation languages with greater expressive power will be able to capture more knowledge, making them more useful.
The true measure of usefulness for a user is how easy it is to express the class of models the user cares about and how efficiently, scalably, and robustly the software can fit the model. Assembly language is Turing-complete and can encode a pseudo-random number generator, but that doesn’t make it “able to capture more knowledge” than BUGS in any practical sense.
Mark Chu-Carroll has an insightful discussion on his Good Math/Bad Math blog of Turing equivalence and Turing completeness and what it’s good for.
What we’re really after is modularity. Stan’s going to add user-defined functions in version 2.3.0, which will be handy. For one thing, that should make it clear that Stan’s Turing equivalent. But functions are not going to solve our modularity problem.
By modularity, I mean the ability to express a Bayesian “submodel” directly in the language. For instance, I might drop in a hierarchical logistic regression module with a configurable set of hyperpriors and then re-use it in many models. But I can’t do that in Stan, even with the functions coming in version 2.3.0, because that would require somehow exporting variables for parameters and data from the function to the global environment, so that our inference algorithms, like HMC and L-BFGS, can see them jointly with all other parameters. Maybe in Stan 3.
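To see the limitation, here’s a hypothetical sketch in the planned function syntax (details may differ in the released version): a function can factor out a log-likelihood computation, but every parameter and data variable it touches must still be declared in the top-level blocks where the inference algorithms can see them.

```stan
functions {
  // reusable log-likelihood term: factors out computation,
  // but cannot declare parameters or data of its own
  real logistic_log_lik(int[] y, matrix x, vector beta) {
    return bernoulli_logit_log(y, x * beta);
  }
}
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] x;
  int<lower=0, upper=1> y[N];
}
parameters {
  vector[K] beta;   // must still live in the global parameters block
}
model {
  beta ~ normal(0, 5);
  increment_log_prob(logistic_log_lik(y, x, beta));
}
```

A true submodel would let the function bring its own `parameters` and hyperpriors along with it; that’s exactly what the function scoping rules rule out.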
Suggestions welcome, as always!
1 Informally speaking, a computable system of defining functions is Turing complete if it can compute any computable function. All of the standard programming languages (C, Java, Python, R, Lisp, etc.) are Turing complete, as are Turing machines and general recursive functions and untyped lambda calculus and first-order logic (the connections to Goedel’s incompleteness theorem run through this connection).
2 Technically speaking, no modern computer is Turing complete because there is a finite physical bound on available storage, whereas the full definition requires unbounded storage. So we squint a little and pretend we have infinite amounts of storage.