Ben writes:
Starting with the 2.13 release, it is much easier to use external C++ code in a Stan program. This vignette briefly illustrates how to do so.
He continues:
Suppose that you have (part of) a Stan program that involves Fibonacci numbers, such as
functions { int fib(int n); int fib(int n) { if (n <= 0) reject("n must be positive"); return n <= 2 ? 1 : fib(n - 1) + fib(n - 2); } } model {} // use the fib() function somehow
On the second line, we have declared the
fib
function before it is defined in order to call it recursively.For functions that are not recursive, it is not necessary to declare them before defining them but it may be advantageous. For example, I often like to hide the definitions of complicated utility functions that are just a distraction using the
#include "file"
mechanismfunctions { real complicated(real a, real b, real c, real d, real e, real f, real g); #include "complicated.stan" // defines the above function } model {} // use the complicated() function somehow
This Stan program would have to be parsed using the
stanc_builder
function in the rstan package rather than the defaultstanc
function (which is called bysampling
andstan
internally).Returning to the Fibonacci example, it is not necessary to define the
fib
function using the Stan language because Stan programs with functions that are declared but not defined can use the standard capabilities of the C++ toolchain to provide the function definitions in C++. For example, this program produces a parser error by defaultmc <- ' functions { int fib(int n); } model {} // use the fib() function somehow ' try(stan_model(model_code = mc, model_name = "parser_error"), silent = TRUE)
## SYNTAX ERROR, MESSAGE(S) FROM PARSER:
##
## Function declared, but not defined. Function name=fib
##
## ERROR at line 2
##
## 1:
## 2: functions { int fib(int n); }
## ^
## 3: model {} // use the fib() function somehow
##
However, if we specify the
allow_undefined
andincludes
arguments to thestan_model
function, and define afib
function in the named C++ header file, then it will parse and compilestan_model(model_code = mc, model_name = "external", allow_undefined = TRUE, includes = paste0('\n#include "', file.path(getwd(), 'fib.hpp'), '"\n'))
Specifying the
includes
argument is a bit awkward because the C++ representation of a Stan program is written and compiled in a temporary directory. Thus, theincludes
argument must specify a full path to the fib.hpp file, which in this case is in the working directory. Also, the path must be enclosed in double-quotes, which is why single quotes are used in the separate arguments to thepaste0
function so that double-quotes are interpreted literally. Finally, theincludes
argument should include newline characters ("\n"
) at the start and end. It is possible to specify multiple paths using additional newline characters or include a “meta-header” file that contains#include
directives to other C++ header files.The result of the
includes
argument is inserted into the C++ file directly after the following lines (as opposed to CmdStan where it is inserted directly before the following lines)#include <stan/model/model_header.hpp> namespace some_namespace { using std::istream; using std::string; using std::stringstream; using std::vector; using stan::io::dump; using stan::math::lgamma; using stan::model::prob_grad; using namespace stan::math; typedef Eigen::Matrix<double,Eigen::Dynamic,1> vector_d; typedef Eigen::Matrix<double,1,Eigen::Dynamic> row_vector_d; typedef Eigen::Matrix<double,Eigen::Dynamic,Eigen::Dynamic> matrix_d; static int current_statement_begin__; // various function declarations and / or definitions
Thus, the definition of the
fib
function in the fib.hpp file need not be enclosed in any particular namespace (which is a random string by default. The “meta-include” stan/model/model_header.hpp file reads as#ifndef STAN_MODEL_MODEL_HEADER_HPP #define STAN_MODEL_MODEL_HEADER_HPP #include <stan/math.hpp> #include <stan/io/cmd_line.hpp> #include <stan/io/dump.hpp> #include <stan/io/reader.hpp> #include <stan/io/writer.hpp> #include <stan/lang/rethrow_located.hpp> #include <stan/model/prob_grad.hpp> #include <stan/model/indexing.hpp> #include <boost/exception/all.hpp> #include <boost/random/additive_combine.hpp> #include <boost/random/linear_congruential.hpp> #include <cmath> #include <cstddef> #include <fstream> #include <iostream> #include <sstream> #include <stdexcept> #include <utility> #include <vector> #endif
so the definition of the
fib
function in the fib.hpp file could utilize any function in the Stan Math Library (without having to prefix function calls withstan::math::
), some typedefs to classes in the Eigen matrix algebra library, plus streams, exceptions, etc. without having to worry about the corresponding header files. Nevertheless, an external C++ file may contain additional include directives that bring in class definitions, for example.Now let’s examine the fib.hpp file, which contains the C++ lines
int fib(const int&n, std::ostream* pstream__) { if (n <= 0) { stringstream errmsg; errmsg << "n must be positive"; throw std::domain_error(errmsg.str()); } return n <= 1 ? 1 : fib(n - 1, 0) + fib(n - 2, 0); }
This C++ function is essentially what the preceding user-defined function in the Stan language
int fib(int n) { if (n <= 0) reject("n must be positive"); return n <= 2 ? 1 : fib(n - 1) + fib(n - 2); }
parses to. Thus, there is no speed advantage to defining the
fib
function in the external fib.hpp file. However, it is possible to use an external C++ file to handle the gradient of a function analytically as opposed to using Stan’s autodifferentiation capabilities, which are slower and more memory intensive but fully general. In this case, thefib
function only deals with integers so there is nothing to take the derivative of. The primary advantage of using an external C++ file is flexibility to do things that cannot be done directly in the Stan language. It is also useful for R packages like rstanarm that may want to define some C++ functions in the package’s src directory and rely on the linker to make them available in its Stan programs, which are compiled at (or before) installation time.In the C++ version, we check if
n
is non-positive, in which case we throw an exception. It is unnecessary to prefixstringstream
withstd::
because of theusing std::stringstream;
line in the generated C++ file. However, there is no correspondingusing std::domain_error;
line, so it has to be qualified appropriately when the exception is thrown.The only confusing part of the C++ version of the
fib
function is that it has an additional argument (with no default value) namedpstream__
that is added to the declaration of thefib
function by Stan. Thus, your definitionof thefib
function needs to match with this signature. This additional argument is a pointer to astd::ostream
and is only used if your function prints something to the screen, which is rare. Thus, when we call thefib
function recursively in the last line, we specifyfib(n - 1, 0) + fib(n - 2, 0);
so that the output (if any, and in this case there is none) is directed to the null pointer.This vignette has employed a toy example with the Fibonacci function, which has little apparent use in a Stan program and if it were useful, would more easily be implemented as a user-defined function in the
functions
block as illustrated at the outset. The ability to use external C++ code only becomes useful with more complicated C++ functions. It goes without saying that this mechanism ordinarily cannot call functions in C, Fortran, R, or other languages because Stan needs the derivatives with respect to unknown parameters in order to perform estimation. These derivatives are handled with custom C++ types that cannot be processed by functions in other languages that only handle primitive types such asdouble
,float
, etc.That said, it is possible to accomplish a great deal in C++, particularly when utilizing the Stan Math Library. For more details, see The Stan Math Library: Reverse-Mode Automatic Differentiation in C++ and its GitHub repository. The functions that you declare in the
functions
block of a Stan program will typically involve templating and type promotion in their signatures when parsed to C++ (the only exceptions are functions whose only arguments are integers, as in thefib
function above). Suppose you wanted to define a function whose arguments are real numbers (or at least one of the arguments is). For example,mc <- ' functions { real besselK(real v, real z); } model {} // use the besselK() function somehow '
stan_model(model_code = mc, model_name = "external", allow_undefined = TRUE, includes = paste0('\n#include "', file.path(getwd(), 'besselK.hpp'), '"\n'))
Although the Stan Math Library (via Boost) has an implementation of the Modified Bessel Function of the Second Kind, it only supports the case where the order (
v
) is an integer. The besselK.hpp file reads astemplate <typename T0__, typename T1__> typename boost::math::tools::promote_args<T0__, T1__>::type besselK(const T0__& v, const T1__& z, std::ostream* pstream__) { return boost::math::cyl_bessel_k(v, z); }
because, in general, its first two arguments could either be integers, known real numbers, or unknown but real parameters. In the case of unknown real numbers, Stan will need to rely on its autodifferentiation mechanism to keep track of the derivative with respect to those arguments during estimation. But if either of the first two arguments is an integer or a known real number, then Stan avoids taking derivatives with respect to them. Thus, it is useful to utilize C++ templates that can generate all (four in this case) versions of this Bessel function with only a few lines of C++ source code. The first line of besselK.hpp states that the first two arguments are going to be templated with typenames
TO__
andT1__
respectively. The second line is convoluted but merely states that the return type of thebesselK
function depends onTO__
andT1__
. In short, if either is an unknown real parameter, then the result will also be an unknown real parameter. The third line contains the generated signature of the function in C++, which involves the typenamesTO__
andT1__
, as well as the pointer to astd::ostream
(which is again not used in the body of the function). The body of thebesselK
function is simply a call to the coresponding function in the Boost Math Library, whose headers are pulled in by the Stan Math Library but theboost::math::
prefix is necessary due to the absence of ausing boost::math;
statement.An easy way to see what the generated function signature will be is to call the
stanc
function in the rstanpackage withallow_undefined = TRUE
and inspect the resuling C++ code. In this case, I first didtry(readLines(stanc(model_code = mc, allow_undefined = TRUE)$cppcode))
## Warning in file(con, "r"): cannot open file '// Code generated by Stan version 2.14 ## ## #include <stan/model/model_header.hpp> ## ## namespace model79d3b505e97_mc_namespace { ## ## using std::istream; ## using std::string; ## using std::stringstream; ## using std::vector; ## using stan::io::dump; ## using stan::math::lgamma; ## using stan::model::prob_grad; ## using namespace stan::math; ## ## typedef Eigen::Matrix<double,Eigen::Dynamic,1> vector_d; ## typedef Eigen::Matrix<double,1,Eigen::Dynamic> row_vector_d; ## typedef Eigen::Matrix<double,Eigen::Dynamic,Eigen::Dynamic> matrix_d; ## ## static int current_statement_begin__; ## ## template <typename T0__, typename T1__> ## typename boost::math::tools::promote_args<T0__, T1__>::type ## besselK(const T0__& v, ## const T1__& z, std::ostream* pstream__); ## ## class model79d3b505e97_mc : public prob_grad { ## private: ## public: ## model79d3b505e97_mc(stan::io::var_context& context__, ## std::ostream* pstream__ = 0) ## : prob_grad(0) { ## typedef boost::ecuyer1988 rng_t; ## rng_t base_rng(0); // 0 seed d [... truncated]
to see what function signature needed to be written for besselK.hpp.
When using external C++ code, care must be taken to ensure that the function is numerically stable over a wide range of floating point numbers. Indeed, in the case of the
besselK
function, the derivative with respect to the order argument (v
) may not be sufficiently stable numerically. In general, it is best to strip the underlying double-precision numbers out of Stan’s custom scalar types, evaluate the desired function, and then calculate the necessary derivatives analytically in an object whose class that inherits from thevari
class in Stan. The details of doing so are beyond the scope of this vignette but are discussed in the links above. Once you go to the trouble of writing such a C++ function, we would welcome a pull request on GitHub to include your C++ function in the Stan Math Library for everyone to benefit from, provided that it can be licensed under the 3-clause BSD license and its use is not overly-specific to your particular Stan program.The Stan Math Library is compliant with the C++11 standard but currently does not utilize any features that were introduced by the C++11 standard nor does it require a compiler that is compliant with the C++11 standard. However, almost any modern C++ compiler is compliant, so you can use (at least some subset of the) features that were introduced by the C++11 standard in external C++ code and your Stan program should compile (perhaps with some warnings). In particular, you may want to use the
auto
keyword to avoid having to learn a lot of the messy type-promotion syntax used in the Stan Math Library and the rules for what kind of object is returned by various mathematical operations. For example, under the C++11 standard, thebesselK.hpp
file could be written astemplate <typename T0__, typename T1__> auto besselK(const T0__& v, const T1__& z, std::ostream* pstream__) -> decltype(v + z) { return boost::math::cyl_bessel_k(v, z); }
where the
auto
keyword combined with-> decltype (v + z)
results in the same code as the Boost metaprogramtypename boost::math::tools::promote_args<T0__, T1__>::type
.