Maurits Van Wagenberg writes:
Coming from the traditional side, started to use Bayes, quickly limiting it to models with fewer variables, notwithstanding the lure. Am not in academia but have for many years researched design processes of complex objects, such as the engineering of complex process plants. These processes have a lead time of 12 to 18 months.
Aim was to check on development of variables that could indicate derailment of process. Felt comfortable in using posterior to update new prior, a week later, especially two months into the process.
This winter, was asked to look into a new group of design projects where my concept failed. My previous body of knowledge was limited as were empirical data at hand.
Started to look at your approach (in your Bayesian Data Analysis 3rd and presentations, incl. your French presentations).
My question is: could I find more time-series related examples?
Any suggestions? Our forthcoming Bayesian Econometrics in Stan book should have a few such examples, although this person’s application area seems a bit different. I know that some of you out there work on engineering problems so maybe you have some thoughts for him.
I’m in the process of writing a series on my blog about Bayesian time series analysis – the intended audience is programmer types looking to do more data analysis. First post is up:
https://www.chrisstucchio.com/blog/2016/has_your_conversion_rate_changed.html
Second post about analyzing stock prices with MCMC is about half done, hopefully will go up next week.
Also have a related post which is just a one-off, not strictly part of the series.
https://www.chrisstucchio.com/blog/2016/bayesian_calibration_of_mobile_phone_compass.html
Maybe this is useful? It’s hard to tell exactly what he’s looking for.
Can think of a few examples:
– The Bayesian VAR literature – see Litterman’s work in the 80s including the Minnesota Prior specification for p order lags.
– Bayesian ARIMAs – including discussion on prior specification and how to enforce properties like stationarity (truncated normals etc)
– The changepoint detection literature
– Papers on causal impact (eg http://projecteuclid.org/euclid.aoas/1430226092)
Also, as a subfield, financial time series is full of different types of Bayesian models – as simulation methods are often the direct way to attack non-linear models such as stochastic volatility models and non-linear state space models. One of the main concerns in this field (historically at least) is how to sample effectively from the latent states, which are usually highly dependent (and so naive simulation methods are ineffective). Particle filters (and more recently PMCMC) arose as a way to tackle this problem, but offline solutions also exist (e.g., H. Rue’s argument is that time series shouldn’t always call for a filtering-type solution).
These are all great examples. The concern I have with some of the early stuff is that it was conceived during a time when computation was hugely limiting (on the sorts of specifications that could be estimated). It’s still really important to understand these models, but now we have Stan! There is also the issue that many applications of the Litterman/Minnesota prior employ the data in specifying the prior, which is a bit naughty.
I highly recommend this paper by Koop and Korobilis, which talks through many multivariate Bayesian models. A very gentle introduction, if a little dated (inverse Wishart priors etc).
https://ideas.repec.org/p/rim/rimwps/47_09.html
Weather in Central Park, as analyzed by Jaynes, is an interesting application:
http://www.hep.fsu.edu/~wahl/phy5846/statistics/jaynes/pdf/cc17h.pdf
I sound like a broken record about this, but my talk about electric load in buildings is about a Bayesian time series model. I haven’t written it up as a Stan example yet, nor am I making the code available quite yet. Soon, though, since any Stan user can recreate it from the talk anyway I hope.
Forthcoming special issue of Psychonomic Bulletin and Review has a paper on Bayesian longitudinal modeling.
I can’t see anything related in the online first articles, possibly those aren’t an exhaustive list of soon-to-be published articles?
They’re a little farther down the line, I’m afraid. End of 2016, I’m guessing.
I worked on designing complex chemical process plants. Though never on predicting project failures.
Would be curious to see more of your model details if you can post any. I never knew of any good models of this. It used to be all heuristics and intuition. Sounds quite challenging.
Here’s a writeup of a simple time series model for survey data that I implemented in Stan:
http://kevinsvanhorn.com/2014/05/17/time-series-modeling-for-survey-data/
It doesn’t give the Stan code, but the translation is straightforward.
I’ve done a number of other simple time series models in Stan, specifically, dynamic linear models, that I could write up if there’s any interest:
* Local level with trend.
* Local level with trend and seasonality.
* Time-varying tobit model (local level with trend).
* Univariate linear regression where the intercept varies over time.
Would definitely be interested in seeing the Stan code please. Available yet, Kevin?
I work for Adobe now and no longer have access to that code. Sorry.
This brings to mind something I’ve been thinking about. I have some data on a large number of entities that can have any of a discrete set of states, where an entity’s state can change from one time step to the next.
The simplest model is to treat the data as a large number of independent Markov chains: infer a transition matrix T, where T[i,j] is the probability that an entity in state i is in state j on the next time step.
p ~ Dirichlet(alpha);
count[i,j,t] ~ Multinomial(n[i, t], p);
where n[i,t] is the number of entities in state i at time t, and count[i,j,t] is the number of these that are in state j on the next time step.
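For what it's worth, with a Dirichlet prior on each row this simple model is conjugate, so the posterior for each row of T follows directly from the pooled transition counts. Here is a minimal Python sketch of that calculation. The toy sequences, the function name, and the symmetric prior with alpha = 1 are assumptions made for illustration, not anything from the data described above.

```python
def posterior_transition_matrix(sequences, n_states, alpha=1.0):
    """Pool transition counts over entities and time, then return the
    posterior mean of each row of T under a symmetric Dirichlet(alpha) prior."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Each consecutive pair (i, j) is one observed transition i -> j.
        for i, j in zip(seq, seq[1:]):
            counts[i][j] += 1
    t_mean = []
    for row in counts:
        total = sum(row) + alpha * n_states
        # Posterior for row i is Dirichlet(alpha + counts[i]); this is its mean.
        t_mean.append([(c + alpha) / total for c in row])
    return counts, t_mean


# Hypothetical toy data: three entities, three states, five time steps each.
sequences = [
    [0, 0, 1, 2, 2],
    [1, 1, 0, 0, 2],
    [2, 2, 2, 1, 0],
]
counts, t_mean = posterior_transition_matrix(sequences, n_states=3)
for i, row in enumerate(t_mean):
    print(i, [round(x, 3) for x in row])
```

Because the counts are pooled over t, this sketch assumes the same transition probabilities at every time step, which is exactly the assumption the next model relaxes.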
This turns out not to fit the data well; a better fit is an overdispersed multinomial:
alpha <- alpha0 * p0;
p[t] ~ Dirichlet(alpha);
count[i,j,t] ~ Multinomial(n[i, t], p[t]);
where p0 is a simplex (probability vector) and alpha0 is a positive scalar controlling the degree of overdispersion. I’ve implemented this model in Stan.
Going farther, if you plot the empirical probabilities, you see some clear trends and seasonality, so it might make sense to make p0 a time varying latent variable:
p0[t+1] ~ Dirichlet(alpha_evolve * p0[t]);
where parameter alpha_evolve controls the rate of time evolution. This is a lot like a dynamic linear model, except that the measurement model is an overdispersed multinomial and the state evolution is via a Dirichlet distribution instead of a multivariate normal distribution.
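To make the generative story concrete, here is a rough forward simulation in Python for a single source state i. It is only a sketch of the data-generating process, not the Stan implementation mentioned above, and the helper names and all numeric values (alpha0 = 50, alpha_evolve = 500, 200 entities per step) are made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_multinomial(n, p, rng):
    """Draw Multinomial(n, p) counts by tallying n categorical draws."""
    counts = [0] * len(p)
    for k in rng.choices(range(len(p)), weights=p, k=n):
        counts[k] += 1
    return counts


def simulate(p0_init, alpha0, alpha_evolve, n_per_step, n_steps, seed=0):
    rng = random.Random(seed)
    p0 = list(p0_init)
    history = []
    for _ in range(n_steps):
        # Measurement model: p[t] ~ Dirichlet(alpha0 * p0[t]), then counts.
        p_t = sample_dirichlet([alpha0 * x for x in p0], rng)
        counts = sample_multinomial(n_per_step, p_t, rng)
        history.append((p0, p_t, counts))
        # State evolution: p0[t+1] ~ Dirichlet(alpha_evolve * p0[t]).
        p0 = sample_dirichlet([alpha_evolve * x for x in p0], rng)
    return history


history = simulate([0.6, 0.3, 0.1], alpha0=50.0, alpha_evolve=500.0,
                   n_per_step=200, n_steps=20)
for p0, p_t, counts in history[:3]:
    print([round(x, 3) for x in p0], counts)
```

Larger alpha_evolve means p0 drifts more slowly, and larger alpha0 means less overdispersion around p0 at each step. Simulating fake data like this and then fitting the model to it is one way to check that the two scales can be told apart.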
This paper from Steve Scott and Hal Varian, Predicting the Present with Bayesian Structural Time Series, seems relevant. Here’s the link to an ungated copy:
http://people.ischool.berkeley.edu/~hal/Papers/2013/pred-present-with-bsts.pdf
Well, an early Bayesian time series tracker is, of course, the Kalman-Bucy filter. It’s been used for more than half a century.
See https://en.wikipedia.org/wiki/Kalman_filter.
Bob
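For readers who have not seen it written out, the simplest case is the discrete-time Kalman filter for a univariate local level model: a random-walk state observed with Gaussian noise. The Python sketch below assumes known variances and a vague normal prior on the initial level. The function name and the numbers are illustrative assumptions, and it does not cover the continuous-time Kalman-Bucy version.

```python
def kalman_local_level(ys, obs_var, state_var, prior_mean=0.0, prior_var=1e6):
    """Filter a local level model:
        level[t] = level[t-1] + N(0, state_var)
        y[t]     = level[t]   + N(0, obs_var)
    Returns the posterior (mean, variance) of level[t] given y[1..t]."""
    mean, var = prior_mean, prior_var
    filtered = []
    for y in ys:
        # Predict: the random-walk step leaves the mean alone and adds variance.
        var = var + state_var
        # Update: a conjugate normal-normal Bayesian update with the new y.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        filtered.append((mean, var))
    return filtered


# Hypothetical observations: a constant signal of 5.0.
ys = [5.0] * 50
filtered = kalman_local_level(ys, obs_var=1.0, state_var=0.1)
final_mean, final_var = filtered[-1]
print(round(final_mean, 4), round(final_var, 4))
```

Each update step is just the posterior from the previous time point serving as the prior for the next one, which is the weekly updating scheme described in the original email.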
I’m not sure if I understood the email properly, but to my mind the research problem is:
– Maurits has many high-frequency (longitudinal) variables that may indicate the derailment of a process.
– He observes the derailment of a process fairly infrequently (?) Let’s say it’s the last observation for each individual process.
– He wants to make a real-time estimate of the probability of the process being derailed.
If this is the case, then the low-to-high-frequency-interpolation literature is probably a good start. I put together a little example in Stan using a univariate state-space model for this sort of problem. The changes I’d make to model my interpretation of Maurits’s problem would be a) turn the measurement model into a logit, and b) implement it longitudinally, with the last observation for each process holding the measurement for success or derailment (0 or 1). (I didn’t know it when I wrote the following, but the term “Nowcasting” is actually trademarked by Now-Cast, who do some very good work.)
https://github.com/khakieconomics/nowcasting_in_stan
Look at:
https://github.com/sinhrks/stan-statespace
As the ReadMe says:
Reproducing “An Introduction to State Space Time Series Analysis” using Stan (this is the Durbin-Koopman book on state-space models). All sorts of examples of Bayesian analysis of time series, and even better, all done in Stan.
Might want to look at West and Harrison’s textbook Bayesian Forecasting and Dynamic Models
Andrew, could you give us some more info on the Bayesian econometrics book? When will it be published?
Thanks!
Christian:
I don’t know, we have to write it first!
“Our forthcoming Bayesian Econometrics in Stan book should have a few such examples…”
Cool, but what about Regression and Other Stories?
Gary Koop has some nice information on Bayesian VARs and such:
http://personal.strath.ac.uk/gary.koop/bayes_matlab_code_by_koop_and_korobilis.html
Edward Greenberg in his book “Introduction to Bayesian Econometrics” derives some time series models as well:
http://www.amazon.com/Introduction-Bayesian-Econometrics-Edward-Greenberg-ebook/dp/B00A8ICIBS/ref=sr_1_1?ie=UTF8&qid=1460084556&sr=8-1&keywords=edward+greenberg+bayesian
I second the idea that the Bayesian Econometrics book should come before sleeping and eating :)