Would it be possible to print to PDF and put the PDF up, just as a quick way to make the inline formulas available? I am really looking forward to reading this post in full.

]]>I understand this will become moot for your model once you remove the unused parameters, but might be relevant for other models. Coincidentally, I ran your original model and also those two variations, and the plots were practically the same.

]]>James Savage: Actually, in my example I ended up using a smoothing pass to sample from the states, so that all information available today is used and the sampler thus targets exactly the same thing as the direct modeling approach. There would be an interpretation problem with using the filtering distributions of the KF (those using information only up to t): when the parameters are given priors (i.e., not fixed), information from the future leaks into the filtering distributions via the parameters anyway, because the ‘filtering distribution’ of the sampler would be p(state | information only up to t, parameters) but integrated over the parameters using the distribution p(parameters | all information available today). This construct does not have any reasonable interpretation that I can think of.
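To make the filtering-vs-smoothing distinction concrete, here is a minimal sketch (my own illustration, not the commenter's code) for a scalar local-level model, x_t = x_{t-1} + w_t with w_t ~ N(0, q) and y_t = x_t + v_t with v_t ~ N(0, r): the Kalman filter conditions each state estimate on data up to t only, while the Rauch–Tung–Striebel (RTS) backward pass revisits the series so every state estimate uses all observations.

```python
# Filtered vs. smoothed state estimates for a scalar random-walk-plus-noise model.
# Model: x_t = x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r).
import numpy as np

def kalman_filter(y, q, r, m0=0.0, p0=1.0):
    """Forward pass: p(x_t | y_1..t). Returns filtered means and variances."""
    m, p = m0, p0
    ms, ps = [], []
    for obs in y:
        m_pred, p_pred = m, p + q          # predict one step ahead
        k = p_pred / (p_pred + r)          # Kalman gain
        m = m_pred + k * (obs - m_pred)    # update with the new observation
        p = (1 - k) * p_pred
        ms.append(m)
        ps.append(p)
    return np.array(ms), np.array(ps)

def rts_smoother(y, q, r, m0=0.0, p0=1.0):
    """Backward pass: p(x_t | y_1..T). Uses ALL observations for each state."""
    mf, pf = kalman_filter(y, q, r, m0, p0)
    ms, ps = mf.copy(), pf.copy()
    for t in range(len(y) - 2, -1, -1):
        p_pred = pf[t] + q
        g = pf[t] / p_pred                 # smoother gain
        ms[t] = mf[t] + g * (ms[t + 1] - mf[t])
        ps[t] = pf[t] + g ** 2 * (ps[t + 1] - p_pred)
    return ms, ps
```

The two passes agree at the final time point (there is no future data left to add), and the smoothed variances are never larger than the filtered ones, which is exactly why bi-directional inference is the right target for "how much has opinion varied over the past year".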

]]>To me, the question I have today is how much variability has there been in public opinion over the past year or so, and for that it’s relevant to use bi-directional inferences.

]]>I experimented with a somewhat similar model — different parameters/priors, simulated data, no application, and only one “poll” per time instant — but based on this it should be straightforward to use the same trick with the poll-averaging model, too.

Code: https://github.com/juhokokkala/kalman-stan-randomwalk

Writeup: http://www.juhokokkala.fi/blog/posts/kalman-filter-style-recursion-to-marginalize-state-variables-to-speed-up-stan-inference/
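The core of the trick in the linked writeup can be sketched as follows (this is my own minimal Python version, not the linked code): for a linear-Gaussian state-space model the Kalman recursion yields the marginal likelihood p(y | q, r) in closed form via the prediction-error decomposition, so a sampler like Stan only has to explore the low-dimensional parameter space instead of all T latent states.

```python
# Log marginal likelihood of a scalar random-walk-plus-noise model, with the
# states integrated out analytically by the Kalman recursion.
# Model: x_t = x_{t-1} + w_t, w_t ~ N(0, q);  y_t = x_t + v_t, v_t ~ N(0, r).
import math

def log_marginal_likelihood(y, q, r, m0=0.0, p0=1.0):
    m, p, ll = m0, p0, 0.0
    for obs in y:
        m_pred, p_pred = m, p + q
        s = p_pred + r                     # one-step-ahead predictive variance
        # accumulate log N(obs | m_pred, s): the prediction-error decomposition
        ll += -0.5 * (math.log(2 * math.pi * s) + (obs - m_pred) ** 2 / s)
        k = p_pred / s
        m = m_pred + k * (obs - m_pred)
        p = (1 - k) * p_pred
    return ll
```

In a Stan program the same recursion would sit in the model block, with `target +=` accumulating the one-step-ahead predictive log densities; the states can then be recovered afterwards by a smoothing pass.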

plainly extracted from their Poissonian character, so it neglects any systematic effect. As you say, even though the values quoted by the polling companies (the square root of the number of interviews) are already large, I have the strong impression that they have to be much, much larger. This is, in essence, what happened in the last December elections: almost all the polling companies were wrong because they were underestimating their uncertainty. I have plans to see whether it is possible to infer some asymmetry in the sampling distribution. It is a well-known fact (at least in Spain) that people voting for right-wing parties tend not to say so in polls, while the opposite happens for left-wing voters. Additionally, it is also well known that abstention is more volatile among left-wing parties. ]]>

All great stuff. ]]>

Looking at the source, I think it corresponds to \mathcal{N} in the MathML. Is there any widespread practice where the private-use character U+E23A corresponds to a calligraphic N, or is our MathML (or whatever) bugged? I stumbled upon a \mathcal{N} on Stack Exchange and apparently it was rendered correctly. (I think ideally \mathcal{N} should be U+1D4A9, MATHEMATICAL SCRIPT CAPITAL N.)

]]>A better job than my effort (he accounts for bias in polling firms), and again, all in Stan. Unfortunately it didn’t predict the outcomes very well!

]]>Fair enough. That sounds entirely reasonable.

]]>But somehow I’ve gotten to be quite skeptical of the “it behaves in the way that we want our model to behave” approach.

Perhaps I am wrong.

]]>For example, suppose you have a model for how the kinetics of some complex chemical reaction works. You’d want to encode that model as accurately as possible to correspond to what you think is going on, so that the inferences about the unobserved quantities are as accurate as you can get them. It’s a little like focusing a microscope. Why bother focusing the microscope if the thing you’re trying to see is smaller than the resolving power? Well, focusing it is going to give you the best chance of inferring the position of the tiny particle even though the particle will be too blurry to get a perfect position estimate.

]]>These distinctions are in the manual under data types somewhere. The manual is QUITE comprehensive, actually one of the better ones I’ve seen for any Free Software project.

]]>But, you could for example plot spaghetti plots of individual state-space paths and see whether one or the other model does or doesn’t conform to your beliefs about realistic movements of the state space. For example if you see individual paths wiggling up and down on short time scales but you think things should be more stable, that would be a diagnostic tool you could use to help you specify the model better.
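The eyeball check described above can also be made quantitative. Here is a toy sketch (all names and settings are mine, not from the post): simulate latent state paths under two candidate models and summarize short-time-scale wiggliness as, say, the mean absolute day-to-day change, then compare that against your beliefs about how fast opinion can realistically move.

```python
# Compare short-time-scale wiggliness of simulated state paths under two
# hypothetical models that differ only in their daily innovation sd.
import numpy as np

def simulate_paths(step_sd, n_paths=200, t=100, seed=0):
    """Each row is one random-walk draw of the latent state."""
    rng = np.random.default_rng(seed)
    return np.cumsum(rng.normal(0.0, step_sd, size=(n_paths, t)), axis=1)

def mean_abs_daily_change(paths):
    """Average |day-to-day change| across all paths: a wiggliness summary."""
    return float(np.mean(np.abs(np.diff(paths, axis=1))))

smooth_model = simulate_paths(step_sd=0.05)   # slow-moving opinion
wiggly_model = simulate_paths(step_sd=0.50)   # implausibly jumpy opinion
```

If the wiggliness statistic of draws from your fitted model exceeds what you believe about the real state, that is a concrete signal to tighten the innovation scale or change the state dynamics.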

]]>See chapter 7 of BDA3 or my papers with Aki on WAIC and LOO.

]]>If yes, I’d be very curious to see this done.

]]>Maybe it will. Maybe it will make it worse. But how can we tell without some sort of predictive validation?

And I rarely see that part of the predictive-validation loop in the Bayesian models I see posted. And decision theory even more rarely.

]]>Is

`vector[T] mu`

any different from

`real mu[T]`

? Any reason why the notation for declaring the size is different for a vector than for a real array?

]]>?

]]>Thanks for posting this counterexample illustrating when order matters.

Are there any “non-weird”, i.e. realistic, examples where order matters (within the model block)? I skimmed through the manual and couldn’t locate any yet. Just curious.

]]>You quantify uncertainty in your predictions, you can include prior knowledge if you want to (not including any is a form of prior knowledge), and you can fit the amount of smoothing you apply rather than hack it. You can also calculate event probabilities (e.g., probability that Clinton wins), and make predictions into the future with the time series model. It gives you the right information for making decisions (e.g., betting on outcomes, making plans to move to Canada, etc.).
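The "calculate event probabilities" point is mechanically simple once you have posterior draws: the probability of an event is just the fraction of joint draws in which it occurs. A minimal sketch (the draws below are made up for illustration):

```python
# P(Clinton wins) from paired posterior draws of each candidate's vote share.
import numpy as np

def event_probability(clinton_draws, trump_draws):
    """Fraction of joint posterior draws in which Clinton's share exceeds Trump's."""
    clinton = np.asarray(clinton_draws)
    trump = np.asarray(trump_draws)
    return float(np.mean(clinton > trump))

# Illustrative draws only; real ones would come from the fitted model.
clinton = [0.48, 0.51, 0.52, 0.49, 0.50]
trump   = [0.47, 0.49, 0.50, 0.50, 0.48]
```

Because the draws are paired within each posterior sample, this automatically respects the correlation between the two shares, which a comparison of two separate marginal intervals would not.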

]]>Daniel Lakeland’s syntax is outdated (directly incrementing `lp__` is no longer supported). The current syntax is as follows:

`y ~ foo(theta);`

and

`target += foo_lpdf(y | theta);`

The only difference is that the former drops normalizing constants that depend only on data and constants (i.e., that don’t depend on variables declared as parameters, transformed parameters, or local variables in the model block of the Stan program). Also, use `_lpmf` if `foo` is a probability mass function (pmf) instead of a probability density function (pdf).
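A small numeric check (my own illustration, not from the comment) of why dropping those constants is harmless: the full and unnormalized log densities differ by an amount that is constant in the parameter, so MCMC over the parameter is unaffected. Here `sigma` is treated as data; if it were a parameter, the `log(sigma)` term could not be dropped.

```python
# Full vs. unnormalized normal log density, with mu the parameter and
# (y, sigma) treated as data.
import math

def normal_lpdf(y, mu, sigma):
    """Full log density, like Stan's normal_lpdf."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((y - mu) / sigma) ** 2)

def normal_lupdf(y, mu, sigma):
    """Drops every term that does not involve the parameter mu."""
    return -0.5 * ((y - mu) / sigma) ** 2

y, sigma = 1.3, 1.0
# The gap between the two versions is the same for every value of mu.
diffs = [normal_lpdf(y, mu, sigma) - normal_lupdf(y, mu, sigma)
         for mu in (-2.0, 0.0, 3.5)]
```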

Your interpretation of shrunken_polls and mu is what I had in mind. And yes, I just used a random walk for the unobserved state for simplicity. I don’t much like random-walk states, as they have funny implications for the (unbounded) range of values you consider to be possible in the future. And, as you say, a given draw of the state will look like a random walk (and we probably think it moves more slowly). So I guess I’d prefer something more structural as the state.

Cool idea on the GP state. I’ll have a play! Though that might make it more difficult to jointly model Trump and Clinton, which is a cinch in the current setup.

]]>`y ~ normal(0, sigma)` means we find it less and less plausible that `y/sigma` would be more and more distant from 0, and beyond a distance of 3 or so it becomes dramatically implausible.

That’s the notion of Bayesian logic in Cox/Jaynes theory: plausibility is assigned by our knowledge, and it is a real number that sums to 1 across all the possibilities.
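A quick numeric illustration (mine, not from the comment) of "beyond a distance of 3 or so it becomes dramatically implausible": the probability that a standard normal lands more than 3 standard deviations from 0 is about 0.27%.

```python
# Two-sided tail probability of a standard normal via the complementary
# error function: P(|Z| > z) = erfc(z / sqrt(2)).
import math

def two_sided_tail(z):
    return math.erfc(z / math.sqrt(2))

p3 = two_sided_tail(3.0)   # roughly 0.0027
```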

]]>The `~` operator describes a probabilistic fact about the object on the left-hand side. It says that the probability you are assigning (or maybe “assuming” is the better terminology) to the value of the left-hand-side object being `x` is `dx` times the value of the distribution function on the right evaluated at `x`.

As an example, if `y` is a parameter,

`y ~ normal(0, 1);` says that your information about the value of `y` is that it has probability `dnorm(yy, 0, 1) * dy` of being within an infinitesimal distance `dy` of `yy`

(where `dnorm` is R’s notation for the normal density function, not Stan’s).

This doesn’t modify the value of `y` in any direct way; what it does is modify the dynamics of the sampler so that the sampler ensures this probabilistic constraint is included in the calculation.

The thing on the left doesn’t need to be just a data value or just a parameter value; it can be any expression containing data and parameters. It’s up to you to understand what that means, and to declare “true” facts (that is, to create a good model of reality).

I started some mild holy wars on the Stan mailing list about this topic. I called the distinction “declarative vs. generative”. If you put only data or only parameters on the left, you’ve got a “generative” model, where you imagine that the values are “generated by” sampling from the distribution on the right. If you put some other expression on the left, you’ve got a declarative model, where you’re declaring that the transformed quantity on the left has a certain probabilistic constraint on its values.

In any case, the fact that you can put an expression on the left-hand side makes it clear that this isn’t an assignment to a variable; it’s a declaration of a probabilistic fact.

]]>“Apply a smoothing filter to `shrunken_polls`” is more or less what `mu` is. The specific smoothing filter is that `mu[i]` is simultaneously (softly) constrained to be within 0.25 of `mu[i-1]` and within `tau` of `shrunken_polls[i]`.

Having `mu` be a weighted average of `shrunken_polls` for several days in either direction would be more or less the Gaussian process I am suggesting above.
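A sketch of that suggestion (bandwidth and data are made up; this is an illustration, not the model in the post): replace the one-step dependence of `mu` on its neighbours with a Gaussian-kernel weighted average of `shrunken_polls` over several days in either direction.

```python
# Gaussian-kernel weighted moving average: each smoothed value is a
# normalized weighted average of the whole series, weighting nearby days most.
import numpy as np

def gaussian_kernel_smooth(series, bandwidth=2.0):
    series = np.asarray(series, dtype=float)
    t = np.arange(len(series))
    smoothed = np.empty_like(series)
    for i in t:
        w = np.exp(-0.5 * ((t - i) / bandwidth) ** 2)  # Gaussian weights
        smoothed[i] = np.sum(w * series) / np.sum(w)   # normalized average
    return smoothed
```

The bandwidth plays the role the `0.25`/`tau` scales play in the soft-constraint version: it controls how many days in either direction meaningfully influence each day's estimate.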

]]>