I did know that in the real paper the prior is not directly on log(PM2.5) but felt that it would be helpful for people to see the kind of modeling that can go into choice of priors, so I abstracted that away for the comments here.

However, when it comes to setting priors on parameters that indirectly affect another measurement, I do think we can do well by using a “pseudo-likelihood” function. That is, imagine you have a,b,c with some kind of simple order of magnitude priors:

a ~ normal(10,5);

b ~ normal(10,5);

c ~ normal(10,5);

But through some function you have a fourth quantity that is physically meaningful about which you have prior information.

d = f(a,b,c)

In the Stan language there’s nothing that prevents you from calculating d in the transformed parameters block, and then in the model block doing:

d ~ gamma(5,4.0/10);

or whatever is meaningful for the physical information. The idea is, you know that d needs to be something in the vicinity of 10 so you reweight your prior on a,b,c by a function that downweights the regions of a,b,c space when they result in d being well outside the range you expect. The prior is now implicitly a multidimensional correlated one, but if your information is primarily in d space, this prior should be much more effective than the generic ones on the individual coefficients.

]]>1) In the paper, the prior is not directly on log(PM2.5) but on model parameters. It is more difficult to set prior on several parameters so that it would be constrained in some range for the outcome.

2) You justified constrained range by avoiding “wacky” initial values. With your constraint, due to transformation, the initial values by default in Stan would be from range [-7.85, 5.85]. Without constraint the default initial values are from range [-2, 2]. ]]>

All that being said, the *real* concern here is that my back of the envelope calculation suggesting that 10 to 12 is a reasonable quantity completely fails to cover the range of the actual data which is 1-5 !!!

I think the problem is that I was using parts per million as what I understood the airnow values to be, but in fact they are of course micrograms/m^3 for pm2.5 (they are parts per billion for ozone for example so it’s always *really important* to check the units of measurement), so my calculations are off because I was multiplying by density of air to get micrograms/m^3

Always check the sensibility of your basic math first. Issues like whether or not you’re putting a hard boundary out pretty deep in the tail of your distribution come second ;-)

So, correcting my math: log(1000) giving the high end of pollution = 6.9 and -10 as the lower bound, with 50 ug/m^3 being a typical value during the day in Pasadena so log(50)=3.9 we can set our prior as something like

real<lower=-10,upper=8> logpm25;

logpm25 ~ normal(4,2);

and now going back to the graph we’ll find that all the data is in fact somewhere between 1 and 5, and the tail at either -10 or 8 does not enter into the calculation so whether it’s got a hard bound or not won’t really matter (I only do it as I say because if it gets initialized to 15 or something things could go very wrong and take a long time to get initialized)

]]>I think hard constraints are generally a mistake. Instead of bounding a parameter between A and B, better to give it a normal prior with mean (A+B)/2 and sd (A-B)/2. I say this both for computational reasons and modeling reasons.

Two advantages of the normal rather than uniform prior:

1. Partial pooling toward the center of the range.

2. No pathological behavior at the ends of the range.

> I had not seen that paper!

So you missed Xi’ans and my comment it in 2013

https://xianblog.wordpress.com/2013/11/21/hidden-dangers-of-noninformative-priors/

As to why it’s not in a prominent technical journal, I know the answer to that: prominent technical journals have no interest in publishing this sort of work, so you put it where you can.

]]>real<lower=-11,upper=15> logpm25;

]]>ln(3.6e-5) = -10

so now in Stan we can do something like

real logpm25;

Now if we leave this uniform it’s relatively uninformative, all it does is say “we’re somewhere between the smallest conceivable non-zero value and the highest you can breathe without dying rapidly of smoke inhalation (1000ppm).”

But beyond that, we know it should be typically something like 10ppm to 200ppm with median around say 50ppm because we live in LA and watch the airnow.gov so our kids don’t stay out at summer camp too long in bad air. 200ppm is something like 12 on the log scale, and 20 ppm is 10, so let’s do

normal(11,2)

having done all this we now need to ask if something went wrong, because although it seems that 11 is a reasonable pm2.5 number corresponding to 5.9e4 micrograms/m^3 and density of air is 1.2e9 micrograms/m^3 so that we’re talking 50ppm which is a typical mid-day september reading near Pasadena… the observed levels in Dan’s paper are log(pm2.5) of say 3 or exp(-8)= 3.3e-4 times the mass density typical on a good day where I live.

which suggests a units issue, like using cm^3 instead of m^3 or kg instead of g or something, or maybe he’s taking readings in an industrial clean room?

]]>Wouldn’t it be better to label both axes with “Log (PM_2.5)”, followed by “[ground measurement]” / “[satellite data estimate]” / “[simulated satellite data estimate]”?

]]>Next let’s look at the airnow.gov site where they state 500ppm smoke is beyond the range of air quality and into hazardous to life… So we can go a factor of 1000 smaller ln(1.2e6)=14

So now I think we’ve finally got to an uninformative prior, in that we’ve only ruled out the completely impossible.

]]>Plots often are good at making the implications of poor priors obvious, but my guess almost anyone who has done Bayesian analyses in real applications has been blinded sided by this issue. Wonder what percent of the time it was noticed (within and over analysts)? On the other hand, there did seem to be a reluctance to simulate and plot priors, which has hopefully past.

Unlike your claim of it being hazardous to breath concrete, I can point to some empirical evidence here – Hidden Dangers of Specifying Noninformative Priors. John W. Seaman III, John W. Seaman Jr. & James D. Stamey The American StatisticianVolume 66, Issue 2, May 2012, pages 77-84. http://amstat.tandfonline.com/doi/full/10.1080/00031305.2012.695938

Now why did this only? arise in “amstat” journal rather than a more prominent technically oriented stats journal and only in 2012?

(To date about 40 citations, so not totally being ignored.)

I think this is also a really good way of reminding people that their inferences are a blend of prior information and information in the data, so that if you don’t have much data, you need more prior information to make reasonable inferences. In the extreme case of not having collected any data, you get the prior predictive distribution.

]]>This happened to Susanna and me with the Poisson distribution once! We were setting up a prior distribution for a model involving survey weighting—I don’t remember the details, but I think it’s related to what’s in this paper), and we were having some computing difficulties, and then we did some prior predictive simulation and we realized that our prior for the log of the rate parameter was way too vague; we were getting some simulations where the rates were all things like 0.001. The discreteness of the Poisson distribution implies that the scale of the prior can really matter. It’s not like the normal distribution this way.

Also this was a good example of the folk theorem.

]]>