integrate(P(Data | Parameters) P(Parameters) U(Outcome(Parameters)), Parameters)

The P(Data | Parameters) factor is a description of how you think the world works… but P(Parameters) U(Outcome(Parameters)) is a single function of the parameters, and its two components aren’t “identifiable”: only their product enters the integral.

So the right way to think about this issue, I think, is to realize that you’re not “picking a prior” but rather choosing a risk function, which is a prior and a utility multiplied together. And if your utility has a strange form like “the number of nonzero parameters needs to be exactly N,” then you should expect all of it to depend on lots of things that a pure inference about the truth wouldn’t depend on.
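The non-identifiability above is easy to see numerically. Here is a minimal, purely hypothetical one-parameter sketch (the distributions, the utility, and the single observation are my own choices for illustration, not anything proposed in the discussion): two different (prior, utility) pairs with the same product give exactly the same integral.

```python
import numpy as np

# Hypothetical one-parameter illustration of
#   integrate( P(Data | Parameters) P(Parameters) U(Outcome(Parameters)), Parameters )
# Only the product P(Parameters) * U(Outcome(Parameters)) enters,
# so the prior and the utility are not separately identifiable.

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

theta = np.linspace(-10.0, 10.0, 4001)
dtheta = theta[1] - theta[0]
data = 1.5  # a single made-up observation

lik = gauss(data, theta, 1.0)      # P(Data | Parameters)

prior_a = gauss(theta, 0.0, 2.0)   # pair A: Gaussian prior,
util_a = np.exp(-np.abs(theta))    #         Laplace-shaped utility

prior_b = prior_a * util_a         # pair B: product folded into the "prior",
util_b = np.ones_like(theta)       #         flat "utility"

risk_a = np.sum(lik * prior_a * util_a) * dtheta
risk_b = np.sum(lik * prior_b * util_b) * dtheta

print(np.isclose(risk_a, risk_b))  # True: same integrand, same risk
```

Any data set would rank decisions identically under pair A and pair B, which is the sense in which you are choosing one risk function rather than a prior and a utility separately.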

One way to get out of this particular problem is to throw the “sparsity” part of the model to some sort of utility function and treat sparsity as a decision analysis. While I’m completely and totally happy with that (as I said in a previous comment), that’s not the soil on which the Bayesian Lasso was built. And I think it’s very important to meet the proposed method where it is.

This is not the only case where parameters in the prior need to be scaled based on outside information in order for them to have the interpretation that they are intended to have. But that discussion is too long for a comment, which probably means I’ll blog about it at some point.

Regarding your comment, “A legitimate prior doesn’t change as you add new observations: in Bayes’ rule, the factor for the prior is the same regardless of what data you observe or how much data you collect”:

Not necessarily. See this recent article with Simpson and Betancourt, “The prior can often only be understood in the context of the likelihood.”

In general, it’s not nearly that straightforward.

The analysis should stand on the data set as much as is reasonable – not any more or less. The goal is to get a purposeful and convincing analysis rather than to truly represent one’s prior and truly update it with all the data. I don’t believe anyone can truly state their prior, and it’s almost never the case that all the data is used or usable.

This may sound like “don’t take any wooden nickels,” but maybe the meta-statistics of discerning fully adequate and separate priors and data-generating models is just a poor meta-statistics.

I don’t know anything about lassos. But since you, weirdly enough, have quoted two Swedish singers, you should also check out Anna Järvinen. As with Säkert and Frida Hyvönen, it’s much better if you understand the lyrics. Not that that has stopped you before. Oh yeah, also Nina/Nino Ramsby.

As you can probably guess, this is not an accident.

In general, you also need to know something about the *precision* of the experiment, which is encoded in n and X. Why do you need this? Because we need to choose the cut-off epsilon, which will depend on how well the individual betas can be resolved, which is a function of n and X.

So the scaling is needed to reflect our substantive prior knowledge of sparsity. The extra $latex \tilde{\lambda}$ reflects the fact that we only have an “order of magnitude” idea of the scaling, so we still need to learn the exact value from the data. But with this scaling, we know that $latex \tilde{\lambda}$ should be $latex \mathcal{O}(1)$.
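A minimal sketch of why the cut-off must track n and X, using the usual OLS standard errors as the resolvable scale for each coefficient (the function name and this particular scaling rule are my own illustration of the idea, not the exact proposal):

```python
import numpy as np

rng = np.random.default_rng(0)

def resolvable_scale(X, sigma=1.0):
    # Per-coefficient OLS standard errors: sigma * sqrt(diag((X'X)^{-1})).
    # This is the natural scale below which an individual beta cannot be
    # resolved, and it depends on the design only through n and X.
    return sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))

p = 3
X_small = rng.normal(size=(50, p))     # 50 observations
X_large = rng.normal(size=(5000, p))   # 100x as many observations

eps_small = resolvable_scale(X_small)
eps_large = resolvable_scale(X_large)

# With 100x the rows, each coefficient is resolved roughly sqrt(100) = 10x
# more finely, so a cut-off epsilon (and the prior scaling built on it)
# has to shrink with n and X even though the effect sizes themselves don't.
print(eps_small.mean() / eps_large.mean())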

Why? This strikes me as nonsensical. Neither the number of observations (n) nor the points at which you took the observations (X) influences the effect sizes, nor do n and X *alone* generally give you any information about the effect sizes. Put another way: does your opinion of likely effect sizes change after someone tells you how many observations they took, or even the predictor values for those observations, without telling you anything about the values of the outcome variable?

Sort of like my favorite RPG blogger, The Angry GM. Only without the cussing, and with cabaret instead of fantasy fiction and statistics instead of role-playing games.

I also laughed out loud at his summary of the whole blog endeavour:

But it’s a blog. If ever there was a medium to be half-arsed in, it’s this one. It’s like Twitter for people who aren’t pithy.

A typo?

Or maybe http://consc.net/misc/proofs.html

So those lawyer-like limitations often found in the discussion section, the ones that seem to just say “don’t use when not appropriate,” are not enough?

You want to do away with caveat emptor?
