## David Hogg on statistics

We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there is a dimension along which the data points have negligible uncertainties, and another along which all the uncertainties can be described by Gaussians of known variance; these conditions are rarely met in practice. We consider cases of general, heterogeneous, and arbitrarily covariant two-dimensional uncertainties, and situations in which there are bad data (large outliers), unknown uncertainties, and unknown but expected intrinsic scatter in the linear relationship being fit. Above all we emphasize the importance of having a “generative model” for the data, even an approximate one. Once there is a generative model, the subsequent fitting is non-arbitrary because the model permits direct computation of the likelihood of the parameters or the posterior probability distribution. Construction of a posterior probability distribution is indispensable if there are “nuisance parameters” to marginalize away.
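For readers who want something concrete, here is a minimal sketch of the restricted case the abstract calls standard weighted least squares: `x` is treated as exact and each `y` has Gaussian noise of known standard deviation. The function name `weighted_least_squares_line` and the synthetic data are illustrative assumptions, not code from the paper.

```python
import numpy as np

def weighted_least_squares_line(x, y, sigma_y):
    """Fit y = m*x + b, assuming x is exact and y has known Gaussian sigma_y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma_y = np.asarray(sigma_y, dtype=float)
    A = np.vstack([np.ones_like(x), x]).T   # design matrix with columns [1, x]
    w = 1.0 / sigma_y**2                    # inverse variances (diagonal of C^-1)
    At_Cinv = A.T * w                       # A^T C^-1, shape (2, N)
    cov = np.linalg.inv(At_Cinv @ A)        # covariance of the estimates (b, m)
    b, m = cov @ (At_Cinv @ y)              # best-fit intercept and slope
    return m, b, cov

# Synthetic data (an assumption for illustration): true slope 2, intercept 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
sigma_y = rng.uniform(0.5, 2.0, size=x.size)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma_y)
m, b, cov = weighted_least_squares_line(x, y, sigma_y)
print(m, b)
```

This covers only the textbook case; uncertainties in both dimensions, outliers, and intrinsic scatter, which the abstract goes on to mention, need a fuller generative model than this sketch provides.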

In this pedagogical text aimed at those wanting to start thinking about or brush up on probabilistic inference, I review the rules by which probability distribution functions can (and cannot) be combined. I connect these rules to the operations performed in probabilistic data analysis. Dimensional analysis is emphasized as a valuable tool for helping to construct non-wrong probabilistic statements. The applications of probability calculus in constructing likelihoods, marginalized likelihoods, posterior probabilities, and posterior predictions are all discussed.

1. K? O'Rourke says:

Neat!

> Construction of a posterior probability distribution is indispensable if there are “nuisance
> parameters” to marginalize away.

Hard to argue with that, and in applications there almost always are “nuisance parameters”.

I suspect David is correct to encourage people to (re)-think these issues from different perspectives.

2. Anonymous says:

Do you want a model of the world or a summary of the world?

Angrist and Pischke argue for the latter, as in “best linear approximation”.

We all know the data sucks.

• K? O'Rourke says:

That (or a summary) just changes the likelihood for the data into the likelihood for a summary (summed or marginalized over all data that could have been observed with the exact same summary), so there is no real change in the issues.

This is the likelihood that Approximate Bayesian Computation (ABC, using the rejection method) gets exactly, via likelihood = posterior/prior. This relative change in probabilities, prior versus posterior, is a very direct way to motivate the likelihood and what comes from the data and data model.

By the way, ABC is just a variation on two-stage sampling: first parameters are drawn from the prior, and then data are drawn from the probability-generating model given the parameters drawn. I described this here as Bayes Theorem being just Nearest Neighbors. Two-stage sampling was described long ago by Don Rubin as a conceptual model in his paper on Bayesian frequency calculations, where he distinguished justifiable from relevant.
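The two-stage sampling idea can be sketched in a few lines. This is only an illustration under assumptions of my choosing: a binomial data model, a uniform prior on the success probability, and a hypothetical helper named `abc_rejection`. The observed count `k_obs` plays the role of the summary, and only prior draws whose simulated summary matches it exactly are kept.

```python
import numpy as np

def abc_rejection(k_obs, n_trials, n_draws, rng):
    """Rejection ABC for a binomial success probability under a uniform prior."""
    theta = rng.uniform(0.0, 1.0, size=n_draws)  # stage 1: draw parameters from the prior
    k_sim = rng.binomial(n_trials, theta)        # stage 2: simulate data given each draw
    return theta[k_sim == k_obs]                 # keep draws that reproduce the observed summary

rng = np.random.default_rng(42)
accepted = abc_rejection(k_obs=7, n_trials=10, n_draws=200_000, rng=rng)

# The accepted draws are samples from the posterior. With a flat prior their
# histogram is proportional to the likelihood (posterior/prior).
# For comparison, the analytic posterior is Beta(8, 4), whose mean is 8/12.
print(accepted.size, accepted.mean())
```

Because the data here are discrete, exact matching is feasible; with continuous data one would have to accept near matches, which is where the approximation in ABC comes in.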

On the other hand, I think my Physics background was too weak to fully appreciate David and colleagues' drafts…