## Prior distributions and the Australia principle

There’s an idea in philosophy called the Australia principle—I don’t know the origin of this idea, but here’s an example that turned up in a Google search—which posits that Australia doesn’t exist; instead, they just build the parts that are needed when you visit: a little mock-up of the airport, a cityscape with a model of the Sydney Opera House in the background, some kangaroos, a bunch of desert in case you go into the outback, etc. The idea is that it would be ridiculously inefficient to build an entire continent, and that it makes much more sense to just construct a sort of stage set for the few places you’ll ever go.

And this is the principle underlying the article “The prior can often only be understood in the context of the likelihood,” by Dan Simpson, Mike Betancourt, and myself. The idea is that, for any given problem, for places in parameter space where the likelihood is strong, relative to the questions you’re asking, you won’t need to worry much about the prior; something vague will do. And in places where the likelihood is weak, relative to the questions you’re asking, you’ll need to construct more of a prior to make up the difference.

This implies:
1. The prior can often only be understood in the context of the likelihood.
2. What prior is needed can depend on the question being asked.

To follow up on item 2, consider a survey of 3000 people, each of whom is asked a single binary question, and suppose this survey is a simple random sample of the general population. If this is a public opinion poll, N = 3000 is more than enough: the standard error of the sample proportion is something like 0.5/sqrt(3000) = 0.01; you can estimate a proportion to an accuracy of about 1 percentage point, which is fine for all practical purposes, especially considering that, realistically, nonsampling error will likely be more than that anyway. On the other hand, if the question on this survey of 3000 people is whether your baby is a boy or a girl, and if the goal is to compare the sex ratios of beautiful and ugly parents, then N = 3000 is way way too small to tell you anything (see, for example, the discussion on page 645 here), and if you want any kind of reasonable posterior distribution for the difference in sex ratios you’ll need a strong prior. You need to supply the relevant scenery yourself, as it’s not coming from the likelihood.
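The arithmetic behind both halves of this example can be sketched in a few lines. The prior and effect scales below (a prior standard deviation of 0.001 on the sex-ratio difference, and a likelihood standard deviation of roughly twice the single-proportion standard error) are hypothetical illustration values, not numbers from the cited paper:

```python
import math

# Standard error of a sample proportion near p = 0.5 with N = 3000:
n = 3000
se = 0.5 / math.sqrt(n)  # about 0.0091, i.e. roughly 1 percentage point
print(f"standard error of the poll proportion: {se:.4f}")

# For the sex-ratio comparison, any plausible difference between groups
# is on the order of 0.001 or less, so the data's uncertainty dwarfs
# the effect being estimated.  A normal-normal conjugate sketch with
# hypothetical scales:
prior_sd = 0.001          # assumed prior sd on the difference in proportions
lik_sd = 2 * se           # rough sd of an estimated difference of two proportions
post_prec = 1 / prior_sd**2 + 1 / lik_sd**2
post_sd = math.sqrt(1 / post_prec)
print(f"posterior sd: {post_sd:.6f}")  # barely below the prior sd of 0.001
```

The posterior standard deviation is essentially unchanged from the prior: the survey adds almost no information about the sex-ratio difference, which is the sense in which the scenery has to come from the prior.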

The same principle—that the prior you need depends on the other information you have and the question you’re asking—also applies to assumptions within the data model (which in turn determines the likelihood). But for simplicity here we’re following the usual convention and pretending that the likelihood is known exactly ahead of time so that all the modeling choices arise in the prior.

P.S. The funny thing is, Dan Simpson is from Australia himself. Just a coincidence, I’m sure.

1. Ethan Bolker says:

I read a science fiction story years ago where the protagonist’s world was constructed on the fly for him as he wandered through it. I think he was visiting Seattle. You could probably get a citation from the Science Fiction Stack Exchange.

I’m also reminded of the sad fact that the longer you look in vain for something lost, the less you know about where it is, since you look in the most likely places first and so flatten out your prior.

(Neither of these observations contributes much to the statistics discussion here.)

2. Sam Clifford says:

We only build stats postdocs when they’re needed.

3. Junpeng Lao says:

Funny enough I am still not sure Dan Simpson actually exists.

4. jrkrideau says:

Potemkin returns!

There does seem to be some question as to whether the country of Australia exists: https://www.buzzfeed.com/davidmack/australia-is-real-i-swear?utm_term=.sabo8d23jP#.xtm3y64DoN.

Dan Simpson seems like a ‘bot to me.

Do you build parts of your blog as needed for the readers?

6. Jonathan says:

I think of this as you’re going down a path and you come to a dark space. You toss in a pebble to see if there’s a floor and how far away it may be: gather information. Then you take that information and try to narrow down where you might step, if you step in at all. It’s not the same as visualizing it as you sticking your foot carefully out to see what’s there or not, where you can step or not, because that’s drawing the line of motion through you along a path, which means you’ll tend to over-value your priors. I mean, bluntly, you don’t have the energy to go back to the beginning to check every path to determine the best. You have too much mass, which means you need food to cover that much ground when ‘feeding’ and converting that into energy is one side of the square we call time, where the other side is the meanings of feeding in your simulation or time level. So the more meanings to feeding, the more processes stack up, the more time it takes to sort them, and that replicates the problem of walking on a path because a walk is a step in infinite directions in the abstract, and is thus relative to the direction you’re actually going because all those processes condense toward real, which also means real numbers and real results. (I try to raise my ‘thinking about numbers spatially’ game when I post here. It’s a real delight for me. Probability is multi-dimensional, and we slice it into 2D planes and 3D shapes and, where we’re going is the recognition that we can figure out the shapes that push on and connect to the outsides of any probability work space you draw.)

Most people don’t truly realize mathematical models are stories you outline conceptually. Either you begin from the story and figure out how the numbers work or you start from the numbers and figure out the story. And people fail to remember that scientific and mathematical terminology is meant to describe actual events you can abstract to the level of organizing them into Goedel Statements – you know, statements that demonstrate the incompleteness of any particular system by doing their best to be complete, which they can’t be (so at every step, you look in the mirror and try to perfect yourself).

Now imagine you’re at the corner of this square. It’s you. Your life paths are in front of you, as you can best see them. You can stand there paralyzed as you figure out what you want: compare this to that, that to this is hard enough but it gets absurdly hard when you compare more in different ways. Nope, you look to see the greatest cost. Good start might be: let’s see if there’s a bottom to this place or if I only hear the sound of a pebble falling away into nothing. You ask yourself: is it gone or did it stop falling carefully? Try a heavier pebble, and so on. You figure out there’s a map of sorts and you think I’ll go this way, because that looks like a secure place to step, and you start to move and realize you didn’t look past that step. So you start looking past steps and realize that maybe what looked like a good idea is actually a bad idea in the next steps. But that gets complicated, doesn’t it? How many steps do you look into the future? How many steps can you see into the future? You actually count to the point where uncertainty reveals your choice: it isn’t what you want but what you give up, what you pay at every stage. The real argument you make behind garden of forking paths is that you as a researcher are in the garden of truth with yourself, so when you choose bad paths, you are actually valuing and defining exactly how untrue you are to yourself as you could be and the more you are defining yourself as this person, the one who reduces who you could be. Note the reduces: it’s a shape which presses against the outside of you and your results.

7. David Rohde says:

I really like the GP example in that paper.

8. Justin Smith says:

Hi, not to sound super lazy here, but if you already know a sample size is too small to be much good, what good will a prior do? Sure one can model and plug and chug and get an answer/make a decision at the end, but is it any good? My not too useful answer is sometimes yes and sometimes no. I don’t know where I was going with that.

• Andrew says:

Justin:

A few things:

1. Sometimes you are combining two sources of information, both of which are informative. Partial pooling! The sample gives you some info, and the population model (the prior) gives you info also. An example is the radon problem in my book with Jennifer.

2. A model can have lots of parameters, and you can have different amounts of prior information on different parameters in the model. An example is my toxicology model with Bois and Jiang.

3. If nothing else, a prior can be useful in analyzing data collected by others, for example, that beauty-and-sex ratio study where the data are about 1/100 as informative as any reasonable prior.
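Points 1 and 3 can both be sketched with the same precision-weighted (normal-normal conjugate) update: the posterior mean sits between the prior mean and the data estimate, weighted by how informative each is. The numbers below are hypothetical and chosen only to match the “data are about 1/100 as informative as the prior” situation; since information scales with 1/sd², that means the data standard deviation is 10 times the prior standard deviation:

```python
def posterior(prior_mean, prior_sd, data_mean, data_sd):
    """Precision-weighted combination of prior and data (normal-normal)."""
    w_prior = 1 / prior_sd**2
    w_data = 1 / data_sd**2
    mean = (w_prior * prior_mean + w_data * data_mean) / (w_prior + w_data)
    sd = (w_prior + w_data) ** -0.5
    return mean, sd

# Hypothetical numbers: prior sd 0.001 on a difference in proportions,
# data sd 0.01, so the data carry 1/100 the information of the prior.
m, s = posterior(prior_mean=0.0, prior_sd=0.001, data_mean=0.02, data_sd=0.01)
print(m, s)  # posterior hugs the prior: mean near 0.0002, sd near 0.001
```

The same function illustrates partial pooling when the two sources are comparably informative: set `data_sd` near `prior_sd` and the posterior mean lands roughly halfway between the two, which is the radon-style situation in point 1.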