John Cook writes:

Suppose you are designing an autonomous system that will gather data and adapt its behavior to that data.

At first you face the so-called cold-start problem. You don’t have any data when you first turn the system on, and yet the system needs to do something before it has accumulated data. So you prime the pump by having the system act at first . . .

Now you face a problem. You initially let the system operate on assumptions rather than data out of necessity, but you’d like to go by data rather than assumptions once you have enough data. Not just some data, but enough data. Once you have a single data point, you have some data, but you can hardly expect a system to act reasonably based on one datum. . . . you’d like the system to gradually transition . . . weaning the system off initial assumptions as it becomes more reliant on new data.

The delicate part is how to manage this transition. How often should you adjust the relative weight of prior assumptions and empirical data? And how should you determine what weights to use? Should you set the weight given to the prior assumptions to zero at some point, or should you let the weight asymptotically approach zero?

Fortunately, there is a general theory of how to design such systems. . . .
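The transition Cook describes falls out automatically of conjugate Bayesian updating. Here is a minimal sketch (the Beta-Binomial model and all numbers are my own, not from the post): the posterior mean is a weighted average of the prior mean and the empirical mean, and the prior's weight shrinks toward zero as n grows, which answers the "hard zero vs. asymptotic zero" question in favor of the asymptotic option.

```python
# Beta-Binomial sketch: the posterior mean (a + heads) / (a + b + n)
# blends the prior mean a / (a + b) with the empirical mean heads / n,
# and the prior's share (a + b) / (a + b + n) decays to zero as n grows.

def posterior_mean(a, b, heads, n):
    """Posterior mean of a Beta(a, b) prior after n Bernoulli trials."""
    return (a + heads) / (a + b + n)

def prior_weight(a, b, n):
    """Fraction of the posterior mean contributed by the prior."""
    return (a + b) / (a + b + n)

# Prior Beta(5, 5): prior mean 0.5 with a pseudo-count of 10 trials.
for n in [0, 10, 100, 1000]:
    heads = round(0.8 * n)  # suppose the true success rate is ~0.8
    print(n, round(posterior_mean(5, 5, heads, n), 3),
          round(prior_weight(5, 5, n), 3))
```

No hand-tuned weighting schedule is needed; the pseudo-count of the prior does the weaning on its own.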

Cool! Sounds like a good idea to me. Could be the basis for a new religion if you play it right.

Some already take it as a religion. It just requires unquestioning belief in the prior and data generating model.

Indeed. People need to understand Bayes as a *method of argumentation*.

If ASSUMPTIONS and DATA then CONCLUSIONS

… and then change assumptions until conclusions look right :-)

Ideally you should do

if ASSUMPTION then CONCLUSIONS and change assumptions until conclusions look right… then add DATA.

By look right, you don’t mean using the data twice, do you? By look right you mean look plausible according to what you thought prior to data?

Even in the second case, I’m not sure I agree. Some models make terrible predictions for observables. They shouldn’t be ‘healed’ by tweaking the priors for the model parameters until the prior predictives for the observables look like they would for a sensible model.

To put it another way, if we just tweaked priors until all models made the same prior predictions for observables, there wouldn’t be any point in Bayesian model comparison at all, since with identical prior predictives the marginal likelihoods etc. would be the same.

Priors shouldn’t just be chosen because you think the parameter is in a certain region of space. Particularly when it comes to multi-dimensional parameter spaces, you should choose priors that result in predictions that are reasonable.

Ideally you make the prior predictive distribution of data as close to “reasonable” as you can, which means displaying as much of your pre-data knowledge as you can figure out how to include.

Then you add your data, and you wind up with a proper posterior.
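A minimal sketch of such a prior predictive check (the model and priors here are invented for illustration): draw parameters from the prior, push them through the likelihood, and inspect the simulated observables before touching the real data.

```python
import random

random.seed(1)

# Hypothetical prior predictive check: simulate parameters from the
# prior, then data from the likelihood, and ask whether the simulated
# observables look "reasonable" before any real data are used.
def prior_predictive_draws(n_sims=1000):
    draws = []
    for _ in range(n_sims):
        mu = random.gauss(0, 10)         # vague prior on the mean
        sigma = random.expovariate(1.0)  # prior on the noise scale
        draws.append(random.gauss(mu, sigma))
    return draws

ys = prior_predictive_draws()
# If most simulated observations fall far outside the range you consider
# plausible for the real measurement, tighten the priors before fitting.
print(min(ys), max(ys))
```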

In many cases, if you have a lot of data, you can get away with vague priors. But there are plenty of cases, particularly with more advanced models, where that’s not true, and you really need to encode real-world information in your priors if you want reasonable posterior inference. Wide, vague priors are not good enough.

Do you see what I mean though? That if we adjust the priors until the prior predictives all look a certain (and thus similar) way, very bad models won’t be penalized for being very bad. What if a model is just a bad one and it really makes predictions that don’t look reasonable given what we think we know about its parameters? Why should it be fixed by tweaking the priors?

I do see your point about the challenges of picking priors in high-dimensional settings. And I think you have in mind parameter inference rather than model selection, as in the latter vague priors will almost always give you headaches!

By the way, I don’t often comment, and only did because it’s so rare to read one of your comments that I really doubted.

If a model can be made to give good predictions by forcing the priors to be kinda wacky compared to what you think they should be, that’s something to look into. It indicates maybe you don’t really understand the meaning of the parameter. If the model can be made to give good predictions without choosing wacky parameters then I don’t understand in what sense it would be a “very bad” model.

Sometimes parameters are physical things that we understand, like the viscosity of water or the mean income of 20 year old males in the US. If you plug in those values that you know and you get wacky predictions, *then* you have a bad model. You shouldn’t switch to deciding that the mean income of 20 year old males is $800,000/yr just because it makes the rest of your model for their SAT scores or something give a good prediction.

My guess (but maybe I’m wrong) is that what Daniel means is that you have data that consists of measurements of “input” type variables and corresponding “output” type variables. Your goal is to develop a model for getting from “input” to “output” so you can make predictions of “output” for new “input” data. You randomly divide the data into a “training set” and a “test set”, and use the values from the training set, plus your assumptions, to develop a tentative model. Then you check this tentative model using the test set. If the predictions from the tentative model fit the “test set” data well enough, fine. If not, you question your “assumptions”, changing them in a way that seems plausible to fit the data on hand (as well as any other constraints, such as physical laws), and repeat the process until your model fits the test set as well as the training set.
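A toy sketch of that loop (the linear model, the numbers, and the stopping rule are all invented for illustration): fit with an initial assumption, check against the held-out test set, and revise the assumption until the test fit is acceptable. Note that revising assumptions against the test set like this is exactly the "using the data twice" worry raised earlier in the thread.

```python
import random

random.seed(0)

# Toy linear model: the "assumption" is a fixed intercept; fitting the
# training data only chooses the slope.
def fit_model(train, assumptions):
    intercept = assumptions["intercept"]
    slope = (sum((y - intercept) * x for x, y in train)
             / sum(x * x for x, _ in train))
    return lambda x: slope * x + intercept

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Simulated measurements: y = 2x + 1 plus a little noise.
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(20)]
random.shuffle(data)
train, test = data[:15], data[15:]

assumptions = {"intercept": 5.0}          # initial (wrong) assumption
model = fit_model(train, assumptions)
initial_err = mse(model, test)

for _ in range(20):                       # cap the revision loop
    if mse(model, test) < 0.05:           # "fits the test set well enough"
        break
    assumptions["intercept"] -= 0.25      # revise in a plausible direction
    model = fit_model(train, assumptions)

final_err = mse(model, test)
```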

Daniel posted his reply while I was composing mine, so ignore mine.

Misread the title as “Automatic data rewriting” and thought it would be the story of how an AI invented fraud. Maybe I shouldn’t give the “data scientists” any ideas…

Wow, me too!

Hey kids! Worried about committing fraud without supporting data? If not, you should be! Just ask Brian Wansink!

But now you don’t have to worry about being embarrassed for fraud or, worse yet, having to retract your paper!

Get data to back up your fraudulent conclusions! Just use your conclusions as the prior and run your model backward! That’s right, you’ll have data that fits your model in no time! Publish! Present! Discuss! Challenge prevailing wisdom! That’s right, with back-fit fraudulent data, you can do real ECNEICS! Don’t let nature push you around! Want 2^CM = E? You got it! Your competitors won’t know what’s going on! With the BAck-Fit FraudLEnt Data modelling tool, they’ll be BAFFLED alright!

If you’re getting caught because somebody looked at your data, you first have to check the AI that was writing the paper. In most cases it made a mistake when generating the title and made it too catchy.

Automatic academic paper generation is a challenging optimization problem where you want to reduce the rejection rate but don’t want to attract too much attention.

Also make sure to disguise the fraud as incompetence and if somebody is sceptical of you, be nice to them at the next conference.

Pro tip: also, when debugging the AI, you should first test it with a publisher, using your graduates.

Thank you! I have strong fraud skills but I’m looking to deepen my skill set. Would you be interested in mentoring? You can review my forthcoming PNAS paper: “Five Minute Intervention Increases Lifetime Earnings 22.5557%, Reduces Obesity by 31.22547%, and Increases Lifetime Happiness by 57.325%”.

I am sorry, I don’t get the joke. Can someone clue me in? It seems John Cook is just trying to describe a Bayesian approach. What am I missing?

The magic of this system is not in Bayes, but in the ‘more data’ part. :)

(likelihood swamps prior, or n/N approaches 1, however you want to think about it)
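A numeric sketch of that point (all numbers invented): two sharply different Beta priors land on nearly the same posterior mean once the data are plentiful.

```python
# "Likelihood swamps prior": with n large, wildly different priors give
# nearly identical posterior means (conjugate Beta-Binomial sketch).

def post_mean(a, b, heads, n):
    return (a + heads) / (a + b + n)

heads, n = 7000, 10000                  # observed 70% successes
optimist = post_mean(50, 1, heads, n)   # prior mean ~0.98
pessimist = post_mean(1, 50, heads, n)  # prior mean ~0.02
print(round(optimist, 3), round(pessimist, 3))  # both close to 0.7
```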

There are also adaptive-weighting but non-Bayes approaches, like that of O’Gorman, which uses adaptive tests based on permutations of residuals.