
The long pursuit

In a comment on our post, Using black-box machine learning predictions as inputs to a Bayesian analysis, Allan Cousins writes:

I find this combination of techniques exceedingly useful when I have a lot of data on an indicator that informs me about the outcome of interest but where I have relatively sparse data about the outcome itself. A prime example of this is estimating construction costs where you have 10x-20x as many estimates as you do actual project costs.

So far this is all about postprocessing: the idea that the black-box predictions represent a shortcut that allows us to avoid full Bayesian modeling of the data so we can focus on improving the predictions of interest.

But then, as happens sometimes in comments, the discussion took an interesting and unexpected turn, when Daniel Lakeland replied:

My experience with estimating construction costs was that it was rarely about getting the right answer, or making good decisions, and mostly about some kind of negotiation tactics combined with lawsuit protection.

Cousins responded:

You need to separate cost estimation from bid preparation. They are fundamentally different tasks.

When estimating construction costs your goal should always be to estimate as closely as possible the future costs (as they occur so that financing can be properly incorporated into the bid) for the project as specified.

And then they were off, with the discussion touching on the value of subjective human inputs, the need to calibrate such inputs, and how they fit into statistical modeling.

What I wanted to point out was how several concepts arose at different points in the discussion:

Combining information from different sources, which first arose in the context of postprocessing machine learning estimates, then later when considering how to integrate human predictions with data-based statistical models.

Calibration, which is relevant when taking predictions from a statistical or machine learning model and applying them in new cases (I recommend hierarchical modeling) and also, as Cousins wrote, for working with expert judgments. I’ve long held the view that, instead of trying to elicit subjective priors, you should elicit estimates and then construct a statistical model to get probabilistic forecasts from these estimates by calibrating them on data. In any given case, the probabilistic forecast can be used as a prior (a “subjective prior,” if you like) for the decision problem at hand. We demonstrate some of these steps in this article on decision making for radon measurement and remediation.

The different goals of statistical modeling, decision theory, and game theory, which came up all over. The need to go back and forth between methods and goals.
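
To make the calibration point above a bit more concrete, here is a minimal sketch in Python. It is not the model from the radon article, just one simple way the idea could look. It assumes a lognormal error model, log(actual) = a + b * log(estimate) + noise, fit by least squares to past projects where both the estimate and the actual cost are known. The function names (`fit_calibration`, `forecast`) and all the numbers are made up for illustration, and the interval ignores uncertainty in a, b, and sigma, which a full Bayesian or hierarchical fit would propagate.

```python
import math
import statistics


def fit_calibration(estimates, actuals):
    """Least-squares fit of log(actual) = a + b * log(estimate) + noise."""
    x = [math.log(e) for e in estimates]
    y = [math.log(c) for c in actuals]
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation on the log scale (n - 2 degrees of freedom).
    sigma = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))
    return a, b, sigma


def forecast(estimate, a, b, sigma, z=1.645):
    """Median and rough 90% interval for the actual cost, given a new estimate.

    Treats a, b, and sigma as known, so the interval is somewhat too narrow.
    """
    mu = a + b * math.log(estimate)
    return math.exp(mu - z * sigma), math.exp(mu), math.exp(mu + z * sigma)


# Hypothetical past projects: estimated vs. actual cost (made-up, $ millions).
past_estimates = [1.0, 2.5, 4.0, 7.5, 12.0, 20.0]
past_actuals = [1.2, 2.6, 4.9, 8.1, 14.5, 22.0]

a, b, sigma = fit_calibration(past_estimates, past_actuals)
low, median, high = forecast(10.0, a, b, sigma)
print(f"a={a:.3f}, b={b:.3f}, sigma={sigma:.3f}")
print(f"new estimate 10.0 -> median {median:.2f}, 90% interval ({low:.2f}, {high:.2f})")
```

The resulting distribution for the actual cost is the kind of calibrated forecast that could then serve as a prior in the decision problem at hand. With many estimators or project types, the natural next step would be to let a, b, and sigma vary by group in a hierarchical model.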

One of the fascinating things about statistics is how these and other ideas keep reappearing in different places.

P.S. Title from here; cat picture from Diana Senechal.


  1. Andrew, thanks for providing a forum where lots of interesting people come and have wide ranging discussions on the human endeavor. I’ve learned a LOT from the commenters here over the years, and I’ve had discussions with people in a huge number of fields. It’s a major example of delivering on the promise of the internet.

  2. david says:

    This is a great forum for intellectual discussion indeed.

  3. AllanC says:

    This is a timely reminder for me as construction switches focus from completing existing projects to submitting bids for next year’s builds (at least in Canada). Construction offers a treasure trove of data that can be really fun to work with (it’s also incredibly messy / full of confounders, adding to the fun). For those interested in bid spreads and the like there is a nice Canadian website that tracks public procurement where you can look up bid data and awarded contracts:

    I’ll also echo Daniel’s and David’s comments above. I honestly believe I have learned more from this blog and its commenters than I did via a formal education in engineering & applied math.

    Merry Christmas, Andrew, and to all of those who make this blog as great as it is.

  4. Curious says:

    I find it interesting that the only discussion Guzey presents of the effects of sleep deprivation on cognitive performance is the results of his own research on himself over a short period of time.

    “If you decide to experiment with your sleeping habits, you can objectively assess the impact on your cognitive abilities using a site like Quantified Mind (a), which lets you design a battery of cognitive tests (on working memory, reaction time, executive control, mental rotation, and many others) to take and compare your results over time, while also allowing to track variables such as amount of sleep at every assessment. When I slept for 6 hours a night with no naps for 5 days, I felt pretty bad, but there was no difference in my cognitive scores, compared to the baseline.”

    A quick google search produced a couple of studies worth reading:

    Sleep deprivation: Impact on cognitive performance
    Paula Alhola and Päivi Polo-Kantola.

    The sleep-deprived human brain
    Adam J. Krause, Eti Ben Simon, Bryce A. Mander, Stephanie M. Greer, Jared M. Saletin, Andrea N. Goldstein-Piekarski, and Matthew P. Walker.
