
Causal Impact from Google

Bill Harris writes:

Did you see this? Would that be something worth a joint post and discussion from you and Judea?

I then wrote: Interesting. It seems to all depend on the choice of “control time series.” That said, it could still be a useful method.

Bill replied:

The good: Bayesian approaches made very approachable for a large audience, a state-space approach made very approachable, an emphasis on thinking about the applicability of the approach, and a potentially useful method for many cases ….

The questions: model averaging, spike-and-slab priors, and (I think the article said) (the equivalent of) empirical Bayes. That and it’s limited in key ways.

The ugly: the required Boom library can be problematic on Debian.

I have no idea but I thought I’d throw it out there for all of you.
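The “control time series” idea at the heart of the method can be sketched in a few lines. This is a deliberately simplified stand-in on made-up data — plain least squares rather than the package’s Bayesian structural time-series model, and every series and number here is invented for illustration — but it shows the counterfactual logic: fit the treated series against the control in the pre-period, project that fit forward, and call the gap the impact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 pre-intervention days, 30 post-intervention days.
n_pre, n_post = 100, 30
control = 50 + np.cumsum(rng.normal(0, 1, n_pre + n_post))  # unaffected "virtual control"
treated = 10 + 0.8 * control + rng.normal(0, 1, n_pre + n_post)
treated[n_pre:] += 5.0  # true campaign lift, added only post-intervention

# Fit treated ~ control on the pre-period only (OLS stand-in for the
# package's Bayesian structural time-series model).
X = np.column_stack([np.ones(n_pre), control[:n_pre]])
beta, *_ = np.linalg.lstsq(X, treated[:n_pre], rcond=None)

# Counterfactual: what the treated series would have done with no campaign.
counterfactual = beta[0] + beta[1] * control[n_pre:]
estimated_lift = (treated[n_pre:] - counterfactual).mean()
print(round(estimated_lift, 2))  # should land near the true lift of 5.0
```

Note how everything hinges on the control: as zbicyclist says below, the control must keep reacting to extraneous effects the same way in the post-period, or the counterfactual (and hence the estimate) is off.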


  1. Here’s the original post by Kay H. Brodersen, the author of the package, on Google’s Open Source blog:

    And here’s the paper on which it’s based:

  2. Keith O'Rourke says:

    > It seems to all depend on the choice of “control time series.”

    “All you need is a second time series to act as a ‘virtual’ control, which is unaffected by your actions but which is still subject to the extraneous effects you’re worried about.”

    Yup, it would be very handy to have that!

    • Anonymous says:

      Isn’t this near-trivial when it comes to serving up web pages and ads? You can practically do real-time R“C”Ts. There may be some minor spillover effects (which you can further mitigate if you’re Google and have information on social networks).

      Seems like it’s straightforward to get orders of magnitude cleaner study design than in medical sciences/epidemiology.

    • zbicyclist says:

      Very handy, but not sufficient. Your virtual control should not just be subject to the same effects [and in the test period as well as the control period] but react to them in the same way.

      It’s really pretty simple to find an adequate control series in the preperiod, if you have enough potential choices. Whether it stays a good control in the postperiod is an entirely different issue.

      Often in testing you have a period in between the end of the matching period and the start of the test [the period used for test setup; probably less common in web ad testing]. The analysis of this period was always very interesting — nothing should be going on then, but …

  3. numeric says:

    Sounds like cointegration to me, something known in the econometric literature since the ’80s. Is the Google approach somehow different?

  4. Bill Harris says:

    It now works nicely on Debian Wheezy.

  5. Kaiser says:

    Like others, I’d say this is interesting from a methodology perspective but certainly not convincing from a practical perspective. How casually they throw around a sentence like: “(For the marketing example, you might choose web clicks from a region where the campaign didn’t run.)”! In most digital marketing examples, they should be running a randomized experiment. If they aren’t, you already smell something fishy.

    Some background: I work in this area. The reason for this development is that when Google entered the display ad business, they quickly realized that web clicks from display ads are rare (which can be interpreted as display ads being a waste of money). The industry is pivoting and saying that clicks do not accurately measure the impact of such advertising. Gone are the days when the clickstream was marketed as the singular proof that digital marketing beats traditional marketing. The irony is that these new methods have a lot in common with older methods developed to evaluate off-line advertising campaigns. As Andrew said, with these methods you could pretty much prove anything, since you pick your own baseline.

  6. Rahul says:

    What would happen if I ran this tool on two regional time series, neither of which had any campaign running (i.e., control versus control)?

    Would it still throw some spurious effect at me?

    • Anonymous says:

      Forget “spurious”: Kaiser’s point and my point above (under Keith) is that it’s stupid to compare two regions in the first place when you have the option not to. Simply run a randomized experiment.
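Rahul’s control-versus-control question is a placebo check, and it is easy to run on simulated data. Using the same simplified OLS stand-in as above (not the package’s actual model, and with invented series), fitting one untreated region against another should return an “effect” near zero; a large value would signal a spurious finding.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical placebo setup: two regions, no campaign in either.
n_pre, n_post = 100, 30
region_a = 50 + np.cumsum(rng.normal(0, 1, n_pre + n_post))
region_b = 10 + 0.8 * region_a + rng.normal(0, 1, n_pre + n_post)  # no lift anywhere

# Fit region_b ~ region_a on the pre-period, project forward.
X = np.column_stack([np.ones(n_pre), region_a[:n_pre]])
beta, *_ = np.linalg.lstsq(X, region_b[:n_pre], rcond=None)
counterfactual = beta[0] + beta[1] * region_a[n_pre:]

placebo_effect = (region_b[n_pre:] - counterfactual).mean()
print(round(placebo_effect, 2))  # should hover near zero
```

In practice the interesting failure mode is the one zbicyclist describes: two series that match well in the pre-period but drift apart afterward, which a placebo check on held-out pre-period data can partly catch.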
