How do companies use Bayesian methods?

Jason May writes:

I’m in Northwestern’s Predictive Analytics grad program. I’m working on a project providing Case Studies of how companies use certain analytic processes and want to use Bayesian Analysis as my focus.

The problem: I can find tons of work on how one might apply Bayesian Statistics to different industries but very little on how companies actually do so except as blurbs in larger pieces.

I was wondering if you might have ideas of where to look for cases of real life companies using Bayesian principles as an overall strategy.

Some examples that come to mind are pharmaceutical companies that use hierarchical pharmacokinetic/pharmacodynamic modeling, as well as people on the Stan users list who are using Bayes in various business settings. And I know that some companies do formal decision analysis which I think is typically done in a Bayesian framework. And I’ve given some short courses at companies, which implies that they’re interested in Bayesian methods, though I don’t really know if they ended up following my particular recommendations.

Perhaps readers can supply other examples?

81 thoughts on “How do companies use Bayesian methods?”

  1. I haven’t looked at the details but recently spoke to one of the authors of the following paper, which describes the use of a Bayesian approach in the decision-making process regarding the initiation (or not) of phase 3 clinical trials for potential new drugs:

    Tony Sabin, James Matcham, Sarah Bray, Andrew Copas, and Mahesh K. B. Parmar
    A Quantitative Process for Enhancing End of Phase 2 Decisions
    Stat Biopharm Res. Jan 2014; 6(1): 67–77.
    http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3967501/

  2. You might want to think about selective reporting here – as I was taught in MBA school, you only discuss approaches in public that you have not figured out how to use successfully. If you have figured it out – that’s a proprietary competitive advantage.

    Perhaps ask folks who used to work in various companies …

    (Or call the companies’ executives, say you are a student, and ask for information that they should know not to divulge but will to students. Worked for me with managers of audit-based consulting firms when I was a student – being supervised by one of the firms that wanted to know what their competitors were planning.)

    • You sometimes also see companies willing to talk about methods which they’ve obtained patents for. That’s the trade — the idea is made public, but the company gets protection for it. Whether statistical methods in themselves are, or should be, patentable, is another question, but you might at least find some interesting classes of application this way.

  3. I work for Akamai Technologies, Cambridge, MA, and we use Bayesian methods and hierarchical models (principally, these days, BUGS via JAGS) for studying network latencies, their components, and for studying Internet activity (time) series of large numbers of users. There are various research efforts underway to, for instance, improve given estimates of position of clusters of our servers using Bayesian adaptations of spatial survey leveling techniques.

    The rest of our work of this kind is done in R and in Python.

    Bayesian methods are in many respects ideal for our purposes, since we often have decisions to make with limited data, and there are costs to both underestimating and overestimating. I’ve also learned a lot which is useful for our problems by studying Bayesian applications in biology, notably population ecology, and in geophysics, e.g., hydrology.
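The point about asymmetric costs can be illustrated with a minimal sketch. It assumes a linear loss in which each unit of underestimate costs `cost_under` and each unit of overestimate costs `cost_over`; under that assumption the decision minimizing posterior expected loss is the posterior quantile at `cost_under / (cost_under + cost_over)`. The lognormal "posterior draws", the cost values, and all function names are hypothetical and are not taken from the comment above.

```python
import random


def expected_loss(estimate, draws, cost_under, cost_over):
    """Average asymmetric linear loss of `estimate` over posterior draws."""
    total = 0.0
    for x in draws:
        if estimate < x:
            total += cost_under * (x - estimate)  # we underestimated x
        else:
            total += cost_over * (estimate - x)   # we overestimated x
    return total / len(draws)


def bayes_estimate(draws, cost_under, cost_over):
    """Empirical posterior quantile at cost_under / (cost_under + cost_over)."""
    q = cost_under / (cost_under + cost_over)
    ordered = sorted(draws)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical posterior draws for a latency-like positive quantity.
rng = random.Random(0)
draws = [rng.lognormvariate(3.0, 0.5) for _ in range(5000)]

# Assumed costs: underestimating is four times as costly as overestimating.
estimate = bayes_estimate(draws, cost_under=4.0, cost_over=1.0)
median = sorted(draws)[len(draws) // 2]
```

With these assumed costs the chosen estimate sits at the 80th percentile of the draws rather than at the median, which is how the cost asymmetry enters the decision.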

  4. There’s this work

    http://www.amazon.com/Bayesian-Multiple-Target-Tracking-Library/dp/1580530249/ref=sr_1_3?ie=UTF8&qid=1413566528&sr=8-3&keywords=stone+bayesian

    I know L. Stone and T. Corwin. They built a successful company that’s been around for 3+ decades, with several spin-offs around this work. They also specialized in Bayesian Search Theory which was used in a major way for anti-submarine warfare. That same experience has been used in searches for downed planes over land and sea over the years. It’s recently gained some exposure because of that recent NYT article to which Gelman contributed.

    Note: The philosopher Dr. Deborah Mayo has recently started a disinformation campaign on her blog and twitter claiming this isn’t real Bayes. Mayo has never done any statistics and doesn’t know the first thing about the technical details of this work. I assure you priors, which either represent positive information, or assumptions that we’re willing to entertain for the sake of the analysis, are key and not incidental to this work.

    • I think this is a mischaracterization of Mayo’s position (and of Spanos’s and Wasserman’s, for that matter). The frequentist approach is based in the long run (ha-ha) on being correct in the long run. Bayesian methods can be judged as to how well they improve being correct in the long run, and all three of these researchers have been somewhat triumphalist about the movement of Bayesian practitioners towards evaluating their methods through the prism of long-term success. No current statistician (and even Andrew usually has to go back to Feller to contradict this) doesn’t believe that prior information can be helpful in a number of problems. But helpful is defined as improving long-term accuracy, and that is frequentism.

      • Numeric:

        You write, “No current statistician (and even Andrew usually has to go back to Feller to contradict this) doesn’t believe that prior information can be helpful in a number of problems.”

        No, I don’t usually have to go back to Feller to contradict this. I mentioned Feller because his case is so interesting, because he wasn’t a statistician yet expressed such strong feelings about statistical methods. I routinely use more recent examples. Our article about Feller cited a couple of much more recent anti-Bayesian quotes, and on this blog I’ve also mentioned the uninformed anti-Bayesian statement of others from time to time.

        If anti-Bayesians were to say something like, “Bayesian methods are fine, let’s just evaluate how they work,” I’d have no problem. My problem is when people are trying to actively dissuade people from using prior information because of (what I view as) misplaced concerns about subjectivity or lack of coverage or whatever. The position that you express in your comment—that the systematic use of prior information can improve long-term accuracy—is a position that has been expressed by Rubin, and it’s a view that I hold.

        • I think my main concern was with the word disinformation, which is defined as an act of deception. My experience is that you primarily see disinformation in economics and political science, where people have strong normative beliefs about how people should behave. Erroneous arguments, on the other hand, exist in all fields of inquiry. I agree it is frustrating for a statistician to read Mayo’s arguments because we want to see a model and estimators derived from that model, rather than philosophical arguments (and don’t get me started on Sleeping Beauty arguments). But model validation is the most important aspect of statistics (thank whatever deity you wish that we no longer have to wade through the asymptotic efficiencies of 2SLS, 3SLS, and maximum likelihood for SE models). And the error-statistics philosophy may very well have contributions there (as do the posterior checks Andrew argues for). All models have contradictions, but some contradictions are less important than others.

        • Re “misplaced concerns about subjectivity” in Andrew’s comment: I emphasize to students that deciding what significance level to use in frequentist statistics is a subjective decision. We can’t avoid some subjectivity in statistical inference. The real issue is to examine the subjective decisions and ask if they have good reasoning/evidence behind them. And sometimes the reasoning/evidence for two possibilities may seem to be equally good or bad.

        • As Rubin would say, an advantage of the Bayesian approach is that effort expended in specifying the prior distribution or in specifying the data model is effort spent understanding “the science,” that is, the underlying phenomena of interest and the measurement procedure; whereas effort expended in setting a p-value cutoff or some other such rule is effort spent on mathematics.

          To put it another way, I would do an applied research project and a statistical analysis in order to, say, estimate the effect of A on B, not to see whether A has a statistically significant effect on B.

        • > effort expended in setting a p-value cutoff or some other such rule is effort spent on mathematics.

          That’s what I realised in my DPhil thesis, and often all that is practically gained is that an explicit prior that might be argued about is avoided (often with just an approximate p-value or a confidence interval with almost uniform coverage). Though I would agree with Efron and others: if a convenience prior is used, Bayes rule offers no assurance at all that the posterior is in any way sensible (see my Two cheers for Bayes LtoE).

          But it must be very hard, if not impossible, for people who have not actually worked in realistic applications (perhaps Mayo) to realise the mathematical effort required. As Fisher put it, “a slight variation in the specification” (e.g. what an investigator is interested in or a slight deviation in study design) can totally change what [mathematical machinery] is required to adequately deal with the application. If it were an easy no-brainer to get p-value cutoffs and uniform confidence interval coverage in general, they would almost always be sensible secondary if not primary analyses.

      • I may have mis-characterized Spanos and Wasserman’s position. Mayo is a different story since she hasn’t done her homework.

        On Twitter she claimed the Bayesian search theory stuff was “empirical Bayes (frequentist)”. Here is a Wikipedia explanation of what Bayesian search theory often looks like (as used in the search for that downed airplane in 2009)

        (1) Formulate as many reasonable hypotheses as possible about what may have happened to the object.
        (2) For each hypothesis, construct a probability density function for the location of the object.
        (3) Construct a function giving the probability of actually finding an object in location X when searching there if it really is in location X. In an ocean search, this is usually a function of water depth — in shallow water chances of finding an object are good if the search is in the right place. In deep water chances are reduced.
        (4) Combine the above information coherently to produce an overall probability density map. (Usually this simply means multiplying the two functions together.) This gives the probability of finding the object by looking in location X, for all possible locations X. (This can be visualized as a contour map of probability.)
        (5) Construct a search path which starts at the point of highest probability and ‘scans’ over high probability areas, then intermediate probabilities, and finally low probability areas.
        (6) Revise all the probabilities continuously during the search. For example, if the hypotheses for location X imply the likely disintegration of the object and the search at location X has yielded no fragments, then the probability that the object is somewhere around there is greatly reduced (though not usually to zero) while the probabilities of its being at other locations is correspondingly increased. The revision process is done by applying Bayes’ theorem.
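Steps (1) through (6) can be sketched on a small grid of search cells. This is a minimal illustration, not the method used in any actual search: the four cells, the two hypotheses and their weights, and the per-cell detection probabilities are all made-up numbers, and the function names are hypothetical.

```python
def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def prior_map(hypothesis_weights, location_pdfs):
    """Steps 1-2: average the per-hypothesis location distributions,
    weighting each by the prior probability of its hypothesis."""
    n_cells = len(location_pdfs[0])
    mixed = [
        sum(w * pdf[i] for w, pdf in zip(hypothesis_weights, location_pdfs))
        for i in range(n_cells)
    ]
    return normalize(mixed)


def best_cell(location_probs, detect_probs):
    """Steps 3-5: P(find in cell) = P(object in cell) * P(detect | in cell);
    search the cell where that product is largest."""
    scores = [p * d for p, d in zip(location_probs, detect_probs)]
    return max(range(len(scores)), key=scores.__getitem__)


def update_after_miss(location_probs, detect_probs, searched):
    """Step 6: Bayes' theorem after an unsuccessful search of one cell."""
    posterior = list(location_probs)
    posterior[searched] *= 1.0 - detect_probs[searched]  # P(miss | in cell)
    return normalize(posterior)


# Hypothetical inputs: two hypotheses about what happened, four cells.
hypothesis_weights = [0.7, 0.3]
location_pdfs = [
    [0.6, 0.3, 0.1, 0.0],  # where the object would be under hypothesis 1
    [0.1, 0.1, 0.3, 0.5],  # where the object would be under hypothesis 2
]
detect_probs = [0.9, 0.8, 0.4, 0.2]  # e.g. lower in deeper water

probs = prior_map(hypothesis_weights, location_pdfs)
first = best_cell(probs, detect_probs)
probs_after = update_after_miss(probs, detect_probs, first)
```

After the unsuccessful search, the probability of the searched cell drops without going to zero and the other cells gain probability, as step (6) describes. The hypothesis weights here are prior probabilities, not frequencies estimated from the data.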

        Priors on the hypotheses always exist in any real problem, since there is no way each hypothesis is equally plausible and we usually possess a great deal of important (non-frequency) information about the failure modes. Does that sound like “empirical Bayes (frequentist)” to you? What it actually is, is full Bayesian discrete model averaging combined with sequential Bayes theorem updates.

        I don’t mind if Mayo doesn’t know or want to know the technical details, but passing yourself off as having expertise you don’t have and then advising people they should reject Bayesian Statistics based on that “expertise” is at least as unethical as self-plagiarism or some of the other sins complained about on this page.

        • Anon:

          I haven’t seen Mayo “advising people they should reject Bayesian statistics.” It’s my impression that she sees Bayesian statistics as oversold, to the extent that people often overlook other, important modes of inference.

        • She says or implies that constantly. Sometimes she moderates her criticisms to avoid sounding like a crank. Mayo believes on philosophical grounds alone that there are no legitimate uses of Bayesian statistics that aren’t really frequentist.

        • Ask her: (A) Does she think there are any legitimate Bayesian success stories that aren’t secretly frequentist? (B) What truly non-frequentist Bayesian methods does she think should be kept and promoted?

          She’s never admitted to any of the former (hence the absurd claim that Bayesian Search Theory is really frequentist) and never admitted that any truly Bayesian methods are alright in practice.

        • Andrew, here is Mayo responding to you the other day on Simply Statistics:

          “Every one of the criticisms of spurious p-values depends on having a notion of spuriousness, i.e., the stated or reported error probabilities differ from the actual ones. What’s the analogous criteria for outlawing a Bayesian inference to a substantive scientific theory? Where’s the Bayesian violation? My point is that you need a non-bayesian critique–e.g., having to live up to reported error probabilities–to block the examples Gelman dislikes. ”

          She doesn’t come right out and say “Bayes isn’t legitimate and should be done away with” because that sounds too much like a crank, but that’s the direct implication of what she’s saying. Bayes is only legitimate if it gets the frequencies right, in which case you don’t need Bayesian methods at all. Just use frequentist ones.

        • Anon:

          I don’t think Mayo is saying that Bayes isn’t legitimate. She’s saying that, in order to evaluate model adequacy, you have to go beyond Bayesian principles. I disagree, in that I think of posterior predictive checking as Bayesian, but I see her point: if “Bayes” is taken to be the Wikipedia version of Bayesian inference in which the goal is to compute the posterior probability of models. That was the key point in my paper with Shalizi, that Bayesian posterior distributions can be used in model checking and that there is no need to tie Bayesianism to an inductivist philosophy of inference.

          Anyway, I take Mayo’s perspective to be that Bayesian methods can be useful and non-Bayesian methods can be useful, but to evaluate and criticize statistical methods, error-statistical ideas are necessary. I consider those error-statistical methods to be essentially Bayesian, whereas she thinks of me (or, one might say, the “Bayesian Data Analysis” school) as an exception within Bayesian statistics.

          Along with all this is what I see as Mayo’s irritation with Bayes being oversold, which perhaps leads her to a reflexive anti-Bayesian attitude at times.

          But I do think she recognizes that I am doing real science when I’m using Bayesian inference in my applied work. Unlike, say, Leo Breiman who liked to pretend that applied Bayesians didn’t exist, and who would carefully look away when being exposed to any applied Bayesian work, so as not to destroy his pure vision of the world.

        • Andrew,

          Mayo really does believe Bayes is nonsense. In particular, she’s never admitted anywhere on her blog, twitter feed, or published papers, to a single useful example of Bayes in contradiction to your statement “I take Mayo’s perspective to be that Bayesian methods can be useful”. That’s why she called Bayesian Search Theory “empirical bayes (frequentist)” to deny that it was a legitimate Bayesian success. She wrote a whole post on it.

          If you don’t believe me, or her own words, ask her point blank.

          I have no problem with her believing Bayes is nonsense, but when she uses her position of “authority” to go around spreading disinformation about real Bayesian successes when she clearly doesn’t know the first thing about it, then that ain’t right.

        • And Andrew, while we’re at it. Mayo constantly insinuates you’re really a frequentist.

          For example, suppose I had an assumption H about fixed parameter m, and looked at the consequences of that assumption using Bayes theorem. So I start with a prior P(m|H) and get a posterior P(m|H,D).

          Now to see if that assumption H makes sense, I find the 99% Bayesian Credibility interval from P(m|H,D) and find that the true value of m is way outside this interval. Based on that I informally dismiss H and consider other working hypothesis.

          Mayo calls this “frequentist”. Even though none of the distributions are frequency distributions and there’s no attempt to calibrate P(m|H,D) to match some non-existent frequency distribution for m.
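The check described above (prior P(m|H), posterior P(m|H,D), 99% credible interval, then comparing against the true m) can be sketched with a conjugate normal model. This is only an illustration under stated assumptions: H is taken to be a tight normal prior on m plus normal measurement noise with known standard deviation, and the data values and the "true" m are invented for the example.

```python
from statistics import NormalDist, mean


def posterior_normal(prior_mean, prior_sd, data, noise_sd):
    """Posterior for a fixed mean m: normal prior, normal noise, known noise_sd."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = len(data) / noise_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * mean(data)
    )
    return NormalDist(post_mean, post_var ** 0.5)


def credible_interval(dist, level=0.99):
    """Central credible interval of the given posterior distribution."""
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical assumption H: m is very close to 0 (tight prior).
posterior = posterior_normal(
    prior_mean=0.0, prior_sd=0.1, data=[2.6, 3.4, 3.1, 2.9], noise_sd=1.0
)
low, high = credible_interval(posterior, level=0.99)

true_m = 3.0  # assumed to become known later, for the sake of the example
h_looks_untenable = not (low <= true_m <= high)
```

Every distribution in the sketch describes uncertainty about the single fixed value m; nothing in it is calibrated against a frequency distribution for m.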

          That’s the basis on which she’s claiming you’re secretly frequentist. As far as I can tell, she just doesn’t know enough about the details of any of this to see the difference.

          Again, I have no problem with her not knowing the details. The problem I have is she passes herself off as having that expertise, which she clearly doesn’t have and then tries to use the “fact” that you’re really a frequentist to talk people out of doing Bayesian work.

        • Anon:

          Again, I’ll go with the Rubin attitude that Bayesian is a way of constructing statistical procedures and frequentist is a way of evaluating statistical procedures. Bayes is pleasantly coherent (conditional on the model) which gives us real firepower in constructing methods: just set up some assumptions and watch the model go. Frequentist is pleasantly incoherent in the sense that there is no single criterion for evaluating a procedure. (Various econometrics-educated people sometimes naively follow a bias-trumps-all attitude, but, as we’ve discussed various times on this blog, that attitude makes no sense, and the more sophisticated econometricians and statisticians realize this.) I don’t mind someone thinking I’m really a frequentist as long as they accept that I’m really a Bayesian as well.

          And I’m not a fan of putting words in people’s mouths. You write, “Mayo really does believe Bayes is nonsense,” but I’ve never heard her say this or read anywhere that she’s written this.

          I’m not averse to arguing with people—for example, I still hold that David Cox, for all his brilliance, misspoke when he made his statement about situations where a statistician “should absolutely not use prior information”—but I’m not going to be put in the position of disagreeing with Mayo on something that she’s never said.

        • “And I’m not a fan of putting words in people’s mouths. You write, “Mayo really does believe Bayes is nonsense,” but I’ve never heard her say this or read anywhere that she’s written this.”

          If someone constantly writes to the effect that “the only legitimate inferences are ones that have properties XYZ, and Bayesian methods don’t have XYZ”, then that means they reject Bayesian methods. This doesn’t require biblical scholarship to divine the real meaning.

        • Andrew, let’s keep it basic. Mayo often says Bayes theorem should only be used if the prior is a true frequency distribution. So a prior on a fixed parameter is not legitimate.

          If she doesn’t think that’s legitimate, then what parts of Bayes is she ok with exactly?

        • Frankly, I’m having trouble believing this is even being questioned. Mayo calls herself an “Error Statistician” because she believes her “Error Statistics” viewpoint is right. The Error Statistics viewpoint is contrary to any part of Bayesian statistics that can’t be given a frequentist foundation.

          She wants others to adopt this viewpoint. That’s why she writes papers and tries to convince people. She wants people to reject Bayes and adopt her viewpoint. She’s not in the eclectic sometimes frequentist/sometimes Bayesian camp. That’s why she works so hard — to the point of spreading outright disinformation about Bayesian methods — to make any Bayesian successes “frequentist”.

          It’s the height of silliness to think she doesn’t believe her own stated philosophy.

        • Here’s the evidence you asked for:

          https://twitter.com/learnfromerror/status/528265000897368065

          Mayo admits the only legitimate Bayesian methods are the ones that are actually Frequentist. So for example that Stan project you did on that guy’s height or the Bayesian search theory used to find downed planes should be abandoned.

          Moreover Mayo has been going around telling deliberate lies about any Bayesian success, such as your work or the Bayesian Search Theory is actually frequentist when even the slightest familiarity with Bayesian Search Theory indicates it can’t possibly be Frequentist.

          If a guy who had never done any physics and only knew high school algebra claimed relativity and Quantum Mechanics were bogus and that the atom bomb and microchip only required Newtonian mechanics, then physicists would immediately dismiss him as a quack.

          But that’s exactly the situation Mayo is in. In the statistics community she is lauded as a major contributor and given a six figure salary for life. The way you all rallied around that obvious fraud is disgusting.

        • Anon:

          I followed the link and I disagree with your statement that “Mayo admits the only legitimate Bayesian methods are the ones that are actually Frequentist.” What Mayo was doing was agreeing with a blog post that had expressed annoyance about Bayes being hyped.

          Later in the thread, Mayo says that she agrees that Bayes “is the only way to go” when “priors are frequentist, or it’s just a technical trick.” But I don’t think that she is denying that Bayes can be useful in other settings (such as my own applied research). She is saying that in settings such as mine, Bayes is not “the only way to go”—and that’s fine with me. I agree that Bayes is not the only way to go in the problems I work on. All I claim is that Bayes is an effective framework that works well for me and for many others.

          To put it another way, I don’t see Mayo as being in opposition to the use of Bayes in applied work, even in settings where the prior distribution corresponds to no physical sampling distribution. Rather, she’s expressing opposition to an attitude that Bayes is always best.

          Don’t get me wrong—there are things that Mayo has written that I disagree with—but I think that here you are misrepresenting her views.

        • Let’s just sum up the facts:

          (1) Nowhere in any of her writings does she admit any truly Bayesian methods are legitimate.

          (2) She’s a philosopher who’s been preaching an anti-Bayesian philosophy for 3+ decades.

          (3) When asked directly if she thinks Bayes is ever the way to go she responds: when it’s frequentist.

          (4) She consistently tells bald-faced lies about Bayesian methods such as claiming Bayesian Search Theory is really “empirical Bayes (frequentist)” when in fact it has absolutely nothing to do with “empirical Bayes” or “frequentism”.

          You take 1-4 and come to the conclusion she’s really ok with truly Bayesian methods after all. Well that conclusion is so clueless it makes me question everything you’ve ever written.

        • I just hope you don’t question my .234 theorem. I’m really proud of that one!

          More seriously, my experience with Mayo is that she’s a very nice person and that actually counts a lot with me. Beyond this, I like that she’s trying to interpret statistics from a modern Popperian perspective. Yes, she’s coming at it from a different perspective than I am, and she’s said things that I disagree with, but I’m happy calling myself a practitioner of Bayesian error statistics, as long as I expand the definition of “error statistics” enough to include what I do. Which is fair enough, as I’d also like to consider myself a Popperian and a Lakatosian even if I don’t agree with everything they write. In doing research, especially in philosophy, we don’t have the luxury of only working with people who agree with us on everything or even on most things. I’d rather take what is useful for me. And I expect that Mayo feels the same about my work, that she doesn’t agree with everything that I write but she respects my perspective and wants to learn from it.

        • If someone believes fundamentally that the only legitimate use of Bayes theorem is when the prior can be given a frequency interpretation, that doesn’t mean they think Bayes is oversold. It means they reject Bayesian Statistics.

        • Incidentally, even the “likelihoods” in this kind of work aren’t “frequentist”. While a frequentist can imagine repeating something like say a hijacking a hundred thousand times to discover the sampling distribution, this just shows they have vivid imaginations. It doesn’t mean that the “likelihoods” used have any connection to those fictitious “frequencies”.

          In reality, the “likelihoods” represent reasonable ranges for the position of the missing item. They’re uncertainty distributions, in other words, no different than a Bayesian prior representing a reasonable range or uncertainty of a fixed parameter.

          Bayesian search theory couldn’t be any more Bayesian. I’ll just skip over the ridiculousness of equating empirical Bayes with frequentism.

        • Anonymous:

          At least Deborah Mayo publishes her views under her own name. Disappointing that you don’t have the guts to use your real name when you accuse her of…whatever it is you’re accusing her of (I still can’t quite tell). And at least she doesn’t try to put words in other people’s mouths, which you’ve done with both her and Andrew, just in this thread.

          As for the ridiculousness of equating empirical Bayes with frequentism, well, there are quite strong linkages, so I’m afraid you’ll have to spell out to me (and to Brad Efron) why it’s “ridiculous”:

          http://www.ams.org/journals/bull/2013-50-01/S0273-0979-2012-01374-5/

          I note that you’ve at least partially misunderstood Mayo’s views as well. She views what she calls “error statistics” not just as a way to control long-run error rates, but also as a way to probe for errors in particular cases. You may want to consider whether your disagreements with her in part reflect your own misunderstandings of her views.

          I share the sense of other commenters that Deborah Mayo and Andrew have sometimes talked past one another to some extent, and that some of their apparent differences may be differences in emphasis, or even semantic. But I’ve also found their conversations to be informative and productive, and I’m sure Andrew would say so if he thought otherwise. It seems like you’re trying to get other people to ignore her or dismiss her out of hand; if so, I think you’re derailing a productive conversation rather than promoting one.

        • I’ll tell you exactly what I’m accusing Mayo of.

          Mayo has never done any statistics. Mayo knows almost no math. Nowhere in any of her writings is there the slightest hint that her competence at math allows her to solve anything more than very simple IID Normal type introductory problems. She seems to need help even on those.

          All that is fine. However her bias against Bayes, which she believes is rank nonsense on philosophical grounds and should be abandoned, is so strong she will go to any lengths to deny any truly Bayesian successes.

          In particular, she wrote a blog post denying Bayesian Search Theory was Bayesian. Corey had a conversation immediately in comments pointing out it was Bayesian. After seemingly accepting what Corey said she gets on twitter and tries to talk someone out of their evil Bayesian ways by calling Bayesian Search Theory quote “empirical Bayes (frequentist)”!

          Which confirms she’s unethical and ignorant. Does that spell it out clearly enough?

          I, like everyone else who confronts Mayo, should be anonymous. Since Mayo can’t counter any of this with facts she will throw around accusations of misogyny and women hating. I love and respect every woman I know and have promoted the careers of every female I ran across in academia, but I can’t stand the cancerous rot at the heart of modern academia of which Mayo is a prime example.

        • Well, I went the other day to borrow Mayo’s book from the library and it appears it has been stolen.

          I guess that counts for something. People stealing your work is a form of compliment (though not to be recommended).

        • Sorry, we’ll have to agree to disagree on every single thing you say. Nothing you say indicates anything more than a garden variety disagreement with her views, along with failure to understand those views and her reasons for expressing them as she does. I have no idea why you’re so upset with her, and why the fact that others who’ve interacted with her a great deal aren’t at all upset with her doesn’t give you the least pause. And you didn’t engage with my comment at all, you’re just repeating assertions you made elsewhere in the thread (so I can only assume you think Brad Efron is ignorant too, and frankly if you deny that you think that then I think you’re being inconsistent). Having gone on record, I don’t see any point in any further engagement with you.

        • @Jeremy (this is a different anon)

          “I have no idea why you’re so upset with her” – I’m going to go out on a limb and guess it maybe has to do with Mayo calling possibly-the-above-anon a sexist Bayesian after he harshly criticized her work.

          Efron is a deep thinker and a gifted communicator, but not beyond criticism. See for example, Xian’s very-reasonable critiques of Efron: https://xianblog.wordpress.com/2013/06/20/bayes-theorem-in-the-21st-century-really/

        • Anon:

          Thanks for the link. I’d not read that post by X. It seems very reasonable to me. I agree with X that Efron’s argument is a bit incoherent, as he seems to spend about half the time saying that noninformative priors are all that can be done because no good prior information is available, and about half the time criticizing Bayesians for using noninformative priors. Informative priors can be difficult but the alternative classical methods can be even worse when data are weak. Consider the work of Satoshi Kanazawa, for example.

        • @the other anonymous,

          Absolutely, no one’s beyond criticism. But some people–say, knowledgeable experts–are beyond being ridiculous when they speak within their area of expertise. I invoked Efron’s comments not to establish that his position is correct, but merely to establish that the position is non-ridiculous. It might not be correct, but holding it is not evidence that you’re incompetent, as the first anonymous claims. Sorry if that wasn’t clear.

        • If some philosophers sneered and scoffed at statisticians for knowing little philosophy or misunderstanding basic philosophical issues, I suspect that more statistical people (who are well represented here) would reject that as rude, not constructive and unnecessarily personal. You don’t need a philosopher to underline that aiming at the person, not the argument, is irrelevant as well as fallacious. If Deborah Mayo or anybody else is mistaken on statistics, those mistakes are the nub of the matter.

          I go most of the way with Jeremy Fox in preferring debaters with strong opinions to name themselves, as I’ve seen too many forums in which anonymous or cryptic identifiers were shields for nasty remarks, and worse. But it’s Andrew’s blog and he evidently allows anonymous posters, so there we go.

          Nevertheless I remain puzzled that “Anonymous” (the more strident one in this thread) appears to argue that remaining anonymous is a necessary protection against accusations of misogyny or sexism. One, anybody can make sexist comments, even about members of the same or similar gender. Two, “Anonymous” has now made comments clearly implying being male, so that mask is well and truly off (or “Anonymous” is being disingenuous). Three, anyone opposed to Virginia Mayo’s stance should be delighted if she slipped into that kind of defence.

          For what it’s worth, it seems to me that work like Mayo’s that contributes to trying to put statistical inference on a firmer philosophical basis is bold and brave and sorely and surely needed.

  5. I work at Anchor QEA, an environmental services firm headquartered in Seattle focusing on water resources and fisheries. I use Bayesian models primarily for mark-recapture analyses. I work mostly in the PNW and Columbia River basin. One recent development in the field is a very cool but messy data set of millions of tagged fish that can potentially be detected at hundreds of detection sites throughout the region: PTAGIS (www.ptagis.org). I’m still using JAGS, mostly because I haven’t invested the time to learn Stan, but also because I heard a rumor that Stan doesn’t yet support the hidden data multi-state models.

    I have been pushing for broader adoption of Bayesian models. The ecologists don’t seem to have a problem with it, but many of our remedial investigation and clean-up projects tend to get litigious. In fact, I’m planning on presenting on applications of Bayesian estimation at an upcoming meeting. I’ve been asked to address these two questions; maybe y’all have some thoughts on good answers:

    1. “Sometimes it’s a lot more work (or at least different – i.e., less efficient for staff – work) to generate the Bayesian result, as compared to the numerically identical frequentist result. Note that I understand that the language surrounding the numerical answer differs between the frequentist and Bayesian approach, but that textual distinction was not a compelling reason to do the extra work, because “in the real world” it is often a distinction without a difference.” This person is referring specifically to the instance of not having any prior information.

    2. “In adversarial projects (like many of ours are), I think there are significant limitations on the use of prior information. What information to use, and what weight to give it, would be prime opportunities for criticism (and charges of client bias) by others. I can see how Bayes works well for searching for a missing fisherman, but Bayes would probably only work well in our more controversial projects if the conclusions were largely independent of assumptions regarding the use of prior information. Granted, those charges can probably be levied anyways (did you use the old data? If not, why not? And vice versa – as compared to Bayesian: what weight did you give the old data and why?). But it might be easier, politically, to justify omitting older data (in a frequentist approach) than to justify assigning it a non-zero weight (in a Bayesian prior analysis).”

    • 1. Often it’s easier, because often a (penalized) MLE doesn’t exist. For instance, K-means clustering, to take a machine-learningy example, or hierarchical modeling, to take one near and dear to statisticians’ hearts. For frequentist inference, you need marginal maximum likelihood approaches, which in turn involve a lot of work to formulate.

      The frequentist MLE is only identical to the Bayesian posterior mean under very strong assumptions. And with point estimates, you get the issue of bias versus variance tradeoff. For example, sum of square differences from mean over N is the MLE for variance, but you need to divide by (N-1) to get an unbiased estimate. If you’re going to report uncertainty, you need to choose.

      With point estimates, the Bayesian posterior mean (minimizes expected square error) isn’t guaranteed to be the same as the posterior median (minimizes expected absolute error) or the posterior mode (the MAP estimate).
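To make the variance point above concrete, here is a minimal simulation sketch. The sample size, true standard deviation, and number of replications are arbitrary choices for illustration, not values from the thread. It averages the divide-by-N (maximum likelihood) and divide-by-(N-1) estimates over many small samples.

```python
import random


def variance_mle(xs):
    """Maximum-likelihood variance estimate: squared deviations divided by N."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_unbiased(xs):
    """Unbiased variance estimate: squared deviations divided by N - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def average_estimates(n=5, true_sd=2.0, reps=20000, seed=0):
    """Average both estimators over many simulated normal samples of size n."""
    rng = random.Random(seed)
    total_mle = 0.0
    total_unbiased = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, true_sd) for _ in range(n)]
        total_mle += variance_mle(xs)
        total_unbiased += variance_unbiased(xs)
    return total_mle / reps, total_unbiased / reps


if __name__ == "__main__":
    mle_avg, unbiased_avg = average_estimates()
    # True variance is 4.0; the MLE should average near 4.0 * (n - 1) / n = 3.2.
    print(f"average MLE estimate:      {mle_avg:.3f}")
    print(f"average unbiased estimate: {unbiased_avg:.3f}")
```

With N = 5 the MLE is low by a factor of (N-1)/N on average, which is the bias half of the tradeoff; which estimator to report is still a choice.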

      2. Use hierarchical models and put everything together into a coherent meta-analysis. If it makes the lawyers feel any better, you can tell them the doctors do it :-) If your court’s in the U.S. and you think anyone’s going to know baseball, ask them whether they’d accept a frequentist estimate that a new rookie baseball player is a 0.600 ability batter after watching him get 6 hits in his first 10 at bats (trot out the Efron and Morris paper on why you don’t want an MLE in this situation, even if you’re a frequentist, or go back to James and Stein). You’d probably want to apply some “common sense” prior that out of thousands of samples, nobody has batted over 0.400 in 50 years and the spread of batting averages is more like 0.175 to 0.340. If not, ask them if their estimate of a team’s win-loss record at the end of the season is a perfect season after a team wins its first three games.

      The usual proof for (2) is predictive accuracy. But if you don’t have a chance to do that and can’t even argue through something like cross-validation on a data set, then it really will just turn into a battle of rhetoric. For that, you need to ask a lawyer.
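The rookie-batter argument above can be sketched with a conjugate beta-binomial model. This is only an illustration under loud assumptions: the "common sense" range of 0.175 to 0.340 is treated as roughly the prior mean plus or minus two standard deviations, and a Beta prior is moment-matched to it. The helper names are hypothetical, and this is not the Efron and Morris analysis.

```python
def beta_from_mean_sd(mean, sd):
    """Moment-match a Beta(a, b) distribution to a given mean and standard deviation."""
    var = sd ** 2
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("sd is too large for a Beta distribution with this mean")
    total = mean * (1.0 - mean) / var - 1.0  # this is a + b
    return mean * total, (1.0 - mean) * total


def posterior_mean(hits, at_bats, a, b):
    """Posterior mean of batting ability under a Beta(a, b) prior and binomial data."""
    return (a + hits) / (a + b + at_bats)


# Assumption: read the 0.175 to 0.340 spread as mean +/- 2 standard deviations.
LOW, HIGH = 0.175, 0.340
prior_mean = (LOW + HIGH) / 2.0
prior_sd = (HIGH - LOW) / 4.0
a, b = beta_from_mean_sd(prior_mean, prior_sd)

hits, at_bats = 6, 10
mle = hits / at_bats
shrunk = posterior_mean(hits, at_bats, a, b)

if __name__ == "__main__":
    print(f"prior: Beta({a:.1f}, {b:.1f})")
    print(f"MLE after 6 for 10:            {mle:.3f}")
    print(f"posterior mean after 6 for 10: {shrunk:.3f}")
```

Under these assumptions the estimate moves only modestly above the prior mean rather than jumping to 0.600, which is the behavior the baseball example is meant to make intuitive.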

  6. On a similar note, I’ve read people (bloggers mostly) say that Bayesian spam filtering is not truly Bayesian. Is there truth to this? I never really understood that critique.

    • What’s usually referred to as a “Naive Bayes classifier” makes use of Bayes rule but is fit using maximum likelihood. It’s quite common to add conjugate priors and fit the model using MAP — I’ve seen this called “Bayesian Naive Bayes” — but I’ve not seen anyone bother to compute a fully Bayesian posterior predictive distribution, not that it would be hard. Perhaps because the model is such a simplification to begin, it doesn’t seem worthwhile.
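As a rough illustration of the distinction drawn above, here is a toy multinomial Naive Bayes sketch with a single smoothing parameter `alpha`. The four training messages are made up. With `alpha = 0` the word probabilities are plain maximum-likelihood estimates; with `alpha > 0` they equal the MAP estimate under a symmetric Dirichlet prior with concentration `alpha + 1` (equivalently, the posterior mean under concentration `alpha`). Neither version computes a full posterior predictive over classes.

```python
import math
from collections import Counter


def fit(docs, alpha):
    """Fit class priors and per-class word probabilities with additive smoothing."""
    vocab = sorted({w for _, words in docs for w in words})
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, words in docs:
        word_counts[label].update(words)
    model = {}
    for label, n_docs in class_counts.items():
        total = sum(word_counts[label].values())
        probs = {
            w: (word_counts[label][w] + alpha) / (total + alpha * len(vocab))
            for w in vocab
        }
        model[label] = (n_docs / len(docs), probs)
    return model


def log_score(model, label, words):
    """Log of class prior times word likelihoods; words outside the vocabulary are skipped."""
    prior, probs = model[label]
    score = math.log(prior)
    for w in words:
        if w not in probs:
            continue
        if probs[w] == 0.0:
            return float("-inf")  # a zero count wipes out the whole class
        score += math.log(probs[w])
    return score


def predict(model, words):
    """Return the label with the highest log score."""
    return max(model, key=lambda label: log_score(model, label, words))


# Made-up training data for illustration only.
DOCS = [
    ("spam", ["win", "cash", "now"]),
    ("spam", ["cheap", "cash", "offer"]),
    ("ham", ["meeting", "now"]),
    ("ham", ["lunch", "offer", "today"]),
]
MESSAGE = ["cash", "offer", "today"]

mle_model = fit(DOCS, alpha=0.0)
smoothed_model = fit(DOCS, alpha=1.0)

if __name__ == "__main__":
    for name, model in [("MLE", mle_model), ("smoothed", smoothed_model)]:
        scores = {label: log_score(model, label, MESSAGE) for label in model}
        print(name, scores, "->", predict(model, MESSAGE))
```

In this toy data the unsmoothed fit gives every class a score of negative infinity for the test message, because each class has a zero count for one of its words; the smoothed fit avoids that.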

    • One problem is, I think, that naming conventions often confuse interpretations of probability with methodology for inference. People can have a frequentist (or similar, such as propensity) interpretation of a probability model (at least in some setups; I think that this characterizes Andrew’s approach well in some applications) but use Bayesian methods.

      Some people then may say that this is “really frequentist” and some others may think that therefore it is somehow not “truly” Bayesian, or at least that the former group of people has claimed that it isn’t “really” Bayes. As far as I’m concerned, it’s just a confusion to think that what is Bayesian can never be frequentist and the other way round.

  7. Bayesian methods have been used in both early-stage drug discovery and late-stage drug development. In the early stage, they are among the important machine learning tools for virtual screening (http://rd.springer.com/protocol/10.1007%2F978-1-60761-839-3_7) as well as for feature selection in biomarker research. In late-stage development, the following link shows how Bayesian methods can be used to establish a multivariate predictive distribution for design space reliability in drug manufacturing processes (http://www.pharmamanufacturing.com/articles/2010/097/). In marketing, I know Google recently open sourced its Bayesian time-series model, which can measure the ROI of advertising campaigns (http://google.github.io/CausalImpact/CausalImpact.html).

  8. Look at this company’s product line and list of customers. http://www.smartcorp.com/

    Describing their forecasting tools they say: “Interactive Managerial Adjustments let you refine forecast results on-screen, based on your domain knowledge and business judgment.”

    Sounds like Bayesian reasoning to me—but dressed up in words that will appeal to the corporate world.

    See also a recent patent application that is assigned to this firm: http://www.google.com/patents/US20130166350

    It has 17 instances of Bayes in the claims.

    You could use the website freepatentsonline.com to search for patents with words like “bayes” or “bayesian” in them.

  9. Applied conjoint models in marketing have gone almost 100% HB. Sawtooth is a vendor with a large footprint in that industry. There is a bunch of well documented literature on their website.

  10. I’m surprised no-one’s yet mentioned HP Autonomy (website Wikipedia page) given the blue neon sign outside their offices in Cambridge. Perhaps this blog has fewer UK-based readers than I’d previously imagined (see what I did there…).

    Autonomy’s founder Mike Lynch became a Fellow of the Royal Society earlier this year. His fellowship citation reads, in part (with some added punctuation):

    “Founder of Autonomy, the UK’s largest software company, which was sold to HP for $11Bn in 2011. Creator of the Bayesian framework and platform at the heart of Autonomy’s products…”

  11. In the past, I employed Bayesian methods to adjust sample sizes of subject populations (given stopping rules and when an endpoint was likely to have been made).

    In the financial risk sector, I employed them in evaluating capital requirements wrt operational risk.

    In the tech sector, I’ve used them to explore support interactions when cloud instances go wonky, or are about to do so. I’ve also used them in evaluating the sources whose dynamics can contribute to failure.

    I’ve used them in the Physics of non-linear dynamical systems (which may end up becoming part of my thesis – I’ll never tell…..never)

    I’m now using them in a geospatial application I’m building, and find myself edging toward Stan for help (and, surprisingly, performance).

    There’s a whole wide world out there……

  12. Most applications of “Naive” Bayes for classification (e.g., spam filtering) are not Bayesian. They are probabilistic models that are fit via maximum likelihood (sometimes with a penalty or Laplace correction). Because they are generative models, one must perform probabilistic inference to predict the class variable from the observables.

    In engineering, some folks confuse probabilistic modeling (including latent variables) with Bayesian statistical inference. Probabilistic graphical models are often called “Bayesian networks”. But I take “Bayesian” to mean that we represent all of our uncertainty probabilistically, including our uncertainty over the values of the model parameters and then we compute predictions by conditioning on data and then marginalizing out this uncertainty (at least approximately).
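One way to see the difference between plugging in a point estimate and marginalizing out parameter uncertainty is a beta-binomial sketch. Everything concrete here is an assumption for illustration: 3 successes in 10 observed trials, a uniform Beta(1, 1) prior, and a prediction for the number of successes in 10 future trials.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def plug_in_pmf(j, m, p):
    """Binomial probability of j successes in m trials at a fixed point estimate p."""
    return math.comb(m, j) * p ** j * (1.0 - p) ** (m - j)


def posterior_predictive_pmf(j, m, a_post, b_post):
    """Beta-binomial probability: the binomial averaged over a Beta(a_post, b_post) posterior."""
    log_p = log_beta(j + a_post, m - j + b_post) - log_beta(a_post, b_post)
    return math.comb(m, j) * math.exp(log_p)


def mean_and_variance(pmf_values):
    """Mean and variance of a distribution over the counts 0, 1, ..., len - 1."""
    mean = sum(j * p for j, p in enumerate(pmf_values))
    var = sum((j - mean) ** 2 * p for j, p in enumerate(pmf_values))
    return mean, var


# Assumed data and prior, for illustration only.
successes, trials = 3, 10
prior_a, prior_b = 1.0, 1.0
future_trials = 10

p_hat = successes / trials
a_post = prior_a + successes
b_post = prior_b + trials - successes

plug_in = [plug_in_pmf(j, future_trials, p_hat) for j in range(future_trials + 1)]
predictive = [
    posterior_predictive_pmf(j, future_trials, a_post, b_post)
    for j in range(future_trials + 1)
]

if __name__ == "__main__":
    print("plug-in mean, variance:    ", mean_and_variance(plug_in))
    print("predictive mean, variance: ", mean_and_variance(predictive))
```

The predictive distribution is wider than the plug-in one because it carries the remaining uncertainty about the success probability, which is the "marginalizing out" step described above.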

    • Ah, the idiot Bayes classifier. That’s a fun topic :).

      As far as Bayesian networks, I’ve actually never seen a graphical model rendered as something that is misconstrued as applied Bayes. Now, I have in the past taken a BBN approach to estimation of certain risks, when there are no data at that particular point in time (until it is collected and overwhelms||corrects any subjective priors). And I’ve presented these graphically (in fact wrote a GUI for its use in a process of emerging risk guidance in the absence of data). Would this be included in what you are referring to or are you speaking of some other network’ish graphs wrought from some other idea?

    • I don’t understand your distinction. Almost every Bayesian model I’ve seen used in practice can be represented as a graphical model. Uncertainty over model parameters can be represented in a graphical model representation by adding in nodes representing the hyperprior parameters. How is that confused?

    • I think us engineers do understand that, say, by plugging in MAP estimates we’re not doing true Bayesian inference but an approximation to it. We’d still refer to it as a “Bayesian model” (i.e. hierarchical / with priors) even if we’re not doing “fully Bayesian” inference on it, and I think that’s OK.

      If you restrict the term Bayesian to the strictest sense of the word, very few applied methods would qualify as Bayesian. The way I look at it is that at some level of a hierarchical model you have to stop trying to represent all your uncertainty, since every model is a straw man which you think is false with some positive probability. At that point you’re then left with frequentist model comparison and model checking methods. That doesn’t mean that the Bayesian statistics don’t deserve any credit for the method!

      • I think there’s a broad range of understanding in both industry and academia.

        The truly confusing issue is that you can always translate “Bayesian” estimates using posterior modes (aka “MAP”) into frequentist terms by removing the probability notation and calling the prior a “penalty.” Then you’re doing penalized maximum likelihood and you get the same answer. You can even do “empirical Bayes” (scare quotes because it’s no more empirical than full Bayes) through marginal maximum likelihood, as in the lme4 package in R (which maximizes the hierarchical parameters in the marginal posterior with the low-level parameters integrated out).

        The technical distinction is that the frequentists prohibit distributions over parameters. As soon as you start talking about priors as such, you’ve left the frequentist camp philosophically. I agree with Tom Dietrich above that you don’t get to the full Bayesian camp until you reason by integrating out your posterior uncertainty.
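The translation between a MAP estimate and penalized maximum likelihood can be checked numerically in a one-parameter case. This is a minimal sketch with made-up data and assumed noise and prior scales: for y ~ Normal(w * x, sigma^2) with prior w ~ Normal(0, tau^2), the posterior mode should coincide with the least-squares fit penalized by lambda * w^2, where lambda = sigma^2 / tau^2.

```python
def neg_log_posterior(w, xs, ys, sigma, tau):
    """Negative log posterior (up to a constant) for y ~ N(w*x, sigma^2), w ~ N(0, tau^2)."""
    neg_log_lik = sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2.0 * sigma ** 2)
    neg_log_prior = w ** 2 / (2.0 * tau ** 2)
    return neg_log_lik + neg_log_prior


def penalized_least_squares(xs, ys, lam):
    """Closed-form minimizer of sum (y - w*x)^2 + lam * w^2 (lam = 0 gives plain least squares)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def minimize_1d(f, lo, hi, iters=200):
    """Ternary search for the minimum of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


# Made-up data and scales, for illustration only.
XS = [0.5, 1.0, 1.5, 2.0, 2.5]
YS = [1.1, 1.9, 3.2, 3.9, 5.1]
SIGMA, TAU = 1.0, 0.5

w_map = minimize_1d(lambda w: neg_log_posterior(w, XS, YS, SIGMA, TAU), -10.0, 10.0)
w_penalized = penalized_least_squares(XS, YS, lam=SIGMA ** 2 / TAU ** 2)
w_unpenalized = penalized_least_squares(XS, YS, lam=0.0)

if __name__ == "__main__":
    print(f"MAP estimate (numerical):      {w_map:.6f}")
    print(f"penalized least squares:       {w_penalized:.6f}")
    print(f"unpenalized maximum likelihood: {w_unpenalized:.6f}")
```

The two penalized answers agree to numerical precision. Only the vocabulary differs, which is the semantic point being made; the code says nothing about whether posterior uncertainty is then integrated out.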

        • > the frequentists prohibit distributions over parameters

          Agree, especially they will choose to do considerably more work even when the frequentist performance is actually not better or even worse…

        • This is a semantic quagmire.

          At a broad level we are all Bayesians. After all, every model is a model because it imposes (structural / parametric) restrictions, presumably on the basis of prior knowledge. And most likelihood inference can be seen as a posterior with an implicit prior.

          At a practical level, as Bob argues, many people use Bayes as a regularization “kludge”, and any other kludge will do. Even so, the latter may be reconstrued as having an implicit prior and so as Bayesian kludge. So again we are all Bayesians (or not).

          At a formal level one might say that one uses Bayes if the code explicitly includes priors, posteriors, etc., but even here, if the prior is simply specified as the implicit default in likelihood inference, then you are just being more transparent.

          To me it is all a matter of degree. The more informative the prior, and the more explicitly Bayesian the inference, the more I’d say you are using Bayesian methods.

  13. Bayesian-influenced machine learning models are used quite a lot in NLP, which in turn is used a fair bit in parts of the tech industry for data mining, information retrieval, text classification, customer profiling, stochastic tokenisation, part-of-speech tagging, language modelling etc etc.

    Definitely the empirical Bayes flavour — seen as one tool amongst many others (the others are often ad-hoc non-probabilistic models but can be hard-to-beat). Evaluated empirically, e.g. predictive performance on held-out data, or via end-to-end performance on some larger task which the fitted model is plugged into.

    When applying Bayesian methods to lots of data, there seems to be a preference (born out of necessity?) for fast-but-asymptotically-approximate inference algorithms (MAP via stochastic gradient descent, VB, EM, EP, etc.) over the MCMC methods more popular with academic Bayesians. Perhaps because the latter are more likely to be working on smaller-scale scientific datasets? Perhaps because faithfulness to a clearly-interpreted generative model is more important when doing science than when applying Bayesian methods as engineering tools?

    • In my experience, they’re not usually empirical Bayes, but rather simple MAP estimates. You very often see priors set by optimizing some kind of cross-validation or versus a held out “development” set as it’s called in NLP. Hal Daume III spent a fair bit of time on this issue in his talk here last week.

      The Wikipedia

      http://en.wikipedia.org/wiki/Empirical_Bayes_method

      has a nice description:

      Empirical Bayes methods are procedures for statistical inference in which the prior distribution is estimated from the data. This approach stands in contrast to standard Bayesian methods, for which the prior distribution is fixed before any data are observed. Despite this difference in perspective, empirical Bayes may be viewed as an approximation to a fully Bayesian treatment of a hierarchical model wherein the parameters at the highest level of the hierarchy are set to their most likely values, instead of being integrated out. Empirical Bayes, also known as maximum marginal likelihood,[1] represents one approach for setting hyperparameters.
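The quoted description can be sketched for the simplest hierarchical case. The assumptions are mine, not the thread's: observations y_i ~ Normal(theta_i, sigma^2) with known sigma, and theta_i ~ Normal(mu, tau^2). Marginally y_i ~ Normal(mu, sigma^2 + tau^2), so mu and tau^2 can be set to their maximum marginal likelihood values and then plugged in, instead of being integrated out. The data below are made up.

```python
def fit_hyperparameters(ys, sigma):
    """Maximum marginal likelihood estimates of mu and tau^2 (tau^2 truncated at zero)."""
    n = len(ys)
    mu_hat = sum(ys) / n
    total_var = sum((y - mu_hat) ** 2 for y in ys) / n  # MLE of sigma^2 + tau^2
    tau2_hat = max(0.0, total_var - sigma ** 2)
    return mu_hat, tau2_hat


def shrink(ys, sigma, mu_hat, tau2_hat):
    """Posterior means of each theta_i given the plugged-in hyperparameters."""
    weight = tau2_hat / (tau2_hat + sigma ** 2)  # 0 = full pooling, 1 = no pooling
    return [mu_hat + weight * (y - mu_hat) for y in ys]


# Made-up observations with an assumed known noise scale.
YS = [2.0, -1.0, 0.5, 3.5, 1.0, -0.5, 2.5, 0.0]
SIGMA = 1.0

mu_hat, tau2_hat = fit_hyperparameters(YS, SIGMA)
estimates = shrink(YS, SIGMA, mu_hat, tau2_hat)

if __name__ == "__main__":
    print(f"mu_hat = {mu_hat:.3f}, tau2_hat = {tau2_hat:.3f}")
    for y, est in zip(YS, estimates):
        print(f"observed {y:5.2f} -> shrunk estimate {est:5.2f}")
```

A fully Bayesian treatment would put a prior on mu and tau and average over them; the plug-in step here is the approximation the Wikipedia passage describes.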

  14. I have read in a couple of places that actuaries have been using Bayesian methods for years.

    Metron is a company that uses Bayesian techniques for search (Air France 447, etc.).

