Statistics controversies from the perspective of industrial statistics

We’ve had lots of discussions here and elsewhere online about fundamental flaws in statistics culture: the whole p-value thing, statistics used for confirmation rather than falsification, corruption of the pizzagate variety, soft corruption in which statistics is used in the service of country-club-style backslapping, junk science routinely getting the imprimatur of the National Academy of Sciences and National Public Radio, etc etc etc.

Or, to step back and talk about the statistics community: the way that we, as a profession, always seem at war with ourselves: the Bayesians and the anti-Bayesians, still battling after all these centuries, leaving researchers in other fields unsure of what to do (except for the econometricians, who are all too sure of themselves).

We’ve had lots of discussion in this space by psychologists and economists, some political scientists, some philosophers, and lots of academic statisticians.

But we haven’t heard so much from statisticians working in industry.

With that in mind, Ron Kenett sends along this article on controversies in statistics and their relevance for statistical practice. Kenett frames much of this discussion in the form of checklists, which relates to our discussions here and here.

P.S. Given the title of this post, you may ask yourself, What is industrial statistics? I’m not quite sure. Maybe Kenett can give a good definition. To me, some characteristics of industrial statistics are:

– Designed experiments (rather than the use of existing data, as is common in academic social science).

– A focus on costs and benefits (rather than on confirmation, refutation, or adjudication of theories, as is common in academic work).

– Research focused on particular upcoming decisions (rather than driven by past work, as is common in academia).

I’m not arguing here that industrial statistics is better than, or more “real” than, academic statistics. I think there are real benefits to resolving puzzles and working on hard problems from the literature. My point is that “industrial statistics,” whatever it is, has a different feel from much of what we see in textbooks, journals, and blog posts.

28 thoughts on “Statistics controversies from the perspective of industrial statistics”

  1. Andrew posed a simple question: what is industrial statistics? My simple answer is to look at the TOC of my book with Shelley Zacks titled Modern Industrial Statistics with applications in R, MINITAB and JMP, now in its 2nd edition and soon in an updated third edition https://www.wiley.com/en-us/Modern+Industrial+Statistics%3A+with+applications+in+R%2C+MINITAB+and+JMP%2C+2nd+Edition-p-9781118763698
    As Andrew sketched, the topics covered in the book range from data analytics (including Bayesian decision making and bootstrapping) to acceptance sampling, statistical process control, design of experiments and reliability.
    Who are famous industrial statisticians? Well, I guess the list is long: Gosset was certainly one, Walter Shewhart at Bell Laboratories, David Cox started as one, George Box worked in the chemical industry, etc. An active society in Industrial Statistics is the European Network for Business and Industrial Statistics (www.enbis.org), and there are Industrial Statistics divisions in ASA, INFORMS, IEEE and ASQ. For a review paper on challenges and research opportunities in Industrial Statistics see https://www.tandfonline.com/doi/abs/10.1080/08982112.2015.1100453?journalCode=lqen20.
    Modern industry is now embedded in the circular economy and sociotechnical systems. A new book I co-edited covers all this: https://www.wiley.com/en-us/Systems+Engineering+in+the+Fourth+Industrial+Revolution%3A+Big+Data%2C+Novel+Technologies%2C+and+Modern+Systems+Engineering-p-9781119513926. It includes chapters on transdisciplinary engineering, which means combining different disciplines in applied work. A related blog post by Andrew is available here (with a typo in my name…): https://statmodeling.stat.columbia.edu/2017/08/21/two-papers-ron-kennett-related-workflow/

  2. My eyes lit up when I saw this post! I work in an industrial setting, with heavy emphasis on DOE. Historically, industrial DOE has been dominated by NHST and a heavy emphasis on p-values. I’ve found it pretty hard to convince people to give Bayesian inference a chance, but often I find myself lacking in alternatives.

    For instance, power calculation: the idea of a power calculation isn’t very central to Bayesian inference, but it’s literally the first step in frequentist DOE. One can argue we almost always overestimate power, but that’s something clients and stakeholders want to see. If I told them, “run the experiment for some reps, let’s analyze the data, and if we don’t find an effect, let’s keep on running till we run out of money”, they’d think I’m a charlatan.
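    For reference, a minimal sketch of that up-front sample-size step in base R; the effect size and standard deviation here are invented numbers, not from any real study:

      # Sample size per group for detecting a difference between two settings
      power.t.test(delta = 2,         # smallest effect worth detecting, in response units
                   sd = 3,            # guessed residual standard deviation
                   sig.level = 0.05,  # conventional alpha
                   power = 0.80)      # desired power
      # In practice the sd guess dominates the answer, which is one reason
      # these calculations tend to come out optimistic.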

    Mixture designs – there’s a ton of work that’s been done on mixture designs in the frequentist world, but not so much in the Bayesian setting. I’d love some references if there are any, but I haven’t come across much.

    When it comes to complex nested and crossed designs like the split plot, that’s where the rich theory of multilevel models comes in handy. We read your “why anova is more important than ever” paper last week, and my manager, who swears by frequentist methods, has asked me to compare a classical split-plot analysis and an MLM analysis on one of our datasets. My opportunity to show how easy it is to get the structure right in split plots!
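    A minimal sketch of that comparison, assuming a data frame d with a response y, a whole-plot factor A, a subplot factor B, and a whole-plot identifier wp (all names hypothetical):

      library(lme4)

      # Classical split-plot ANOVA: the whole-plot error stratum via Error()
      fit_aov <- aov(y ~ A * B + Error(wp), data = d)
      summary(fit_aov)

      # Multilevel model: same structure, whole plots as a random effect
      fit_mlm <- lmer(y ~ A * B + (1 | wp), data = d)
      summary(fit_mlm)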

    • It is interesting to contrast the first edition of Box, Hunter, and Hunter: Statistics for Experimenters, published in 1978, with the second edition published in 2005. In the second edition, there is a Bayesian analysis of a Plackett-Burman (PB) design, since the usual method for regular factorials and fractional factorials, via normal probability plots or Lenth’s analysis, is not appropriate for PB experiments. This is important because while a regular fractional factorial of resolution III, for example, gives a full factorial in every set of 2 factors, for the PB you get a full factorial in every set of 3 factors. Unfortunately, Minitab still gives the inappropriate analysis.
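      As a rough illustration of that kind of analysis (not the Box-Meyer calculation in BH2), one could fit a 12-run PB design with a sparsity-inducing prior; the design comes from FrF2::pb() and the response below is simulated:

        library(FrF2)      # pb() generates Plackett-Burman designs
        library(rstanarm)  # stan_glm() with a hierarchical shrinkage prior

        set.seed(1)
        des <- pb(nruns = 12, nfactors = 11)       # 12-run PB design, factors coded -1/+1
        X   <- data.frame(lapply(des, function(f) as.numeric(as.character(f))))
        X$y <- 5 + 3 * X$A - 2 * X$D + rnorm(12)   # simulated response: only A and D active

        # A horseshoe-type prior encodes the effect-sparsity assumption behind screening
        fit <- stan_glm(y ~ ., data = X, prior = hs(), refresh = 0)
        print(fit, digits = 2)

      The point is only that shrinkage does the work the normal plot cannot do here; the specific prior is a stand-in, not the book’s method.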

      • To me, the first edition is better:
        1. The second edition omitted chapters such as chapter 16 on mechanistic model building, which was a precursor to hybrid modeling and other types of models used in industry that combine empirical models with first-principles knowledge.
        2. Box and Stu Hunter pushed Bill Hunter to third place in the authorship list (from second place). The first time I saw that I could not believe my eyes… Bill was the spirit behind BH2 and they should not have done that, especially after he passed away.

        • They are both masterpieces. I’m glad I have both.

          1. The treatment of mechanistic model building is abbreviated but still in the second edition.
          2. I’m sure they didn’t take the decision about authorship order lightly. The second edition is dedicated to the memory of Bill Hunter, and the first paragraph of the preface makes that clear: “In rewriting this book, we have deeply felt the loss of our dear friend and colleague Bill Hunter (William G. Hunter, 1937-1986). We miss his ideas, his counsel, and his cheerful encouragement. We believe, however, that his spirit has been with us and that he would be pleased with what we have done.”

    • “For instance, power calculation: the idea of a power calculation isn’t very central to Bayesian inference, but it’s literally the first step in frequentist DOE.”

      I agree with Fisher that power analysis is the nonsensical product of confused minds:

      The phrase “Errors of the second kind”, although apparently only a harmless piece of technical jargon, is useful as indicating the type of mental confusion in which it was coined.
      […]
      The frequency of the second kind must depend not only on the frequency with which rival hypotheses are in fact true, but also greatly on how closely they resemble the null hypothesis. Such errors are therefore incalculable both in frequency and in magnitude merely from the specification of the null hypothesis, and would never have come into consideration in the theory only of tests of significance, had the logic of such tests not been confused with that of acceptance procedures.

      https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/j.2517-6161.1955.tb00180.x

  3. One of the big differences between industrial and academic research statistics is that you work in bulk – there is typically not one big question but a series of small, individually unimportant questions, with small variations, that add up. So someone cares, usually a standards-setting authority, whether vendor A’s version of input A is the same as vendor B’s, even when A is a published formulation. There could be a source of variation in packaging, shipping, etc., but people are confident that a null difference of 0 is true. I know that Andrew likes to point out that nothing has a 0 effect, but in this case it might approach it. It is not a situation where people are looking to reject the null, as in social or clinical research.
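    One way to make that “we expect essentially no difference” question operational is an equivalence test rather than a significance test; this is my framing, not the comment’s, and the margin and data below are made up (base R, two one-sided tests):

      # TOST: conclude equivalence if the A-B difference is significantly above
      # -margin AND significantly below +margin
      set.seed(2)
      a <- rnorm(30, mean = 10.0, sd = 0.5)   # simulated measurements, vendor A
      b <- rnorm(30, mean = 10.1, sd = 0.5)   # simulated measurements, vendor B
      margin <- 0.5                           # equivalence margin from the spec (assumed)

      p_lower <- t.test(a, b, mu = -margin, alternative = "greater")$p.value
      p_upper <- t.test(a, b, mu =  margin, alternative = "less")$p.value
      max(p_lower, p_upper)   # equivalence is claimed if this is below alpha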

    I am in a situation now where the engineers asked me to do Bayesian statistics but didn’t call it that. I asked them if they were comfortable weighing new results against what they saw in previous observational data. They want to do it because this is a situation where the variances and differences are far from 0.
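    The simplest version of that weighing is a conjugate normal update; a base-R sketch with invented numbers, where the previous observational data become the prior:

      # Normal-normal updating: posterior precision = prior precision + data precision
      prior_mean <- 4.2; prior_sd <- 0.8   # summary of the earlier observational data (assumed)
      new_mean   <- 5.1; new_se   <- 0.4   # estimate and standard error from the new run

      w_prior <- 1 / prior_sd^2
      w_new   <- 1 / new_se^2
      post_mean <- (w_prior * prior_mean + w_new * new_mean) / (w_prior + w_new)
      post_sd   <- sqrt(1 / (w_prior + w_new))
      c(post_mean, post_sd)   # the new result, pulled toward what was seen before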

    • Another big difference is that industry statistics puts more emphasis on the quality of the measurements being used. Often when I read this blog I think that measurement systems analysis tools such as gage R&R could be applied to address some of the issues with measurement error.
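      At bottom a gage R&R is a variance-components decomposition; a minimal lme4 sketch, assuming a data frame gage with columns part, operator, and y (all names hypothetical):

        library(lme4)

        # Crossed random effects for parts, operators, and their interaction
        fit <- lmer(y ~ 1 + (1 | part) + (1 | operator) + (1 | part:operator), data = gage)
        vc  <- as.data.frame(VarCorr(fit))

        # Repeatability = residual; reproducibility = operator + part:operator
        grr   <- sum(vc$vcov[vc$grp %in% c("operator", "part:operator", "Residual")])
        total <- sum(vc$vcov)
        100 * sqrt(grr / total)   # rough %GRR, ignoring study-design corrections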

  4. Another book on industrial statistics (but that might be overlooked because it doesn’t have “Industrial statistics” in the title) is Statistical Design and Analysis of Experiments, by Peter W. M. John, Classics in Applied Mathematics, Book 22, Society for Industrial and Applied Mathematics, 1998. Some excerpts from the Preface:

    “In 1971 [when the first edition of the book was published by MacMillan], the vast majority of applications of analysis of variance and design of experiments were still in agriculture. There had been relatively little impact in industry in the United States and Europe outside Imperial Chemical Industries Ltd, in Great Britain, where George Box and his colleagues did pioneering work, and in those oil and chemical companies whose members had learned about 2^n factorials and response surfaces from Box, often at Gordon Research Conferences. …
    … But in the 1980’s, the growth of interest in experimental design accelerated as the emerging semiconductor industry became conscious of Japan’s industrial success. Much of that success was attributed to the use of statistics that William Edwards Deming had advocated in Japan after World War II and, in particular, to the resulting emphasis on designed experiments. The Japanese engineer, G. Taguchi, who had learned about 2^n designs from Mahalanobis in India, introduced to the semiconductor industry his methods for small complete factorials and orthogonal fractions. He was primarily concerned with main effects and paid little attention to such complications as interactions. Taguchi’s method reduces design of experiments to a relatively simple drill, to which some companies require strict adherence. That has some merit, as long as it works, but it does not lead to a very deep understanding. However, Taguchi went further. He made us think not just about experiments to find conditions that optimize yield, but also about experiments to control the variability of a manufacturing process….
    …Engineers came to realize that they no longer had to associate designed experiments solely with the large complete factorial experiments, planted in the spring and gathered in the fall, that were the tradition of the agronomists. They could think of series of small experiments: running an exploratory experiment and then, after seeing the results, running further points to answer the questions that the analysis of the first experiment raised. …”

      • Martha – thank you for sharing this. Fascinating account. You (he) mention Taguchi. I was at Bell Labs in the summer of 1981 when Taguchi showed up for a series of 4 seminars at Holmdel. Madhav Phadke was the appointed host and only about a dozen attendees participated in these. Following each seminar Madhav would call the attendees to get their notes on what was said. Frankly it was quite incomprehensible, both because of the language barrier and because the ideas were so new. Taguchi had learned about DOE at Bell Labs (and other places) several years earlier and transformed its application in industry. He came back to Bell Labs to “pay a debt” he felt he owed to the place where he originally learned about DOE. Taguchi was effective in Japanese and, later, in Western industry. Unfortunately, some of the statisticians in leading positions at the time criticized his methods on theoretical grounds (some criticism highly justified). The result was that industry, which found Taguchi’s ideas helpful, came to see (some of) statistics as focused on nitpicking and as anti-progress. This seems to have repeated itself on other occasions, and instead of being at the forefront of progress in analytics, statistics was often found fighting rearguard battles. This was accentuated with the advent of big data and computer-age analytics (with several notable exceptions, many from Stanford). An interview on this appeared in https://www.statisticsviews.com/details/feature/4812131/For-survival-statistics-as-a-profession-needs-to-provide-added-value-to-fellow-s.html

  5. A main difference is that in an industrial setting, we usually get:
    – easier generalization (it’s at least more reasonable to transfer results from… say one machine in plant A to all machines in plant B)
    – easier (cross)validation: if you claim your model predicts something, checking is rather easy (just run another batch of samples on your test/production rig)
    – there’s possibly some robust theory that explains what might and might not be of importance and how things relate to each other (think of your golf putting example)

    I don’t want to overpraise, but I believe this explains why simple methods (e.g. p-value-driven decision-making) are such a success in this setting. I work in the automotive industry, and in our standard DOE training people are split into teams, each optimizing a mechanical slingshot (modifying firing angle and spring strength). In such a setting, you get very direct feedback on how your methods work or don’t work. Additionally, in such a setting nonlinear models and hierarchical methods are still a huge benefit!
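    A minimal sketch of the kind of model that exercise usually ends with (base R; the slingshot data and column names below are invented), just to show how far a simple quadratic fit already goes:

      # Two-factor response surface for the slingshot exercise (simulated data)
      set.seed(3)
      slingshot <- expand.grid(angle = c(30, 45, 60), spring = c(1, 2, 3))
      slingshot$distance <- with(slingshot,
        100 - 0.05 * (angle - 45)^2 + 20 * spring - 3 * spring^2 + rnorm(9, sd = 2))

      fit <- lm(distance ~ angle + spring + I(angle^2) + I(spring^2) + angle:spring,
                data = slingshot)
      summary(fit)
      # predict() over a grid then gives the usual contour plots and the optimum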

    Cheers, Daniel

    • An addition regarding the whole coronavirus thing:

      There are cases where epidemiological-style models are used, for example when production problems were noticed too late or field-quality problems arise. In such cases, uncertainty is abundant (both regarding theory and causal effects and regarding data quality) and methods are built to reflect this. A major advantage in the industry setting: every now and then, there’s the chance to validate models with real-world data.
      A made-up example: say you discover a flaw in the mixing process in your yogurt factory and your manager decides, based on your predictions of 0.1 .. 6 % of yogurts turning sour before their due date (your model includes e.g. assumptions on how many will be eaten earlier) and the fact that no harm to humans exists (!), that no recall is done. It later turns out that ca. 0.5 % of yogurt consumers actually complained, and your customer service sends a nice package of fancy yogurt spoons to each of them. At this point, both you and your managers have a nice calibration for your risk models and increased trust for further use cases. (You still have to deal with the discrepancy between complaining and non-complaining but upset customers in this toy example.)
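      A toy version of that prediction in base R, with invented input ranges, just to show where a 0.1 .. 6 % style interval can come from and how the later 0.5 % complaint rate gets compared to it:

        # Monte Carlo over uncertain inputs (all numbers invented)
        set.seed(4)
        p_affected   <- runif(10000, 0.02, 0.10)   # share of the batch hit by the mixing flaw
        p_eaten_soon <- runif(10000, 0.30, 0.95)   # share eaten before turning sour
        p_sour       <- p_affected * (1 - p_eaten_soon)
        quantile(p_sour, c(0.025, 0.975))          # prediction interval for the sour fraction
        mean(p_sour <= 0.005)                      # how the observed 0.5 % sits in that distribution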

      Having seen some of these cases has generally given me an increased trust in modelling assumptions and the large uncertainty intervals we currently see with Covid-19. Interestingly, many of these assumptions are rule-of-thumb-based and not fitted within the modelling environment and I wouldn’t consider this a bad thing as long as assumptions are made transparent.

      Cheers, Daniel

      • Daniel – thank you for your comments. We definitely concur. Specifically I wrote: “In reviewing studies done in Industry 4.0 topics, one finds data collected actively or passively, and models developed with empirical methods, first principles or as hybrid models. Industry, as opposed to science, is less concerned with reproducibility of results, but it should be. The industrial cycle provides short-term opportunities to try out new products or new process set-ups and, based on the results, determine follow-up actions. Deriving misleading conclusions can, however, be very costly and time consuming.”

        What is around the corner, however, is the significant impact of analytics on industry through sensor technology and flexible manufacturing systems. A wide-angle perspective on all this is provided in https://www.wiley.com/en-us/Systems+Engineering+in+the+Fourth+Industrial+Revolution%3A+Big+Data%2C+Novel+Technologies%2C+and+Modern+Systems+Engineering-p-9781119513926.

        In that context, the BH2 DOE approach will soon be outdated. BH2 assumes you start a discussion from scratch with the domain expert and, in collaboration, design an experiment. These days the domain experts come with memory sticks storing gigabytes of data, and your starting point is different. In addition, you need to develop complex control systems, way beyond the classical control charts proposed by Shewhart in the 1920s. Moreover, data from imaging, vibration tracking, and on-line measurement (what is called PAT in pharma) is now being integrated and fed into prognostic models used in decision-support systems.

        All this poses new challenges but also opens up new opportunities for industrial statistics.

  6. I’ve been working in industrial statistics for more than 10 years.

    The current big problem I see is that established statistical methods (e.g. Design of Experiments) that have huge value for industry are still not used to anything like the extent that they should be – there is still so much waste in experimental work to develop processes and products.

    And at the same time there is much hype about Big Data, AI, Machine Learning. And this hype has the attention of management.

    For sure, AI/ML have potential for value in industrial applications. But they do not replace good science and appropriate supporting statistical methods.

    Unfortunately “good science” and “established methods” are just not exciting enough.

    Cheers, Phil

  7. To supplement your bullets:
    – more likely to engage in causal inference (to steal from your title)
    – more likely to analyse and identify processes
    – more likely to be outcomes focused and driven
    – more likely to evaluate multiple scenarios

  8. Thanks for this! I agree that DOE is important, but don’t forget statistical process control (SPC), classically done with Shewhart control charts. The issue, as I understand it, is important but somewhat simple. Take importance: if you have a process that produces a product with a normally-distributed result (weight of product, …), and you want to maximize the output (more weight out for the same products in), you probably want to focus on the process, trying to reduce the variation in the distribution or to raise its mean. Detailed studies of why last Wednesday’s results were at a yearly low may not make a lot of sense if the results are really all from the same normal distribution.

    On the other hand, if last Wednesday’s results were low because the process changed – because the results are now a mixture of draws from two different processes with two different normal distributions – then it’s possible that you have mixture data. In that case, improving the predominant process’s performance likely has little effect; you really want to reduce the probability of errant process B showing up.

    That, I think, is a reasonable description of Walter Shewhart’s common cause and special cause variation. He worked with the Hawthorne Works, and there were more problems than process engineers. Shewhart wanted a way for workers themselves to determine what type of variation they observed and thus how they should intervene, keeping the process engineering population from becoming the constraint on improvement.

    By my understanding of Economic Control of Quality of Manufactured Product (https://archive.org/details/in.ernet.dli.2015.150272/page/n59/mode/2up), he developed his own methodology for setting “control limits” that was more of an economic, decision-theoretic approach than an NHST, although I’m pretty sure I’ve heard some couch it in terms of looking for variation in excess of 3 sigma.

    I’ve read some of Donald Wheeler’s material, and I’ve seen other approaches, but I’m curious as to whether there are better approaches today that meet Shewhart’s original goals of reducing the economic cost of a process and enabling it to be carried out (or at least understood) largely by the people doing the work, that enable the approach to be tailored quickly, easily, and transparently to a range of processes and environments, and that produce credible, useful results. Is Wheeler’s approach the best in that regard? For those of us who automate, is https://cran.r-project.org/web/packages/qicharts2/vignettes/qicharts2.html a good approach? Yes, that mentions 3 sigma, but my understanding of Shewhart is that he picked a threshold based on industrial experience at the Hawthorne Works that suggested a limit that maximized the economic return, given the probabilities of misclassifying a process as exhibiting common cause or special cause variation, not because of anything magical about 3 sigma.
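    For what it’s worth, the individuals-chart arithmetic behind those limits is short enough to show in base R (no package; the data are simulated). The 3 sigma here is estimated from the average moving range, i.e. from short-term variation, not from the overall standard deviation:

      # Shewhart individuals (XmR) chart limits from the average moving range
      set.seed(5)
      x  <- rnorm(50, mean = 10, sd = 1)      # stand-in process measurements
      mr <- abs(diff(x))                      # moving ranges of size 2
      cl <- mean(x)
      sigma_hat <- mean(mr) / 1.128           # d2 constant for moving ranges of 2
      ucl <- cl + 3 * sigma_hat               # 3 / 1.128 is Wheeler's familiar 2.66
      lcl <- cl - 3 * sigma_hat

      plot(x, type = "b", ylim = range(c(x, ucl, lcl)))
      abline(h = c(lcl, cl, ucl), lty = c(2, 1, 2))
      which(x > ucl | x < lcl)                # points flagged as special-cause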

    • Bill – you opened the door to a very interesting point. After joining the Madison faculty in 1978 I was asked to teach a course on DOE using the brand-new BH2 book. The first edition mentions control charts. I discussed this topic with George Box and wanted to understand what was special about this apparently trivial sequence of hypothesis tests. Box’s comment was that it is more complex than it seems, without being specific. Eventually I got it.

      Shewhart’s contribution has a philosophical root. Statistics is about modeling the process generating the data. If the model does not fit, you update the model.

      Control charts are different. The first phase is a process capability analysis phase (I hate the Phase 1 terminology). Following that, the control limits are set to reflect the established process capability. You then start applying the control charts to monitor a process (I also hate the Phase 2 terminology).

      If, in the monitoring phase, an out-of-control signal is detected, you stop the process and put it back under control. In other words, you change the data generation process to fit the in-control model. This is the opposite of typical statistical modeling.

      The in-control process is an abstract construct. In situations where the process can be reset, resetting it changes the way the data are produced.

      In 1983, Moshe Pollak and I were among the first to distinguish processes that can be reset from processes where this is not possible. Examples of the latter are surveillance systems. We discussed the tracking of congenital malformations. The alarm triggers an investigation but does not reset the process. In that case the probability of false alarms has to account for subsequent alarms, following the first one. See https://amstat.tandfonline.com/doi/abs/10.1080/01621459.1983.10477982#.XrBw8qhLhME
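      Not the procedure from that paper, but a generic one-sided CUSUM in base R gives the flavor of sequential detection of a shift, with the threshold h chosen to control the false-alarm rate (all values below are illustrative):

        # One-sided CUSUM for an upward shift in the mean
        set.seed(6)
        x <- c(rnorm(40, mean = 0), rnorm(20, mean = 1))  # shift after observation 40
        k <- 0.5      # reference value, typically half the shift worth detecting
        h <- 5        # decision threshold, sets the in-control average run length
        s <- numeric(length(x))
        for (t in seq_along(x)) {
          prev <- if (t == 1) 0 else s[t - 1]
          s[t] <- max(0, prev + x[t] - k)
        }
        which(s > h)[1]   # first alarm time (NA if no alarm)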

  9. Ron
    Great article. Having spent over 30 years wrestling with applying industrial statistics in the transactional service sectors, there are 3 points I have to constantly repeat.
    1. Always check the distribution first. It’s almost never normal. If it’s normal it’s usually an unmanaged process.
    2. If it is managed, they need to be fired. Any manager would be trying to shift the distribution to the left.
    3. See number one above. Then choose the correct test for conducting your analysis (a quick sketch follows below).
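    Not from the comment, just a base-R illustration of that drill with simulated service-time data (log-normal, as transactional cycle times often are):

      # Check the distribution first, then pick the test
      set.seed(7)
      before <- rlnorm(40, meanlog = 1.0, sdlog = 0.5)   # simulated service times, old process
      after  <- rlnorm(40, meanlog = 0.9, sdlog = 0.5)   # simulated service times, new process

      qqnorm(before); qqline(before)   # visual check: clearly not normal
      shapiro.test(before)             # formal check, if one insists

      wilcox.test(before, after)       # rank-based comparison instead of a plain t-test
      # or compare on the log scale / fit a GLM with a suitable family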
    Thanks!
