“Statistics: A Life Cycle View”

This article from Ron Kenett is a few years old but is still relevant:

Statistics has gained a reputation as being focused only on data collection and data analysis. This paper is about an expanded view of the role of statistics in research, business, industry and service organizations. . . . a “life cycle view” consisting of: 1) Problem elicitation, 2) Goal formulation, 3) Data collection, 4) Data analysis, 5) Formulation of findings, 6) Operationalization of findings, 7) Communication and 8) Impact assessment. These 8 phases are conducted with internal iterations that combine the inductive-deductive learning process . . . The envisaged overall approach is that applied statistics needs to involve a trilogy combining: 1) a life cycle view, 2) an analysis of impact and 3) an assessment of the quality of the generated information and knowledge. . . .

It can be hard to write, and to read, this sort of article, as advice about problem elicitation, goal formulation, etc., can sound so vague compared to harder-edged topics such as optimization, computing, and probability theory. But all these things are important, and I think it does help to think them through, in specific examples and more generally.

Statistics is a branch of engineering.

11 thoughts on ““Statistics: A Life Cycle View””

      • Maybe tangential but related.

        I’ve taught classes comprising graduate students from a mixture of physics, mathematics, biology, chemistry, engineering…

        Inevitably, each student insists that the other fields are subdivisions of their own field.

        • Andrew said, “Statistics uses mathematics but I don’t think it’s a branch of mathematics. Similarly, electrical engineering uses physics but it’s not a branch of physics.”

          Joshua replied, “I’ve taught classes comprising graduate students from a mixture of physics, mathematics, biology, chemistry, engineering…Inevitably, each student insists that the other fields are subdivisions of their own field.”

          Joshua’s comment is quite striking. I’m a mathematician, but I don’t consider physics, biology, chemistry, engineering (or statistics, for that matter) to be subdivisions of math. Each of them does often use math — but that doesn’t mean they are subdivisions of math. The way I see it is that many topics in math are applicable to many other subjects — but that doesn’t mean that those other subjects are subdivisions of math.

          The fact of the matter is that statistical methods are often (but not always) applicable to a wide variety of subjects. One experience that sticks in my mind was a Ph.D. student in Engineering who took two of the graduate stats courses I taught — Regression, and Analysis of Variance. He was working in a group that was trying to figure out how to make robots from off-the-shelf components, since doing this would be less expensive than making everything from scratch. His job in the group was to figure out how to test the robots made with off-the-shelf parts. In searching the literature, he had come across the quality assurance literature from the automobile industry. He initially did not have enough statistics background to read that literature, but after taking the two courses from me, he understood the concepts involved and could read that literature and adapt the methods to the testing he was trying to do. At his Ph.D. defense, his advisor said that the two (quite routine) courses the student took from me were the key to being able to use the quality assurance methods developed in evaluating automobiles to his group’s need to study the quality of robots.

    • Mathematics is just about discerning the implications of abstractions made by us, while statistics is about discerning how to act in the world to prosper (or at least survive).

      Discerning how to act in the world requires abstract fake worlds contrived by us to hopefully anticipate what would repeatedly happen if we acted this way rather than that way in our world.

      Hence the need for some form of math to discern what would repeatedly happen in those abstract fake worlds.

      But math is just the bridging tool, essential but not the end.

      • Lindley would say that statistics is the study of uncertainty. Decision analysis would be part of it – even though anything may be recast as a decision problem (algebra is about how to act when confronted with an equation).

        I wouldn’t say that what would repeatedly happen in some sense in some abstract fake worlds is necessarily essential, let alone the end.

        • > what would repeatedly happen in some sense in some abstract fake worlds is necessarily essential
          OK, say one is taking a posterior to quantify the uncertainty for a decision analysis; what would be repeatedly observed by drawing parameters from the posterior and then fake data from the data-generating model given those parameters is just a way (a medium) for concretely tracing out the implications of the posterior for the decision being made.
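          To make that concrete, here is a minimal, hypothetical sketch of the procedure described above: draw parameters from a posterior, generate fake data given each draw, and average the consequences of each candidate action. The normal "posterior," the demand model, and the prices are all invented for illustration; they stand in for whatever the actual fitted model would supply.

```python
import random

random.seed(1)

# Stand-in posterior draws for a demand rate (in practice these would
# come from an earlier Bayesian fit, e.g. MCMC output).
posterior_draws = [random.gauss(10.0, 2.0) for _ in range(5000)]

def expected_profit(stock, draws, price=3.0, cost=1.0):
    """Average profit over fake data generated from each posterior draw."""
    total = 0.0
    for rate in draws:
        # Fake data from the data-generating model given the drawn parameter.
        demand = max(0.0, random.gauss(rate, 1.0))
        total += price * min(stock, demand) - cost * stock
    return total / len(draws)

# Choose the action (stock level) whose simulated expected profit is best:
# the repeated fake draws are just the medium for tracing out what the
# posterior implies for the decision.
best = max(range(5, 16), key=lambda s: expected_profit(s, posterior_draws))
```

          The point is not the particular model but the pattern: the decision is evaluated against the posterior's implications rather than against a single point estimate.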

        • You may want to do that but you also may not.

          For example, maybe you’d like to find out where a plane – or a submarine – has been lost. You get some probability distribution for its location from the analysis of the data available. You update that probability distribution based on the things that are found – or not found – while you look for the wreckage. You use that posterior probability to guide the search and decide which areas to inspect – or reinspect – making the best possible use of the resources. You write articles and books describing the whole thing. However, you don’t discuss there the idea of sampling from the posterior to generate fake data using the data-generating process – that’s not an essential part of what you’re doing.
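          The search example above can be sketched as a simple grid update. This is a toy stand-in, not the actual analyses used in any real search: the prior over cells and the detection probability are assumed values for illustration.

```python
# Toy Bayesian search: prior probabilities for the wreckage location over
# a few grid cells, updated after an unsuccessful search of one cell.
# q is the assumed chance a search detects the wreck if it is actually
# in the searched cell.

def update_after_miss(prior, cell, q=0.8):
    """Posterior over cells after searching `cell` and finding nothing."""
    posterior = list(prior)
    posterior[cell] *= (1.0 - q)   # P(miss | wreck in searched cell)
    total = sum(posterior)         # unsearched cells keep full weight
    return [p / total for p in posterior]

prior = [0.5, 0.3, 0.2]                    # initial beliefs over three cells
post = update_after_miss(prior, cell=0)    # searched cell 0, found nothing
# Mass shifts away from the searched cell toward the others, which then
# guides where to inspect – or reinspect – next.
```

          Note that this update uses only Bayes’ rule on the observed miss; no fake-data simulation is needed, which is the commenter’s point.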

  1. Andrew – Thank you for reposting. With AI facing challenges of interpretability and generalizability, a life cycle view is becoming even more important. In the engineering context there is an interesting paradigm shift from engineering the design to engineering the performance. The latter is about a life cycle view of engineered systems. https://www.dropbox.com/s/snl5rj2wm3dqa7e/Kenett%20on%20the%20future%20of%20engineering.pdf?dl=0

    This requires a move to a playground much wider than the mathematics one…

  2. “In the engineering context there is an interesting paradigm shift from engineering the design to engineering the performance”

    After paging through the dropbox presentation, I guess I am having trouble seeing anything truly new. I think performance engineering is now much easier to perform with IT tools and so can be done cheaply enough on more designs, but aircraft manufacturers have been doing it for a long time and developed the basic tools a long time ago. Perhaps train components fall in the area where it is now economically feasible to monitor performance using new tools.

    I am not fond of the term “digital twin,” because I cannot find any substantive difference between that concept and what we called “the math model.” For complex systems, the math model rarely converges on a single solution anyway, and so requires a complementary test stand. Maybe there has been recent progress in this area.

    Statisticians and some “modelers” seem to love Box’s “all models are wrong, but some are useful” quote, but I would be shocked to hear those words come out of the mouth of a modeling engineer. I guess it is really meant as an expression of humility, and perhaps I should leave it at that, but the concept of a model being wrong does not make any sense. If your model has low fidelity, you work to improve the fidelity. At no point is the model either right or wrong.

    • Digital twins combine online data from surveys with analytic models. This enables the four capabilities of monitoring, diagnostics, prognostics and optimization in a way not seen before. For example, it enables condition-based maintenance (CBM). These opportunities and challenges are new in application and research.

      If you follow my slides you will notice a quote by Karlin on models used to sharpen the question. This applies to the problem elicitation phase. The quote of Box, that all models are wrong but some are useful, is aimed at the data analysis part, for example in the evaluation of overfitting. Overfitting impinges on generalizability. Models that are less generalizable are also less useful….
