Theoretical Statistics is the Theory of Applied Statistics: Two perspectives

After watching my video, Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do, Ron Kenett points us to these articles:

Conceptual Thinking in Statistics and Data Science Education: Interactive Formative Assessment with Meaning Equivalence Reusable Learning Objects (MERLO):

Computer age statistics, machine learning, data science and in general, data analytics, are having an ubiquitous impact on industry, business and services. Deploying a data transformation strategy requires a workforce which is up to the job in terms of knowledge, experience and capabilities. The application of analytics needs to address organizational needs, invoke proper methods, build on adequate infrastructures and ensure availability of the right skills to the right people. Such upskilling requires a focus on conceptual understanding affecting both the pedagogical approach and the complementary learning assessment tools. This paper is about the application of advanced educational concepts to the teaching and evaluation of statistical and data science related concepts. Two educational elements will be included in the discussion: i) the use of simulations to facilitate problem based experiential learning and ii) an emphasis on information quality, as the overall objective of statistics and data science activity. . . .

A Note on the Theory of Applied Statistics

This note is a sketch of what could be the basis for a theory of applied statistics. Such a theory is needed to help statistics become more relevant, with significant impact and innovative developments. To achieve this goal, statisticians need to get involved in new activities within the research, application and pedagogical areas. In particular social networks and new data structures require new statistical models. Eliciting the components of a theory of applied statistics is a prerequisite to a review of the training and education of statisticians who want to have an impact on the organizations they work in and society in general.

These are two topics that interest me; here I’ll focus on the second one. I’m curious what Kenett said about the theory of applied statistics. It turns out there’s not so much overlap between what he talks about and what I talk about! Statistics is a big field. In his article, Kenett mentions:

– “Extract-Transform-Load which is used to integrate data bases, a prerequisite to the implementation of many analytic tools”

– “Predictive Analytics”: In statistics terminology, this is what we would call causal inference. An advantage of the term “predictive analytics” is that it highlights the aspect of causal inference that is prediction of potential outcomes.

– “Statistical models and the limits of statistics”: It’s a challenge in teaching to talk about the limits of statistics. For example, statistics textbooks often tell stories about nonrandom samples or imbalanced observational studies that give bad inferences, but that just sends the nihilistic message to give up if your data aren’t perfect, a message that is then ignored in the rest of the statistics textbook where students are just told to compute averages, differences, regressions, etc., on available data. Teaching the limits of statistics can’t just be saying what statistics can’t do; we need to discuss partial solutions. Taking a model too seriously is really just another way of not taking it seriously at all.

– “Mathematical statistics and applied statistics”: I do discuss this a bit.

– “Practical statistical efficiency: Are we having an impact?”, “Information quality: Are we generating knowledge?”: These are important, and I’ve not thought about them in any formal way.

– “Why statisticians need to know something about management consulting”: I don’t know about that! I mean, sure, everyone would benefit from knowing more about everything. Let me say that it’s probably a good idea for some statisticians to know something about management consulting, good for some statisticians to know something about medicine, good for some statisticians to know something about physics, etc.
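To make the point about partial solutions concrete, here is a minimal sketch of one such fix for a nonrandom sample: poststratification, which reweights each group's sample mean by its share of the population. This is only an illustration, not something taken from Kenett's article. The data, the group labels, the function names, and the population shares are all made up, and the sketch assumes the population shares are known.

```python
from collections import defaultdict


def raw_mean(sample):
    """Unadjusted average of the outcome over all respondents."""
    return sum(y for _, y in sample) / len(sample)


def poststratified_mean(sample, population_shares):
    """Weight each group's sample mean by its (assumed known) population share."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, y in sample:
        totals[group] += y
        counts[group] += 1
    # The adjustment is undefined for a group with no respondents.
    missing = set(population_shares) - set(counts)
    if missing:
        raise ValueError(f"no respondents in groups: {sorted(missing)}")
    return sum(
        share * totals[group] / counts[group]
        for group, share in population_shares.items()
    )


# Hypothetical data: (group, outcome) pairs from a sample that
# over-represents the "young" group (80 of 100 respondents).
sample = (
    [("young", 1)] * 60 + [("young", 0)] * 20
    + [("old", 1)] * 5 + [("old", 0)] * 15
)

# Assumed population shares; in practice these would come from a census
# or similar external source.
population_shares = {"young": 0.5, "old": 0.5}

print("raw sample mean:    ", raw_mean(sample))
print("poststratified mean:", poststratified_mean(sample, population_shares))
```

The adjustment is partial in exactly the sense discussed above: it corrects for imbalance on the variable we stratify on, but it does nothing about differences between sample and population within each group.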

6 thoughts on “Theoretical Statistics is the Theory of Applied Statistics: Two perspectives”

  1. “Let me say that it’s probably a good idea for some statisticians to know something about management consulting, good for some statisticians to know something about medicine, good for some statisticians to know something about physics, etc.”

    As an economist who made a career helping colleagues with statistical problems (among other things) and who often worked with “real” statisticians on a number of problems, it is my experience that a statistician who doesn’t know almost as much about the problem under consideration as the economists do is highly compromised in his or her ability to give advice that will be heeded. This is of course a bit of an overgeneralization, but I found (and it’s probably self-serving of me to say so!) that the payoff of subject matter experts diving deep into statistics is much higher than the payoff of statistical experts diving deeply into the subject matter. But of course in the absence of a subject matter expert willing to do so, a statistician who makes an effort to understand the underlying problem as something other than a “statistics problem” is clearly an improvement over falling back to the ten statistical tricks you learned in grad school…

    • My impression is that it’s very important to have a proper idea about what, as a statistician, you need to know and what you don’t need to know. Also, asking the right questions is a key skill. I haven’t collaborated much with economists, more with biologists and in a wide mix of not-so-much-talked-about application areas such as archaeology and music science; however, it was pretty clear in all projects that both the subject matter experts and the statisticians need to acknowledge their limits and something good can only come out of the right kind of communication, with the statistician required to understand in what way subject matter knowledge has an impact on the choices made in the data analysis and the resulting interpretations and the subject matter expert required to provide the information, or to clarify why it is not available, and whether any surrogate information can be used. An important issue in collaboration is that people come together who have different and complementary views. More important than saying “the statistician should know more about the subject matter” or “the subject matter expert should know more about statistics” is to make sure the communication between them works in the way that their different views and knowledge can come together in a productive way.

      This requires, in my view, a very subtle mix of respect and acknowledgement of the relevance of the expertise “on the other side”, but also skepticism when such knowledge tempts an expert into making too far-reaching assumptions on issues that involve “the other side” of the expertise. There are many data analysts around who interpret their results as if the data on their own could speak without reference to the meaning and background of the data, and many subject matter experts who are all too easily convinced that whatever statistical analysis (with however many forking paths etc.) confirms their prior belief (or hope) has to be true.

      • Nicely put: “communication between them works in the way that their different views and knowledge can come together in a productive way”.

        Especially given a little bit of knowledge can be dangerous and we all have limited energy and time to learn other things.

        In my comment, what I learned in MBA school was in large part about working in groups, getting a sense of what others did and did not know, what mattered to them and their tolerance for thinking differently, and managing group projects. Trying to understand how others understand in general terms.

  2. My own impression is that the vast majority of statistics research isn’t about theoretical statistics at all. By “theoretical statistics” I mean the theory of statistical estimation. I’d love to see someone introduce a new kind of information useful for parameter estimation, or construct a new method that should be impossible, or extend the theoretical framework for estimation beyond the axioms of probability. We’ve basically decided that we know everything that’s fundamental, which makes the study of statistics itself very boring. Contrast that with fields where there are questions that are simple enough to explain to a child and yet seemingly unanswerable given the current grasp of knowledge.

  3. I think knowing something about management consulting is more important outside of academia, such as in a research institute or a business.

    Personally, I have used as much or more of what I learned at MBA school in my career than statistics courses. Though some of what I learned in statistics courses was absolutely essential – and, unlike in your talk, Andrew, the same sort of thing you said about exchangeability of math and computation would not apply.

    Liked the comments – simulation is not scalable and computation is (or is becoming) more important than the math.

  4. Andrew: I do distinguish predictive analytics from causality. The recent drive for explainable AI indicates that predictive models do not necessarily provide causal explanations. Causality requires more analysis than simple predictive analytics. Estimating treatment effect and accounting for potential outcomes is more than most predictive analysis modelers do.

    Jonathan (another one): I believe that, for statisticians, exposure to management consulting is of higher importance than the alternatives you list. If you work with clinicians, you should be conversant in their terminology but you do not need to become a physician. Management consulting will give you a life cycle perspective in your work as a statistician and get you to consider, for example, data analysis workflows and adequately address chronology of data and goal (i.e. providing an adequate outcome when it is needed). It will also give you tools for communicating results to different audiences, in different forms. Statisticians should acknowledge this. Deming did, and became a world class management consultant which helped him push forward industrial statistics methods. In 2021, we see new opportunities with sensor data, flexible manufacturing and computer capabilities making accessible monitoring, diagnostic, prognostic and prescriptive analytics. The deployment of these capabilities requires management consulting change management experience. To help close the gap between industry and academia, research initiatives should provide methods for doing this efficiently and effectively. Statisticians need to aim at generating information quality, and methods for doing that are needed.
