Not just empirically but scientifically

This came up in a recent post:

We should be understanding them [prediction markets] empirically–or, I should say, scientifically, combining empirics and theory, since neither alone will do the job–not just blindly following pro- or anti-market ideology. As Rajiv says, the markets are out there already, and we can learn from them.

What I want to highlight here is not the topic of prediction markets, interesting as it is, but rather the distinction between “empirically” and “scientifically.” We’ll typically talk about empirical evaluation and empirical understanding, but the data just about never stand alone: you need a model to tie them to the underlying questions you’re asking.

So I’d prefer to talk about evaluating and understanding systems “scientifically” rather than “empirically.”

This is not just words! I think that framing the problem as scientific rather than empirical can also change how we do things, by putting some of the focus on theory. So much of the discussion of the replication crisis is about data, but consider all the problems that arise from absent or defective theory.

7 thoughts on “Not just empirically but scientifically”

  1. One book I like on this topic is _Theory and Credibility_ by Scott Ashworth, Christopher Berry, and Ethan Bueno de Mesquita. They emphasize the parallel between the “all-else-equal” comparisons that are the bread and butter of formal theory (e.g., comparative statics) and those of causal inference (e.g., random assignment, counterfactual imputation). A key point is that even absent issues of confounding, causal estimands do not necessarily correspond to the all-else-equal comparisons that are theoretically relevant. The upshot is that you need theory to interpret empirical patterns, even in studies with high internal validity. There’s lots of other great stuff in there: what makes for a good model, what a “mechanism” is, how theory can help us reason about external validity, and so on.

    I haven’t seen this book mentioned on the blog, but it fits nicely with the topic of understanding systems “scientifically” vs. “empirically.”

  2. I agree with your emphasis on the scientific approach: propose models, then test them by gathering data and comparing the models’ predictions with what is observed.

    My view of the value of empiricism is its emphasis on avoiding phrases like “it feels like…,” “it seems like…,” “in my opinion it is…,” and “people say…,” in favor of statements like “I observed that X happened, and it fits a model or framework that predicts X′, which matches X quantitatively in these respects….”

    Quantum physics is an excellent example of how scientific models can make amazingly accurate predictions while having almost no palpable relation to our ordinary daily sensory experience.

    Whittaker, in his two-volume _A History of the Theories of Aether and Electricity_, showed how 18th- and 19th-century physicists built excruciatingly detailed “mechanical” models to explain measured phenomena in mechanics and electromagnetism; these models explained and predicted a lot, but not enough. Progress accumulated through the 19th century, but the key addition to model making came in 1905 with Einstein’s paper on special relativity, which changed our conception of the model for space and time. Combined with the Schrödinger equation, the spin model of the electron, and the Pauli exclusion principle, it gave us the key that unlocked modern technology, all within the 22 years after 1905. Boom: the periodic table, chemical bonds, crystal theory, nuclear energy.

    The point here is that the four model elements I just mentioned can’t be easily conceptualized by most folks. I can’t; I have to work every day to keep them in my head.

    As far as the Standard Model of particle physics goes: it’s very successful and useful, has many parameters with no theoretical basis that are nonetheless effective in making the model work, is incomprehensible from a daily-life point of view, and is clearly just a stage in development.

    What does this have to do with the social sciences and understanding? A lot. We need a constant stream of models to test. We need data sets that are commonly agreed to have the informational power to distinguish between models. We need statistical evaluation methods to compare informative data sets with model predictions. (A toy sketch of this loop appears at the end of this comment.)

    The social sciences appear not to have the symmetry properties that physical systems have in their interactions. Using the symmetry properties of identical interacting elements in physical models allows us to simplify and factor the models so that experiments can distinguish between them. Even so, the amount of data needed to assert the existence of the Higgs particle boggles the mind.

    Rigor in obtaining informative data sets to evaluate well-defined models is not to be denied; data and models go together like chickens and eggs.

    As far as the theory of human interactions in the small and in the large goes: there is none…
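
    To make that testing loop concrete, here is a minimal sketch in Python. Everything in it (the data-generating process, the two candidate models) is invented purely for illustration: simulate data, fit two competing models, and score each against held-out observations.

    ```python
    # Sketch of the model-testing loop: propose competing models, then
    # use held-out data to see which one's predictions hold up.
    # All data and model choices here are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 4, size=200)
    y = 1.0 + 0.5 * x**2 + rng.normal(0, 1.0, size=200)  # "true" process

    train, test = np.arange(150), np.arange(150, 200)

    def held_out_rmse(degree):
        """Least-squares polynomial fit on train; RMSE on held-out test."""
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[test])
        return np.sqrt(np.mean((y[test] - pred) ** 2))

    for degree, label in [(1, "linear model"), (2, "quadratic model")]:
        print(f"{label}: held-out RMSE = {held_out_rmse(degree):.2f}")
    ```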

  3. All scientific studies are empirical—if there’s no data, it’s just math. I read Andrew’s post as implying that not all empirical studies are scientific. The problem is that it’s very hard to specify what counts as theory.

    Do we count the attention mechanism in large language models as a theory of language? It’s not much, but it’s better theory for language processing than most of what we saw in the 60 years of linguistic research following Chomsky’s _Aspects of the Theory of Syntax_.
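
    For concreteness, here is a minimal numpy sketch of scaled dot-product attention, the core operation referred to above. This is the textbook form (single head, toy dimensions, no learned projections), not any particular model’s implementation.

    ```python
    # Minimal scaled dot-product attention: each position attends to
    # every position and returns a weighted average of their values.
    # Toy dimensions; no learned projection matrices.
    import numpy as np

    def attention(Q, K, V):
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # query-key similarity
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # softmax over positions
        return w @ V

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 8))        # 5 "token" vectors, dimension 8
    print(attention(tokens, tokens, tokens).shape)  # self-attention -> (5, 8)
    ```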

    • This comment also relates to your other post connecting GLMs and DNNs. While “theory” can mean different things in different contexts, I think a useful perspective in statistical applications is to treat “theory” as whatever enables effective out-of-sample generalization.

      As you point out in that post, GLMs and DNNs are ultimately machines for relating predictors to outcomes. Regardless of which general function approximator you use, once the data and/or the model become sufficiently complex, the fit cannot necessarily be relied on to extrapolate correctly beyond its original domain. And as Andrew points out, in these kinds of complex scenarios almost any out-of-sample prediction is a form of extrapolation.

      In those cases, “theory” can take a few forms. It can take the form of a partitioning of variability into a multilevel structure based on some knowledge of how the data were produced. It could be a choice of functional form or predictors that is informed by some causal model of the processes involved. It could also be represented by informed priors on parameters.

      Of course, one could use any of the features I described in the last paragraph for technical reasons, as with regularization. But when the use of those features is motivated by an understanding of processes or structures involved in how the data were generated, then I think that counts as “theory”.
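
      A toy illustration of the priors point (all data and settings here are invented): fit a 9th-degree polynomial to noisy data by ordinary least squares, and again with Gaussian shrinkage of the coefficients toward zero (a ridge/MAP estimate standing in for an informative prior), then compare extrapolations beyond the training range.

      ```python
      # Toy example: an informative prior (here, shrinkage of polynomial
      # coefficients toward zero via a ridge/MAP estimate) tames
      # extrapolation where an unregularized fit can blow up.
      import numpy as np

      rng = np.random.default_rng(1)
      x = rng.uniform(0, 1, 30)
      y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 30)

      def design(x, degree=9):
          return np.vander(x, degree + 1, increasing=True)

      X, X_new = design(x), design(np.array([1.5]))  # x=1.5 is out of range

      # Flat prior: ordinary least squares.
      beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

      # Gaussian prior on coefficients -> ridge / MAP estimate.
      lam = 1e-3
      beta_map = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

      print("prediction at x=1.5, flat prior:", (X_new @ beta_ols).item())
      print("prediction at x=1.5, shrinkage prior:", (X_new @ beta_map).item())
      ```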

    • “The problem is that it’s very hard to specify what counts as theory.”

      I agree that there is a continuum from outlandish speculation to hypothesis to theory, but I don’t think it’s fair to say it’s hard to specify what counts as theory.

      A “theory,” in the briefest terms, is a set of predicted relationships supported by a substantial body of indisputable observations.

      The theories of evolution and plate tectonics emerged as concepts supported by thousands of indisputable observations. Evolution could not have emerged without a taxonomic system to classify life. Similarly, plate tectonics initially rested on a reliable rock-classification system, a basic understanding of the principles of stratigraphy, and fuzzy but workable concepts of how igneous rocks form.

      I know little about linguistics. It seems miles ahead of the rest of the social sciences. However, the social sciences as a whole, as they’re discussed on this blog, don’t seem to have anything even close to the simple fundamental classification systems that underlie evolution and tectonics. In fact, social-science categories are frequently based on concepts that everyone knows are unreliable – e.g., “race,” and *especially* race as self-identified. Even historians recognize that “race” as it’s used in any one moment is a highly subjective term and that “races” come and go over time. It’s not possible to construct reliable theories from unreliable classifications. I feel like Andrew and other social scientists quietly understand this but refuse to openly recognize it because they hope or believe it can be overcome with more powerful models. No. It can’t.

      The term “theory” as Andrew frequently uses it here means something akin to “rational speculation”: a speculation supported by some rationally conceivable relationship between phenomena. It is used here in opposition to the “irrational speculation” so common in the social sciences, like suicides rising on down days in the stock market.

      My view is biased, but much of what comes to my attention in the social sciences is little more than activism supported by improbable speculation aimed at overturning some widely recognized, indisputable knowledge. For example, the effort to overturn the idea that people are generally economically rational. That they are is indisputable. Or that the minimum wage is a highly beneficial way to increase people’s income/wealth. That it’s no such thing is indisputable, even from a cursory glance at the last five decades. No amount of statistical bullshit will undo this simple and clear observation. Yet a Nobel-Prize-winning economist relentlessly flogs both of these “theories.”

      Trying to claim that the sun is cold won’t lead to even a simple testable hypothesis, much less a “theory.” And yes, that’s an analogy; don’t take it literally. I realize that no one in the social sciences is literally claiming the sun is cold. I’m sorry to have to add this disclaimer, but such is the state of academia today.

      • I don’t think your comment here does much to separate models from non-models. You deride the notion that suicides might rise on down market days, calling it “irrational speculation.” I have no idea whether it’s correct, or whether the data support it, but it is a model of human behavior; to wit:
        down market days cost some people money;
        losing money makes some people depressed;
        the depressed are more likely to commit suicide.
        This is a model; like all models, it may describe actual human behavior poorly, and the data may or may not support it. But it’s clearly a model, and it suggests many lines of potential quantification and verification.
        The alternative, model-less (but nonetheless empirical) inquiry is: I have data on suicides and a host of contemporaneous indicia. Put them all in the regression blender and see what falls out. If the coefficient on the stock-market change happens to be significantly negative, publish a finding. Oh, and if it’s significantly positive, publish that instead. (A quick simulation of what that procedure yields is below.)
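
        That simulation, as a minimal sketch (entirely synthetic data): regress a pure-noise outcome on twenty unrelated predictors, many times over, and count how often at least one coefficient clears |t| > 2.

        ```python
        # "Regression blender" simulation: the outcome is pure noise, yet
        # with 20 candidate predictors some coefficient usually looks
        # "significant" by chance alone. Entirely synthetic data.
        import numpy as np

        rng = np.random.default_rng(2)
        n, p, sims, hits = 100, 20, 1000, 0

        for _ in range(sims):
            X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
            y = rng.normal(size=n)            # unrelated to every predictor
            beta = np.linalg.lstsq(X, y, rcond=None)[0]
            resid = y - X @ beta
            sigma2 = resid @ resid / (n - p - 1)
            se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
            if np.any(np.abs(beta[1:] / se[1:]) > 2):  # skip the intercept
                hits += 1

        print(f"runs with at least one |t| > 2: {hits / sims:.0%}")
        ```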
