This is Jessica. Like “guarantees”, “optimal” is one of those words that gets thrown around all the time in computer science, but which can be used in more and less cringey ways. Today’s post is about using “optimal” to describe empirically estimated quantities, especially those that depend on human behavior.
For example, imagine we employ some model (e.g., an LLM) to mimic some human judgments. We try various ways of doing this and determine that we get the best alignment of the model behavior and the human behavior under some particular configuration of the method, so we call it optimal. Or, maybe we are measuring human performance given some model predictions, and we do a grid search over different model parameter combinations and call the parameterization that coincides with the best performance the optimal one.
Using “optimal” in such cases bugs me because there is no way in which the specific estimated value is a consistent property of the process that produces it. Running the same data collection process again in either of the above cases will result in different “optimal” values. The value is only the best for some bespoke sample, often one where we can’t really describe the sampling process or even the population, so calling it “optimal” implies auxiliary assumptions we can’t back up.
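To make the replication point concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the accuracy curve `true_accuracy`, the grid of parameter values, and the sample size of 50 participants per grid point are all hypothetical, not taken from any real study. Each replication reruns the same grid search on a fresh sample and records which parameter value comes out on top.

```python
import random
from collections import Counter


def true_accuracy(theta):
    # Hypothetical ground truth (an assumption): accuracy peaks at theta = 0.5
    # and falls off gently on either side.
    return 0.70 - 0.4 * (theta - 0.5) ** 2


def run_study(rng, grid, n_participants=50):
    """One replication: estimate accuracy at each grid point from a fresh
    sample and return the grid point with the highest estimate."""
    estimates = {}
    for theta in grid:
        p = true_accuracy(theta)
        correct = sum(rng.random() < p for _ in range(n_participants))
        estimates[theta] = correct / n_participants
    # Ties go to the first grid point reaching the maximum.
    return max(grid, key=lambda t: estimates[t])


def replicate(n_replications=200, seed=0):
    """Repeat the whole grid search and count how often each value 'wins'."""
    rng = random.Random(seed)
    grid = [round(0.1 * k, 1) for k in range(1, 10)]
    return Counter(run_study(rng, grid) for _ in range(n_replications))


if __name__ == "__main__":
    winners = replicate()
    for theta, count in sorted(winners.items()):
        print(f"theta={theta:.1f} was the 'optimal' value in {count} of 200 replications")
```

Under these assumptions the winning value is spread across several grid points rather than sitting reliably at 0.5, which is the sense in which a single study's “optimal” configuration is not a stable property of the process.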
Instead, “optimal” should be reserved for cases where we are talking about idealized processes, e.g., where we can imagine hypothetical replications and talk about how the solution a method returns is consistent in some way. For example, when we fit a regression to some data, we can talk about the optimality of our estimator in the sense that it promises a certain relationship with the true values, but we should not call the specific fit that this process returns optimal, unless maybe we know we are dealing with the entire population.
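The estimator-versus-fit distinction can also be sketched in a few lines. The setup below is assumed for illustration only: a simple linear model with a true slope of 2.0, Gaussian noise, and 30 observations per dataset. It uses unbiasedness as one example of a "relationship with the true values" that holds over hypothetical replications, while any single fitted slope is just one draw.

```python
import random
import statistics


def simulate_slope(rng, n=30, true_slope=2.0, noise_sd=1.0):
    """Draw one dataset from an assumed linear model and return the
    least-squares slope estimate for that dataset."""
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [1.0 + true_slope * x + rng.gauss(0, noise_sd) for x in xs]
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def slopes_across_replications(n_replications=2000, seed=1):
    """Refit the regression on many independent datasets."""
    rng = random.Random(seed)
    return [simulate_slope(rng) for _ in range(n_replications)]


if __name__ == "__main__":
    slopes = slopes_across_replications()
    print(f"first single fit:        {slopes[0]:.2f}")
    print(f"mean over replications:  {statistics.fmean(slopes):.2f}")
    print(f"spread over replications: {statistics.stdev(slopes):.2f}")
```

In this sketch the average over replications sits close to the true slope, which is a statement about the procedure. The individual fits scatter widely around it, so no one of them earns the label "optimal" on its own.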
My impression is that perceiving things that are empirically learned as optimal tends to happen more often in machine learning than statistics. At the extreme, you get the Breiman-esque view that optimizing for prediction is all you need. I heard a version of this while participating in a panel on AI safety recently, where one of the other panelists announced that statistics will ultimately be replaced entirely by machine learning. Regardless of what the speaker might have meant by this, it’s easy to interpret such comments as saying that the output of the learning process is sufficiently “optimal” to not need much theory.
As I like to say, “Guarantee” is another word for “assumption”.
Another problematic term is “best,” as discussed in my post, “Best Linear Unbiased Prediction” is exactly like the Holy Roman Empire.
Legend
I’d like to add “robust” to the list of pet-peeve terms. In my field, this seems to often just be thrown in without any substantiation, like: “This index is a robust method of measuring X”.
Cool. Robust how?
I’ve heard my students talk about “more optimal” and “less optimal” fits of some model to data. I want to smack them on the head each time I hear that… apparently some people interpret the word “optimal” to be a synonym for “better”, and not for “best”.
This is why I miss my former job doing mixed integer programming. In that field it is quickly obvious that finding a nearly optimal solution is often orders of magnitude faster than proving optimality.
For empirical work, those optimizations (supply chain designs, or manufacturing schedules) were also simulated to show robustness. That type of study led to many changes in solvers to search for near-optimal but reasonably different solutions.
This is definitely the blog for tilting against established statistical terminology. I doubt the term “optimization” is going to go away as the name of the algorithmic process of minimizing or maximizing a loss function. Wikipedia calls it “mathematical optimization,” but if you go up to “optimization,” there are several sub-pages of disambiguation.
Bob:
I think the issue is not so much the terminology as the attitude. If you have people saying that their methods have “guarantees” and our methods do not, that’s annoying. Bayesians can be annoying too when they say that Bayesian inference is rationality so that non-Bayesian methods are irrational.
I don’t get this. “Optimal” is *always* model / context dependent. The question is the same as always: is the training data set representative of the population? Do the algo predictions fall within the required error level? If so, who cares if there is no absolute “empirical optimal”?
What’s the optimal speed for driving? There is no empirical answer. It’s just another question: what do you want to optimize for? Even “simple” stuff like what’s the “optimal” separation distance for EV chargers? If you live in Houston and you ski in Ruidoso, it ain’t the same goin’ as it is comin’ home.
It seems strange to suggest that “theory” should underpin ML. AFAIK, “ML” is a process by which data is fed into an algo, the algo creates a model, the model makes predictions, tests the predictions against actual outcomes, then continually self-revises to drive convergence between prediction and outcome. Theory is unnecessary. It is self-replicating.
I can imagine situations in which people apply a model to a population for which the training set is not representative. Then the obvious attack or criticism is on the difference between the target population and the training set. But ultimately unless the differences are glaring, a successful attack has to demonstrate that the results of the applied model are inaccurate for the specified purpose of the model. Again, it’s back to replicating. Theory isn’t enough.
People here raise the “theory” question when researchers employ dodgy methods to supposedly measure effects that are highly unlikely to be real. But it’s the weakest line of attack. In the end theory is kind of an accessory. The main point is replication. Many ancient cultures accurately predicted the movements of major stars and planets thousands of years before the heliocentric theory. How did they do it? Continually measuring and testing their predictions. No theory was required. Theory is unnecessary to determine whether something is true or false.
Anon:
Prediction is not the only goal of science. Newton’s laws predict planetary motion (ok, not perfectly, but pretty well) and they also come from a theory which is useful for all sorts of other purposes. You could have a Ptolemaic system that predicted planetary positions to 5 digits of accuracy and it wouldn’t be scientifically useful in the same way that Newton’s laws are. I don’t think theory is an accessory here at all. And I think the same about theories in the biological and social sciences as well. Prediction is an important application of theories, and it’s an important way to test theories, but it’s only part of the story.
“Prediction is not the only goal of science.”
Sure, I agree with that. But without reasonably accurate measurements and prediction, sound theory is not possible. As you point out with the many prior theories regarding planetary movements, replication and predictability come first. Theory comes second.
Above I said:
“…The main point is replication… ”
I feel like that’s incorrect. I want to revise it. The appropriate approach is to:
1) start with a method that has been shown to yield legitimate results for the problem at hand
2) verify those results through replication, ideally with an independent method that has also been shown to yield legit results
With that in mind, we should think of theory as a verified conceptual method. In physics, chem, bio, etc., theory is useful because there is a strong framework that has been tested repeatedly. Aside from providing explanations, the repeated testing / verification makes it useful – provides the foundation – for assessing further implications, which can then be tested. But without preceding verification, its explanatory and predictive power is weak. We might even imagine that the predictive power and explanatory value of theory is a function of its proximity to verified concepts within the theory.
With all of that in mind, it seems obvious that verified methods and replication are the keys to success. It’s fair to criticize underlying theory, but criticisms of methodology and of replication / repeatability are stronger.