Raghu Parthasarathy writes:
Here’s a question about statistics, or more accurately the uses of statistics, that has puzzled me for many years: Why isn’t the Cramér-Rao lower bound (CRLB) more often invoked?
As I understand it, the CRLB gives us a bound on how good any estimator of some parameter can be. When I first learned about it—not as an undergraduate, not as a graduate student, not even as a postdoc, but as a faculty member—I was amazed. It’s relevant to work I was doing on finding the locations of single molecules in images, which is relevant to things like super resolution microscopy. I’ve inserted a mention of it in the course I’m currently teaching, on image analysis and related stuff. I’ve often wondered why the CRLB isn’t better known, and more importantly: Why isn’t it used more? Looking at all kinds of studies drawing grand conclusions from noisy data, I think to myself, given the noise, shouldn’t we ask how well even in principle we can know some parameter? What’s the best we can do? Of course, it’s true that we don’t necessarily have a good noise model in many of these cases, but surely we can estimate some kind of curvature of a log-likelihood function given the data distribution itself, a bit like determining the curvature of the blurry image of a fluorescent molecule. Is it that no one asks “What’s the CRLB for the estimator of obesity given noisy gut microbiome abundance data,” or am I missing something about how applicable this actually is?
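For readers who have not met it, here is a minimal statement of the bound being discussed. It assumes a scalar parameter, an unbiased estimator, and the usual regularity conditions. The second line is an idealized illustration of the localization problem mentioned above: it assumes N photon positions drawn independently from a Gaussian of known width s, with no background and no pixelation. The symbols N, s, and x_0 are introduced here for illustration and are not from the original question.

```latex
\[
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{I(\theta)},
\qquad
I(\theta) = -\,\mathbb{E}\!\left[\frac{\partial^2}{\partial\theta^2}\,\log p(x \mid \theta)\right]
\]
% Illustration (assumed model): x_1, ..., x_N ~ Normal(x_0, s^2), with s known
\[
I(x_0) = \frac{N}{s^2}
\quad\Longrightarrow\quad
\operatorname{Var}(\hat x_0) \;\ge\; \frac{s^2}{N}
\]
```

The expectation is taken under the model p(x | θ), so the bound is only as good as the model that defines it.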
My quick answer is that the lower bound depends on the model, and once you’ve fit the model, you have your inference, so no need for the lower bound anymore.
To put it another way, Raghu writes, “shouldn’t we ask how well even in principle we can know some parameter?” My answer is, yes, we can do this using fake-data simulation. Or, if the statement is, “surely we can estimate some kind of curvature of a log-likelihood function given the data distribution itself,” then my answer is that this is standard practice if you have a unimodal likelihood. It’s not called the “Cramér-Rao lower bound”; it’s just the normal approximation to the maximum likelihood estimate, or Bayesian inference, or the variational approximation, or whatever. The CRLB is implicit in whatever you’re doing, and once you have your standard error or other measure of uncertainty, there’s no need for the theorem. It’s more that knowledge of the theorem affects the sorts of estimates that we use.
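Here is a rough sketch of how these pieces line up in the simplest case. It is a toy version of the single-molecule problem, not anyone's actual analysis. It assumes photon positions are independent draws from a Gaussian of known width, and all names and numbers (`x0_true`, `s`, `n_photons`) are made up for illustration. It computes a standard error from the curvature of the log-likelihood, a standard error from fake-data simulation, and the CRLB.

```python
import numpy as np

rng = np.random.default_rng(0)

x0_true = 2.0     # hypothetical true position of the molecule (arbitrary units)
s = 1.5           # hypothetical, assumed-known width of the blur (same units)
n_photons = 400   # hypothetical number of detected photons per image


def neg_log_lik(x0, photons):
    """Negative log-likelihood of the center x0 (additive constants dropped)."""
    return 0.5 * np.sum((photons - x0) ** 2) / s**2


def curvature_se(photons, h=1e-3):
    """MLE and its standard error from the numerical curvature at the MLE."""
    x_hat = photons.mean()  # the MLE of the center under this Gaussian model
    second_deriv = (
        neg_log_lik(x_hat + h, photons)
        - 2.0 * neg_log_lik(x_hat, photons)
        + neg_log_lik(x_hat - h, photons)
    ) / h**2
    return x_hat, 1.0 / np.sqrt(second_deriv)


# One "observed" data set: standard error from the log-likelihood curvature.
photons = rng.normal(x0_true, s, size=n_photons)
x_hat, se_curv = curvature_se(photons)

# Fake-data simulation: rerun the experiment many times, look at the spread.
estimates = rng.normal(x0_true, s, size=(5000, n_photons)).mean(axis=1)
se_sim = estimates.std(ddof=1)

# Cramér-Rao bound on the standard error for this model.
crlb_se = s / np.sqrt(n_photons)

print(f"curvature SE: {se_curv:.4f}")
print(f"simulated SE: {se_sim:.4f}")
print(f"CRLB SE:      {crlb_se:.4f}")
```

In this well-behaved model the three numbers agree up to simulation noise, which is the sense in which the bound is already implicit in the usual standard error. With a misspecified or multimodal likelihood they need not agree.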
I wonder what the best practice is. If one takes simulated draws from the Bayesian posterior distribution, is there no need to check the CRLB? Or is it a matter of taste?
I apologize if my question does not make sense; I am a sociologist and have only a rudimentary knowledge of statistics.
I think the idea is that the CRLB is a general bound that always applies, whereas the Bayesian simulation gives a bound that applies more specifically to your particular scenario. There is no need for the CRLB if you are doing the Bayesian simulation; it will just give you a kind of conservative estimate of the simulation result.
The CRLB might be useful as a quick back-of-the-envelope calculation that requires less work than a full simulation.
Daniel, Thank you for the clarification.
Should be: “What’s the CRLB for the estimator of obesity given noisy gut microbiome abundance data and our model”
You could have a different model including an estimator with the same name, and the value depends on the model specification.
That is usually a bigger source of error than sampling error. You can check it with predictive skill on new data collected after the training data used to fit the model. Maybe you can apply this lower bound to that?
Raghu asks: “Looking at all kinds of studies drawing grand conclusions from noisy data, I think to myself, given the noise…:”
Andrew replies: “My quick answer is that the lower bound depends on the model…”
I’m not sure if Andrew’s answer is to the question that Raghu asked. At first glance it seems that most of the “studies drawing grand conclusions from noisy data” aren’t using the sophisticated models that Andrew refers to and uses in his own research – which is what enables them to draw the grand conclusions from noisy (and usually crappy and/or unreliable) data in the first place.
Just my impression…
A cheeky comment could be that the CRLB is ubiquitous because most models are OLS regression models that achieve the CRLB.
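A small numerical check of that claim, as a sketch under stated assumptions: a linear model with independent Gaussian errors of known standard deviation. The design matrix and the values of `n` and `sigma` are made up for illustration. Under those assumptions the covariance of the OLS estimator equals the inverse Fisher information, so OLS attains the bound. With non-Gaussian errors, OLS is still the best linear unbiased estimator but need not attain the CRLB.

```python
import numpy as np

rng = np.random.default_rng(1)

n, sigma = 50, 2.0  # hypothetical sample size and known noise SD
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor

# Fisher information for beta under y = X @ beta + Normal(0, sigma^2) noise.
fisher_info = X.T @ X / sigma**2
crlb_cov = np.linalg.inv(fisher_info)

# Covariance of the OLS estimator beta_hat = A @ y, with A = (X'X)^{-1} X'.
A = np.linalg.solve(X.T @ X, X.T)
ols_cov = sigma**2 * A @ A.T

print(np.allclose(ols_cov, crlb_cov))
```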
I think the point where the CRLB can help you is if you are looking for an estimator with certain desirable properties (the CRLB properties). If your estimator reaches the CRLB, case closed. You cannot do better, use this estimator.
Echoing Anoneuoid, the CRLB gives the theoretical lower bound on the variance of an unbiased estimator. But usually we are more interested in ‘how far away is the estimate from the true value’ than in ‘what is the minimum variance of the estimator’. The latter is what we can learn from the CRLB, and it is why the bound might be useful alongside fake-data simulation: one of the purposes of fake-data simulation is to perform power analyses, i.e. to find out whether an experimental setup could, under reasonable assumptions, detect the assumed signal. The CRLB could thus be used to calculate the minimum number of observations required for a given effect size.
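As a back-of-the-envelope sketch of that last idea: with n independent observations, the best possible standard error is 1/sqrt(n × information per observation), and a two-sided test then needs roughly n ≥ ((z_alpha + z_power) / effect)² / information. The function name and its arguments are hypothetical. The calculation assumes an efficient, unbiased estimator with an approximately normal sampling distribution, so the result is an optimistic floor under the assumed model, not a guarantee.

```python
import math
from statistics import NormalDist


def min_n_from_crlb(effect, info_per_obs, alpha=0.05, power=0.8):
    """Smallest n at which an estimator attaining the CRLB could detect
    `effect` with the given two-sided alpha and power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect) ** 2 / info_per_obs)


# Example: Gaussian data with unit variance, so the Fisher information
# per observation for the mean is 1; effect size of half a standard deviation.
n_min = min_n_from_crlb(effect=0.5, info_per_obs=1.0)
print(n_min)
```

Any real estimator, under the same model, will need at least this many observations.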
Incidentally, if you express the Schrödinger Lagrangian in QM in terms of a probability and a phase instead of the usual wave function, there’s a term in there formally identical to the Fisher information (of a location parameter). Consequently, you can use the Cramér-Rao bound to prove the Heisenberg uncertainty relation.
It’s a matter of taste whether you consider that an “application”.
You may enjoy this related article: https://www.ssca.org.in/media/37_SA22122024_Bhramar_Mukherjee_GE_Xiao_Li_Meng_01122024_FINAL_Finally.pdf
Thanks for the comments, everyone! I’ll try to digest all this. For now, I thought I’d insert a link to my post on the more recent (2024) instance of my image analysis course, rather than the 2022 course. https://eighteenthelephant.com/2025/01/10/image-analysis-course-recap-fall-2024/
This also intersects various frequent topics on this blog, like the impact of AI.
Great topic. Glad you asked that question. Incidentally, I read your popular science book on biophysics a few weeks ago and tremendously enjoyed it. I am sure your students enjoy these courses :)
Thank you!
I think I have a better understanding of my own question now, and of possible answers. (Thanks, Andrew, and everyone!) I agree with Andrew’s statement, “The CRLB is implicit in whatever you’re doing, and once you have your standard error or other measure of uncertainty, there’s no need for the theorem,” but I also agree with Anonymous, “At first glance it seems that most of the “studies drawing grand conclusions from noisy data” aren’t using the sophisticated models that Andrew refers to.”
I think in many fields, it is *not* standard to actually construct a model of the likelihood of observations. If it were, then yes, the CRLB would in some sense be implied. Instead one simply has observations and makes up some method of inferring some parameter from them — maybe time-series of species abundance, for example, from which one invents some procedure for inferring interactions. One might argue that this is a misguided approach, but here we are. Given this, taking a step back and saying, “here’s the likelihood function, what’s its curvature?” even if we don’t use this function for anything else (too computationally intensive?) would be a worthwhile step.
I’m moderately sure that what I’ve written makes sense, but not very sure!
Raghu, not sure if this is helpful, but here (slide 28) is an example from my doctoral dissertation where I derived the asymptotic standard error for a parameter of interest in a nonlinear regression model. The form of this turned out to be quite simple and gave insight into how hard it can be to estimate the parameter (on average, in sufficiently large samples) and what determines this.
It was derived from the Fisher information, but I never invoked the CRLB explicitly since I was always working in a likelihood/Bayesian framework.
I think of the CRLB as a bit like the complete class theorem or the Rao-Blackwell theorem: it’s one of a number of results effectively saying that in the absence of model misspecification, being approximately Bayesian (including likelihood-based methods) leads to estimators with sensible or optimal frequentist properties. People considering the trade-off between efficiency and robustness to misspecification (e.g. in semiparametrics) make use of CRLB-like theory quite a bit.
Thanks! Yes, it’s helpful (for me) to see a context in which someone *is* determining the asymptotic standard error, and that it’s useful to do so.