Likelihood from quantiles?

Michael McLaughlin writes:

Many observers, esp. engineers, have a tendency to record their observations as {quantile, CDF} pairs, e.g.,

x      CDF(x)
3.2    0.26
4.7    0.39

etc.

I suspect that their intent is to do some kind of “least-squares” analysis: computing theoretical CDFs from a model, e.g. Gamma(a, b), then regressing the observed CDFs against the theoretical values, iterating the model parameters to minimize something, perhaps the K-S statistic. [See the sketch just below the email.]

I was wondering whether standard MCMC methods would be invalidated if the likelihood factor were constructed using CDFs instead of PDFs (or probability masses). That is, the likelihood would be the product of F(x) values instead of the derivative, f(x). My intuition tells me that it shouldn’t matter since the result is still a product of probabilities but the apparent lack of literature examples gives me pause.
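
[A minimal sketch of the least-squares fit described in the email, assuming a Gamma(a, b) model and the two example {quantile, CDF} pairs; the starting values and optimizer are illustrative choices, not anything Michael specified:]

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import gamma

    x = np.array([3.2, 4.7])    # observed quantiles
    p = np.array([0.26, 0.39])  # recorded CDF values

    def sse(theta):
        """Squared gap between the model CDF at each x and the recorded CDF value."""
        a, b = theta
        if a <= 0 or b <= 0:    # keep the optimizer in the valid parameter region
            return np.inf
        return np.sum((gamma.cdf(x, a, scale=b) - p) ** 2)

    fit = minimize(sse, x0=[1.0, 5.0], method="Nelder-Mead")
    a_hat, b_hat = fit.x

[Swapping the sum of squares for np.max(np.abs(gamma.cdf(x, a, scale=b) - p)) would target a K-S-style criterion instead.]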

My reply: I don’t know enough about this sort of problem to give you a real answer, but in general the likelihood is the probability distribution of the data (given parameters), hence in setting up the likelihood you want to get a sense of what the measurements actually are. Is that “3.2” measured with error, or are you concerned with variation across different machines or whatever? Once you know this, maybe you can model the measurements directly.

8 thoughts on “Likelihood from quantiles?”

  1. There is quite an old tradition of estimating models based on quantiles. A good example is the discussion in Aitchison and Brown (1959) on estimating the three-parameter lognormal, i.e. X such that log(X – a) ~ N(m, s). Formal likelihood methods fail since the likelihood becomes unbounded as ahat -> X_(1) from below, but it is easy to construct estimators based on a few sample quantiles that are highly efficient. This can be viewed as a GMM/GEE approach if one takes the view that “quantiles are moments too, of a sort.” Many early robust estimators are also of this type, associated with, among others, Mosteller, Tukey and Gastwirth. (A sketch of one such quantile-based estimator appears after this thread.)

    • Some guy wrote this series of papers too, which seems relevant: http://www.econ.uiuc.edu/~roger/research/home.html

      I remember a point early in grad school where I was looking at the distribution of wages in some country before and after a minimum wage was instituted. So I looked at the time series of the quantiles: I calculated the 10th, 20th, etc. percentiles of the wage distribution in each quarter, and then ran a least-squares regression with each period’s quantile on the left-hand side (so T time periods’ worth of observations per regression, and Q separate regressions, one per quantile). It was suggested to me that there are more refined ways to do that. But maybe that kind of approach makes sense sometimes (or some smarter version of it), or is it just that you can use the quantiles all at once to fit some parametric distribution to the data?

      I would also like to take this opportunity to suggest a series of blog posts on interpreting estimates of effects across the outcome distribution: quantile treatment effects, conditional quantile regression coefficients, and those “unconditional quantile regressions” that use the re-centered influence function. I hear terrible interpretations of results from these kinds of estimators all the time, but I wouldn’t say that means I always have good interpretations.
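
    [A sketch of the quantile-based estimator alluded to above, for the three-parameter lognormal log(X – a) ~ N(m, s): with sample quantiles at probabilities 1-p, 0.5, and p, the threshold a has a closed form, and m and s follow. The levels 0.1/0.5/0.9 below are an illustrative choice, not Aitchison and Brown’s exact recipe.]

        import numpy as np
        from scipy.stats import norm

        def lognormal3_from_quantiles(x, p=0.9):
            """Estimate (a, m, s) from the 1-p, 0.5, and p sample quantiles."""
            q_lo, q_med, q_hi = np.quantile(x, [1 - p, 0.5, p])
            # (q_lo - a) * (q_hi - a) = (q_med - a)^2  =>  solve for the threshold a
            a = (q_lo * q_hi - q_med**2) / (q_lo + q_hi - 2 * q_med)
            m = np.log(q_med - a)                      # median of X - a is exp(m)
            s = (np.log(q_hi - a) - m) / norm.ppf(p)
            return a, m, s

        rng = np.random.default_rng(0)
        x = 2.0 + rng.lognormal(mean=1.0, sigma=0.5, size=5000)
        print(lognormal3_from_quantiles(x))  # should land near the true (2, 1, 0.5)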

  2. If the correspondent is looking for examples in the literature, I think survival analysis would be a good place to start. There, censoring is a common problem (i.e. failure time is not observed for all units). Fitting a parametric model to these kinds of data involves a likelihood composed of terms derived from the pdf where failure is observed, and terms derived from the cdf where it is not (e.g. all we know is that an individual lived for *at least* 2 years after they took the pill / got the surgery / etc.), as sketched below.
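
    [A minimal sketch of this censored-data likelihood, assuming an exponential failure-time model; the times and censoring flags are made up for illustration:]

        import numpy as np
        from scipy.stats import expon

        t = np.array([1.1, 0.4, 2.3, 2.0, 0.7])              # follow-up times
        failed = np.array([True, True, False, True, False])  # False = censored (still alive)

        def log_lik(rate):
            scale = 1.0 / rate
            ll_events = expon.logpdf(t[failed], scale=scale).sum()    # observed failures: pdf terms
            ll_censored = expon.logsf(t[~failed], scale=scale).sum()  # lived at least t: log(1 - CDF) terms
            return ll_events + ll_censored

        rate_hat = failed.sum() / t.sum()  # closed-form MLE for this model: events / total exposure
        print(rate_hat, log_lik(rate_hat))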

  3. Unless I’m missing something, the CDF values are just non-invertible functions of the raw observations, and hence are information-theoretically superfluous (but not computationally superfluous, obvs).

    • Aren’t CDFs generally invertible functions? By CDF value does the original post author mean just rank? or does he mean that some CDF has been chosen and they’re just calculating that as an extra step in the data collection process? I’m confused. I certainly don’t know anyone who does this kind of thing, though I have definitely seen people take data in rank order, which is obviously related, in that CDF(x) = Rank(x)/N if you’re using the empirical CDF at least.
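
      [A small sketch of the rank connection just mentioned: the empirical CDF evaluated at each data point is its rank divided by N.]

          import numpy as np

          x = np.array([4.7, 3.2, 5.1, 2.9])
          ranks = x.argsort().argsort() + 1  # rank 1 = smallest (assumes distinct values)
          ecdf_at_x = ranks / len(x)         # empirical CDF(x) = Rank(x) / N
          print(dict(zip(x, ecdf_at_x)))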

  4. “That is, the likelihood would be the product of F(x) values instead of the derivative, f(x). My intuition tells me that it shouldn’t matter since the result is still a product of probabilities but the apparent lack of literature examples gives me pause.”

    Consider CDF(x) = ∫_{−∞}^{x} pdf(z) dz

    so the value of the CDF at x depends mostly on the pdf at points far away from x (everything in the left tail). How much information does the CDF provide about what happens near x? Basically none. The pdf, by contrast, tells you specifically how “likely” (in the purely nontechnical sense of the word) data in the vicinity of x is.

    Of course, ultimately, the probability of getting any individual real-numbered result is zero, so people are often confused about how a probability density is supposed to yield a non-zero probability at all. The answer is at least in part that every measurement instrument has finite precision, so the pdf is really just an approximation to a probability mass function, for example over the 24 bits of possible results from your A/D converter, or over the numbers representable as IEEE 32-bit floating point numbers.
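
    [A quick numeric illustration of both points, using a standard normal: the probability mass in a finite-precision bin around x is F(x + d/2) - F(x - d/2), approximately pdf(x) * d, while F(x) itself accumulates the whole left tail rather than describing behavior near x.]

        from scipy.stats import norm

        x, d = 1.0, 1e-3                      # measurement and instrument resolution
        mass = norm.cdf(x + d / 2) - norm.cdf(x - d / 2)
        print(mass, norm.pdf(x) * d)          # nearly identical: mass in the bin ~ pdf(x) * d
        print(norm.cdf(x))                    # ~0.841, dominated by the left tail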

  5. Some confusion here between observed statistics and parameters?
    – The query was pointing to observed quantiles, and since the likelihood is simply the probability of what was observed, they are well defined (and, as David points out, never continuous).

    The details can be found in the appendix on order statistics in Cox and Hinkley or in my DPhil thesis (a section on L statistics) http://statmodeling.stat.columbia.edu/wp-content/uploads/2010/06/ThesisReprint.pdf
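
    [A sketch of the order-statistics likelihood this points to: the joint log density of k observed order statistics, with ranks r_1 < ... < r_k out of n, under a candidate model with cdf F and pdf f. The normal model in the example is only an illustration; see Cox and Hinkley for the general formula.]

        import numpy as np
        from scipy.special import gammaln
        from scipy.stats import norm

        def order_stat_loglik(x, ranks, n, dist):
            """x: order-statistic values (ascending); ranks: their ranks within 1..n."""
            r = np.concatenate(([0], ranks, [n + 1]))
            F = np.concatenate(([0.0], dist.cdf(x), [1.0]))
            gaps = np.diff(r) - 1                  # unobserved counts between successive ranks
            ll = gammaln(n + 1)                    # log n!
            ll += dist.logpdf(x).sum()             # density at each observed order statistic
            ll += np.sum(gaps * np.log(np.diff(F)) - gammaln(gaps + 1.0))
            return ll

        # e.g. the pairs from the post, read as the 26th and 39th of n = 100 draws:
        print(order_stat_loglik(np.array([3.2, 4.7]), np.array([26, 39]),
                                n=100, dist=norm(4.0, 2.0)))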

  6. You should be able to use a standard Bayesian model. For a continuous response, specifying the likelihood specifies the quantile function and vice versa. Then for model comparison you can compare the posterior distribution of your quantiles to the theoretical quantiles using a loss function of your choice (e.g. least squares); a sketch of this check appears after the links below.

    If quantiles are of interest, it can be helpful to think about specifying the quantile function instead of the distribution. There is no need to use alternative likelihood approaches (e.g. substitution likelihood). Brian Reich and I created an R package entitled Bayesian Simultaneous Quantile Regression (BSquare) that simultaneously models the effects of a predictor across all quantiles. Details are found on Brian’s website:

    http://www4.stat.ncsu.edu/~reich/QR/BSquare.pdf

    and in Chapters 2-3 of my thesis:

    http://repository.lib.ncsu.edu/ir/bitstream/1840.16/9465/1/etd.pdf
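
    [A minimal sketch of the model check described here, assuming posterior draws of the parameters are already in hand; the normal model and the “draws” below are placeholders, not output of a real fit:]

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(1)
        x = rng.normal(10, 2, size=200)             # the data
        probs = np.linspace(0.05, 0.95, 19)
        q_emp = np.quantile(x, probs)               # empirical quantiles

        mu_draws = rng.normal(10, 0.15, size=1000)  # stand-ins for posterior draws
        sigma_draws = np.abs(rng.normal(2, 0.1, size=1000))

        # posterior distribution of the quantile-mismatch loss (least squares here)
        losses = [np.sum((norm.ppf(probs, m, s) - q_emp) ** 2)
                  for m, s in zip(mu_draws, sigma_draws)]
        print(np.mean(losses))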
