In any case, the real difference between the quadratic and log rule is in how they treat low probability errors, as you point out. If there aren’t any of those, the (rescaled) log score and quadratic score will behave pretty similarly. The main reason the quadratic score is (in many cases) more sensitive to differences between moderate probabilities than the log score is really that it’s not hypersensitive to differences between low probabilities.

From 0.5 to 0.9 absolute error, there’s slightly more flattening from log loss.

Beyond 0.9 absolute error, the log loss grows sharply. So if there are lots of big misses (e.g. predictions of less than 10% for events that occur) then the log loss will be dominated by those.

So which does more flattening is going to boil down to what the absolute error distribution looks like.
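To make the tail behavior concrete, here is a small sketch (pure Python, with made-up predictions): once a prediction for an event that occurs drops below a few percent, the squared loss has essentially saturated, while the log loss keeps growing without bound.

```python
import math

def log_loss(y, y_hat):
    # negative log probability assigned to the observed outcome y in {0, 1}
    return -math.log(y_hat if y == 1 else 1.0 - y_hat)

def square_loss(y, y_hat):
    return (y - y_hat) ** 2

# a big miss: the event occurred (y = 1) but was given low probability
bad, worse = 0.02, 0.001

# squared loss barely moves between the two misses...
sq_gap = square_loss(1, worse) - square_loss(1, bad)

# ...while log loss grows by about 3 nats (log(0.02 / 0.001) = log 20)
log_gap = log_loss(1, worse) - log_loss(1, bad)
```

So a data set with even a handful of sub-percent predictions for events that occur will have its average log loss driven by those few items.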

Penalized maximum likelihood estimates try to minimize log loss on the training set (plus a negative log penalty term). For Bayes, we sample from the posterior density, which doesn’t find an optimal set of parameters for log loss, but log loss is still the objective defining the density from which we’re sampling. So we can calculate expected loss (for any of the loss measures we might want to use).
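For instance, given posterior draws of a success probability, the posterior expected loss is just the loss averaged over the draws. A minimal sketch (the Beta(8, 2) posterior and the single observed outcome are made up for illustration; real draws would come from your sampler):

```python
import math
import random

def log_loss(y, theta):
    return -math.log(theta if y == 1 else 1.0 - theta)

def square_loss(y, theta):
    return (y - theta) ** 2

# stand-in for MCMC output: draws from a hypothetical Beta(8, 2) posterior
random.seed(0)
draws = [random.betavariate(8, 2) for _ in range(4000)]

y = 1  # an observed outcome to score

# posterior expected loss = average loss over the draws
expected_log_loss = sum(log_loss(y, t) for t in draws) / len(draws)
expected_square_loss = sum(square_loss(y, t) for t in draws) / len(draws)
```

The same averaging works for any loss function you plug in, which is the point: the sampler gives you the posterior, and the loss measure is a separate choice.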

Interesting discussion. I’ll have to think about this.

Scaling them this way, log loss penalizes near misses more and extreme misses more. But either way, near misses are dominated by larger misses when the total loss measure is just averaged over items as log loss and squared loss are. For example, with squared loss, an error of 0.25 is 0.0625 squared error, whereas an error of 0.5 is 0.25 squared error, four times as large.
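The arithmetic, spelled out (a trivial check, but it shows where the flattening comes from):

```python
small_err, big_err = 0.25, 0.5

# squaring compresses the small error far more than the big one
small_sq = small_err ** 2  # 0.0625
big_sq = big_err ** 2      # 0.25

# a 2x larger error becomes a 4x larger squared loss
ratio = big_sq / small_sq
```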

$latex \mbox{logLoss}(y, \hat{y}) = -\log \mbox{bernoulli}(y \mid \hat{y}) = -\log \mbox{ifelse}(y, \hat{y}, 1 - \hat{y}).$

$latex \mbox{brierLoss}(y, \hat{y}) = (y - \hat{y})^2 \approx -\log \mbox{normal}(y \mid \hat{y}, 1).$

Plotting these out for the case where $latex y = 1$, we get

Squared error certainly has a lower dynamic range, but it still flattens small differences by squaring them.
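A quick numeric check of both points (values computed for the $latex y = 1$ case; the particular probabilities are arbitrary):

```python
import math

# squared loss is bounded above by 1, while log loss is unbounded:
# a prediction of 0.001 for an event that occurs costs almost 7 nats
sq_at_extreme = (1 - 0.001) ** 2   # just under 1
log_at_extreme = -math.log(0.001)  # about 6.9

# squaring flattens moderate differences: moving a prediction from
# 0.95 to 0.90 changes the error by 0.05 but the squared loss by
# only 0.0075
sq_diff = (1 - 0.90) ** 2 - (1 - 0.95) ** 2
```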

Hey, who turned off the MathJax?

logLoss(y, y_hat) = -log(y ? y_hat : 1 - y_hat)
squareLoss(y, y_hat) = (y - y_hat)^2
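Since the math rendering is down, here are those same two definitions as runnable Python (the C-style ternary `y ? y_hat : 1 - y_hat` becomes a conditional expression):

```python
import math

def logLoss(y, y_hat):
    # -log of the probability assigned to the outcome that occurred
    return -math.log(y_hat if y else 1.0 - y_hat)

def squareLoss(y, y_hat):
    return (y - y_hat) ** 2
```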