Also missing the right paren in:

log(a + b) = log(exp(log(a)) + exp(log(b))

Thanks for the writeup!

]]>http://www.nowozin.net/sebastian/blog/streaming-log-sum-exp-computation.html

log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
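For readers who don't use R, here is the same trick as a Python sketch (the function name is mine): factoring out the max makes every exponent non-positive, so the largest term exponentiates to exactly 1.0 and nothing overflows.

```python
import math

def log_sum_exp(xs):
    # Stable log(sum(exp(x) for x in xs)): subtract the max before
    # exponentiating, then add it back outside the log.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# The naive form math.log(sum(math.exp(x) for x in xs)) fails here,
# because exp(1000) is far beyond the largest double (~1.8e308).
print(log_sum_exp([1000.0, 1000.0]))  # 1000 + log(2) ~ 1000.693
```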

Among the cool properties of Hakaru, a functional probabilistic programming language embedded in Haskell, is that it automatically converts arithmetic to the more stable log scale behind the scenes.

]]>Fixed, thanks. No paren matching in the WordPress post editor. I should really be writing these in emacs.

]]>Hmmm… OK, so it’s best implemented at the application or, at most, the library level.

I guess there’s already mpmath for python: http://mpmath.org/

]]>They potentially could, but honestly it’d be forever before they were actually usable places.

Languages like C – the baseline and mainstay compiled languages – tend to have new versions published on a schedule. Once the new version is out (or just advanced enough that it’s clear what direction it’s going in), compiler developers go off and build support for the new features or changed features into their compilers.

The schedule on this compiler support, however, can be totally variable. As an example, let’s take C++11, the C++ standard from, well, 2011.

The two “main” compilers used for these kinds of languages are Clang and GCC – GCC as the default on a lot of GNU/Linux platforms, Clang as the default on OSX and others. Both got the finished C++11 Spec in September 2011. Clang switched over to C++11 in April 2013; GCC in June *2016*.

Now, 11 was a pretty big revision, so there’s that. But in a roundabout way what I’m getting at is that while a language abstraction might be doable, it could be anywhere between 2 and 5 years – even once that abstraction is accepted and supported – before it’s actually reliably usable on an inter-platform basis.

]]>Missing ‘)’ in

log_sum_exp(u, v) = max(u, v) + log(exp(u – max(u, v)) + exp(v – max(u, v))

]]>Could there be a non-floating point type defined for these cases? Maybe a compiler / language abstraction that relies on mapping things behind the scenes to finally the same old types supported by existing hardware.

Can those types coexist with the other regular types (say, within expressions) or are they such an exotic beast that type-casting and inter-conversion would be too onerous?

]]>There isn’t a much bigger data type with widespread hardware support. On laptops, desktops, and servers you can pretty much rely on the C ‘long double’ type compiling to a hardware-supported type with at least the 15 bits of exponent in the Intel 80-bit extended precision format, but I don’t think you can practically rely on more.

15 bits instead of 11 doesn’t get you very far when the exponent increases with both the sample size and the number of parameters — it only slightly postpones needing to know a bit about floating point computing. In contrast, going from single to double precision was a big deal, and let lots of people get away with ignoring rounding.

]]>From “An Interview with the Old Man of Floating-Point”

]]>The primary objection to Gradual Underflow arose from a fear that it would degrade the performance of the fastest arithmetics. Microcoded implementations like the i8087 and MC68881/2 had nothing to lose, but hardwired floating-point, especially if pipelined, seemed slowed by the extra steps Gradual Underflow required even if no underflows occurred. Two implementation strategies were discussed during meetings of the p754 committee. One used traps, one trap to catch results that underflowed and denormalize them to produce subnormal numbers, and another trap to catch subnormal operands and prenormalize them. The second strategy inserted two extra stages into the pipeline, one stage to prenormalize subnormal operands and another to denormalize underflowed results. Both strategies seemed to inject delays into the critical path for all floating-point operations, slowing all of them.

Is it an option to use a bigger data type? Or do they not exist?

Assuming these sorts of issues crop up in many other places in numeric/scientific computing, isn’t the time ripe for languages to internalize this handling under the hood?

]]>All of that… Plus there is an important branch based on the sign of b^2-4ac: negative gives 2 complex roots, positive gives 2 real roots, and zero gives 1 repeated root, so precision on the sign of b^2-4ac is critical as well… Not to mention degenerate cases, namely a=0, or c=0, or a and b=0, or a, b, and c all equal to 0.

]]>In fact, there /used/ to be some CPUs with FP units that would raise an exception whenever a number is subtracted from a number very close to it (this was pre-IEEE754 I think). While I never thought much about it, my understanding is that the consensus was that this was a bad/unhelpful idea. Perhaps it’s discussed in Bill Kahan’s papers somewhere.

]]>I don’t know such a term, although there might be. Perhaps saturation, but googling it seems it means something else. I looked it up in Higham’s Accuracy and Stability of Numerical Algorithms (great book for these questions, Chapter 4 is the one that covers error analysis of summation methods), and he doesn’t seem to introduce any such term, just uses rounding. I think from the point of view of floating-point arithmetic there isn’t anything terribly special about such rounding compared to any other rounding: the basic promise is that the result of 1-u is the mathematically precise number that is rounded to the nearest (according to a certain rule) representable number. So in a way it’s not a “special” occurrence the way underflow would be, because the promise is fulfilled (in an unhelpful kind of way). 1-u is just an ill-conditioned function of u near u=1, so it’s not really floating-point arithmetic’s fault that it fails.

]]>No: 0+1e-75 = 1e-75 (or the rounded version of 1e-75, since that number probably isn’t exactly representable), but 1+1e-75 = 1, because the closest IEEE float to 1 + 1e-75 is 1.

]]>You mean 0+1e-75 will set the underflow flag but 1+1e-75 won’t? Sounds strange.

]]>This is actually a proposed alternative to ieee type floating point (storing the logarithm of the magnitude of a number). The main difficulty is that it makes addition and subtraction a complex calculation, whereas multiplication/division is very easy. There was an article in American Scientist several years back about the merits and difficulties of this alternative.
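A minimal Python sketch of that representation (the names `log_mul` and `log_add` are mine, not from the article): each positive number is stored as its logarithm, so multiplication becomes a simple addition of logs, while addition requires the full log-sum-exp computation.

```python
import math

# Represent a positive number x by lx = log(x).

def log_mul(lx, ly):
    # log(x * y) = log(x) + log(y): multiplication is trivial.
    return lx + ly

def log_add(lx, ly):
    # log(x + y) needs log-sum-exp: shift by the max to stay stable.
    m, n = max(lx, ly), min(lx, ly)
    return m + math.log1p(math.exp(n - m))

l2, l3 = math.log(2.0), math.log(3.0)
print(math.exp(log_mul(l2, l3)))  # ~6.0
print(math.exp(log_add(l2, l3)))  # ~5.0
```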

]]>However, the processor WILL silently round non-special values for example 1+1e-75 = 1 without any warning. Underflow specifically means calculating a number that is too close to 0 to be represented as different from 0. For example exp(-770) = 0 (underflow status set, I’m guessing).
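Both behaviors are easy to check in Python with plain doubles: the tiny addend survives next to 0 but is rounded away entirely next to 1, and exp of a very negative argument underflows to 0.0 without complaint.

```python
import math

# 1e-75 is representable on its own, but the nearest double
# to 1 + 1e-75 is exactly 1.0, so the addend vanishes.
print(0.0 + 1e-75 == 0.0)  # False
print(1.0 + 1e-75 == 1.0)  # True

# Underflow proper: the result is too close to 0 to represent.
print(math.exp(-770.0))    # 0.0, silently
```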

]]>Interestingly R seems to have a package to deal with very large / very small numbers.

https://cran.r-project.org/web/packages/Brobdingnag/index.html

Not sure if that’s relevant in cases like this one. It seems to be doing the same thing, just behind the scenes.

]]>Here’s what the gnu c library says:

http://www.gnu.org/software/libc/manual/html_node/FP-Exceptions.html

Specifically, it looks like at the processor level, when underflow occurs the processor sets a register (called the “status word” in this documentation) to let you know. The GNU C library is then responsible for checking that status, seeing whether a “trap” has been set, and executing the appropriate signal-handling code. What happens in other languages depends on their runtime and exception-handling mechanisms. But the processor itself doesn’t silently underflow.
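From Python you can observe the same status flags through NumPy, which checks them after each operation; a sketch, assuming NumPy is available:

```python
import numpy as np

# NumPy only warns by default; errstate lets you trap underflow as an error.
with np.errstate(under='raise'):
    try:
        np.exp(np.float64(-770.0))
    except FloatingPointError as e:
        print("trapped:", e)
```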

]]>If you can do this kind of peephole optimization and throw compiler warnings I think it would be valuable. If the compiler warning is sufficiently informative the user should be able to figure out what went on:

“Warning: converted log(1-x) to equivalent computation log1m(x) for numerical stability at line LLL “

or “Warning: converted SOMECODEHERE to equivalent form log_sum_exp(SOMEOTHERCODE) for numerical stability at line LLL”

The thing that might be difficult is to figure out whether the numerical stability gain is really there. For example, suppose someone has log(1+x) in their code; if x is very large then log1m(-x) might not necessarily be more numerically accurate, since it’s probably optimized for the case where x = O(1).

So, it might be better to simply throw warnings like “consider converting log(1-x) to log1m(x) for numerical accuracy if x is expected to be close to 1 at line LLL”
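The gain from this family of functions is easy to demonstrate with the standard `math.log1p` in Python: for small x, forming 1 + x explicitly rounds away most of x’s digits before the log ever sees them.

```python
import math

x = 1e-12
# log(1 + x) first rounds 1 + x to the nearest double, losing digits of x;
# log1p(x) computes log(1 + x) without forming 1 + x explicitly.
print(math.log(1.0 + x))  # ~1.00009e-12, wrong in the 5th digit
print(math.log1p(x))      # ~9.999999999995e-13, correct

# For x near 1 (the log(1-x) case discussed above), the analogous
# rewrite is math.log1p(-x) in place of math.log(1.0 - x).
```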

]]>In what language? I don’t know what R does. The built-in C++ library has underlying flags that get set when operations like exp() overflow. You query them as global constants and they can be reset. C++11 gives you more flexibility to set global options that cause these events to raise exceptions. I don’t think there’s anything that will detect when x + epsilon evaluates to x because epsilon is too small relative to x. Integer overflow was the cause of binary search being broken (or at least not scalable) in Java for many years; see Josh Bloch’s excellent post Extra, Extra – Read All About It: Nearly All Binary Searches and Mergesorts are Broken. Bloch recommends Bentley’s *Programming Pearls*, which also made a deep impression on me when I was starting to get serious about programming as opposed to algorithm analysis (that’s what a move from academia to industry does).
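For comparison, Python’s math module is one example of the exception-raising policy, but only for overflow; the silent cases stay silent.

```python
import math

# Overflow raises an exception in Python...
try:
    math.exp(1000.0)
except OverflowError:
    print("exp(1000) overflows")

# ...but "x + epsilon == x" absorption goes undetected:
print(1e17 + 1.0 == 1e17)  # True: 1.0 is below half an ulp at 1e17
```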

The symbolic solution (-b ± sqrt(b^2-4ac)) / (2a) can help us figure out what goes wrong. If b^2-4ac is the difference of two large numbers then it can be subject to cancellation: although b^2-4ac might be something like 5, that’s because b^2 is about 10 million and 4ac is 5 less than 10 million.

Next, -b ± sqrt(foo): if foo is close to b^2 then again you get cancellation errors. Also foo ≈ b^2 when 4ac ≈ 0 relative to b^2, which can happen when a is near zero, which means that 1/(2a) could potentially overflow.

all of these issues are similar to the ones involved in log_sum_exp
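A standard fix for both problems, sketched in Python (real-root case only, assuming a != 0 and a nonnegative discriminant): compute the root whose terms have matching signs, so they never cancel, then recover the other root from Vieta’s product x1 * x2 = c/a.

```python
import math

def quadratic_roots(a, b, c):
    # Real roots of a*x^2 + b*x + c = 0; assumes a != 0 and b^2 >= 4ac.
    d = b * b - 4.0 * a * c
    # b and copysign(sqrt(d), b) share a sign, so their sum never cancels.
    q = -0.5 * (b + math.copysign(math.sqrt(d), b))
    return q / a, c / q  # Vieta: (q/a) * (c/q) = c/a

# b^2 dominates 4ac here; the naive formula loses the small root
# to cancellation, while c/q recovers it accurately.
print(quadratic_roots(1.0, -1e8, 1.0))  # ~ (1e8, 1e-8)
```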

I was a TA for a stats class where the professor assigned a problem in which there was a simple probability density that was constant for an interval and then dropped linearly to zero. She asked the students to find the probability of a random draw being in a certain interval in which the triangular portion was just a little bit of the interval. The students wound up having to either be smart about it, or solve a quadratic that had bad properties when the calculations were carried out by hand on a calculator and paper (i.e., subject to roundoff at, say, the 4th decimal digit, which is typical of engineering paper calculations). Many people got wrong answers even though they clearly knew the method needed. The professor said she hadn’t really thought about how solving a quadratic could be difficult.

]]>The source-to-source compiler that takes a Stan program and produces a C++ header file defining a class is factored through an abstract syntax tree. What we’re talking about here are algebraic transforms (called “peephole optimizations”), that would replace `log(1 + u)` with `log1p(u)`. One issue that arises, as with much optimization, is that runtime error messages will reference the log1p() function, not the log() function; fixing that issue would be much more involved, as we’d have to track the original and transformed version and pass them to where errors arise at runtime.

Thanks. I’m a numerical computing novice, so I appreciate it. Is there a finer-grained term than “rounding” for this behavior when you add/subtract something too small (ideally something not confusable with another standard operation, like rounding to integers)? I know “catastrophic cancellation”, but that’s what happens with 1-u when u is close to 1.

]]>Mjskay:

Doing this correction automatically could be tricky, but we are planning to write a parser that catches common errors and flags them for the user. So this could be one of the patterns in our list.

]]>Small nitpick: 1-u is /rounded/ to 1, because overflow specifically means that the number is larger than the largest representable floating-point number (about 1.8e308 for doubles). The difference in terminology is that when 1-u is rounded to 1 the /relative/ error is still very small, but if u underflowed to 0, the relative error would be 100%.

]]>