Nice

Thank you!

anoop

I am a systematic portfolio manager trading in the futures markets and this work has not only generated a lot of new ideas but has me questioning prior work at a fundamental level.

Because of its Bayesian approach, I consider McElreath’s book a must read for statisticians. Paraphrasing DV Lindley: Bayesian Statistics is the 21st Century Statistics.

It took me a while to get a chance to sample the online lectures – excellent, full of sensible insight put in ways (metaphors) most likely to cause (some arguably useful) understanding by non-statistical grad students.

And no overdone frequency approach bashing!

In the general case, you can specify nearly anything about the distribution and there might still be a maximum entropy distribution that satisfies those constraints (though there doesn’t always have to be one). The general case of specifying known values for various moments has been more or less worked out. But you could specify other things: the pdf has peaks at 0 and 1, the mean value is 2, the 95th percentile is 5, and q(x) has interquartile range 1 to 4.5 for some given strange nonlinear function q, or whatever.

Getting the maximum entropy distribution for a sufficiently weird set of constraints like that might require numerical approximation, for example writing the log density in a basis expansion and solving numerically for the coefficients.
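A minimal sketch of that numerical route, assuming the support can be truncated to a uniform grid and that every constraint has the form E[f_k(X)] = m_k. The log density is then a linear combination of the feature functions f_k, and the coefficients can be found by minimizing the convex dual log Z(λ) − λ·m with damped Newton steps. The function name `maxent_on_grid`, the grid truncation, and the Newton solver are illustrative choices, not a reference to any existing package.

```python
import numpy as np

def maxent_on_grid(x, features, targets, iters=100, tol=1e-10):
    """Hypothetical helper: max-ent density on a uniform grid x subject to
    E[f_k(X)] = targets[k]. The solution has the form
    p(x) proportional to exp(sum_k lam_k * f_k(x)); we find lam by minimizing
    the convex dual  log Z(lam) - lam . targets  with damped Newton steps."""
    x = np.asarray(x, dtype=float)
    dx = x[1] - x[0]                                # assumes uniform spacing
    F = np.column_stack([f(x) for f in features])   # shape (n_grid, n_features)
    m = np.asarray(targets, dtype=float)

    def dual(lam):
        a = F @ lam
        amax = a.max()                              # log-sum-exp for stability
        return amax + np.log(np.sum(np.exp(a - amax)) * dx) - lam @ m

    lam = np.zeros(F.shape[1])                      # start from the uniform density
    for _ in range(iters):
        a = F @ lam
        w = np.exp(a - a.max())
        p = w / w.sum()                             # probabilities of grid points
        mean_f = p @ F                              # E_p[f]
        grad = mean_f - m                           # gradient of the dual
        if np.max(np.abs(grad)) < tol:
            break
        centered = F - mean_f
        hess = centered.T @ (centered * p[:, None]) # Cov_p[f], Hessian of the dual
        step = np.linalg.solve(hess, grad)
        # Armijo backtracking keeps each Newton step a descent step
        t, d0 = 1.0, dual(lam)
        while dual(lam - t * step) > d0 - 1e-4 * t * (grad @ step) and t > 1e-12:
            t *= 0.5
        lam = lam - t * step

    a = F @ lam
    w = np.exp(a - a.max())
    density = w / (w.sum() * dx)                    # normalized on the grid
    return lam, density
```

With features x and x² and targets 0 and 1 on a wide grid such as `np.linspace(-10, 10, 4001)`, the coefficients should come out close to (0, −1/2), which is the standard normal. Passing `np.abs` as the only feature with target 1.0 should instead give λ close to −1, a Laplace density. Stranger constraints (percentiles, peaks) would need other features or a different formulation; this sketch only covers expectation constraints.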

In many cases, even if that’s the true set of information you have, you could work with a simpler problem (i.e., just the peaks at 0 and 1 and the mean value of 2 might be enough to get useful results). In some sense the reason the normal distribution is so useful and common is that it’s one of the “simplest” maximum entropy distributions (i.e., it contains very little information), especially if you are hierarchically modeling the value of the standard deviation.

But yes, you can specify a mean absolute deviation and get a Laplace-type distribution too. That turns out to be the Bayesian interpretation of the “LASSO”, I guess. I often use exponential distributions as priors over parameters whose approximate order of magnitude I know (i.e., “on this scale it’s a positive number about 3”, so exponential(1/3.0), with rate 1/3 and hence mean 3, is the max-ent prior).
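One quick way to see why the exponential is the max-ent choice for “a positive number about 3” is to compare closed-form differential entropies of a few positive-support distributions that all have mean 3. The competitors here (half-normal, uniform on [0, 6]) were picked purely for illustration, and the entropy formulas are the standard textbook ones. The sketch also reads exponential(1/3.0) as a rate of 1/3, which is how Python’s `random.expovariate` is parameterized.

```python
import math
import random

MEAN = 3.0  # "on this scale it's a positive number about 3"

def entropy_exponential(mean):
    # Exponential with rate 1/mean: H = 1 + ln(mean)
    return 1.0 + math.log(mean)

def entropy_half_normal(mean):
    # Half-normal with scale sigma has mean sigma*sqrt(2/pi);
    # its entropy is 0.5*ln(pi*sigma^2/2) + 0.5
    sigma = mean * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(math.pi * sigma**2 / 2.0) + 0.5

def entropy_uniform(mean):
    # Uniform on [0, 2*mean]: H = ln(width)
    return math.log(2.0 * mean)

entropies = {
    "exponential": entropy_exponential(MEAN),
    "half-normal": entropy_half_normal(MEAN),
    "uniform": entropy_uniform(MEAN),
}
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {h:.4f}")

# random.expovariate takes the *rate*, so 1/3.0 gives draws with mean about 3
random.seed(0)
draws = [random.expovariate(1 / MEAN) for _ in range(100_000)]
print(f"sample mean of expovariate(1/3): {sum(draws) / len(draws):.2f}")
```

Beating two competitors does not prove the max-ent property, which needs the variational argument. The comparison is only a sanity check that the exponential sits on top at a fixed mean.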

JD: You’ll likely enjoy the derivations in Chapter 9.

If there is a finite variance, then there is also a finite mean. You get that moment by implication, which is why it isn’t listed as a constraint. If you assume a mean absolute deviation but say nothing about the variance, the maxent distribution is the double exponential (Laplace).
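To make the mean-absolute-deviation case concrete, assume the support is the whole real line, the center μ is known, and E|X − μ| is fixed. The sketch below compares the closed-form differential entropies of three symmetric distributions matched to the same MAD. The normal and logistic were chosen only as illustrative competitors, and the formulas are the standard ones.

```python
import math

MAD = 1.0  # fixed mean absolute deviation E|X - mu| about a known center mu

def entropy_laplace(mad):
    # Laplace with scale b has E|X - mu| = b and entropy 1 + ln(2b)
    return 1.0 + math.log(2.0 * mad)

def entropy_normal(mad):
    # Normal: E|X - mu| = sigma*sqrt(2/pi), entropy 0.5*ln(2*pi*e*sigma^2)
    sigma = mad * math.sqrt(math.pi / 2.0)
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)

def entropy_logistic(mad):
    # Logistic with scale s: E|X - mu| = 2*s*ln(2), entropy ln(s) + 2
    s = mad / (2.0 * math.log(2.0))
    return math.log(s) + 2.0

entropies = {
    "laplace": entropy_laplace(MAD),
    "normal": entropy_normal(MAD),
    "logistic": entropy_logistic(MAD),
}
for name, h in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:9s} {h:.4f}")
```

At a fixed MAD the Laplace comes out on top and the normal comes last. That ordering flips when the variance is held fixed instead, which is why the choice of constraint matters.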

I mainly use maxent in the course to derive likelihoods (aka data priors), not parameter priors. So that’s why I don’t focus on fixed distributions, but rather conditional distributions. Hopefully that makes the issue clearer.

Yes, I’m hearing a lot about the corrupted Kindle version. I’ve let CRC Press know, but I don’t think they actually produce the Kindle edition, so I’m not sure how many subcontractor steps it will take until it is corrected.

I am reading the McElreath book on the VitalSource bookshelf. This is quite an improvement over Kindle, although it still has some annoying aspects.

In general CRC Press is doing a better job than many other stats publishers by releasing books on Kindle that look exactly like the print version; Springer has outperformed CRC Press (recently?) by allowing people to just buy the pdf and read it like a regular pdf file.

Another surprise was the cost of the McElreath book on Kindle; even BDA3 is 10 Euros cheaper. CRC Press should reduce the online books’ prices. If I assign the McElreath book to students here in Potsdam, many will not be able to afford it.

I wish someone would figure out a way to make more academic books readable on the Kindle.

At the moment, while I don’t think I “buy into” the Maximum Entropy Principle, I do think it is interesting.

But I was watching some of McElreath’s video lectures and something struck me as odd. It sounded like he was saying that if you want a prior with support on the real line and a finite variance, then the MaxEnt prior will be the normal distribution. This isn’t accurate, right?

I’m no MaxEnt expert, but it seems to me that the more precise statement would be this: if you want a prior with support on the real line and the only other thing you know is that the variance is a particular number, then the normal with this variance is the MaxEnt distribution.
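The precise statement can be checked with the textbook Lagrange-multiplier argument. In the sketch below, μ is assumed to be a known center and σ² the specified variance. This is the standard derivation, not something taken from the lectures.

```latex
\mathcal{L}[p] = -\int_{-\infty}^{\infty} p(x)\ln p(x)\,dx
  + \lambda_0\Big(\int_{-\infty}^{\infty} p(x)\,dx - 1\Big)
  + \lambda_1\Big(\int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx - \sigma^2\Big)

\frac{\delta \mathcal{L}}{\delta p(x)} = -\ln p(x) - 1 + \lambda_0 + \lambda_1 (x-\mu)^2 = 0
\;\Longrightarrow\;
p(x) = \exp\{\lambda_0 - 1 + \lambda_1 (x-\mu)^2\}

\lambda_1 = -\frac{1}{2\sigma^2},\qquad
e^{\lambda_0 - 1} = \frac{1}{\sqrt{2\pi\sigma^2}}
\;\Longrightarrow\;
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\Big\{-\frac{(x-\mu)^2}{2\sigma^2}\Big\}
```

Normalizability forces λ₁ < 0, and matching the two constraints fixes both multipliers. If you replace (x − μ)² with |x − μ| and σ² with a fixed MAD d, the same steps give p(x) ∝ exp{−|x − μ|/d}, the Laplace with scale d.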

But my real question is, how often could that situation come up, really? I mean if you think you can specify the variance, then how hard would it be to also elicit a mean absolute deviation? What if I started with specifying this MAD and got a bit lazy and didn’t say anything about the variance?

Is there any software out there that helps you determine MaxEnt distributions for cases where you have more than one moment constraint or more complicated constraints?

JD

Was really looking forward to reading this on my flight, but unfortunately the Kindle version comes with a corrupted font. Tested it across multiple devices: Android, PC, iPad.

But that isn’t important – it looks really useful.

Well, he’s not shy about basing his work on Jaynes. There is one criticism, though. Why not just refer to frequency distributions as “frequency distributions”, denote them with f(), and admit frankly that they’re empirical quantities we’re trying to predict, no different in principle from a meteorologist predicting temperatures or political scientists predicting vote totals? Reserve probabilities p() solely for modeling and determining the consequences of uncertainties.

I think adopting such notation will be the tipping point for Bayesian statistics, because 90% of the endless sad pit of confusion and despair that is present day statistics just melts away if you simply don’t use the same notation for frequencies as you do for probabilities.
