“The paper’s goal is not so much to teach readers how to actually perform Bayesian data analysis — there are other papers in the special issue for that — but to facilitate readers in their quest to understand basic Bayesian concepts.”

http://alexanderetz.com/2016/02/07/understanding-bayes-how-to-become-a-bayesian-in-eight-easy-steps/

“The critical steps for me were learning measure theory and what a random variable was…”

Random variable, sure. Maybe even its definition as a function. But measure theory? Way overkill.

– Probability and Statistics

http://online.stanford.edu/course/probability-and-statistics-self-paced

– Statistical Reasoning

http://online.stanford.edu/course/statistical-reasoning-self-paced

I’m interested in this question too, as I have a particular viewpoint of my own, but would like to hear others’. In particular, perhaps Bob mainly didn’t have experience with “calculus type” mathematics, and the real analysis / measure theory stuff helped him get that experience? But if you’d taken a bunch of calculus, ODEs, and so forth at an undergrad level, perhaps the formalism of measure theory would be a different story.

Seems obvious that one’s background and interests are going to have a big effect here.

Was wondering if you think dealing with non-finite sets (measure theory) is necessary or just convenient?

The critical steps for me were learning measure theory and what a random variable was and then learning simulation and MC/MCMC methods. Without measure theory, it was hard to follow all the notation, which is notoriously vague in statistics (particularly expectation notation and Bayesian overloading of notation for random variables and values); the bottleneck is that you need to have done some analysis and algebra (ideally topology) to really get sample spaces and events.

After the basic probability theory, it was easy to learn Monte Carlo methods and open up the world of applied Bayesian modeling. For that, I found BUGS invaluable because it reduced to code rather than the usual squirrelly narrative in a stats paper. I really liked Gelman and Hill’s regression book after that for the same reason and for all the insight into practical modeling, though I found all the point estimation stuff in the first half confusing (where they use lm/glm and lmer/glmer).
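The Monte Carlo step mentioned above can be sketched in a few lines of Python (a hypothetical toy example of mine, not from BUGS or the thread): once you can draw from a posterior, expectations and probabilities are just averages over the draws.

```python
import random

random.seed(1)

# Toy conjugate normal model: prior theta ~ N(0, 1), one observation
# y ~ N(theta, 1). The posterior is then N(y/2, 1/2).
y = 1.2
post_mean, post_sd = y / 2, (1 / 2) ** 0.5

# Monte Carlo: draw from the posterior, average functions of the draws.
draws = [random.gauss(post_mean, post_sd) for _ in range(100_000)]
est_mean = sum(draws) / len(draws)                    # approximates E[theta | y] = 0.6
prob_positive = sum(d > 0 for d in draws) / len(draws)  # approximates P(theta > 0 | y)
```

The same averaging trick is what MCMC buys you for models where you can’t draw from the posterior directly.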

As a very first book, I love the intro to Bulmer’s *Principles of Statistics*. It then veers off into frequentist estimation and hypothesis testing, but as far as that goes, is the clearest explanation I’ve seen. It’s the right size, too, not one of these doorstops used for intro stats classes.

My favorite intro to probability theory is in a surprising place—the appendix to Anderson and Moore’s *Optimal Filtering*. It’s just a very tight 10 or 15 pages of definitions. It probably wouldn’t work if you don’t know a bit of topology and aren’t used to sequences of definitions math-book style. I just like that it’s so concise and properly defines everything in exactly the sequence you need.

I was driven into this by wanting to understand multiple rater models for data annotation problems in natural language processing. I knew I needed to use hierarchical modeling. Here I was lucky and knew Andrew, so he let me hang out in his and Jennifer Hill’s multiple imputation reading group. Nothing like hanging out with experts to tidy up lots of little misunderstandings and learn to properly talk the talk. They helped me rediscover Dawid and Skene’s (1979) model (as Andrew says, everything you come up with was discovered by psychometricians decades ago), but that meant I was on the right track in thinking about models and modeling; I never could get the Bayesian version published in an NLP venue (other than as an example in the Stan manual chapter on latent discrete parameters).

To reiterate some of the issues:

1) The policy went into effect in 1950 and ran to 1980. You’d expect to see a transient response at the initiation of the policy, which then equilibrates over a period of several years. By the 2000s, self-selection into living location based on preferences that include smogginess would be expected to be fully complete… The policy is discontinuous in TIME as well as space, but is analyzed at a single far-future point in time (relative to the policy change).

2) The 1-D nature of their model makes no sense for a large 2-D area. Weather patterns will be important for exposure, not just “how far north of the river are you.”

3) The policy provided economic benefits for the north; those may well persist today as higher levels of development, such as density of hospitals, average education of the population, etc. The more developed areas may attract a different population. It’s been 60 years since this policy was put in place. You’d want to model the effect of the economic benefits on development, and that would include migration for both health and economic reasons.

So, take an “ecological” point of view: you’ve got a region where some resources are being provided, and you’ve got agents that can move around, and you’ve got a pollutant being generated in the same region as the resources, and you’ve got an extended timeframe of exposure to the pollutant, and you’ve got an uncertain “damage” response to the pollution, and you’ve got agents that respond to that damage via changes in their behavior. That all sounds like a dynamic time-varying process that requires a dynamic time-varying model.

I suspect no, because I think it really requires a more in-depth study, getting additional data, in order to do a good job. Even if they do have sufficient data to deal with the situation in a more realistic way, the time it would take is prohibitive. Presumably they are the ones who got research grants (or at least university salaries) to study this stuff. If it were just a matter of spending an afternoon or something, then yeah, I’d be tempted, but I suspect it’s more a matter of wading in hip deep and spending a month or two looking at the data that is available, breaking it down, studying different models, looking for additional data to supplement them, etc.

Some of the issues that I think they should have addressed are already discussed (by me and others) in comments at the previous blog entry (linked above). So, someone who wanted to use that problem as a project could start to wade in and look at the issues and work on it.

https://uk.sagepub.com/en-gb/eur/series/Series486

For anyone in the UK, Sheffield University offers an MSc in statistics by distance learning, which I have been looking at.

Could you take the data from the China-coal-burning paper & actually show the sort of realistic model *you* think ought to have been fit?

I swear, I wrote this at about 4 am, then went back to bed and dreamed that economists from UC Berkeley were conspiring with the CIA to kill me in a “The Fugitive” style movie. So, anyway, take it all with a grain of salt, and don’t write blog posts at 4 am.

On the other hand, if you start out thinking about Bayesian ideas (a distribution measures how plausible a particular value of an observed or unknown quantity is), then building models is about asking yourself “what do I know about what can happen?” Once you get past that initial concept, “doing statistics” is more about encoding what you already know. One of the best things about Stan is that it lets you encode pretty much anything (though discrete parameters are its downfall, a bit). That freedom really alters how you view statistics. Before that freedom, you spend your time shoehorning your problem into something that your software can solve. Afterwards, you spend time just describing your problem and hoping you have enough computer power.

Fundamentally, Bayesian statistics is just accounting for uncertainty within mathematical modeling, and it can be done in ANY mathematical model, so you go back to thinking about the actual stuff going on.
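To make the “uncertainty in any model” point concrete, here is a minimal Python sketch (a hypothetical coin-bias toy of mine, not from the thread): a random-walk Metropolis sampler needs nothing from the model but an unnormalized log-posterior, so it can be bolted onto essentially any mathematical model.

```python
import math
import random

random.seed(0)

# Toy model: coin with unknown bias p, 7 heads in 10 flips, flat prior.
# Log-posterior up to a constant; any model exposing this works the same way.
def log_post(p):
    if not 0 < p < 1:
        return -math.inf
    return 7 * math.log(p) + 3 * math.log(1 - p)

# Random-walk Metropolis: propose a step, accept with prob min(1, ratio).
p, samples = 0.5, []
for _ in range(50_000):
    prop = p + random.gauss(0, 0.1)
    if math.log(random.random()) < log_post(prop) - log_post(p):
        p = prop
    samples.append(p)

burned = samples[5_000:]               # discard burn-in
est = sum(burned) / len(burned)        # analytic posterior mean is 8/12
```

Swap in a different `log_post` and the sampler doesn’t care whether it came from a regression, an ODE, or an agent-based model.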

“Social science regressions” are typically an attempt to describe regularity in data via linear combinations of simple functions (typically just linear functions): y = y0 + a*x1 + b*x2 + c*x3 + d*f(x4), etc. Sometimes that’s a reasonable model, but I think the dominance of this type of model in social science is more down to this kind of thing being easily solvable than to it fitting the situation. The typical example would be the coal-burning-in-China example, now famous on this blog. The big problem there was failure to really model the situation in anything like realistic detail (i.e. spatial variation through time). What was needed was a thought process about how the history of the policy produced differences across the river through time. What we got was a 1-D basis expansion with a discontinuous function thrown in, and an estimate of its coefficient.
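The kind of fit being criticized can be sketched in Python (simulated data, all names and numbers hypothetical): a linear trend in distance-from-the-river plus a discontinuous indicator, with the jump coefficient estimated by ordinary least squares via the normal equations.

```python
import random

random.seed(2)

# Simulated data: outcome = smooth trend + jump at the river + noise.
n = 2_000
x = [random.uniform(-1, 1) for _ in range(n)]  # signed distance from river
y = [0.5 * xi + 1.0 * (xi > 0) + random.gauss(0, 0.2) for xi in x]

# Design matrix for y = b0 + b1*x + b2*1[x > 0].
X = [[1.0, xi, float(xi > 0)] for xi in x]

def solve3(A, b):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [m - f * c for m, c in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

# Normal equations (X'X) beta = X'y.
XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
Xty = [sum(X[k][i] * y[k] for k in range(3 - 3 + n)) for i in range(3)]
b0, b1, b2 = solve3(XtX, Xty)
# b2 estimates the discontinuity; by construction the true jump is 1.0
```

This recovers the jump nicely here precisely because the simulated world really is a 1-D trend plus a step, which is the assumption the critique says doesn’t hold for the real spatial-temporal problem.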

So, anyway, I suggest reading about mathematical modeling in general. That’s going to include at least calculus, ordinary differential equations, representation of functions in basis expansions (Fourier, etc.), scaling and dimensional analysis, maybe something related to agent-based models, dimension reduction… Maybe ecology would be a good place to start. My impression is that it’s an area where people are doing real mathematical modeling based on mechanistic ideas of what drives changes in ecosystems, and adding in uncertainty through Bayesian calculations is relatively accepted. Most of social science could probably be described as “the ecosystem of humans.” So I think this analogy is a good one.

Of course, if what you mainly want to do is read other people’s papers with regressions and understand what they did (as opposed to say having an opinion about what they SHOULD have done)… Then my advice may be overkill.

Reading up on past statistical frauds & scandals is a great way. Or the sort of fishing & forking paths that Andrew’s blog covers. Or retracted papers.

Getting an intimate knowledge of how people abuse the tools is a great way to form better opinions of other analyses you must review.

Given a couple hours for the wheels to turn, I remember a few other statistics-related texts which were useful for self-study. Much of my work involved making classification decisions. Duda, Hart, and Stork’s “Pattern Classification” is excellent. It’s a good intro text if you’ve got a solid advanced undergrad math background. I still pick it up from time to time. McLachlan and Peel’s “Finite Mixture Models” and McLachlan and Krishnan’s “The EM Algorithm and Extensions” were also useful. (It was about 10 years ago that I was digging into those texts.)

Last one: Clive Rodgers, “Inverse Methods for Atmospheric Sounding”. His application is atmospheric remote sensing but the methods are generally applicable. From the publisher’s blurb on Amazon: “Inverse theory is treated in depth from an estimation-theory point of view, but practical questions are also emphasized, for example designing observing systems to obtain the maximum quantity of information, efficient numerical implementation of algorithms for processing of large quantities of data, error analysis and approaches to the validation of the resulting retrievals…” Lots of fun with covariance matrices and regularization methods.

I couldn’t agree more. Nothing’s a better motivator than having a problem you need to solve.

> Here I’d suggest jumping in and fitting a model in Stan and making graphs in R…

IDL was the entrenched language at work so that’s what I worked in. It’s C-like and well-suited for image processing but basic graphics were awful (better now 15+ years later). They were bad enough that I used to save out IDL results and load them into Igor Pro when I needed to make graphs. I’m now 50/50 IDL/MATLAB. As was IDL at my previous job, MATLAB is currently the lingua franca at work. I liked Igor Pro better for graphics but as an overall analysis plus graphics package I do like MATLAB. (I even coughed up $150 for a home license and $50 each for a couple Toolboxes.) That stated, I like what I’ve seen of R and, if I had the opportunity to start from scratch and were unconstrained by work, I’d probably give it a shot. (FWIW, I’m a “late adopter” of new technology. If not pushed along by external forces I’d probably still program in FORTRAN and create graphs on paper;-)

Speaking as a non-economist, I liked MHE; I gather from the reading I’ve done that careless/sloppy/overhyped concern with finding clean (clever) instruments is a major meta-issue in econometrics, but I didn’t get the feeling that Angrist and Pischke themselves were going overboard. To support this, Noam Scheiber says in his piece “How freakonomics is ruining the dismal science”:

> The early practitioners of this approach–Angrist, Krueger, Card–had well-earned reputations as crafty researchers. But, by and large, all three men used their creativity to chip away at important questions. It was only in the late ’90s that the signs of overreach became apparent.

To stretch an analogy a bit, blaming Angrist & Pischke for abuse of instrumental variables feels like blaming Fisher/Neyman/Pearson for abuse of p-values.

This book is not good! It reduces statistics to gimmicks. Seriously, all the hype about “clever instruments,” without care for everything else, is damaging economists terribly.

Seconding “Statistical Rethinking” by Richard McElreath – very thoughtful metaphors and a guided pathway to Stan.

(Also similar/in line with what I was heading for here https://phaneron0.wordpress.com/2012/11/23/two-stage-quincunx-2/ )

G

My (generally positive) review of Mostly Harmless Econometrics is here.

It’s solidly grounded from a theory perspective in almost everything you’d encounter at GiveWell: regression, IV, fixed effects, diff-in-diff, and panels.

I’d also suggest, “Statistical Rethinking” by Richard McElreath as I am deeply in love with the style of the book. It also has a corresponding video course:

http://xcelab.net/rm/statistical-rethinking/