
“The Book of Why” by Pearl and Mackenzie

Judea Pearl and Dana Mackenzie sent me a copy of their new book, “The book of why: The new science of cause and effect.”

There are some things I don’t like about their book, and I’ll get to that, but I want to start with a central point of theirs with which I agree strongly.

Division of labor

A point that Pearl and Mackenzie make several times, even if not quite in this language, is that there’s a division of labor between qualitative and quantitative modeling.

The models in their book are qualitative, all about the directions of causal arrows. Setting aside any problems I have with such models (I don’t actually think the “do operator” makes sense as a general construct, for reasons we’ve discussed in various places on this blog from time to time), the point is that these are qualitative, on/off statements. They’re “if-then” statements, not “how much” statements.

Statistical inference and machine learning focus on the quantitative: we model the relationship between measurements and the underlying constructs being measured; we model the relationships between different quantitative variables; we have time-series and spatial models; we model the causal effects of treatments and we model treatment interactions; and we model variation in all these things.

Both the qualitative and the quantitative are necessary, and I agree with Pearl and Mackenzie that typical presentations of statistics, econometrics, etc., can focus way too strongly on the quantitative without thinking at all seriously about the qualitative aspects of the problem. It’s usually all about how to get the answer given the assumptions, and not enough about where the assumptions come from. And even when statisticians write about assumptions, they tend to focus on the most technical and least important ones, for example in regression focusing on the relatively unimportant distribution of the error term rather than the much more important concerns of validity and additivity.

If all you do is set up probability models, without thinking seriously about their connections to reality, then you’ll be missing a lot, and indeed you can make major errors in causal reasoning, as James Heckman, Donald Rubin, Judea Pearl, and many others have pointed out. And indeed Heckman, Rubin, and Pearl have (each in their own way) advocated for substantive models, going beyond data description to latch on to underlying structures of interest.

Pearl and Mackenzie’s book is pretty much all about qualitative models; statistics textbooks such as my own have a bit on qualitative models but focus on the quantitative nuts and bolts. We need both.

Judea Pearl, like Jennifer Hill and Frank Sinatra, is right that “you can’t have one without the other”: If you think you’re working with a purely qualitative model, it turns out that, no, you’re actually making lots of data-based quantitative decisions about which effects and interactions you decide are real and which ones you decide are not there. And if you think you’re working with a purely quantitative model, no, you’re really making lots of assumptions (causal or otherwise) about how your data connect to reality.

Did she really live 122 years?

Even more famous than “the Japanese dude who won the hot dog eating contest” is “the French lady who lived to be 122 years old.”

But did she really?

Paul Campos points us to this post, where he writes:

Here’s a statistical series, laying out various points along the 100 longest known durations of a particular event, of which there are billions of known examples. The series begins with the 100th longest known case:

100th: 114 years 93 days

90th: 114 years 125 days

80th: 114 years 182 days

70th: 114 years 208 days

60th: 114 years 246 days

50th: 114 years 290 days

40th: 115 years 19 days

30th: 115 years 158 days

20th: 115 years 319 days

10th: 116 years 347 days

9th: 117 years 27 days

8th: 117 years 81 days

7th: 117 years 137 days

6th: 117 years 181 days

5th: 117 years 230 days

4th: 117 years 248 days

3rd: 117 years 260 days

Based on this series, what would you expect the second-longest and the longest known durations of the event to be?

These are the maximum verified — or as we’ll see “verified” — life spans achieved by human beings, at least since it began to be possible to measure this with some loosely acceptable level of scientific accuracy . . .

Given the mortality rates observed between ages 114 and 117 in the series above, it would be somewhat surprising if anybody had actually reached the age of 118. Thus it’s very surprising to learn that #2 on the list, an American woman named Sarah Knauss, lived to be 119 years and 97 days. That seems like an extreme statistical outlier, and it makes me wonder if Knauss’s age at death was recorded correctly (I know nothing about how her age was verified).

But the facts regarding the #1 person on the list — a French woman named Jeanne Calment who was definitely born in February of 1875, and was determined to have died in August of 1997 by what was supposedly all sorts of unimpeachable documentary evidence, after reaching the astounding age of 122 years, 164 days — are more than surprising. . . .

A Russian mathematician named Nikolay Zak has just looked into the matter, and concluded that, despite the purportedly overwhelming evidence that made it certain beyond a reasonable doubt that Calment reached such a remarkable age, it’s actually quite likely, per his argument, that Jeanne Calment died in the 1930s, and the woman who for more than 20 years researchers all around the world considered to be the oldest person whose age had been “conclusively” documented was actually her daughter, Yvonne. . . .

I followed the link and read Zak’s article, and . . . I have no idea.

The big picture is that, after age 110, the probability of dying is about 50% per year. For reasons we’ve discussed earlier, I don’t think we should take this constant hazard rate too seriously. But if we go with that, and we start with 100 people reaching a recorded age of 114, we’d expect about 50 to reach 115, 25 to reach 116, 12 to reach 117, 6 to reach 118, 3 to reach 119, etc. . . . so 122 is not at all out of the question. So I don’t really buy Campos’s statistical argument, which all seems to turn on there being a lot of people who reached 117 but not 118, which in turn is just a series of random chances that can just happen.
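
To see why 122 isn’t crazy under a constant hazard, here’s a minimal simulation sketch (assuming the rough 50%-per-year death probability and a cohort of 100 people reaching 114; the numbers are illustrative, not fitted to any real data):

```python
import numpy as np

rng = np.random.default_rng(1875)
p_death = 0.5      # rough constant annual hazard after age 110
n_start = 100      # cohort with a recorded age of 114
n_sims = 10_000
ages = np.arange(114, 123)

survivors = np.zeros((n_sims, len(ages)), dtype=int)
for s in range(n_sims):
    alive = n_start
    for j in range(len(ages)):
        survivors[s, j] = alive
        alive = rng.binomial(alive, 1 - p_death)  # who makes it another year

print(dict(zip(ages.tolist(), survivors.mean(axis=0).round(1))))
print("share of simulated cohorts with someone reaching 122:",
      round((survivors[:, -1] > 0).mean(), 2))
```

In this toy setup roughly a third of the simulated cohorts contain at least one person reaching 122, which is the point of the back-of-the-envelope halving above.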

Although I have nothing to add to the specific question of Jeanne or Yvonne Calment, I do have some general thoughts on this story:

– It’s stunning to me how these paradigm shifts come up, where something that everybody believes is true, is questioned. I’ve been vaguely following discussions about the maximum human lifespan (as in the link just above), and the example of Calment comes up all the time, and I’d never heard anyone suggest her story might be fake. According to Zak, there had been some questioning, but it didn’t go far enough for me to have heard about it.

Every once in a while we hear about these exciting re-thinkings of the world. Sometimes these seem to turn out to be right (for example, the story about the asteroid collision that indirectly killed the dinosaurs, or, since we’re on the topic, the story that modern birds are dinosaurs’ descendants). Other times these new ideas seem to have been dead ends (for example, the claim that certain discrepancies in sex ratios could be explained by hepatitis). As Joseph Delaney discusses in the context of the latter example, sometimes an explanation can be too convincing, in some way. The challenge is to value paradigm-busting ideas without falling in love with them.

– The Calment example is a great illustration of Bayesian inference. Bayesian reasoning should lead us to be skeptical of Calment’s claimed age. Indeed, as Zak notes, Bayesian reasoning should lead us to be skeptical of any claim on the tail of any distribution. Those 116-year-olds and 117-year-olds on Campos’s list above: we should be skeptical of each of them too. It’s just simple probabilistic reasoning: there’s some baseline probability that anyone’s claimed age will be fake, and if the distribution of fake ages has wider tails than the distribution of real ages, then an extreme claimed age is some evidence of an error. The flip side is that there must be some extreme ages out there that we haven’t heard about.
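
Here’s a toy version of that Bayesian calculation (all the numbers are assumptions for illustration: a 1% baseline rate of erroneous records, and exponential tails past 110 that are fatter for errors than for genuine ages):

```python
import numpy as np

def p_error_given_claim(claimed_age, p_error=0.01,
                        rate_genuine=np.log(2),      # ~50%/year hazard past 110
                        rate_error=np.log(2) / 3):   # fatter tail for bad records
    x = claimed_age - 110
    lik_genuine = rate_genuine * np.exp(-rate_genuine * x)
    lik_error = rate_error * np.exp(-rate_error * x)
    return p_error * lik_error / (p_error * lik_error + (1 - p_error) * lik_genuine)

for age in [114, 117, 119, 122]:
    print(age, round(p_error_given_claim(age), 2))
```

The point is purely qualitative: if erroneous records have a fatter tail than genuine ones, the posterior probability of an error climbs with the claimed age, even when the baseline error rate is small.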

– The above discussion also leads to a sort of moral hazard of Bayesian inference: If we question the extreme reported ages without correspondingly researching other ages, we’ll be shrinking our distribution. As Phil and I discuss in our paper, All maps of parameter estimates are misleading, there’s no easy solution to this problem, but we at least should recognize it.

P.S. Campos adds:

I hadn’t considered that the clustering at 117 is probably just random, but of course that makes sense. Calment does seem like a massive outlier, and as you say from a Bayesian perspective the fact that she’s such an outlier makes the potential holes in the validation of her age more probable than otherwise. What I don’t understand about the inheritance fraud theory is that Jeanne’s husband lived until 1942, eight years after Jeanne’s hypothesized death. It would be unusual, I think, for French inheritance law not to give a complete exemption to a surviving spouse for any inheritance tax liability (that’s the case in the legal systems I know something about), but I don’t know anything about French inheritance law.

The seminar speaker contest begins: Jim Thorpe (1) vs. John Oliver

As promised, we’ll be having one contest a day for our Ultimate Seminar Speaker contest, first going through the first round of our bracket, then going through round 2, etc., through to the finals.

Here’s the bracket:

And now we begin! The first matchup is Jim Thorpe, seeded #1 in the GOATs category, vs. John Oliver, unseeded in the TV personalities category.

This is a tough one. Jim Thorpe is the GOAT of GOATs, arguably the greatest athlete who ever lived, and with an interesting personal story as well. On the other hand, John Oliver is an undeniably entertaining speaker.

Remember the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

It’s your duty as commenters to give the strongest and most amusing arguments on both sides. So go to it! As you know, the comments are the fuel on which this blog runs.

On deck for the first half of 2019

OK, this is what we’ve got for you:


  • “The Book of Why” by Pearl and Mackenzie
  • Reproducibility and Stan
  • MRP (multilevel regression and poststratification; Mister P): Clearing up misunderstandings about

  • Becker on Bohm on the important role of stories in science
  • This is one offer I can refuse
  • How post-hoc power calculation is like a shit sandwich

  • Storytelling: What’s it good for?
  • Freud expert also a Korea expert
  • Data partitioning as an essential element in evaluation of predictive properties of a statistical method

  • A thought on the hot hand in basketball and the relevance of defense
  • A ladder of responses to criticism, from the most responsible to the most destructive
  • “Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion.”

  • Moneyball for evaluating community colleges
  • Of butterflies and piranhas
  • Science as an intellectual “safe space”?

  • Just when you thought it was safe to go back into the water . . . SHARK ATTACKS in the Journal of Politics
  • One more reason to remove letters of recommendation when evaluating candidates for jobs or scholarships.
  • When doing regression (or matching, or weighting, or whatever), don’t say “control for,” say “adjust for”

  • If this article portrays things accurately, the nutrition literature is in even worse shape than I thought
  • What should JPSP have done with Bem’s ESP paper, back in 2010? Click to find the surprisingly simple answer!
  • The bullshit asymmetry principle

  • “Objective: Generate evidence for the comparative effectiveness for each pairwise comparison of depression treatments for a set of outcomes of interest.”
  • Autodiff! (for the C++ jockeys in the audience)
  • Principal Stratification on a Latent Variable (fitting a multilevel model using Stan)

  • Of multiple comparisons and multilevel models
  • If you want to measure differences between groups, measure differences between groups.
  • New estimates of the effects of public preschool

  • Facial feedback is back
  • The Stan Core Roadmap
  • Fitting big multilevel regressions in Stan?

  • Fitting multilevel models when the number of groups is small
  • Our hypotheses are not just falsifiable; they’re actually false.
  • “Using 26,000 diary entries to show ovulatory changes in sexual desire and behavior”

  • Votes vs. $
  • Should he go to grad school in statistics or computer science?
  • Michael Crichton on science and storytelling

  • Simulation-based statistical testing in journalism
  • More on that horrible statistical significance grid
  • P-hacking in study of “p-hacking”?

  • “Do you have any recommendations for useful priors when datasets are small?”
  • Deterministic thinking meets the fallacy of the one-sided bet
  • Are GWAS studies of IQ/educational attainment problematic?

  • I believe this study because it is consistent with my existing beliefs.
  • Healthier kids: Using Stan to get more information out of pediatric respiratory data
  • “News Release from the JAMA Network”

  • Kevin Lewis has a surefire idea for a project for the high school Science Talent Search
  • HMC step size: How does it scale with dimension?
  • Does diet soda stop cancer? Two Yale Cancer Center docs have diametrically opposite views!

  • Evidence distortion in clinical trials
  • Separated at birth?
  • “Light Privilege? Skin Tone Stratification in Health among African Americans”

  • George Orwell meets statistical significance: “Politics and the English Language” applied to science
  • Good news! Researchers respond to a correction by acknowledging it and not trying to dodge its implications
  • “Yes, not only am I suspicious of the claims in that op-ed, I’m also suspicious of all the individual claims from the links in these two sentences”

  • Journalist seeking scoops is as bad as scientist doing unreplicable research
  • Yes on design analysis, No on “power,” No on sample size calculations
  • (back to basics:) How is statistics relevant to scientific discovery?

  • A corpus in a single survey!
  • The neurostatistical precursors of noise-magnifying statistical procedures in infancy
  • Not Dentists named Dennis, but Physicists named Li studying Li

  • Remember that paper we wrote, The mythical swing voter? About shifts in the polls being explainable by differential nonresponse? Mark Palko beat us to this idea, by 4 years.
  • Political polarization and gender gap
  • Junk science + Legal system = Disaster

  • Yes, you can include prior information on quantities of interest, not just on parameters in your model
  • From the Stan forums: “I’m just very thirsty to learn and this thread has become a fountain of knowledge”
  • “No, cardiac arrests are not more common on Monday mornings, study finds”

  • One more reason I hate letters of recommendation
  • Statistical-significance thinking is not just a bad way to publish, it’s also a bad way to think
  • Estimating treatment effects on rates of rare events using precursor data: Going further with hierarchical models.

  • Are male doctors better for male heart attack patients and female doctors better for female heart attack patients?
  • When and how do politically extreme candidates get punished at the polls?
  • He asks me a question, and I reply with a bunch of links

  • New golf putting data! And a new golf putting model!
  • Balancing rigor with exploration
  • Yes, I really really really like fake-data simulation, and I can’t stop talking about it.

  • Should we talk less about bad social science research and more about bad medical research?
  • Mister P for surveys in epidemiology — using Stan!
  • Here’s a puzzle: Why did the U.S. doctor tell me to drink more wine and the French doctor tell me to drink less?

  • Surgeon promotes fraudulent research that kills people; his employer, a leading hospital, defends him and attacks whistleblowers. Business as usual.
  • A world of Wansinks in medical research: “So I guess what I’m trying to get at is I wonder how common it is for clinicians to rely on med students to do their data analysis for them, and how often this work then gets published”
  • An interview with Tina Fernandes Botts

  • How to approach a social science research problem when you have data and a couple different ways you could proceed?
  • “Boston Globe Columnist Suspended During Investigation Of Marathon Bombing Stories That Don’t Add Up”
  • Here’s an idea for not getting tripped up with default priors . . .

  • Impact of published research on behavior and avoidable fatalities
  • How did our advice about research ethics work out, three years later?
  • What’s a good default prior for regression coefficients? A default Edlin factor of 1/2?

  • “The Long-Run Effects of America’s First Paid Maternity Leave Policy”: I need that trail of breadcrumbs.
  • Question on multilevel modeling reminds me that we need a good modeling workflow (building up your model by including varying intercepts, slopes, etc.) and a good computing workflow
  • “Heckman curve” update: The data don’t seem to support the claim that human capital investments are most effective when targeted at younger ages.

  • Why “bigger sample size” is not usually where it’s at.
  • Emile Bravo and agency
  • “How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions” . . . and still stays around even after it’s been retracted

  • Prestigious journal publishes sexy selfie study
  • What sort of identification do you get from panel data if effects are long-term? Air pollution and cognition example.
  • Treatment interactions can be hard to estimate from data.

  • Works of art that are about themselves
  • All statistical conclusions require assumptions.
  • State-space models in Stan

  • The network of models and Bayesian workflow, related to generative grammar for statistical models
  • No, it’s not correct to say that you can be 95% sure that the true value will be in the confidence interval
  • Differential effects of research trauma on fatigue and functioning of journal editors in chronic sloppy research syndrome

  • Claims about excess road deaths on “4/20” don’t add up
  • Here’s a supercool controversy for ya
  • Wanted: Statistical success stories

  • R-squared for multilevel models
  • “Appendix: Why we are publishing this here instead of as a letter to the editor in the journal”
  • Ballot order update

  • “Incentives to Learn”: How to interpret this estimate of a varying treatment effect?
  • Conditioning on post-treatment variables when you expect self-selection
  • “How many years do we lose to the air we breathe?” Or not.

  • How to think scientifically about scientists’ proposals for fixing science
  • Continuing discussion of status threat and presidential elections, with discussion of challenge of causal inference from survey data
  • “Boosting intelligence analysts’ judgment accuracy: What works, what fails?”

  • “One should always beat a dead horse because the horse is never really dead”
  • Olivia Goldhill and Jesse Singal report on the Implicit Association Test
  • Automation and judgment, from the rational animal to the irrational machine

  • Do regression structures affect research capital? The case of pronoun drop
  • Difference-in-difference estimators are a special case of lagged regression
  • The Arkansas paradox

  • Gremlin time: “distant future, faraway lands, and remote probabilities”
  • That illusion where you think the other side is united and your side is diverse
  • Maintenance cost is quadratic in the number of features

  • Poetry corner
  • Bayesian analysis of data collected sequentially: it’s easy, just include as predictors in the model any variables that go into the stopping rule.
  • BizStat: Modeling performance indicators for deals

  • Scandal! Mister P appears in British tabloid.
  • “We see MRP as a way to combine all the data—pre-election voter file data, early voting, precinct results, county results, polling—into a single framework”
  • “In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous,” but “Over 20 journals turned down her paper . . . and nobody wanted to fund privacy research that might reach uncomfortable conclusions.”

  • On the term “self-appointed” . . .
  • Hey, people are doing the multiverse!
  • Vigorous data-handling tied to publication in top journals among public heath researchers

  • Software for multilevel conjoint analysis in marketing
  • Neural nets vs. statistical models
  • I’m no expert

  • John Le Carre is good at integrating thought and action
  • Donald J. Trump and Robert E. Lee
  • Pushing the guy in front of the trolley

  • Crystallography Corner: The result is difficult to reproduce, but the result is still valid.
  • They’re working for the clampdown
  • What pieces do chess grandmasters move, and when?

  • Let’s publish everything.
  • Why edit a journal?
  • Should we mind if authorship is falsified?

  • Solutions to the 15 questions on our applied regression exam

So, yeah, the usual range of topics.

P.S. I listed the posts in groups of 3 just for easier readability. There’s no connection between the three posts in each batch.

Objective Bayes conference in June

Christian Robert points us to this Objective Bayes Methodology Conference in Warwick, England in June. I’m not a big fan of the term “objective Bayes” (see my paper with Christian Hennig, Beyond subjective and objective in statistics), but the conference itself looks interesting, and there are still a few weeks left for people to submit posters.

Announcing the ultimate seminar speaker contest: 2019 edition!

Paul Davidson made the bracket for us (thanks, Paul!):

Here’s the full list:

Wits:

Oscar Wilde (seeded 1 in group)
Dorothy Parker (2)
David Sedaris (3)
Voltaire (4)
Veronica Geng
Albert Brooks
Mel Brooks
Monty Python

Creative eaters:

M. F. K. Fisher (1)
Julia Child (2)
Anthony Bourdain (3)
Alice Waters (4)
A. J. Liebling
Nora Ephron
The Japanese dude who won the hot dog eating contest
John Belushi

Magicians:

Harry Houdini (1)
George H. W. Bush (2)
Penn and Teller (3)
Steve Martin (4)
David Blaine
Eric Antoine
Martin Gardner
Ira Glass

Mathematicians:

Carl Friedrich Gauss (1)
Pierre-Simon Laplace (2)
John von Neumann (3)
Alan Turing (4)
Leonhard Euler
Paul Erdos
Stanislaw Ulam
Benoit Mandelbrot

TV personalities:

Oprah Winfrey (1)
Johnny Carson (2)
Ed Sullivan (3)
Carol Burnett (4)
Sid Caesar
David Letterman
Ellen DeGeneres
John Oliver

People from New Jersey:

Bruce Springsteen (1)
Chris Christie (2)
Frank Sinatra (3)
Philip Roth (4)
William Carlos Williams
Virginia Apgar
Meryl Streep
Joe Pesci

GOATs:

Jim Thorpe (1)
Babe Didrikson Zaharias (2)
LeBron James (3)
Bobby Fischer (4)
Serena Williams
Pele
Simone Biles
Lance Armstrong

People whose names end in f:

Riad Sattouf (1)
Ian McKellen (2)
Boris Karloff (3)
Darrell Huff (4)
Yakov Smirnoff
DJ Jazzy Jeff
Adam Schiff
Anastasia Romanoff

The rules!

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the eight categories listed above.

I’ll post one matchup each day at noon, starting tomorrow.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll be in their usual 9-10am slot, alternating with these matchup postings, which will appear at noon each day.

The last time we did this competition was a few years ago. See here and here for the first two contests, here for an intermediate round, and here for the conclusion of that one.

I’m stoked for this new tournament. The above bracket features some interesting pairings. Johnny Carson vs. Paul Erdos! Frank Sinatra vs. Virginia Apgar! Carl Friedrich Gauss vs. Nora Ephron! Harry Houdini vs. Yakov Smirnoff! Julia Child vs. Ira Glass! Lots of upsets are possible.

“Dissolving the Fermi Paradox”

Jonathan Falk writes:

A quick search seems to imply that you haven’t discussed the Fermi equation for a while.

This looks to me to be in the realm of Miller and Sanjurjo: a simple probabilistic explanation sitting right under everyone’s nose. Comment?

“This” is an article, Dissolving the Fermi Paradox, by Anders Sandberg, Eric Drexler and Toby Ord, which begins:

The Fermi paradox is the conflict between an expectation of a high ex ante probability of intelligent life elsewhere in the universe and the apparently lifeless universe we in fact observe. The expectation that the universe should be teeming with intelligent life is linked to models like the Drake equation, which suggest that even if the probability of intelligent life developing at a given site is small, the sheer multitude of possible sites should nonetheless yield a large number of potentially observable civilizations. We show that this conflict arises from the use of Drake-like equations, which implicitly assume certainty regarding highly uncertain parameters. . . . When the model is recast to represent realistic distributions of uncertainty, we find a substantial ex ante probability of there being no other intelligent life in our observable universe . . . This result dissolves the Fermi paradox, and in doing so removes any need to invoke speculative mechanisms by which civilizations would inevitably fail to have observable effects upon the universe.
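
As a rough illustration of the kind of recasting the abstract describes, here’s a minimal Monte Carlo sketch. The log-uniform ranges below are placeholders chosen only to show that multiplying highly uncertain factors gives a heavy-tailed distribution for N; they are not the distributions used by Sandberg, Drexler, and Ord:

```python
import numpy as np

rng = np.random.default_rng(42)
n_sims = 1_000_000

def loguniform(low, high, size):
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)

# Drake-like factors, each with a wide (made-up) uncertainty range.
R_star = loguniform(1, 100, n_sims)     # star formation rate per year
f_p    = loguniform(0.1, 1, n_sims)     # fraction of stars with planets
n_e    = loguniform(0.1, 10, n_sims)    # habitable planets per such star
f_l    = loguniform(1e-6, 1, n_sims)    # fraction developing life
f_i    = loguniform(1e-3, 1, n_sims)    # fraction developing intelligence
f_c    = loguniform(1e-2, 1, n_sims)    # fraction producing detectable signals
L      = loguniform(1e2, 1e8, n_sims)   # years signals remain detectable

N = R_star * f_p * n_e * f_l * f_i * f_c * L
print("median N:", np.median(N))
print("P(N < 1):", (N < 1).mean())
```

With these toy ranges the simulated N spans many orders of magnitude and roughly half the draws fall below 1, which is the qualitative point: once the input uncertainty is propagated, a lonely observable universe gets substantial probability.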

I solicited thoughts from astronomer David Hogg, who wrote:

I have only skimmed it, but it seems reasonable. Life certainly could be rare, and technological life could be exceedingly rare. Some of the terms do have many-order-of-magnitude uncertainties.

That said, we now know that a large fraction of stars host planets and many host planets similar to the Earth, so the uncertainties on planet-occurrence terms in any Drake-like equation are now much lower than order-of-magnitude.

And Hogg forwarded the question to another astronomer, Jason Wright, who wrote:

The original questioner’s question (Thomas Basbøll’s submission from December) is addressed explicitly here.

In short, only the duration of transmission matters in steady-state, which is the final L term in Drake’s famous equation. Start time does not matter.

Regarding Andrew’s predicate “given that we haven’t heard any such signals so far” in the OP: despite the high profile of SETI, almost no actual searching has occurred because the field is essentially unfunded (until Yuri Milner’s recent support). Jill Tarter analogizes the idea that we need to update our priors based on the searching to date as being equivalent to saying that there must not be very many fish in the ocean based on inspecting the contents of a single drinking glass dipped in it (that’s a rough OOM, but it’s pretty close). And that’s just narrowband radio searches; other kinds of searches are far, far less complete.

And Andrew is not wrong that the amount of popular discussion of SETI has gone way down since the ’90s. A good account of the rise and fall of government funding for SETI is Garber (1999).

I have what I think is a complete list of NASA and NSF funding since the (final) cancellation of NASA’s SETI work in 1993, and it sums to just over $2.5M (not per year—total). True, Barney Oliver and Paul Allen contributed many millions more, but most of this went to develop hardware and pay engineers to build the (still incomplete and barely operating) Allen Telescope Array; it did not train students or fund much in the way of actual searches.

So you haven’t heard much about SETI because there’s not much to say. Instead, most of the literature is people in their spare time endlessly rearranging, recalculating, reinventing, modifying, and critiquing the Drake Equation, or offering yet another “solution” to the Fermi Paradox in the absence of data.

The central problem is that for all of the astrobiological terms in the Drake Equation we have a sample size of 1 (Earth), and since that one is us, we run into “anthropic principle” issues whenever we try to use it to estimate those terms.

The recent paper by Sandberg calculates reasonable posterior distributions on N in the Drake Equation, and indeed shows that they are so wide that N=0 is not excluded, but the latter point has been well appreciated since the equation was written down, so this “dissolution” to the Fermi Paradox (“maybe spacefaring life is just really rare”) is hardly novel. It was the thesis of the influential book Rare Earth and the argument used by Congress as a justification for blocking essentially all funding to the field for the past 25 years.

Actually, I would say that an equally valid takeaway from the Sandberg paper is that very large values of N are possible, so we should definitely be looking for them!

So make of that what you will.

P.S. I posted this in July 2018. The search for extraterrestrial intelligence is one topic where I don’t think much is lost in our 6-month blog delay.

Back by popular demand . . . The Greatest Seminar Speaker contest!

Regular blog readers will remember our seminar speaker competition from a few years ago.

Here was our bracket, back in 2015:

[2015 bracket image]

And here were the 64 contestants:

– Philosophers:
Plato (seeded 1 in group)
Alan Turing (seeded 2)
Aristotle (3)
Friedrich Nietzsche (4)
Thomas Hobbes
Jean-Jacques Rousseau
Bertrand Russell
Karl Popper

– Religious Leaders:
Mohandas Gandhi (1)
Martin Luther King (2)
Henry David Thoreau (3)
Mother Teresa (4)
Al Sharpton
Phyllis Schlafly
Yoko Ono
Bono

– Authors:
William Shakespeare (1)
Miguel de Cervantes (2)
James Joyce (3)
Mark Twain (4)
Jane Austen
John Updike
Raymond Carver
Leo Tolstoy

– Artists:
Leonardo da Vinci (1)
Rembrandt van Rijn (2)
Vincent van Gogh (3)
Marcel Duchamp (4)
Thomas Kinkade
Grandma Moses
Barbara Kruger
The guy who did Piss Christ

– Founders of Religions:
Jesus (1)
Mohammad (2)
Buddha (3)
Abraham (4)
L. Ron Hubbard
Mary Baker Eddy
Sigmund Freud
Karl Marx

– Cult Figures:
John Waters (1)
Philip K. Dick (2)
Ed Wood (3)
Judy Garland (4)
Sun Myung Moon
Charles Manson
Joan Crawford
Stanley Kubrick

– Comedians:
Richard Pryor (1)
George Carlin (2)
Chris Rock (3)
Larry David (4)
Alan Bennett
Stewart Lee
Ed McMahon
Henny Youngman

– Modern French Intellectuals:
Albert Camus (1)
Simone de Beauvoir (2)
Bernard-Henry Levy (3)
Claude Levi-Strauss (4)
Raymond Aron
Jacques Derrida
Jean Baudrillard
Bruno Latour

We did single elimination, one match per day, alternating with the regular blog posts. See here and here for the first two contests, here for an intermediate round, and here for the conclusion.

2019 edition

Who would be the ultimate seminar speaker? I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

Our new list includes eight current or historical figures from each of the following eight categories:
– Wits
– Creative eaters
– Magicians
– Mathematicians
– TV personalities
– People from New Jersey
– GOATs
– People whose names end in f

All these categories seem to be possible choices to reach the sort of general-interest intellectual community that was implied by the [notoriously hyped] announcement of Bruno Latour’s visit to Columbia a few years ago.

The rules

I’ll post one matchup each day at noon, starting sometime next week or so, once we have the brackets prepared.

Once each pairing is up, all of you can feel free (indeed, are encouraged) to comment. I’ll announce the results when posting the next day’s matchup.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

As with our previous contest four years ago, we’re continuing the regular flow of statistical modeling, causal inference, and social science posts. They’ll alternate with these matchup postings.

Robin Pemantle’s updated bag of tricks for math teaching!

Here it is! He’s got the following two documents:

– Tips for Active Learning in the College Setting

– Tips for Active Learning in Teacher Prep or in the K-12 Setting

This is great stuff (see my earlier review here).

Every mathematician and math teacher in the universe should read this. So, if any of you happen to be well connected to the math world, please pass this along.

Published in 2018

Enjoy. They’re listed in approximate reverse chronological order of publication date, so I guess some of the articles at the top of the list will be officially published in 2019.

What to do when you read a paper and it’s full of errors and the author won’t share the data or be open about the analysis?

Someone writes:

I would like to ask you for advice regarding obtaining data for reanalysis purposes from an author who has multiple papers with statistical errors and doesn’t want to share the data.

Recently, I reviewed a paper in which some of the reported statistics were mathematically impossible. As the first author of that paper had written another paper in the past with one of my collaborators, I checked that paper too and also found multiple errors (GRIM, DF, inappropriate statistical tests, etc.). I asked my collaborator about it, and she followed up with the first author, who had done the analysis, and said that he agreed to write an erratum.

Independently, I checked a further 3 papers from that author and all of them had errors, whose sheer number is comparable to what was found in Wansink’s case. At that stage I contacted the first author of these papers, asking him about the data for reanalysis purposes. As the email went unanswered, after 2 weeks I followed up, mentioning this time that I had found a number of errors in these papers, and included his lab’s contact email address. This time I received a response swiftly and was told that these papers were peer-reviewed so if there were any errors they would have been caught (sic!), that for privacy reasons the data could not be shared with me, and I was asked to send a list of the errors that I had found. In my response I sent the list of errors, emphasized the importance of independent reanalysis, and pointed out that the data come from lab experiments and that any personally identifiable information can be removed as it is not needed for reanalysis. After 3 weeks of waiting, and another email sent in the meantime, the author wrote that he was busy, but had had time to check the analysis of one of the papers. In his response, he said that some of the mathematically impossible DFs were wrongly copied numbers, while the inconsistent statistics were due to selecting the wrong cells in the Excel file, which supposedly doesn’t change much. Moreover, he blamed the reviewers for not catching these mistypes (sic!) and said that he found the errors only after I contacted him. The problem is that it is the same paper for which my collaborator said that they had already checked the results, so he must have been aware of these problems even before my initial email (I didn’t mention that I know that collaborator).

So here is my dilemma about how to proceed. Considering that there are multiple errors, of multiple types, across multiple papers, it is really hard to trust anything else reported in them. The author clearly does not intend to share the data with me, so I cannot verify whether the data exist at all. If they don’t, as I have sent him the list of errors, he could reverse-engineer what tools I used and come up with numbers that will pass the tests that can be done based solely on the reported statistics.

As you may have more experience dealing with such situations, I thought that I might ask you for advice on how to proceed. Would you suggest contacting the involved publishers, going public, or something else?

My reply:

I hate to say it, but your best option here might be to give up. The kind of people who lie and cheat about their published work may also play dirty in other ways. So is it really worth it to tangle with these people? I have no idea about your particular case and am just speaking on general principles here.

You could try contacting the journal editor. Some journal editors really don’t like to find out that they’ve published erroneous work; others would prefer to sweep any such problems under the rug, either because they have personal connections to the offenders or just because they don’t want to deal with cheaters, as this is unpleasant.

Remember: journal editing is a volunteer job, and people sign up for it because they want to publish exciting new work, or maybe because they enjoy the power trip, or maybe out of a sense of duty—but, in any case, they typically aren’t in it for the controversy. So, if you do get a journal editor who can help on this, great, but don’t be surprised if the editors slink away from the problem, for example by putting the burden in your lap by saying that your only option is to submit your critique in the form of an article for the journal, which can then be sent to the author of the original paper for review, and then rejected on the grounds that it’s not important enough to publish.

Maybe you could get Retraction Watch to write something on this dude?

Also is the paper listed on PubPeer? If so, you could comment there.
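
For readers who haven’t run into it, the GRIM check mentioned above is simple enough to sketch in a few lines: given a sample size and a mean reported to a fixed number of decimals, ask whether any set of integer-valued responses could actually produce that mean. This is a generic sketch of the idea, not the original authors’ code:

```python
def grim_consistent(reported_mean, n, decimals=2):
    """Can the mean of n integer scores round to reported_mean?"""
    nearest_total = round(reported_mean * n)
    candidates = [nearest_total - 1, nearest_total, nearest_total + 1]
    return any(round(t / n, decimals) == round(reported_mean, decimals)
               for t in candidates if t >= 0)

print(grim_consistent(3.48, 25))  # 87 / 25 = 3.48 exactly, so consistent
print(grim_consistent(3.49, 25))  # no integer total over 25 rounds to 3.49
```

Checks on degrees of freedom and test statistics work in the same spirit: the reported numbers have to be arithmetically possible given the reported design.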

“Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics.

Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (as any visualization principles) are contextual, and should be used (or not used) with the goals of this visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.
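
As a tiny illustration of principles 1 and 4 (my own toy example, not one of Shubin’s graphs), compare showing only a point estimate to showing the draws it came from:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
draws = rng.normal(loc=1.2, scale=0.8, size=4000)  # pretend posterior draws

fig, ax = plt.subplots()
ax.hist(draws, bins=60, density=True, alpha=0.6, label="posterior draws")
ax.axvline(draws.mean(), linestyle="--", label="point estimate")
ax.axvline(0, color="gray", label="zero")
ax.set_xlabel("effect size")
ax.legend()
plt.show()
```

The histogram makes it obvious how much posterior mass sits near or below zero, something a lone point estimate hides.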

Interesting how this complements our methods for visualization in Bayesian workflow.

Authority figures in psychology spread more happy talk, still don’t get the point that much of the published, celebrated, and publicized work in their field is no good (Part 2)

Part 1 was here.

And here’s Part 2. Jordan Anaya reports:

Uli Schimmack posted this on facebook and twitter.

I [Anaya] was annoyed to see that it mentions “a handful” of unreliable findings, and points the finger at fraud as the cause. But then I was shocked to see the 85% number for the Many Labs project.

I’m not that familiar with the project, and I know there is debate on how to calculate a successful replication, but they got that number from none other than the “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” people, as Sanjay Srivastava discusses here.

Schimmack identifies the above screenshot as being from Myers and Twenge (2018); I assume it’s this book, which has the following blurb:

Connecting Social Psychology to the world around us. Social Psychology introduces students to the science of us: our thoughts, feelings, and behaviors in a changing world. Students learn to think critically about everyday behaviors and gain an appreciation for the world around us, regardless of background or major.

But according to Schimmack, there’s “no mention of a replication failure in the entire textbook.” That’s fine—it’s not necessarily the job of an intro textbook to talk about ideas that didn’t work out—but then why mention replications in the first place? And why try to minimize it by talking about “a handful of unreliable findings”? A handful, huh? Who talks like that? This is a “Politics and the English Language” situation, where sloppy language serves sloppy thinking and bad practice.

Also, to connect replication failures to “fraud” is just horrible, as it’s consistent with two wrong messages: (a) that to point out a failed replication is to accuse someone of fraud, and (b) that, conversely, honest researchers can’t have replication failures. As I’ve written a few zillion times, honesty and transparency are not enuf. As I wrote here, it’s a mistake to focus on “p-hacking” and bad behavior rather than the larger problem of researchers expecting routine discovery.

So, the blurb for the textbook says that students learn to think critically about everyday behaviors—but they won’t learn to think critically about published research in the field of psychology.

Just to be clear: I’m not saying the authors of this textbook are bad people. My guess is they just want to believe the best about their field of research, and enough confused people have squirted enough ink into the water to confuse them into thinking that the number of unreliable findings really might be just “a handful,” that 85% of experiments in that study replicated, that the replication rate in psychology is statistically indistinguishable from 100%, that elections are determined by shark attacks and college football games, that single women were 20 percentage points more likely to support Barack Obama during certain times of the month, that elderly-priming words make you walk slower, that Cornell students have ESP, etc etc etc. There are lots of confused people out there, not sure where to turn, so it makes sense that some textbook writers will go for the most comforting possible story. I get it. They’re not trying to mislead the next generation of students; they’re just doing their best.

There are no bad guys here.

Let’s just hope 2019 goes a little better.

A good start would be for the authors of this book to send a public note to Uli Schimmack thanking them for pointing out their error, and then replacing that paragraph with something more accurate in their next printing. They could also write a short article for Perspectives on Psychological Science on how they got confused on this point, as this could be instructive for other teachers of psychology. They don’t have to do this. They can do whatever they want. But this is my suggestion how they could get 2019 off to a good start, in one small way.

Combining apparently contradictory evidence

I want to write a more formal article about this, but in the meantime here’s a placeholder.

The topic is the combination of apparently contradictory evidence.

Let’s start with a simple example: you have some ratings on a 1-10 scale. These could be, for example, research proposals being rated by a funding committee, or, umm, I dunno, gymnasts being rated by Olympic judges. Suppose there are 3 judges doing the ratings, and consider two gymnasts: one receives ratings of 8, 8, 8; the other is rated 6, 8, 10. Or, forget about ratings, just consider students taking multiple exams in a class. Consider two students: Amy, whose three test scores are 80, 80, 80; and Beth, who had scores 80, 100, 60. (I’ve purposely scrambled the order of those last three so that we don’t have to think about trends. Forget about time trends; that’s not my point here.)

How to compare those two students? A naive reader of test scores will say that Amy is consistent while Beth is flaky; or you might even say that you think Beth is better as she has a higher potential. But if you have some experience with psychometrics, you’ll be wary of overinterpreting results from three exam scores. Inference about an average from N=3 is tough; inference about variance from N=3 is close to impossible. Long story short: from a psychometrics perspective, there’s very little you can say about the relative consistency of Amy and Beth’s test-taking based on just three scores.
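
A quick simulation makes the N = 3 point concrete (a toy sketch: assume both students have the same true ability of 80 and the same true score standard deviation of 10):

```python
import numpy as np

rng = np.random.default_rng(3)
true_sd = 10
n_tests = 3
n_sims = 100_000

scores = rng.normal(80, true_sd, size=(n_sims, n_tests))
sample_sd = scores.std(axis=1, ddof=1)

print("5th-95th percentile of the sample sd:",
      np.percentile(sample_sd, [5, 95]).round(1))
print("share looking 'consistent' (sd < 5):", round((sample_sd < 5).mean(), 2))
print("share looking 'flaky' (sd > 15):", round((sample_sd > 15).mean(), 2))
```

Even with identical true consistency, Amy-like and Beth-like triples both show up all the time, which is why three scores tell you essentially nothing about who is the steadier test-taker.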

Academic researchers will recognize this problem when considering reviews of their own papers that they’ve submitted to journals. When you send in a paper, you’ll typically get a few reviews, and these reviews can differ dramatically in their messages.

Here’s a hilarious example supplied to me by Wolfgang Gaissmaier and Julian Marewski, from reviews of their 2011 article, “Forecasting elections with mere recognition from small, lousy samples: A comparison of collective recognition, wisdom of crowds, and representative polls.”

Here are some positive reviewer comments:

– This is a very interesting piece of work that raises a number of important questions related to public opinion. The major finding — that for elections with large numbers of parties, small non-probability samples looking only at party name recognition do as well as medium-sized probability samples looking at voter intent — is stunning.

– There is a lot to like about this short paper… I’m surprised by the strength of the results… If these results are correct (and I have no real reason to suspect otherwise), then the authors are more than justified in their praise of recognition-based forecasts. This could be an extremely useful forecasting technique not just for the multi-party European elections discussed by the authors, but also in relatively low-salience American local elections.

– This is a concise, high-quality paper that demonstrates that the predictive power of (collective) recognition extends to the important domain of political elections.

And now the fun stuff. The negative comments:

– This is probably the strangest manuscript that I have ever been asked to review… Even if the argument is correct, I’m not sure that it tells us anything useful. The fact that recognition can be used to predict the winners of tennis tournaments and soccer matches is unsurprising – people are more likely to recognize the better players/teams, and the better players/teams usually win. It’s like saying that a football team wins 90% (or whatever) of the games in which it leads going into the fourth quarter. So what?

– To be frank, this is an exercise in nonsense. Twofold nonsense. For one thing, to forecast election outcomes based on whether or not voters recognize the parties/candidates makes no sense… Two, why should we pay any attention to unrepresentative samples, which is what the authors use in this analysis? They call them, even in the title, “lousy.” Self-deprecating humor? Or are the authors laughing at a gullible audience?

So, their paper is either “a very interesting piece of work” whose main finding is “stunning”—or it is “an exercise in nonsense” aimed at “a gullible audience.”

“Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations”

Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins:

Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated model represents reality. Such methods usually stop at checking parameter signs, model elasticities, and ratios of model coefficients. In this paper, I [Brathwaite] greatly expand the discrete choice modelers’ assessment toolkit by introducing model checking procedures based on graphical displays of predictive simulations. . . . a general and ‘semi-automatic’ algorithm for checking discrete choice models via predictive simulations. . . .

He frames model checking in terms of “underfitting,” a connection I’ve never seen before but which makes sense. To the extent that there are features in your data that are not captured in your model—more precisely, features that don’t show up, even in many different posterior predictive simulations from your fitted model—then, yes, the model is underfitting the data. Good point.
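
Here’s a minimal version of that kind of predictive simulation for a toy binary choice (my own sketch, not code from the paper): simulate replicated choices from a fitted model and see whether a summary of the observed data looks like the replications. The “fitted” coefficients below are just assumed, as if estimated elsewhere.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy data with a nonlinearity the working model will miss.
n = 500
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x - 1.0 * x**2)))
y = rng.binomial(1, p_true)

# Working model: plain logistic in x, with assumed estimated coefficients.
beta0, beta1 = 0.3, 1.2
p_hat = 1 / (1 + np.exp(-(beta0 + beta1 * x)))

# Test statistic: observed choice share among high-x observations.
def stat(y_vec):
    return y_vec[x > 1].mean()

t_obs = stat(y)
t_rep = np.array([stat(rng.binomial(1, p_hat)) for _ in range(2000)])
print("observed:", round(t_obs, 2),
      "  replicated 5%-95%:", np.percentile(t_rep, [5, 95]).round(2))
```

Because the working model ignores the curvature, the observed share among high-x cases falls well outside the replicated interval: the model is underfitting that feature of the data, which is exactly what this kind of check is designed to reveal.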

Using multilevel modeling to improve analysis of multiple comparisons

Justin Chumbley writes:

I have mused on drafting a simple paper inspired by your paper “Why we (usually) don’t have to worry about multiple comparisons”.

The initial idea is simply to revisit frequentist “weak FWER” or “omnibus tests” (which assume the null everywhere), connecting them to a Bayesian perspective. To do this, I focus on the distribution of the posterior maximum or extrema (not the maximum a posteriori point estimate) of the joint posterior, given a data-set simulated under the omnibus null hypothesis. This joint posterior may be, for example, defined on a set of a priori exchangeable random coefficients in a multilevel model: its maximum just encodes my posterior belief in the magnitude of the largest of those coefficients (which “should” be zero for this data) and can be estimated for example by MCMC. The idea is that hierarchical Bayesian extreme values helpfully contract to zero with the number of coefficients in this setting, while non-hierarchical frequentist extreme values increase. The latter is more typically quantified by other “error” parameters such as FWER (the “multiple comparisons problem”) or MSE (“overfitting”). Thus, this offers a clear way to show that hierarchical inference can automatically control the (weak) FWER, without Bonferroni-style adjustments to the test threshold. Mathematically, I imagine some asymptotic – in the number of coefficients – argument for this behavior of the maxima, which I would need time or collaboration to formalize (I am not a mathematician by any means). In any case, the intuition is that because posterior coefficients are all increasingly shrunk, so is their maximum. I have chosen to study the maxima because it is applicable across the very different hierarchical and frequentist models used in practice in the fields I work on (imaging, genomics): spatial, cross-sectional, temporal, neither or both. For example, the posterior maximum is defined for a discretely indexed, exchangeable random process, or a continuously-indexed, non-stationary process. As a point of interest, the frequentist distribution of spatial maxima is used for standard-style multiple-comparisons-adjusted p-values in mainstream neuroimaging, e.g. SPM.

I am very keen to learn more about the possible pros or cons of the idea above.
– Its “novelty”
– How it fares relative to alternative Bayesian omnibus “tests”, e.g. based on comparison of posterior model probabilities for an omnibus null model – a degenerate spike prior – versus some credible alternative model.
– How generally it might be formalized.
– How to integrate type II error and bias into the framework.
… and any more!

My reply:

This idea is not really my sort of thing—I’d prefer a more direct decision analysis on the full posterior distribution. But given that many researchers are interested in hypothesis testing but still want to do something better than classical null hypothesis significance testing, I thought there might be interest in these ideas. So I’m sharing them with the blog readership. Comment away!
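
For what it’s worth, here’s a minimal simulation of the intuition in the letter (my own toy version, not Chumbley’s proposal): under a global null, compare the maximum of unpooled estimates with the maximum of partially pooled estimates as the number of coefficients J grows. I use simple analytic shrinkage for a known-variance normal hierarchical model, with the group-level variance crudely estimated by moments, instead of MCMC.

```python
import numpy as np

rng = np.random.default_rng(11)
sigma = 1.0   # known standard error of each coefficient estimate

for J in [10, 100, 1000]:
    max_raw, max_pooled = [], []
    for _ in range(2000):
        theta_hat = rng.normal(0, sigma, size=J)  # estimates under the omnibus null
        tau2_hat = max(0.0, theta_hat.var(ddof=1) - sigma**2)  # moment estimate
        shrink = tau2_hat / (tau2_hat + sigma**2)
        max_raw.append(theta_hat.max())
        max_pooled.append((shrink * theta_hat).max())  # max of posterior means
    print(f"J={J}: mean max unpooled = {np.mean(max_raw):.2f}, "
          f"mean max partially pooled = {np.mean(max_pooled):.2f}")
```

The unpooled maximum grows with J (roughly like sqrt(2 log J)), while the partially pooled maximum stays near zero, which is the sense in which hierarchical shrinkage controls the extreme value automatically.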

Back to the Wall

Jim Windle writes:

Funny you should blog about Jaynes. Just a couple of days ago I was looking for something in his book’s References/Bibliography (it, along with “Godel, Escher, Bach” and “Darwin’s Dangerous Idea,” has a bibliography that I find not just useful but entertaining), and ran across something I wanted to send you, but I was going to wait until I could track down a copy of the actual referenced paper. Since Jaynes is the current topic, here are the cited work and his comment, which I thought might amuse you, relating to our previous exchange. From “References”:

Boring, E.G. (1955), ‘The present status of parapsychology’, Am. Sci., 43, 108-16
Concludes that the curious phenomenon to be studied is the behavior of parapsychologists. Points out that, having observed any fact, attempts to prove that no natural explanation of it exists are logically impossible; one cannot prove a universal negative (quantum theorists who deny the existence of causal explanations please take note)

And just for the record, I’m more comfortable with quantum uncertainty, to the extent I understand it, than Jaynes was. And I don’t fully agree about not being able to prove a negative. The ancient Greeks proved long ago that there’s no largest prime number. I guess you just have to be careful about how you define the negative.

Amusing, and of course it relates to some of our recent discussions about unreplicable work in the social and behavioral sciences, including various large literatures which seem to be based on little more than the shuffling of noise, the ability of certain theories to explain any possible patterns in data, and the willingness of journals to publish any sort of junk as long as it combines an attractive storyline with “p less than 0.05.”

It’s only been 63 years, I guess no reason to expect much progress!

What is probability?

This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc? I wrote:

Probability is a mathematical concept. I think Martha Smith’s analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probabilities (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms (and thus probability) apply.

Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.
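
For reference, here are the axioms in question, stated informally (events are assumed to live in a suitable sigma-algebra on the sample space):

```latex
\begin{align*}
&\text{1. } \Pr(A) \ge 0 \text{ for every event } A, \\
&\text{2. } \Pr(\Omega) = 1, \\
&\text{3. } \Pr\Big(\textstyle\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \Pr(A_i)
   \quad \text{for pairwise disjoint } A_1, A_2, \ldots
\end{align*}
```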

We discuss some of these different models in chapter 1 of BDA.

P.S. There’s been some discussion and I’d like to clarify my key point, why I wrote this post. My concern is that I’ve read lots of articles and books that claim to give the single correct foundation of probability, which might be uncertainty, betting, or relative frequency, or coherent decision making, or whatever. My point is that none of these frameworks is the foundation of probability; rather, probability is a mathematical concept which applies to various problems, including long-run frequencies, betting, uncertainty, decision making, statistical inference, etc. In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.

“Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.”

What better day than Christmas, that day of gift-giving, to discuss “loss aversion,” the purported asymmetry in utility, whereby losses are systematically more painful than gains are pleasant?

Loss aversion is a core principle of the heuristics and biases paradigm of psychology and behavioral economics.

But it’s been controversial for a long time.

For example, back in 2005 I wrote about the well-known incoherence that people express when offered small-scale bets. (“If a person is indifferent between [x+$10] and [55% chance of x+$20, 45% chance of x], for any x, then this attitude cannot reasonably be explained by expected utility maximization. The required utility function for money would curve so sharply as to be nonsensical (for example, U($2000)-U($1000) would have to be less than U($1000)-U($950)).”)
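
Here’s the arithmetic behind that parenthetical, as a small sketch (my own calculation following the logic of the example, not Rabin’s exact numbers): indifference between x+$10 for sure and the 55/45 gamble at every x forces each successive $10 increment of utility to shrink by a factor of 0.45/0.55, and compounding that over a hundred increments makes distant gains nearly worthless.

```python
r = 0.45 / 0.55                    # implied ratio of successive $10 utility increments
inc = [r**k for k in range(200)]   # utility gain from the k-th $10 step above $0

u_2000_minus_1000 = sum(inc[100:200])
u_1000_minus_950 = sum(inc[95:100])
print(f"{u_2000_minus_1000:.2e}  {u_1000_minus_950:.2e}")
print(u_2000_minus_1000 < u_1000_minus_950)   # True: the nonsensical implication
```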

When Matthew Rabin and I had (separately) published papers about this in 1998 and 2000, we’d attributed the incoherent risk-averse attitude at small scales to “loss aversion” and “uncertainty aversion.” But, as pointed out by psychologist Deb Frisch, it can’t be loss aversion, as the way the problem is set up above, no losses are involved. I followed up that “uncertainty aversion” could be logically possible but I didn’t find that labeling so convincing either; instead:

I’m inclined to attribute small-stakes risk aversion to some sort of rule-following. For example, it makes sense to be risk averse for large stakes, and a natural generalization is to continue that risk aversion for payoffs in the $10, $20, $30 range. Basically, a “heuristic” or a simple rule giving us the ability to answer this sort of preference question.

By the way, I’ve used the term “attitude” above, rather than “preference.” I think “preference” is too much of a loaded word. For example, suppose I ask someone, “Do you prefer $20 or [55% chance of $30, 45% chance of $10]?” If he or she says, “I prefer the $20,” I don’t actually consider this any sort of underlying preference. It’s a response to a question. Even if it’s set up as a real choice, where they really get to pick, it’s just a preference in a particular setting. But for most of these studies, we’re really talking about attitudes.

The topic came up again the next year, in the context of the (also) well-known phenomenon that, when it comes to political attitudes about the government, people seem to respond to the trend rather than the absolute level of the economy. Again, I felt that terms such as “risk aversion” and “loss aversion” were being employed as all-purpose explanations for phenomena that didn’t really fit these stories.

And then, in the midst of all that, David Gal published an article, “A psychological law of inertia and the illusion of loss aversion,” in the inaugural issue of the Journal of Judgment and Decision Making, saying:

The principle of loss aversion is thought to explain a wide range of anomalous phenomena involving tradeoffs between losses and gains. In this article, I [Gal] show that the anomalies loss aversion was introduced to explain — the risky bet premium, the endowment effect, and the status-quo bias — are characterized not only by a loss/gain tradeoff, but by a tradeoff between the status-quo and change; and, that a propensity towards the status-quo in the latter tradeoff is sufficient to explain these phenomena. Moreover, I show that two basic psychological principles — (1) that motives drive behavior; and (2) that preferences tend to be fuzzy and ill-defined — imply the existence of a robust and fundamental propensity of this sort. Thus, a loss aversion principle is rendered superfluous to an account of the phenomena it was introduced to explain.

I’d completely forgotten about this article until learning recently of a new review article by Gal and Derek Rucker, “The Loss of Loss Aversion: Will It Loom Larger Than Its Gain?”, making this point more thoroughly:

Loss aversion, the principle that losses loom larger than gains, is among the most widely accepted ideas in the social sciences. . . . The upshot of this review is that current evidence does not support that losses, on balance, tend to be any more impactful than gains.

But if loss aversion is unnecessary, why do psychologists and economists keep talking about it? Gal and Rucker write:

The third part of this article aims to address the question of why acceptance of loss aversion as a general principle remains pervasive and persistent among social scientists, including consumer psychologists, despite evidence to the contrary. This analysis aims to connect the persistence of a belief in loss aversion to more general ideas about belief acceptance and persistence in science.

In Table 1 of their paper, Gal and Rucker consider several phenomena, all of which are taken to provide evidence of loss aversion but can be easily explained in other ways. Here are the phenomena they talk about:

– Status quo bias

– Endowment effect

– Risky bet premium

– Hedonic impact ratings

– Sunk cost effect

– Price elasticity

– Equity risk premium

– Disposition effect

– Loss/gain framing.

The article also comes with discussions by Tory Higgins and Nira Liberman and by Itamar Simonson and Ran Kivetz, and a rejoinder by Gal and Rucker.

June is applied regression exam month!

So. I just graded the final exams for our applied regression class. Lots of students made mistakes, which gave me the feeling that I didn’t teach the material so well. So I thought it could help lots of people out there if I were to share the questions, solutions, and common errors.

It was an in-class exam with 15 questions. I’ll post the questions and solutions, one at a time, for the first half of June, following the model of my final exam for Design and Analysis of Sample Surveys from a few years ago. Enjoy.