
Bruce Springsteen (1) vs. Veronica Geng; Monty Python advances

Yesterday’s contest wasn’t particularly close, as it pitted a boring guy who got lucky with one famous book against some very entertaining wits. I saw Life of Brian when it came out, and I think I laughed harder at that spaceship scene than any other time in my life. In any case, Ethan brings it home with the comment, “I want to hear Monty Python riff on pystan,” and Dzhaughn seals it with, “How many at Hormel owe their terrible disgusting jobs to this comedy troupe? Canned meat was dead in the water. Literally. But now “Spam” is on everyone’s lips. . . .” And this, from Tom: “There is the opportunity here for a seminar ending with a hall full of people singing ‘Always look on the bright side . . .’ It is that or listen to someone whose name ends in F.” f, actually, as we’re not shouting here. Whatever.

Today we conclude the first half of the draw with the top seeded person from New Jersey, against another person from our Wits category. Unseeded or not, Veronica Geng was very witty. The Boss would pull in the crowds, but maybe Geng works better for a university audience. What do you all think?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Just when you thought it was safe to go back into the water . . . SHARK ATTACKS in the Journal of Politics

We’ve been here before.

Back in 2002, political scientists Chris Achen and Larry Bartels presented a paper “Blind Retrospection – Electoral Responses to Drought, Flu and Shark Attacks.” Here’s a 2012 version in which the authors trace “the electoral impact of a clearly random event—a dramatic series of shark attacks in New Jersey in 1916” and claim to “show that voters in the affected communities significantly punished the incumbent president, Woodrow Wilson, at the polls,” a finding that has been widely discussed in political science over the past several years and was featured in Achen and Bartels’s recent book, Democracy for Realists.

In 2016, Anthony Fowler and Andy Hall reanalyzed the data and concluded that “the evidence is, at best, inconclusive”:

First, we assemble data on every fatal shark attack in U.S. history and county-level returns from every presidential election between 1872 and 2012, and we find little systematic evidence that shark attacks hurt incumbent presidents or their party. Second, we show that Achen and Bartels’ finding of fatal shark attacks hurting Woodrow Wilson’s vote share in the beach counties of New Jersey in 1916 becomes substantively smaller and statistically weaker under alternative specifications. Third, we find that their town-level result for beach townships in Ocean County significantly shrinks when we correct errors associated with changes in town borders and does not hold for the other beach counties in New Jersey. Lastly, implementing placebo tests in state-elections where there were no shark attacks, we demonstrate that Achen and Bartels’ result was likely to arise even if shark attacks do not influence elections. Overall, there is little compelling evidence that shark attacks influence presidential elections, and any such effect—if one exists—appears to be substantively negligible.

Fowler and Hall published their article in the Journal of Politics, and Achen and Bartels replied with a comment of their own, including these points:

Attributing to us [Achen and Bartels] the notion that, in general, “shark attacks influence presidential elections,” much less that “irrelevant events generally influence presidential elections”, reflects a profound misreading of our argument. As we spelled out, the 1916 attacks were politically relevant because substantial economic losses ensued and the president was explicitly blamed. Neither of those things is true of the typical shark attack; thus, we would not expect it to matter at the polls. . . .

Fowler and Hall’s reanalysis of county-level voting patterns in New Jersey in 1916 produces “substantively smaller and statistically weaker” estimates of the impact of shark attacks on electoral support for Woodrow Wilson “under alternative specifications” . . . Even so, their point estimates mostly differ only modestly from ours. The estimates become statistically insignificant only because their ahistorical bad statistical fits inflate standard errors and thus make the t-statistics smaller. We show that a variety of regression models that get the politics right all fit better than Fowler and Hall’s. Those models all show a substantively and statistically significant shark effect. . . .

Fowler and Hall employ a series of “placebo tests” comparing election outcomes in coastal and noncoastal counties to suggest that “Achen and Bartels’s result for New Jersey in 1916 was somewhat likely to arise even if shark attacks have no effect on presidential elections” . . . They find that 27% of these comparisons produce “statistically significant” differences in the vote swing from one election to the next in counties bordering the ocean; hence, they argue that other factors besides shark attacks could have produced the marked electoral shift in Jersey Shore counties in 1916. But Fowler and Hall provide no indication of what those other factors might be. . . .

This last point seems to represent a statistical misunderstanding on the part of Achen and Bartels. Fowler and Hall’s point regarding the placebo tests—a point which I think is valid—is that correlations in the dataset make statistical significance easier to attain than would be expected under the simple default model of independent outcomes. Thus, the statistical significance of Achen and Bartels’s original analysis cannot be taken as strong evidence: the patterns they found could be explained by various systematic differences between counties unrelated to shark attacks.
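To make that point concrete, here is a minimal simulation sketch of the logic, with made-up numbers (21 counties, 4 of them coastal, and a common regional shock hitting the coastal group); it is not a reanalysis of either paper’s data. A naive test that assumes independent counties flags these placebo “effects” far more often than the nominal 5 percent of the time.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def placebo_pvalue(n_counties=21, n_coastal=4, shared_sd=0.7):
    # One placebo "election": no real coastal effect, but the coastal counties
    # share a common regional shock, so their vote swings are correlated.
    swing = rng.normal(0, 1, n_counties)           # idiosyncratic county swings
    swing[:n_coastal] += rng.normal(0, shared_sd)  # common shock to the coastal group
    coastal, inland = swing[:n_coastal], swing[n_coastal:]
    # naive two-sample t-test that treats counties as independent
    se = np.sqrt(coastal.var(ddof=1) / len(coastal) + inland.var(ddof=1) / len(inland))
    t = (coastal.mean() - inland.mean()) / se
    return 2 * stats.t.sf(abs(t), df=n_counties - 2)

pvals = np.array([placebo_pvalue() for _ in range(5000)])
print("share of placebo elections with p < .05:", (pvals < 0.05).mean())
# well above 0.05 whenever shared_sd > 0, even though there is no coastal effect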

Achen and Bartels are right that Fowler and Hall do not “demonstrate that the shark attacks made no difference.” What Fowler and Hall find is that there is no strong evidence for any effect of shark attacks.

Step back a minute. Before Achen and Bartels (2002, 2012), I assume that few scholars would’ve considered shark attacks to have important effects in 1916 or in any other presidential election. The Achen and Bartels papers made the surprising claim that, yes, shark attacks did make a difference, and they took this as evidence for their “blind retrospection” theory. As I wrote earlier when discussing all this, I think Achen and Bartels have some good points in their book; their larger arguments do not rely on the validity of that shark-attack study. Anyway, the point is that Fowler and Hall don’t need to demonstrate that shark attacks made no difference. The burden is on Achen and Bartels to support their counterintuitive statement that shark attacks mattered. Or, to put it another way, you can believe that shark attacks mattered in 1916, even if the data don’t really show it. All sorts of things could’ve mattered, and you can’t disprove any of them.

OK, fine. That all said, Achen and Bartels have two substantive points to make. The first is that irrelevant events can matter in presidential elections if “substantial economic losses ensued and the president was explicitly blamed.” The second is that Fowler and Hall’s estimates are similar to theirs, and that Fowler and Hall find non-statistical-significance only by fitting a crappy model and thus obtaining artificially high standard errors.

Now let’s turn to Fowler and Hall’s reply in that same journal.

I’ll pull out two parts of this reply.

First, regarding the effects of shark attacks:

[Achen and Bartels] agree with us that shark attacks do not, in general, lead voters to punish incumbents, stating that they “would not expect” the “typical shark attack” to affect a presidential election. This consensus is important since many readers have thought that their claims were stronger and more general than they are. . . . Writing in Pacific Standard, Seth Masket states that “voters punish their leaders for . . . shark attacks.” In his review of Achen and Bartels’s book in the Journal of Politics, Neil Malhotra writes that “voters frequently punish incumbents for things they cannot control such as shark attacks.” Achen and Bartels have now clarified that their claim is specific to only the shark attacks in New Jersey in 1916. . . . Future discussions of “blind retrospection” should take note of this new consensus. Rather than claiming that shark attacks indicate a general failure of electoral accountability, Achen and Bartels say that voters do not blame incumbents for shark attacks in about 99 percent of all recorded shark attacks in American history.

Well put.

Second, regarding the new analyses that give larger standard errors and thus diminish the claims of strong evidence regarding the 1916 election:

We show that the 1912 election, which Achen and Bartels use to control for the baseline political preferences of counties and towns, was anomalous. Figure 3 of our paper clearly shows that their county-level result is driven by the unusualness of 1912, not anything that happened in 1916. . . .

We show that Achen and Bartels’ standard errors are misleading. If we apply their inferential strategy to state-elections with no shark attacks, we detect an effect as large as theirs 32 percent of the time, and the estimate is statistically significant (p < .05) 27 percent of the time.

To see more on this, I recommend you go back and look at our earlier post, in particular this graph of adjusted data from Achen and Bartels:

[screenshot: graph of adjusted data from Achen and Bartels]

and this graph of raw data from Fowler and Hall:

[screenshot: graph of raw data from Fowler and Hall]

I agree with Fowler and Hall that the evidence isn’t nearly as strong as implied by Achen and Bartels’s regressions.

Let me be clear: I don’t think that what Achen and Bartels did is any sort of scandal. They have an interesting idea regarding blind retrospection, the shark attack example is a cool case study, and yes their data are consistent with no effect of shark attacks but their data are also consistent with a positive effect, perhaps for the reasons they stated regarding economic costs and the president being blamed. They did some analyses which confirmed their beliefs and they published. Fair enough. Later, some other researchers looked at their data more carefully and found the evidence, both for this particular case and for shark attacks more generally, to not be so strong. That’s how we move forward.

Fowler adds:

Andy Hall and I wrote about this a bit in the conclusion of our sharks paper, but it’s very difficult to show evidence either way regarding the competence or rationality of voters. For one, lots of seemingly irrational behaviors have rational explanations. For example, it’s not necessarily irrational for voters to change their beliefs about their elected officials as a result of shark attacks–maybe the voters learned that the government doesn’t have their back when a major crisis comes their way. And even then, once you’ve adopted irrationality or incompetence as your “theory,” there’s nothing constraining your empirical testing. Maybe sharks affect people in the beach towns, maybe the beach counties, maybe the coastal counties, or maybe the whole state. You can run 10,000 different regressions and each one is just as (poorly) grounded in theory as the next, so all bets are off. And whichever regression gives you the desired result, you just say “well, that must be how irrational voting works.”

One fun solution would be for people like Chris Achen and Larry Bartels to get together with people like me and Andy Hall to come up with some ex-ante tests of rationality/competence/etc. We could all agree on some compelling tests that would partly adjudicate some of these debates and then go out and run the experiments or collect the relevant data.

P.S. I have no financial conflicts of interest here, but in the interests of full disclosure I should inform you that I’ve been involved in disputes with Larry Bartels before. In one dispute, Bartels and I were on the same side (this was in dealing with the annoying Thomas Frank); in the other case, we disagreed with each other. So, sometimes we agree, sometimes we don’t.

Darrell Huff (4) vs. Monty Python; Frank Sinatra advances

In yesterday’s battle of the Jerseys, Jonathan offered this comment:

Sinatra is an anagram of both artisan and tsarina. Apgar has no English anagram. Virginia is from New Jersey. Sounds confusing.

And then we got this from Dzhaughn:

I got as far as “Nancy’s ancestor,” and then a Youtube clip of Joey Bishop told me, pal, stop he’s a legend, he don’t need no backronym from you or anybody. He don’t need no google doodle, although it would have been a respectful gesture on his 100th birthday, but nevermind. He’s a legend, and he’s against someone who puts people to sleep. Professionally.

Good point. As much as I’d love to see Apgar, we can’t have a seminar speaker who specializes in putting people to sleep. So it will be Frank facing Julia in the second round.

Today, we have the #4 seed in the “People whose name ends in f” category, vs. an unseeded entry in the Wits category. (Yes, Monty Python is an amazing group, but the Wits category is a tough one; seedings are hard to come by when you’re competing with the likes of Oscar Wilde and Dorothy Parker.)

Darrell Huff is a bit of a legend in statistics, or used to be, based on his incredibly successful book from 1954, How to Lie with Statistics. But the guy didn’t really understand statistics; he was a journalist who wrote that one book and then went on to other things, most notoriously working on a book, How to Lie with Smoking Statistics, which was paid for by the cigarette industry but was never completed, or at least never published. Huff could talk about how to lie with statistics firsthand—but I suspect his knowledge of statistics was simplistic enough that he might not have even known what he was doing.

As for Monty Python: You know who they are. I have nothing to add on that account.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Science as an intellectual “safe space”? How to do it right.

I don’t recall hearing the term “safe space” until recently, but now it seems to be used all the time, by both the left and the right, to describe an environment where people can feel free to express opinions that might be unpopular in a larger community, without fear of criticism or contradiction.

Sometimes a safe space is taken to be a good thing—a sort of hothouse garden in which ideas can be explored and allowed to grow rapidly in an environment free of natural enemies or competition—and other times it’s taken to be a bad thing, a place where ideas will not develop in useful ways.

The short version is that people sometimes (but not always) want safe spaces for themselves, but they typically are derisive of safe spaces for people they disagree with. Safe spaces are a little like protective tariffs in economics: if you feel strong, you might not see the need for anyone to have a safe space; if you feel weak, or if you have weak allies, you might want those protective zones.

Psychology journals as safe spaces

This all came up when I was thinking about what seemed to me to be exaggeratedly defensive reactions of scientists (in particular, some psychology researchers) to criticism of their published work. To me, if you publish your work, it’s public, and you should welcome criticism and active engagement with your ideas. But, to many researchers, it seems that praise is OK but skepticism is unwelcome. And in some cases researchers go all out and attack their critics. What these researchers seem to be looking for is a safe space. But to me it seems ridiculous to publish a paper in a public journal, promote it all over the public news media, and then object to criticism. This sort of behavior seems roughly equivalent to fencing off some area in a public park and then declaring it private property and only admitting your friends. Actually, even worse than that, because prominent psychologists use their safe space to spread lies about people. So it’s more like fencing off some area in a public park, declaring it private property, and then using it to launch missiles against people whom you perceive as threatening your livelihood.

“Freakonomics” as a safe space

Another example came up a few years ago when Kaiser Fung and I wrote an article expressing a mix of positive and negative attitudes toward the Freakonomics franchise. One of the Freakonomics authors responded aggressively to us, and in retrospect I think he wanted Freakonomics to be a sort of safe space for economic thinking. The idea, perhaps, was that “thinking like an economist” is counterintuitive and sometimes unpopular, and so it’s a bad idea for outsiders such as Kaiser and me to go in and criticize. If the Freakonomics team are correct in their general themes (the importance of incentives, the importance of thinking like an economist, etc.), then we’re being counterproductive to zoom in on details they may have gotten wrong.

Safe spaces in science

I have no problem with my work being criticized; indeed, I see it as a key benefit of publication that more people can see what I’ve done and find problems with it.

That said, I understand the need for safe spaces. Just for example, I don’t share the first draft of everything I write. Or, to step back even further, suppose I’m working on a math problem and I want to either prove statement X or find a counterexample. Then it can be helpful to break the task in two, and separately try to find the proof or find the counterexample. When working on the proof, you act as if you know that X is true, and when searching for the counterexample, you act as if you know X is false. Another example is group problem solving, where it’s said to be helpful to have a “brainstorming session” in which people throw ideas on the table without expectation or fear of criticism. At some point you want to hear about longshot ideas, and it can be good to have some sort of safe space where these speculations can be shared without being immediately crushed.

My suggestion

So here’s my proposal. If you want a safe space for your speculations, fine: Just label what you’re doing as speculation, not finished work, and if NPR or anyone else interviews you about it, please be clear that you’re uncertain about these ideas and, as far as you’re concerned, these ideas remain in a no-criticism zone, a safe space where they can be explored without concern that people will take them too seriously.

Frank Sinatra (3) vs. Virginia Apgar; Julia Child advances

My favorite comment from yesterday came from Ethan, who picked up on the public TV/radio connection and rated our two candidate speakers on their fundraising abilities. Very appropriate for the university—I find myself spending more and more time raising money for Stan, myself. A few commenters picked up on Child’s military experience. I like the whole shark repellent thing, as it connects to the whole “shark attacks determine elections” story. Also, Jeff points out that “a Julia win would open at least the possibility of a Wilde-Child semifinal,” and Diana brings up the tantalizing possibility that Julia Grownup would show up. That would be cool. I looked up Julia Grownup and it turns out she was on Second City too!

As for today’s noontime matchup . . . What can I say? New Jersey’s an amazing place. Hoboken’s own Frank Sinatra is only the #3 seed of our entries from that state, and he’s pitted against Virginia Apgar, an unseeded Jerseyite. Who do you want to invite for our seminar: the Chairman of the Board, or a pioneering doctor who’s a familiar name to all parents of newborns?

Here’s an intriguing twist: I looked up Apgar on wikipedia and learned that she came from a musical family! Meanwhile, Frank Sinatra had friends who put a lot of people in the hospital. So lots of overlap here.

You can evaluate the two candidates on their own merits, or based on who has a better chance of besting Julia Child in round 2.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

The butterfly effect: It’s not what you think it is.

John Cook writes:

The butterfly effect is the semi-serious claim that a butterfly flapping its wings can cause a tornado half way around the world. It’s a poetic way of saying that some systems show sensitive dependence on initial conditions, that the slightest change now can make an enormous difference later . . . Once you think about these things for a while, you start to see nonlinearity and potential butterfly effects everywhere. There are tipping points everywhere waiting to be tipped!

But it’s not so simple. Cook continues:

A butterfly flapping its wings usually has no effect, even in sensitive or chaotic systems. You might even say especially in sensitive or chaotic systems.

Sensitive systems are not always and everywhere sensitive to everything. They are sensitive in particular ways under particular circumstances, and can otherwise be quite resistant to influence.

And:

The lesson that many people draw from their first exposure to complex systems is that there are high leverage points, if only you can find them and manipulate them. They want to insert a butterfly at just the right time and place to bring about a desired outcome. Instead, we should humbly evaluate to what extent it is possible to steer complex systems at all. We should evaluate what aspects can be steered and how well they can be steered. The most effective intervention may not come from tweaking the inputs but from changing the structure of the system.

Yes! That’s an excellent, Deming-esque point.

Bradley Groff pointed me to the above-linked post and noted the connection to my recent note on the piranha principle, where I wrote:

A fundamental tenet of social psychology, behavioral economics, at least how it is presented in the news media, and taught and practiced in many business schools, is that small “nudges,” often the sorts of things that we might not think would affect us at all, can have big effects on behavior. . . .

The model of the world underlying these claims is not just the “butterfly effect” that small changes can have big effects; rather, it’s that small changes can have big and predictable effects. It’s what I sometimes call the “button-pushing” model of social science, the idea that if you do X, you can expect to see Y. . . .

In response to this attitude, I sometimes present the “piranha argument,” which goes as follows: There can be some large and predictable effects on behavior, but not a lot, because, if there were, then these different effects would interfere with each other, and as a result it would be hard to see any consistent effects of anything in observational data.

I’m thinking of social science and I’m being mathematically vague (I do think there’s a theorem there somewhere, something related to random matrix theory, perhaps), whereas Cook is thinking more of physical systems with a clearer mathematical connection to nonlinear dynamics. But I think our overall points are the same, and with similar implications for thinking about interventions, causal effects, and variation in outcomes.
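Here is a back-of-the-envelope version of the piranha arithmetic, using entirely invented numbers: if behavior on a standardized scale were subject to many large, predictable, roughly independent influences, those influences alone would soak up an implausibly large share of the total variance.

# Illustrative arithmetic only; all of these numbers are made up.
n_effects = 20           # number of supposed large, independent influences
effect_size = 0.3        # each shifts behavior by 0.3 sd of the outcome
exposure_var = 0.25      # variance of a binary exposure with p = 0.5
total_outcome_var = 1.0  # outcome on a standardized scale

var_from_effects = n_effects * effect_size**2 * exposure_var
print(var_from_effects / total_outcome_var)  # 0.45: nearly half the total variance
# and that is before considering any interactions among the effects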

P.S. This is related to my skepticism of structural equation or path analysis modeling and similar approaches used in some quarters of sociology and psychology for many years and promoted in slightly different form by Judea Pearl and other computer scientists: These methods often seem to me to promise a sort of causal discovery that cannot be realistically delivered and that in many cases I don’t think even makes sense (see this article, especially the last full paragraph on page 960 and the example on page 962), and I see this as connected with the naive view of the butterfly effect described above, the attitude that if you just push certain buttons in a complex social system, you can get predictable results.

In brief: I doubt that the claims deriving from such data analyses will replicate in new experiments, but I have no doubt that anything that doesn’t replicate will be explained as the results of additional butterflies in the system. What I’d really like is for researchers to just jump to the post-hoc explanation stage before even gathering those new validation data. The threat of replication should be enough to motivate people to back off of some of their extreme claims.

To speak generically:
1. Research team A publishes a paper claiming that X causes Y.
2. Research team B tries to replicate the finding, but it fails to replicate.
3. Research team A explains that the original finding is not so general; it only holds under conditions Z, which contain specifics on the experimental intervention, the people in the study, and the context of the study. The finding only holds if the treatment is done for 1 minute, not 3 minutes; it holds only in warm weather, not cold weather; it holds only in Israel, not in the United States; it works for some sorts of stimuli but not others.
4. Ideally, in the original published paper, team A could list all the conditions under which they are claiming their result will appear. That is, they could anticipate step 2 and jump right to step 3, saving us all a lot of time and effort.

P.P.S. This post was originally called “Of butterflies and piranhas”; after seeing some comments, I changed the title to focus the message.

Julia Child (2) vs. Ira Glass; Dorothy Parker advances

Yesterday we got this argument from Manuel in favor of Biles:

After suffering so many bad gymnastics (mathematical, logical, statistical, you name it) at seminars, to have some performed by a true champion would be a welcome change.

But Parker takes it away, based on this formidable contribution of Dzhaughn:

Things I Have Learned From the Contest So Far:
(Cf. “Resume” by Dorothy Parker)

Thorpe’s 1/8th hashtag
Babe’s just a champ
Oscar is all Gray
Hotdogs cause cramp
Serena’s a whiner
Erdos sylvan
Jeff’s gone ballistic
I might as well win.

Today’s contest features the second seed in the Creative Eaters category against an unseeded magician. (Regular listeners to This American Life will recall that Glass did magic shows when he was about 12 years old, I think it was.) Both have lots of experience performing in front of an audience. So what’ll it be? Public TV or public radio? In either case, the winner will be facing someone from New Jersey in the second round.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Moneyball for evaluating community colleges

From an interesting statistics-laden piece by “Dean Dad”:

Far more community college students transfer prior to completing the Associate’s degree than actually complete first. According to a new report from the National Student Clearinghouse Research Center, about 350,000 transfer before completion, compared to about 60,000 who complete first.

That matters in several ways.

Most basically, it suggests that measuring community colleges by their graduation rates misses the point. A student who does a year at Brookdale before transferring to Rutgers, and subsequently graduating, got what she wanted, but she shows up in our numbers as a dropout. In states with “performance funding,” the community college could be punished for her decision, even if it was what she intended to do all along. . .

People who only look at “headline” numbers, and don’t bother with the asterisks, look at graduation rates and assume that something is going horribly wrong. But a ratio of 35 to 6 is such a honker of an asterisk that failing to account for it amounts to misrepresentation. . . .

My preferred measures of community college performance would be based on actual student behavior. For example, does the percentage of bachelor’s grads in a given area with community college credits roughly match the percentage of undergrads who are enrolled at community colleges? (Nationally, it does.) If so, then the idea of community colleges as dropout factories is hard to sustain. For programs not built around transfer, how are the employment outcomes? I wouldn’t look at loan repayment rates, just because the percentage of students with loans is so low; it’s a skewed sample. I would look at achievement gaps by race, sex, age, and income. I would look at ROI for public investment, as well as at local reputation. . . .

And a bunch more. I don’t know much about the world of education policy: Maybe some of these things are already being measured? Seems important, in any case.

Dorothy Parker (2) vs. Simone Biles; Liebling advances

I was surprised to see so little action in the comments yesterday. Sure, Liebling’s an obscure figure—I guess at this point he’d be called a “cult writer,” and I just happen to be part of the cult, fan as I am of mid-twentieth-century magazine writing—but I’d’ve thought Bourdain would’ve aroused more interest. Anyway, the best comment was from Ethan, playing it straight and going for Liebling on the strength of his diversity of interests. Even though he comes from the Eaters category, he can talk about lots of other topics; in that way, he’s similar to Steve Martin, who broke out entirely from the Magicians category where he was situated. On the other side, the best comment in favor of Bourdain was from Sean, who endorsed the celebrity chef but said he went to one of Bourdain’s real-life talks and “left a little disappointed to hear what in large part amounted [to] a collection of some of the best one-liners of No Reservations.”

For today we have the #2 ranked wit, the star of the Algonquin Round Table—no alcohol jokes in the comments, please—vs. the undisputed GOAT of gymnastics. Two completely different talents, and unfortunately only one can advance to the next round. Who should it be?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

“Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion.”

Daniel Lakeland points us to this news article by David Hambling from 2014, entitled “Nasa validates ‘impossible’ space drive.” Here’s Hambling:

Nasa is a major player in space science, so when a team from the agency this week presents evidence that “impossible” microwave thrusters seem to work, something strange is definitely going on. Either the results are completely wrong, or Nasa has confirmed a major breakthrough in space propulsion. . . .

He [Roger Shawyer, the EmDrive’s inventor] has built a number of demonstration systems, but critics reject his relativity-based theory and insist that, according to the law of conservation of momentum, it cannot work.

According to good scientific practice, an independent third party needed to replicate Shawyer’s results. As Wired.co.uk reported, this happened last year when a Chinese team built its own EmDrive and confirmed that it produced 720 mN (about 72 grams) of thrust, enough for a practical satellite thruster. . . . a US scientist, Guido Fetta, has built his own propellant-less microwave thruster, and managed to persuade Nasa to test it out. The test results were presented on July 30 at the 50th Joint Propulsion Conference in Cleveland, Ohio. Astonishingly enough, they are positive. . . .

OK, that was 3.5 years ago. Any followups? A quick google search revealed this article by Giulio Prisco from 2017, “Theoretical Physicists Are Getting Closer to Explaining How NASA’s ‘Impossible’ EmDrive Works: The EmDrive propulsion system might be able to take us to the stars, but first it must be reconciled with the laws of physics.”

If I wanted to be snarky, I’d say they could do a 2-for-1 deal and power the EmDrive with cold fusion. But my physics knowledge is weak, so I’ll just say . . . who knows, maybe this is the interstellar drive we’ve all been waiting for! I’ll believe it once it appears in PNAS.

Google on Responsible AI Practices

Great and beautifully written advice for any data science setting:

Enjoy.

Anthony Bourdain (3) vs. A. J. Liebling; Steve Martin advances

Yesterday’s decision was pretty easy, as almost all the commenters talked about Steve Martin, pro and con. Letterman was pretty much out of the picture. Indeed, the best argument in favor of Letterman came from Jonathan, who wrote:

I’ll go with Letterman because he looks like he could use the work.

Conversely, the strongest argument against Martin came from Adam, who wrote:

Steve Martin once said:

I know what you’re saying, you’re saying, “Steve, where do you find time to juggle?” Well, I juggle in my mind. … Whoops.

so that’s the problem: he might just do magic in his head. and that’s no fun to watch.

Then again, along the same lines as zbicyclist, he might be able to shed some light on the stuff you post on here. In the same routine, he said:

And then on the other hand science, you know, is just pure empiricism and by virtue of its method it excludes metaphysics. And uh, I guess I wouldn’t believe in anything if not for my lucky astrology mood watch.

Take the strongest case for Dave, and the strongest case against Steve, and Steve still comes out on top. So, no contest.

And now for today’s contest, featuring two people from the Creative Eaters category. (It’s the nature of the random assignment of unseeded competitors that sometimes two people from the same category will face off in the first round.)

Seeded #3 in the group is legendary globetrotting tell-it-like-it-is chef Anthony Bourdain. You can’t go wrong with Bourdain. But his unseeded opponent is formidable too: A. J. Liebling, one of the greatest and most versatile reporters who’s ever lived, author of The Honest Rainmaker and many other classics and the inspiration for O.G. blogger Mickey Kaus’s invention of the concept of Liebling optimality.

Bourdain was skinny and Liebling was fat; make of that what you will.

So give it your best: this round could turn out to be important!

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

A ladder of responses to criticism, from the most responsible to the most destructive

In a recent discussion thread, I mentioned how I’m feeling charitable toward David Brooks, Michael Barone, and various others whose work I’ve criticized over the years, because their responses have been so civilized and moderate.

Consider the following range of responses to an outsider pointing out an error in your published work:

1. Look into the issue and, if you find there really was an error, fix it publicly and thank the person who told you about it.

2. Look into the issue and, if you find there really was an error, quietly fix it without acknowledging you’ve ever made a mistake.

3. Look into the issue and, if you find there really was an error, don’t ever acknowledge or fix it, but be careful to avoid this error in your future work.

4. Avoid looking into the question, ignore the possible error, act as if it had never happened, and keep making the same mistake over and over.

5. If forced to acknowledge the potential error, actively minimize its importance, perhaps throwing in an “everybody does it” defense.

6. Attempt to patch the error by misrepresenting what you’ve written, introducing additional errors in an attempt to protect your original claim.

7. Attack the messenger: attempt to smear the people who pointed out the error in your work, lie about them, and enlist your friends in the attack.

We could probably add a few more rungs to the ladder, but the basic idea is that response 1 is optimal, responses 2 and 3 are unfortunate but understandable, response 4 represents at the very least a lost opportunity for improvement, and responses 5, 6, and 7 increasingly pollute the public discourse.

David Brooks is a pretty solid 4 on that scale, which isn’t great but in retrospect is like a breath of fresh air, given the 6’s and 7’s we’ve been encountering lately.

Most of the responses I’ve seen, in academic research and also the news media, have been 1’s. Or, at worst, 2’s and 3’s. From that perspective, Brooks’s stubbornness (his 4 on the above scale) has been frustrating. But it can be, and has been, much worse. So I appreciate that, however Brooks handles criticism of his own writing, he does not go on the attack. Similarly, I was annoyed when Gregg Easterbrook did response 2, but, in retrospect, that 2 doesn’t seem so bad at all.

As I said, I put the above into a comment thread, but I thought it’s something we might want to refer to more generally, so it’s convenient to give it its own post.

Steve Martin (4) vs. David Letterman; Serena Williams advances

Yesterday’s matchup featured a food writer vs. a tennis player, two professions that are not known for public speaking. The best arguments came in the very first two comments. Jeff wrote:

Fisher’s first book was “Serve It Forth,” which seems like good advice in tennis, as well. So, you’d get a two-fer there.

That was fine, but not as good as Jonathan’s endorsement of Williams:

Serena would be great at an academic seminar. Just like academics, she has a contempt for referees, even while purporting to regard them as valuable. Just don’t let the Chair interrupt her!

Which was echoed by Diana:

I was going to root for Fisher (whom I have never read) because her victory would make Auden happy. But then I thought about it some more and realized how incapable anyone is of *making* Auden happy—or unhappy, for that matter. In “The More Loving One,” he writes:

Were all stars to disappear or die,
I should learn to look at an empty sky
And feel its total dark sublime,
Though this might take me a little time.

So with that motive gone or suspended, I vote for Williams. She’s likely to win a few matches before the end, and that’ll be fun. At the seminar itself, she might even treat us to a serve or two (not to mention a referee chew-out, as Jonathan noted). What could go wrong?

Most of that bit was irrelevant, but I’m a sucker for Auden so I liked it anyway.

Today the competition is a bit more serious. Steve Martin is seeded #4 in the Magicians category even though magic is not one of his main talents; and David Letterman, though unseeded in the TV personalities category, knows how to handle an audience. You can take it from there.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

A thought on the hot hand in basketball and the relevance of defense

I was reading about basketball the other day and a thought came to me about the hot hand . . . There are a bunch of NBA players who could shoot with great accuracy even from long distance if they’re not guarded, right? For example, what would Steph Curry’s success rate be for 30-footers if he weren’t guarded and had time to take a shot? Over 50%?

So then it seems like one way to get a hot hand is to not be guarded, or, more generally, to not be guarded tightly. And I guess this would be even more the case for close-in shots, which just about any NBA player could make at a near-100% rate if there were no defense.

This all suggests that a key part—maybe the most important part—of the hot hand is what defense is on you, and more generally how you handle the defense.

I’m not quite sure what more to say about this right now, but this all seems different from the usual way we talk about the hot hand, with a focus on shooting. It also suggests that it could be a mistake to consider free-throw shooting as somehow a more pure test of the hot hand. If the above speculations are correct, then the hot hand in free-throw shooting is really a completely different thing than the hot hand for regular shots.

Causal inference data challenge!

Susan Gruber, Geneviève Lefebvre, Tibor Schuster, and Alexandre Piché write:

The ACIC 2019 Data Challenge is Live!
Datasets are available for download (no registration required) at https://sites.google.com/view/ACIC2019DataChallenge/data-challenge (bottom of the page).
Check out the FAQ at https://sites.google.com/view/ACIC2019DataChallenge/faq
The deadline for submitting results is April 15, 2019.

The fourth Causal Inference Data Challenge is taking place as part of the 2019 Atlantic Causal Inference Conference (ACIC) to be held in Montreal, Canada
(https://www.mcgill.ca/epi-biostat-occh/news-events/atlantic-causal-inference-conference-2019). The data challenge focuses on computational methods of inferring causal effects from quasi-real world data. This year there are two tracks: low dimensional and high dimensional data. Participants will analyze 3200 datasets in either Track 1 or Track 2 to estimate marginal additive treatment effects and associated 95% confidence intervals. Entries will be evaluated with respect to bias, variance, mean squared error, and confidence interval coverage across a variety of data generating processes.

I’m not a big fan of 95% intervals, and I am aware of the general problems arising from this sort of competition: the problems in the contest are not necessarily similar to the problems to which a particular method might be applied. That said, Jennifer has assured me that she and others learned a lot from the results of previous competitions in this series, so on that basis I encourage all of you to take a look and check out this one.
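For concreteness, here is a rough sketch of how I imagine entries could be scored under the stated criteria, assuming each simulated dataset comes with a known true marginal additive treatment effect; the function and variable names here are mine, not the organizers’.

import numpy as np

def score_entry(estimates, lower, upper, true_effects):
    # Score point estimates and 95% intervals against the known true effects,
    # in the spirit of the challenge criteria: bias, variance, MSE, coverage.
    # All arguments are 1-d arrays with one entry per dataset.
    estimates = np.asarray(estimates)
    true_effects = np.asarray(true_effects)
    err = estimates - true_effects
    return {
        "bias": err.mean(),
        "variance": estimates.var(ddof=1),
        "mse": (err ** 2).mean(),
        "coverage": np.mean((np.asarray(lower) <= true_effects)
                            & (true_effects <= np.asarray(upper))),
    }

# toy usage with fabricated numbers
rng = np.random.default_rng(1)
true = np.zeros(100)
est = rng.normal(0.05, 0.2, 100)
print(score_entry(est, est - 0.4, est + 0.4, true))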

M. F. K. Fisher (1) vs. Serena Williams; Oscar Wilde advances

The best case yesterday was made by Manuel:

Leave Joe Pesci at home alone. Wilde’s jokes may be very old, but he can use slides from The PowerPoint of Dorian Gray.

As Martha put it, not great, but the best so far in this thread.

On the other side, Jonathan wrote, “I’d definitely rather hear Wilde, but I hate it when speakers aren’t live, and the video connections with Reading Gaol are lousy.”—which wasn’t bad—but then he followed it up with, “Please, though. No Frankie Valli stories.” If even the best Pesci endorsement is so lukewarm, we’ll have to go with Oscar to face off against hot dog guy in the next round.

Today our contest features the #1 food writer of all time vs. an unseeded GOAT. I’ve never actually read anything by M. F. K. Fisher but the literary types rave about her, hence her top seeding in that category. As for Serena Williams, I did go to the U.S. Open once but only to see some of those free qualifying rounds. So this particular matchup is a bit of a mystery to me. Whaddya got?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Data partitioning as an essential element in evaluation of predictive properties of a statistical method

In a discussion of our stacking paper, the point came up that LOO (leave-one-out cross validation) requires a partitioning of data—you can only “leave one out” if you define what “one” is.

It is sometimes said that LOO “relies on the data-exchangeability assumption,” but I don’t think that’s quite the right way to put it; what LOO does assume is the relevance of a data partition. We discuss this briefly in section 3.5 of this article. For regular Bayes, p(theta|y) proportional to p(y|theta) * p(theta), there is no partition of the data: “y” is just a single object. But for LOO, y can be partitioned. At first this bothered me about LOO, but then I decided that this is a fundamental idea, related to the idea of “internal replication” discussed by Ripley in his spatial statistics book. The idea is that with just “y” and no partitions, there is no internal replication and no statistically general way of making reliable statements about new cases.
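To see what the partition buys you, here is a minimal brute-force sketch (not PSIS-LOO or anything from an existing package): a toy normal model with known standard deviation and a flat prior on theta, so the leave-one-out predictive distributions are available in closed form. LOO is only defined once y has been partitioned into pieces, here single observations.

import numpy as np
from scipy import stats

y = np.array([2.1, 1.3, 3.2, 0.7, 2.8])  # toy data
sigma = 1.0                               # known data sd

def log_pred_density(y_train, y_new):
    # With a flat prior, the posterior for theta given y_train is
    # N(mean(y_train), sigma^2 / n); integrating over theta gives the
    # predictive distribution N(mean(y_train), sigma^2 + sigma^2 / n).
    n = len(y_train)
    post_sd = sigma / np.sqrt(n)
    return stats.norm.logpdf(y_new, y_train.mean(), np.sqrt(sigma**2 + post_sd**2))

loo = sum(log_pred_density(np.delete(y, i), y[i]) for i in range(len(y)))
print("LOO expected log predictive density:", loo)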

This is similar to (but different from) the distinction in chapter 6 of BDA between the likelihood and the sampling distribution. To do inference for a given model, all we need from the data is the likelihood function. But to do model checking, we need the sampling distribution, p(y|theta), which implies a likelihood function but requires more assumptions (as can be seen, for example, in the distinction between binomial and negative binomial sampling). Similarly, to do inference for a given model, all we need is p(y|theta) with no partitioning of y, but to do predictive evaluation we need a partitioning.
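And here is a quick numerical check of the binomial vs. negative binomial point: with 7 successes observed in 20 Bernoulli(theta) trials, the two sampling distributions (n fixed in advance vs. sampling until the 7th success) give likelihoods that are proportional in theta, so posterior inference for theta is identical, even though replicated data for model checking would be generated differently.

import numpy as np
from scipy import stats

theta_grid = np.linspace(0.01, 0.99, 99)
lik_binom = stats.binom.pmf(7, 20, theta_grid)    # 7 successes in 20 fixed trials
lik_nbinom = stats.nbinom.pmf(13, 7, theta_grid)  # 13 failures before the 7th success
ratio = lik_binom / lik_nbinom
print(ratio.min(), ratio.max())  # constant across theta: same likelihood up to a factor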

Oscar Wilde (1) vs. Joe Pesci; the Japanese dude who won the hot dog eating contest advances

Raghuveer gave a good argument yesterday: “The hot dog guy would eat all the pre-seminar cookies, so that’s a definite no.” But this was defeated by the best recommendation we’ve ever had in the history of the Greatest Seminar Speaker contest, from Jeff:

Garbage In, Garbage Out: Mass Consumption and Its Aftermath
Takeru Kobayashi

Note: Attendance at both sessions is mandatory.

Best. Seminar. Ever.

So hot dog guy is set to go to the next round, against today’s victor.

It’s the wittiest man who ever lived, vs. an unseeded entry in the People from New Jersey category. So whaddya want: some 125-year-old jokes, or a guy who probably sounds like a Joe Pesci imitator? You think I’m funny? I’m funny how, I mean funny like I’m a clown, I amuse you?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Does Harvard discriminate against Asian Americans in college admissions?

Sharad Goel, Daniel Ho and I looked into the question, in response to a recent lawsuit. We wrote something for the Boston Review:

What Statistics Can’t Tell Us in the Fight over Affirmative Action at Harvard

Asian Americans and Academics

“Distinguishing Excellences”

Adjusting and Over-Adjusting for Differences

The Evolving Meaning of Merit

Character and Bias

A Path Forward

The Future of Affirmative Action