
The Japanese dude who won the hot dog eating contest vs. Oscar Wilde (1); Albert Brooks advances

Yesterday I was going to go with this argument from Ethan:

Now I’m morally bound to use the Erdos argument I said no one would see unless he made it to this round.

Andrew will take the speaker out to dinner, prove a theorem, publish it and earn an Erdos number of 1.

But then Jan pulled in with:

If you get Erdos, he will end up staying in your own place for the next n months, and him being dead, well, let’s say it is probably not going to be pleasant.

To be honest, I don’t even think I’d like a live Erdos staying in our apartment: from what I’ve read, the guy sounds a bit irritating, the kind of person who thinks he’s charming—an attribute that I find annoying.

Anyway, who cares about the Erdos number. What I really want is a good Wansink number. Recall what the notorious food researcher wrote:

Facebook, Twitter, Game of Thrones, Starbucks, spinning class . . . time management is tough when there’s so many other shiny alternatives that are more inviting than writing the background section or doing the analyses for a paper.

Yet most of us will never remember what we read or posted on Twitter or Facebook yesterday. In the meantime, this Turkish woman’s resume will always have the five papers below.

Coauthorship is forever. Those of us with a low Wansink number will live forever in the scientific literature.

And today’s match features an unseeded eater vs. the top-seeded wit. Doesn’t seem like much of a contest for a seminar speaker, but . . . let’s see what arguments you come up with!

Again, here’s the bracket and here are the rules.

More on that horrible statistical significance grid

Regarding this horrible Table 4:

Eric Loken writes:

The clear point of your post was that p-values (and, even worse, significance versus non-significance) are a poor summary of data.

The thought I’ve had lately, working with various groups of really smart and thoughtful researchers, is that Table 4 is also a model of their mental space as they think about their research and as they do their initial data analyses. It’s getting much easier to make the case that Table 4 is not acceptable to publish. But I think it’s also true that Table 4 is actually the internal working model for a lot of otherwise smart scientists and researchers. That’s harder to fix!

Good point. As John Carlin and I wrote, we think the solution is not to reform p-values or to replace them with some other statistical summary or threshold, but rather to move toward a greater acceptance of uncertainty and embracing of variation.

Book reading at Ann Arbor Meetup on Monday night: Probability and Statistics: a simulation-based introduction

The Talk

At the Ann Arbor R meetup on Monday, I’ll be previewing the book I’m in the process of writing. Here are the details, including the working title:

Probability and Statistics: a simulation-based introduction
Bob Carpenter
Monday, February 18, 2019
Ann Arbor SPARK, 330 East Liberty St, Ann Arbor

I’ve been to a few of their meetings and I really like this meetup group—a lot more statistics depth than you often get (for example, nobody asked me how to shovel their web site into Stan to get business intelligence). There will be a gang (or at least me!) going out for food and drinks afterward.

I’m still not 100% sure about which parts I’m going to talk about, as I’ve already written 100+ pages of it. After some warmup on the basics of Monte Carlo, I’ll probably do a simulation-based demonstration of the central limit theorem, the curse of dimensionality, and some illustration of (anti-)correlation effects on MCMC, as those are nice encapsulated little case studies I can probably get through in an hour.

The Repository

I’m writing it all in bookdown and licensing it all open source. I’ll probably try to find a publisher, but I’m only going to do so if I can keep the pdf free.

I just opened up the GitHub repo so anyone can download and build it.

I’m happy to take suggestions, but please don’t start filing issues on typos, grammar, etc.—I haven’t even spell checked it yet, much less passed it by a real copy editor. When there’s a more stable draft, I’ll put up a pdf.

Paul Erdos vs. Albert Brooks; Sid Caesar advances

The key question yesterday was, can Babe Didrikson Zaharias do comedy or can Sid Caesar do sports. According to Mark Palko, Sid Caesar was by all accounts extremely physically strong. And I know of no evidence that Babe was funny. So Your Show of Shows will be going into the third round.

And now we have an intriguing contest: a famously immature mathematician who loved to collaborate, vs. an Albert Einstein who didn’t do science. Whaddya think?

Again, here’s the bracket and here are the rules.

Simulation-based statistical testing in journalism

Jonathan Stray writes:

In my recent Algorithms in Journalism course we looked at a post which makes a cute little significance-type argument that five Trump campaign payments were actually the $130,000 Daniels payoff. They summed to within a dollar of $130,000, so the simulation recreates sets of payments using bootstrapping and asks how often there’s a subset that gets that close to $130,000. It concludes “very rarely” and therefore that this set of payments was a coverup.

(This is part of my broader collection of simulation-based significance testing in journalism.)

I recreated this payments simulation in a notebook to explore this. The original simulation checks sets of ten payments, which the authors justify because “we’re trying to estimate the odds of the original discovery, which was found in a series of eight or so payments.” You get about p=0.001 that any set of ten payments gets within $1 of $130,000. But the authors also calculated p=0.1 or so if we choose from 15, and my notebook shows that this goes up rapidly to p=0.8 if you choose 20 payments.

So the inference you make depends crucially on the universe of events you use. I think of this as the denominator in the frequentist calculation. It seems like a free parameter robustness problem, and for me it casts serious doubt on the entire exercise.

My question is: Is there a principled way to set the denominator in a test like this? I don’t really see one.

I’d be much more comfortable with a fully Bayesian attempt, modeling the generation process for the entire observed payment stream with and without a Daniels payoff. Then the result would be expressed as a Bayes factor, which I would find a lot easier to interpret — and this would also use all available data and require making a bunch of domain assumptions explicit, which strikes me as a good thing.

But I do still wonder if frequentist logic can answer the denominator question here. It feels like I’m bumping up against a deep issue here, but I just can’t quite frame it right.

Most fundamentally, I worry that there is no domain knowledge in this significance test. How does this data relate to reality? What are the FEC rules and typical campaign practice for what is reported and when? When politicians have pulled shady stuff in the past, how did it look in the data? We desperately need domain knowledge here. For an example of what application of domain knowledge to significance testing looks like, see Carl Bialik’s critique of statistical tests for tennis fixing.

My reply:

As Daniel Lakeland said:

A p-value is the probability of seeing data as extreme or more extreme than the result, under the assumption that the result was produced by a specific random number generator (called the null hypothesis).

So . . . when a hypothesis test rejects, it’s no big deal; you’re just rejecting the hypothesis that the data were produced by a specific random number generator—which we already knew didn’t happen. But when a hypothesis test doesn’t reject, that’s more interesting: it tells us that we know so little about the data that we can’t reject the hypothesis that the data were produced by a specific random number generator.

It’s funny. People are typically trained to think of rejection (low p-values) as the newsworthy event, but that’s backward.

Regarding your more general point: yes, there’s no substitute for subject-matter knowledge. And the post you linked to above is in error, when it says that a p-value of 0.001 implies that “the probability that the Trump campaign payments were related to the Daniels payoff is very high.” To make this statement is just a mathematical error.

But I do think there are some other ways of going about this, beyond full Bayesian modeling. For example, you could take the entire procedure used in this analysis, and apply it to other accounts, and see what p-values you get.

Sid Caesar vs. Babe Didrikson Zaharias (2); Jim Thorpe advances

Best comment from yesterday came from Dalton:

Jim Thorpe isn’t from Pennsylvania, and yet a town there renamed itself after him. DJ Jazzy Jeff is from Pennsylvania, and yet Will Smith won’t even return his phone calls. Until I can enjoy a cold Yuengling in Jazzy Jeff, PA it’s DJ Jumpin’ Jim for the win.

And today’s second-round bout features a comedic king versus a trailblazing athlete. I have no idea if Babe was funny or if Sid could do sports.

Again, here’s the bracket and here are the rules.

Michael Crichton on science and storytelling

Javier Benitez points us to this 1999 interview with techno-thriller writer Michael Crichton, who says:

I come before you today as someone who started life with degrees in physical anthropology and medicine; who then published research on endocrinology, and papers in the New England Journal of Medicine, and even in the Proceedings of the Peabody Museum. As someone who, after this promising beginning . . . spent the rest of his life in what is euphemistically called the entertainment business.

Scientists often complain to me that the media misunderstands their work. But I would suggest that in fact, the reality is just the opposite, and that it is science which misunderstands media. I will talk about why popular fiction about science must necessarily be sensationalistic, inaccurate, and negative.

Interesting, given that Crichton near the end of his life became notorious as a sensationalist climate change denier. But that doesn’t really come up in this particular interview, so let’s let him continue:

I’ll explain why it is impossible for the scientific method to be accurately portrayed in film. . . .

Movies are a special kind of storytelling, with their own requirements and rules. Here are four important ones:

– Movie characters must be compelled to act
– Movies need villains
– Movie searches are dull
– Movies must move

Unfortunately, the scientific method runs up against all four rules. In real life, scientists may compete, they may be driven – but they aren’t forced to work. Yet movies work best when characters have no choice. That’s why there is the long narrative tradition of contrived compulsion for scientists. . . .

Second, the villain. Real scientists may be challenged by nature, but they aren’t opposed by a human villain. Yet movies need a human personification of evil. You can’t make one without distorting the truth of science.

Third, searches. Scientific work is often an extended search. But movies can’t sustain a search, which is why they either run a parallel plotline, or more often, just cut the search short. . . .

Fourth, the matter of physical action: movies must move. Movies are visual and external. But much of the action of science is internal and intellectual, with little to show in the way of physical activity. . . .

For all these reasons, the scientific method presents genuine problems in film storytelling. I believe the problems are insoluble. . . .

This all makes sense.

Later on, Crichton says:

As for the media, I’d start using them, instead of feeling victimized by them. They may be in disrepute, but you’re not. The information society will be dominated by the groups and people who are most skilled at manipulating the media for their own ends.

Yup. And now he offers some ideas:

For example, under the auspices of a distinguished organization . . . I’d set up a service bureau for reporters. . . . Reporters are harried, and often don’t know science. A phone call away, establish a source of information to help them, to verify facts, to assist them through thorny issues. Don’t farm it out, make it your service, with your name on it. Over time, build this bureau into a kind of good housekeeping seal, so that your denial has power, and you can start knocking down phony stories, fake statistics and pointless scares immediately, before they build. . . .

Unfortunately, and through no fault of Crichton, we seem to have gotten the first of these suggestions but not the second. Scientists, universities, and journals promote the hell out of just about everything, but they aren’t so interested in knocking down phony stories. Instead we get crap like the Harvard University press office saying “The replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%,” or the Cornell University press office saying . . . well, if you’re a regular reader of this blog you’ll know where I’m going on this one. Distinguished organizations are promoting the phony stories, not knocking them down.

Crichton concluded:

Under the circumstances, for scientists to fret over their image seems slightly absurd. This is a great field with great talents and great power. It’s time to assume your power, and shoulder your responsibility to get your message to the waiting world. It’s nobody’s job but yours. And nobody can do it as well as you can.

Didn’t work out so well. There have been some high points, such as Freakonomics, which, for all its flaws, presented a picture of social scientists as active problem solvers. But, in many other cases, it seems that science spent much of its credibility on a bunch of short-term quests for money and fame. Too bad, seeing what happened since 1999.

As scientists, I think we should spend less time thinking about how to craft our brilliant ideas as stories for the masses, and think harder about how we ourselves learn from stories. Let’s treat our audience, our fellow citizens of the world, with some respect.

Halftime! And Jim Thorpe (1) vs. DJ Jazzy Jeff

So. Here’s the bracket so far:

Our first second-round match is the top-ranked GOAT—the greatest GOAT of all time, as it were—vs. an unseeded but appealing person whose name ends in f.

Again here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Should he go to grad school in statistics or computer science?

Someone named Nathan writes:

I am an undergraduate student in statistics and a reader of your blog. One thing that you’ve been on about over the past year is the difficulty of executing hypothesis testing correctly, and an apparent desire to see researchers move away from that paradigm. One thing I see you mention several times is to simply “model the problem directly”. I am not a master’s student (yet) and am also not trained at all in Bayesian methods. My coursework was entirely based on classical null hypothesis testing.

From what I can gather, you mean the implementation of some kind of multi-level model. But do you also mean the fitting and usage of standard generalized linear models, such as logistic regression? I have ordered the book you wrote with Jennifer Hill on multi-level models, and I hope it will be illuminating.

On the other hand, I’m looking at going to graduate school and I will be applying this fall. My interests have diverged from classical statistics, with a larger emphasis on model building, prediction, and machine learning. To this end, would further training in statistics be appropriate? Or would it be more useful to try and get into a CS program? I still have interests in “statistics” — describing associations, but I am not so sure I am interested in being a classical theorist. What do you think?

My reply: There are lots of statistics programs that focus on applications rather than theory. Computer science departments, I don’t know how that works. If you want an applied-oriented statistics program, it could help to have a sense of what application areas you’re interested in, and also whether you’re interested in doing computational statistics, as a lot of applied work requires computational as well as methodological innovation in order to include as much relevant information as possible in your analyses.

Yakov Smirnoff advances, and Halftime!

Best argument yesterday came from Yuling:

I want to learn more about missing data analysis from the seminar so I like Harry Houdini. But Yakov Smirnoff is indeed better for this topic — both Vodka and the Soviet are treatments that guarantee everyone to be Missing Completely at Random, and as statisticians we definitely prefer Missing Completely at Random.

And now the contest is halfway done! We’re through with the first round. Second round will start tomorrow.

Global warming? Blame the Democrats.

An anonymous blog commenter sends the above graph and writes:

I was looking at the global temperature record and noticed an odd correlation the other day. Basically, I calculated the temperature trend for each presidency and multiplied by the number of years to get a “total temperature change”. If there was more than one president for a given year it was counted for both. I didn’t play around with different statistics to measure the amount of change, including/excluding the “split” years, etc. Maybe other ways of looking at it yield different results, this is just the first thing I did.

It turned out all 8 administrations who oversaw a cooling trend were Republican. There has never been a Democrat president who oversaw a cooling global temperature. Also, the top 6 warming presidencies were all Democrats.

I have no idea what it means but thought it may be of interest.

My first thought, beyond simply random patterns showing up with small N, is that thing that Larry Bartels noticed a few years ago, that in recent decades the economy has grown faster under Democratic presidents than Republican presidents. But the time scale does not work to map this to global warming. CO2 emissions, maybe, but I wouldn’t think it would show up in the global temperature so directly as that.

So I’d just describe this data pattern as “one of those things.” My correspondent writes:

I expect to hear it dismissed as a “spurious correlation”, but usually I hear that argument used for correlations that people “don’t like” (it sounds strange/ridiculous) and it is never really explained further. It seems to me if you want to make a valid argument that a correlation is “spurious” you still need to identify the unknown third factor though.

In this case I don’t know that you need to specify an unknown third factor, as maybe you can see this sort of pattern just from random numbers, if you look at enough things. Forking paths and all that. Also there were a lot of Republican presidents in the early years of this time series, back before global warming started to take off. Also, I haven’t checked the numbers in the graph myself.

Harry Houdini (1) vs. Yakov Smirnoff; Meryl Streep advances

Best argument yesterday came from Jonathan:

This one’s close.

Meryl Streep and Alice Waters both have 5 letters in the first name and 6 in the last name. Tie.

Both are adept at authentic accents. Tie.

Meryl has played a international celebrity cook; Alice has never played an actress. Advantage Streep.

Waters has taught many chefs; Meryl has taught no actors. Advantage Waters.

Streep went to Vassar and Yale. Waters went to Berkeley. I’m an East Coast guy, but YMMV.

Waters has the French Legion of Honor. Streep is the French Lieutenant’s Woman.

Both have won more awards than either of them can count.

So I use Sophie’s Axiom of Choice: When comparing a finite set of pairs of New Jersey Celebrities, choose the one who got into the New Jersey Hall of Fame earlier. That’s Streep, by 6 years.

And today we have the final first-round match! Who do you want to see: the top-seeded magician of all time, or an unseeded person whose name ends in f? Can a speaker escape from his own seminar? In Soviet Russia, seminar speaker watch you.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

“Using 26,000 diary entries to show ovulatory changes in sexual desire and behavior”

Kevin Lewis points us to this research paper by Ruben Arslan, Katharina Schilling, Tanja Gerlach, and Lars Penke, which begins:

Previous research reported ovulatory changes in women’s appearance, mate preferences, extra- and in-pair sexual desire, and behavior, but has been criticized for small sample sizes, inappropriate designs, and undisclosed flexibility in analyses.

Examples of such criticism are here and here.

Arslan et al. continue:

In the present study, we sought to address these criticisms by preregistering our hypotheses and analysis plan and by collecting a large diary sample. We gathered more than 26,000 usable online self-reports in a diary format from 1,043 women, of which 421 were naturally cycling. We inferred the fertile period from menstrual onset reports. We used hormonal contraceptive users as a quasi-control group, as they experience menstruation, but not ovulation.

And:

We found robust evidence supporting previously reported ovulatory increases in extra-pair desire and behavior, in-pair desire, and self-perceived desirability, as well as no unexpected associations. Yet, we did not find predicted effects on partner mate retention behavior, clothing choices, or narcissism. Contrary to some of the earlier literature, partners’ sexual attractiveness did not moderate the cycle shifts. Taken together, the replicability of the existing literature on ovulatory changes was mixed.

I have not looked at this paper in detail, but just speaking generally I like what they’re doing. Instead of gathering one more set of noisy data and going for the quick tabloid win (or, conversely, the so-what failed replication), they designed a study to gather high-quality data with enough granularity to allow estimation of within-person comparisons. That’s what we’ve been talkin bout!

Alice Waters (4) vs. Meryl Streep; LeBron James advances

It’s L’Bron. Only pitch for Mr. Magic was from DanC: guy actually is ultra-tall, plus grand than that non-Cav who had play’d for Miami. But Dalton brings it back for Bron:

LeBron James getting to the NBA Final with J.R. Smith as his best supporting cast member is a more preposterous escape than anything David Blaine or Houdini did. So he’s already a better magician than Eric Antoine (who is seeded below Blaine and Houdini).

Plus, he’s featured in this (unfortunately paywalled) Teaching Statistics article which points out the merits of graphical comparison (“Understanding summary statistics and graphical techniques to compare Michael Jordan versus LeBron James” – https://onlinelibrary.wiley.com/doi/abs/10.1111/test.12111) I love the fact that statistics cannot determine the MJ and LeBron debate precisely because it all depends on which summary statistic you choose. Just goes to show that you need to put as much thought into which dimensions you choose to check your model (graphically and numerically) as you do in constructing your model in the first place.

All stats, yah!

Today it’s a cook vs. a drama star. Whaddya want, a scrumptious lunch or Soph’s option? Or ya want Silkwood? Fantastic Mr. Fox? Can’t go wrong with that lady. But you also luv that cookbook, that food, that flavor, right? You pick.

Again, full list is at this link, and instructions:

Trying to pick #1 visitor. I’m not asking for most popular, or most topical, or optimum, or most profound, or most cool, but a combination of traits.

I’ll pick a day’s victor not from on a popular tally but on amusing quips on both camps. So try to show off!

Our hypotheses are not just falsifiable; they’re actually false.

Everybody’s talkin bout Popper, Lakatos, etc. I think they’re great. Falsificationist Bayes, all the way, man!

But there’s something we need to be careful about. All the statistical hypotheses we ever make are false. That is, if a hypothesis becomes specific enough to make (probabilistic) predictions, we know that with enough data we will be able to falsify it.

So, here’s the paradox. We learn by falsifying hypotheses, but we know ahead of time that our hypotheses are false. Whassup with that?

The answer is that the purpose of falsification is not to falsify. Falsification is useful not in telling us that a hypothesis is false—we already knew that!—but rather in telling us the directions in which it is lacking, which points us ultimately to improvements in our model. Conversely, lack of falsification is also useful in telling us that our available data are not rich enough to go beyond the model we are currently fitting.

P.S. I was motivated to write this after seeing this quotation: “. . . this article pits two macrotheories . . . against each other in competing, falsifiable hypothesis tests . . .”, pointed to me by Kevin Lewis.

And, no, I don’t think it’s in general a good idea to pit theories against each other in competing hypothesis tests. Instead I’d prefer to embed the two theories into a larger model that includes both of them. I think the whole attitude of A-or-B-but-not-both is mistaken; for more on this point, see for example the discussion on page 962 of this review article from a few years ago.

LeBron James (3) vs. Eric Antoine; Ellen DeGeneres advances

Optimum quip Thursday was from Dzhaughn:

Mainly, that woman’s tag has a lot of a most common typographical symbol in it, which would amount to a big difficulty back in days of non-digital signs on halls of drama and crowd-laughing.

Should that fact boost or cut a probability appraisal of said woman writing an amazing book such as “A Void” (aka “La Disparition” in Gallic printings?) I cannot say, A or B. (If you don’t know what’s up, visit Amazon.com to find that book’s author’s autograph and a blurb on said book. You will know why its local omission is mandatory.)

That I should, so soon as now, so miss that most familiar symbol. But I do! Would you not? I should strongly disavow prodigality with it!

Good points, all. I must go with L.A. TV host and funny lady for this win. You go girl. You will soon stand vs. a hoops man or a magical guy in round 2. Good stuff all round.

Today, #3 GOAT is facing off against a magician. L’Bron could talk b-ball or politics and might want to know about schools policy, a common topic on this blog. But that français is funny looking and has strong tricks. Both guys on TV all days. Who do you want to show up to our Columbia talk?

Again, full list is at this link, and instructions:

Trying to pick #1 visitor. I’m not asking for most popular, or most topical, or optimum, or most profound, or most cool, but a combination of traits.

I’ll pick a day’s victor not from on a popular tally but on amusing quips on both camps. So try to show off!

Fitting multilevel models when the number of groups is small

Matthew Poes writes:

I have a question that I think you have answered for me before. There is an argument to be made that HLM should not be performed if a sample is too small (too few level-2 units and too few level-1 units). Lots of papers have been written with guidelines on what those should be. It’s my understanding that those guidelines may not be worth much, and I believe even you have suggested that when faced with small samples, it is probably better to just simulate.

Is it accurate to say that if a data set is clearly nested, there is dependence, and the sample is too small to do HLM, then no analysis is ok? And that a different analysis that doesn’t address dependence but is not necessarily as biased with small samples (or so they say) is still not ok? I think you mentioned this before.

Let’s say you want to prove that head start centers that measure as having higher “capacity” (as measured on a multi-trait multi-method assessment of capacity) have teachers that are more “satisfied” with their jobs, that simply looking at the correlation between site capacity and site average job satisfaction is not ok if you only have 15 sites (and 50 total teachers unequally distributed amongst these sites). This is a real question I’ve been given with the names and faces changed. My instinct is they aren’t analyzing the question they asked and this isn’t right.

Would the use of a Bayesian GLM be an option or am I expecting too much magic here? This isn’t my study, but I hate to go back to someone and say, Hey sorry, you spent 2 years and there is nothing you can do quantitatively here (though I’d much rather say that than allow this correlation to be published).

My quick response is that the model is fine if you’re not data-rich; it’s just that in such a setting the prior distribution is more important. Flat priors will not make sense because they allow the possibility of huge coefficients that are not realistic. My book with Hill is not at all clear on this point, as we pretty much only use flat priors, and we don’t really wrestle with the problems that this can cause. Moving forward, though, I think the right thing to do is to fit multilevel models with informative priors. Setting up these priors isn’t trivial but it’s not impossible either; see for example the bullet points on page 13 of this article for an example in a completely different context. As always, it would be great to have realistic case studies of this sort of thing (in this case, informative priors for multilevel models in analyses of social science interventions) that people can use as templates for their own analyses. We should really include one such example in Advanced Regression and Multilevel Models, the in-preparation second edition of the second half of our book.

Short-term, for your problem now, I recommend the multilevel models with informative priors. I’m guessing it will be a lot easier to do this than you might think.

Poes then replied:

That example came from a real scenario where a prior study actually had unusually high coefficients. It was an intervention designed for professional development of practitioners. In general, most studies of a similar nature have had no or little effect. An effect size of .2 to .5 is pretty common. This particular intervention was not so unusual as to expect much higher effects, but they ended up with effects closer to .8 or so, and the sample was very small (it was a pilot study). They used that evidence as a means to justify a second small study. I suspect there is a great deal more uncertainty in those findings than it appears to the evaluation team, and I suspect if priors from those earlier studies were to be included, the coefficients would be more reasonable. The second study has not yet been completed, but I will be shocked if they see the same large effects.

This is an exaggeration, but to put this large effect into perspective, it would be as if we are suggesting that spending an extra ten minutes a day with hands on supervision of preschool teachers would lead to their students knowing ten more letters by the end of the year. I think you have addressed this before, but I do think people sometimes forget to take a step back from their statistics to consider what those statistics mean in practical terms.

Poes also added:

While we are talking about these studies as if Bayesian analysis would be used, they are in fact all analyzed using frequentist methods. I’m not sure if that was clear.

And then he had one more question:

When selecting past studies to use as informative priors, does the quality of the research matter? I have to imagine the answer is yes. A common argument I hear against looking to past results as evidence for current or future results is that the past research is of insufficient quality. Sample too small, measures too noisy, theory of change ill-thought-out, etc. My guess is that it does matter and those issues all potentially matter, but . . . It seems like that then raises the question, at what point is the quality sufficiently bad to merit exclusion? Based on what criteria? Study rating systems (e.g. Consort) exist, but I’m assuming that is not a common part of the process and I would also guess that much of the criteria is unimportant for their use as a prior. I’ve worked on a few study rating tools (including one that is in the process of being published as we speak) and my experience has been that a lot of concessions are made to ensure at least some studies make it through. To go back to my earlier question, I had pointed out that sample size adequacy shouldn’t be based on a fixed number (e.g. at least 100 participants) and maybe not based on the existence of a power analysis, but rather something more nuanced.

This brings me back to my general recommendation that researchers have a “paper trail” to justify their models, including their choice of prior distributions. I have no easy answers here, but, as usual, the default flat prior can cause all sorts of havoc, so I think it’s worth thinking hard about how large you can expect effect sizes to be, and what substantive models correspond to various assumed distributions of effect sizes.

P.S. Yes, this question comes up a lot! For example, a quick google search reveals:

Multilevel models with only one or two groups (from 2006)

No, you don’t need 20 groups to do a multilevel analysis (from 2007)

Hierarchical modeling when you have only 2 groups: I still think it’s a good idea, you just need an informative prior on the group-level variation (from 2015)

Ian McKellen (2) vs. Ellen DeGeneres; Pierre-Simon Laplace advances

The arguments yesterday in favor of Laplace were valid, earnest, and boring. Dalton reinforced the contrast with this comment:

Belushi’s demons are a whole lot more interesting than Laplace’s demon. With the latter, you always know what you’re gonna get forever and ever evermore. The former offers heaps of exciting uncertainty, and if you remember the night, you’ll have a hell of a story.

Then I read this comment from J Storrs Hall:

I fear that Laplace would be overly relaxed. Belushi, on the other hand, would be on a mission from God. With a full tank of gas. At midnight. Wearing sunglasses.

And he might even bring a penguin.

Compelling. But I don’t want a penguin in my seminar. A piranha or a kangaroo, sure, those have statistical relevance. But a penguin, no way. So Laplace, the first and greatest applied Bayesian statistician, goes to round 2.

Zbicyclist puts it well:

A man who had no need for God, and a man on a mission from God.

When our pastor was taking a statistics course as part of his MBA, I tried to explain how statistical models of human behavior were less of a violation of the notion of free will than the notion of an omniscient, omnipotent God was. I’d like to hear Laplace’s answer to this one, even if it’s just to sniff at the question.

Today we must choose between two charming show-business figures: Ian McKellen, seeded #2 in the “People whose names end in f” category, versus Ellen DeGeneres, an unseeded TV personality. You can’t go wrong with either one. All I’ve got for you is that Gandalf has a track record of saving people who are about to get eaten by trolls—I’ve been reading The Hobbit and happen to be right in the middle of that scene—and we do sometimes have trolls around here.

Any other thoughts?

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!

Wanted: Statistics-related research projects for high school students

So. I sometimes get contacted by high school students who want to work on research projects involving statistics or social science. I’ve supervised several such students, and what works best is when they have their own idea, and I can read what they’ve written and give comments. I’m more of a sounding board than anything else.

But sometimes we do have good ideas, quantitative research projects that a high school student could do that would have some interesting statistical content or would shed light on some political or social issue.

If you have any good ideas—projects that would be fun for a high school student, or something quantitative a student could do that could make the world a better place—place them in the comments, and then maybe we could put together a list.

I’m not looking for classroom activities—Deb and I have a whole book about that—I’m looking for ideas for research projects that high school students could do on their own.

Pierre-Simon Laplace (2) vs. John Belushi; Pele advances

For yesterday I was leaning toward Penn and Teller based on Bobbie’s reasoning:

Penn & Teller not only create interesting, often politically-relevant, magic. They are also visible skeptics who critique the over-claiming of magicians/mystics/paranormal advocates and they use empirical arguments/demonstrations when they speak to debunk pseudoscience. For those of us who care about such things as the “replication crisis,” creating better science, the acceptance of science, etc., is there a better analogy than to magic?

But then I read this from Daniel:

The question is whether we want a seminar focused on Bullshit! like most seminars, or on a universal truth and beauty: The Beautiful Game. I gotta go with Pele, but given it’s an academic seminar I’m pretty sure we’re going to get the Bullshit!

And the deciding argument from plusplus:

I would really like to hear Pele’s considered thoughts on who really is GOAT — him or Messi. I know, he is on the record about it already, but has been already refuted massively [sic] by video evidence, so what better than confronting a hostile seminar audience and justify his title?

And today we have the second-ranked mathematician of all time (recall that the ranking was done by a statistician; that’s how applied statisticians Laplace and Turing ranked so high) vs. an unseeded, but memorable, eater. Either one would be entertaining. Recall that Laplace anticipated all of behavioral economics, so his talk should attract people from the psychology and econ depts and b-school as well as the usual suspects from math, stat, and physics.

Again, the full bracket is here, and here are the rules:

We’re trying to pick the ultimate seminar speaker. I’m not asking for the most popular speaker, or the most relevant, or the best speaker, or the deepest, or even the coolest, but rather some combination of the above.

I’ll decide each day’s winner not based on a popular vote but based on the strength and amusingness of the arguments given by advocates on both sides. So give it your best!