Sermet Pekin’s open-source project that discovers blogs through recursive network exploration

Sermet Pekin writes:

I built an open-source project that discovers blogs through recursive network exploration–basically PageRank for the blogosphere. Your blogroll was the main seed source.

It recursively discovers new blogs by following citations and parsing RSS feeds, mapping out how blogs link to each other. Starting from a curated seed list (I used your blogroll recommendations), it can scale to hundreds or thousands of blogs depending on exploration depth. It supports different exploration strategies—breadth-first to explore widely across communities, depth-first to dive deep into niches, or mixed approaches.

I thought you might find it interesting:
– Helps surface quality blogs without relying on social media algorithms
– Your blogroll made excellent seed data—the blogs were well-curated and interconnected

Fun!
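To make the idea concrete, here's a toy sketch of that kind of crawler–my own minimal illustration, not Pekin's actual code; the discover function and its naive handling of feed <link> elements are hypothetical. Popping from opposite ends of a deque gives breadth-first or depth-first exploration:

```python
import xml.etree.ElementTree as ET
from collections import deque
from urllib.request import urlopen

def discover(seed_feeds, max_blogs=200, strategy="bfs"):
    """Toy recursive blog discovery: pull <link> URLs out of each feed
    and explore them breadth-first or depth-first."""
    frontier = deque(seed_feeds)
    seen, edges = set(seed_feeds), []
    while frontier and len(seen) < max_blogs:
        url = frontier.popleft() if strategy == "bfs" else frontier.pop()
        try:
            feed = ET.parse(urlopen(url, timeout=10)).getroot()
        except Exception:
            continue  # unreachable, or not parseable as a feed
        for link in feed.iter("link"):  # naive: treat <link> text as a citation
            target = (link.text or "").strip()
            if target.startswith("http") and target not in seen:
                seen.add(target)
                edges.append((url, target))  # who-links-to-whom graph
                frontier.append(target)
    return seen, edges
```

The edges list is the link graph you'd feed into a PageRank-style ranking.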

What’s it like to be the child of a white-collar criminal?

I’ve been listening to this BBC podcast, Lives Less Ordinary, that tells people’s stories. It’s interesting, also surprisingly lowbrow given that it’s BBC. For example, here are three typical shows: “Stolen as a baby, I called my abductor ‘Mom’,” “The bullet that ended our friendship,” and “He counted 3, 2, 1 – then stabbed me in the heart.” Also, they did a completely credulous show on Diana Nyad (see here for backstory). Mostly I like it, though.

Anyway, here’s the point of today’s post. Lives Less Ordinary has multiple episodes about people who grew up in criminal families and then, sometime during childhood, adolescence, or adulthood, had to come to terms with what it meant to grow up around adults who were involved in violent crime. Sometimes it took a while–I can only imagine that it’s something you’ll never get over.

So here’s my question. What’s it like to be the child of a white-collar criminal?

The reason I ask is that, with white-collar crime, it seems that it would be easier to remain in denial, to imagine that the parent did nothing wrong or that it was just a misunderstanding. I’d think it would ultimately be worse for someone to avoid coming to terms with the crimes committed by the parent. But I don’t really know. On this blog we often discuss white-collar crime or white-collar unethical behavior, and sometimes I kind of wonder what it would be like to be in the family of such a person.

P.S. Last year we discussed what it was like to be the parents of a white-collar criminal, but this was a bit different than the Lives Less Ordinary stories in that the parents in question actively profited from their son’s crimes. Just horrible parents, in my opinion. Standing by your child even when he does wrong, that I understand. But joining in the crime . . . jeez. The parents are supposed to be the mature ones here! Talk about setting a bad example.

Flagging when the prior distribution is informative

For each parameter (or other qoi), compare the posterior sd to the prior sd. If the posterior sd for any parameter (or qoi) is more than 0.1 times the prior sd, then print out a note: “The prior distribution for this parameter is informative.” Then the user can go back and check that the default prior makes sense for this particular example.
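Here’s one way that check could look in practice–a minimal sketch, assuming you already have the prior sds and a set of posterior draws. The function name and inputs are hypothetical, not any existing package’s API:

```python
import numpy as np

def flag_informative_priors(prior_sd, posterior_draws, ratio=0.1):
    """Print a note for each quantity whose posterior sd exceeds
    `ratio` times its prior sd.

    prior_sd:        dict mapping quantity name -> prior sd
    posterior_draws: dict mapping quantity name -> 1-D array of draws
    """
    for name, sd_prior in prior_sd.items():
        sd_post = np.std(posterior_draws[name], ddof=1)
        if sd_post > ratio * sd_prior:
            print(f"The prior distribution for {name} is informative.")

# Hypothetical example: the check flags beta but not alpha.
rng = np.random.default_rng(0)
flag_informative_priors(
    prior_sd={"alpha": 10.0, "beta": 1.0},
    posterior_draws={"alpha": rng.normal(2, 0.3, 4000),
                     "beta": rng.normal(0, 0.5, 4000)},
)
```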

This idea is in our prior choice recommendations wiki, and it’s related to our paper, The prior can often only be understood in the context of the likelihood.

The context is that we’re trying to develop default procedures, which implies default priors and also default diagnostics.

It’s not a bad thing that a prior is informative! It’s just something you’d want to know.

I’ve never actually tried the above idea. I just came up with it one day a few years ago and wrote it down. I like it, though!

P.S. In the top paragraph above, “qoi” stands for “quantity of interest.” Sometimes this is called an “estimand”–that is, something being estimated–but I prefer the more general term “quantity” because in classical frequentist statistics there is a distinction between parameters (fixed but unknown) and predictive quantities (unknown but with probability distributions that in turn depend on fixed parameters). “Quantity of interest” can be any function of data, parameters, latent parameters, missing data, future data and parameters, etc. Also, I associate the expression “estimand” with the problem formulation in which you have a goal of estimating one particular thing; in contrast, you can have as many quantities of interest as you’d like, and they can be defined after you’ve seen the data.

Larry Summers, Ken Starr, Jeffrey Epstein, and everyone else

Two of the more memorable names to turn up in the Epstein files are Larry Summers and Ken Starr. Summers we’ve already discussed, and Starr has exchanges like this:

I was thinking about what makes the presence of Summers and Starr on this list so disturbing.

Perhaps the clearest reason is that they’re both so sanctimonious. Summers played the “moral bankruptcy” card in his attempt to intimidate a student newspaper, and Starr, of course, is most famous for producing a report full of details on a sexual affair. Larry’s a liberal Democrat and Starr’s a conservative Republican, but one thing they had in common was a spurious air of rectitude.

The flip side of this is that the involvement of Larry and Ken in this world was disturbing but not at all surprising, as both men had track records of tolerating corruption when it was done by their friends–Summers had his Russian embezzler and Starr had the Baylor rapists–and also they were both directly connected to Epstein earlier, Summers through the financier’s Harvard donations and Starr through working on his legal defense team.

But let me put it this way. Hearing that Donald Trump, Alan Dershowitz, or various sweaty businessmen were close to Epstein is no surprise: these guys give the impression of being proud sleazeballs–I guess the expression would be “men of the world”–who’d see no problem with all the financier’s dirty deeds. And I wasn’t particularly disturbed by the involvement of scientists such as Marvin Minsky, Stephen Hawking, and Murray Gell-Mann: these guys are subject to flattery, and who knows what demons haunt their private imaginations. This does not excuse their behavior; it’s just not particularly notable that there will be some subset of old guys who will want to join in this sort of activity while not looking too carefully at what’s actually happening.

Summers and Starr are different from Trump, Dershowitz, etc., or from Minsky, Hawking, etc., in that they represent a sort of legitimate establishment in our society: political figures who are respected for their probity and judgment.

The issue is not just that Larry and Ken are sanctimonious hypocrites–after all, who among us can claim to always live up to our highest ideals–but rather the roles they played in the worlds of politics and power.

The soft bigotry of low expectations

Following up on our recent discussion, someone pointed me to this post by political commentator Matt Taibbi labeling the now-disgraced Harvard economist Larry Summers as “a rare perfect 10 on the celebrity-repugnance scale.”

My immediate thought was that this assessment was ridiculous. Lots of celebrities are more repugnant than Summers. If Summers is a perfect 10, then what’s Alan Dershowitz, who was much more entangled with Epstein and also promoted torture? An 11? What about Epstein himself? Is he a 13? And then maybe Harvey Weinstein is one too, and maybe O. J. Simpson and Phil Spector are 14’s?

tl;dr: Larry Summers is an unexceptional man who has been placed in positions of power.

He’s not a world-class intellect; he’s a well-connected professor who gets credit for saying a lot of unexceptional things. The idea that it makes economic sense to pollute more in poor places? The idea that, if men have higher variance than women, this will affect the tail of the distribution? The idea that positive-feedback trading could cause dangers for financial markets? These are not sophisticated concepts. It’s fine for Summers to use his influence to spread these ideas, but it’s kinda silly to think that this makes him some sort of wise man.

At the same time, he’s not a world-class scoundrel. Lots of rich guys are willing to take money and favors from people who operate on the wrong side of the law; lots of guys will maneuver to get sexual advantage from mentoring relationships; and Summers is hardly unusual in being a member of the executive class who has tried to intimidate the press. It’s fine to criticize the man for this immoral and destructive behavior, but it’s kinda silly to think that this makes him stand out from so many others of his sort.

A not-so-suspicious story

Taibbi also writes, “it’s extremely suspicious that a story that was deader than Epstein himself for years is suddenly the Most Important Thing now that Trump is back in the White House,” and I’m like, huh? It’s no mystery, dude. The news media often follow the lead of the government. When Epstein died in custody back in 2019, the government was not pushing the story, and eventually it faded. For several years the government wasn’t saying much about Epstein, and there wasn’t much to report. Then in early 2025 the administration announced it was releasing the Epstein files and so, no surprise, this got news coverage. Then the Speaker of the House shut down half of Congress for a month or so, seemingly just for the purpose of not seating an incoming representative so as to avoid having to vote on the release of the files. That was news. To frame all this as some sort of mystery or an anti-Trump conspiracy is some of the most clueless political commentary I’ve seen in a while. The crazy man also writes, “the public’s fascination with Epstein is based on the notion that he was not only operating an organized blackmail ring, but doing so on behalf of intelligence agencies, probably Israeli,” and, ok, maybe? But I don’t think so. My impression is that the public fascination with Epstein is based on his connections with rich and powerful people and the impunity with which he was allowed to commit crimes in public even after his criminal conviction–he even received favorable treatment in prison (according to Wikipedia, “Epstein’s cell door was left unlocked . . . guest logs were destroyed per the department’s ‘records retention’ rules . . . Epstein was allowed to use his own driver to drive him between jail and his office and other appointments”)–along with his suspicious death, and the current administration swinging wildly back and forth between hollering about the Epstein files and trying to keep them locked up. That, and all the sex, is enough reason for public fascination, no? The Israeli intelligence thing seems much lower down on the list of concerns. A quick google turned up this recent poll, which reports:

Forty-five percent of respondents, including 53% of Americans 18-49 years old, said that Epstein probably collaborated with a foreign intelligence service, compared to only 6% who said he probably did not, while 48% said they didn’t know. . . . We followed up with an open-ended question among the 45% of respondents who said Epstein probably collaborated with a foreign intelligence service: “What country’s intelligence service do you think Jeffrey Epstein collaborated with?” Many specified only one while others specified more than one. Even though the American intelligence service (we did not specify intelligence services’ names) is not ‘foreign,’ many respondents still specified the U.S., usually as one of several. The two countries that topped the list were Russia and Israel – both among those who selected only one intelligence service and multiple. Overall, 30% specified Russia, 27% specified Israel, 17% specified the U.S., 10% specified the U.K., and 4% specified China.

So, no, I don’t think the evidence supports Taibbi’s claim that “the public’s fascination with Epstein is based on the notion that he was not only operating an organized blackmail ring, but doing so on behalf of intelligence agencies, probably Israeli,” given that only half the people surveyed thought that foreign intelligence services were involved, and, of those, less than a third mentioned Israel. I’m not saying Epstein didn’t collaborate with Israeli intelligence–I have no idea–just that his claimed explanation of the public’s fascination is unsupported. It’s just really bad political commentary.

I felt a kind of duty here to look into that clueless commentary and try to figure out what went wrong . . . but that’s not actually the focus of today’s post.

Rather, what I want to talk about is why a focus on Summers is missing the point of the Epstein story.

It’s not about Larry

Don’t get me wrong–I do think Summers is an important part of the story. Not of the story of Epstein in particular–here, Summers was more of a bit player–but as an example of the corruption of our ruling class, this idea that rich people should be able to get away with anything. I have only contempt for Summers’s attempt to intimidate Harvard’s student newspaper, and for Summers being such a cheapskate that, instead of dropping a million of his own dollars into his wife’s project, he and she tried to get that money by flattering a sex-trafficker, and for Summers trying to pull strings to sleep with a much younger scholar he was mentoring. All bad stuff–although, as noted above, not as bad as the deeds of some other celebrities such as Weinstein, Simpson, Spector, and of course Epstein himself.

My problem with a focus on Summers is not that he deserves a break, but rather that there are more interesting questions in the story, most notably: why did all these important business and academic leaders seem to love their time with Jeff so much? Part of it was the appeal of illicit sex, but I doubt that was the whole story either.

In a recent post, Paul Campos points out that lots of people fell under Epstein’s spell. And it wasn’t just stupid people, either. I can’t speak for Larry Summers, a man I’ve never met and whose writings have never impressed me, but other members of what we might call the “Epstein community” are legitimately brilliant, and their receptiveness to Epstein puzzled me. Some hypotheses:
(a) They thought Epstein might send some money their way.
(b) They were in the habit of sucking up to rich people for money, so they were just sucking up to Epstein out of habit.
(c) They thought they might get some sex with underage women out of the deal.
(d) They liked to hang out with Epstein because they thought Epstein was cool.
(e) They genuinely enjoyed hanging out with Epstein.
I have a horrible feeling that for a lot of these people it was a mixture of all 5 of these things! I guess another option is that they enjoyed free Caribbean vacations and didn’t look too carefully at the other people at the resort.

My only point here is that if we’re talking about Summers, we can say, yeah, Summers is a stuffed shirt, he’s an idiot, the years as university president warped his brain (“flattering rich assholes and rich evil people” is pretty much the #1 requirement for the job of university president), etc. But not everyone involved was an idiot, and a lot of them needed the money even less than Summers did. And yet they still seem to have enjoyed hanging out with the guy.

Along these lines, someone pointed me to this op-ed by Anand Giridharadas, who writes:

As journalists comb through the Epstein emails, surfacing the name of one fawning luminary after another, there is a collective whisper of “How could they?” How could such eminent people, belonging to such prestigious institutions, succumb to this? . . .

How did Mr. Epstein manage to pull so many strangers close? The emails reveal a barter economy of nonpublic information that was a big draw. This is not a world where you bring a bottle of wine to dinner and that’s it. You bring what financiers call “edge” — proprietary insight, inside information, a unique takeaway from a conference, a counterintuitive prediction about A.I., a snippet of conversation with a lawmaker, a foretaste of tomorrow’s news.

What the Epstein class understands is that the more accessible information becomes, the more precious nonpublic information is. . . . “Saw Matt C with DJT at golf tournament I know why he was there,” Nicholas Ribis, a former Trump Hotel executive, wrote to Mr. Epstein, making what couples therapists call a bid for attention. Jes Staley, then a top banking executive, casually mentioned a dinner with George Tenet, the former Central Intelligence Agency director, and got the reaction he probably hoped for: “how was tenet.” Mr. Summers laid bait by mentioning meetings with people at SoftBank and Saudi Arabia’s sovereign wealth fund. Mr. Epstein nibbled: “anyone stand out?” . . .

The smart need money; the rich want to seem smart; the staid seek adjacency to what Mr. Summers called “life among the lucrative and louche”; and Mr. Epstein needed to wash his name using blue-chip people who could be forgiving about infractions against the less powerful. . . . Mr. Summers wrote to Mr. Epstein: “U r wall st tough guy w intellectual curiosity.” Mr. Epstein replied: “And you an interllectual with a Wall Street curiosity.” . . . For this modern elite, seeming smart is what inheriting land used to be: a guarantor of opened doors. . . .

This sounds about right to me, and it gives us some perspective on things like the above-quoted exchange with Kenneth Starr.

Part of the appeal was to feel connected to this network of big shots. So when someone sends the email, “Great meeting yesterday – really enjoyed it,” to Epstein, maybe the enjoyment came from the proximity to power and information, or the illusion of this proximity. A feeling of importance by proxy.

In my earlier swing at this topic, I remarked on Epstein’s impressive ability to make all these different people, ranging from Noam Chomsky to Ken Starr to Steve Bannon to Larry Summers, all think he was in agreement with them. Epstein was offering these people information and connections, but in a context where they each thought he was on their side.

It’s all about the information

I was discussing some of this with Jessica Hullman, who writes:

This idea that there are information bartering networks where a heavy emphasis is placed on the “interpretive edge” that certain people have feels familiar based on what I [Jessica] have glimpsed in the more elite circles in CS, especially AI/ML where all the power and money are these days. It’s like a vibe that gets manufactured that makes it seem paramount to remain connected to the right people so you can be in the loop on how they view things, even if the only intel you gain is their casual quick takes that aren’t even well thought-out. I think there are people who are very good at building these kinds of vibey networks by making others feel like they have an edge on everyone else by virtue of being connected with them. I could see how Epstein might have been really good at making people feel like they had access to some kind of ultra-special network just by having the occasional email with him, even if nothing really deep came out of it. Seems related to being a master manipulator.

It’s complicated because making connections can be a good in itself.

For example, all sorts of interesting people come to our Playroom on Tuesdays. Sometimes I can work with them directly, sometimes they get looped into a group project on which I’m involved, but other times my only role is to say, “Hey! You’re interested in X, and someone else in the room knows about this topic, so the two of you can work together.” This sort of intellectual matchmaking is fun, partly because it can be a big help while requiring minimal effort on my part, and partly because it’s making use of my unique skills–or, I should say, my particular position as someone with a lot of intellectual connections in all sorts of different directions within statistics and social science.

It’s similar with blogging! The act of writing these posts . . . ok, it’s not effortless, and arguably it takes my personal resources away from more enduring contributions in theory, methods, applied research, policy, and book writing, but it’s a way to contribute to the world by leveraging my ideas and connections. I guess that something similar must also go on in the world of business, where you make lots of connections and sometimes you get a finder’s fee or its equivalent. In my case, I’m usually not looking for money (although, yes, it comes in handy), but I’m trying to do my part to facilitate useful things. Of course, that wasn’t where Epstein was coming from–I guess his main goal was to collect powerful friends who could keep him out of jail–but the social processes seem similar.

P.S. I saw in the newspaper that Jimmy Cliff just died. I’d like to think that in a hundred years, he will still be remembered long after Summers and Epstein have been forgotten. But who knows.

Effective sample size

This post is by Aki

Richard McElreath had a bsky post with an MCMC convergence diagnostic plot with one axis label saying “number of effective samples”. I commented that this is wrong and misleading, and that it would be better to write “effective sample size”. Frank Harrell asked for elaboration. Before I had time to answer, some other people had posted just fine answers. I wanted to write a bit more but, not having time to write short, I decided to write a blog post.

The problem with “number of effective samples” is that it sounds like some samples are effective and some are not, but effectiveness is not a property of individual samples; it is a property of the whole sample.

Before I continue further, I want to switch to different words. Each individual posterior draw is a sample, and the collection of all posterior draws is also a sample. Technically this is correct, but it can lead to confusion about whether we are referring to an individual (sample) or the group of individuals (sample). This is why I prefer to talk about individual posterior draws, with the collection of posterior draws being a posterior sample. This is also why the posterior R package uses the term draw.

Every single MCMC draw has effective sample size 1, and the number of effective draws is the same as the total number of draws. However, when we use a collection of MCMC draws to estimate some expectation, the Markovian dependency changes the Monte Carlo error, and it makes sense to compare the estimation efficiency to the sample size of independent draws that would give the same error.

Effective sample size used to be denoted by n_eff. It might be that “number of effective samples” comes from some people reading n_eff as “number of effective samples.” n_eff has another problem, as n often denotes the number of observations. That’s why we recommend abbreviating effective sample size as ESS. This is also what the posterior package uses.

A further important point is that the effective sample size also depends on which expectation is estimated. Most commonly, effective sample size has been reported for the estimation of E[theta], but as the effective sample size can be very different for, say, E[theta^2], it would be better to state explicitly what is being estimated. By default, the posterior package reports Bulk-ESS and Tail-ESS, and while neither is the ESS for a simple expectation, they are more informative than just ESS.
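To illustrate both points–the correction for Markovian dependence and the dependence on which expectation is being estimated–here is a minimal sketch of an autocorrelation-based ESS using Geyer’s initial positive sequence truncation. This is a simplified stand-in, not the posterior package’s actual implementation:

```python
import numpy as np

def ess(x):
    """Effective sample size of a single chain for estimating E[x]."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    acf = np.correlate(xc, xc, mode="full")[n - 1:]
    acf = acf / acf[0]  # autocorrelations; acf[0] == 1
    # tau = 1 + 2*sum_t rho_t, summing consecutive pairs while positive (Geyer).
    tau = -1.0
    for k in range(0, n - 1, 2):
        pair = acf[k] + acf[k + 1]
        if pair < 0:
            break
        tau += 2 * pair
    return n / tau

# AR(1) chain as a stand-in for MCMC output.
rng = np.random.default_rng(1)
theta = np.zeros(20_000)
for t in range(1, theta.size):
    theta[t] = 0.9 * theta[t - 1] + rng.normal() * np.sqrt(1 - 0.9**2)
print(ess(theta), ess(theta**2))  # ESS for E[theta] and E[theta^2] differ
```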

It’s been a long time since I wrote a blog post that was not a job ad, and coincidentally I’m now looking for postdocs who have a strong background in Bayesian methods and are interested in working on Bayesian cross-validation, model checking, and comparison.

The three funniest items on the Kroger recall list

Palko points us to this news item, “Kroger Recall Update: Full List of Product Warnings Across 18 States.” My favorite:

Yummi Sushi, recalled October 28, 2025: Nashville, Knoxville, Georgia, and South Carolina stores. Possible contamination with metal fragments.

If you’d asked me why they were recalling sushi, “contamination with metal fragments” would not have been in my first hundred guesses. I guess the metal makes it yummier.

And my second favorite:

Face Rock Curds Vampire Slayer Garlic, recalled June 25, 2025: Affects Fred Meyer and QFC stores. Potential Listeria contamination.

I don’t see the problem. Listeria would slay a vampire too, no?

And this:

High Noon vodka Beach Variety, recalled July 28, 2025: Affects Kroger-owned stores located in Wisconsin, South Carolina and Virginia. Kroger says: “Specific lot codes of the product are being recalled due to variety packs may have cans labeled Celsius Energy Drink filled with seltzer alcohol.”

High noon, indeed.

The purpose of science vs. the purpose of scientists

The other day I posted something on the two faces of academic social science:

1. To provide theoretical and social justification for existing power structures.

2. To be oppositional, questioning existing power structures.

I got some replies, arguing that there’s no need for a social science to do either one of the two options above, and that it would be better to have no political agenda at all.

Setting aside the question of how possible it is to avoid a political agenda, I thought it might be helpful to distinguish between the purpose of science and the purpose of scientists. As individual scientists, we do some mix of what we’re paid for and what we want to do. But there’s somebody out there paying us, or otherwise giving us the free time to work on what we want to work on.

Lots of science is paid for by the government or by large corporations or by somebody who has a bunch of money, so, yeah, option #1 is pretty clear, especially for social sciences. The natural sciences and engineering can serve society and its powerful institutions more directly, by increasing the efficiency of agricultural and industrial processes, improving public health and transport, etc etc. The social sciences can do some of that sort of thing (for example, improving measurements and studying outcomes of policies), but often their role is more indirect, to supply good reasons for why things should remain how they are, or continue going in the direction they’ve been going.

But academia, like journalism, also traditionally plays an oppositional role (“comfort the afflicted and afflict the comfortable”), which brings us to option #2 above. In the same way that churches play a societal role as a sort of refuge from the capitalist system, one of the roles of academic social science is to have the freedom to push against society. Biting the hand that feeds us is part of the job. Again, this does not imply that every social scientist needs to bite in this way, and I recognize that if you bite too hard there can be consequences. My point in this post is just that these two roles exist, and are in tension with each other, even if you, as a particular social scientist, are just going about your job.

Survey Statistics: quantity vs quality

In 2021, I taught Survey Research Methods at NYU (thank you to Daphna Harel for that opportunity!). We used the textbook by Groves et al. (in this blog series here). It’s got this helpful image, which even Twitter liked:

[Image from Groves et al.: diagram of the sources of survey error]

I like this image because it reminds us to consider all the sources of error, even if that’s a bit stressful. For example, in considering different survey modes (face-to-face vs phone vs mail), Groves et al. (p. 151) compare response rate, cost, coverage, and measurement error. Meng 2018 “Statistical Paradises and Paradoxes” (in this blog series here) reminds us that big administrative datasets may have large coverage and sample size, but a smaller, higher-quality dataset could be better in important ways. In a talk, Meng asks if an 80% (or even 99%) non-random sample is better than a 5% random sample.

Consider two data sources, maybe from different survey modes:

  1. 100% coverage, i.e. your sampling frame is the entire population. But response probability P[R = 1 | Y] differs a lot by Y, a variable of interest. These differences may not decrease with multiple contact attempts. They could in fact get worse.
  2. Only 5% coverage, i.e. you can only contact 5% of the population and ask them to respond to your survey, so P[R = 1] <= 5%. But suppose response probability P[R = 1 | Y] is roughly constant across Y. Here R = 1 includes both coverage and response.

(To keep this comparison simple, I left out discussion of auxiliary data X, which we’ve discussed a lot in this blog series.) The first data source has no coverage error, but lots of nonresponse error. The second data source has lots of coverage error, but it doesn’t ultimately result in bias. These are both cartoon examples; the real world is usually a bit more subtle.
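Here is a cartoon simulation of those two sources–a minimal sketch I made up for this comparison, with an arbitrary logistic response model for source 1:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000
y = rng.normal(0, 1, N)  # variable of interest in the full population

# Source 1: 100% coverage, but P[R=1|Y] increases with y.
p1 = 1 / (1 + np.exp(-(0.5 + 0.5 * y)))
r1 = rng.random(N) < p1

# Source 2: only 5% of the population reachable, response unrelated to y.
r2 = rng.random(N) < 0.05

print("truth:             ", y.mean())
print("big biased source: ", y[r1].mean(), " n =", r1.sum())
print("small clean source:", y[r2].mean(), " n =", r2.sum())
```

The big source’s mean stays biased no matter how large its n gets; the small source is noisier but centered on the truth.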

Andrew advocates (2011, 2018, and 2025) throwing both surveys into one model with indicators for the individual surveys (varying intercepts, maybe varying slopes too).

In practice I think this is harder than it sounds, with big models accounting for lots of auxiliary X and surveys that differ quite a lot in their coverage and nonresponse mechanisms. It could be better to focus on understanding one survey’s mechanisms really well? But it could also be better to combine data sources.

Problems with the so-called gender equality paradox

A few years ago I wrote something on what’s been called the gender equality paradox, a result found in some cross-national studies that “as gender equality increases, so do gender differences.”

My post discussed a particular paper published in that area. It was in the comments section that I laid out my larger concerns with this research:

I do think there’s some incoherence in the arguments I’ve seen presented in this area. Roughly speaking, there are five sorts of country-level variables to look at:

1. Policies and customs: These could include laws on women’s equality, abortion, child support, etc., the prevalence of woman-friendly private-sector policies such as maternity leave, and laws that restrict the clothing women can wear in public, etc. Also some measure of the left-right position of the government (although that could go in item 4 below, depending on whether we’re thinking of the government’s political stance as affecting policy or as a measure of the country’s political culture).

2. Health outcomes: Life expectancy is tricky, though. Is equality in life expectancy a sign of equality, or a sign that things are really bad for women, if they don’t have their usual several years advantage compared to men?

3. Sociological and economic outcomes: Labor force participation for men and women, proportions of women going on to higher education, courses of study at university, etc.

4. Social and political attitudes: Opinions of men and women from surveys on attitudes regarding women’s equality, gay rights, abortion, the role of women in the workplace and in politics, etc. There are two ways to summarize this: First, how much do the people in the country support ideals of equality between the sexes; Second, how much do attitudes of men and women differ?

5. Personality surveys: Examples would be the personality inventories used in the study discussed in the above-linked post.

Adding to the mess is that countries change over time, and there are also variables such as the wealth of the country, its geographical location, and its ethnic composition, all of which seem relevant but are not directly captured in any of the measures above.

In any case, if all five of the above sorts of measures were positively correlated, all would be clear: Countries with policies favoring women’s equality have more liberal politics, better health outcomes for women, more social and economic equality in economic outcomes, more support for women’s equality, and smaller differences between men and women in issue attitudes and personality measurements.

But, to the extent these have been studied, it appears that the relevant cross-country correlations are not uniformly positive. For example, the statement provided by Nick in that comment thread, “most nurses are women even in very egalitarian places like Scandinavia,” sounds like evidence in favor of a zero or negative correlation between items 1 and 3 in my list. The statement here that “the countries that minted the most female college graduates in fields like science, engineering, or math were also some of the least gender-equal countries” represents a negative correlation between items 3 and 4, or perhaps within different categories of item 3.

The challenge here is that the five items above can’t all be highly negatively correlated with each other–and, in any case, we wouldn’t expect them to be. Rather, there will be lots of the expected positive correlations, with occasional negative correlations that are newsworthy.
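That impossibility claim is just linear algebra: a correlation matrix must be positive semidefinite, and for k variables sharing a common pairwise correlation rho, this requires rho >= -1/(k-1). A quick numpy check for the five items above:

```python
import numpy as np

k = 5
for rho in (-0.3, -0.25, -0.1):
    R = np.full((k, k), rho)
    np.fill_diagonal(R, 1.0)
    ok = np.linalg.eigvalsh(R).min() >= -1e-9
    print(f"rho = {rho}: valid correlation matrix? {ok}")
# Five variables can't all be pairwise correlated below -0.25.
```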

Any negative correlations among the five items above can be taken as a gender-equality paradox.

Even beyond issues of data and measurement, there is a problem of interpretation.

This occurs with just about any study of disparities. From a leftist or liberal perspective, any gender differences can be interpreted as signs of unfairness or discrimination. From a rightist or conservative perspective, any gender differences can be interpreted as reflections of the state of nature.

The same issue arises with cross-national correlations. If gender differences are higher in more gender-equal societies, this can support a leftist view that even the supposedly enlightened western societies still have a ways to go, or it can support a rightist view that in richer societies women have more freedom and so we can see their revealed preference for traditional gender roles. Conversely, if gender differences are lower in more gender-equal societies, this supports a leftist view that more equality leads to progress but also a rightist view that gender equality is an artificial condition arising in the decadent West.

Complicating the matter is that there are several different ways of measuring gender equality and gender differences.

I’m not saying these things shouldn’t be studied; I just think we need to be careful about glib political interpretations of these cross-national comparisons. For example, I don’t trust the argument in this paper:

We found that sex differences in personality, verbal abilities, episodic memory, and negative emotions are more pronounced in countries with higher living conditions. In contrast, sex differences in sexual behavior, partner preferences, and math are smaller in countries with higher living conditions. We also observed that economic indicators of living conditions, such as gross domestic product, are most sensitive in predicting the magnitude of sex differences. Taken together, results indicate that more sex differences are larger, rather than smaller, in countries with higher living conditions. It should therefore be expected that the magnitude of most psychological sex differences will remain unchanged or become more pronounced with improvements in living conditions, such as economy, gender equality, and education.

It just seems like they’re trawling through correlations and then jumping to predictive conclusions about the future.

For a while I’ve wanted to write more on this topic–basically, something like my above comment with the 5 items etc., but with data.

In the meantime, though, I heard from psychology researcher Mathias Berggren, who seems to have done something about it:

In case you remain interested in the larger subject of the so-called “gender-equality paradox” (GEP)–the finding that several measures suggest gender differences are larger in “more gender-equal countries” (Western countries)–we have now published a preprint that re-examines multiple previous such results. I think it contains several methodological aspects that could be interesting for further discussion (summarized below).

1. The GEP appears to be an example of how methodologically confounded results have become imbued with meaning over time. Not because these methodological confounds have been overcome (we show they are massive), but because a more intriguing explanation for the results has materialized: the evolutionary idea that more gender-equal societies enable people to follow their innate (gender-specific) preferences.

2. I [Berggren] began looking into this as I found the whole premise of the GEP strange. Basically, it seems to be employed as follows: (a) Researchers think that some gender differences found in the West reflect innate and universal gender differences. (b) They find smaller such differences outside of the West. (c) This is taken as proof that the differences in the West are the truest revelation of such innate and universal gender differences.

3. Indeed, as we show, the predictors that have come to be employed appear to provide nothing beyond just generally pointing towards the West. For example, a simple Western indicator has higher cross-country correlations. As it was already known that gender differences on the most employed measures were larger in the West, these variables thus appear to provide nothing on their own. In the manuscript, we show how easy it is to push some favored theoretical perspective when we know that differences are larger in the West, and just correlate them with some other variables we know have higher scores in the West. Thus, we show that the pattern of gender differences is just as well “explained” by a medical conspiracy, by people’s fear of death, and by novel reading rotting peoples’ brains, as it is explained by the evolutionary interpretation (described in more detail in the supplementary information).

4. The confounds include massive cross-cultural correlations with data quality. This was already shown in the early studies, but the association appears to have gone unexamined after more intriguing predictors were identified. However, we show that it is very strong in all the self-assessment studies we reassess – typically notably stronger than the correlation with gender equality. For an illustration, see Figure 3 in the manuscript.
That figure also contains a barplot of answers to a patience staircase item – with the largest floor effects I’ve ever seen. That item originates from an article published in Science whose open-source data included only scale scores rather than item scores – which we needed to assess scale reliability. Reaching out about this also did not result in sharing of the item scores. Thus, we had to separate out the item scores ourselves where we could. The method we employed is described in the supplementary information. Some special factors made it possible in this case, but the method might be of interest to data miners who want to reassess results where item scores are not directly available.

I replied that yes, I remain interested in the topic, not so much because I think there’s a GEP but because it’s interesting to me that so many people seem to want the GEP to be true.

Berggren responded:

I think there are three reasons why the GEP has become popular:

1. It seems to provide proof for essentialist views of men and women.

2. It provides cover for any gender differences in the West. By the logic of the GEP, when the West has the largest (measured) gender differences on some dimension, this is proof that those differences develop freely and naturally. Thus, either differences are larger elsewhere, and then the West has come further in addressing such unfair cultural norms; or differences are largest in the West, and then they are just natural and inevitable.

3. It provides cover for cultural/methodological confounds in cross-cultural comparisons, and gives the sense that “all is well” with the data, and that nothing needs to be addressed. If there is a theoretical explanation for the results, then one can just lean into that explanation and keep working as usual. But if results are due to confounds, then one has to think further about how to address those issues. As we note in the preprint, there were multiple considerations of confounds in the early studies. However, after the introduction of the evolutionary interpretation of the GEP, the focus appears to have shifted.

Some thoughts on empirical distributions of z-scores

The z-score is an estimated effect divided by its standard error. The standard error is the estimated standard deviation of the sampling distribution of the estimate. In a clean setting–if the estimate comes from a study with a large sample size, with data that don’t include extreme values, and with an estimate that is linear or asymptotically linear, and the sampling process is correctly modeled–the sampling distribution of the estimate will be approximately normal, and the standard error will be essentially equal to the sampling standard deviation. That is, the sampling distribution of the z-score will be approximately normally distributed with standard deviation of 1. Furthermore, if the estimate is unbiased, then the mean of the sampling distribution of the z-score will be the signal-to-noise ratio, that is, the true effect size divided by the standard error.

Just about all studies are biased in some way, and the way we usually handle this is to retroactively define the true effect size as something like the mean of the sampling distribution of the estimate–that is, the true effect size is defined as the thing that the study is estimating. So if you have a study that’s using a bad measure of exposure, its true effect will typically be lower than the true effect for a study that is measuring exposure more accurately. This is important–we care that studies are not always measuring what they’re trying to measure–but it’s not the focus of our investigation here, which is why we’re taking the “reduced form” approach and considering the target of interest to be the underlying effect size for the studies as they are actually conducted.

The normal approximation to the sampling distribution, and the assumption that the standard error equals the standard deviation of the sampling distribution, are themselves approximations. I think they are reasonable for the sorts of studies considered here, but we could consider the sensitivity of our findings to relaxing these assumptions in various ways.

So what we’re working with is some published collection of estimates, j=1,…,J, each with a z-score z_j, with the sampling distribution z_j ~ normal(SNR_j, 1), where SNR_j is the signal-to-noise ratio of the true effect from study j. There can be multiple estimates in the published collection from each study but we won’t really worry about that.

Under this model, the distribution of z_j’s across the J estimates–that is, not the sampling distribution for any particular estimate, but the distribution of the collection of J estimates or, equivalently, the distribution of a single z_j drawn at random from this collection of estimates–will look like a convolution of the unit normal distribution and the distribution of the SNR’s.
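A quick simulation of that convolution (the normal(0, 2) distribution of SNRs here is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
J = 100_000
snr = rng.normal(0, 2, J)       # hypothetical distribution of true SNRs
z = snr + rng.normal(0, 1, J)   # z-scores: SNRs blurred by unit normal noise
counts, edges = np.histogram(z, bins=60, range=(-6, 6), density=True)
# However you choose the SNR distribution, this histogram comes out smooth:
# convolving with normal(0, 1) can widen or shift it but can't carve a hole in it.
```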

Mathematically, the convolution of the unit normal distribution with another distribution can look like all sorts of things, but it can’t look like this:

When a large collection of z-scores does look like this–as with the estimates collected by Barnett and Wren (2019)–that tells us that one of the above assumptions is wrong.

In this case, the key assumption that is violated was not written above at all! It was the implicit assumption that first there is some collection of J true effects, thus J signal-to-noise ratios SNR_j, then for each one a z-score is observed, and then all these z-scores are published. The violated assumption is that last part, that all these z-scores are published. The model described above doesn’t account for selection, and obviously there is selection: lots of z-scores between -2 and 2 are missing. To construct the dataset discussed here, Barnett and Wren collected confidence intervals from papers published in Medline since 1976: “over 968 000 confidence intervals extracted from abstracts and over 350 000 intervals extracted from the full-text.” They write:

The huge excesses of published confidence intervals that are just below the statistically significant threshold are not statistically plausible. Large improvements in research practice are needed to provide more results that better reflect the truth.

And I think that’s right.

The answer to the question, “Why should we care about the distribution of the z-scores of published results?”, is that they tell us about two things:

1. The distribution of signal-to-noise ratios for the underlying effects in these studies;

2. The selection process that determines which estimates get reported.

Item #1 is important for two reasons, first because we might be interested in the distribution of true SNRs in some population of studies, and second because we can use an estimated distribution of true SNRs as a prior for Bayesian inference, as Erik and I discuss in our paper from 2022.

Item #2 is also important for two reasons: first, because it’s good to understand the processes of scientific reporting and publication–the reported and published results are usually the only ones we hear about–and second, because understanding this selection process helps us to adjust for it, especially to account for type M errors or overestimates, an issue that has arisen in psychology, criminology, and many other areas and is directly relevant for policy analysis.

But why talk about the selection effect? Isn’t it obvious?

When you first see the graph of published z-scores, with that big chunk missing between -2 and 2, you might think, sure, that’s what we’d expect, given how the publication game goes. (Some people still argue that it’s fine not to publish non-statistically-significant results because they are uninteresting. I’d prefer to publish everything; but that’s another story. For here, let’s just say that, to the extent there is selection in what is reported and published, we should be accounting for such selection so as to avoid large avoidable errors in our summaries.)
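Here’s what that selection does, in a small simulation (the SNR distribution and the hard |z| > 1.96 publication rule are of course stylized):

```python
import numpy as np

rng = np.random.default_rng(0)
snr = rng.normal(0, 1.5, 200_000)      # hypothetical true SNRs
z = snr + rng.normal(size=snr.size)    # all z-scores, before selection
published = np.abs(z) > 1.96           # publish only "significant" results

all_counts, _ = np.histogram(z, bins=60, range=(-6, 6))
pub_counts, _ = np.histogram(z[published], bins=60, range=(-6, 6))
# all_counts is smooth and unimodal; pub_counts has the tell-tale
# missing chunk between -2 and 2, as in the Barnett and Wren data.
```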

But you don’t always see the pattern with missing values between -2 and 2! Here’s an example:

In our paper in preparation, Witold, Erik, and I look at corpuses of studies from several different sources. The distributions of the z-scores depend on the corpus. Just for example, from Erik’s recent post, here are two distributions, one from PubMed and the other from the Cochrane database of medical trials:

So it’s not automatic that you’ll get that cutout between -2 and 2.

Why look at z-scores rather than p-values?

Our main reason for looking at z-scores rather than p-values is that z has that linear convolution property. Also we think of z as being more generally interpretable, whereas p is interpretable only relative to the null hypothesis. You can think of z as an (unbiased) estimate of SNR with standard error 1.

Wrong intuitions

Applied researchers and statisticians often seem to think that the z-score, or the p-value, is more controllable than it is. With selection, of course, you can control anything: by waiting until you see the results you want to see, or by sequentially gathering data until you reach some threshold, or by choosing among the many available analyses of your data, or by some combination of these procedures. In the absence of selection, though, you have only a limited amount of control over your z-score. Even if all goes ideally, it will be equal to the true signal-to-noise ratio of your experiment (which you as a researcher do have some control over, although with the limitation that you won’t know the true effect size ahead of time) plus that random unit normal error. So if you’re aiming for a SNR of 2.8 (which will give you 80% power under the usual conditions), your observed z-score could easily be anywhere from 0.8 (corresponding to a p-value of 0.42) to 4.8 (corresponding to a p-value of about 1.6e-6). In real life your SNR will usually be quite a bit less than 2.8, at least as Zwet, Schwab, and Senn estimate based on two corpuses of high-quality studies, which makes sense given that there are lots of reasons that researchers will be over-optimistic regarding effect sizes. (That’s the subject of another post.)
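The arithmetic behind those numbers, for anyone who wants to check it:

```python
from scipy.stats import norm

snr = 2.8
# Power for a two-sided test at alpha = 0.05 when z ~ normal(snr, 1):
power = 1 - norm.cdf(1.96 - snr) + norm.cdf(-1.96 - snr)   # ~0.80
# A z-score within +/- 2 of the SNR is unremarkable:
p_low = 2 * norm.sf(0.8)    # two-sided p-value at z = 0.8, ~0.42
p_high = 2 * norm.sf(4.8)   # two-sided p-value at z = 4.8, ~1.6e-6
print(round(power, 2), round(p_low, 2), p_high)
```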

The point is that there’s this intuition that a researcher can, with care, aim for a p-value of about 0.05, or a z-score of about 2. But, no, even if researchers could pinpoint the SNR (which almost always they can’t), the z-score distribution would be blurred.

In theory, you could have a legitimately bimodal distribution of unselected z-scores if true studies were a mix of a distribution of SNRs close to 0 and a distribution of SNRs of 2 or higher, but there’s no evidence whatsoever that this is happening, and lots of evidence that even in high-quality studies, the distribution of SNRs has a peak around zero and gradually declines from there.

Normal sampling distribution != normal distribution of z-scores

In the absence of selection, the distribution of z-scores is approximately the distribution of SNRs convoluted with the unit normal distribution. (One of Erik’s big steps in all this research has been to frame the problem in terms of SNRs rather than effect sizes, which has two benefits: (1) everything is scale-free which allows us to easily set up default procedures, and (2) the math is simpler because we’re convoluting with a fixed normal(0,1) distribution rather than with a different distribution for each estimate.)

This causes some confusion, because there are three distributions: (a) the population distribution of z-scores, which is observed; (b) the population distribution of SNRs, which is unobserved–we can only perform inference for it; and (c) the sampling distribution of each z_j – SNR_j, which is approximately known. Distribution (c) is normal, but there’s no reason to think that (a) or (b) will be normal. Indeed, you can look at the data and there’s direct evidence that distribution (a) has wider tails than normal, which in turn implies wider-than-normal tails for (b). In theory, distribution (b) could be anything, but in practice, as noted above, it seems to be unimodal with a peak around zero. Which makes sense, given that any collection of studies is a mixture of all sorts of different things, including some experiments that are very precise and some that are very noisy, and some underlying effect sizes that are large and many that are small.

I think people sometimes get confused between population distributions and sampling distributions. At least, this confusion arises all the time whenever we teach statistics. So when we show a graph of z-scores and then we talk about the z-score having a sampling distribution that’s normal with sd of 1, it’s natural for someone to read this quickly and come out with the impression that we’re saying that the population of z-scores is normal, or that it should be normal. But we’re not.

Thinking continuously rather than discretely

A common theme in our research in this area is to think in terms of continuous effects rather than discretely in terms of effects being zero or nonzero.

We say this for four reasons. Actually, I think more than four, but these are what come to mind right now:

First, in the research areas that interest us, there are lots of effects that are close to zero, but I don’t see sharp divisions between effects that are very close to zero, effects that are typically small but could be large for some people, effects that are typically small but could be large in the context of one particular experiment, and effects that are moderate or large across the board; nor can these different scenarios be reliably distinguished based on experiments designed to estimate an average effect. You’d need huge sample sizes to distinguish zero effects from realistically-sized small but real effects (see the quick calculation after this list).

Second, studies in the human and environmental sciences typically have some leakage, or bias, or whatever you want to call it. With a big enough sample size, you can find an effect, even if it’s not the effect you’re looking for.

Third, moderate or large effects–or small effects studied carefully–will typically vary among people, across scenarios, and over time, so it’s a mistake to think of nonzero or non-effectively-zero effects as some sort of monolith.

Fourth, we’re looking at signal-to-noise ratios, and these will vary a lot because of variation across studies in measurement quality, resources, and the guessed effect sizes or thresholds used in the study design.
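Here’s the quick calculation promised in the first point–a standard two-group power computation under illustrative numbers (true effect of 0.05 within-group standard deviations, 80% power, alpha = 0.05):

```python
from scipy.stats import norm

alpha, power, d = 0.05, 0.80, 0.05   # illustrative effect: 0.05 sd
z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
n_per_group = 2 * (z / d) ** 2
print(round(n_per_group))  # roughly 6,300 people per group
```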

Why I wrote this post

As noted above, Erik, Witold, and I are writing some articles on this and related topics. So maybe it would’ve been better for me to have spent the past two hours working on those articles rather than writing the above post. The reason I wrote and am posting this is: first, sometimes it’s easier to get ideas out in an informal way; second, this way we can communicate with people right now and not have to wait until the article is done; and, third, this is an opportunity for feedback.

So if there’s something confusing about the above post, or if there’s something in it you disagree with, or if you think we missed or dodged any important points, please let us know. Thanks in advance. We’re doing this research for you–it’s our idea of a public service!–so whatever comments you can supply would be helpful. It’s always a challenge to write about points of confusion, because the challenges that make the topic confusing can also make it hard to clarify the problem.

Also, I’ll correct myself when I’ve made mistakes–so, again, if there’s something above that bothers you, let us know, as I’m very open to the possibility that we’re missing something here.

Why does this matter?

The graph at the top of this post is not a scandal in itself. It is what it is. Thousands of people published papers, now collected in Medline, that included confidence intervals. Nothing wrong with looking at these.

The challenge comes in the interpretation. The graph shows a lot of selection: lots of estimates that were less than 2 standard errors from zero were not reported, or were not reported with confidence intervals. It’s important to account for that selection when interpreting published results, whether individually or in aggregate. Accounting for selection is necessary for Bayesian inference (as it is part of the likelihood, the probability of the observed data given the model) and for frequentist inference (as it affects the reference set of problems for which a method will be applied). So, yeah, it’s important. As noted above, I’d prefer to publish everything (and the histogram above from the Cochrane database shows that it is possible to have publication without massive selection on statistical significance), but however things are being published, it’s important to understand the publication and reporting process.

Also, for reasons discussed in the above post we think it makes sense to model the distributions of underlying effects and underlying SNRs as continuous rather than binary.

Who has the lowest Erdos-Bacon-Epstein number?

Lawrence Summers of course has an Epstein number of 1 (the lowest you can have while still being alive), but his Erdos number is a disappointing 6. I don’t know any easy way to calculate his Bacon number, but Summers does have an IMDB page . . . maybe the best connection will be through this documentary, Panic: The Untold Story of the 2008 Financial Crisis, which also includes Epstein friends Steve Bannon and Donald Trump. Bannon produced this documentary about Sarah Palin, featuring Pamela Anderson and Roseanne Barr. And this site gives Anderson a Bacon number of 2. So Summers’s Bacon number is no more than 4, thus his Erdos-Bacon-Epstein number is at most 11. That’s stretching it, though, because the definition of Bacon number seems to require that all the people in the chain have acted in the movies. Being filmed as yourself (as with Summers) or being a producer (as with Bannon) doesn’t really count. So I guess Summers has an infinite Erdos-Bacon-Epstein number. Sorry, Larry! No Nobel Prize and no Bacon number. I guess you can forget about the EGOT too.

Here’s a possibility. Sergei Brin has an Erdos-Bacon number of 5, and in the Epstein files there’s this story:

I don’t know that Brin had any emails with the famed financier, but I think it’s fair to guess that his Epstein number is no more than 2, so his Erdos-Bacon-Epstein number is at most 7.

Also . . . it seems that Bill Gates has an Erdos-Bacon number of 6, and he was a known associate of Epstein, so he also has an Erdos-Bacon-Epstein number of 7.

What about me? I have an Erdos number of 3 and, sadly, an Epstein number of 2 (through this guy and this guy). But my Bacon number is infinity, as I’ve never participated in a movie. If they ever make a film of Recursion, maybe. Come to think of it, Kevin Bacon would make a great Bob Dwyer; I could see it working out.

The Erdos-Bacon-Epstein challenge is that you need a connection in academic publication, a connection in the movies, and an email link to Epstein. That last bit is the easiest: anyone can send an email, and I expect that just about everyone online has a fairly low Epstein number. My mom has an Epstein number of 3! Lots and lots of scientists and writers have Epstein numbers of 2 the same way that I do, if they’ve ever worked with or considered working with the notorious book agent John Brockman. I’m locked out regarding Bacon, and lots of other people have infinite Erdos numbers. Peter Thiel, for example, is an Epstein associate and has some IMDB credits but an infinite Erdos number. Similarly, Woody Allen has an Epstein number of 1 (I expect) and a Bacon number of 2 but no Erdos connection.

But here’s a thought: Stephen Hawking! Another Epstein associate–no email from or to him in the recently released files, but I think it’s safe to assign him an Epstein number of 1. His Bacon number is 2 and his Erdos number is a surprisingly high 4–I guess Hawking didn’t actually publish so many papers during his career–so his Erdos-Bacon-Epstein number is 7.

And . . . Noam Chomsky. He has an Erdos number of 4, a Bacon number of 2, and an Epstein number of 1, so a total of 7. And Noam is still active! With some effort maybe he could find the right collaborator and get his Erdos number down to 3 and have an Erdos-Bacon-Epstein number of 6.

But I’d give Noam an asterisk. I got his Bacon number of 2 from wikipedia, where it says he “co-starred with Danny Glover in the 2005 documentary The Peace!, giving him a Bacon number of 2.” But it seems that he was just a talking head in that movie, not acting, so I don’t know if that really counts.

Here’s an amusing bit from Noam’s IMDB page:

“Trivia” . . . that’s about right!

Any other possibilities? Going over to Epstein’s birthday book, we see some famous names with some academic connections, including Nathan “Albedo boy” Myhrvold and Henry “Harvard” Rosovsky. Both are on Google Scholar and no doubt have finite Erdos numbers, but neither seems to have acted in any movies. Myhrvold hosted some kind of cooking show but that doesn’t really count, sorry. There are also some prominent-but-not-famous scientists such as Gerald Edelman, Stephen Kosslyn, and Lee Smolin, but no IMDB acting credits among them. And my namesake Murray Gell-Mann. I hate to see that! But, again, no relevant screen time.

Epstein’s birthday book also has an entry from someone named “Ace Greenberg.” Hey, what is this, a Damon Runyon story?

Really you have to read that birthday book all the way through. There’s a letter from a “Johnny Boy” who says that Epstein is his kid’s role model. How creepy is that? What a horrible parent. I get that some people are seduced by money, sex, and power and thought that Epstein was cool because he had all three, but to bring your kid into that? Yuck! And then there’s Myhrvold’s charming collection of animals in sexual positions.

And then there’s Marvin Minsky, who writes that Epstein is the second-quickest intellect he’s ever met. Wha . . .? Either Minsky’s students and colleagues at MIT were a lot slower than I’d have pictured, or Epstein was much more impressive in person than I could possibly imagine based on anything in his emails, or the famous artificial intelligence pioneer was not such a good judge of intellectual quickness. I’m racking my brain here to think of whatever witticisms or quick replies or deep thoughts Epstein had to offer that would lead Minsky to this assessment. Then again, one of the greatest statisticians of the twentieth century said he really enjoyed meeting with Epstein . . . so I guess the man had some charm that just doesn’t come across on paper.

Also featured in the birthday book is Alan Dershowitz. He’s gotta be a contender here, right? An Epstein number of 1–indeed, given all his connections, he practically has an Epstein number of 0–and he has acting credits on IMDB! He’s in this Rob Lowe movie from 2012 with the amusing-in-retrospect tag line, “A political strategist juggling three clients questions whether or not to take the high road as the ugly side of his work begins to haunt him.”

Don’t worry–the Dersh would never take the low road!

Anyway, this movie also features David Harbour, who actually appeared in a movie with the prolific Kevin Bacon. So Dershowitz has an Epstein number of 1, a Bacon number of 2, and an Erdos number of . . . Let’s go to Google Scholar.

Here are the first three links:

In the first one, he says the death penalty should be abolished. In the second, he defends the use of torture. In the third . . . I haven’t read it, but I guess he thinks the death penalty and torture are ok if Israel does it? But this doesn’t get us any closer to Paul Erdos. We need to find some place where Dershowitz has coauthored with a scientist, or with someone who has coauthored with a scientist, etc. His papers are mostly solo-authored, so it’s tough. Dershowitz was interviewed in the magazine Litigation by Ashish Joshi, who also interviewed Judge Jed Rakoff, who wrote an article on eyewitness identification with neuroscientist Thomas Albright, who’s published lots of scientific papers and so surely has had some collaborators who have worked with mathematicians. I don’t know what Albright’s Erdos number is, but if it’s 5, then this would give Dershowitz an Erdos number of no more than 8, thus an Erdos-Bacon-Epstein total of 11. Not bad! Too bad he couldn’t get closer on the Erdos part of the game. It’s kind of like being a potential triathlete who can barely swim.

What Dershowitz should do is publish a paper with Steven Pinker! It would be easy. Pinker famously wrote up a linguistics argument in defense of Epstein as a favor for Dershowitz. Pinker knows his linguistics so I’m sure this would be publishable somewhere. Pinker has an Erdos number of 3 (just like me!), so this collaboration would make Dershowitz an Erdos 4, giving him a combined Erdos-Bacon-Epstein number of 7.

What about Pinker himself? Erdos 3, Epstein 1 (or maybe 2), but no IMDB acting credits, just some TV appearances as himself, so no dice.

Again, the hard part is finding people in Epstein’s orbit who have academic publications and acting credits. Some academics, some people in the entertainment industry, but not many with both. Bacon himself, for example, has no academic publications. (Nor does he have any direct connection to Epstein, but he may well have had email exchanges with Kevin Spacey or someone else in the Epstein orbit.) I don’t think Andrew Mountbatten-Windsor has any academic publications either. I guess he was too busy with his research to get around to writing any of it up.

But wait . . . here’s a dark-horse candidate. Looking again at the contributors to Epstein’s birthday book, we see a bunch of businessmen, some politicians and scientists, some girlfriends, and some names that I didn’t recognize at all, including someone named Stuart Pivar. Hmmm . . . according to wikipedia, there’s a person with that name, born 1930, who’s a “chemist and art collector known for his unorthodox views about evolution.” That sounds like someone in Epstein’s circle, and, indeed, scroll down the wikipedia page and you’ll see the connection. And it says here that he found a Vincent van Gogh painting at a flea market or something like that! Pivar’s Epstein number is 1 (you get that by contributing to the birthday book) and . . . ummm, yes, he has an IMDB acting credit, having been one of many, many people to have played the role of Socrates in the 2010 film, The Death of Socrates (writers are listed as “Plato, Benjamin Jowett, and Natasa Prosenc Stearns”), which also features Ray Abruzzo, who has a Bacon number of 2. So the somewhat obscure (although not completely obscure, I guess, given that there have been magazine articles about the guy) Pivar is an Epstein 1, Bacon 3. What about his Erdos number? Pivar was a chemist. According to wikipedia, “As an inventor, he made a large fortune in plastics.” On Google Scholar, we see this 2016 paper, “Origin of the vertebrate body plan via mechanically biased conservation of regular geometrical patterns in the structure of the blastula,” with David B. Edelman, Mark McMenamin, and Peter Sheesley. At this point, Pivar is considered a bit of a crank, so I doubt these coauthors are serious scientists themselves, but maybe we can follow some links and get to mainstream science, and from there to mathematics, and from there to Erdos. McMenamin has a Google Scholar page but it all seems pretty narrow . . . hmmm, there’s a paper, “Did surface temperatures constrain microbial evolution?”, with David Schwartzman and Tyler Volk. Schwartzman seems like a bit of a dead end, but Volk, in addition to publishing some cranky-looking things himself (“Gaia’s body: toward a physiology of Earth”), also published a speculative paper in Science with coauthors including earth scientist Klaus Lackner, who I saw speak at Columbia once! Lackner hung out sometimes with Upmanu Lall, who has published a paper with me, and I have an Erdos number of 3. If we suppose that there is a link connecting Lackner to Lall, then the chain Pivar to McMenamin to Volk to Lackner to Lall to me is five steps, plus my three steps to Erdos, which would give Pivar an Erdos number of no greater than 8.

But it’s hard to imagine that 8 is the best we can do for Pivar. Another route is through another of his collaborators on that paper, Edelman, who also wrote this article:

Ummmm, I’m skeptical. Evaluating a horserace prediction method based on only 300 races? C’mon. But, hey, all things are possible. Perhaps this Edelman fellow is now rolling in the dough. Maybe he owns a few Arabian thoroughbreds himself!

Edelman (not the Gerald mentioned earlier, unfortunately) also wrote a couple of papers on finance. That seems like a possible route to mathematics, and thus Erdos. There’s a paper with Patrick O’Sullivan on Adaptive Universal Profiles, but his links are all applied finance, no math happening here; there’s also a book, Numerical Methods for Finance, with two coauthors, including a John Appleby who wrote some papers on differential equations . . . whatever. I give up on this one. It’s surprisingly difficult to navigate the publication network. It’s hard for me to believe that we can’t get Stuart Pivar’s Erdos number below 8, but maybe that’s what it is. I’ll just tag Pivar with an Erdos-Bacon-Epstein number of 8 + 3 + 1 = 12.

Also, I don’t like Pivar. I’ve never met him, but he appears to be a liar. His wikipedia page says, “Pivar was also a well-known friend of the late financier Jeffrey Epstein; however, the two had a falling out prior to Epstein facing charges for sex crimes. Pivar corroborated the account of Maria Farmer, a graduate of the New York Academy of Art in 1995, who stated that she had informed him about her abuse at the hands of Epstein in 1996. According to Pivar, this was when the friendship with Epstein ended.”

But Pivar contributed to Epstein’s 50th birthday book. Birthdays seem to have been a big deal to Epstein; his sycophantic correspondents are always wishing him happy birthday. Anyway, Epstein was born in 1953, so his 50th birthday was in 2003, so unless Pivar wrote that tribute seven years ahead of time, he was lying when he said in 2019 that he’d ended his relationship with Epstein back in 1996.

Also this bizarre bit:

In August 2007, Pivar sued a science blogger named P. Z. Myers and Seed Media Group, which hosted his blog, alleging defamation. Myers had lit into Pivar’s work, calling him “a classic crackpot.” In his complaint, Pivar made a point of mentioning by name two prominent members of SMG’s board: Jeffrey Epstein and Ghislaine Maxwell. The lawsuit was later dropped.

Remember blogging? That used to be a thing.

And . . . the winner!

This is someone you’d never have expected. According to wikipedia, MIT mathematician Daniel Kleitman has an Erdos-Bacon number of 3. Kleitman was at MIT forever, so I bet he has some email exchanges with Chomsky or some other Epstein intimate, in which case his Erdos-Bacon-Epstein number would be 5.

I knew Kleitman! He was my freshman adviser at MIT. There was a group of about five of us, and we met with him in his office a few times. He was a nice guy, very blunt spoken–not in a crude way at all, just the kind of guy who would say something was bullshit. I told Kleitman I was interested in doing research, and he connected me with a graduate student, Susan Assmann, who gave me a project to work on. It took me a year to figure it out. I learned a lot from the experience; the story is here.

From a statistical point of view, the lesson here is that low Erdos and Bacon numbers are rare, so the best way to perform this search is not to start with Epstein associates but rather to take Erdos-Bacon champs and then go to Epstein from there. For example, mathematician Jordan Ellenberg has an Erdos-Bacon number of 5. I’ve emailed with Jordan, so his Epstein number is at most 3, giving him an Erdos-Bacon-Epstein upper bound of 8. But Jordan could well have been contacted by Brockman at some point, in which case he’d have a number of 7, tied with various other people listed above.

Does anyone in the world have an Erdos-Bacon-Epstein number of 6? I don’t know.

What’s the point?

Why do all this? Why did I spend two precious hours of my time on earth tracking down these links and writing all this? Or, maybe more to the point, why did you read all this? (I’m conditioning here on whatever subset of our blog audience has read this far down in the post.)

The quick answer is that connections can be interesting. You can learn all sorts of unexpected things from this sort of quasi-stochastic search.

Another answer is that seeing these connections of various elite and not-so-elite people gives us some sense of the social world. It’s a core sample of part of American society.

The other interesting thing about the Epstein files is the content. Not the crude sexism: people will say all sorts of things in private, so this sort of thing is hardly shocking. If a cookbook writer / retired technology executive thought it was cute to talk about sex with one of his rich friends, so be it. The part that was more stunning to me was all these luminaries who seemed so impressed by Epstein. In addition to the aforementioned pioneers of statistics and computer science, you’ve got ultra-successful businessmen such as Gates, artists such as Andres “Piss Christ” Serrano, a leading physicist (sorry, no Bacon number for her; according to IMDB the closest she came was an uncredited role on a TV show, and I think that only movies count), etc.

I get it that lots of politicians got caught in a net: if you’re a politician, you pretty much can’t avoid getting close to lots of distasteful people. I’m not saying that it’s cool that Trump, Clinton, Richardson, Bannon, Thiel, etc. were friendly with Epstein, but it’s also not so clear what the alternative would be. If you’re in politics, you only have a limited number of times you can piss off powerful and well-connected people. But in academia and in business, you can do what you want most of the time. The idea that these people were choosing to hang out with Jeff, going to the trouble to wish him happy birthday . . . it’s just weird. Again, I think Epstein must have had a real ability to talk with lots of different people, getting people as different as Bannon, Chomsky, Minsky, and Summers all to think he agreed with them. And that’s kind of interesting.

Finally, I laugh because otherwise I would cry. I joke about all this because that’s a way to deal with disturbing things. So many horrible things were done here with the complicity of business and academic leaders as well as state and national governments.

The signal-to-noise ratio in statistics

This is Erik. When fortune smiles on us, we may get an unbiased, normally distributed estimator y with standard error s of some (unknown) parameter of interest theta. Here, we’ll even assume that s is known. The difference between knowing s and having to estimate it is the difference between a z-test and a t-test. That’s a minor difference when there aren’t any serious outliers and the sample size is not too small. So, let’s just hope for the best and assume that y has the normal distribution with mean theta and standard deviation s. The 95% confidence interval for theta is y ± 1.96 × s. All very standard.

The z-statistic is the ratio of the estimator to its standard error, so z=y/s. It follows that z has the normal distribution with mean theta/s and standard deviation 1. There doesn’t seem to be a good name in statistics for theta/s, i.e. the ratio of the true parameter to the standard error of its estimator. However, we could borrow a term from engineering: the signal-to-noise ratio or SNR. So, let’s define SNR=theta/s. Then the z-statistic has the normal distribution with mean SNR and standard deviation 1.

The SNR is easy to interpret. If theta=0 then the SNR is zero as well. SNR=1 (or -1) means that the parameter we’re trying to estimate has about the same magnitude as the noise in our estimator. That’s not a very favorable situation. For example, there is a 16% chance that the estimator has a different sign than the true parameter. That’s because

P(y < 0 | SNR=1) = P(z < 0 | SNR=1) = pnorm(0,1,1)=0.16.

If SNR=2.8 (or -2.8) then the probability of rejecting the hypothesis that theta=0 is 80% (alpha=0.05 two-sided). That’s because

P(|z|>1.96 | SNR=2.8)=pnorm(-1.96,2.8,1) + 1 – pnorm(1.96,2.8,1) = 0.8.

There is a 1-1 relation between the absolute z-statistic and the two-sided p-value for testing the hypothesis that theta=0. In R, we have z=qnorm(1-p/2) and p=2*pnorm(-abs(z)). Still, I like z-statistics better than p-values because of their direct relation to the SNR. In fact, we can think of the z-statistic as an estimate of the SNR with standard error 1. The SNR and z-statistic say something about the quality of an experiment without reference to hypothesis testing.
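For readers who want to check these numbers, here are the calls from above collected into a few lines of R:

pnorm(0, mean = 1, sd = 1) # P(sign error | SNR = 1): about 0.16
pnorm(-1.96, 2.8, 1) + 1 - pnorm(1.96, 2.8, 1) # power at SNR = 2.8: about 0.80
p <- 0.05
z <- qnorm(1 - p/2) # z = 1.96
2 * pnorm(-abs(z)) # back to p = 0.05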

A few days ago, I posted a histogram of z-statistics from PubMed, and noted the lack of z-statistics between -2 and 2. I’ve now also made histograms of the corresponding two-sided p-values, with and without a log-transformed axis. As expected, these show a steep drop at 0.05. To me, the histogram of the z-statistics is easiest to read. One might even say that the p-value is a distortion of the z-statistic. Or do you think that goes too far?

“The Limits of Ethical AI”

Aleks Jakulin writes:

This is such a (didactically) beautiful piece of investigation.

Maybe hunting for imagined “bias” is a folly, and we should be maximizing the bias in favor of better outcomes.

I don’t get why Aleks refers to bias as being “imagined,” but I agree with his general point, which is that the focus should be on outcomes. Most simply, you’d want to assign a positive utility to each good outcome and a negative utility to each bad outcome. Given that this AI system is being implemented at all, the goal has got to be to do better than whatever the existing procedure was, so the net outcome will be positive. I’d think the best approach would be to maximize utility, as defined based on individual and aggregate outcomes, and then use some sort of side payments to compensate people who have been inappropriately classified.
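Here’s a minimal sketch in R of what that could look like (the utility numbers are hypothetical placeholders, not anything from the linked report):

u_tp <- 1.0 # correctly flagged case
u_fp <- -2.0 # person inappropriately classified
u_tn <- 0.5 # correctly passed over
u_fn <- -1.0 # missed case

# p is the system's predicted probability that a case is positive;
# flag whenever the expected utility of flagging beats that of passing.
eu_flag <- function(p) p * u_tp + (1 - p) * u_fp
eu_pass <- function(p) p * u_fn + (1 - p) * u_tn
p <- seq(0, 1, by = 0.05)
data.frame(p, decision = ifelse(eu_flag(p) > eu_pass(p), "flag", "pass"))

One appeal of this framing is that the decision threshold falls out of the stated utilities, and group disparities can then be read off as differences in realized utility rather than as a separate mathematical measure of asymmetry.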

That said, there’s nothing wrong with estimating various aggregate measures of disparity as well, although I’d recommend against using evocative terms such as “fairness” which then get associated with various mathematical measures of asymmetry.

To put it another way, “ethical AI” has two limitations here:

1. According to the linked report, it doesn’t work so well at its stated goals.

2. Various definitions of algorithmic ethics, fairness, and bias contradict each other, and they seem to be based on a false intuition that it should be possible for all measures of disparity to be zero.

Who wants no kid vax law?

Palko points to this link to a recent Pew Research Report which states, “Nearly two-thirds have high confidence in vaccine effectiveness, and about half trust their safety testing and schedule; Republican support for school vaccine requirements continues to slide,” and includes the above graph.

The focus of the online discussion was that Democrats’ views have remained stable while Republicans have shifted a lot, in a disturbing direction. From a political science perspective, this is an interesting example of partisan polarization of an issue attitude happening in real time.

But I was struck by something else: Even back in the pre-covid era, before vaccination became politically polarized, a sixth of Americans opposed vaccine mandates. Some of this must be from the question wording (“Parents should be able to decide not to vaccinate their children, even if that may create health risks for others,” rather than, say, “Schools should be able to require that children be vaccinated”), but I wonder if some of this is a sort of costless opposition. It’s possible to oppose a policy in principle, without any expectation that it will be repealed.

P.S. This post’s title harks back to the John Lennon classic, What joker put seven dog lice in my Iraqi fez box?

“What do you think is the ideal number of children for a family to have?” Two different statistical measurement challenges arise from this one question on the General Social Survey.

Philip Cohen shares the above graph.

There’s a lot to chew on here. First, what is the question asking, exactly? As Cohen says, “This is a question of normative views, not how many children people want themselves (which we would call intention or preference).” The “normative views” thing is interesting, in part because the ideal number would depend on the family. Indeed, even framing the question as “for a family” could be taken to imply that the number of children would be positive. I’m actually surprised that so few respondents gave “1” as an answer. It makes sense that 2 is the most common response, and it’s interesting to see the time series; I’m just not quite sure how to think about these responses. Indeed, I’m not sure how I myself would respond to the question!

Cohen focuses on a different aspect of the above graph, which is the increasing number of people who respond, “As many as you want.” Whassup with that? Cohen writes:

The category has gotten big enough that you can no longer ignore it, as many analysts have. The worst culprit may be Lyman Stone, who has repeatedly used the contrast between ideal and reality to promote pronatal policy, as here and here, where he used it to declare, “most women achieve less than their desired fertility.” The question does not measure respondents’ desired fertility, but rather their normative ideal.

I agree with Cohen that it does not make sense to use the “ideal number” question to represent “desired fertility.” Lyman Stone writes, “What if we give female survey respondents a bit more latitude, and ask them how many kids they’d ideally like to have?” But the “ideal number of children” question is not asking people how many children they would like to have. Again, this can easily be seen by looking at how few 0’s and 1’s there are on the above graph. There are people out there who don’t want any kids themselves, but they could still think that 2 is “the ideal number of children for a family to have.”

But back to the “As many as you want” option. Here’s Cohen’s explanation:

The major difference is the survey is increasingly done online. From 2004 to 2018, the vast majority — roughly 80-90% — of interviews were done in person. In 2021, because of the pandemic, none were in person, 12% were by phone and 88% were online. In 2022, 46% were online (this is the MODE variable). How does this affect ideal family size? In the 2022 codebook, GSS calls this variable a “classic example of differences in stimulus between interviewer-administered and self-administered modes.” In person, they don’t give people the choice of “as many as you want,” they just record it if people spontaneously offer that opinion. But in the web version, that option is now on the list presented to them. As a result, the number of people choosing this answer has shot up.

In retrospect, it seems a mistake to have included “As many as you want” explicitly on the web version. Instead, maybe they could have allowed a free response for anyone who did not give a numerical answer to the question. It’s too late now for the 2022 GSS, but if you’re asked, “What do you think is the ideal number?”, then, yeah, “As many as you want” seems like a very natural answer to the question–if it is given as an option. If it’s not an option, then it should be clear that the respondent is supposed to supply a number.

There’s more in Cohen’s post–he also looks at time trends broken down by political ideology. Here I just wanted to focus on the measurement issues. It’s an interesting statistical example because measurement problems arise in two separate places: first there’s the interpretation of “ideal number of children for a family to have,” which is not the same as “how many children you would ideally like to have”; second there’s the “As many as you want” response, which got out of control because of carelessness when adapting a face-to-face survey to an online format.

Three meta-principles of statistics: the information principle, the methodological attribution problem, and different applications demand different philosophies

The information principle: the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways . . .

The methodological attribution problem: the many useful contributions of a good statistical consultant, or collaborator, will often be attributed to the statistician’s methods or philosophy rather than to the artful efforts of the statistician himself or herself. . . .

Different applications demand different philosophies: consider Rob Kass’s remark: “I tell my students in neurobiology that in claiming statistical significance I get nervous unless the p-value is much smaller than 0.01.” In political science, we are typically not aiming for that level of certainty. (Just to get a sense of the scale of things, there have been barely 100 national elections in all of U.S. history, and political scientists studying the modern era typically start in 1946.)

These appeared in my 2010 article, Bayesian statistics then and now, which is a discussion of an article by Brad Efron, “The future of indirect evidence” and of Rob Kass’s discussion of Efron’s article.

Here’s another line from my paper:

Maximum likelihood, like many classical methods, works pretty well in practice only because practitioners interpret the methods flexibly and do not do the really stupid versions (such as joint maximization of parameters and hyperparameters) that are allowed by the theory.

This is related to the idea that theoretical statistics is the theory of applied statistics, and methods as they are applied in real life are not always the same as what you might think from the formulas alone.
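To see what the “really stupid version” looks like, consider a simple hierarchical model, y_j ~ N(theta_j, s_j^2) with theta_j ~ N(mu, tau^2). Here’s a minimal R sketch (with made-up numbers in the style of the eight-schools example) showing that jointly maximizing over the theta_j and the hyperparameter tau is degenerate:

y <- c(28, 8, -3, 7, -1, 1, 18, 12) # hypothetical group estimates
s <- c(15, 10, 16, 11, 9, 11, 10, 18) # their standard errors
joint_loglik <- function(tau) {
  mu <- mean(y) # fix mu at the grand mean for illustration
  theta <- rep(mu, length(y)) # theta_j = mu maximizes the second term as tau -> 0
  sum(dnorm(y, theta, s, log = TRUE)) + sum(dnorm(theta, mu, tau, log = TRUE))
}
sapply(c(1, 0.1, 0.01, 0.001), joint_loglik) # increases without bound as tau -> 0

The joint “likelihood” blows up as tau goes to zero, so joint maximization returns complete pooling no matter what the data say. Sensible practice instead integrates over theta (marginal maximum likelihood) or does full Bayes.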

StatRetro: The twitter feed that spits out our old blog posts, one at a time, every 8 hours

Hi all. Just a reminder that we have a bot that posts all the posts on this blog. It’s the twitter account StatRetro. We started it a couple years ago with our very first post from 2004, and it just keeps going through the blog in order, posting every 8 hours. I guess that in another 10 years or so it will catch up and we’ll start over again.

Right now it’s in April, 2010, and here are a few of the most recent entries:

Enjoy.

This guy’s mad about fake research, and he should be. Research incompetence, research fraud, and the promotion of fraudulent or incompetent work . . . these are not victimless crimes.

Here:

Michael Sanders based his World Bank-funded study in Guatemala on a trick supposed to increase tax compliance, but now says Francesca Gino’s work misled him

At the time Michael Sanders remembers being embarrassed, even ashamed. He had convinced the World Bank to fund a study on an entire country to show that one simple trick could drastically increase tax compliance. If you made people pledge their honesty beforehand, research showed, they were more honest. But the one simple trick had proven tricksy. Despite strong prior evidence it should have had dramatic effects, it had done nothing at all. . . .

“I’m furious,” he said. The former member of the British government-created Behavioural Insights Team believes he has good reason to feel misled about that evidence. . . . Now Sanders is left wondering how they didn’t spot it earlier. Not least because Gino, who was paid $1 million a year for her media-friendly research, went on to write a book titled Rebel Talent: Why it Pays to Break the Rules at Work and in Life. “If only there was a clue,” Sanders said, “that she thought you could get ahead by breaking the rules.”

Here’s the full story:

Back in 2015 Sanders, now a professor at King’s College London, was working with the group better known as the Nudge Unit. Their job was to use behavioural science to help governments work better.

They had, they thought, found one such method. To keep people truthful when filling in official documents, make them sign the declaration of honesty before, rather than after. Then, when they put in their details, they will remember they had promised to tell the truth.

A trial by some of the world’s leading researchers, including from Harvard, had shown stunning success. It was Sanders’ job to validate it at scale. They went to Guatemala, a country with very low tax compliance. They conducted a study on all the tax returns, comparing those with honesty declarations at the start with those at the end.

He was confident. “It was an enormous amount of data. If there was an effect, we would see it. With samples that size, you can detect the sound of a gnat’s wing.” And the effect? “Nothing.”

They had promised the Guatemalans big impacts, and with justification. The authors of the original research were, said Sanders, “a who’s who of behavioural science.” Yet they couldn’t replicate the work. It was inexplicable.

The news article continues:

It is, today, more explicable. There is compelling evidence in the raw data to suggest Gino manipulated her results. She denies it, but Harvard apparently concurs. Which means that hers is just the latest Ted-talk-friendly research paper to tumble. . . . Many studies involved small numbers of participants and questionable statistics. . . .

He said they really should have thought more about her book. “It’s like if she wrote a book, My Adventures as a Pirate by Long John Silver, and then we still get surprised when it turns out Long John Silver is a bit of a wrong ‘un.”

This sort of news story is important. It’s easy to focus on the sloppy researchers and the careerists and the ideologues and the people whose names keep appearing on papers with fake data and the cartoonishly overconfident hypesters and the megalomaniacs and the people who won’t acknowledge their mistakes and the people who never back down (see page 51 here) and the plagiarists and the torment executioners and the would-be manipulators of the system and the people who copy without attribution or shame, even from wikipedia, or . . . ok, you get the idea. It’s easy indeed to focus on the wrongdoers, or even to come up with elaborate justifications for their behavior or analyses of their character. It’s the fascination of the moral and intellectual train wreck.

I have this image in my mind, that scene from The Fugitive where they’re hijacking the bus and then it crashes into the moving train. It’s that volatile combination of intellectual failure (bad research, statistical errors, lack of imagination, the one-way-street fallacy, etc.) with moral failure (not letting go, defensiveness, careerism, etc.).

Anyway, yeah, let’s look away from the villains for a moment and spare some thought for the victims, who in this case include some individual Guatemalans whose time was wasted, the Guatemalan government, which isn’t getting the revenues it deserves and for which this failed intervention may have served as a distraction from a more direct approach to tax collection, the researchers who wasted some chunks of their careers on this (it’s ok to spend time on bad ideas, that happens to all of us; the problem here is that the wasted effort was based on falsified evidence, so it was really an unnecessary blind alley down which they were led), the British taxpayers who were funding that “nudge unit,” and . . . all of us. I just spent 20 minutes writing a blog post instead of making a valuable research contribution to the world, you just spent 5 minutes reading this, etc.

Now, you might say that this last inefficiency was unnecessary, as nobody was forcing me or paying me or otherwise strongly motivating me to write this post. But I think it’s important! Research incompetence, research fraud, and the promotion of fraudulent or incompetent work (recall Clarke’s law) . . . these are not victimless crimes.

Survey Statistics: sampling the sample

So far we’ve focused on means E[Y] as our quantity of interest (or means for subgroups, like voters: E[Y | V=1]). But now suppose you want to browse Y with your eyeballs. For example, maybe Y are survey openends (“describe how you feel about the candidate”), and you want to read through a few of them. But you want these to look like draws from the population, not from the survey responders, who might have a different distribution of openends.

Suppose your survey openends are y_1, …, y_n. Can we sample from them to get openends that look like they were drawn from the population?

Let Y* be samples (with replacement) from our survey openends with probabilities p_1,…,p_n. What should these probabilities be so that P[Y* = y] = P[Y = y] for all openends y? We’ve been working with means, so let’s write this as E[f(Y*)] = E[f(Y)], where we can take f(Y) to be the indicator that Y=y.

We can identify means E[f(Y)] as E[f(Y)W | R=1] where we get weights W either by a model for response R or by “equivalent weights” (see the October post and its follow-up in November). Estimated with our survey sample this is

(1/n) sum_{i=1}^n w_i f(y_i)

So if we take p_i = w_i/n, then we get Y* with the correct distribution! (This assumes the weights are scaled to average 1, so that the p_i sum to 1.) In other words, we should sample from our openends with probability proportional to the weights. Has anyone done this before? Tell me more!
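Here’s a minimal sketch in R (the responses and weights are made-up placeholders):

# Resample openends with probability proportional to the survey weights,
# so the resampled set mimics draws from the population rather than from responders.
openends <- c("likes the candidate", "no opinion", "angry", "enthusiastic")
w <- c(0.5, 2.0, 1.0, 0.5) # survey weights, scaled to average 1
y_star <- sample(openends, size = 20, replace = TRUE, prob = w / sum(w))
table(y_star)

Note that sample() renormalizes prob internally, so only the relative weights matter.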


Under what sort of systematic reporting errors will science be self-correcting, or not? And do gardening programs reduce obesity?

This paper by Jon Agley, Sarah Deemer, and David Allison begins:

Science is often said to be self-correcting . . . However, self-correction does not always happen to scientific evidence by default; it requires human agency. Further, even when scientists actively pursue corrections, it can take such a long time to achieve them that additional work may be done, or policy enacted, in the interim based on unjustified premises.

And here’s their point:

The standard dialogue seems to be that errors occur randomly, and that larger and better studies will eventually wash them out. We argue that this is not true for all types of errors. Specifically, we present a taxonomy of three types of errors: basic, non-Markovian, and biased Markovian.

• Basic errors are both (a) Markovian, meaning that whether the error occurs does not depend on any factors before the study at hand (e.g., honest coding errors), and (b) uninfluenced by common biasing factors outside the study that may exert influence in the same direction and away from the truth (e.g., a cultural norm or expectation).

• Non-Markovian errors are those for which the likelihood or nature of the error depends on prior errors within other studies, such as a junior researcher replicating an incorrect methodology used regularly by a more senior scientist in the field. We sometimes also refer to such errors as being “correlated,” by which we mean a dependency of this type, not a statistical correlation.

• Biased Markovian errors are those for which a common force leads to consistent bias in the same direction across many studies, but the errors remain Markovian because they do not depend on prior errors in a sequence of studies.

We argue that basic scientific errors may resist correction, but in principle can be mitigated by the general scientific progress within a field. In contrast, biased Markovian and non-Markovian errors may be more likely to lead an entire domain of study astray because they produce sequences of studies that are probabilistically linked, either directly or indirectly, through erroneous constructs. In other words, these errors are more likely to propagate errors.

We illustrate our hypothesis using the popular idea that gardening programs can reduce obesity, and we propose several directions for pursuing solutions.

Gardening programs can reduce obesity? I had no idea!

Seriously, though, I like the ideas in this paper, even though I’m not thrilled with the terms “Markovian” and “bias,” as these have existing technical meanings that don’t quite line up with how they’re being used in this paper.
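The contrast between basic and biased Markovian errors is easy to see in a toy simulation (made-up numbers): independent zero-mean errors wash out as studies accumulate, while a shared bias does not.

set.seed(1)
true_effect <- 0.3
n_studies <- 10^(1:4)
basic <- sapply(n_studies, function(n) mean(true_effect + rnorm(n, 0, 0.5))) # honest noise only
biased <- sapply(n_studies, function(n) mean(true_effect + 0.2 + rnorm(n, 0, 0.5))) # shared bias of +0.2
rbind(n_studies, basic, biased) # 'basic' converges to 0.3; 'biased' settles at 0.5

No amount of additional studies corrects the biased column; as the authors say, that takes human agency.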

I also see connections to the classic paper by Paul Smaldino and Richard McElreath, The Natural Selection of Bad Science.