
What is the landscape of uncertainty outside the clinical trial’s methods?

I live in the province of British Columbia in the country of Canada (right, this post is not by Andrew, it is by Lizzie). Recently one of our top provincial health officials, Dr. Bonnie Henry, has received extra scrutiny based on her decision to delay second doses of the vaccine. The general argument against this is the one I have heard from Dr. Fauci of the US, who has been various levels of adamant that you do what the trial did (I would say very adamant, very adamant, very adamant, then slightly less adamant after the kerfuffle with the UK). You don’t deviate from the methods of the trial.

This has got me wondering what the landscape of uncertainty looks like as you move away from the methods of a clinical trial. And what progress we’ve made — if any — on this in the last couple of decades, when I first realized how stark the divide between inside and outside the trial methods is for many.

Over 15 years ago I was helping take care of a 50-year-old family member who had cancer and was struggling to get through a 6-week regimen of radiation + chemotherapy at a major cancer institute in Boston. She had gotten through the first couple of weeks okay, even driving herself the 6-8+ round-trip hours from her home to the institute five days a week for her daily radiation appointments. But things got progressively worse in the third and following weeks (when I was her trusty chauffeur and companion). By her fifth week she was in and out of the ER with various major issues and was receiving various infusions to attempt to prop up her system so she could survive the next dose of radiation. Every day before radiation she needed a series of tests followed by a visit to her oncologist to get approval for that day’s dose of radiation, and this did not seem out of the ordinary for the later weeks of high-dose radiation therapy.

At one visit, when things were going particularly poorly, the radiation oncologist was brought in to consult on whether to continue treatment. He was advising for continuing, though it would be hard. It took a lot of her energy to speak, so she was often quiet, but on this day she asked him: ‘why do I have to do this? I have done this for most of the 30 visits, why do these last few matter so much?’ And he told her the truth — “because we only have data on the people who get the full dose. We don’t know what happens if you don’t take the full dose, or you take a few days off before continuing.” It was very helpful. I remember she said something along the lines of ‘okay,’ and we drove home in semi-shock, but at least we knew why they were pushing for this now. It was always her choice, but until then neither of us realized how gaping the uncertainty was between, say, going for 27 of your total 30 radiation visits, and going for all 30.

Clearly, the ethics matter, and that’s especially clear with a highly infectious and deadly disease like Covid. I assume that many of these deviant public health officials who have delayed second doses have done the simple SIR model math and figured out that: (n higher number of people vaccinated at X% efficacy given Y weeks of delaying the second dose)*(black box uncertainty as you deviate away from the trial methods)=likely more lives saved. Henry has cited studies showing >90% efficacy for the three weeks after the first dose of the Moderna and Pfizer vaccines, so I suspect she’s feeling good that her X in extending the second dose to four months is still fairly high and thus has some internal estimate on the landscape of uncertainty beyond the methods of the trial, and there’s growing data on this.

But if you listen to various interviews with Fauci and other public health officials, I start drifting into memories of discussions that start with, ‘what is the variance of a fixed effect? It’s either 0 or infinity.’ Now, I don’t mean exactly that — but I do mean there seems to be a large gap in perspectives here. In one — the clinical trial methods must be followed to a T, until a new or properly vetted trial of any deviation is approved, conducted and reviewed. And in another — some adjustments happen given the potential for lives saved despite the uncertainty and that ‘population health data’ is then used to make further adjustments on the fly (in conjunction with other ways of viewing the clinical trial data you have).

These debates have made me wonder what progress we have made in addressing this uncertainty from both a bioethics and a data collection and design standpoint. I am not (at all) a bioethicist but the rigid adherence to the trial methods doesn’t feel terribly ethical to me, and I think Covid has highlighted that. So I wonder how much has changed in the last 10, 20 or 30 years in how those who deviate or ‘drop out’ of clinical trials are handled as datapoints. Are they required to be tracked? Or is it better to save money by focusing only on those who follow the trial perfectly? Is there an incentive for research or new methods or databases that compile these deviants to start fleshing out that landscape of uncertainty beyond the clinical trial methods? Or is everything beyond basically zero, or maybe infinity? Or maybe somewhere in between.

Bayesian methods and what they offer compared to classical econometrics

A well-known economist who wishes to remain anonymous writes:

Can you write about this agent? He’s getting exponentially big on Twitter.

The link is to an econometrician, Jeffrey Wooldridge, who writes:

Many useful procedures—shrinkage, for example—can be derived from a Bayesian perspective. But those estimators can be studied from a frequentist perspective, and no strong assumptions are needed.

My [Wooldridge’s] hesitation with Bayesian methods—when they differ from classical ones—is that they are not “robust” in the econometrics sense.

Suppose I have a fractional response and I have panel data. I’m interested in effects on the mean. I want to allow y(i,t) to have any pattern of serial correlation and any distribution. I want to allow heterogeneity correlated with covariates.

I know how I would approach this: pooled quasi-MLE with a Chamberlain device and using cluster-robust inference.

How does a Bayesian solve this problem under the same assumptions plus a prior? I think it’s possible, but are such methods out there and in use?

My reaction to this will be milder than you might expect. Compared to the remarks of some other anti-Bayesians (see for example here and here), Wooldridge is pretty modest in his claims. He’s not saying that Bayesian methods are bad, just that they give him some hesitation.

Wooldridge’s main point seems to be that he and his colleagues have had success with non-Bayesian methods and, on the occasions that they’d looked around to see if Bayesian ideas could help, they haven’t been clear on where to start.

This suggests a need for a short paper taking some of his classical models and expressing them in Bayesian terms with Stan code. Wooldridge appears to be a Stata user, so it could also be useful to include some Stata code using StataStan to call the Stan program.

Somebody other than me will have to do that, as I don’t know what is meant by a fractional response, or pooled quasi-MLE, or a Chamberlain device. It won’t be possible to exactly duplicate these models Bayesianly—he wants it to work for “any pattern of serial correlation and any distribution,” and a Bayesian model would need some parametric form. But these parametric forms can be very flexible (splines, Gaussian processes, etc.), and I don’t actually think you really need the procedure to work for any pattern of serial correlation and any distribution. There are some patterns you’re never gonna see, and some distributions with such long tails that no procedure would work to estimate effects on the mean. So I think his procedures must have some implicit constraints. In any case, I expect that it should be possible to set up a Bayesian model that pretty much does what Wooldridge wants, without taking too long to compute.

Regarding the claim that Bayesian methods are not robust in the econometrics sense . . . I dunno. I guess I’d have to see some simulation studies. I guess that in some sense his claim must be true, in the following sense: By construction, Bayesian inference maximizes statistical efficiency under the assumptions of the model. Efficiency is only one of many goals of inference; thus, if you’re maximizing efficiency you must be losing somewhere else. We could just as well flip it around and say that I have hesitation with any statistical procedure X because it will be flawed when its assumptions fail.

As we’ve discussed in the past, one common failure mode of purportedly conservative or robust-but-inefficient methods is that users want results. They don’t want confidence intervals that are robust but are a mile wide. The way to get reasonable-sized confidence intervals with a statistically inefficient procedure is to throw in more data. For example, when fitting a time-series cross-section model, you might pool data from 40 years rather than just 10 years, so that you can estimate the average treatment effect with a desired level of precision. The trouble is, then you’re estimating some average over 40 years, and this might not be what you’re interested in. People will take this average treatment effect and act as if it applies in new cases, even though it’s not clear at all what to do with this average. Or, to put it another way, this parameter is answering the questions you want to answer—as long as you’re willing to make some strong assumptions about stability of the treatment effect.

So, ultimately, you’re trading off one set of assumptions for another. I’d typically rather make strong assumptions about something minor like the covariance structure of an error term and then be flexible about the things I really care about, like treatment interactions. But I guess the best choice will depend on the particular problems you work on, along with what can be done with the tools you’re familiar with.

I respect Wooldridge’s decision to stick with the methods he’s familiar with. I do that too! It makes sense. There’s a learning curve with any new approach, and I can well believe that Wooldridge using classical econometrics techniques will do better data analysis than Wooldridge using Bayesian methods, especially given that the tutorial I’ve outlined above does not yet exist.

Also, I agree with him that Bayesian methods can be studied from a frequentist perspective. That’s a point that Rubin often made. Rubin described Bayesian inference as a way of coming up with estimators and decision rules, and frequentist statistics as a framework for evaluating them. And remember that Bayesians are frequentists.
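To make that point concrete, here is a minimal sketch (not anything from Wooldridge or Rubin) that takes the simplest Bayesian shrinkage estimator, the posterior mean in a normal model with a normal prior, and evaluates it the frequentist way: by its mean squared error under repeated sampling at a fixed true parameter. The function names, the toy numbers, and the choice of a normal-normal model are all assumptions made for illustration.

```python
import random

def posterior_mean(ybar, n, sigma, prior_mean, prior_sd):
    """Posterior mean of theta when y_i ~ N(theta, sigma^2), theta ~ N(prior_mean, prior_sd^2).

    This is a precision-weighted average of the sample mean and the prior mean.
    """
    data_precision = n / sigma**2
    prior_precision = 1.0 / prior_sd**2
    w = data_precision / (data_precision + prior_precision)
    return w * ybar + (1.0 - w) * prior_mean

def analytic_mse(theta, n, sigma, prior_mean, prior_sd):
    """Frequentist MSE of the posterior mean at a fixed true theta: variance + bias^2."""
    se2 = sigma**2 / n
    w = (n / sigma**2) / (n / sigma**2 + 1.0 / prior_sd**2)
    return w**2 * se2 + (1.0 - w) ** 2 * (theta - prior_mean) ** 2

def simulated_mse(theta, n, sigma, prior_mean, prior_sd, reps=20000, seed=1):
    """Monte Carlo MSE of (posterior mean, sample mean) over repeated samples at fixed theta."""
    rng = random.Random(seed)
    se = sigma / n**0.5
    sq_bayes = 0.0
    sq_classical = 0.0
    for _ in range(reps):
        ybar = rng.gauss(theta, se)  # draw from the sampling distribution of the sample mean
        est = posterior_mean(ybar, n, sigma, prior_mean, prior_sd)
        sq_bayes += (est - theta) ** 2
        sq_classical += (ybar - theta) ** 2
    return sq_bayes / reps, sq_classical / reps

if __name__ == "__main__":
    # Toy setting (assumed): n = 4, sigma = 2, prior N(0, 1), so the weight on the data is 0.5.
    for theta in (0.0, 1.0, 5.0):
        bayes, classical = simulated_mse(theta, n=4, sigma=2.0, prior_mean=0.0, prior_sd=1.0)
        print(theta, round(bayes, 3), round(classical, 3))
```

In this toy setting the shrinkage estimator has lower mean squared error than the sample mean when the true value is near the prior mean and higher error when it is far away, which is the efficiency-versus-robustness tradeoff in miniature.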

I recommend that Wooldridge continue to use the methods he’s comfortable with. What would motivate him to try out Bayesian methods? If he’s working on a problem where strong prior information is available (as here) or where he has lots of data in scenario A and wants to generalize to similar-but-not-identical scenario B (as here) or where he wants to pipe his inferences into a decision analysis (as here) or where he’s interested in small-area estimation (as here) or various other settings. But until he ends up working on such problems, there’s no immediate need for him to switch away from what works for him. And we end up working on problems that our methods work on. Pluralism!

Being able to go into detail on this is a big reason I prefer blogs to twitter. I enjoy a good quip as much as the next person, but it’s also good to have space to explain myself and not just have to take a position.

My talk’s on April Fool’s but it’s not actually a joke

For the Boston chapter of the American Statistical Association, I’ll be speaking on this paper with Aki:

What are the most important statistical ideas of the past 50 years?

We argue that the most important statistical ideas of the past half century are: counterfactual causal inference, bootstrapping and simulation-based inference, overparameterized models and regularization, multilevel models, generic computation algorithms, adaptive decision analysis, robust inference, and exploratory data analysis. We discuss common features of these ideas, how they relate to modern computing and big data, and how they might be developed and extended in future decades. The goal of this article is to provoke thought and discussion regarding the larger themes of research in statistics and data science.

This one pushes all my buttons

August Wartin writes:

Just wanted to make you aware of this ongoing discussion about an article in JPE:

It’s the same professor Lidbom that was involved in this discussion a few years ago (I believe you mentioned something about it on your blog).

Indeed, we blogged it here.

Here’s the abstract of Lidbom’s more recent article:

In this comment, I [Lidbom] revisit the question raised in Karadja and Prawitz (2019) concerning a causal relationship between mass emigration and long-run political outcomes. I find that their analysis fails to recognize that their independent variable of interest, emigration, is severely underreported since approximately 30% of all Swedish emigrants are missing from their data. As a result, their instrumental variable estimator is inconsistent due to nonclassical measurement error. Another important problem is that their instrument is unlikely to be conditionally exogenous due to insufficient control for confounders correlated with their weather-based instrument. Indeed, they fail to properly account for non-linearities in the effect of weather shocks and to control for unobserved heterogeneity at the weather station level. Correcting for any of these problems reveals that there is no relationship between emigration and political outcomes.

This one pushes all my buttons:

– Breakdown of the traditional publication process, with a series of replies appearing in different places,

– Measurement error problems,

– The use of weather as an instrumental variable,

– Claims based on statistical significance.

I do not have the energy to look into the details here, but it seemed to be worth sharing.

Alan Sokal on exponential growth and coronavirus rebound

Alan Sokal writes:

Last week Prime Minister Boris Johnson assured Britons that, come 21 June—at least, if all goes according to plan—we will “re-open everything up to and including nightclubs, and enable large events such as theatre performances.” Life will return to normal, or so he says.

Alas, Johnson is fooling himself, and it takes only a modest understanding of exponential growth to see why. . . .

For the original variant of SARS-CoV-2—the one prevalent throughout the world in early 2020—the evidence shows that, under conditions of normal social interaction, the reproduction number R is somewhere between 2 and 3. Let’s be optimistic and say 2. And the incubation period of the virus—how long it takes, on average, for an infected person to become infectious—is about 5 days. So one infected person will generate two cases after 5 days, then four after 10 days, then eight after 15 days, then 16 after 20 days, then 32 after 25 days, then 64 after one month. Then about 4000 after two months, and about 250,000 after three months, from just one initial case.

This is all approximate, of course, but you get the idea: exponential growth means a very rapid explosion, as anyone who has taken out a loan at 700% interest will sadly attest.

But the B.1.1.7 variant—the one now dominant in the UK—is believed to be about 50% more transmissible. That means an R, under normal social conditions, of somewhere between 3 and 4.5. Let’s again be optimistic and say 3. One infected person will therefore generate about 700 after one month, and about half a million after two months.

But now we have vaccines; let’s factor those in. . . .

[Suppose] we have 85% of the adult population receiving a 70% effective vaccine. Eighty-five percent times 70% equals a 60% reduction in transmission. That brings our assumed R=3 down to about 1.2.

That’s a huge reduction, but 1.2 is still bigger than 1: it means that the epidemic will double every 3 weeks. And this estimate is based on the most optimistic assumptions about the R factor, vaccine efficacy and vaccine uptake. . . .

What could bring R below 1, short of a full-scale lockdown? The answer is straightforward: vaccines, plus all the social distancing measures that we know well . . .

How long will this have to last? Until the number of cases has been reduced to a small enough number that further outbreaks—and there will be many—can be contained by contact tracing and brief local lockdowns. . . .

I think Alan might be too pessimistic here, in that even if the government and businesses officially “return to normal,” enough people will be scared or civic-minded enough to not go to clubs, theatre, etc. Yes, some people will do so, but I expect there will be a lot less circulation of people for a while. So maybe I should say that I think Alan’s right that life won’t really return to normal anytime soon, but I’m not sure he’s right about Johnson fooling himself. I feel like we could interpret Johnson as saying that people will be free to go out and go to clubs etc., but this freedom will implicitly be relying on the fact that many people won’t actually make use of this freedom, at least not at the rate they’d been doing so before.
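For readers who want to check the arithmetic in Alan’s argument, here is a minimal sketch under his stated simplifications: each case produces R new cases every 5 days, and vaccination scales R by one minus coverage times efficacy. The function names are mine, not his, and this is a back-of-the-envelope calculation rather than an epidemic model.

```python
import math

GENERATION_DAYS = 5  # Sokal's assumed time between successive rounds of infection

def cases_after(days, r, generation_days=GENERATION_DAYS):
    """Size of the newest generation of cases descended from a single initial case."""
    return r ** (days / generation_days)

def effective_r(r, coverage, efficacy):
    """R after a fraction `coverage` of people receive a vaccine that cuts transmission by `efficacy`."""
    return r * (1.0 - coverage * efficacy)

def doubling_time_days(r, generation_days=GENERATION_DAYS):
    """Days for case counts to double when each generation multiplies cases by r (requires r > 1)."""
    return generation_days * math.log(2) / math.log(r)

if __name__ == "__main__":
    # Original variant, R = 2: 64 after one month, 4096 after two, 262144 after three.
    print([round(cases_after(d, 2)) for d in (30, 60, 90)])
    # B.1.1.7 at R = 3: 729 after one month, 531441 after two.
    print([round(cases_after(d, 3)) for d in (30, 60)])
    # 85% coverage with a 70% effective vaccine: R goes from 3 to about 1.2.
    r_vax = effective_r(3, 0.85, 0.70)
    print(round(r_vax, 3), round(doubling_time_days(r_vax), 1))
```

The unrounded reduction is 59.5%, which gives R of about 1.215 and a doubling time of roughly 18 days; rounding R to 1.2 gives about 19 days. Either way it is close to the “every 3 weeks” in the quoted passage.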

Earliest Known Uses of Some of the Words of Mathematics

Aki points us to this fun 1990s-style webpage from Jeff Miller. Last year we featured his page on word oddities and other trivia. You might also enjoy his page, Earliest Uses of Various Mathematical Symbols.

Here’s an example:

The equal symbol (=) was first used by Robert Recorde (c. 1510-1558) in 1557 in The Whetstone of Witte. He wrote, “I will sette as I doe often in woorke use, a paire of parralles, or Gemowe lines of one lengthe, thus : ==, bicause noe 2, thynges, can be moare equalle.” Recorde used an elongated form of the present symbol. He proposed no other algebraic symbol (Cajori vol. 1, page 164).

Here is an image of the page of The Whetstone of Witte on which the equal sign is introduced.

The equal symbol did not appear in print again until 1618, when it appeared in an anonymous Appendix, very probably due to Oughtred, printed in Edward Wright’s English translation of Napier’s Descriptio. It reappeared in 1631, when it was used by Thomas Harriot and William Oughtred (Cajori vol. 1, page 298).

Cajori states (vol. 1, page 126):

A manuscript, kept in the Library of the University of Bologna, contains data regarding the sign of equality (=). These data have been communicated to me by Professor E. Bortolotti and tend to show that (=) as a sign of equality was developed at Bologna independently of Robert Recorde and perhaps earlier.

Cajori elsewhere writes that the manuscript was probably written between 1550 and 1568.

Lots more great examples at the above link.

Marshmallow update

Gur Huberman points us to this interesting article by Dee Gill about a posthumous research publication. It’s about 80 zillion times better than the usual science press release.

P.S. I did some quick googling and found some fun links showing past credulity on the marshmallow thing from the usual suspects: Sapolsky, Brooks, NPR. None of them could resist the immediate gratification associated with sharing that study.

No Gladwell or Easterbrook, though. They showed self-control. Let’s give credit where due.

A new approach to pandemic control by informing people of their social distance from exposure

Po-Shen Loh, a mathematician who works in graph theory, writes about a new approach he devised for pandemic control. He writes:

The significance of this new approach is potentially very high, because it not only can improve the current situation, but it would permanently add a new orthogonal tool to the toolbox for pandemic control, which works without vaccines or pharmaceutical treatments.

It’s a completely new type of phone app (www.novid.org), which resolves deep flaws in “contact tracing apps.” Functionally, it gives you an anonymous radar that tells you how “far” away COVID has just struck. “Far” is quantified by the graph-theoretic distance in the graph of physical relationships, automatically collected by the app via short-range communication with other people’s apps.

The simple idea flips the incentives. Previous approaches focused on controlling you, preemptively removing you from society if you were suspected of being infected. This new tool lets you see incoming disease to defend yourself just in time (e.g., by temporarily upgrading to a mask that protects you from others, as opposed to a cloth mask that mainly protects others from you). This uniquely aligns incentives so that even if everyone does what is best for themselves, they end up contributing to the whole. This solves the “tragedy of the commons.”

This could be important because (1) the COVID vaccines are being challenged by evasive new variants, and (2) at this rate, the next pandemic will paralyze the world again until new vaccine development and distribution.

Here’s the technical report, Flipping the Perspective in Contact Tracing.

It makes a lot of sense to me. You can read the report for details, but the key points are: (1) it’s measuring physical connection, not social connection (a link is established when two smartphones are within a specified distance of each other for a specified length of time), (2) it should work even if many people don’t enter their COVID-positive status themselves, and (3) Po-Shen tells me they’re trying it out now in two universities, so it’s a real thing. And here’s the general theme:

Instead of asking everyone within distance 1 of a positive case to quarantine, it tells everyone how far away the new positive cases have struck in their physical interaction network. This reversal changes the nature of the intervention, from one which “protects others from you” to one which “protects you from others.” Through that flip, the incentive structure also reverses, as users are given the opportunity to protect themselves before it is too late. Suddenly, users prefer false positives over false negatives (“better safe than sorry”), which is the opposite of the situation when they use apps that ask them to quarantine (the culturally unfamiliar “guilty until proven innocent”).
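The “distance” described above is an ordinary shortest-path distance in a graph, so the core computation can be sketched with a breadth-first search started from all positive cases at once. This is only an illustration of the graph-theoretic idea under toy assumptions: the contact graph, the user names, and the function `distance_to_nearest_case` are hypothetical, and nothing here reflects how the actual app collects or stores its data.

```python
from collections import deque

def distance_to_nearest_case(contacts, positives):
    """Hops from each user to the nearest positive case in an undirected contact graph.

    contacts:  dict mapping each user to an iterable of users they had a qualifying contact with
               (assumed symmetric: if a lists b, then b lists a).
    positives: iterable of users who reported a positive test.
    Returns a dict user -> distance; users with no path to any positive case are omitted.
    """
    dist = {p: 0 for p in positives}
    queue = deque(dist)
    while queue:
        user = queue.popleft()
        for neighbor in contacts.get(user, ()):
            if neighbor not in dist:  # first visit is the shortest distance in BFS
                dist[neighbor] = dist[user] + 1
                queue.append(neighbor)
    return dist

if __name__ == "__main__":
    # Hypothetical contact chain: ana - ben - cy - dee, with eli isolated.
    contacts = {
        "ana": ["ben"],
        "ben": ["ana", "cy"],
        "cy": ["ben", "dee"],
        "dee": ["cy"],
        "eli": [],
    }
    print(distance_to_nearest_case(contacts, {"ana"}))
```

A traditional contact tracing app would only act on the users at distance 1; the idea in the report is to show every user their own distance, so someone at distance 2 or 3 can take precautions early.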

I find Po-Shen’s description persuasive, without my knowing enough about the details to say more than that. Speaking more generally, I like the combination of local data collection, centralized analysis, and individual decision making. This is an idea we’ve been interested in for a long time; see for example here.

There are also some interesting statistics problems here involving how the method performs as the data degrade in quality, how much this would be expected to improve outcomes compared to alternative procedures that don’t use the physical contact information, and how this procedure can be evaluated as it gets rolled out in different places.

P.S. Tyler Cowen beat me to this one by a few months. Po-Shen said that they’re doing it at Carnegie Mellon and Georgia Tech and that they have a critical mass of users. This would suggest that the way to go would be for a school or other institution to get lots of people on it, and then the network effect works in a positive way, because if others are on it, there’s a benefit for individual users to join too.

P.P.S. Po-Shen responds to several concerns in comments.

Statistical fallacies as they arise in political science (from Bob Jervis)

Bob Jervis sends along this fun document he gives to the students in his classes. Enjoy.

Theories of International Relations

Assume that all the facts and assertions in these paragraphs are correct. Why do the conclusions not follow? (This does not mean that the conclusions are actually false.) What are the alternative explanations for the facts? What tests would tend to confirm or disconfirm these explanations? There are no tricks here and only in question 1a is the specific wording crucial. Numbers 19 and 22 are of a different character in that they do not involve specific fallacies but should provoke thought.

1. Doctors have found that in patients with a specified set of symptoms a certain kind of spinal fusion operation produces a success rate of 72% (as measured by the patient’s assessment that he or she feels much better a year later). Therefore if you have those symptoms you should have the operation.

1a. Today older people tend to be more politically conservative than younger ones. The explanation lies in the aging process: older people are more set in their ways and have more to lose by social change.

2. There is a positive correlation between the per capita GDP of a country and the degree to which it is democratic. Therefore as the poor countries get richer, they will also become more democratic.

3. Taking a random sample of wars, we study debates within each country that preceded the decision to fight and find that in the majority of cases the decision-makers, both civilian and military, were overoptimistic about the chances of victory. That is, they usually thought they would win and even if they did not, they thought they would do better than they actually did. From this I conclude that wishful thinking exists and is a cause of wars.

4. Taking a sample of Republican leaders, Republican voters, Democratic leaders, and Democratic voters, I compute the average Liberal-Conservative score for each group and find that while the Republican leaders are much more conservative than the Republican voters, the Democratic leaders are only slightly more liberal than the Democratic voters. From this I conclude that the individual Democratic leaders are moderates (in the sense of being close to the middle of the existing political spectrum).

5. Many theorists claim that domestic instability tends to lead to foreign aggression. Others have made the opposite claim. The posited linkages are obvious. Assume that I develop a good measure of both variables. For each year I compute the total amount of domestic instability in all countries in the international system and correlate this with the total amount of external aggression by all states. I find no correlation at all and conclude that, contrary to both theories, there is no connection between domestic instability and war. (There are at least two major related fallacies here.)

6. To test Galtung’s proposition that status incongruity (i.e. being simultaneously high on some dimensions of international status and low on others) leads to aggressiveness, I correlate a measure of each state’s status incongruity (whose validity is assumed to be correct) with the number and intensity of wars it has been in. I find no correlation and conclude that the proposition is false.

7. In explaining the origins of World War II, A.J.P. Taylor is correct in making almost no reference to Hitler’s extermination of the Jews. He is not concerned with making moral judgments about Hitler nor is he arguing whether or not the Allies should have made war on Hitler for the sake of those in Germany and the occupied territories. All he is trying to do is explain how and why the war started, the degree to which Hitler was unalterably aggressive, and the extent to which other countries share responsibility for the war. For questions such as these Hitler’s racial policies are irrelevant. Without endorsing Taylor’s answers to the questions he has set himself, or claiming that those questions are the most important ones, we can see that he was right to avoid being drawn into a discussion of Hitler’s domestic policies.

8. In discussing the balance of power system, one author argues: “A system containing merely growth-seeking actors will obviously be unstable; there would be no provision for balancing or restraint.” (Donald Reinken, “Computer Explorations of ‘Balance of Power’,” Morton Kaplan, ed., New Approaches to International Relations, p. 469).

9. Finding generalizations is much less important than is usually thought. Although they are useful for description, they do not help to explain anything. Knowing that X happens often, or even that it always happens, does not help explain why X occurs.

10. To study whether wishful thinking (i.e. the distortion of perceptions by desires) is a cause of crises within an alliance I take several alliances and examine a random sample of crises (defined as a sudden and unexpected occasion in which the partners find themselves in disagreement over an important issue). In all cases I find that each partner had expected the other to act in support of, or at least acquiesce in his policy. I therefore conclude that empirical evidence shows wishful thinking to be a major cause of intra-alliance crises.

11. It has been found that psychiatrists have the highest suicide rate of all occupational groups. This finding is explained by the proposition that becoming involved with other people’s problems creates strains that often cannot be handled even by a person with professional training. (Assume that we have an accurate measure of suicide rates.)

12. Assume that we have good measures of the success of policies and the amount of nongovernmental advice solicited. We find a strong negative correlation between these two measures. I conclude that the more the President listens to outside “experts” the worse off he will be.

13. Most Washington lobbyists say that they exert significant influence over the outcome of Congressional votes. On a series of close Congressional votes I ask Congressmen if they were influenced by lobbyists. The percentage answering “no” varies from 95 to 98. (Assume that Congressmen know what influences them and are telling the truth.) I conclude that, at least in the cases I have studied, lobbyists have little influence. (There are at least two major fallacies. Don’t stop with the easy one.)

14. Since every war has a loser, we can deduce, without even having to examine the pre-war debates, that in at least half of the cases the nation’s leaders overestimated their chances of victory.

15. What inferences about discrimination can we draw from the fact that the average batting average of blacks in the major leagues is higher than that of whites? Or from the finding that within the group of science Ph.D.s who have had research and teaching jobs for at least three years, women produce good scientific papers at a higher rate than do men? What about the following argument: we take a sample of male and female sales personnel and management employees and find that the women “are at least as reliable, somewhat less complacent, and somewhat more sociable. Women are a bit more impulsive than men, and certainly do not trail men in their energy level or willingness to work.” From this we conclude that

These (findings) clearly destroy many of the myths relating to sex difference in effective work potential and demonstrate that the under-representation of women in responsible jobs reveals that the process by which people are admitted to these jobs unfairly discriminates against women. (L.A. Times, Sept. 4, 1974, part 1A, p. 45)

16. An ad for a psychological biography of Nixon asked, “Did the bombing of Hanoi begin at the playing fields of Whittier?” What kind of evidence would be needed to confirm or refute the claim that Nixon’s policy in Vietnam is best explained by his personality?

17. Psychological theories yield several related propositions about the effects of tensions and crises. Because of time pressures, limitations on information channels and information processing abilities, and emotional strains, we expect that in a crisis: 1) decision-makers will tend to perceive the range of their own alternatives to be more restricted than those of the other side; 2) the search for one’s own alternatives will become increasingly narrow; 3) as the crisis develops, more and more information is flatly rejected, and 4) dissenters are increasingly excluded from the centers of decision. To confirm or disconfirm these propositions I plan to study several crises that led to wars (e.g. July 1914, August 1939). In order to provide the necessary comparisons, I will also study several crises that were resolved peacefully (e.g. Cuban Missile Crisis, Munich, Fashoda).

18. “The relevance of [Gerhard Ritter’s discussion of Allied war aims in World War I to his] history of militarism in Germany is not easy to detect. One of Ritter’s criticisms of Fischer was that he failed to talk about other people’s war aims. This criticism only means that no German historian should say that Germany behaved badly without also showing that other nations behaved worse. Ritter shows here that the Allies’ policy was at least no better than the Germans’.” (Norman Stone, “Gerhard Ritter and the First World War,” in Historical Journal, vol. 13, 1970, p. 161).

19.

Explanations in international relations, and in the social sciences in general, are different from those in the physical sciences because they fail to perceive the essential difference from the standpoint of causation, between a paper flying before the wind and a man flying from a pursuing crowd. The paper knows no fear and the wind no hate, but without fear and hate the man would not fly nor the crowd pursue. If we try to reduce it to its bodily concomitant we merely substitute the concomitant for the reality expressed as fear. We denude the world of meaning for the sake of a theory, itself a false meaning which deprives us of all the rest. We can interpret experience only on the level of experience. (R.M. MacIver, Society, p. 530).

This means that an understanding of international relations requires that we reconstruct the values, emotions, and calculations of decision-makers. The only way to explain their behavior is to see the world the way they saw it.

20. To determine whether external or internal variables are a more important source of foreign policy, I measure some national attributes (e.g. size, level of political and economic development, nature of the regime), some characteristics of dyads (e.g. geographical distance, similarity or difference of regimes, similarity or difference of power), and, as the dependent variables, events data on conflict and cooperation for each state and dyad. I propose certain hypotheses about the effect of each relationship and national attribute on the amount of conflict behavior it engages in; the closer the members of a dyad, the greater the conflict. (For the purposes of this exercise, the content of these hypotheses doesn’t matter). When I compare the actual correlations with the predicted ones, I find a much closer match for the propositions involving national attributes than for the ones involving relations. From this I conclude that internal factors are more important causes of foreign policy than are external factors. (There are at least three fallacies here.)

21. To see whether the amount of conflictual behavior that a state initiates is inversely related to the amount of cooperative behavior it initiates, I group countries according to national attributes (e.g. see above). I find that there is a direct relationship. Those kinds of states that initiate a lot of conflict (e.g. large, developed, powerful ones) also initiate a lot of cooperation. I explain this finding by the argument that to maintain even minimal order in the international system (and the data is gathered from a period of relative peace), nations cannot be totally hostile to each other. If peace is to be kept, a nation that initiates a lot of hostile behavior toward another must also initiate a significant amount of cooperative behavior toward it. As the data show, nations do not aim undiluted hostility at each other. (There are at least two fallacies here.)

22.

“The historian need not and cannot (without ceasing to be a historian) emulate the scientist in searching for the causes or laws of events. For science, the event is discovered by perceiving it, and the further search for its cause is conducted by assigning it to its class and determining the relation between that class and others. For history, the object to be discovered is not the mere event, but the thought expressed in it. To discover that thought is already to understand it. After the historian has ascertained the facts, there is no further process of inquiring into their causes. When he knows what happened, he already knows why it happened…

The value of generalization in natural science depends on the fact that the data of physical science are given by perception, and perceiving is not understanding. The raw material of natural science is therefore ‘mere particulars’ observed but not understood, and taken in their perceived particularity, unintelligible. It is therefore a genuine advance in knowledge to discover something intelligible in the relations between general types of them. What they are in themselves, as scientists are never tired of reminding us, remains unknown: but we can at least know something about the patterns of facts into which they enter.

A science which generalizes from historical facts is in a very different position. Here the facts, in order to serve as data, must first be historically known; and historical knowledge is not perception, it is the discerning of the thought which is the inner side of the event. The historian, when he is ready to hand over such a fact to the mental scientist as a datum for generalization, has already understood it in this way from within. If he has not done so, the fact is being used as a datum for generalization before it has been properly ‘ascertained’. But if he has done so, nothing of value is left for generalization to do.

If, by historical thinking, we already understand how and why Napoleon established his ascendancy in revolutionary France, nothing is added to our understanding of that process by the statement (however true) that similar things have happened elsewhere. It is only when the particular fact cannot be understood by itself that such statements are of value.” (R. G. Collingwood, The Idea of History, p. 214, 2223)

Both historians and political scientists should be able to agree that Collingwood is right. There may be universal laws and generalizations. If they exist, they are to be found through the cumulation of case studies. If we understand several cases, we can see what they have in common and how they differ. But each case must be understood in its own terms, by examining it in detail in its own context. We cannot learn why an outcome occurred in one case, or why an actor behaved as he did in one instance, by looking at other cases. These comparisons come only after we have explained each case. For how could we explain one event or one problem by comparing it to others? This might tell us if the case was unusual or if we could construct a valid generalization, but it could not help explain the case itself. Since the causes of any outcome obviously lie in the preceding events, looking elsewhere is at best a distraction.

Thus, for example, the way to discover the impact of the frontier on American life and politics is to intensively study the American frontier itself: what life was like there, what were only myths, and what patterns were common. Turner’s research was skimpy and his conclusions may be incorrect, but his general approach was surely the proper one. Similarly, it would be foolish to try to explain American foreign policy by looking at the foreign policies of other countries. For example, it is foolish to try to refute the revisionist arguments about American policy after World War II by comparing it to the policies of Russia and of the European states.

Furthermore, it is usually a basic intellectual error to try to find one explanation that can cover several cases. Even when the outcome is the same (e.g., American intervention abroad), the causes often differ from case to case.

23.

“White prejudice and any specifically Negro characteristics account for much less of the difference in employment rates between Negroes and whites than would otherwise appear.” This is shown by the fact that when we look at what can be called “‘the Statistical Negro’-that is, the Negro when all non-racial factors (e.g., education, urban-rural residence, etc.) have been controlled for-is a very different fellow from what will be called the Census Negro. In some respects the Statistical Negro is indistinguishable from the white, and in all respects the differences between him and the white are smaller than those between the Census Negro and the white.” Thus whatever the effects of previous discrimination, current discrimination is relatively unimportant. “If overnight Negroes turned white, most of them would go on living under much the same handicaps for a long time to come.” (Banfield, The Unheavenly City, p. 6973)

Think about this argument, and particularly the claim for the importance of comparing the achievements (e.g. income levels) of the Statistical Negro to that of whites.

24.

“The strongly adverse relation between cigarette smoking and health led to the banning of cigarette advertising on television. Since television advertising of cigarettes was discontinued, sales have not been noticeably affected. With the awareness that the money previously spent on television advertising was seemingly wasted, it is not immediately obvious why the tobacco industry continues to advertise at all. Knowing the intensity of addiction experienced by most smokers, it is probably not necessary to convince them that they should smoke. Indeed, most regular smokers find it very difficult not to smoke and certainly don’t need encouragement to continue. Yet, the tobacco industry continues to advertise heavily.

If the money spent on television advertising was useless, why continue the same practice in the printed media? What is the tobacco industry getting in return for their investment? One return is the promotion of the notion that smoking cigarettes is a matter of user’s choice and not an uncontrollable addiction. A more disquieting possibility is that this investment serves as hush money, softening the telling of how bad the story of smoking versus health really is.”

William Oldendorf, “Cigarette Advertising”, Science, vol. 184, April 10, 1974, p. 112.

25.

“Parolees…do little better in the community [as measured by the recidivism rate] than those who are not paroled [and serve out their full sentences], which suggests that ‘discretionary release’ is really potluck, and those who decide who gets paroled have only the sketchiest idea of who has been ‘rehabilitated’.” (Tom Wicker, “The Lessons of Parole”, New York Times, March 8, 1974, p. 33)

26. 75% of cars that are stolen had been left unlocked. Therefore locking your car will reduce the chance that it will be stolen. (If this gives you trouble, 40 is similar and easier.)

27. In studying the factors that lead a state to conclude that others are a potential threat to it, I examine a number of cases where states have come to see others as a menace. In almost all these cases I find that the state seen as threatening had broken a “rule of the international relations game”. From this I conclude that if one state breaks a “rule of the international relations game”, others will see it as a threat. (Do not worry about the vagueness of the idea of “rules of the game”.)

28. I find that harsh peace treaties are usually followed by long periods of peace whereas soft treaties (i.e. those in which the winner does not take a great deal from the loser) usually lead to new wars quite quickly. I therefore conclude that I can tell decision-makers of countries that win a war: “The best chance of ensuring that the peace will last is to be very tough and force the other side to accept harsh terms.”

29. In disputing the argument that the Soviet Union consolidated its hold over East Europe not because she sought to expand as far as possible, but because she wanted to guarantee her own security against Western attack, one scholar points out that at the same time Russia was also encouraging secessionist movements in China, moves that cannot be explained by the desire for security. Is this line of argument legitimate?

30. Examining a random sample of wars, I find that the side that initiates the fighting (assume that we have solved the obvious empirical problem this involves) usually loses the war. From this I infer that it is usually politically and/or militarily disadvantageous to strike the first military blow.

31. In order to investigate the causes of Soviet armed intervention in East Europe, I look at the relevant cases: East Germany in 1953, Hungary in 1956, Czechoslovakia in 1968, and perhaps Poland in 1981, and find that each time the local Communist Party was losing control of the situation. I therefore conclude that the Soviets were very likely to intervene whenever their client parties were unable to keep the situation in hand.

32. In order to determine the proportion of cases in which a state is able to achieve military surprise, I look at a random sample of cases of the initiation of war. I find that surprise occurs in almost all of them. From this I conclude that most cases of attempted surprise succeed.

33. Studying the causes of all or most of the wars in international politics is fairly foolish: what we are most concerned with are wars which have very great consequences. Therefore we should mostly, if not only, study the causes of great wars such as World War I or World War II. (There are several problems here.)

34. In order to test the proposition that changes in the power relations among the leading states (what are often called “power transitions”) are an important cause of major wars, one should look at the major wars that have occurred and see whether they were preceded by the posited power changes.

35. In order to reduce the amount of time I am likely to have to spend on the phone waiting to speak to someone who can make my airplane reservations, I plan to place my call at a time of day that few others are likely to be calling. (There are at least two problems here.)

36. The USSR was able to gain many more spies in the West than the latter was able to place in the USSR. The explanation must be either that Communism had greater ideological appeal in the West than capitalism and Western democracy did in the USSR or that the Soviets were willing or able to pay a great deal more for secrets than the West was.

37. According to the New York Times (June 4, 1997), a study conducted by the United Negro College Fund found that “contrary to the widespread belief that black students are a dominant presence in urban public schools, less than one-third of black public school students attend schools in large cities.” (There is no assertion of causality here, but what is wrong with this sentence as a descriptive statement?)

38. In trying to support the claim that the US sponsored the coup in Iran in 1953 because of anti-Communism, not because of the desire to gain a share of the oil fields, one scholar notes that

The Cold War was at its height in the early 1950’s and the Soviet Union was viewed as an expansionist power seeking world domination. Eisenhower had made the Soviet threat a key issue in the 1952 elections, accusing the Democrats of being soft on communism and of having “lost China.” Once in power, the new administration quickly sought to put its views into practice: the State Department was purged of suspected communists, steps were taken to strengthen the Western alliance, and initiatives were begun to bolster the Western position in Latin America, the Middle East, and East Asia. Viewed in this context, and coming as it did only two weeks after Eisenhower’s inauguration, the decision to overthrow Mossadeq appears merely as one more step in the global effort of the Eisenhower administration to block Soviet expansionism. (Mark Gasiorowski, “The 1953 Coup D’Etat in Iran,” International Journal of Middle East Studies, vol. 19, September 1987, p. 275)

Do you find this way of reasoning legitimate and persuasive?

39. To determine the causes of wars, I look at a random sample of wars, examining in detail the domestic, bureaucratic, and international factors that seem to be involved and from these results build a general theory about the relative importance of these influences.

40. Since most automobile accidents occur in trips of 5 miles or less, I should substitute long drives for short ones whenever possible.

41. In his famous 1954 Foreign Affairs article enunciating the massive retaliation doctrine, John Foster Dulles said that “a potential aggressor must not be left in any doubt that he would be certain to suffer damage outweighing any possible gains from aggression.” Why is this neither necessary nor sufficient for deterrence?

42. To test the argument that the main sources of US weapons procurement policy lie in the outlooks and preferences of the armed services, I look at the weapons the US has bought over a period of years and see if they correspond to the services’ desires.

43. In “Toughen the Will and You Toughen the Mind,” Andrew Revkin reports (New York Times, July 21, 1997) on the effect of an Outward Bound program for inner-city teenagers:

87 percent…who participated in the…program either had graduated from school or were still attending, compared to an overall graduation rate of less than 40 percent at the school. Half the participants have gone to college…. Reading scores rise more than half a grade, and math scores even more.

The…teenagers were recruited…from ninth and tenth graders who scored in the bottom third of their class on literacy tests. More than two dozen were invited to try a three-day hike in the Catskills in May, but only 12 took up the offer. Now nine remain.

What inferences can one draw about the influence of this program on various categories of teenagers?

44.

Policy-maker: “If you scholars are good for anything, you should be able to tell me what policy instruments are likely to work under what circumstances. Can you? I need to know in order to guide me in what I should do in the future.”

Eager scholar: “Yes, sir. I will examine the outcomes of a random sample of cases in which the US used economic pressure and compare the results with those that occurred in a random sample of cases in which the US used force.”

Would this meet the policy-maker’s requirements? What inferences could be drawn from this study? How would you design a better one?

45. A graduate program that believes it has greatly improved its quality over the past 5 years is shocked to find that yield (the percentage of those accepted into the program that actually enroll) has declined, not increased. Does this show that the program’s reputation is lower than it was before? Would the inference be different if the yield at peer institutions had increased? declined? remained steady?

46. If I have a serious heart disease and want the best treatment, I should select the hospital that is the best as measured by the available statistics showing its rate of success in dealing with this disease.

47. Many HMOs offer to pay for health club memberships for those who join. The reason is that they want to encourage people to exercise and so stay healthy. (This is tricky. The statement may be correct, but what other–perhaps stronger–reason would there be for HMOs to make this offer?)

48. About 30 years ago, Brown University radically changed its curriculum by drastically reducing its requirements. Since then, its graduates have achieved much greater success after they graduate (assume the validity of the measures employed). This shows that the students learned much more from the new curriculum than from the old one. (There are at least two fallacies here.)

49. Everyone tells me that Professor Nit is a hard grader whose class is very challenging and Professor Wit, who teaches the same course, is an easy grader. But through a friend at the Registrar’s office I have seen their grade sheets and the distribution of grades is the same. So the rumors must be incorrect.

50. “Smoking increases your chances of lung cancer by 900%.” That is all you have to know to conclude that you shouldn’t smoke.

51. “65% of the deaths in accidents involving SUVs are due to rollovers, whereas only 22% of the deaths in car accidents come from this cause.” (NBC Nightly News, 9/20/00.) From this we can infer that SUVs are much more prone to rollovers than are cars.

52.

The dozen states that have chosen not to enact the death penalty since the Supreme Court ruled in 1976 that it was constitutionally permissible have not had higher homicide rates than states with the death penalty, government statistics and a new survey by the New York Times show.

Indeed, 10 of the 12 states without capital punishment have homicide rates below the national average, Federal Bureau of Investigation data shows, while half the states with the death penalty have homicide rates above the national average. In a state-by-state analysis, The Times found that during the last twenty years, the homicide rate in states with the death penalty has been 48 percent to 101 percent higher than in states without the death penalty.

The study by The Times also found that homicide rates had risen and fallen along roughly symmetrical paths in the states with and without the death penalty, suggesting to many experts that the threat of the death penalty rarely deters criminals. (Raymond Bonner and Ford Fessenden, “States with no Death Penalty Share Lower Homicide Rates,” New York Times, Sept. 22, 2000)

Why does this not show that the death penalty fails to deter?

53. SUVs have a rollover rate (calculated as rollovers per 100,000 miles traveled) 3 times the rate of cars. From this we can infer that they must be less safe than cars. (There is one obvious fallacy here; once you have found it, look for 2 other deeper fallacies.)

54. In the wake of the Firestone/Ford tragedy, Congressional committees and newspapers will try to explain what happened and cast blame by examining the internal documents in the companies about this case. What is the problem with proceeding in this way?

55. Public opinion polls revealed that most people oppose the impeachment of President Clinton. The behavior of the members of Congress who strongly pushed for impeachment therefore shows the weakness if not inaccuracy of the claim that politicians seek to maximize their chances of re-election.

56. Scholar A:

“Realism should predict that the strongest state will prevail in a crisis, and, for the Cold War, the only real dispute is over whether we should expect the conventional or the nuclear balance to be most important.”

Scholar B:

“No, Realism predicts that as long as the situation approximates the game of Chicken, the state with the stronger reputation for resolve or with the greater stake in the issue should prevail.”

What is wrong with both these claims?

57. “According to rational choice theory, a state will fight if the expected utility for going to war is greater than the utility of the status quo.” Why is this statement incorrect?

58. Gore won the popular vote in the 2000 Presidential election. It follows that he would have been elected President had there been a previous change in the Constitution eliminating the Electoral College and replacing it with a popular vote.

59.

“Most students involved in school shootings discussed their plans beforehand and did things that could have telegraphed the attacks, two Secret Service agents said.” (Judith Cohler, Associated Press story, July 18, 2001)

From this we can infer that wise public policy would be to act on these warning signs, immediately calling in for questioning students who display them.

60. It is striking how often borders between states of very unequal power are quite peaceful (e.g. US-Canada, France-Belgium), while conflict is more common when the neighbors are of roughly equal power. From this I can infer that rough equality of power is more conducive to conflict than is a very unequal distribution.

61. “The purpose of this book is to measure the capabilities of democracies in the realm of foreign policy by looking at the politics and institutions of two of the oldest and most prominent of democratic states.” (Kenneth Waltz, Foreign Policy and Democratic States, p. 1.) In fact, this does not describe what the book does, which is to compare the foreign policy capabilities of Great Britain and the US. But if the sentence did give the book’s purpose, it would fall into 2 methodological traps.

62. The proper way to conduct a post-mortem on why the US was taken by surprise by the terrorist attacks of September 11 is to go back over the information that was or should have been available to the CIA and FBI and ask whether this was sufficient to have enabled a reasonable person or organization to have inferred that this attack was quite likely.

63a. “We were debating whether to go to war with a particular country and I thought I had won the argument when I was able to convince my boss that the chances of victory were clearly greater than 50 percent.”

63b. “Being wiser than I was in the previous case, I was sure I had won the argument when I showed my boss that, taking everything into account, the expected utility of starting the war was greater than the value of the status quo.” Why might this not be a winning argument?

63c. “OK, this time I’m sure I’ve got it right. In this case, I was able to show that the expected utility of fighting was less than the value of the situation as it is today. I was sure that this would mean that no serious person could argue for fighting. But I was wrong yet again.” Why?

65. Most international agreements are complied with. This disproves the common argument that difficulties in ensuring compliance explain why cooperation is difficult to develop and sustain in international politics.

66. To study the effects of whether a mother is employed outside the house on a child’s achievements and adjustment (assume that I can measure these), I need not only to look for the overall correlation, but to use control variables in order to establish causation. Most importantly, I want to see if any relationship I find remains after I hold constant the income of the mother and the family.

67. “Every known human carcinogen causes cancer in animals.” It follows that we should test all chemicals on animals for their carcinogenicity and refuse to release any that fail the test. (Mount Sinai Center for Children’s Health and the Environment, “She’s the test subject for thousands of toxic chemicals. Why?” New York Times, August 15, 2002.) (There are at least two problems here.)

68. The fact that the US was able to keep the USSR out of West Europe without a war shows the efficacy of the policy of deterrence.

69. Federal states like the US and the former Yugoslavia are more likely to have civil wars or dissolve than are unitary ones. The obvious lesson to those who are writing constitutions is to avoid a federal system.

70. High school dropouts on average earn $9,000 less than those who complete high school. It therefore should be a major objective of public policy to decrease the number of dropouts. (There are two fallacies here.)

71. There is a strong correlation between the extent to which a state is democratic and the extent to which it respects human rights. I infer that to protect the latter I should facilitate the former.

72. “Under Mayors Giuliani and Bloomberg crime in New York has significantly decreased. The obvious reason is the policing tactics they have adopted.” What are the 2 obvious sources of information that you could tap to judge the plausibility of this argument about causation?

73. I look at recent cases of attempted and successful revolution and find that most instances in which there was little if any violence were successful and that, by contrast, most cases in which there was significant bloodshed ended with the regime staying in power. From this I infer that rebels should use peaceful protest only.

74. To study whether some hospitals spend too much on desperately ill patients, I look at a sample of cases in which people died and see how much was spent on their care in the last two years of their lives. I find that the level of spending among excellent facilities varies by a factor of two. “We are comparing patients with identical outcomes—all were dead in two years–so it’s unlikely that differences in severity of illness account for the [spending] variations we saw.” (Robert Pear, “Researchers Find Huge Variations in End-of-Life Treatment,” New York Times, April 7, 2008.) From this why can I not infer that the spending level in the more expensive hospitals was excessive?

75.

“Almost half of those arrested for plotting or carrying out attacks against the U.S. had prior criminal records, mostly for small-time offenses, a study for New York State investigators found. Such interactions with local law enforcement represented possible opportunities to ‘detect and deter an attack,’ the study said” (Sean Gardiner, “Early Chances Often Missed In Terror Cases,” Wall Street Journal, January 3, 2011).

What are the problems here? (The rest of the article does not point them out, showing yet again the embarrassment of journalism.)

76. Your doctor tells you “Take this medicine and it will cut in half the chance that you will get a particular kind of cancer even though it has somewhat unpleasant, although not dangerous, side-effects.” His statistic is correct, but it is not the one you want. What is?

77. During the summer of 2012, many analysts and American officials said things like: “In response to continued Iranian provocations, we’re instituting new sanctions. As they take hold and the pain inflicted on Iran increases, Western bargaining leverage will increase.” Assuming that the sanctions indeed are causing pain, that the population blames the government, and that the government cares, why does the conclusion not follow? (Note that that conclusion is not that Iran will give in, but just that Western leverage will increase as the sanctions take hold.)

78. To help people live through avalanches, I interview survivors about the techniques they used (e.g., staying calm, moving slowly, being guided by any light they see). I then print (and sell) a pamphlet detailing these methods to increase the chance that anyone caught can survive. Perhaps I shouldn’t. (There is both a fallacy and a problem here.)
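To see the trap in exercise 26 (and its cousin, exercise 40) in numbers, here is a minimal Python sketch. Everything in it except the 75% figure is made up for illustration: the overall theft rate and the share of cars left unlocked are hypothetical inputs, and the function name is my own. The point is only that the 75% is P(unlocked | stolen), and that turning it into P(stolen | unlocked) requires knowing how many cars are left unlocked in the first place.

```python
def theft_risk_by_lock_status(p_unlocked_given_stolen, p_unlocked, p_stolen):
    """Use Bayes' rule to turn P(unlocked | stolen) into
    P(stolen | unlocked) and P(stolen | locked)."""
    p_locked = 1.0 - p_unlocked
    p_locked_given_stolen = 1.0 - p_unlocked_given_stolen
    risk_unlocked = p_unlocked_given_stolen * p_stolen / p_unlocked
    risk_locked = p_locked_given_stolen * p_stolen / p_locked
    return risk_unlocked, risk_locked


# Hypothetical base rates: only the 0.75 comes from the exercise.
for share_unlocked in (0.25, 0.75, 0.90):
    risk_unlocked, risk_locked = theft_risk_by_lock_status(
        p_unlocked_given_stolen=0.75,
        p_unlocked=share_unlocked,
        p_stolen=0.01,
    )
    print(
        f"share of cars unlocked = {share_unlocked:.2f}: "
        f"P(stolen | unlocked) = {risk_unlocked:.4f}, "
        f"P(stolen | locked) = {risk_locked:.4f}"
    )
```

If 75% of all cars are left unlocked, the two risks come out identical; if 90% are, locked cars are the ones stolen more often. Even when the comparison does favor locking, it is still only an association, so it does not by itself show what would happen if you changed your own behavior.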

Good stuff. Lots more interesting than the usual medical examples.
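Exercise 76 is one place where a few lines of arithmetic help. Below is a rough Python sketch, under the assumption that "cut in half" means a 50% relative risk reduction; the baseline risks are hypothetical numbers I picked for illustration, and the function name is mine. It converts the doctor's relative figure into the absolute change in risk, plus the number of people who would need to take the medicine to prevent one case.

```python
def absolute_benefit(baseline_risk, relative_reduction=0.5):
    """Convert a relative risk reduction into absolute terms.

    Returns (risk if treated, absolute risk reduction, number needed to treat).
    """
    treated_risk = baseline_risk * (1.0 - relative_reduction)
    absolute_reduction = baseline_risk - treated_risk
    number_needed_to_treat = 1.0 / absolute_reduction
    return treated_risk, absolute_reduction, number_needed_to_treat


# Hypothetical baseline risks: the same "cut in half" means very different things.
for baseline in (0.20, 0.0002):
    treated, reduction, nnt = absolute_benefit(baseline)
    print(
        f"baseline risk = {baseline}: risk if treated = {treated:.5f}, "
        f"absolute reduction = {reduction:.5f}, "
        f"people treated per case avoided = {nnt:.0f}"
    )
```

With a high baseline risk, halving it may well be worth unpleasant side effects; with a tiny baseline risk, thousands of people put up with the side effects for each case avoided. That is one reading of what the exercise is after, not necessarily the only one.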

Drew Bailey on backward causal questions and forward causal inference

Following up on my paper with Guido on backward causal questions and forward causal inference, education researcher Drew Bailey writes:

(1) Some disagreements between social scientists or between social scientists and the public arise when one side is in “forward causal inference” mode and the other side is in “backward causal question” mode;

(2) Individuals or entire subfields can develop blind spots when they spend too much of their time in one of these modes and not enough in the other; and

(3) Students should get practice flexibly switching back and forth between these two modes.

His longer writeup of these ideas is here.

What’s the best novel ever written by an 85-year-old?

I recently read A Legacy of Spies by John Le Carré. It was pretty good. Which is impressive given that the author wrote it when he was 85! OK, I’m not saying it was as good as Tinker Tailor Soldier Spy, but I still liked it. It was done well, and if it featured some of Le Carré’s more annoying tics, it also featured some excellent examples of his interweaving of thought and action, which I’d call “cinematic” except that in some ways it’s the opposite of cinematic in that so much is happening inside the character’s head.

Anyway, here’s my real question. What’s the best novel ever written by an 85-year-old? Old authors can write excellent essays: they’re practiced in putting words together, and writing an essay is like noodling around on the piano for an experienced musician, in that they know how to structure their ideas and make them go down smoothly. But a novel, that’s another story. Updike’s novels were disintegrating for decades even while he kept up the quality of his stories and essays, and he didn’t even reach 80. Who else is or was still writing solid, readable novels at 85?

Summer research jobs at Flatiron Institute

If you’re an undergrad or grad student and work in applied math, stats, or machine learning, you may be interested in our summer research assistant and associate positions at the Flatiron Institute’s Center for Computational Mathematics:

There is no deadline, but we’ll start reviewing applications and making offers in early March and we only have a limited number of positions.

If you’re applying because you’d like to work on Stan, please mail me directly at bcarpenter@flatironinstitute.org and let me know.

Edit: I should clarify that we have capacity at CCM for quite a few summer researchers in applied math, stats, and ML. I’m particularly interested in hearing from people who are interested in working on the core Stan language, Stan math library, or on challenging applied projects. Here’s a rundown of my current projects.

Here is how you should title the next book you write.

I was talking with someone about book titles. I liked the title Red State Blue State Rich State Poor State when I came up with it, but the book did not sell as well as I hoped (not that I thought it would sell enough to make me lots of money; I’m just using sales here as a proxy for influence). The trouble was that this title was a way to signal that the book would be a fun read—but, let’s face it, for most people a book full of graphs is not so fun. The appeal of the book was that it had lots of analyses that had never been done before, along with some that were not new but helped us understand what was going on.

Look at Bill James. He called his book the Baseball Abstract. Can’t get much more boring than that. But people wanted to read it because it had the facts. We would’ve been better off calling it the Voting Abstract or Crunching the Election Numbers or something like that.

A general principle

So this got me thinking about a general principle for titling your books.

Bad title: This Book Will Be Fun to Read.

Good title: This is the Book Your Competitor Has Already Read.

The idea is that people should read your book because otherwise they’re missing out.

Regression and Other Stories as a counterexample?

But is Regression and Other Stories a counterexample to the above principle? It’s on a technical topic but has a fun title and it’s been successful, or so I think. The difference, I think, is that the title of Regression and Other Stories is just not so important. We could’ve called it Regression, or Applied Regression, or Applied Regression from a Computational Perspective, or all sorts of other options, and I think it would be selling just as well. Maybe even better, who knows. Knowing the authors of the book gives enough of a sense of the content that people will buy it (or not buy it) for the right reasons. Red State Blue State was a different story because at the time we (the book authors) were more of an unknown, and we were trying to reach new groups of people, so the signal sent by the title was more important.

“I looked for questions on the polio vaccine and saw one in 1954 that asked if you wanted to get it—60% said yes and 31% no.”

Apparently there are surveys all over the world saying that large minorities of people don’t want to take the coronavirus vaccine. If it was just the U.S. we could explain this as partisanship, but it’s happening in other countries too. This seems like a new thing, no? When there was talk of the anti-vax movement a few years ago, I recall it being something like 10% or fewer parents not vaccinating their kids. One difference is that vaccinating your kids is the default (no vaccine, no school), whereas the coronavirus vaccine is an option. Defaults matter. But is there more to it than this?

I asked some public-opinion experts and this is what they told me:

Bob Shapiro said:

I did a quick iPOLL search. Regarding the flu, not taking the vaccine is more the norm when it comes to flu shots, and we saw it with the swine flu. I have not looked at any of this by partisanship. The current irony is that Trump can claim credit on the vaccine front but his supporters may not be more inclined to take it. It is related to trust, etc. Also, some people have an aversion to injections.

And he pointed to this recent research article, Policy Views and Negative Beliefs About Vaccines in the United States, by Dominik Stecula, Ozan Kuru, Dolores Albarracin, and Kathleen Hall Jamieson.

David Weakliem wrote:

I looked for questions on the polio vaccine and saw one in 1954 that asked if you wanted to get it—60% said yes and 31% no. There was also one on whether you’d like your children to get it and 75% said yes and 17% no (of the people who had children). That was more “no” answers than I expected. Maybe a significant number of people are always reluctant to try something new, or at least to be among the first to try it. I think there has been a change in the media—they used to be more deferential to the authorities and less inclined to report news that might promote doubts. Today they are more willing to report on anti-vaccine sentiments and on potential problems or limitations of the vaccines. I’m not an expert in the media, but that’s my impression.

To which Bob added:

David makes a very good point on the media. Market forces lead the media to emphasize conflict wherever they can find it. Polio and the 1947(?) smallpox scare in NYC may have been the high point of positivity toward vaccines. Also the big push on polio circa 1960 or so involved two doses of sugar cubes and a mass campaign for everyone to take them.

Bob also says that this is a good research topic, and I agree.

P.S. More polling data here from Civiqs.

Meg Wolitzer and George V. Higgins

Regular readers of this blog will know that I’m a Meg Wolitzer fan (see here and here). During the past year or so I’ve been working my way through her earlier books, and I just finished Surrender, Dorothy, which was a quick and fun and thought-provoking read, maybe not quite as polished as some of her more recent books but who cares about that, really.

Coming to the last page of this short book, it struck me that Wolitzer is a lot like George V. Higgins, a long-time favorite of mine (see here, here, and here). They each have a strong style, also each of them writes about an insular group of people, but they’re interested in how these people link up to the rest of the world. I guess that describes a lot of novels; still, I see a similarity here. The specifics of their styles are different, though: Wolitzer tells us her characters’ thoughts, while Higgins mostly portrays his characters through dialogue and some action.

One thing that Wolitzer and Higgins have in common is that they take sides. Not against each other—they write in different genres and don’t seem to be talking to each other, as it were (I googled *”George V Higgins” “Meg Wolitzer”* and all I could find was this column from political columnist George Will, which mentions these two authors briefly, but not with any connection to each other)—but, rather, they take sides in their own fiction. With both Wolitzer and Higgins, you get a sense that the author likes some of the characters and dislikes others. Some authors are more Olympian; others have a rigorous single-viewpoint narrative; but Wolitzer and Higgins are a bit less disciplined, or so it seems to me, in that they jump between perspectives but with a kind of implicit narrator who is taking sides in the action. I don’t mind this—I actually find it kind of charming, that the author cares enough about his or her characters in this way.

Both authors also have a good skill of managing expectations. Much of storytelling involves expectations and surprise: building suspense, defusing non-suspense, and so on. Recall that saying that the best music is both expected and surprising in every measure. So, if you’re writing a novel and you introduce a character who seems like a bad person, you have to be aware that your reader is trying to figure it out: is this truly a bad person who just reveals badness right away, is this a good person who is misunderstood, will there be character development, etc. Some of this can be managed using multiple perspectives. Anyway, I think that both Wolitzer and Higgins are good at this, in different ways.

Writing this post, I also thought of another similarity between the two authors, which is that I don’t think either is particularly good at physical descriptions of people. Surrender, Dorothy had one vivid description of a fat man (“squat and friendly and seemed to be waiting for his first heart attack to happen”) and his thin wife (“built like a praying mantis and draped in jewels”), but that’s as much of a caricature as a physical description. The main characters in Surrender, Dorothy, as in other Wolitzer books, are often described as pretty or plain or attractive or unattractive or handsome, but not much more than that. The description is vivid and it does the job of distinguishing the people; it’s just not usually visually specific. For example, one character is described as having “a long, studious face . . . He had been an awkward adolescent . . . ears were perpetually red-hot, like someone who seems to have just come back from the barbershop, and he was a jiggler; a crossed leg often went flapping like a wing . . .” A minor character is described as “a pudding-faced woman . . . who had a head of hair that looked as though she cut it herself while blindfolded.” So it’s not like Wolitzer can’t do vivid descriptions; it’s just that she only does it once in a while, and, when she does it, it’s usually more conceptual than straight physical description. The result is that I can’t quite visualize what her characters look like, and this can be a problem because sometimes the plot is driven by characters being attractive or appealing, or unattractive or unappealing. I say this not to complain about Wolitzer—I’m a big fan of her books and I look forward to reading more of them—it’s just interesting after reading a book to think about its style.

P.S. This is completely unrelated, but since this post is off-topic anyway, here’s something funny I came across from 2012: Rick Santorum quotes as New Yorker cartoons. Yeah, I know, shooting fish in a barrel. But, what can I say, they’re funny. I guess this dates me as someone who’s so old that he’s heard of santorum.

The Mets are hiring

Des McGowan writes:

We are looking to hire multiple full time analysts/senior analysts to join the Baseball Analytics department at the New York Mets. The roles will involve building, testing, and presenting statistical models that inform decision-making in all facets of Baseball Operations. These positions require a strong background in complex statistics and data analytics, as well as the ability to communicate statistical model details and findings to both a technical and non-technical audience. Prior experience in or knowledge of baseball is not required.

Interested applicants should apply at this link and are welcome to reach out to me (dmcgowan@nymets.com) if they have any questions about the role.

Modeling, data analysis, computation, decision making, communication . . . all the good things.

If they offer you a job, my advice is to try to negotiate something like the contract they gave to Bobby Bonilla.

Multivariate missing data software update

Ranjit Lall writes:

In 2018 you posted about some machine learning-based multiple imputation software I was developing that works particularly well with large and complex datasets. The software is now available as a package in both Python (MIDASpy) and R (rMIDAS), and a paper describing the underlying method was just published online in Political Analysis (gated and ungated).

I’m trying to generate some interest in the software and get a few more people to try it out, and I was wondering whether you might be willing to link to the paper (or to our GitHub page). I think the software would be of real interest to a lot of your followers (I’ve incorporated many of the features they requested following your blog post).

Stanford prison experiment

Mark Palko points us to a review by Alison Abbott of a book by Susannah Cahalan telling a disturbing story of a psychology professor at a prestigious university who had stunning academic and popular success based on research that he seems to have incorrectly and misleadingly reported.

Disturbing—but not surprising, given we now have a template with many examples of “psychology professor at a prestigious university who had stunning academic and popular success based on research that he had incorrectly and misleadingly reported.”

#NotAllPsychologyProfessors

Science reform can get so personal

This is Jessica. Lately I’ve been thinking a lot about philosophy of science, motivated by both a longtime interest in methodological reform in the social sciences and a more recent interest in proposed ethics problems and reforms in computer science. The observation I want to share is not intended to support any particular stance, but just to note how personal these topics can be and how what at first glance seem like trivial decisions can bring up questions about who you think you are as a scientist and how you think empirical research works. 

I’ll take a couple examples related to methods reform. One, which is related to Andrew’s post the other day about how statisticians choose their methods (sometimes by convention or convenience), is doing Bayesian statistical analysis. Some of the research I do involves running controlled experiments, and I’ve always gravitated toward Bayesian philosophy despite being taught statistics by Frequentists. So shortly after becoming faculty I made a more concerted effort to use it. Since then in my lab we tend to default to it (and it helps that my close collaborator Matt Kay has done a lot more of it than me so we can look to him for advice when needed).

But, sometimes we’re analyzing some data from an experiment that uses a relatively simple, say between subjects design, where we don’t really have useful prior information going into it. So specifying a Bayesian model seems nice for reasons like interval interpretation but otherwise not very consequential.  We end up with Bayesian models that essentially produce the same thing you’d get with the Frequentist version. Which is fine, of course, but in cases like this it strikes me as maybe more honest to use Frequentist stats, since that’s better understood in my field. That way we’re not running the risk of implying there’s some big added value of being Bayesian in this case, to others or even to ourselves.  
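
To make the "essentially the same thing" point concrete, here is a minimal sketch, not anything from our actual analyses. It assumes a normal approximation in which the experiment yields an estimated difference between conditions and a standard error treated as known, and it combines that with a normal prior. The function name and all the numbers are hypothetical, chosen only for illustration.

```python
import math


def normal_posterior(estimate, se, prior_mean, prior_sd):
    """Posterior mean and sd for a normal prior combined with a normal
    likelihood summarized by (estimate, se). Assumes se is known."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the data estimate.
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * estimate)
    return post_mean, math.sqrt(post_var)


def interval(center, sd, z=1.96):
    """Central 95% interval under a normal approximation."""
    return center - z * sd, center + z * sd


if __name__ == "__main__":
    estimate, se = 0.30, 0.10  # hypothetical difference in means and its se

    # Classical 95% confidence interval.
    print("frequentist:", interval(estimate, se))

    # Very weak prior: the posterior is nearly identical to the above.
    m, s = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=10.0)
    print("weak prior: ", interval(m, s))

    # Informative prior centered at zero: now the prior visibly matters.
    m, s = normal_posterior(estimate, se, prior_mean=0.0, prior_sd=0.10)
    print("tight prior:", interval(m, s))
```

With the wide prior the two intervals agree to several decimal places, which is the situation described above: the Bayesian machinery changes the interpretation of the interval but not the numbers. Only when the prior carries real information do the answers diverge.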

I guess the premise I’m assuming is that if you don’t expect everyone to bother thinking about whether your model choice was well motivated or not, but you do expect people to pay attention to who is using what methods, then you may be signaling things through your choices that influence how other people think about what good science means. I dislike this signaling or heuristic aspect, because it’s counter to properties I associate more strongly with good science, like being honest about why one is doing something and being skeptical of any method presented as a panacea. 

Another example is preregistration. For the last five or so years I’ve been preregistering most of the experiments I do. I think I even had the first preregistered experiment ever at some visualization venues. I never really questioned doing this too much, since preregistration is associated with transparency and I think transparency is good, especially if it prevents authors from exploiting degrees of freedom in an analysis until they get the result they want. Also, most of my papers have lead authors who are Ph.D. students, as is common in computer science, and preregistration has been very useful as a forcing function for them and me for making sure we think about and agree ahead of time about the modeling approach and exactly what comparisons we want to make. 

But sometimes I find myself thinking about what sorts of values preregistration implies, and feel a bit conflicted about it, again because of the conflict between what might get signaled and my own values when it comes to science.

For example, we use a pattern on these projects where we design an experiment, collect pilot data, then use what we see to simulate fake data to figure out how much we should collect to learn from the comparisons we do and the models we think we’ll use. Preregistration is easy in that it simply involves writing down everything we’ve planned. And of course we can deviate from it if necessary. 
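
The fake-data step can be sketched roughly as follows. This is an illustration under simplifying assumptions, not our actual workflow: a two-group between-subjects design, normally distributed outcomes, and an effect size and standard deviation guessed from pilot data. The function names and the specific values (effect 0.3, sd 1.0, the candidate sample sizes) are all hypothetical.

```python
import math
import random
import statistics


def simulate_once(n_per_group, effect, sd, rng):
    """Simulate one two-group experiment.
    Returns the estimated difference in means and its standard error."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    treated = [rng.gauss(effect, sd) for _ in range(n_per_group)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = math.sqrt(statistics.variance(treated) / n_per_group
                   + statistics.variance(control) / n_per_group)
    return diff, se


def summarize_design(n_per_group, effect, sd, n_sims=500, seed=1):
    """Repeat the fake experiment n_sims times. Returns the share of 95%
    intervals that exclude zero and the average interval width."""
    rng = random.Random(seed)
    excludes_zero = 0
    widths = []
    for _ in range(n_sims):
        diff, se = simulate_once(n_per_group, effect, sd, rng)
        lo, hi = diff - 1.96 * se, diff + 1.96 * se
        widths.append(hi - lo)
        if lo > 0 or hi < 0:
            excludes_zero += 1
    return excludes_zero / n_sims, statistics.fmean(widths)


if __name__ == "__main__":
    # Effect and sd would come from the pilot; these values are made up.
    for n in (20, 50, 100, 200):
        share, width = summarize_design(n, effect=0.3, sd=1.0)
        print(f"n per group = {n:3d}  share excluding zero = {share:.2f}  "
              f"mean interval width = {width:.2f}")
```

Looking at how the interval width shrinks as the sample size grows is often more informative than the share of intervals excluding zero, since it says how precisely the planned comparison would be estimated. In a real project the simulation would use the same model that is planned for the analysis.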

If we didn’t do it, we might delude ourselves into thinking we are being honest while actually tweaking things in our favor. So there’s an implied value about having a forcing function to keep us honest, as well as transparency being correlated with good science.

But post-tenure especially I find myself increasingly distrustful of experimental work in my field. I would like to think that these days, I have no reasons not to be honest and I can trust that my judgment on whether a result is valid is not deluding me. So then when I preregister, I feel like I’m admitting to myself I need external forces to keep me from being devious and I can’t make ethical decisions without needing to be policed. There’s something unpleasant about signaling that to oneself, whether or not it’s true, or believing it about human nature in general.

I could instead tell myself that I can make decisions without preregistration, and undoubtedly many others can too, but I’m doing it to signal to others that preregistration is important because I believe there is an overall benefit for the field if we do it. Treating it as signaling seems realistic given that preregistering doesn’t actually hold you to anything; it’s a gesture toward transparency.

But if preregistration is about signaling the value of transparency, then maybe I should consider other things that it can signal. For instance, related to the premise above, I’m pretty certain that some people in my field who haven’t followed methods reform closely but recognize certain terms see the word preregistration or Bayesian model and think, ‘oh that’s a good sign’ when reviewing or deciding who to pay attention to. Which is probably smart in the sense that preregistering and using Bayesian models may be correlated with paying closer attention to possible threats to the validity of empirical claims you are presenting. But this is just another heuristic, when heuristics have in many ways been part of what’s led us so astray in empirical science.

On a personal level, I think skepticism about any easy solution to fixing science is part of the solution, and I can’t help but care about how I am or am not contributing to better science. So I wish there was a way to signal that I choose methods because I think they can help, not that I think they are necessary in any way. Maybe if what I value is being honest and transparent about where I stand with science reform, I should also be honest about why I’m preregistering or using Bayesian stats. I could say in the paper, “We preregistered to impart a sense of honesty.” Or, “We preregistered because while we can’t say much about how important it is for good science, we think it is useful to get more people taking transparency seriously.” When we use Bayesian methods but don’t think they add anything special, we can say that or report how close the results are to estimates from the equivalent non-Bayesian model. Nobody really seems to be doing much of this, but maybe it’s not such a crazy idea. It requires considering your stance on the methods you use, which I like, and folds some reflection on science reform into papers that aren’t really about that.

None of this is meant to bash preregistration or Bayesian stats of course. I’ve learned a lot by doing both and have undoubtedly improved my process. My bigger point is that science reform is complex and can provoke personal reflection on values. I think this is a good thing, even if it can seem hard sometimes to be thoughtful and honest about how few answers we have within the usual constraints.

“Men Appear Twice as Often as Women in News Photos on Facebook”

Onyi Lam, Stefan Wojcik, Adam Hughes, and Brian Broderick write:

A new study of the images accompanying news stories posted publicly on Facebook by prominent American news media outlets finds that men appear twice as often as women do in news images, with a majority of photos showing exclusively men. . . .

Researchers chose to study news images on Facebook because the site standardizes the presentation of news images and text across outlets. News posts that appear in social media feeds like Facebook feature large photographs and contain only a small amount of text and a link to a longer article. In contrast to other formats such as print media, the photograph in a Facebook post occupies more screen space than the accompanying text and is the main object that Facebook users see when they scroll through the news feed. Academic research based on data collected between July 2014 and January 2015 finds that Facebook users only clicked on about 7% of national news, politics and world affairs posts that they viewed in their news feeds. Previous research from the Center has examined how representations of men and women in Google Image Search results can sometimes be at odds with real world data. And academic researchers have leveraged similar tools to study the depiction of women in the news. . . .

The 17 media organizations included in the study were selected according to several criteria. These included: whether they conduct original reporting on general topics, whether they primarily covered national news rather than local news, whether the site was for a news organization based in the U.S., and whether their websites received at least 20 million unique visitors in the third quarter of 2018, according to data from Comscore. The study does not include media outlets that focus their coverage on one topic, such as business, politics, entertainment or sports. The study also excludes local media outlets. The full list of sites included in the study appears in the Methodology. . . .

They also break things down by topic:

I’m surprised that 17% of sports stories were exclusively women. I’d’ve expected this would be lower. I didn’t realize there was this much coverage of women’s sports.

Lots more at the link.

I wonder what Alison Bechdel would say here? I guess just that she’s not surprised.
