Important statistical theory research project! Perfect for the stat grad students (or ambitious undergrads) out there.

Hey kids! Time to think about writing that statistics Ph.D. thesis.

It would be great to write something on a cool applied project, but: (a) you might not be connected to a cool applied project, and you typically can’t do these on your own, since you need collaborators who know what they’re doing and who care about getting the right answer; and (b) you’re in your doctoral program learning all this theory, so now’s the time to really learn that theory, by using it!

So here we are at Statistical Modeling, Causal Inference, and Social Science to help you out. Yes, that’s right, we have a thesis topic for you!

The basic idea is here, a post that was written several months ago but just happened to appear this morning. Here’s what’s going on: In various areas of the human sciences, it’s been popular to hypothesize, or apparently experimentally prove, that all sorts of seemingly trivial interventions can have large effects. You’ve all heard of the notorious claim, unsupported by data, “That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications,” but that’s just one of many many examples. We’ve also been told that whether a hurricane has a boy or a girl name has huge effects on evacuation behavior; we’ve been told that male college students with fat or thin arms have different attitudes toward economic redistribution, with that difference depending crucially on the socioeconomic status of their parents; we’ve been told that women’s voting behavior varies by a huge amount based on the time of the month, with that difference depending crucially on their relationship status; we’ve been told that political and social attitudes and behavior can be shifted in consistent ways by shark attacks and college football games and subliminal smiley faces and chance encounters with strangers on the street and, ummm, “exposure to an incidental black and white visual contrast.” You get the idea.

But that’s just silly science, it’s not a Ph.D. thesis topic in statistical theory—yet.

Here’s where the theory comes in. I’ve written about the piranha problem, that these large and consistent effects can’t all, or even mostly, be happening. The problem is that they would interfere with each other: On one hand, you can’t have dozens of large and consistent main effects or else it would be possible to push people’s opinions and behavior to ridiculously implausible lengths just by applying several stimuli in sequence (for example, football game plus shark attack plus fat arms plus an encounter on the street). On the other hand, once you allow these effects to have interactions, it becomes less possible for them to be detected in any generalizable way in an experiment. (For example, the names of the hurricanes could be correlated with recent football games, shark attacks, etc.)

We had some discussion of this idea in the comment thread (that’s where I got off the quip, “Yes, in the linked article, Dijksterhuis writes, ‘The idea that merely being exposed to something that may then exert some kind of influence is not nearly as mystifying now as it was twenty years ago.’ But the thing he doesn’t seem to realize is that, as Euclid might put it, there are an infinite number of primes…”), and what I’m thinking would really make the point clear would be to demonstrate it theoretically, using some sort of probability model (or, more generally, mathematical model) of effects and interactions.

A proof of the piranha principle, as it were. Some sort of asymptotic result as the number of potential effects increases. I really like this idea: it makes sense, it seems amenable to theoretical study, it could be modeled in various different ways, it’s important for science and engineering (you’ll have the same issue when considering A/B tests for hundreds of potential interventions), and it’s not trivial, mathematically or statistically.

As always, I recommend starting with fake-data simulation to get an idea of what’s going on, then moving to some theory.
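
A minimal sketch of what such a fake-data simulation might look like, assuming (purely for illustration) an additive logistic model in which each of 50 binary influences shifts the logit of some behavior by 0.5; all of the numbers here are invented, and Python is used just for concreteness:

import numpy as np

rng = np.random.default_rng(0)
n_people, n_effects, beta = 10_000, 50, 0.5    # 50 "large, consistent" influences, all made up

x = rng.integers(0, 2, size=(n_people, n_effects))   # who happened to get which stimulus
logit = (2 * x - 1) @ np.full(n_effects, beta)       # stack all the main effects
p = 1 / (1 + np.exp(-logit))                         # implied probability of the behavior

# If dozens of big main effects really coexisted, predicted probabilities would routinely
# get pushed to implausible extremes:
print(np.mean((p < 0.05) | (p > 0.95)))              # roughly 0.4 under these made-up numbers

Under these made-up numbers, about 40% of people end up with predicted probabilities outside (0.05, 0.95): each effect is plausible on its own but they are not plausible jointly, which is the piranha problem in miniature.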

P.S. You might think: Hey, I’m reading this, but hundreds of other statistics Ph.D. students are reading this at the same time. What if all of them work on this one project? Then do I need to worry about getting “scooped”? The answer is, No, you don’t need to worry! First, hundreds of Ph.D. students might read this post, but only a few will pick this topic. Second, there’s a lot to do here! My first pass above is based on the normal distribution, but you could consider other distributions; you could look not just at the distribution of underlying parameter values but also at the distribution of estimates; you could embed the whole problem in a time series structure; you could look at varying treatment effects; there’s the whole issue of how to model interactions; there’s an entirely different approach based on hard bounds; all sorts of directions to go. And that’s not meant to intimidate you. No need to go in all these directions at once; rather, any of these directions will give you a great thesis project. And it will be different from everyone else’s on the topic. So get going, already! This stuff’s important, and we can use your analytical skills.

29 thoughts on “Important statistical theory research project! Perfect for the stat grad students (or ambitious undergrads) out there.”

  1. Consider the following data generating process:

    Treatments A1,…,Am ~ Bernoulli(.5) iid
    Unobserved U1,…,Um ~ Uniform(0,1) iid
    Y ~ Bernoulli(.5)
    If Ui < p, Y=1, and Ai=1, then the counterfactual Y(Ai=0) = 0
    If Ui < p, Y=0, and Ai=0, then the counterfactual Y(Ai=1) = 1

    Under this data generating process, Ai can have quite a large effect on Y for each i (as p can be arbitrarily close to 1). Further, each effect could be accurately estimated by a randomized experiment that manipulated the corresponding treatment variable alone. (These effects would not be detectable from observational data, as Y is independent of all the A variables.)

    This data generating process seems to be a counterexample to your impossibility conjecture. I realize that this is not a realistic representation of how data are generated in the world, but I think this example means that your conjecture needs to be modified to condition on certain properties of the data generating process.
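
    A quick simulation sketch of this process (p = 0.9 and m = 5 are chosen arbitrarily; the single-treatment intervention is implemented directly from the two counterfactual rules above):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, p = 200_000, 5, 0.9                      # p near 1 makes each per-treatment effect large

    A = rng.integers(0, 2, size=(n, m))            # treatments
    U = rng.random(size=(n, m))                    # unobserved
    Y = rng.integers(0, 2, size=n)                 # outcome, independent of all the A's

    def y_do(i, a):
        # Counterfactual outcome when treatment i is forced to a, per the two rules above
        y = Y.copy()
        if a == 0:
            y[(U[:, i] < p) & (Y == 1) & (A[:, i] == 1)] = 0
        else:
            y[(U[:, i] < p) & (Y == 0) & (A[:, i] == 0)] = 1
        return y

    print(Y[A[:, 0] == 1].mean() - Y[A[:, 0] == 0].mean())   # observational "effect": ~0
    print(y_do(0, 1).mean() - y_do(0, 0).mean())             # causal effect of A1: ~p/2 = 0.45

    So each of the m treatments carries an average causal effect of about p/2 while Y stays observationally independent of all of them, which is the sense in which this looks like a counterexample.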

    • Z,

      I don’t know what “p” is in your model. But, speaking generally, yes, some bounds would be needed. This would not preclude a theoretical result. The idea would be that, for any fixed bounds on how extreme the distribution of predicted probabilities could be, in the limit as the number of possible causal factors increases, eventually there will be a problem. The only way to not have a problem would be for the effect sizes to diminish fast enough as the number of possible effects increases.
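
      One way to make that rate concrete, under the purely illustrative assumption of an additive logit model: write the combined predictor as eta = beta_1*x_1 + … + beta_n*x_n with independent x_i = ±1 (so eta has mean zero) and |beta_i| ≥ delta for every factor. Then sd(eta) = sqrt(beta_1^2 + … + beta_n^2) ≥ delta*sqrt(n), and by a central-limit argument eta routinely wanders a standard deviation or two from zero. So requiring the predicted probabilities invlogit(eta) to stay inside any fixed band (a, b) with high probability forces delta*sqrt(n) to stay below roughly logit(b), i.e. the minimum effect size has to shrink at least like 1/sqrt(n).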

  2. (I suppose in the example above you might argue that an experiment couldn’t actually reliably detect the effect of Ai because it would be too difficult to intervene on Ai without intervening on the other variables? For this part of your argument I’m not sure I see how theory comes into play as it’s more a substantive assertion about the limits of experimental precision. Maybe the theory could quantify the lack of generalizability as a function of the degree of experimental imprecision?)

  3. Just to be clear, by the “Piranha Principle” you mean the claim that there cannot be many large and predictable effects on human behavior? (I get that formulation from the other post you link to.) I agree this is an important observation, but — as you also suggest — it seems to me to generalize far beyond the context of human behavior and to not have anything in particular to do with statistics. Here’s a general heuristic argument: If y = f(x1, x2, …., xn), n is a large number, and y is known to lie in some (relatively narrow) range [a, b], then in order for a < y < b to be true, either (1) most of the xi have to have a small effect on y, or (2) if many of the xi have a large effect, most of these effects have to cancel each other out at any given time. This argument seems to me to make sense whether or not the xi are assumed to have probability distributions associated with them. But I don't see how one could make this argument rigorous and still keep it general—won't the particular way in which the Piranha Principle manifests itself depend very specifically on the particular nature of the function f? Do you think a rigorous and general Piranha Principle can be derived from a statistical point of view? Wouldn't such a result probably have to make very strong assumptions? Or are you envisioning a number of more specific Piranha Principles, each for a different type of problem?

    • Yes, strong assumptions. All statistics papers make very strong and sometimes peculiar assumptions to achieve a result, but with the benefit of being clearly stated. In the extreme case, I have seen “if assumptions 1, 2, 3… hold, then it follows… so on and so forth”. A good example is the classical sparsity literature, where there are strong assumptions under which a method is consistent for variable selection. So no problem with assumptions, in my opinion, as they are very useful for thinking about things.

    • Olav, what you’re talking about, at least in the case where x_i are random variables, is already a well established field of research in mathematics called “concentration of measure”

      https://en.wikipedia.org/wiki/Concentration_of_measure

      Basically what you need is independence and a Lipschitz function f. Lipschitz means its partial derivatives with respect to each variable are never bigger than some constant, so in other words each effect, if not “small,” is at least never “arbitrarily large.” You also need not-too-dramatic differences in the magnitude of each effect: for example, if x1 can change the function over the whole range [a, b] but no combination of the other variables could ever change the function by more than 1/100 of that amount, the results obviously wouldn’t apply.

      So long as all the effects have contributions of a similar order of magnitude, the partials are bounded by some constant, and *all the dimensions are independent*, then you will find that f(x1,x2,…,xn) never varies much, and there are already provable bounds on these things based on existing concentration of measure results. I imagine a review paper or monograph on concentration of measure would be a useful thing to investigate for this project.
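
      A tiny numerical illustration of the bounded-differences setting (all numbers arbitrary): take f to be an average of n independent inputs in [0, 1], so that changing any single input can move f by at most 1/n.

      import numpy as np

      rng = np.random.default_rng(2)
      for n in (10, 100, 1_000):
          x = rng.random(size=(5_000, n))     # 5,000 independent draws of (x1,...,xn), each in [0,1]
          f = x.mean(axis=1)                  # bounded differences: any one xi moves f by at most 1/n
          print(n, round(f.std(), 4))         # spread shrinks like 1/sqrt(12*n), as the bounds predict

      As n grows, the realized values of f pile up tighter and tighter around the mean, which is the concentration-of-measure picture under independence.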

      • To follow up though, the idea that each of the x_i are random and independent *in real applied problems* is the most problematic one. In real applied problems the first realistic assumption is probably that everything is correlated with everything else in some way. The next step is to maybe get bounds on how much dependence there is between the variables etc.

        Andrew: a particularly interesting kind of issue would be to look first at a pure mathematical “concentration of measure” type result, showing that f(x1,x2,…,xn) is basically constant, and then… investigate how correlations and interactions and dependencies between variables would break down that result (or not). Doing this in an applied setting, such as an economic case where feedback mechanisms apply, could be very interesting. Obviously situations like the Great Depression or the 2008 financial crisis *do* occur, and they represent rapid large deviations compared to what you’d expect from a function of millions of *independent* decisions made by people rolling dice. So in reality people don’t act like independent dice-rollers, and the question is how much inter-dependence and feedback you have to induce to get things like housing crises and so forth.

      • Daniel, I agree the concentration of measure idea is relevant, but it runs in the opposite direction, doesn’t it? Concentration of measure says that if the independent variables and f satisfy certain constraints, then the dependent variable will be (narrowly) bounded. But the Piranha principle is supposed to say (if I understand it correctly) that if the dependent variable is (narrowly) bounded and f satisfies certain constraints, then most of the independent variables cannot have a large effect unless they interfere with each other in such a way that they mostly cancel each other out. So it appears the concentration of measure idea and the Piranha principle are (almost) converses of each other.

        • I don’t see it that way: provided you have no control over them, you *can* have a lot of things which independently and randomly control f while f stays mainly constant.

          the piranha principle basically just adds “and if there are a lot of things, they *do* have to be independent, random, and out of our control BECAUSE f does stay near constant”

        • Or perhaps the piranha principle also says df/dx_i needs to be smallish for all the factors that can be controlled, because we don’t see f swing wildly even when we attempt to control people/things

        • But you cannot infer the second of your claims from the first. The first claim says, approximately, that A (many independent and random factors) is sufficient for B (f’s staying mainly constant), while the second claim says that A is necessary for B. These two claims are converses.

        • A is sufficient for B

          B

          therefore either A or some other thing sufficient for B

          piranha principle basically says either A (lots of things are random and out of our control) or some other thing sufficient for B (f stays near constant)

          so yes you can say “not A” (ie. not many random things) and then try to identify what is going on instead.

          I don’t think piranha principle necessarily includes the “not A” it’s more just “either A or something else that makes f constant”

    • Basically, create a simple simulated system that illustrates how hard it is to simultaneously have “many ways that you can easily influence an outcome” (lots of “large consistent main effects”) AND to still have observed stability in many aspects of everyday life (like, you don’t see people between breakfast and lunch altering their political ideas from party-line communism to middle of the road semi-conservative and then by dinner they’re NAZIs and the next morning they wake up market socialists).

      How can it be that there are hundreds of “reliable” ways to “noticeably” affect thing X by doing any of many small things Y and yet X rarely changes that much overall from moment to moment?

      So long as all the things are *random and independent* this is known as concentration of measure, but if these things are *under the control of others* then you should be able to push a few buttons and make everyone run out and spend their life savings on dry pinto beans or whatever.
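
      A toy version of that contrast, with everything made up purely to show the two regimes: the outcome is an average of 1,000 inputs, first left random and independent, then with 50 of them deliberately pushed by someone at the controls.

      import numpy as np

      rng = np.random.default_rng(3)
      n, k = 1_000, 50
      x = rng.random(size=(5_000, n))                        # regime 1: everything random and independent
      print("natural spread of the outcome:", round(x.mean(axis=1).std(), 4))   # ~0.009

      pushed = x.copy()
      pushed[:, :k] = 1.0                                    # regime 2: 50 "buttons" pushed all the way up
      print("average shift from pushing 50 buttons:",
            round((pushed.mean(axis=1) - x.mean(axis=1)).mean(), 4))            # ~0.025

      Under independence the outcome barely moves, but a modest number of controllable inputs with non-vanishing effects is enough to swing it by several times its natural variation, which is the pinto-beans scenario.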

      • Good, clean summary.

        Suggested research topic:

        1. Estimate how often voters change party identification (or change which party they vote for, or whatever)
        2. Using this estimate, establish an upper limit on the power of all priming factors that affect party identification.

        Be careful that priming factors have at least two dimensions (frequency and power). You will end up with results something like “we estimate that priming factors powerful enough to change party identification in 10% of subjects happen no more than once every 18.9 years.” Estimates of priming power could probably be taken from the literature.
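
        A back-of-the-envelope version of that calculation, with both inputs below invented just to show the shape of the result (the real version would take the switch rate from panel data and the flip probability from the priming literature):

        # Hypothetical inputs, for illustration only:
        observed_switch_rate = 0.005   # assume ~0.5% of voters change party ID per year (made-up number)
        claimed_flip_prob = 0.10       # assume a typical prime is claimed to flip party ID in 10% of subjects

        # If such primes arrive f times per year, expected switches per year are roughly f * 0.10,
        # so the observed switch rate caps how often they can occur:
        max_primes_per_year = observed_switch_rate / claimed_flip_prob
        print("at most one such prime every", 1 / max_primes_per_year, "years")   # -> 20.0

        With these made-up inputs the answer is one such prime every 20 years, which is the flavor of the “18.9 years” statement above.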

        • Could further model changes in voter behavior by including other (non-priming) factors known to affect voter behavior (the economy, aging, …). Obviously, this will further reduce the estimated effectiveness of priming factors.

        • IMO when you acknowledge the frequency issue of priming factors, you are walking away from Professor Gelman’s (IMO, wrong, as stated) research idea. He says, inter alia:

          “that all sorts of seemingly trivial interventions can have large effects.” (key word: interventions)

          “else it would be possible to push people’s opinions and behavior to ridiculously implausible lengths just by applying several stimuli in sequence”

          Shark attacks could (I suppose, but very doubtful – there are non-piranha criticisms of this research) have had a large effect on elections. And, if staged at the same rare frequency, maybe they still do – if we stage them rarely and randomly enough, they won’t often enough be confounded by something else (e.g. bear attacks). But that’s not at all the same as saying that we can stage them as we wish (e.g. specifically after staged bear attacks) and expect to see the effect. The stage-them-as-you-wish (or ‘reliable intervention’) interpretation of such results rules out frequency (among other things) as a relevant factor.

          My opinion having followed a number of the citations here: very little of this research claims – even implicitly – a ‘reliable’ intervention in remotely the very strong sense Gelman needs for his complaint. Not only isn’t it claimed, most of the time, but the research can (not always will, of course) remain true and interesting regardless.

        • Good comments.

          There seems to be a lot going on here, so let me just think out loud.

          Yes, what I suggested is not exactly what Andrew suggested. I am focused on showing that priming cannot be very strong or very common because if it were, we would see much more volatility in things like changes in party affiliation. Others have made this point above as well.

          Andrew acknowledges this in passing (“these large and consistent effects can’t all, or even mostly, be happening”) and goes on to say the effects must interact (“the problem is that they would interfere with each other”). I don’t get this. The logic seems to be that, SINCE we don’t see large net effects, THEN the effects must be cancelling because of interaction. (Others talking about concentration of measure above seem to think that increasing the number of interventions does not increase the net effect. I don’t buy that either.) (Andrew then goes on to make the point that IF there were all these interactions, THEN studies of interventions are invalid because of these interactions. I think this is a dead end.)

          The flaw in this thinking is that there is no reason to think there is a limit to the net effect of all these interventions EVEN IF THE INTERVENTIONS SOMETIMES CANCEL EACH OTHER. If interventions are i.i.d., variance in the dependent variable grows with the number of interventions even though there is some cancellation. In such a system, the dependent variable is a random walk, and there is no limit on the net effect as the number of interventions increases: more interventions = more variability in the dependent variable. The only way this would NOT be true is if the system had a negative feedback that made the system mean-reverting, and there is no reason to believe that.

          I think Brownian motion is a plausible model here. The more frequent the interventions and the more powerful the interventions, the larger the volatility of the dependent variable. (In a Brownian motion model, a particle’s net movement is positively related to the number of collisions and the force imparted by the collisions.)
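
          A minimal sketch of that random-walk point (i.i.d. intervention effects, no mean reversion; the ±0.3 effect size and the counts are arbitrary):

          import numpy as np

          rng = np.random.default_rng(4)
          for n_interventions in (10, 100, 1_000):
              shocks = rng.choice([-0.3, 0.3], size=(5_000, n_interventions))   # i.i.d. effects that can cancel
              net = shocks.sum(axis=1)                                          # cumulative effect per person
              print(n_interventions, round(net.std(), 2))                       # grows like 0.3 * sqrt(n)

          Cancellation slows the growth but does not cap it: the spread of the net effect still increases like the square root of the number of interventions, exactly the Brownian-motion behavior described above.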

          The model I proposed above was a simple way to get at this: assume people experience an intervention every x years and there is a y% chance that an intervention will make that person change party affiliation. Interventions have i.i.d. effects. What can we say then about x and y based on the observed frequency of changes in party affiliation?

          P.S. Andrew may think that this point is too obvious to support a paper (in passing he says “these large and consistent effects can’t all, or even mostly, be happening”). I don’t think so. It seems like a devastating critique of the intervention literature. But, if everyone already knows this, why are priming papers still being published?

        • > I am focused on showing that priming cannot be very strong or very common because …

          Makes sense, and when you include ‘or very common’ I have to think you do appreciate my concerns. IMO, the thesis you are after is probably formalizable, and provable. It’s just not the professor’s.

          When you or your student does this, it’s not going to cast the slightest doubt on the shark or college football study. (I suppose I need to emphasize that while there are good criticisms of these studies, the piranha criticism per se – even after your student proves it in your sense – won’t work _at all_).

          Moving to priming studies. Not in the area, but the evidence seems to be adding up to: nonsense. Still, it’s conceivable that there are countless very strong priming effects relative to a ceteris paribus world, but if applied together they could not possibly achieve the potential synergy – yet that wouldn’t say the individual results were wrong. Or even uninteresting.

        • Terry, start with your Brownian motion type idea. Now take the low observed volatility of the outcome: what does this imply about the size of the individual effects? If they’re uncorrelated, and there are a lot of them, then they must be small, or, as you say, the volatility would be large.

          If on the other hand you identify at least a few large effects, then there must be only a few, or they must be dependent and anti-correlated.

        • Exactly.

          Above, I said “… priming factors have at least two dimensions (frequency and power). Will end up with results something like ‘we estimate that priming factors powerful enough to change party identification in 10% of subjects happen no more than once every 18.9 years.’”

  4. I’m still really not getting this, at all.

    I can see that a research finding touted as “if _you_ do X, then Y” (perhaps the power pose quote is an example) can imply an implausibly robust response to a given intervention; i.e. vulnerability to the piranha problem.

    But how much research claims this, even implicitly? I’ve just re-skimmed the college football paper. Whatever its other defects, I can’t see any honest reading under which they suggest that football success is an independently manipulable random variable, or even that it’s particularly interesting in itself. (Rather the idea seems to be: because of the inherent randomization and clean measurement, local sports team surprises can be an effective instrument for shocks to ‘well-being’/mood.) If they are setting themselves up for any piranhas, it’s their cautiously phrased suggestion of a (‘fragile’, they say) effect that ‘well-being’ is an interesting predictor that can influence voting behavior.

    • bxg,
      OK, let’s say that if the local college’s football team wins, the people in the area vote differently than if it loses, and that this effect is big enough to be robust, i.e. it stands out above the noise caused by all the other effects. No problem, we’ll give you that one. Could be true.

      But now we hypothesize that whether the weather is good or bad also has a robust effect on voter preferences. That could be true too.

      Now we hypothesize that whether there has been a recent mass shooting in your state has a big effect. Sure, makes sense, right?

      Now we hypothesize that whether your local economy has grown stronger or weaker over the past year has a big, robust effect. How could it not?

      Each of these individually could plausibly be a large effect, but they can’t _all_ be large effects: if they were, then you wouldn’t see the college football effect in the data, because you’d be comparing local-team-won voters to local-team-lost voters, but some of those local-team-won voters will be voting in good weather in an improving local economy in a state with a recent mass shooting, while others will be voting in bad weather in a worsening local economy, and so on. It’s impossible to have a bunch of signals that are all big enough to stand above the noise, because each of those signals is noise when it comes to looking for the other signals.
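
      To put rough numbers on that last point (a sketch only; the 20 factors, the 2-point effect, and the sample size are all invented): if k binary factors each shift the outcome by d, then from the point of view of any one of them the other k−1 act as noise with standard deviation of about d*sqrt(k−1)/2.

      import numpy as np

      rng = np.random.default_rng(5)
      k, d, n = 20, 2.0, 500                          # 20 "big" factors, each worth 2 points, 500 voters
      x = rng.integers(0, 2, size=(n, k))             # football result, weather, shooting, economy, ...
      y = x @ np.full(k, d) + rng.normal(0, 1, n)     # outcome = all the effects plus a little extra noise

      won = x[:, 0] == 1                              # compare local-team-won vs local-team-lost voters
      print(y[won].mean() - y[~won].mean())           # the football "signal": about d = 2
      print(y[won].std())                             # spread within each group: about 4.5, dominated
                                                      # by the other 19 "big" effects acting as noise

      Each effect is real in this fake world, but every one of them is modest relative to the combined variation generated by the rest, which is the sense in which no single factor can tower above the background once there are many of them.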

  5. Does the observed low volatility of human behavior put limits on how large priming effects can be?

    If there are hundreds of powerful priming factors, shouldn’t the induced behaviors (such as voting patterns, speed of walking, feelings of power, etc.) be hugely volatile? Shouldn’t humans heave like malfunctioning, flailing robots? When they pass a red-colored poster, they do this, and when they see a smiley-face, they do that.

    Is Brownian motion relevant? The volatility of a particle tells us something about the size of the particles bombarding it. The fact that people change little from day to day (ignoring transient moods due to other factors) suggests that priming is little more than an occasional Nerf ball.

    This sounds like it is related to the “concentration of measure” comments above.

  6. I work as a statistician in advertising and this topic is particularly relevant to the effect of advertising. The typical ad campaign has effect sizes in the 1% range for the average treatment effect on the treated (ATT)*. The surprising finding is that either advertising is effective or the models are spuriously detecting effects. However, corporations continue to spend billions on advertising, which says that they’re convinced of the effect (unlikely from my little analysis corner and more likely from correlations between their profit and ad expenditures).

    One way that I think of this problem is similar to this coin game they used to have at arcades. The game is simple. There’s a moving platform with coins piled up on it and some of the coins are nearly falling off. Your objective is to drop a coin from above to see how many coins you can knock off the platform. Typical effects we see in the wild are due to people who are already on the edge and this little nudge has pushed them over.

    It’s not really that priming is influencing everyone. It’s that priming effects are mostly seen in those highly susceptible to the priming. I suspect that many studies you’re referencing are done in atmospheres quite different from reality, which highly inflates effect sizes.

    * Interest in the overall average treatment effect (ATE) has waned because 1) it’s lower, our clients don’t like that, and the difference isn’t generally understood by the layman – meaning competitors can capitalize on it; and 2) targeted advertising by way of activation has made it more plausible that the advertiser is actually targeting their intended treatment group.

  7. “It’s not really that priming is influencing everyone. It’s that priming effects are mostly seen in those highly susceptible to the priming. I suspect that many studies you’re referencing are done in atmospheres quite different from reality, which highly inflates effect sizes.”

    This reminded me of, I think it was, Kahneman, who in his book wrote something about telling non-academics about priming effects and how they all started laughing.

    I sometimes think that academia, and psychology in particular, has become such a c#rcle j%rk that they don’t even think about what they are doing. It’s like the emperor-has-no-clothes story.

  8. ‘On one hand, you can’t have dozens of large and consistent main effects or else it would be possible to push people’s opinions and behavior to ridiculously implausible lengths just by applying several stimuli in sequence’

    This is not obviously correct. It may be, but it presupposes there is nothing that counteracts these ridiculously implausible additions, and the people who make the individual effects claims are silent on that.
    By way of example, it is possible to consume lots of substances which act as stimulants and sedatives respectively, whose effects can be reliably demonstrated in lab experiments, and yet people rarely end up being awake for 14 days or falling asleep on the spot. The same applies to the laxative (or its opposite) properties of foods.
    In the case of priming effects, the simplest such limiting factors I can think of (i.e. without any claim to exhaustiveness or correctness, just in terms of counterarguments) would be (a) that only the last prime counts (e.g. because the other primes just generate readiness to act, which evaporates if there is no action it can be applied to), or (b) that primes decay quickly, so it’s again the last prime and a little of the second-to-last and a very small bit of the third-to-last prime that influence behaviour.

  9. Could someone outline the approximate background knowledge required to attempt to answer this question? I think this would be interesting to attempt as an end goal of some intensive study of statistics.
