That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy

Kyle Peyton writes:

I’m passing you this recent news article by Ewen Callaway in the hope that you will make a comment about the methodology on your blog. It’s generated some back and forth between the economics and science communities.

I [Peyton] am very sceptical of the reductive approach taken by the economics profession generally, and the normative implications this kind of research generates. For example, p. 7 of the working paper states: “…[according to our model] decreasing the diversity of the most diverse country in the sample (Ethiopia) by 1 percentage point would raise its income per capita by 21 percent”. Understandably, this piece is couched in assumptions that would take hours to pick apart, but their discussion of the approach belies the uncertainty involved. The main response by the authors in defense is that genetic diversity is a ‘proxy variable’. This is a common assertion, but I find it really infuriating. I happen to drink coffee most days, which correlates with my happiness. So coffee consumption is a ‘proxy’ for my happiness. Therefore, I can put it in a regression and predict the relationship between my happiness and the number of times I go to the bathroom. Ergo universal conclusions: ‘relieving yourself improves mental well being.’ New policy – you should relieve yourself at least 2 times per day in order to maintain high levels of emotional well being. I know this sounds like a South Park episode, but I have heard far worse.

But let’s put the normative implication aside — what can we learn from star gazing at the tables in this paper?

Here’s the background. Two economics professors, Quamrul Ashraf and Oded Galor, wrote a paper, “The Out of Africa Hypothesis, Human Genetic Diversity, and Comparative Economic Development,” that is scheduled to appear in the American Economic Review. As Peyton has indicated, the paper is pretty silly and I’m surprised it was accepted in such a top journal. Economists can be credulous but I’d expect better from them when considering economic development, which is one of their central topics. Ashraf and Galor have, however, been somewhat lucky in their enemies, in that they’ve been attacked by a bunch of anthropologists who have criticized them on political as well as scientific grounds. This gives the pair of economists the scientific and even moral high ground, in that they can feel that, unlike their antagonists, they are the true scholars, the ones pursuing truth wherever it leads them, letting the chips fall where they may.

The real issue for me is that the chips aren’t quite falling the way Ashraf and Galor think they are. Let’s start with the claims on page 7 of their paper:

Once institutional, cultural, and geographical factors are accounted for, [the fitted regression] indicates that: (i) increasing the diversity of the most homogenous country in the sample (Bolivia) by 1 percentage point would raise its income per capita in the year 2000 CE by 41 percent, (ii) decreasing the diversity of the most diverse country in the sample (Ethiopia) by 1 percentage point would raise its income per capita by 21 percent.

I think “CE” is academic jargon for what we call “A.D.” in English (or Latin, whatever), and strictly speaking the above bit is not a claim at all, it’s just an interpretation of their regression coefficients. But it clearly is a claim, in that the authors want us to take these examples seriously.
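To see how a statement like that falls out of regression coefficients, here is a minimal sketch. It assumes a log-linear fit with a quadratic term in diversity, ln(income) = a + b1*d + b2*d^2, which is one way to get a hump shape. The coefficients and diversity values are invented for illustration; they are not the paper's estimates.

```python
import math

def pct_change_in_income(d, delta, b1, b2):
    """Percent change in income implied by ln(y) = a + b1*d + b2*d**2
    when diversity moves from d to d + delta (the intercept cancels)."""
    log_diff = b1 * delta + b2 * ((d + delta) ** 2 - d ** 2)
    return 100.0 * math.expm1(log_diff)

# Hypothetical coefficients (NOT from the paper); b2 < 0 gives a hump.
B1, B2 = 200.0, -140.0
peak = -B1 / (2 * B2)  # diversity level at which fitted income is highest

# Hypothetical countries on either side of the peak.
low_side = pct_change_in_income(0.60, +0.01, B1, B2)   # raise diversity by 1 point
high_side = pct_change_in_income(0.78, -0.01, B1, B2)  # lower diversity by 1 point

print(f"peak at d = {peak:.3f}")
print(f"below the peak, +1 point: {low_side:+.1f}% income")
print(f"above the peak, -1 point: {high_side:+.1f}% income")
```

Both numbers come out positive simply because each hypothetical country is moved toward the peak of the fitted curve. The arithmetic says nothing about how the move would be brought about.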

So let’s take them seriously. What would it mean to increase Bolivia’s diversity by 1 percentage point? I assume that would mean adding some white people to the country. What kind of white person would go to Bolivia? Probably someone rich enough to increase the country’s income per capita. Hey—it works! What if some poor people from Ethiopia were taken to Bolivia? They’d increase the country’s ethnic diversity too, but I don’t see them increasing its per-capita income by 41 percent. But that’s ok, nobody’s suggesting filling Bolivia with poor Africans.

OK, what about Ethiopia? How do you make it less diverse? I guess you’d have to break it up into a bunch of little countries, each of which is ethnically pure. Is that possible? I don’t actually know. If you can’t do that, you’d need to throw in lots of people with less genetic diversity. Maybe, hmmm, I dunno, a bunch of whites or Asians? What sort of whites or Asians might go to Ethiopia? Not the poorest ones, certainly: why would they want to go to a poor country in the first place? Maybe some middle-income or rich ones (if the country could be safe enough, or if there’s a sense there’s money to be made). And, there you go, per-capita income goes up again.

So I don’t see it. It’s true that later on page 7 the authors try to wriggle out of this one:

Reassuringly, the highly significant and stable hump-shaped effect of genetic diversity on income per capita in the year 2000 CE is not an artifact of postcolonial migrations towards prosperous countries and the concomitant increase in ethnic diversity in these economies. The hump-shaped effect of genetic diversity remains highly significant and the optimal diversity estimate remains virtually intact if the regression sample is restricted to (i) non-OECD economies (i.e., economies that were less attractive to migrants), (ii) non-Neo-European countries (i.e., excluding the U.S., Canada, Australia, and New Zealand), (iii) non-Latin American countries, (iv) non-Sub-Saharan African countries, and, perhaps most importantly, (v) countries whose indigenous population is larger than 97 percent of the entire population (i.e., under conditions that virtually eliminate the role of migration in contributing to diversity).

I don’t buy it. I’m not saying their central point is wrong—it’s basically a twist on the classic “why are some countries so poor” question—but the extrapolations that they give themselves reveal the problems with their interpretation of the regression model. The way you make Bolivia more diverse is by adding more white people. It’s fine to study these things but you have to think about what your models mean.

Everybody wants to be Jared Diamond, that’s the problem.

OK, if all this is the case, what went wrong, and how could Ashraf and Galor have done better? I think the way to go is to start with the big pattern they noticed: the most genetically diverse countries (according to their measure) are in east Africa, and they’re poor. The least genetically diverse countries are remote undeveloped places like Bolivia and are pretty poor. Industrialized countries are not so remote (thus they have some diversity) but they’re not filled with east Africans (thus they’re not extremely genetically diverse). From there, you can look at various subsets of the data and perform various side analyses, as the authors indeed do for much of their paper.

The problem is closely related to their paper appearing in a top journal. The way I see this work, the authors have an interesting idea and want to explore it. But exploration won’t get you published in the American Economic Review. Instead of the explore-and-study paradigm, Ashraf and Galor are going with assert-and-defend. They make a very strong claim and keep banging on it, defending their claim with a bunch of analyses to demonstrate its robustness. I have no problem with robustness studies (recall that I was upset about some claims about age and happiness because I had difficulty replicating them with new data), but I don’t think this lets you off the hook of having to think carefully about causal claims. And presenting tables of numbers to three (meaningless) decimal places doesn’t help either.

High-profile social science research aims for proof, not for understanding—and that’s a problem. The incentives favor bold thinking and innovative analysis, and that part is great. But the incentives also favor silly causal claims. In many social sciences, it’s not enough to notice an interesting pattern and explore it (as we did in our Red State Blue State book). Instead, you’re supposed to make a strong causal claim even in a context where it makes little sense.

92 thoughts on “That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy”

  1. I suppose Ethiopia would also become less diverse if parts of it were wiped out by a famine or something, in which case also one might expect income per capita to go up for Malthusian reasons?

  2. Andrew,
    I happen to agree with you that this paper (and most of growth theory) is nonsense, but I don’t think your criticisms are really fair in this case.
    The interpretation of the regression that you quote is not meant to be an actionable “raise the diversity today and gdp will go up X%”- it’s a historical model that refers to diversity as it contributes to innovation (up to a certain point) or stagnation (beyond that). Nor do they say that diversity should mean adding ‘white people’, and not adding Bolivians to Ethiopia and Ethiopians to Kenya.

    I think a much more salient criticism is that it doesn’t add much to practicable knowledge, given that diversity cannot be enforced, and that it seems to be an unfalsifiable theory because of all the covariates involved in growth. Essentially it is a ‘just-so’ story, and if it neither contributes to actionable knowledge (ostensibly why we study growth) nor furthers scientific understanding of the phenomenon, what is the point? That is a criticism that applies to Galor’s larger research program, “unified growth theory”, which, while admirably ambitious, seems to be even more flawed. You can read up on it yourself, but the idea is essentially that humanity has been in phases of Malthusian trap and exponential growth, and they try to generate a model that predicts two phases as such. To me, it seems like this is akin to using your entire dataset without a hold-out sample; trying to claim that your theory is predictive or illuminating and not a just-so story or consequence of overfitting seems like a stretch.

  3. Ashraf and Galor wrote a response to the anthropologists’ claims –

    The anthropologists have written a longer response than the one linked to above, and it has just been published in Current Anthropology –

    There was also some interesting debate between the critique authors and some other anthropologists below a post I wrote on this paper a couple of months ago –

    • I believe you were incorrect in your interpretation of how they measured diversity.

      Yes, they used migratory distance to derive “native” diversity indices for individual ethnic groups. This had to be done because they only had direct diversity data for a limited number of groups. Whatever data they did have correlated well with migratory distances.

      But then they calculated overall diversity of each country from the data for individual ethnic groups and estimated genetic distances between them (appendix B).

      The kicker is that, when they say “diversity”, they really mean “heterozygosity”. Which is very different from what you and I mean by “diversity”.

      In the extreme idealized example, imagine a country comprised of two perfectly homogeneous ethnic groups, 50% of Bolivian Native Americans and 50% of whites. You and I would describe this country as pretty diverse. But according to their definition, each ethnic group has heterozygosity of 0, and therefore the diversity index of the country as a whole is still 0 regardless of genetic distance between them.

      To use more realistic numbers, let’s assume that Bolivia is 85% Indian / 15% white. Using data from figure 1, whites are, say, 0.72, natives are 0.55, genetic distance is, say, 0.1, and overall heterozygosity is (0.72*0.15+0.55*0.85)/(1-0.1) = 0.64, which is more “homogeneous” than any country in the world and comparable to heterozygosity among, say, Aztecs circa 1500.
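The back-of-envelope arithmetic in this comment can be checked with a few lines. This is only a sketch of the commenter's formula (share-weighted heterozygosity divided by one minus genetic distance), with the commenter's "say" values as inputs. The function name is made up here, and nothing confirms that the paper computes its index exactly this way.

```python
def pooled_heterozygosity(shares, within, distance):
    """Share-weighted within-group heterozygosity, scaled by 1 / (1 - distance).

    Mirrors the commenter's formula; not taken from the paper's appendix.
    """
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    weighted = sum(s * h for s, h in zip(shares, within))
    return weighted / (1.0 - distance)

# Commenter's illustrative inputs: 15% white (0.72), 85% native (0.55), distance 0.1.
bolivia = pooled_heterozygosity(shares=[0.15, 0.85], within=[0.72, 0.55], distance=0.1)
print(round(bolivia, 2))  # 0.64
```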

      • After thinking this through some more, my first example is bad. Genetic distance between two perfectly homogeneous but distinct ethnic groups would be infinite (by construction), so that’s not the real problem.

        The real problem is that the “heterozygosity” metric answers the question: “how diverse would this country be if genes of its residents were thoroughly mixed together by intermarrying across ethnic group boundaries for a few centuries, and there were no more Indians, whites, etc, but just one uniform population”?

        If you thoroughly mix 85% of very innately homogeneous Indians with 15% of whites, you can end up with a slightly whiter but still Indian-looking and very homogeneous population. Which is obviously not the state Bolivia is in today.

        • I think what Ashraf and Galor are trying to do is model the factors contributing to economic development as of, roughly, 1492 — before Columbus started mixing everything up. So, they are just going to ignore that modern Bolivia is a fairly mestizo country and try to think about it before Columbus.

          In a superstylized sense, they have roughly three huge datapoints: in 1492, Eurasia was the most developed supercontinent, sub-Saharan Africa the least (assuming we ignore Australia), and parts of the Americas in between. You can correlate that to “junk” gene diversity (highest in sub-Saharan Africa, lowest in the Americas due to multiple choke points) by assuming a hump-shaped (upside-down U) curve running through their three datapoints and call it a day.

          I think the genetic diversity stuff is mostly a red herring. Keep in mind that they aren’t looking at diversity in genes that do important things like cause lactose tolerance or fight malaria or accommodate Bolivian Indians to living at high altitude. They are following population geneticists who look for genes that do as little as possible in the real world (more or less “junk genes”) because those genes aren’t selected for. They just propagate according to easily calculated statistical principles.

          You could make a decent argument about some geographic locations being better than others for purposes of cultural diffusion: getting your hands on technology invented somewhere else. For example, Bolivia was hard to get to, so the Incas apparently didn’t have the wheelbarrow, a device that they would have found highly useful in their impressive construction projects. In general, New World Indians of MesoAmerica and the Andes tended to have more variability than other peoples in degree of technological advancement. They tended to be highly sophisticated in some ways but not in others. Because they were so cut off from the Old World, they had to invent most of their technology themselves and couldn’t get the missing pieces from the Old World, whereas, say, Chinese inventions tended to filter to Europeans over time and vice-versa.

          In contrast to Bolivia, Venice was located in a spot ideal for being exposed to new inventions from all over the Old World: e.g., Marco Polo returned home to Venice from China, and later Gutenberg’s printing press came down from Germany rapidly and made Venice the bookprinting capital of the world, which is a good thing to be, economically speaking.

        • “they are just going to ignore that modern Bolivia is a fairly mestizo country and try to think about it before Columbus.”

          No, that’s not the case. They have separate charts for 1492 and 2000, and the problem is that, in their methodology, mestizo-fication of South America isn’t reflected nearly strongly enough. Bolivia manages to stay on top, probably because it is the least mestizo country among its neighbors. (Bolivia is 59% pure Amerindian, Peru is 45% pure Amerindian, Ecuador is under 30%, Brazil and Chile are under 10%.) They all start extremely high on homogeneity scales (figure 4, page 28), but Bolivia barely moves toward European numbers in 500 years, when, for example, Colombia drops so much that it ends up below a couple of Asian countries (but still above all European countries).

          One would think that, in any sensible model, Colombia (37% Amerindian / 46% Spanish / 17% African ancestry) would be more diverse than most European countries and certainly more diverse than Finland, but that’s just not how they define “diverse”.

          “They are following population geneticists who look for genes that do as little as possible in the real world (more or less “junk genes”) because those genes aren’t selected for.”

          I think this study is looking at randomly selected genes. Yes, most of them likely would not be selected for.

        • After thinking even more, things appear even worse.

          According to the source they are using, Bolivia is 59% Amerindian, 27% Mestizo, 14% European, or (in terms of original contributions) 72% Amerindian, 27% European, 1% African.

          Their charts report “(Predicted) Genetic diversity” of Bolivia in 1500 CE (that is, pure Amerindian) at 0.41, European countries around 0.27, Bolivia in 2000 CE at 0.37. (Why can’t they just provide a spreadsheet with all the numbers?) There’s no way to get 0.37 for Bolivia if we follow their methodology literally. You get a lot less homogeneity, possibly as low as 0.30, which would put it in the middle of the pack and totally screw up the results.

          The only way I see how to get homogeneity that high is to add one extra step:

          * For 59% of Amerindians, assign diversity index of 0.41
          * For 14% of Europeans, assign index 0.27
          * For 27% of Mestizos, follow formula B.2 of the appendix and derive an index of ~0.3
          * Then do weighted average of all of the above.

          In plain English, if that’s how it was calculated, we are declaring Bolivia MORE homogeneous than it could have been, because there are still pure Amerindians and pure Whites left and there hasn’t been enough cross-ethnic intermarrying!
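The four-step guess above amounts to a share-weighted average, which is easy to check. This sketch uses only the shares and index values quoted in this comment (with 0.30 standing in for the "~0.3" Mestizo value); it illustrates the commenter's conjecture, not a confirmed description of the paper's calculation.

```python
# (population share, index value) per group, as quoted in the comment above.
groups = {
    "Amerindian": (0.59, 0.41),
    "European": (0.14, 0.27),
    "Mestizo": (0.27, 0.30),  # "~0.3" in the comment; exact value is an assumption
}

total_share = sum(share for share, _ in groups.values())
index = sum(share * value for share, value in groups.values())

print(round(total_share, 2))  # 1.0
print(round(index, 2))        # 0.36
```

The result lands close to the 0.37 the commenter reads off the chart for Bolivia in 2000 CE, which is what makes the conjecture plausible.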

        • Thanks.

          How did this paper come to get such a big splash? Why didn’t somebody in the pre-publication process object that Bolivia in 2000 was a country where genetic diversity was obviously a major factor in everyday life, so, at minimum, the authors need to rewrite their abstract to not set off readers’ Reality Check alarms?

          I guess because few people these days have thought hard about genetic diversity. All you are supposed to know about it is that Africans are the most genetically diverse.

          In contrast, the effects of genetic difference strike me as a fascinating topic, so I pay attention to the occasional news story about how, say, the capital of Bolivia, La Paz, is ethnically sorted by altitude. Poor Indian people live in the high suburbs up around 12,000 or 13,000 feet elevation because they tend to have a physiological adaptation (identified by Cynthia Beall of Case Western in 2006) that helps them endure such thin air, while the whitest people live in the expensive real estate at the very bottom of the canyon because white women have terrible pregnancy problems above about 10,000 feet.

          All this has political implications as well: Evo Morales is the first Bolivian president in a long time to look mostly Indian and he is seen as representing long-oppressed Indian interests. Sometimes, the lowlanders talk about seceding. It’s a country where politics, race, and altitude are all intertwined inextricably.

        • In the end, I think they did a fairly good job estimating country-average heterozygosities. It’s not really their fault that “country-average heterozygosity” has very little to do with what we’d normally understand as “diversity”, or that the whole distinction would be completely lost on most of their readers.

      • The index of diversity is much deeper. The authors write:
        “Nonetheless, while the existing data on genetic diversity pertain only to ethnic
        groups, data for examining comparative development are typically available at the
        country level. Moreover, many national populations today are composed of multiple
        ethnicities, some of which may not be indigenous to their current geographical
        locations. This presents two complex tasks. First, one needs to construct a measure
        of genetic diversity for national populations, based on genetic diversity data at the
        ethnic group level, accounting for diversity not only within each component group,
        but for diversity due to differences between ethnic groups as well. Second, it is
        necessary to account for the possibility that nonindigenous ethnic groups may have
        initially migrated to their current locations due to the higher economic prosperity of
        these locations.
        To tackle these difficulties, this study adopts two distinct strategies. The first
        restricts attention to development outcomes in the precolonial era when, arguably,
        regional populations were indigenous to their current geographical locations.
        Specifically, in light of the serial founder effect, the presence of multiple indigenous
        ethnicities in a given region would have had a negligible impact on the diversity
        of the regional population during this period. The second, more complex strategy
        involves the construction of an index of genetic diversity for contemporary national
        populations that accounts for the expected heterozygosity within each subnational
        group as well as the additional component of diversity at the country level that
        arises from the genetic distances between its precolonial ancestral populations. The
        examination of comparative development under this second strategy would have to
        account additionally for the potential inducement for members of distinct ethnic
        groups to relocate to relatively more lucrative geographical locations.”

  4. The funny thing is that Bolivia and Ethiopia have surprising amounts in common: they’re mostly remote highland countries, with a fair amount of mixture between indigenous peoples and Caucasians. (Just look at a picture of the late Emperor Haile Selassie.)

    To you and me, Bolivians may look pretty genetically diverse, with major contributions from Europe and the indigenes, with maybe some African in there in the lowland.

    Their critics are pretty silly, but Ashraf and Galor screwed up almost exactly the way I figured they did, only maybe even more so: instead of using diversity of junk genes in isolated pre-Columbian tribes like I assumed, they went one step farther and used a measure of migratory distance from the ancient Out of Africa event to come up with a stylized version of how much Out of Africa junk gene diversity there _would_ be if there hadn’t been any post-1492 admixture with Europeans or Africans!

    But, in the remarkably stylized model built by Ashraf and Galor, Bolivia _has_ to be the most genetically homogeneous because it’s just about the hardest place to walk to from the Olduvai Gorge, creating numerous genetic bottlenecks. You have to get out of Africa, then you have to get out of Siberia, then you have to get past the Panamanian isthmus, then you have to climb high into the Andes. I’m tired just typing all that.

    In contrast, Ethiopia has to be the most genetically diverse country because it’s close to the Olduvai Gorge. (Never mind how Abyssinians believe they are descended on one side from the son of the Queen of Sheba, who came from the Arabian Peninsula thousands of years ago.) But in the Ashraf and Galor model, somebody who is part black African and part Arab, like Haile Selassie, is less genetically diverse than somebody who is all black African because Selassie’s Caucasian Arab distant ancestors had to go through the Out of Africa bottleneck eons ago.

  5. Something others here did not note: the measure they use of economic development is also very indirect; it is (log) population density in 1500 CE! Is that a reasonable measure of economic development (in 1500)?

  6. More strange things: They have an appendix where the 53 ethnic groups used are listed. In that appendix, there are exactly ZERO ethnic groups from Bolivia! So where do they get their Bolivian data?

    • They do a regression on genetic diversity vs. migration distance from Ethiopia among the 53 ethnic groups, and then use coefficients of the regression to derive genetic diversity of everyone else.
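As a rough picture of the procedure described in this reply, here is a minimal sketch: fit a straight line of diversity on migratory distance for groups with measured diversity, then use the fitted line to predict diversity where only distance is known. The data points are invented for illustration (they are not the 53 groups in the paper's appendix), and a simple one-variable least-squares fit is an assumption about the form of the regression.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Invented sample: migratory distance (thousand km) and measured diversity.
distances = [1.0, 5.0, 10.0, 15.0, 20.0, 25.0]
diversity = [0.76, 0.74, 0.70, 0.67, 0.63, 0.60]

intercept, slope = fit_line(distances, diversity)

def predict(distance):
    """Predicted diversity for a group with no direct genetic data."""
    return intercept + slope * distance

near = predict(2.0)   # hypothetical group close to the origin point
far = predict(24.0)   # hypothetical group at the far end of the route
print(f"slope per thousand km: {slope:.4f}")
print(f"predicted diversity: near = {near:.3f}, far = {far:.3f}")
```

Under a scheme like this, a country with no sampled groups still receives a diversity value, and that value is determined entirely by its distance.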

  7. Why can’t the critics try to be a little more precise:

    “To you and me, Bolivians may look pretty genetically diverse, with major contributions from Europe and the indigenes, with maybe some African in there in the lowland.”

    Only guesswork, no knowledge at all. Historically, yes, there were black slaves in Bolivia, but not in the lowlands, not at all. They were used in the silver mines in Potosi. Well, not inside the mines, where they didn’t last; the Indians were used there. The blacks worked in the smelters. When colonial times ended around 1825, slavery was abolished, and the blacks left Potosi and settled in a few all-black villages in the Yungas (I have been to one of them; it is still all-black). See here:

      • The all-black villages are in the Yungas, which are the valleys down from the Andes in the north of La Paz department. In a very short distance the altitude goes from 4500 to 1000 metres! The all-black villages are at about 1000 m altitude. Rather hot there, in fact, but probably not as hot as Africa!

  8. Pingback: They’d rather be rigorous than right « Statistical Modeling, Causal Inference, and Social Science

  9. If you were more diverse you would know that CE, common era, is the way that most Jews refer to the calendar. BCE, before common era, CE, common era. If you don’t want your dating to coincide with a specific religious figure, that is how you do it.

      • Boris Johnson has a good take on that issue.

        “There was Christ, and if the BBC doesn’t want to date events from the birth of Christ then it should abandon the Western dating system. Perhaps it should use the Buddhist calendar, which says that it is the 2,555th year since the nirvana of Lord Buddha. Perhaps it should have a version of the old Roman calendar, and declare that this is the fourth year of the fourth consulship of Silvio Berlusconi. It could say that this year was 13,400,000 or whatever since the Big Bang, or maybe the BBC should switch to the Mayan calendar and announce that 2011 is the year 1 BC – before the catastrophe that is meant to engulf the planet.

        But if the BBC is going to continue to put MMXI at the end of its programmes – as I think it does – then it should have the intellectual honesty to admit that this figure was not plucked from nowhere. We don’t call it 2011 because it is 2011 years since the Chinese emperor Ai was succeeded by the Chinese emperor Ping (though it is); nor because it is 2011 years since Ovid wrote the Ars Amatoria. It is 2011 years since the (presumed) birth of Christ. I object to this change because it reflects a pathetic, hand-wringing, Lefty embarrassment about thousands of years of cultural dominance by the West.

        The simple fact is that the Roman empire was programmatic of most of our modern global civilisation, and the decision by Constantine in 330 AD to make Christianity the official religion was one of the most important moments in the history of that empire. That is why we have used this system for 1,500 years and more, and that is why it is accepted in China, Japan and just about anywhere you care to mention that this is the year 2011.”

    • Nice.

      One thing I’d point out though is that the criticism of their use of “migratory distance” is somewhat misdirected. Yes, genetic diversity mainly varies on the continental scale because of serial founder effects. But even within most continents, there’s negative correlation between diversity and distance (see figure 1 of the paper, they took data from Even if their assumed within-continent correlation is stronger than it is justified, that would not change the results.
      It would be more important to focus on the next gap in the reasoning: that low heterozygosity somehow leads to greater “cooperation” (and therefore, according to Ashraf/Galor, there should be more altruism and cooperation in Colombia than in Finland, and there should be more cooperation between cowboys and Indians in the Wild West than in any randomly picked African village.)

    • Found an interesting site.

      Ashraf/Galor’s claim that lower diversity leads to cooperation is supported by a claim that there’s strong correlation between genetic diversity and interpersonal trust (lower diversity, higher trust), as measured by World Values Survey.

      Here’s a set of WVS maps for the “trust index”:

      There seems to be a hump-shaped relationship, with low trust in Africa and South America and high trust in Northern Europe. Getting from here to a significant (p<0.05) correlation between diversity and trust is not easy. They had to control for over 10 different effects to do that (in other words, they had to bend the data till it gave in.) See table 9.

      Now, the next step of the logic chain is easy. The statement that interpersonal trust is strongly correlated with income per capita at p<0.01, even controlling for a large number of other factors (table D.14), is not too surprising, even though it's not obvious in which direction the causation runs.

  10. Andrew, I am not sure I fully understand your point. Their hypothesis seems interesting to you, their empirical approach seems sound to you but you disagree with the interpretation of their results, is that correct?

    If so, how would you have restated their empirical results? Omitting reference to causality and just speaking in terms of correlations?


    • Tom:

      Yes, I think they should’ve just said: “Using this measure of genetic diversity, we see an interesting nonmonotonic pattern between country-level diversity and country-level income,” and gone on to explore this without the causal language. Or, if they wanted to talk about causality (which would be fine), I think they should be specific about potential interventions. They don’t have to be actual interventions that anyone’s done, but they should be clearly defined. “Increasing the genetic diversity of Bolivia” doesn’t count as a potential intervention for me, because there are many different ways that this could be done, and I’d think these different interventions would have much different effects on Bolivia’s per-capita income.

  11. Pingback: That controversial claim that high genetic diversity, or low genetic diversity, is bad for the economy | Curious Young Statistician.

  12. I find the comments of Gelman surprisingly shallow. The reply of Ashraf and Galor to the anthropologists is applicable to Gelman as well.

    “It is unfortunate that our critics have not communicated with us directly before publicizing their viewpoint in order to clarify potential misconceptions that often arise with research that traverses different fields of intellectual inquiry. One germane example of how misconceptions can arise due to the methodological gaps between disciplines is the following. It is standard practice in economics for authors to interpret their empirical findings in the context of thought experiments. It is also common knowledge amongst economists that, depending on the context of the study, such thought experiments are not to be interpreted as policy directives. We feel that awareness of such methodological subtleties on the part of our critics may have prevented the non-scientific rhetoric in their letter”

    As was clarified by Ashraf and Galor, the statement “increasing Bolivia’s diversity to the optimum level prevalent in the U.S. would increase Bolivia’s per capita income by a factor of 5.4” is designed to quantify the effect of genetic diversity and not as a policy recommendation. The level of diversity in Bolivia is an outcome of 1000s of years of interaction between the initial level of genetic diversity and the environment, and it cannot be engineered instantaneously.

    • Evy:

      Huh? You think I should contact the author of every paper before I blog on it? I think it’s their responsibility to get things right before publishing in the American Economic Review, not my responsibility to figure out what they are trying to say.

      In response to the specific point: yes, of course I understand they are talking about a thought experiment, not a policy directive. The point is that the thought experiment is not well defined. “Increasing Bolivia’s diversity” could be done in many different ways and these would have many different effects. An unclearly-defined thought experiment is not much of a thought experiment at all. The issue is not whether the intervention could be engineered instantaneously; the problem is that the intervention is not defined.

      • Andrew, I think you are really too amicable with these authors!! Do you really, really think there is any reason at all to take that paper seriously? To me it looks like a bad joke. The data is bad (How can you get reliable data on population density in pre-Columbus America? Does it make any sense to use modern national boundaries to calculate areas for use as divisors? How much can you really conclude from a regression with only, in reality, four data points? and so on, and so on …)

        • Have you read the paper? The important part of the analysis is on the effect of genetic diversity on income per capita TODAY. It is not based on population density. Please read before posting nonsense.

        • I did not post nonsense! If the population density part is not an important part of their argument, why is it in the paper at all?

      • Andrew, have you read the acknowledgment on the first page? It appears that this paper passed the scrutiny of 5 referees, and it was presented in about 50 leading universities in the world. With all due respect, a priori, the likelihood that your comments are off the mark is higher than the likelihood that 1000s of people who heard the paper in seminars and conferences, and the referees and editors who read it very carefully as part of their editorial duties, are wrong. Humbleness perhaps, or is this too much to request from a blogger?

        Unfortunately, you have missed my point and also that of the authors. You have chosen to focus on an uninteresting element of the paper that, unfortunately, you have not properly understood. The paper suggests that events that occurred as early as 90,000 years ago account for 16% of the variation in income per capita. It suggests that there is an optimal level of diversity that reflects the trade-off between the costs and the benefits of diversity and that this optimum has shifted in the process of development from the level in Japan to that in the US, etc. etc.

        As to your minor point about the thought experiment, genetic diversity in Bolivia was determined long ago. The thought experiment is well specified. Increasing the genetic diversity of Bolivia by one percentage point 500 years ago would have the same implications for economic outcomes today regardless of the source of the change.

        • Evy:

          You write that I “have chosen to focus on an uninteresting element of the paper.” Hey—I didn’t make the authors put uninteresting things in their paper. “Uninteresting” is your word, not mine. The authors wrote of “increasing the diversity” of Bolivia. My point is that there are different ways to do this.

          And, regarding your appeal from authority: Sure, I’m just an unhumble blogger while this paper passed the scrutiny of 5 referees and was presented in 50 universities. On the other hand, mistakes do get published, even in top journals. I should know—I once published a false theorem in a top statistics journal. And none of the referees noticed it! Years later, someone pointed out to me that we had made a mistake.

          Back to the paper that passed the scrutiny of 5 referees. On the other hand, lots of people didn’t think that paper was so great. If you scroll up in this thread, you’ll see that a commenter posted a link to a rebutting paper written by 18 authors, including 9 Harvard professors. Could 9 Harvard professors all be wrong? There are only 2 authors of the paper under discussion, and they are at Williams and Brown. Williams and Brown are fine, but they’re not Harvard.

          As you can see, this argument from authority is getting us nowhere. There’s real disagreement here. I feel no more embarrassment in agreeing with 9 Harvard professors who happen to be anthropologists, than you feel in agreeing with 2 non-Harvard professors who happen to be economists. As I see it, Ashraf and Galor saw an interesting pattern and they explored it. That’s a great thing to do. And then they overstated their claims, which I don’t think is so great but it seems to be what it takes to get published in a top journal.

        • Andrew,
          I strongly disagree with your overly restrictive view
          of “thought experiments,” according to which
          any “thought experiment” that is not tied to a
          specific intervention “is not much of a thought experiment at all.”
          This restriction is one of several harmful doctrines
          that have emerged from the potential outcome paradigm,
          and has led its adherents to regrettable blunders.

          Since I am not an anthropologist nor an economist,
          I cannot judge whether “increased diversity in Bolivia”
          belongs to the class of well-defined thought experiments.
          I would side nevertheless with Evy, Ashraf and Galor in their
          effort to liberate “thought experiments”, “change” and “effects”
          from the tyranny of interventions.

          The following examples illustrate well-defined thought experiments
          that are not anchored in specific interventions.

          1. If the gravitational constant were to double,
          I would not be able to jump that high.
          2. Sex discrimination by recruiters
          (i.e., the direct effect of gender on hiring decisions)
          accounts for 30% of gender disparity in the workforce.
          3. Smoking is a major contributor to lung cancer,
          regardless of whether it is encouraged by peer
          pressure, cigarette advertisement or
          genetic disposition to smoking.

          In summary, my advice to students of causation is:
          Experimentation is merely one useful way of unveiling the
          workings of Nature, and not a very useful way of
          reasoning about it.

        • Judea:

          I don’t know why you’re talking about gravity, sex discrimination, and smoking given that we have a real example right here. To restate: my problem is not that “the diversity of Bolivia” (however it is defined by the researchers) cannot be manipulated, my problem is that there are many different ways to alter diversity of Bolivia, and these different interventions would have much different effects on the outcome of interest.

          As I discussed in one of these threads, I agree that sometimes there are multiple potential interventions that can yield similar effects on the outcome (as in your smoking example, presumably). My problem with the example under discussion here is that different ways to “increase diversity” would have much different effects on Bolivia’s income. I’m not disagreeing with your general principle, but in this particular case I don’t think you can go far by thinking that “increasing diversity” has any kind of constant effect. That in fact was my point to begin with.

        • Andrew,
          I am talking about gravity, sex discrimination, and smoking, because you made
          a general statement about ALL thought experiments. You insisted that if we do
          not have a well-defined intervention, we do not have a well-defined thought experiment,
          and any “thought experiment” that is not tied to a
          specific intervention “is not much of a thought experiment at all.”

          I brought up gravity, sex discrimination, and smoking, because in these areas
          we share scientific understanding that permits us to classify an “intervention-free”
          change as a well-defined concept, something you excluded from the realm of possibilities.
          Such common understanding may also exist among readers of Ashraf and Galor;
          I am not the one to judge.

          If you agree that NOT EVERY thought experiment needs a specific intervention to be
          “well-defined” then we are in agreement. But do you? Are you prepared to challenge
          Rubin-Holland’s ruling mantra of “No Causation without Manipulation”??

        • Judea:

          I didn’t say, “any thought experiment that is not tied to a specific intervention is not much of a thought experiment at all.”

          What I said was, “An unclearly-defined thought experiment is not much of a thought experiment at all.” I think we can come to agreement here by focusing on the word “unclearly.” For example, your smoking story seems clear enough to me. You’re talking about people smoking less, and you’re assuming (perhaps reasonably enough) that different routes to this behavior change will have essentially equal effects on cancer. In a setting like that, I personally still find it helpful to consider one or two or three potential interventions (if only to convince myself that the specific form of the intervention doesn’t really matter)—but if the form of the intervention doesn’t actually matter, then, sure, there’s no real need to specify it.

        • Andrew,
          I am glad we are converging on an agreement: For a “thought experiment”
          to be well-defined it need only be “clearly defined”, regardless of whether it
          is tied to a specific intervention, and regardless of whether the change involved
          can be brought about in multiple ways, each having a different effect.

          Copying your quote, I read:
          “… I understand they are talking about a thought experiment, not a policy directive. The point is that the thought experiment is not well defined. “Increasing Bolivia’s diversity” could be done in many different ways and these would have many different effects. An unclearly-defined thought experiment is not much of a thought experiment at all. The issue is not whether the intervention could be engineered instantaneously; the problem is that the intervention is not defined.”

          Our understanding now is that the intervention is NOT ESSENTIAL for qualifying a thought
          experiment as “well defined”. An intervention may or may not exist, it may be well-defined
          or ill-defined, it may have different modalities each having different effects; all that does not matter. A “thought experiment” has a life of its own and
          it becomes “well defined” as soon as it is “clearly defined”, with or without an intervention.
          The examples of gravity, sex discrimination and smoking illuminate three instances
          in which thought experiments, as well as “causes” and “effects”, are clearly defined, hence
          well defined, without mentioning any intervention.

          Conclusion: The mantra “No Causation without Manipulation” is a relic of a bygone

        • The answer to whether an intervention needs to be specified is not a yes/no question. It depends on whether the causal effect is a path dependent (in the physics sense) function of the causal variable.

          If the causal effect is a path dependent function of the causal variable, then an intervention needs to be specified. If the causal effect is not a path dependent function of the causal variable, then specifying an intervention is not necessary.

          The effect of genetic diversity on the economy would be a good example of the former case. The effect of gravity on jumping height would be a good example of the latter case.

        • I oversimplified a bit – it’s a question of whether the causal effect is path dependent (full stop), not necessarily path dependence with respect to the causal variable.

        • I agree with revo11 that “whether an intervention needs to be specified is not a yes/no question”. But it also seems to me that thinking more concretely about the details of an intervention (as Andrew did with the paper in question) will be productive.

        • Paul, and revo11,
          Whether an intervention needs to be specified depends on whether we want to say something about
          the effect of a specific intervention or simply convey scientific knowledge.
          The examples of 1. gravity, 2. sex discrimination, 3. effect of smoking demonstrate typical cases
          where specifying an intervention is not only unhelpful, but diversionary and stifling.
          Imagine a defendant in a sex discrimination trial insisting on knowing what kind of sex change intervention
          the plaintiff has in mind when she/he complains about the “effect of gender on hiring”.

        • Judea:

          You write:

          The examples of 1.gravity, 2. Sex discrimination, 3. effect of smoking demonstrate typical cases where specifying an intervention is not only unhelpful, but diversionary and stifling. Imagine a defendant in a sex discrimination trial insisting on knowing what kind of sex change intervention the plaintiff has in mind when she/he complains about the “effect of gender on hiring.”

          Due to my foxlike nature I respect that you do not find the potential outcome approach useful here, but let me assure you that I find it useful, indeed essential, to consider potential interventions.

          I have not ever been involved in a sex discrimination case so I will speculate. I think this is OK, given that you brought up the example.

          In this case, the potential interventions I am thinking about are not a “sex change” (as you put it) but rather different possible actions by the employer. If the employer is being sued for discrimination, I assume the claim is that things could’ve gone differently if he or she had behaved differently, or followed a different policy. Thinking this way is exactly in the potential outcome framework.

          Again, I respect that you don’t like thinking that way. But I do. To me, thinking about potential outcomes is much more direct than thinking about graphs. Maybe “diversionary and stifling” to you but not to me, Imbens, Rubin, etc.

        • Andrew, I think you are confusing potential outcomes (PO) and potential interventions (PI). I’ve noticed how in various discussions you have used these interchangeably.

          If you believe causal inference is only possible through manipulation then PI PO.

          Judea’s point is that PI –> PO but the reverse (PO –> PI) does not have to be true. If anything Judea’s approach is less restrictive in terms of thinking about PO.

          So if you like POs then I don’t see the problem here. I only see a problem if you believe there is no causality w/o manipulation.

        • The HTML hides the iff symbol.

          “If you believe causal inference is only possible through manipulation then PI PO.” above should read:

          If you believe causal inference is only possible through manipulation then PI –> PO and PO –> PI.

        • Fernando:

          No, I’m not confused. In this (hypothetical) sex-discrimination lawsuit, I am interested in potential interventions (different things the employer could have done) and potential outcomes (number of male and female employees, their salaries and working conditions, etc.). For me, both are needed in this case.

          In the classic Neyman/Rubin theory, the potential interventions have already been specified and it is the statistician’s job to model the potential outcomes. In practice, though, I think much of the value of the causal framework comes from the requirement to clearly specify the interventions.

          Consider an analogy to decision theory. In classical decision analysis, you come to the problem with the tree already specified, and you have to assign probabilities to all the branches and values to all the leaves. But, in practice, you often have to construct the tree, and one of the benefits of decision analysis is the requirement to think carefully about what the decision and uncertainty options are within that tree.

          Finally, to get back to potential interventions: as I and others have discussed in this thread, sure, there are examples where the problem is clear enough that you could imagine altering or fixing X directly with no need to specify an intervention. Judea gave an example of cigarette smoking and cancer, for example. But in settings such as “the diversity of Bolivia” or this hypothetical sex discrimination lawsuit, I think it is both necessary (for me) and extremely helpful to try to think about potential interventions.

          Finally, I have never made the statement, “there is no causality without manipulation,” or anything to that effect.

        • Andrew:

          I fully agree with your statement that “I think it is both necessary (for me [too]) and extremely helpful to try to think about potential interventions”. I just wanted to clarify that (for some of us) manipulability is not strictly necessary for causal inference.

          In passing, I find the way you are setting up the problem (“potential interventions (different things the employer could have done) and potential outcomes (number of male and female employees, their salaries and working conditions, etc.).”) confusing.

          Though I am no expert I like to think of discrimination as a form of reactivity. E.g. you expose a subject (e.g. employer) to an item (e.g. gender), and you get a reaction (e.g. salary offer). The hypothesis is that the item has an effect on the reaction.

          In principle you could test this by manipulating gender in otherwise identical CVs, etc… The main difficulty here is that employers hire people that have many other attributes besides gender but that are otherwise highly correlated with gender. Thus, even identically matched CVs may leave out unobservable characteristics associated with gender that may be driving the reaction rather than the gender itself. In turn this opens the definitional question of whether by female you mean a narrow disposition of XX chromosomes, or a broader definition of female as “woman”, say.

          Supposing we have a well defined notion of gender (G), this could be modeled in a DAG by having an instrument (Z) and an outcome (Y) and two causal pathways: Z -> G -> Y and Z -> U -> Y. The problem (by assumption) is we cannot control G directly and so have to rely on Z. Then, to identify the effect of gender you need to control for U, but if U is either unmeasured or an “unknown unknown” (to the researcher) then no identification is possible conditional on this simple model and assumptions.
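The two pathways described above, Z -> G -> Y and Z -> U -> Y, can be illustrated with a small simulation. This is only a sketch under made-up assumptions: all variables are continuous and linear (so G is a numeric stand-in, not a binary gender indicator), and every coefficient is hypothetical. It shows that when Z also reaches Y through an unmeasured U, the simple instrumental-variable (Wald) ratio no longer recovers the effect of G.

```python
import numpy as np

def simulate_iv(n=200_000, effect_g=1.0, effect_u=2.0, z_to_u=0.5, seed=0):
    """Simulate Z -> G -> Y and Z -> U -> Y; return the Wald/IV estimate for G."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # instrument
    g = z + rng.normal(size=n)               # Z -> G (hypothetical unit coefficient)
    u = z_to_u * z + rng.normal(size=n)      # Z -> U, the second pathway
    y = effect_g * g + effect_u * u + rng.normal(size=n)
    # Wald estimator: cov(Z, Y) / cov(Z, G)
    return np.cov(z, y)[0, 1] / np.cov(z, g)[0, 1]

# With the second pathway open, the ratio targets effect_g + effect_u * z_to_u.
biased = simulate_iv(z_to_u=0.5)
# With the second pathway closed, the ratio targets effect_g itself.
clean = simulate_iv(z_to_u=0.0)
print(f"IV estimate with Z -> U open:   {biased:.2f}")
print(f"IV estimate with Z -> U closed: {clean:.2f}")
```

Under these assumed coefficients the true effect of G is 1.0 in both runs; only the run with the Z -> U path closed should land near it.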

        • Fernando:

          I think we’re both speculating here, but . . . my impression is that if an employer is being sued for discrimination, the claim is that the employer should have been doing something different. That is the potential intervention I’m talking about: the “something different” the employer should’ve done. I see what you’re getting at with your graphs, and I agree that all this is relevant to the question, but I’d think that in this hypothetical lawsuit, I’d be interested in what might have happened had the employer done something different somewhere along the way.

        • Andrew:

          You might react to my previous comment by arguing that we never have a good (a priori) definition for gender, and so our definition should be based on what we can manipulate. E.g. “gender as specified on a CV”.

          I think that is a valid criticism. But then of course the question becomes whether reactivity to gender as specified in a CV is what we normally consider “gender discrimination”.

          One could go on forever, but this is why I think concepts are so important for causal inference. Indeed, I think this is related to your frustration with the Bolivia paper and so on.

        • Andrew:

          I think your way of thinking about it is interesting. I was thinking in terms of diagnosis (e.g. is gender causing salary). You are thinking about it (I think) as a cure (e.g. if employers have to delete all gender-related information from a CV before considering it, do we still get a bi-modal distribution in salary offers?). I think that is an interesting approach. We might not be able to manipulate G, but we might get rid of it.

        • I think the point of the person you are replying to is legitimate.

          Did you read the paper, and the critique? I assume you did not, because the fact that they may have made an inappropriate presentation of their regression results. Of course whenever scientists make claims like that about regressions they seldom mean it is as simple as just adding genetic diversity. But we could scrutinize your own hasty mistakes: Ashraf and Galor do mention that the addition of Eurasian colonial genes pushed diversity toward the optimum. Yet what even makes you assume it means “white people” instead of Spanish or Portuguese colonialists, which many were? Are they “white” to you? So is your critique equally invalid as their paper, now, over a silly presentation mistake??

          You have admitted in the past that you don’t exactly understand instrumental variable regression. Is that why you limit your analysis of this paper to (in my opinion) the mundane issues

        • Sorry, I am posting from my phone, which is frustrating. I got cut off before I was done.

          …mundane issues, such as how appropriate a sentence long thought experiment was?

          What about their actual findings? They present some pretty strong conclusions from their analysis. Do you share Jade’s (the author of the critique) opinion about their data? I thought the critique was weak sauce; they apparently had no idea what instrumental variable regression is because they offered no valid methodological critique.

          In fact it looks more like Jade Guedes et al.’s issue is twofold; they don’t feel comfortable with the topic and implications. They also dislike the data, but have little to say about it other than Ashraf and Galor needed to talk to more of THEIR people, because to them anthropologists have monopoly rights over this research agenda. They didn’t point to any BETTER data that the authors could have exploited. Perhaps there is none, but then that doesn’t seem like a reason to not go through with the research anyway. Most research on North Korea has terrible data but the name of the game is “constrained efficiency.”

          Overall I would have expected concise observations about their methodology and data from you, rather than a sardonic comment about their thought experiment and a glib remark about their intentions.

          What are your thoughts on the actual methodology? If it needs improvement exactly how would you improve it as a researcher? What about the data? If the data is an issue how can they work with it to reach better conclusions?

          Are there no critiques of the sort and you also simply dislike the implications of what they have to say as Guedes et al. appear to?

        • The methodology is the most obvious problem with the paper. From Figure 1, it’s clear that the “instrument” of migratory distance might as well be a continuous indicator for distinct geographic regions. This blatantly violates the exclusion restriction; it’s not a valid instrument.

        • Chris:

          1. Yes, of course I consider Spanish or Portuguese colonialists as white people. Don’t you?

          2. Of course I understand instrumental variables regression. We discuss it in chapter 10 of our book. What would make you think I’ve ever written that I don’t understand the method?

        • I’m agreeing that in _some_ cases it is not necessary.

          But is it ever necessary? I’d say there are cases where the answer is yes. The reason it is necessary is that some causal effects are dependent on the state-space path of the system to change the state of the causal factor. When the causal effect is not path-dependent (as in the case of your gravity example), then specifying an intervention is unnecessary. However, if it is path dependent, the issue is that there are heterogeneous causal effects, depending on the path taken to change the state.

          For example, suppose I am interested in the causal effect of journeying to the top of a mountain on the risk of death. The experiment is that I have 1000 individuals go to the top of the mountain and measure how many of them die. However, in one experiment, they go up climbing with their bare hands; in another experiment, they ride up in a gondola. Of course, these two experiments will report different causal effects.

          You might argue that this is an unfair example, that there’s some hidden state that needs to be incorporated into the model of the causal effect (namely, how one gets to the top of the mountain). However, this is the sort of hidden state that confounds causal interpretations. People pick cause and effect variables with tons of hidden state all the time (this study Andrew is discussing is one example). The idea of those arguing for specifying an intervention is that the intervention helps make such hidden state visible.

          There are cases where specifying an intervention is unnecessary, and that is when the causal effect is not path dependent, i.e. any path taken to change the state of the causal factor is associated with the same causal effect.
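The mountain example can be put into a toy simulation. This is a sketch only: the per-route death risks below are invented for illustration, and the point is simply that "the effect of reaching the top" is not a single number until the route (the intervention) is fixed.

```python
import numpy as np

# Hypothetical per-person death risks for each route; these numbers are made up.
ROUTE_RISK = {"bare_hands": 0.05, "gondola": 0.001}

def deaths_by_route(n_people=1000, seed=0):
    """Send n_people up by each route and count simulated deaths per route."""
    rng = np.random.default_rng(seed)
    return {route: int(rng.binomial(n_people, p)) for route, p in ROUTE_RISK.items()}

def pooled_risk(share_bare_hands):
    """Risk of 'reaching the top' when only the mix of routes is known."""
    return (share_bare_hands * ROUTE_RISK["bare_hands"]
            + (1 - share_bare_hands) * ROUTE_RISK["gondola"])

deaths = deaths_by_route()
print(deaths)
# The 'effect' of reaching the top moves with the unstated route mix.
for share in (0.0, 0.5, 1.0):
    print(f"share climbing by hand = {share:.1f}: risk = {pooled_risk(share):.4f}")
```

If both routes had the same risk, `pooled_risk` would return the same value for every mix, which corresponds to the path-independent case where the intervention need not be specified.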

          Incidentally, I don’t think this is even the biggest problem with this study, but since we’re discussing the intervention controversy I’ll refrain from sidetracking this thread.

        • To use your wording, I’m fine with “simply conveying scientific knowledge” if the causal effect is path independent. If it’s not, the experiment will mistakenly take its causal effect estimate of “a specific intervention” for universal “scientific knowledge”.

        • revo11,
          From your using the phrase “if the causal effect is path independent”
          I conclude that you are a scientist, that you believe that Nature has
          pathways, and that we can model those pathways by a path diagram.
          I share this perception of Nature, which should make the communication
          between us extremely easy.

          Given a path diagram, and given that we are interested in the causal effect
          of X on Y, where X and Y are two variables represented in the diagram.
          Can you define “path independent causal effect” and how it differs from
          “path dependent causal effect”?

          This should clarify much of the discussion on this topic.

        • Andrew,
          I think you are confusing the “potential outcome framework” with “counterfactuals”.
          Counterfactuals deal indeed with “what an employer would have done differently
          had the applicant been a male”, and the formal definition of that sentence (the one
          I use) DOES NOT involve
          a description of the intervention by which the applicant would change its gender,
          or the appearance of its gender.
          The “potential outcome framework” however, insists on including a description of such
          intervention in the definition of ALL counterfactuals. This insistence is stifling.

          Your heroes, Rubin, Angrist and Imbens, are very adamant about manipulability
          of the antecedent of the counterfactual. So adamant in fact, that they refuse meaning
          to any counterfactual Y_x unless X is manipulable.
          This refusal is so stifling, that it led to blunders in the form
          of paradoxical theories of mediation.
          Please read some of this paradoxical literature. For example,

          or Rubin (2004,2005) cited in the paper above.

          And please let us make a clear distinction between the “potential outcome framework”
          in which counterfactuals are narrowly defined relative to specific interventions
          and the structural definition of counterfactuals, defined in terms of the response
          of Y to changes in X, regardless of how X was caused to change.
          See chapter 7 of Causality where such a formal definition is given.
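To make the structural reading of a counterfactual concrete, here is a minimal sketch. The hiring equation, the `bonus` and `threshold` parameters, and the `Applicant` type are all hypothetical, invented for illustration. The counterfactual Y_x(u) is evaluated by holding the background factors u fixed, replacing X by x, and re-evaluating the equation for Y, without describing how X came to change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Applicant:
    u: float  # background factors (hypothetical: a qualification score)
    x: int    # 1 = male, 0 = female

def hired(x: int, u: float, bonus: float = 0.5, threshold: float = 1.0) -> bool:
    """Hypothetical structural equation Y = f(X, U) for the hiring decision."""
    return u + bonus * x >= threshold

def counterfactual_hired(app: Applicant, x_new: int) -> bool:
    """Y_x(u): keep u as it was, set X to x_new, and re-evaluate f."""
    return hired(x_new, app.u)

app = Applicant(u=0.7, x=0)
actual = hired(app.x, app.u)              # what happened in this toy model
would_be = counterfactual_hired(app, 1)   # "had the applicant been a male"
print(actual, would_be)
```

In this toy model u is given directly; in practice it would have to be inferred from the observed evidence before the equation is re-evaluated.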

        • Judea:

          Good point. I myself was confused when I talked about potential outcomes and potential interventions, as opposed to counterfactuals and potential outcomes.

        • Judea:

          In your example, you first referred to a “sex change,” and now you are talking about “what an employer would have done differently had the applicant been a male.” But neither of these is the potential intervention that I was talking about! I was talking about “different possible actions by the employer,” by which I meant different actions or different policies, conditional on the employer’s inputs. I was not talking about the employer’s actions given different inputs.

          One of the benefits to me of the Neyman/Rubin framework is that it allows me to see these things clearly. Again, I respect that you don’t like this approach, but I find it extremely useful to frame causal questions in terms of potential interventions rather than just thinking abstractly about changes in X or whatever.

        • @judea Yes I can define this more precisely, but it’s a bit long to write out here. I’ll email you when I have a few moments to write my thoughts down.

        • Andrew,
          I can’t see the difference between the two sentences you contrast:
          (quoting) “I (Gelman) meant different actions or different policies, conditional on the employer’s inputs. I (Gelman) was not talking about the employer’s actions given different inputs.”
          Perhaps you can do us the favor and express these two
          sentences in formal counterfactual notation, so we can see the difference.

          More importantly, I think you have not internalized the distinction
          between “potential outcomes” as a restrictive framework and “counterfactuals” as a general mathematical relation, a distinction made several postings ago,
          backed by examples and citations.

          Let me repeat it then, for the sake of new bloggers who have not
          seen it before and who might be confused by your comments.

          The “potential outcome framework” insists on tying every
          hypothetical sentence to a specific intervention, and denies meaning
          (I repeat: denies meaning!!) to any counterfactual (e.g., Y_x)
          whose antecedent X=x is a proposition, not an intervention (like
          “The applicant would have been hired had she been a male”).

          In contrast, the structural framework permits the antecedent X=x
          of a counterfactual to be either a proposition (e.g., the applicant being a male) or
          an intervention (e.g., executing plan #27, instead of #35,
          by fining the employer $12,000).
          Both are well-defined mathematically in the structural model.

          Admittedly, I find this conversation a bit strange. I repeat defining this
          distinction time after time and you do not refute it, but you still repeat your
          traditional preferences (quoting): “I (Gelman) find it extremely useful to frame causal questions in terms of potential interventions rather than just thinking abstractly about changes in X or whatever.” as if anyone in the structural camp
          prevents you from doing any framing you wish.

          I don’t understand why you speak so contemptuously of the more flexible framework
          which you put down as: “just thinking abstractly about changes in X or whatever”.
          If you truly cherish the option of “framing causal questions in terms of potential interventions” you should doubly cherish a more
          flexible framework that does not FORCE you to think about
          interventions when it is counterproductive, but ALLOWS you to frame
          things either in terms of “potential interventions” or in terms
          of event-dependent counterfactuals, e.g., “had the applicant been a male”.

          FYI, courts of law define hiring discrimination the counterfactual way;
          can you offer them an alternative definition in terms of “potential interventions”?

          I don’t see what drives you to prefer an inflexible
          framework over a flexible one, and I am sure many
          readers on this blog do not understand how the “potential
          outcome” framework continues to ignore the conversation
          we are having here today and survive despite this glaring

        • Judea:

          In this hypothetical sex discrimination lawsuit, I am assuming that the employer is being sued, hence I see the lawsuit claiming that, had the employer followed different policies, the outcomes (in terms of hiring, job conditions, etc.) would have been different. Thus, the different scenarios I am considering would be (a) employer follows the policy that he or she actually followed, or (b) employer follows a different policy (which lawsuit claims would lead to more equitable outcomes).

          You suggested an analysis of what the employer would have done with a different job applicant (a man instead of a woman). That might be interesting to know too (although I don’t think it’s so well defined, given that there may already have been many applicants of both sexes for the given job). But to me, your suggested analysis does not directly answer the question. To me, the question of the lawsuit is more directly answered by addressing the claim that a better outcome might have come, had the employer followed a different policy.

          Recall, this particular subthread arose because you claimed that, in this case, “specifying an intervention is not only unhelpful, but diversionary and stifling. Imagine a defendant in a sex discrimination trial insisting on knowing what kind of sex change intervention the plaintiff has in mind when she/he complains about the ‘effect of gender on hiring.'” My response, to which I still hold, is that considering different possible actions of the employer, is helpful—and I never suggested sex changes.

          That said, I agree that the term “intervention” might be misleading and even unhelpful here! That is, I find the Neyman/Rubin framework helpful, but not the word “intervention.” What I’m considering is potential different actions or policies by the employer. But these are not necessarily “interventions” imposed by the outside. (Let me repeat that I do not hold with the claim “no causation without intervention” or manipulation or whatever, so please don’t hang that one on me.)

          To summarize: In this as in many other examples, I find it extremely useful to think as specifically as possible about the potential-outcome world being modeled. However, I agree with you (Judea) that the concept of “intervention” or “manipulation” can mislead in that it implies that in any problem there are some special variables that can be manipulated and some others that can only be observed. In your framework, every variable can be directly manipulated. In a network of electronic components and circuits, I think that can make sense, but in social science I don’t think so. (For example, I don’t think it means much to talk about altering “the genetic diversity of Bolivia” without saying how it might be done: different interventions to alter that genetic diversity could have much different outcomes.) But it’s turtles all the way down, and at some point we do have to consider potential outcomes that are not the product of any outside intervention or manipulation.

  13. Andrew. It is amusing that a statistician would use (even for rhetorical purposes) the argument of anthropologists who failed to understand basic statistical methodology.

    As Ashraf and Galor write: “On the conceptual front, our critics have raised several concerns. They challenge our findings that diversity can be beneficial for innovative activity, stating in their letter that our hypothesis must be fundamentally flawed because it “implies that the Maya and Aztecs should not have been able to achieve high population densities because of their low genetic diversity.” Such an inference is based on a misunderstanding of basic empirical methodology. In simple terms, it is equivalent to suggesting that if we were to observe a 100-year-old person who smokes then research that concludes that smoking is harmful must be flawed.”

    Surely you can relate to my concern, right? You must be laughing out loud, as I did when I read the letter of the anthropologists, and asking yourself what level of training is needed to become an anthropologist these days. And what is the future of academia if this is the level of professors at Harvard?

    As posted by the very prominent anthropologist Henry Harpending in Evolving Economics (October 11-12, 2012):

    “First, on the Harvard side, most of the authors are from the anthropology department there. This suggests immediately that we should not pay a lot of attention to them, and indeed their arguments do not seem very solid. The response by the authors of the original paper does a good job of taking them down.”

    “Much of anthropology is not to be taken seriously because of just this failure to distinguish between what is true and what is desirable.”

    • Evy:

      I already gave my problems with the paper in the post above. One of your responses is that I must be wrong because the paper was read by 5 referees and presented at 50 universities. I don’t find that convincing. 5 referees can all be wrong. In addition, journals sometimes like to publish dubious but controversial claims on the theory that they deserve a hearing. And, for that matter, I agree that the claims in the paper under discussion deserve a hearing, in fact I think there’s a lot of interesting stuff there. I just think they go too far and overstate their claims. I hate to have to be the one to tell you this, but journals—even top journals—publish papers with errors, and they often publish papers that overstate their claims. We have discussed many such papers—including some from economics journals—in this blog over the years.

      I pointed out that several Harvard anthropology professors wrote a critical article. Your response is that anthropologist Henry Harpending from the University of Utah says not to trust anthropologists. This is like a version of the liar’s paradox: I’m not allowed to trust anthropologists because an anthropologist says they can’t be trusted.

      Bottom line: I respect that this paper was published in a top journal. I also respect that several researchers thought the paper was no good. I looked at the paper myself and saw some interesting things and also some serious problems. Commenters on this blog (see above thread) noted other serious problems, most notably in the genetic measures. Papers with serious problems get published all the time. This is one of them.

      Also, now I’ve learned that there’s an anthropologist from Utah who doesn’t like the anthropologists at Harvard.

    • Evy, what I said above does not depend to any degree on what these anthropologists said! The most devastating critique of these economists is what others have said above: the specific type of genetic diversity measured by the data they used is probably irrelevant. To repeat: they have used diversity (expected heterozygosity) as measured in meaningless DNA, “junk DNA”. This is used (when it is proper to use it) because it is not under selection pressure, so it can give information about a remote past. But it does not say __anything__ about diversity in meaningful genes, such as those for lactose tolerance. So it is simply not the right sort of data for what these authors want!

      But I am really seeing all this from another angle: I do really know something about Bolivia, and in this particular discussion that seems relevant. On the face of it, naming Bolivia as the most homogeneous country in the world does not make much sense. And, if some intervention could make Bolivia less homogeneous, these guys say that should make Bolivia prosper. Maybe. But it is the diversity itself which should have that effect! Now, to some examples. The part of Bolivia growing fastest right now is the lowland Santa Cruz department. Over the last 50 years it has had a large influx of people from many places, first and foremost from the Bolivian highland regions. But a very important migratory influx has been from Croatia. In a recent newspaper listing of “most important people of Santa Cruz”, exactly half had names from Croatia! So, the question: does the growth come from the knowledge (and capital) of the migrants? Then we would expect what I just listed. Or does it come from “diversity” itself? Then a bigger part of the growth should come from other groups also! That doesn’t seem to happen. The same argument could be repeated with other groups: Japanese farmers in Santa Cruz (who came in the years just after WWII, when Japan was poor) are maybe the most successful agricultural group there. Again, if the growth came from diversity itself, and not from the knowledge and other cultural traits the migrants bring with them, then the success should be seen among their neighbours from other groups as well! But that is not seen.

  14. How can diversity in “junk DNA” affect innovations and trust? Here is the relevant paragraph in the paper about it. The argument is so simple, but it appears that you did not bother to read the paper before posting.

    It is relevant to note that the expected heterozygosity measure in the sample of 53
    HGDP-CEPH ethnic groups is based on microsatellites, i.e., DNA loci in nonprotein-
    coding regions of the human genome that do not directly result in phenotypic
    expression. Therefore, this measure of observed genetic diversity has the advantage
    of not being confounded by the forces of natural selection that may have operated
    on these populations since their prehistoric exodus from Africa. Importantly, however,
    the effects associated with heterozygosity in microsatellites capture the effects
    of diversity in phenotypically-expressed genomic material since the serial-founder
    effect, associated with the “out of Africa” migration process, is indeed reflected in
    other dimensions of within-group diversity, including diversity in various craniometric
    traits (Manica et al. Nature 2007).

    • “DNA loci in nonprotein-coding regions of the human genome that do not directly result in phenotypic expression. Therefore, this measure of observed genetic diversity has the advantage of not being confounded by the forces of natural selection …”

      Wrong. Non-coding regions are not expressed as proteins, but they have tremendous phenotypic consequences affecting natural selection. Even laypeople reading the NYT have gotten past this ridiculously incorrect understanding –

      ENCODE is just the final nail in the coffin, things like regulatory consequences of non-coding regions have been known for decades.

      • Reading 101

        “DNA loci in nonprotein-coding regions of the human genome that do not DIRECTLY result in phenotypic expression.”

        “Regulatory consequences of non-coding regions” is an INDIRECT effect.

        • If you accept that junk dna has substantial indirect phenotypic consequences, then you’d also have to agree that the following statement is patently false “this measure of observed genetic diversity has the advantage of not being confounded by the forces of natural selection”.

        • READING 102

          I am glad that you conceded that you have misread the first sentence. That’s progress, I suppose, after your shallow bombastic statement.

          But disappointingly you failed, yet again, in elementary reading.

          1. There is no evidence about SUBSTANTIAL indirect phenotypic consequences.

          2. “confounded” = “confused” (e.g., Webster dictionary).

          Therefore, as the authors proposed, “this measure of observed genetic diversity has the advantage of not being confounded by the forces of natural selection.”

          I hope Reading 103 will not be necessary.

        • But that’s the point – there are “SUBSTANTIAL” indirect phenotypic consequences. This has been known for a long time.

  15. LOGIC 101

    The presence of an effect of heterozygosity in microsatellites on phenotypic expression would simply reinforce the thesis. As I quoted above:
    “Importantly, however, the effects associated with heterozygosity in microsatellites capture the effects of diversity in phenotypically-expressed genomic material since the serial-founder effect, associated with the “out of Africa” migration process, is indeed reflected in other dimensions of within-group diversity, including diversity in various craniometric traits (Manica et al. Nature 2007).”

      • Then where does that leave your comment? It hardly sounds like a mature reaction to me; I rarely see grown adults trying to win arguments by calling their opponents high schoolers and implying they are biased.

      • Andrew,
        Thanks for clarifying what interventions you find useful in thinking about
        hiring discrimination.
        Quoting: the different scenarios I [Gelman] am considering would be (a) employer follows the policy that he or she actually followed, or (b) employer follows a different policy (which lawsuit claims would lead to more equitable outcomes).
        A comparison between these two strategies is indeed what the structural formalism
        facilitates, even from observational studies, after establishing the science behind the problem,
        namely, the causal connections that operate between the variables X (gender), Y (hiring), Z (qualifications)
        and more. This science can be expressed in counterfactual language, using Y_x, Y_xz , Z_x and more.

        But when I posed these counterfactuals as a research question:
        “what the employer would have done with a different job applicant (a man instead of a woman).”
        your reaction was (quoting):
        “to me [Gelman], your suggested analysis does not directly answer the question. To me, the question of the lawsuit is more directly answered by addressing the claim that a better outcome might have come, had the employer followed a different policy.”

        Not really. Discrimination lawsuits are usually brought by individuals about individuals.
        Here is a court opinion on what constitutes discrimination:
        “The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of a different race (age, sex, religion, national origin etc.) and everything else had been the same.” (In Carson versus Bethlehem Steel Corp. 70 FEP Cases 921, 7th Cir. (1996)). (See discussion of direct effect, Causality, chapter 4)

        But we are not here to deal with discrimination lawsuits. The methodological lessons from this
        discussion are:
        1. Counterfactuals such as Y_x, Y_xz, Y_{x,Z_x’} … are often
        the important research questions, even though (please take note!)
        the antecedents (x, xz, etc.) are NOT interventions, but events
        (e.g., X = male, Z = highly qualified).

        2. The policy implications of the science (e.g., what equity would be
        achieved by eliminating hiring discrimination, or eliminating qualification
        disparities), important as they are in practice, need not be considered
        while we unveil the science behind the problem.

        3. Going back to where we started, “the effect of genetic diversity
        on economic development” can be studied as a scientific question
        without insisting on “how are you going to intervene and increase
        genetic diversity in Bolivia”, in the same way that we study “the effect
        of gender on hiring decisions” without asking “how are you going to intervene
        and change the antecedent, namely gender”.

        4. It is misleading to suggest (quoting):
        “In your [Pearl’s] framework, every variable can be directly manipulated”
        So far, I have heard this allegation only from people who never used my framework,
        and they say it to absolve themselves from the guilt of not trying. Indeed, every
        child knows that not all variables can be directly manipulated, therefore,
        Pearl’s framework must be flawed from its basic core, and must be unworthy
        of anyone’s attention. Beautiful excuse, but false logic.

        Why? Because the structural framework does not assume that every
        variable can be directly manipulated in the physical world.
        What it does assert is that every counterfactual Y_x is WELL DEFINED
        within the model, whether or not X is physically manipulable. It also claims
        that the mathematical definition of Y_x invokes a SYMBOLIC operation that SIMULATES
        a hypothetical intervention on X and that, in the event that X is directly manipulable
        physically, Y_x can be used to predict the outcome of the manipulation do(X=x).

        For example, every causal effect E(Y_1 – Y_0) that the potential outcome
        framework is proud to estimate, is also estimable using the structural framework
        (using the definition above), with the additional advantage of making the assumptions
        and their testable implications transparent and meaningful. In addition, any
        causal effects E(Y_1 – Y_0) that the potential outcome
        framework refuses to estimate (because the antecedents X=1 and X=0 are non-manipulable,
        as in the effect of gender on hiring) can also be estimated (if identified) with
        no trepidation and no hangups.
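
The distinction drawn above, between a counterfactual Y_x that is well defined inside a model and a physical manipulation of X, can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the two structural equations, their coefficients, the hiring threshold and the applicant's background factors are invented, and the names `Unit`, `f_z`, `f_y`, `y_x` and `y_x_z_fixed` do not come from the thread or from any library. The sketch evaluates Y_x for one applicant by symbolically setting X in the equations while keeping that applicant's background factors fixed, and it also evaluates the "everything else had been the same" version in which qualifications Z are held at their actual value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """Background (exogenous) factors for one applicant; all values hypothetical."""
    x: int      # recorded gender: 1 = male, 0 = female
    u_z: float  # unobserved drivers of qualifications
    u_y: float  # unobserved drivers of the hiring decision


def f_z(x, u_z):
    # Hypothetical structural equation for qualifications Z.
    return 0.5 * x + u_z


def f_y(x, z, u_y):
    # Hypothetical structural equation for hiring Y (1 = hired, 0 = not hired).
    return 1 if 1.0 * x + 2.0 * z + u_y > 2.0 else 0


def y_x(unit, x):
    """Counterfactual Y_x(u): set X = x in the equations and let Z respond."""
    z = f_z(x, unit.u_z)
    return f_y(x, z, unit.u_y)


def y_x_z_fixed(unit, x):
    """Counterfactual with Z held at its actual value: only X changes."""
    z_actual = f_z(unit.x, unit.u_z)
    return f_y(x, z_actual, unit.u_y)


applicant = Unit(x=0, u_z=0.8, u_y=0.1)
print("actual outcome:           ", y_x(applicant, applicant.x))
print("Y_x with X = male:        ", y_x(applicant, 1))
print("same, Z held at actual:   ", y_x_z_fixed(applicant, 1))
```

No "sex change intervention" appears anywhere in the code: X is only replaced inside the model's equations. Whether such equations can be defended for a real hiring process is a separate question that the sketch does not address.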

  16. Andrew:
    If the research question is on y_x but you prefer to work on y_z (z could be some kind of an instrument for x) because that makes you think better about potential outcomes (the framework you are more comfortable with), wouldn’t it be informative to check for any testable implications when assuming that y_x implies y_z? If yes, why do you think that SEM language wouldn’t be helpful here to express such knowledge?

    • CK, 

      I think you will be happy to know that the problem you posted has an elegant solution using the structural framework.

      If the research question is E(Y_x) and we can manipulate some Z, not X, it is possible to decide whether E(Y_x) can be estimated, bias-free, from the experimental distribution P(Y_z = y), the causal diagram, and the non-experimental distribution P(x,y,z).

      The following paper gives a complete characterization (if and only if) for this problem in the non-parametric setting (linear, non-linear). Note that instrumental variables do not identify E(Y_x) in the nonparametric case.

    • OK, I couldn’t resist…

      I took a quick look. First, I was interested to discover via Ashraf and Galor that Senegal had higher GDP per capita than did the United States in 2000! Look at their Figure 5, where the y axis is “Log Income per capita in 2000”. You read the details, and you find out that the y-axis is actually some kind of fitted values… The title and first sentence of the caption are deeply misleading.

      Ashraf and Galor try omitting sub-Saharan Africa, but I’d rather see them control for it. This kind of omission is deeply suspicious. Similarly, I’d like to see whether the hypothesis survives continent/region dummies.

      Lastly, they mostly include “predicted diversity” as their central variable to test. But this variable is a purely geographical construct, as I understand it. Why don’t they just use actual diversity? I didn’t go fishing for their explanation, but this suggests that their regressions, if valid, say something about having intermediate geographic distance from East Africa being important rather than anything about genetics. A nice test of their theory is using the residuals and seeing what they explain.
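
The residual test suggested above can be sketched on simulated data. This is only an illustration under loud assumptions: the data are generated, not taken from the paper, and the toy world is built so that income depends on geography (distance) alone and not on diversity. The coefficients 0.77, 0.006 and 0.05 are hypothetical, and `ols_fit` and `corr` are small helper functions written for this example.

```python
import random

random.seed(1)
N = 2000


def ols_fit(x, y):
    """Slope and intercept of a one-variable least-squares fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def corr(x, y):
    """Pearson correlation."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: diversity falls with distance, plus idiosyncratic noise.
distance = [random.uniform(0, 25) for _ in range(N)]
diversity = [0.77 - 0.006 * d + random.gauss(0, 0.02) for d in distance]
# Assumed toy world: income depends on distance only, NOT on diversity.
income = [8.0 + 0.05 * d + random.gauss(0, 1) for d in distance]

# Step 1: split actual diversity into a geographic prediction and a residual.
slope, intercept = ols_fit(distance, diversity)
predicted = [intercept + slope * d for d in distance]
residual = [v - p for v, p in zip(diversity, predicted)]

# Step 2: see what each part explains.
corr_predicted = corr(predicted, income)
corr_residual = corr(residual, income)
print(f"corr(predicted diversity, income) = {corr_predicted:.2f}")
print(f"corr(residual diversity, income)  = {corr_residual:.2f}")
```

In this toy world "predicted diversity" correlates with income only because it is a relabeling of distance, while the residual part of diversity explains essentially nothing. If diversity itself mattered, the residual would be expected to carry some signal as well.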

      I’d type up a rebuttal myself, but Galor is the vain editor of JEG, a highly ranked journal, and I’d like to publish there. So instead I’ll merely cite him approvingly and continue with research with a more obsequious bent toward senior researchers.

      • Actual diversity isn’t used because he’s using an instrumental variables framework. The problem is, you only have a valid argument if you actually have an instrumental variable.

        I’ve been avoiding jumping into this but it’s kind of bugging me that in _all_ this discussion, there’s been very little discussion of whether the “migratory distance” is actually a valid instrument. From their own graphs, migratory distance is almost perfectly correlated with continents (it looks like if you controlled for continent, there wouldn’t be much of a trend at all). How can a variable that’s interchangeable with a continent indicator possibly be considered exogenous?

        It doesn’t matter how many pages of robustness checks you do if you claim to do IV and don’t have a valid instrument.
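
The worry about the instrument can be illustrated with a small simulation. It is a sketch under stated assumptions, not a reanalysis of the paper: the data are simulated, the true effect of diversity on income is set to zero, continent is given a direct effect on income, and "migratory distance" is constructed to track continent closely. The helper names `cov`, `iv_estimate` and `demean_by_group` are made up for this example.

```python
import random

random.seed(0)
N = 5000
TRUE_EFFECT = 0.0  # assumed: diversity has no effect on income in this toy world

rows = []
for _ in range(N):
    continent = random.randrange(5)              # hypothetical continent index
    distance = continent + random.gauss(0, 0.3)  # distance tracks continent closely
    u = random.gauss(0, 1)                       # unobserved confounder
    diversity = -distance + u + random.gauss(0, 0.5)
    # Continent affects income directly, so distance is not a valid instrument.
    income = TRUE_EFFECT * diversity + 1.0 * continent + u + random.gauss(0, 1)
    rows.append((continent, distance, diversity, income))


def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / len(a)


def iv_estimate(z, x, y):
    """Just-identified IV (Wald) ratio: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def demean_by_group(values, groups):
    """Subtract group means, which is equivalent to including group dummies."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


c, d, x, y = (list(col) for col in zip(*rows))
naive_iv = iv_estimate(d, x, y)
within_iv = iv_estimate(
    demean_by_group(d, c), demean_by_group(x, c), demean_by_group(y, c)
)
print(f"IV estimate ignoring continent:     {naive_iv:.2f}")
print(f"IV estimate with continent dummies: {within_iv:.2f}")
```

With these made-up numbers the IV estimate that ignores continent is far from the true value of zero, because the instrument picks up the direct continent effect. Once continent dummies are included the estimate moves back toward zero, but it rests on the small within-continent variation in distance, which is the weak-instrument side of the same problem.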

  17. One very quick and very short comment. Figure 5, like all the other figures, is a scatter plot! Therefore one cannot infer the GDP per capita of the US or Senegal from it, and certainly not that the actual GDP of Senegal is higher than that of the US! What one can see in the scatter plot is the relative position of countries after controlling for all the controls the authors mention and continental fixed effects.
    As to the paper, I find it excellent, intriguing and incredibly robust.
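
If the figure is a partial-regression plot, as the comment above suggests, then each point shows what is left of income after the controls are removed rather than its level. A minimal sketch with six made-up "countries" and a single hypothetical control variable shows how a poorer country can sit above a richer one in such a plot. The numbers and the helper `ols_fit` are invented for illustration and have nothing to do with the paper's data.

```python
# Hypothetical toy data: six made-up countries and one control variable.
names = ["A", "B", "C", "D", "E", "F"]
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_income = [6.5, 6.8, 8.0, 8.9, 10.2, 10.4]


def ols_fit(x, y):
    """Slope and intercept of a one-variable least-squares fit."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = ols_fit(control, log_income)

# The y-axis of a partial-regression plot: income net of the control.
residual = {
    n: y - (intercept + slope * x)
    for n, x, y in zip(names, control, log_income)
}

for n, y in zip(names, log_income):
    print(f"{n}: log income = {y:.1f}, residual = {residual[n]:+.2f}")
```

Country A has the lowest log income and F the highest, yet A's residual is positive and F's is negative, so A would be plotted above F. This supports reading such a figure as relative positions after controls, and it is also why an axis label that names the raw variable can mislead.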
