This is a curious statement, because the problem described (N=35 countries vs. N=3 countries in the example) is exactly one of spatial autocorrelation. There are multiple ways to deal with that, and I suppose it is theoretically possible that you do not have enough variation between groups to get a precise estimate of whatever quantity you are interested in. Nevertheless, as McElreath noted, the spatial distances can be defined on any variety of differences/similarities, including multidimensional similarity.

I am not sure the time series example is exactly analogous, as what is described there is something of a temporal aggregation problem. There is of course a similar modifiable areal unit problem, but this should probably be thought of as a distinct problem.

Likewise, with those non-causal associations between US states in different regions — the associations are *real*, if perhaps uninteresting, so it isn’t a problem that considering more states per region lets us estimate them with more confidence. If they’re dominated by inter-region trends, then it would be inadvisable to apply them within a region (there could be a Simpson’s paradox situation). But that’s just a matter of knowing what your numbers mean, and what they don’t mean.

I think it is more informative to think of these sorts of problems as problems of unobserved confounder variables (here ‘culture’) rather than as problems of small N. This encourages us to think about exactly what cultural variables might be important, how they might work, and whether we can measure them. Thinking about these theoretical issues is likely to be more useful than focusing on questions of statistical inference.

I suspect Baron might reply to this statement by saying that thinking about unobserved confounders is about causality – and he is just interested in associations. My response is that if we are interested in whether the observed association will extend to some sort of broader population, then these confounders will influence that inference if they differ between the broader population and the observed sample: for example, when we include East Asia in the sample.

In any case, a zero-sum outcome measure like the ratio of female/male high performers is not nearly as relevant an outcome measure as, say, the percentage of girls/boys who are high performers.

There is also the issue of spatial and temporal scales. When we are binning data spatially or temporally, we are free to use very small or very large bins. The size of the bins seems related to the amount of uncertainty we have in each observation. For example, if we estimate median household income at the state level, then we should have much less uncertainty than when estimating the value at the county or zip-code level. Similarly, household spending each year is less variable than monthly or daily spending. We can consider a trade-off between lots of data points, each with high uncertainty, and a few data points, each with low uncertainty. I think if we can incorporate this per-observation uncertainty into our modeling process, then it might in some sense level the playing field.
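To make the “level the playing field” idea concrete: if each binned estimate carries a known standard error, inverse-variance weighting makes the two ends of the trade-off informationally comparable. A minimal Python sketch, with all numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 60.0  # hypothetical true median income (thousands of dollars)

# The trade-off: many noisy observations (counties) vs. few precise ones (states).
counties = rng.normal(true_value, 10.0, size=100)  # standard error 10 each
states = rng.normal(true_value, 2.0, size=4)       # standard error 2 each

# Inverse-variance weighting: each estimate is weighted by 1/se^2, so the
# pooled precision is the sum of the individual precisions.
prec_counties = 100 * (1 / 10.0**2)  # = 1.0
prec_states = 4 * (1 / 2.0**2)       # = 1.0

# With known standard errors, both binnings carry the same total information:
print(prec_counties, prec_states)
print(counties.mean(), states.mean())  # both are estimates of 60
```

The point is only that once each observation’s uncertainty enters the model, the choice of bin size stops being a free lunch in either direction.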

Yup. A multilevel regression model is still a regression model, and, like any regression model, it can be improved if there is available external information that has not been included in the predictors yet.

This Michael Porter??

Using an MLM strikes me as not being as efficient as possible: we know these similarities decay with distance, whether temporal, physical, or genetic, and we can even construct the phylogenetic trees. Just lumping the countries together in a cluster ignores these differences, which can be large or small, and which provide purchase for regression on the residuals from what would be expected under their autocorrelation. (Iceland should not be considered as similar to Sweden as, say, Finland is.) The autocorrelation should be modeled as directly as possible.
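As an illustration of modeling the decay directly rather than via discrete clusters, here is a minimal sketch of a distance-decay covariance matrix; the distances are invented purely to echo the Iceland/Sweden/Finland point:

```python
import numpy as np

# Hypothetical pairwise "distances" (could be geographic, temporal, or
# phylogenetic); the numbers below are invented for illustration.
countries = ["Iceland", "Sweden", "Finland"]
dist = np.array([
    [0.0, 2.0, 3.0],   # Iceland is farther from both
    [2.0, 0.0, 0.5],   # Sweden and Finland are close
    [3.0, 0.5, 0.0],
])

# Exponential distance decay: similarity falls off smoothly with distance,
# instead of being 1 within a cluster and 0 between clusters.
rho = 1.0  # length-scale parameter (assumed)
cov = np.exp(-dist / rho)

# Sweden comes out more similar to Finland than to Iceland:
print(cov[1, 2], cov[0, 1])
```

A discrete cluster indicator would assign all three countries the same within-group covariance; the continuous kernel keeps the graded information.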

I don’t think there are any good statistical models for how nations form (?), but another way to accommodate spatial autocorrelation could be Gaussian process regression, with the covariance matrix informed by pairwise geographic distances (through e.g. waypoints). Statistical Rethinking section 13.4.1 has a nice walk-through of how to fit this sort of model in Stan.

A Chinese restaurant process model could also be used to average over possible partitions of the countries, if one is uncertain about clustering and unwilling to assume that relatedness between countries is much influenced by their geographic closeness.
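For reference, the CRP prior over partitions is easy to sketch; averaging would then mean running the regression under many sampled partitions and weighting by their posterior probability. A minimal draw from the prior, with an invented concentration parameter:

```python
import random

def crp_partition(n, alpha, rng):
    """Sample one partition of n items from a Chinese restaurant process."""
    tables = []  # each table is a list of item indices
    for i in range(n):
        # An item joins an existing table t with probability |t| / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        weights = [len(t) for t in tables] + [alpha]
        choice = rng.choices(range(len(tables) + 1), weights=weights)[0]
        if choice == len(tables):
            tables.append([i])
        else:
            tables[choice].append(i)
    return tables

rng = random.Random(0)
partition = crp_partition(35, alpha=1.0, rng=rng)
print(len(partition))  # number of clusters in this one draw
```

The rich-get-richer weighting means the number of clusters grows slowly (roughly logarithmically) with the number of countries, so the prior does not commit to any fixed grouping in advance.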

“Within each country, societal issues related to gender roles result in varying degrees of economic distinction between genders; this leads to differences in expected life trajectories, and that leads to teachers having different expectations for child achievement. The result is that teachers teach children different material, or spend different amounts of time explaining different things to boys vs. girls. In addition, there may be some inherent differences in the mean or variance of talent between the populations of boys and girls, and a difference in what each child perceives as valuable and worth spending time on, or what is interesting. These feedback effects develop through time, as children who achieve at something tend to do more of that thing and less of another. The net result is that through time, as children age, there is a widening gender gap in achievement across various topics.”

Or whatever; that’s just some stylized idea of what people might think. But suppose it is what you think… Now start encoding that into a mathematical model.

Mathematically, we have several different academic/school-related topics: perhaps math, language, sports, music, etc. We have, in each country, some attitude about whether each topic is “more male” or “more female”; we have associated effort and encouragement by society for each gender; we have children’s perceived gender roles and levels of interest; we have etc., etc. These are the parameters we use to describe the process.

Next we need the process description:

The rate of improvement in skills related to topic X for each child is functionally related to the inputs that go into skill development, including encouragement, individual child interest, time spent by the child, availability of instruction in the topic… And country- or society-level parameters determine some of the encouragement and some of the availability… and society-level parameters are similar within groups of countries… and then, across the world, different groups of countries have some similarities as well…
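The feedback loop in that process description can be sketched as a toy difference-equation model; every parameter value here is invented for illustration, not estimated:

```python
import numpy as np

# Two topics for one child. Society encourages topic 0 slightly more;
# skill grows with time invested, and time invested shifts toward topics
# where achievement is already higher (the feedback effect in the quote).
n_steps = 50
encouragement = np.array([1.2, 0.8])  # society-level input (assumed)
skill = np.array([0.1, 0.1])          # initially identical skills
interest = np.array([0.5, 0.5])       # initially identical interest

for _ in range(n_steps):
    time_spent = interest / interest.sum()        # allocate a fixed time budget
    skill = skill + 0.1 * encouragement * time_spent
    interest = 0.9 * interest + 0.1 * skill       # achievement feeds interest

print(skill)  # initially equal skills diverge through the feedback loop
```

Even this two-parameter cartoon produces the “widening gap through time” pattern; the full model just has this machinery replicated across topics, children, and countries, with partial pooling tying the country-level parameters together.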

In the end you’ll have a large model for worldwide educational variation across multiple topics, in which there are thousands of parameters you are uncertain about. This is *the reality* of the problem. Now, because of these thousands of parameters, you’ll want to look for sources of data which can inform the quantities of interest: surveys of children’s interests; datasets on teacher populations (age, gender, subject they teach); data on spending in each country; time-series data tracking individual children’s achievement; time-series data across different eras… Whatever; each source of data is something you can potentially use to constrain the parameters within a given country, and thereby also constrain parameters within neighboring/similar countries, and thereby constrain parameters within continents… etc., etc. But data won’t be uniformly available in all locations for all topics, so you’re going to have to work to provide reasonably well-thought-out priors for your parameters.

Next you’ll say: gee that’s all well and good, but I don’t have any of that information right now, and I do have this one great dataset with 18 data points in it across 3 countries, how can I make progress so that I can get grants and tenure? That’s a huge hard problem you’ve just described, I’d much rather just grab some dataset and calculate some p values…

And now we know why so little progress is made: we’ve *institutionalized* non-science as if it were in fact the pinnacle of scientific achievement. We’ve reified the idea that, without really thinking very much about how things work, we can just grab some small datasets, pretend the data comes out of random number generators, and check to see if we can mathematically detect differences between RNG A and RNG B.

In terms of model division, I saw an op-ed by Michael Porter yesterday that cited a ‘rigorous’ social index. Is such a thing possible? I’d say no, but when he cherry-picks the murder rate, I know it’s a bullshit op-ed. People treat these issues as though they’re epidemiology, as though ‘determining’ penetration rates of virulent diseases with relatively known infection rates in specific populations – depending on exposure modeling, etc. – is the same thing as taking something buried way below the surface, like some measure of genetic diversity, and applying it to a population. You can see a rough connection: some groups are perhaps more prone to certain infections given certain other factors, like the way HIV appears to spread in Africa depending on rates of already-existing infections (meaning some of the research shows a form of opportunism). But in general? Treading on Wansink territory.
