What does it take, or should it take, for an empirical social science study to be convincing?

A frequent correspondent sends along a link to a recently published research article and writes:

I saw this paper on a social media site and it seems relevant given your post on the relative importance of social science research. At first, I thought it was an ingenious natural experiment, but the more I looked at it, the more questions I had. They sure put a lot of work into this, though, which is evidence of the subject’s importance.

I’m actually not sure how bad the work is, given that I haven’t spent much time with it. But the p-values are a bit overdone (understatement there). And, for all the p-values they provide, I thought it was interesting that they never mention the R-squared from any of the models. I recognize how little information the R-squared on its own would provide, but I am always interested to know whether it is 0.05 or 0.70. Not a mention. They do, however, find fairly large effects, a bit too large to be believable, I think.
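To make my correspondent’s p-value/R-squared point concrete, here’s a minimal simulation (entirely made-up data, nothing from the paper in question): with enough observations, even a weak effect earns an astronomically small p-value while the model still explains almost none of the variance.

```python
# Minimal simulation (made-up data): a "significant" coefficient
# can coexist with a near-zero R-squared.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)  # weak signal, lots of noise

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(f"p-value for x: {fit.pvalues[1]:.1e}")  # astronomically small
print(f"R-squared:     {fit.rsquared:.3f}")    # roughly 0.04
```

Here the true slope is 0.2 against unit noise, so the R-squared lands near 0.04 even though the coefficient is estimated very precisely. Reporting only the p-value hides which of those two worlds you’re in.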

I didn’t have time to look into this one, so I won’t actually link to the paper; instead I’ll give some general reactions.

There’s something about that sort of study that rubs me the wrong way and makes me skeptical, but, as my correspondent says, the topic is important, so it makes sense to study it. My usual reaction to such studies is that I want to see the trail of breadcrumbs, starting from time series plots of local and aggregate data and leading to the conclusions. Just seeing the regression results isn’t enough for me, no matter how many robustness studies are attached. Again, this does not mean that the conclusions are wrong or even that there’s anything wrong with what the researchers are doing; I just think the intermediate steps are necessary to make sense of this sort of analysis of limited historical data.
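Here’s a quick sketch of the kind of breadcrumb I mean, using simulated data (the localities and numbers are invented for illustration): plot the raw local series and the aggregate before showing any regression table.

```python
# Sketch of a "trail of breadcrumbs" plot with simulated data:
# show readers the raw variation the model will actually be fit to.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
years = np.arange(2000, 2021)
local = rng.normal(size=(5, years.size)).cumsum(axis=1)  # 5 hypothetical localities

fig, ax = plt.subplots()
for i, series in enumerate(local):
    ax.plot(years, series, alpha=0.4, label=f"locality {i + 1}")
ax.plot(years, local.mean(axis=0), color="black", lw=2, label="aggregate")
ax.set_xlabel("year")
ax.set_ylabel("outcome (simulated)")
ax.legend(fontsize="small")
plt.show()
```

Nothing fancy; the point is just that readers can see the local and aggregate variation before being asked to trust a table of coefficients.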

12 thoughts on “What does it take, or should it take, for an empirical social science study to be convincing?”

    • I also thought there would be a link. But the wording from the correspondent looked familiar to me: I sent it to Andrew! Two years ago! In what I sent him, there was this link: https://link.springer.com/article/10.1007/s10940-021-09497-7. I know I didn’t spend $40 to read the article, but I must have seen most, if not all, of the content. Two years later, I can’t even recall whether this was, in fact, the article I was referring to. Also, I cited a “social media site,” but the reference was on Marginal Revolution, and it cited the published Springer article. Since I don’t look at social media unless directed there from something I read elsewhere, this whole exchange is puzzling. Even a two-year lag for Andrew seems out of the ordinary.

      • Missed that, sorry. I looked at the paper, but it looked like too much of a time investment for me as well. I mostly just wanted to see something like pages of p-values after every sentence, but I guess I didn’t get to that section.

  1. “What does it take, or should it take, for an empirical social science study to be convincing?”

    In general: irrefutable evidence. If any aspect of the paper is questionable, then it’s not convincing. P-values on their own, or, in many cases, even *any kind* of statistical evidence from a single analysis or sample, will almost never be enough to be convincing.

    But the question is as indicative of the social science disease as the papers themselves: why do we expect a single paper to be convincing? That’s not the way it works in the natural sciences. There’s so much that can go wrong in a single analysis that one paper almost never does the job. Everything is tested over and over again under varying conditions.

    Even if a paper is *mostly* convincing, its conclusions are tested again and again as other people attempt to build on the research. If extending the work fails, then the original work gets scrutiny again. The problem in social science is that people want to go policy-berserk on a single instance of flimsy evidence.

    I can only imagine where SpaceX would be if it used the social science approach to building reusable rockets. Probably out of business from blowing huge amounts of cash testing weak hypotheses that failed.

    • While I agree in spirit with what you are saying, I think it is unrealistic to hold social science to those standards so broadly. Many social science questions, ones with important policy implications, simply do not permit the kind of experimentation and repeated analysis you are suggesting. To take but one concrete example: does removal of entry restrictions in telecommunications markets increase or decrease investment (and then, where and what types of investment are affected)? This is an area I’ve done some work in. There is observational data, but it is quite limited in that each country has very different conditions, and the variation between states is limited. Experiments are not feasible, at least not in a time frame that could be of value. Even if experiments were run and conclusions established, the world will have changed by the time policy is impacted, undermining those conclusions.

      What we are left with is competing attempts to analyze the existing inadequate data in different ways. Your skepticism about any single study is well founded, and I share it. However, the claim that “everything is tested over and over again in varying conditions” is far from achievable for many important policy questions. One unfortunate consequence is that the evidence (limited though it is) is often ignored, and people rely on broad philosophical positions instead: either all regulation is bad and should be avoided, or these are natural monopolies and entry needs to be restricted. The evidence is certainly inadequate, but that does not make it worthless. This does not excuse analysis that is poorly done, sample data that is not representative of the population the inferences apply to, or researchers with ulterior motives pushing a particular agenda. Protecting against these practices usually requires some additional attempts at replication (in some sense). But holding many of these questions to the standard of the natural sciences is just unrealistic.

      • Dale:

        Sure, in the physical world or “natural” science world there is also frequently a need to make decisions with limited data. I’m referring more to research science than to practical applications, but see below.

        But two thoughts to add to that:

        First, just because a decision needs to be made with poor data doesn’t mean we have to be “convinced” by the poor data. Decisions should be scaled and implemented in a way that reflects the quality of the information that supports them.

        Second, it doesn’t do any good to fulfill a “need” with a solution that doesn’t work. The problem in social science is that questionable research seems to cross into practical applications, or be touted for them, before its soundness is demonstrated. Then it fails. Then what’s the cost-benefit? Implementation without thorough vetting that ends in failure usually means a pretty high cost with pretty low benefit, so whether it was “needed” in the first place becomes the operative question.

        But mostly I agree.
