The AAA Tranche of Subprime Science

In our new ethics column for Chance, Eric Loken and I write about our current favorite topic:

One of our ongoing themes when discussing scientific ethics is the central role of statistics in recognizing and communicating uncertainty. Unfortunately, statistics—and the scientific process more generally—often seems to be used more as a way of laundering uncertainty, processing data until researchers and consumers of research can feel safe acting as if various scientific hypotheses are unquestionably true. . . .

We have in mind an analogy with the notorious AAA-class bonds created during the mid-2000s that led to the subprime mortgage crisis. Lower-quality mortgages—that is, mortgages with high probability of default and, thus, high uncertainty—were packaged and transformed into financial instruments that were (in retrospect, falsely) characterized as low risk. There was a tremendous interest in these securities, not just among the most unscrupulous market manipulators, but in a world where a lot of money was looking for safe investments and investors were willing to believe the ratings agencies and brokers.

Similarly, the concerns about reliability and validity of published results come after years of rapid expansion in the world of scientific output. In published research studies, data of varying quality are thrown together, processed, and analyzed and formed into statistically significant aggregates that are combined into research papers. . . .

How Is a Research Paper Like a Mortgage?

The analogy is anything but exact, but we see two equivalents in the modern scientific process to the aggregation and skimming that led to tranches of mortgages being declared AAA (high-quality) bonds. The first step is statistical significance. Out of the primordial soup of all possible data analyses, the statistically significant comparisons float to the top. . . . The second step is publication in a scientific journal, ideally a high-prestige outlet . . . but, if not at a top journal, any outlet will do. The convention is to treat published claims as true unless demonstrated otherwise.

By analogizing to the mortgage crisis, are we saying all research is over-valued? Not at all. Nor were all subprime mortgages destined to fail. The problem happened when reliable and less reliable components were bundled into a AAA tranche. The analogy might be to believe all papers published in Nature just because they are published in a top journal, or to believe the results of all published medical trials that have randomization in their designs.
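
To make that first step concrete, here is a minimal simulation sketch with toy numbers of my own (a small true effect, a fixed standard error, the usual 0.05 threshold); it is not from the article, just an illustration of how keeping only the statistically significant comparisons exaggerates whatever floats to the top:

```python
import numpy as np

rng = np.random.default_rng(1)

n_comparisons = 10_000   # hypothetical comparisons in the "primordial soup" (assumed)
true_effect = 0.1        # small real effect (assumed)
se = 0.5                 # standard error of each estimate (assumed)

estimates = rng.normal(true_effect, se, size=n_comparisons)
significant = np.abs(estimates / se) > 1.96   # the comparisons that float to the top

print(f"share significant:            {significant.mean():.3f}")
print(f"mean effect, all comparisons: {estimates.mean():.3f}")
print(f"mean |effect|, significant:   {np.abs(estimates[significant]).mean():.3f}")
# The selected subset overstates the true effect of 0.1 several-fold:
# filtering on significance works like re-rating noisy assets as AAA.
```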

Lots more at the link. We were motivated to write this article after reflecting on the defensive reactions of various scientists regarding issues of scientific plausibility and replication; see, for example, here, here and here (the last of which gave rise to the Freshman Principle).

So, yes, we do think that this is a big issue. Here’s how Eric and I conclude our article:

We would be troubled to see a generalized skepticism of science take hold, in which even well-established findings are up for grabs. The reality is that we have to make personal and political decisions about health care, the environment, and economics—to name only a few areas—in the face of uncertainty and variation. It’s exactly because we have a tendency to think more categorically about things as being true or false, there or not there, that we need statistics. Quantitative research is our central tool for understanding variance and uncertainty and should not be used as a way to overstate confidence.

24 thoughts on “The AAA Tranche of Subprime Science”

  1. It’s an interesting idea, the tranche, but I think you misidentify it. Nature and Science aren’t the ‘tranche.’ They are venues for heavily reviewed, but very exciting, findings. These two features weigh against each other. The review makes it less risky (it almost certainly does, actually) but the excitement requires that it be somehow risky. Ignoring that the novelty is inherently risky is silly, and commonplace.

    On the other hand, a real ‘tranche’ of science is the metareview, where a number of papers are bundled together and weighed. Some of these are very informal and almost certainly suspect. Just because x% of papers have some finding (particularly in a politically charged field) doesn’t mean much, especially if the papers use related methods and related standards of evidence, are published and reviewed by the same cadre of individual researchers and so on.

    A Cochrane review, on the other hand, is supposed to take the quality of the evidence in each paper, the relative independence of the evidence, and balance these together to assess the strength of the overall conclusion (http://www.cochrane.org/). If the review process fails, however, one could end up with a ‘tranche’ of subprime results; each result is weak and they are all autocorrelated.
    As has been implied many times, a collection of anecdotes does not sum to a randomized controlled trial.

    (At the same time, a big collection of anecdotes may be even better than a single RCT if the anecdotes effectively span a broad enough population and thus generalize more effectively. I had to throw that in.)
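
    A toy sketch of the meta-review ‘tranche’ worry above (the shared-bias setup and all the numbers are my own assumptions, not anything from this comment): pooling many weak studies makes the nominal uncertainty look tiny, but a bias common to all of them never averages away.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    true_effect = 0.0      # assume the intervention does nothing
    shared_bias = 0.3      # bias common to every study (same methods, same field)
    n_studies, n_per_study = 20, 50

    # Each study has its own sampling noise, but all inherit the same shared bias.
    study_means = true_effect + shared_bias + rng.normal(0, 1 / np.sqrt(n_per_study), n_studies)

    pooled = study_means.mean()
    nominal_se = 1 / np.sqrt(n_studies * n_per_study)   # SE assuming independent, unbiased studies

    print(f"pooled estimate: {pooled:.3f}  (truth is 0.0)")
    print(f"nominal SE:      {nominal_se:.3f}  -> looks like overwhelming evidence")
    # Bundling 20 weak, correlated studies does not average away their shared bias;
    # the review just reports that bias with great apparent confidence.
    ```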

    I do agree that statistics is used to whitewash questionable results within individual papers. That’s a large problem.

    The ‘tranche’ effect is another problem, particularly with the vast number of review papers filling new journals. In some hot areas, it has been said that the reviews outnumber the primary literature. This is why I refuse to write vanilla reviews; unless I have something really new to say, I better at least have new data to present. If I have something new to say based on existing evidence, it is probably controversial, so I shouldn’t drag other authors onto my controversial position – I will write as a single author. However, my stance is unusual, in part because it is so much easier to rack up peer-reviewed publications by writing review after review with several co-authors each.

    As a final remark, I have mentioned Cochrane reviews, so I should state clearly – I think that well-done Cochrane-style reviews are very challenging to write and contribute substantially to the clinical literature. I would not want to slander them. They are not typically done poorly, even if they could be done ‘even better’ by digging more deeply into the underlying data (which would require data availability).

    • “The review makes it less risky (it almost certainly does, actually)”

      Ben, do you have any evidence that peer review is useful? I have looked into it and could not find anything positive.

  2. I’m not sure exactly where people get the notion that senior tranches of subprime didn’t deserve AAA ratings. The AAA ratings for the senior tranches were still, in retrospect, mostly reasonable. For the junior tranches, though, not so much. I realize this is somewhat off-topic, since you’re only using this as a metaphor, but the mathematics and statistics behind the senior tranching, and tranching in particular, are still interesting. See, for example, http://www-2.rotman.utoronto.ca/~hull/downloadablepublications/AAArisk.pdf
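
    For readers curious about the mechanics, here is a minimal one-factor Gaussian copula sketch with toy parameters of my own choosing (the Hull paper linked above treats this far more carefully): the senior tranche is touched only in the extreme tail, which is why it can plausibly look AAA, and how often that tail shows up depends almost entirely on the assumed correlation.

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)

    n_loans, p_default, lgd = 100, 0.10, 0.5   # risky pool: 10% default rate, 50% loss given default
    threshold = norm.ppf(p_default)            # default when the latent asset value falls below this
    attach = {"junior": 0.00, "mezzanine": 0.05, "senior": 0.15}   # attachment points (fraction of pool lost)
    n_sims = 50_000

    for rho in (0.0, 0.15, 0.40):              # assumed asset correlation -- the contentious input
        common = rng.normal(size=(n_sims, 1))                     # market-wide factor
        idio = rng.normal(size=(n_sims, n_loans))                 # loan-specific factors
        asset = np.sqrt(rho) * common + np.sqrt(1 - rho) * idio   # one-factor Gaussian copula
        pool_loss = lgd * (asset < threshold).mean(axis=1)
        hits = "  ".join(f"{name}: {(pool_loss > a).mean():6.2%}" for name, a in attach.items())
        print(f"rho={rho:.2f}  {hits}")
    # With rho = 0 the senior tranche is essentially never touched, so it can look AAA;
    # as the assumed correlation rises, its loss probability climbs sharply.
    ```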

    The proper analogy would be, if there are N articles published on a subject, what are the chances that the best of them is pretty good? This will depend not only on N, but on the underlying phenomenon being studied and the independence of the researchers publishing the articles, right? In the normal case, wouldn’t the best of a large lot of papers on a topic be pretty good?

    • Jonathan:

      You ask, “wouldn’t the best of a large lot of papers on a topic be pretty good?” The problem is that, in some subfields, the papers that appear in top journals and get a lot of attention are not the best papers; rather, they’re the papers that make the most dramatic claims.

      • I agree completely, but that’s where the analogy breaks down… the top tranche of a CDO gets the best mortgages ex post, not the best looking ones ex ante, or the ones underwritten by the best originators, or in the best areas, or whatever.

        The job of someone actually covering some subfield for a living is to critically judge the best papers, even if they appear in the best journals, and to pull from obscurity better papers in more obscure journals, not to mention doing good work themselves. Or one can do it as an informed outside observer, at the risk of being rejected for not being in the club. I don’t have a way to actually find the best papers — I’m just asserting that they’re pretty good. In fact, I can assert with great authority that they’re much better than the mean of those with which they’re being compared….

  3. BenK:

    The Cochrane Collaboration is an academic collaboration and so subject to the pressures and incentives and lack of audit-type review that Andrew raises – they might even be described as a mutual fund.

    > supposed to take the quality of the evidence in each paper, the relative independence of the evidence, and balance these together to assess the strength of the overall conclusion

    They do try, but, for instance, it’s not the quality of evidence but the apparent quality in the published paper, and even if one takes apparent as real, no one has yet figured out a credible way to make quality adjustments/allowances – see section 8.8.4 of their handbook or Greenland S, O’Rourke K. On the bias produced by quality scores in meta-analysis, and a hierarchical view of proposed solutions. Biostatistics 2001; 2: 463-471.

    > If the review process fails
    But it will fail if there is a small percentage of studies that misrepresent what happened to patients in (all) the trials and/or improperly process and present that information in the paper (unfortunately, misrepresented studies can have high leverage). That percentage is not well known, but even Cochrane members’ own research has documented it as worrisome (e.g. how often supposed primary analyses are changed from REB applications to published papers).

    > so much easier to rack up peer-reviewed publications by writing review after review
    Good ones are hard to do and arguably of more value than undertaking new research uninformed by what a good review would suggest about how the new research should be carried out.

  4. Bringing up the subprime mortgage crisis makes me wonder about the analogy to the role played by credit default swaps. Specifically, is the problem of these ridiculous claims made even worse by some kind of leverage? I think so.

    1. A paper in the tabloids can lead to fame and more grant money. It provides a kind of career leverage. I’ve heard it said you can’t get a good position in bio without a paper in the tabloids, but that’s very limited hearsay.

    2. If it’s something important and not silly like Andrew’s examples, then there will probably be attempts to replicate, often at great expense in money and wasted career time.

    3. A paper in the tabloids is also likely to go through the PR department at a university and then be picked up by the popular press. Like the credit default swaps, this market is even less regulated than the tabloids. Just look at a week’s worth of Google News, subtopic Health to see what I mean. My favorite from just this second is “Smoking ‘pregnant’ mums ‘can turn babies gay'”, which seems to have been picked up by every UK media outlet. But are any pregnant mums going to start or stop smoking in an effort to control their offspring’s behavior? I doubt it.

  5. I’ve tried to develop this analogy too. (Actually, I’m not sure it’s just an analogy. I think the two phenomena, namely, the “laundering” of research results and credit risks respectively, have the same root: the loss of critical perspective in the face of short-term gain. Anyway…) In The Big Short, Michael Lewis points out this interesting problem that people who noticed that something was wrong with bundled mortgage bonds ran into:

    “You could buy them or not buy them but you couldn’t bet explicitly against them; the market for subprime mortgages simply had no place for people in it who took a dim view of them. You might know with certainty that the entire mortgage bond market was doomed, but you could do nothing about it.” (p. 29)

    My point is: just consider how difficult it is to publish work that “takes a dim view” of a piece of published research. And how relatively easy it is to publish, ahem, dimwitted research results.

    • Betting against a bubble is all about timing, and typically a sure way to lose money. But at least in markets bubbles always burst. Ultimately fundamentals make pricking the bubble so attractive that there is always some gambler willing to take the odds, and the whole thing comes crashing down. (Jacob Frankel had a nice model of chartists versus fundamentalists in currency markets I’ve always liked.)

      In science I am not sure this ever happens. Even if the science is all noise there is typically very little to be gained from pricking a bubble. A better option is to create a new bubble. After all, there is no central bank, or aggregate money constraint, there is no limit to the number of bubbles. If anything journals are forever expanding the bubble supply.

      For all I know we just go from one bubble to another without so much as a real anchor. Arguably much of social science is in a permanent state of “effervescence”.

        • Yes. (See also my comment below to Jeremy Fox.) One of the annoying things about research is that there’s no sense of limited resources, no “ecological” (let alone economic) sensibility. So nobody is properly indignant about all the waste that circulates in the system, all the pollution. Nor is anyone properly concerned about the opportunity costs involved in chasing after exciting but ultimately shoddy conclusions. If every researcher actually had a reason (by which I do mean incentive) to do his or her own work efficiently, and in a way that minimized the chance of errors being widely distributed as results, then there’d be a lot less getting published, and a lot more sense being made. Still, one does not want to stifle creativity, right? So sometimes I don’t know what to think.

        • Thomas:

          I don’t think the solution is to publish less but rather to publish more, a whole lot more. If people did not have to compete for fancy but limited journal space, they would have less incentive to bubble away. Just publish on the basis of the method, not the findings.

    • PS In science the equivalent to a central bank, or a limited money supply, is the scientific method. Being a public good it can be used to measure fundamentals, but not to limit the number of bubbles. Its consumption is non-rival, non-exclusive, and optional…

      • Let me return the compliment, and I’m also in complete agreement with you. There must be a way of making it in people’s interest to identify errors. But the only way to do that may be to make it more risky to make scientific errors, at least in print. I don’t think that would be such a bad thing. More solid, original research and less flighty, derivative research might be a good idea. Less “speculation”, greater focus on “fundamentals”. Maybe.

        But we don’t want researchers to start getting all inhibited about what they say in print either.

  6. Thanks. Fascinating analogy. Here’s something I blogged in 2009 about what was fundamentally wrong with the statistical models used to evaluate tranches of mortgages:

    Felix Salmon has a readable article in Wired called “Recipe for Disaster: The Formula that Killed Wall Street” on David X. Li’s wildly popular 2000 financial economics innovation, the Gaussian copula function, which was used to price mortgage-backed securities by estimating the correlation in Time to Default among different mortgages.

    Li has an actuarial degree (among others), and that appears to have been his downfall: he assumed mortgage defaults were like Time to Death to a life insurance actuary: largely random events that could be modeled.

    Steve Hsu’s website Information Processing has a 2005 WSJ article on Li’s Gaussian copula, for looking at events that are mostly independent but have a modest degree of correlation:

    In 1997, nobody knew how to calculate default correlations with any precision. Mr. Li’s solution drew inspiration from a concept in actuarial science known as the “broken heart”: People tend to die faster after the death of a beloved spouse. Some of his colleagues from academia were working on a way to predict this death correlation, something quite useful to companies that sell life insurance and joint annuities.

    “Suddenly I thought that the problem I was trying to solve was exactly like the problem these guys were trying to solve,” says Mr. Li. “Default is like the death of a company, so we should model this the same way we model human life.”

    Uh, maybe, maybe not. There just isn’t much in the field of life insurance where selling more life insurance increases the risk of death. The life insurance companies figured out the basics of moral hazard a long time ago: don’t let people take out insurance policies on their business rivals or their ex-wives to whom they owe alimony. No tontines. Don’t pay out on new policyholders who die by suicide.

    In contrast, giving somebody a bigger mortgage directly raises the chance of default because they need more money to pay it back. Giving them a bigger mortgage because you are requiring a smaller down payment, in particular, raises the risk of default.

    His colleagues’ work gave him the idea of using copulas: mathematical functions the colleagues had begun applying to actuarial science. Copulas help predict the likelihood of various events occurring when those events depend to some extent on one another. Among the best copulas for bond pools turned out to be one named after Carl Friedrich Gauss, a 19th-century German statistician [among much else].

    The Gaussian distribution (a.k.a., normal distribution or bell curve) works like this: Flip a coin ten times. How many heads did you get? Four. Write it down and do it again. Seven. Do it again. Five. As you keep repeating this flip-a-coin-ten-times experiment, the plot of the number of heads you get each time will slowly turn into a bell curve with a mean/median of five.

    Now, that’s really useful and widely applicable. Processes that add up many independent random events will tend toward a bell curve distribution.

    But the Housing Bubble didn’t consist of fairly random events that everybody was trying pretty hard to avoid, like with life insurance. Instead, human beings were responding to incentives. The closest actuarial analogy might be the big insurance payouts that fire insurance companies got stuck with in the South Bronx in the 1970s when decayed businesses that were now worth less than their fire insurance payouts developed a statistically implausible tendency to burst into flames in the middle of the night.

    • Mr. Sailer: read the article I linked to above… It’s quite readable and gives the counterpoint to the Salmon article, which overstates the case rather wildly. Copulas don’t assume independence, and the proper modeling of correlation is critical. And the models (for the senior tranches, I stress) didn’t really do that badly. The error came in the lower tranches, and in leverage. Just my two cents.

    • Steve, the bell curve results from a series of independent additive events (to see it most obviously, go play with a quincunx). If there are feedbacks such that event 1 makes event 2 more/less likely (most biological systems) then you should expect a distribution with heavy tails.
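
      A small sketch of that contrast, with a toy reinforcement rule of my own (a Polya-urn-style feedback, not anything proposed in the thread): sums of independent flips settle into the familiar bell curve, while a process in which each head makes the next head a bit more likely spreads out into much heavier tails.

      ```python
      import numpy as np

      rng = np.random.default_rng(4)
      n_flips, n_runs = 100, 10_000

      # Independent events: heads in 100 fair flips -> bell curve centered at 50.
      independent = rng.binomial(n_flips, 0.5, size=n_runs)

      # Feedback: each head nudges up the probability of the next head (each tail nudges it down).
      def feedback_run(rng, n=n_flips, boost=0.02):
          p, heads = 0.5, 0
          for _ in range(n):
              flip = rng.random() < p
              heads += flip
              p = min(max(p + (boost if flip else -boost), 0.01), 0.99)
          return heads

      with_feedback = np.array([feedback_run(rng) for _ in range(n_runs)])

      for name, counts in [("independent", independent), ("with feedback", with_feedback)]:
          print(f"{name:14s} mean={counts.mean():5.1f}  sd={counts.std():5.1f}  "
                f"P(count >= 80) = {np.mean(counts >= 80):.4f}")
      # The independent process almost never strays far from 50;
      # the feedback process regularly lands in the extremes -- heavier tails.
      ```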

  7. I’m not familiar with any financial equivalents of meta-analytical review articles _during_ the subprime bubble. Perhaps they were conducted by the handful of individual winners like John Paulson described in Michael Lewis’s The Big Short? A major difference between academia and finance is simply public information vs. private information. Securitization just added more levels of secrecy. In practice, securitization turned out to be secretization, just as in Russia in the 1990s, privatization turned out to be piratization.

    • I published a short story in 2008 about two brothers-in-law who speculate on an exurban McMansion in 2005. The more dominant of the two, who has a tendency toward malapropisms, explains to his novice in-law:

      “In fact, I think I’m going to pick up one of these babies, too, and sell it in six months. We’ll be neighbors! Sort of. The mortgage company get a little snottier about down payments and interest rates when you tell them it’s an investment, so I’ll just check the “owner occupied” box. The broker doesn’t care. He gets his commission, then Countrywise bundles it up with a thousand other mortgages and sells it to Lemon Brothers. The Wall Street rocket scientists call this “secretization” because nobody can figure out what anything’s worth. It’s a secret.

      “Lemon sells shares in the package all around the world. The Sultan of Brunhilde ends up owning a tenth of your mortgage. Do you think the Sultan’s going to drive around Antelope Valley knocking on doors to see if you’re really living there?”

      http://isteve.blogspot.com/2010/03/unreal-estate.html

  8. Would Popper have bristled at analogies between falsifiability and the pre-crash production of AAA debt ratings? Probably not, insofar as he would almost certainly have dismissed the comparison as invalid. That debt ratings have historically been paid for by munis and corporates is irrefutable testimony to their compromised status – a practice that continues today. The implicit question raised by this post is the extent to which academia is similarly compromised – the AAA tranche in the university. The revolving door between the corporation and the university is only one aspect of it. From questionable clinical trials and legitimate challenges to the validity of peer review to research that doesn’t replicate, you don’t have to be a behavioral scientist to know that academics are as susceptible to cognitive biases as the rest of us: overconfidence, herding, groupthink that suppresses skepticism and debate. This is the trench warfare that characterizes the scientific process. Bees in their hive-location decision-making process do a better job of it.

    Corporations can survive and profit because they standardize the monetization of virtually every aspect of their business…this includes the thinking and assumptions that go into their day-to-day routines. Questioning those assumptions amounts to a disruption in otherwise smooth operations, and disruptions are to be avoided as interfering with the business of generating profits.

    Are academics that different?

