Kaiser Fung’s review of “Don’t Trust Your Gut: Using Data to Get What You Really Want in Life” (and a connection to “Evidence-based medicine eats itself”)

Kaiser writes:

Seth Stephens-Davidowitz has a new book out early this year, “Don’t Trust Your Gut”, which he kindly sent me for review. The book is Malcolm Gladwell meets Tim Ferriss – part counter-intuition, part self-help. Seth tackles big questions: how to find love? how to raise kids? how to get rich? how to be happier? He invariably believes that big data reveal universal truths on such matters. . . .

Seth’s book interests me as a progress report on the state of “big data analytics”. . . .

The data are typically collected by passive observation (e.g. tax records, dating app usage, artist exhibit schedules). Meaningful controls are absent (e.g. no non-app users, no failed artists). The dataset is believed to be complete. The data aren’t specifically collected for the analysis (an important exception is the happiness data collected from apps for that specific purpose). Several datasets are merged to investigate correlations.

Much – though not all – of the analysis uses the most rudimentary statistics, such as statistical averages. This can be appropriate, if one insists one has all the data, or “essentially” all. An unstated axiom is that the sheer quantity of data crowds out any bias. This is not a new belief: for as long as Google has existed, marketing analysts have claimed that Google search data are fully representative of all searches since Google dominates the market. . . .
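That last point, the unstated axiom that sheer quantity of data crowds out any bias, is worth pausing on. Here’s a minimal simulation sketch (hypothetical numbers, nothing from the book): if the collection mechanism over-represents one group, the estimate converges to the wrong answer no matter how much data piles up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: group A (60%) has mean outcome 1.0, group B (40%) has mean 2.0.
true_mean = 0.6 * 1.0 + 0.4 * 2.0  # = 1.4

for n in [1_000, 100_000, 10_000_000]:
    # Passive collection that over-represents group A (twice as likely to be observed).
    p_a_in_sample = (0.6 * 2) / (0.6 * 2 + 0.4)  # = 0.75 instead of 0.60
    in_a = rng.random(n) < p_a_in_sample
    outcomes = np.where(in_a, rng.normal(1.0, 1.0, n), rng.normal(2.0, 1.0, n))
    print(f"n = {n:>10,}   sample mean = {outcomes.mean():.3f}   true mean = {true_mean}")

# The sample mean settles near 1.25, not 1.4: more data, same bias.
```

Back to Kaiser: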

If the analyst incorporates model adjustments, these adjusted models are treated as full cures of all statistical concerns. [For example, the] last few chapters on activities that cause happiness or unhappiness report numerous results from adjusted models of underlying data collected from 60,000 users of specially designed mobile apps. The researchers broke down 3 million logged events by 40 activity types, hour of day, day of week, season of year, location, among other factors. For argument’s sake, let’s say the users came from 100 places, ignore demographic segmentation, and apply zero exclusions. Then the 3 million points fell into 40*24*7*4*100 = 2.7 million cells… unevenly, of course, but even if they were spread evenly, each cell would contain an average of only 1.1 events. That means many cells contain zero events. . . . The estimates in many cells reflect an underlying model that hasn’t been confirmed with data – and the credibility of these estimates rests on the reader’s trust in the model structure.
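Kaiser’s cell count is easy to check, and even under the friendliest assumption, that the 3 million events are spread uniformly at random over the cells, roughly a third of the cells would still be empty. A quick back-of-the-envelope calculation (the 100 locations and zero exclusions are Kaiser’s simplifying assumptions):

```python
import math

events = 3_000_000
cells = 40 * 24 * 7 * 4 * 100   # activity x hour x day-of-week x season x place
rate = events / cells            # average events per cell

# Under a uniform (Poisson) spread, the chance a given cell gets no events is exp(-rate).
p_empty = math.exp(-rate)

print(f"cells: {cells:,}")                              # 2,688,000
print(f"events per cell: {rate:.2f}")                   # about 1.1
print(f"expected share of empty cells: {p_empty:.0%}")  # about 33%
```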

I observed a similar phenomenon, with model adjustments treated as complete fixes, when reading the well-known observational studies of Covid-19 vaccine effectiveness. Many of these studies adjust for age, an obvious confounder. Having included the age term, which quite a few studies proclaimed to be non-significant, the researchers spoke as if their models were free of any age bias.
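To see one way this can go wrong (setting aside the significance question), here’s a sketch with simulated data, not modeled on any particular study: age affects the outcome nonlinearly, the analyst adjusts with a linear age term, and the “adjusted” estimate of a treatment whose true effect is zero still comes out clearly nonzero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

age = rng.uniform(20, 90, n)
# Hypothetical uptake pattern: vaccination much more common above age 65.
vaccinated = (rng.random(n) < np.where(age >= 65, 0.9, 0.2)).astype(float)
# Hypothetical risk that rises quadratically with age; the treatment does nothing here.
outcome = 0.001 * (age - 20) ** 2 + rng.normal(0, 1, n)

# "Age-adjusted" model: outcome ~ intercept + vaccinated + age (linear in age).
X = np.column_stack([np.ones(n), vaccinated, age])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"estimated treatment effect after linear age adjustment: {beta[1]:.2f}")
# True effect is 0, but the estimate lands around +0.2: residual age confounding.
```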

Kaiser continues:

Only a blurred line separates using data as explanation and using it as prescription.

Take, for example, the revelation that people who own real-estate businesses have the highest chance of being a top 0.1% earner in the U.S., relative to other industries. This descriptive statistic is turned into a life hack: people who want to get rich should start real-estate businesses. Nevertheless, being able to explain past data is different from being able to predict the future. . . .

And, then, Kaiser’s big point:

Most of the featured big-data research aims to discover universal truths that apply to everyone.

For example, an eye-opening chart in the book shows that women who were rated at the bottom of the barrel in looks had half the chance of getting a response on a dating app when they messaged men in the most attractive bucket… but the absolute response rate was still about 30%. This produces the advice to send more messages to presumably “unattainable” prospects.

Such a conclusion assumes that the least attractive women are identical to the average woman on factors other than attractiveness. It’s possible that the women in this group who approach the most attractive-looking men have other desirable assets that the average woman does not possess.
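To make the concern concrete, here’s a small simulation (entirely hypothetical numbers, not from the book) in which only the low-rated women who have some other desirable asset tend to message the top-rated men. The observed response rate among messages actually sent then overstates what a randomly chosen woman in that group could expect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000  # hypothetical low-rated women considering messaging a top-rated man

# Hypothetical "other asset" (say, an unusually good profile) held by 20% of them.
has_asset = rng.random(n) < 0.2

# Hypothetical response probabilities from the top-rated men.
p_response = np.where(has_asset, 0.45, 0.10)

# Selection: women with the asset are much more likely to send the message at all.
p_send = np.where(has_asset, 0.60, 0.10)
sent = rng.random(n) < p_send

# Responses are observed only for messages actually sent.
responded = rng.random(sent.sum()) < p_response[sent]

print(f"observed response rate among senders:     {responded.mean():.0%}")   # ~31%
print(f"rate facing a randomly chosen such woman: {p_response.mean():.0%}")  # ~17%
```

Kaiser continues: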

It’s ironic, because with “big data” it should be possible to slice and dice the data into many more segments, moving away from the world of “universal truths,” which are statistical averages . . .

This reminds me of a post from a couple years ago, Evidence-based medicine eats itself, where I pointed out the contradiction between two strands of what is called “evidence-based medicine”: the goal of treatments targeted to individuals or subsets of the population, and the reliance on statistically significant results from randomized trials. Statistical significance is attained by averaging, which is the opposite of what needs to be done to make individualized or local recommendations.
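Here’s a small sketch of that tension with simulated data (hypothetical numbers): a treatment with a small, uniform effect is comfortably “statistically significant” when averaged over the whole trial, but once the same data are sliced into the segments you’d need for individualized recommendations, almost none of the segments clears the significance bar.

```python
import numpy as np

rng = np.random.default_rng(2)
n, segments, effect, sd = 40_000, 100, 0.05, 1.0

treat = rng.random(n) < 0.5                 # randomized treatment
y = rng.normal(effect * treat, sd)          # small, uniform true effect
seg = rng.integers(0, segments, n)          # patient "types" we might want to target

def z_stat(y, treat):
    a, b = y[treat], y[~treat]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

print(f"overall z-statistic: {z_stat(y, treat):.1f}")             # around 5
seg_z = [z_stat(y[seg == s], treat[seg == s]) for s in range(segments)]
print(f"segments with |z| > 2: {np.mean(np.abs(seg_z) > 2):.0%}")  # only a handful
```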

Kaiser concludes with a positive recommendation:

As with Gladwell, I recommend reading this genre with a critical eye. Think of these books as offering fodder to exercise your critical thinking. Don’t Trust Your Gut is a light read, with some intriguing results of which I was not previously aware. I enjoyed the book, and have kept pages of notes about the material. The above comments should serve as a guide if you want to go deeper into the analytical issues.

I think there is a lot more that can be done with big data; we are just seeing the tip of the iceberg. So I agree with Seth that the potential is there. Seth is more optimistic about the current state than I am.
