Advice that’s so eminently sensible but so difficult to follow

When we suggest a new method, we are duty-bound to not just demonstrate that it works better than existing approaches (or is superior in some other way such as simplicity or cost). We also need to explain why, if this new method is so great, people aren’t already using it.

Various answers are possible, for example:

– The new idea is technically advanced, requiring a level of mathematical or engineering complexity such that it could not easily have been discovered by accident. Hence its novelty can be explained as a product of some particular historical process.

– The new idea is clever and unexpected, as with the mechanical device underlying Rubik’s Cube.

– The new idea could only exist given recent technological developments (perhaps hardware developments such as a new composite material or an ultralight battery, or software developments such as a new MCMC algorithm).

– The new idea usually isn’t so impressive but it shows its virtues in some previously hidden domain (for example, you wouldn’t have much need for relativity theory if you’re modeling sub-relativistic velocities).

– The new idea violates some principle or taboo (for example, until recently there wasn’t much formal research on weakly informative prior distributions because the literature on Bayesian statistics was divided between quests for noninformativity and assertions of complete subjective informativity).

– The new idea was actually out there all along, but people didn’t use it because until recently they had no need for its benefits (for example, a method that offers a 10% improvement in speed at the cost of requiring an elaborate computational implementation might not be worth it for a desktop regression but can come in handy if you’re analyzing billions of data points).

You can probably come up with some more. My point is that when we promote a new idea, we must—explicitly or implicitly—explain why our brilliant predecessors did not already discover and use it.

I thought about the above after reading this from John Cook:

According to a recent biography of Henri Poincaré,

Poincaré … worked regularly from 10 to 12 in the morning and from 5 till 7 in the late afternoon. He found that working longer seldom achieved anything …

Poincaré made tremendous contributions to math and physics. His two-hour work sessions must have been sprints, working with an intensity that could not be sustained much longer.

I [Cook] expect most of us would accomplish more if we worked harder when we worked, rested more, and cut out half-work.

I agree, but . . . I’ve thought this for a long time, as I’m sure has almost anybody who works at a flexible job. It’s long been my goal to work intensely for whatever number of hours a week is possible and then relax the rest of the time, rather than spending hours and hours each week rearranging my files, responding to email, etc.

Yet this doesn’t always happen (hence it’s long been my goal etc.). I occasionally make some progress (for example, my strategy of reviewing journal articles immediately, just reading the article and writing the report in 15 minutes, or my strategy of not reading email before 4pm), but I still spend lots of time doing essentially nothing yet still hanging out at work. (Not to mention blogging, but that at least serves some socially useful purposes.)

Which brings us to the question: if this is such good advice, and such obviously good advice, why aren’t we doing it already? I think one reason is that it’s hard to work intensely, and once you’re at work it’s easier to spend your time quasi-goofing-off. I was once at a workshop where the person next to me was checking email on the laptop, literally more than once per minute. Seems pretty boring, but it beats working!

I’m reminded of the advice in the classic book, How to Talk So Your Kids Will Listen and Listen So Your Kids Will Talk. It’s all such clearly good advice, but somehow so difficult to carry out in real time.

P.S. One of the comments on Cook’s post led me to this website by computer scientist Cal Newport which is full of advice for students, along the lines of: “‘follow your passion’ is bad advice if your goal is to end up loving what you do.” I like what Newport has to say. Unlike the usual in-your-face internet self-help gurus, he doesn’t seem to feel the need to be obnoxious or to supply having-it-all parables. He does do a little of that B.S.—for example, a post entitled, “How to Get Into Stanford with B’s on Your Transcript”—but, even there, the substance of the post is interesting. Gladwellian, one might say, and I mean that in the best possible sense of the word.

10 Comments

  1. Entsophy says:

    Well, I for one followed Poincare’s habits precisely. I do all my work in two hours of concentrated effort right before it’s due.

  2. zbicyclist says:

    I might add: The new idea makes it more likely that your result may be shown to be artifactual.

    I’ve thought a LOT about the minimal use of holdout samples (and other cross-validation). This is partly because I work in a data-rich environment — if there is data on the topic at all, there is likely to be too much of it.

    So why don’t people always use holdout samples to validate? This is because the use of a holdout sample (or other cross-validation) makes it more likely that your result will disappear. You can’t morally present to a client or publish in a journal results you know in your heart aren’t true, so it’s easier just not to find out.

    I’m using cross-validation as an example of the principle here; holdout samples have, of course, been available as a technique for many decades.
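    [Editor's note: zbicyclist's point, that a holdout sample makes an apparent result more likely to disappear, can be sketched in a few lines of Python. This is a hypothetical illustration, not from the original post: the data are pure noise, so any "best" feature found in-sample is spurious, and its correlation typically collapses on the held-out half.]

```python
import random

random.seed(0)
n, n_features = 200, 50
# Outcome is pure noise: no feature is truly predictive.
y = [random.gauss(0, 1) for _ in range(n)]
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(n_features)]

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / (va * vb) ** 0.5

half = n // 2
# Pick the feature with the strongest in-sample correlation...
best = max(range(n_features),
           key=lambda j: abs(corr(X[j][:half], y[:half])))
in_sample = corr(X[best][:half], y[:half])
# ...then check the same feature on the held-out half.
held_out = corr(X[best][half:], y[half:])
print(f"in-sample r = {in_sample:.2f}, holdout r = {held_out:.2f}")
```

    Because the "best" of 50 noise features was selected for its in-sample fit, its in-sample correlation is inflated by the selection, while the holdout correlation is an honest draw centered on zero.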

    • K? O'Rourke says:

      I suspect that’s why epidemiologists, and almost anyone else analyzing non-randomized studies, avoid trying to quantify the (residual) biases involved in any meaningful way.

      More generally it can be cultural, such as professional statisticians who just don’t plot things (e.g., marginal priors and posteriors); they properly deal with them analytically (or just remain silent).

      I did once have a work schedule somewhat like Poincare’s that my Director was fine with, but they warned me not to let anyone in the organization know about it, as there would be negative repercussions. It is hard to convince oneself that you are not letting anyone else, including yourself, down (like that earlier example about marks and being religious and not having pre-marital sex even when it is very unlikely to do any harm).

      And today Poincare would likely be blogging between his two hours…

  3. Radford Neal says:

    when we promote a new idea, we must—explicitly or implicitly—explain why our brilliant predecessors did not already discover and use it.

    I don’t agree, because there are plenty of ideas that just didn’t get thought of, for no good reason.

    For example, Hamiltonian Monte Carlo, discovered in 1987, could easily have been discovered in 1960. It might possibly have been of less use, with the computers of the time, but it would have seemed potentially useful, and publishable, if anyone had thought of it. To take another example, Low Density Parity Check codes actually were discovered in the early 1960s, but largely forgotten until they were rediscovered by David MacKay and myself in 1996. And the big, big example is that Bayesian statisticians could have been using MCMC since about 1960, but apparently didn’t hear about it (or re-discover it themselves), for no particular reason.

    • Andrew says:

      Radford:

      1. I’d put HMC in the Rubik’s Cube category above: an idea that wasn’t discovered earlier because it’s clever: somebody had to think it up.

      2. I don’t know what low density parity check codes are, but I agree that MCMC could’ve been ported over earlier from physics to statistics. Here, though, I think there is a story, which is that the statisticians of 1960 wouldn’t have known what to do with MCMC. Consider the difference between typical physics applications and typical statistics applications. The physicists tend to work on large fixed problems (for example, the Ising model), and it makes sense for them to throw lots of resources at getting an accurate solution. In contrast, the statisticians tend to jump from problem to problem and search for general methods, hence it’s less appealing to have a very expensive method for a particular problem. If statisticians had known about MCMC in 1960, my guess is they would’ve used it to do simulation studies to evaluate the statistical properties of various methods, and they would’ve used it for the occasional stand-alone analysis (for example, the Mosteller and Wallace study of the Federalist papers) but not as a routine data analysis tool to set alongside regression, correlation, Anova, etc. Consider how MCMC ended up entering statistics: it was through a particularly complicated application area (image analysis and spatial models) and through an area where simulation was already being accepted (missing-data imputation).

      This is not to say that the statisticians of 1960 couldn’t have made some good use of MCMC, just that I think there are reasons it took as long as it did to enter the statistical toolkit.

      • Radford Neal says:

        Well, I’d say that HMC isn’t really all that clever, if you start out knowing about the Metropolis algorithm and about molecular dynamics simulations, and quite a few physicists in 1960 would have known about both. Of course it’s somewhat clever – if it weren’t, it wouldn’t count as a new idea at all…

        Another example where I have some involvement is arithmetic coding for data compression. It also was discovered, in very impractical form, back about 1960. It took about 20 years before practical versions were developed, for no particular reason (of course some cleverness is required, but nothing that was beyond the people of 1960), and a few more years after that before it became well known (partly due to a tutorial/review paper, with software, that I co-authored).

  4. Ely Spears says:

    Reminds me a bit of the New Economics Foundation’s paper on alleged benefits of a 25-hour work week, to both employer and employee.

  5. Steve Roth says:

    Back when I used to run companies (I just have a couple of little ones now that don’t require more than a few hours a week), I’d go on long vacations with the family and always discover that in an hour of emails in the morning and a few phone calls, I could do 80% of what I spent six to twelve hours doing otherwise.

    OTOH, if I hadn’t spent all those hours when I wasn’t on vacation, maybe this would not have been true.

  6. Steve Sailer says:

    Successful novelists tend to write about four hours per day, most every day though.

  7. […] computer science professor; more available here; Andrew Gelman comments about work habits here. (Study Hacks, Gelman […]