
“Luckily, medicine is a practice that ignores the requirements of science in favor of patient care.”


Javier Benitez writes:

This is a paragraph from Kathryn Montgomery’s book, How Doctors Think:

If medicine were practiced as if it were a science, even a probabilistic science, my daughter’s breast cancer might never have been diagnosed in time. At 28, she was quite literally off the charts, far too young, an unlikely patient who might have eluded the attention of anyone reasoning “scientifically” from general principles to her improbable case. Luckily, medicine is a practice that ignores the requirements of science in favor of patient care.

I [Benitez] am not sure I agree with her assessment. I have been doing some reading on the history and philosophy of science (there’s not much on the philosophy of medicine), and this is a tough question to answer, at least for me.

I would think that science, done right, should help, not hinder, the cause of cancer decision making. (Incidentally, the relevant science here would necessarily be probabilistic, so I wouldn’t speak of “even” a probabilistic science as if it were worth considering any deterministic science of cancer diagnosis.)

So how to think about the above quote? I have a few directions, in no particular order:

1. Good science should help, but bad science could hurt. It’s possible that there’s enough bad published work in the field of cancer diagnosis that a savvy doctor is better off ignoring a lot of it, performing his or her own meta-analysis, as it were, partially pooling the noisy and biased findings toward some more reasonable theory-based model.

2. I haven’t read the book where this quote comes from, but the natural question is, How did the doctor diagnose the cancer in that case? Presumably the information used by the doctor could be folded into a scientific diagnostic procedure.

3. There’s also the much-discussed cost-benefit angle. Early diagnosis can save lives but it can also have costs in dollars and health when there is misdiagnosis.
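Point 1's idea of partially pooling noisy and biased findings toward a theory-based model can be sketched with the standard normal-normal shrinkage formula. This is a minimal illustration under stated assumptions: each published estimate is taken to be normal around its true value with a known standard error, and the "theory" is represented by a normal prior with a fixed mean and spread. The function name `partially_pool` and all numbers are hypothetical, not drawn from the cancer-diagnosis literature.

```python
import numpy as np


def partially_pool(estimates, std_errors, prior_mean, prior_sd):
    """Shrink noisy estimates toward a theory-based prior (normal-normal model).

    Each estimate gets weight prior_var / (prior_var + se^2), so noisier
    findings (larger standard errors) are pulled harder toward prior_mean.
    """
    estimates = np.asarray(estimates, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    prior_var = prior_sd ** 2
    weight = prior_var / (prior_var + se2)
    pooled_mean = weight * estimates + (1.0 - weight) * prior_mean
    # Posterior standard deviation: precisions (inverse variances) add.
    pooled_sd = np.sqrt(1.0 / (1.0 / prior_var + 1.0 / se2))
    return pooled_mean, pooled_sd


# Hypothetical published effect estimates and their standard errors.
raw = [0.8, 0.1, 0.5]
se = [0.4, 0.05, 0.2]
means, sds = partially_pool(raw, se, prior_mean=0.1, prior_sd=0.1)
print(np.round(means, 3))  # the noisy 0.8 is pulled most of the way back toward 0.1
```

In a fuller analysis the prior mean and spread would themselves be estimated from the collection of studies (a multilevel model) rather than fixed by hand.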

To the extent that I have a synthesis of all these ideas, it’s through the familiar idea of anomalies. Science (that is, probability theory plus data plus models of data plus empirical review and feedback) is supposed to be the optimal way to make decisions under uncertainty. So if doctors have a better way of doing it, this suggests that the science they’re using is incomplete, and they should be able to do better.

The idea here is to think of the “science” of cancer diagnosis not as a static body of facts or even as a method of inquiry, but as a continuously-developing network of conjectures and models and data.

To put it another way, it can make sense to “ignore the requirements of science.” And when you make that decision, you should explain why you’re doing it—what information you have that moves you away from what would be the “science-based” decision.
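Point 2 above, that the information the doctor used could be folded into a scientific diagnostic procedure, is easiest to see with Bayes' rule in odds form: start from a low base rate (the patient is young) and multiply by a likelihood ratio for each finding. The sketch assumes the findings are conditionally independent given disease status, and the prior and likelihood ratios are made-up numbers chosen only to show how a very unlikely patient can still end up with a probability high enough to warrant follow-up.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with likelihood ratios (odds form of Bayes' rule).

    Assumes the findings are conditionally independent given disease status,
    so their likelihood ratios multiply.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical numbers: a 1-in-1000 age-based prior, then two findings
# (say, a palpable lump and a suspicious image) with likelihood ratios 20 and 10.
p = posterior_probability(0.001, [20.0, 10.0])
print(f"{p:.3f}")  # roughly 0.17: still "improbable", but far from negligible
```

Whether a probability of that size justifies a biopsy is then a decision question, which is where the cost-benefit angle in point 3 comes in.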

Benitez adds some more background:

As I’m sure you already know, what science is and how it is practiced mean different things to different people. Although it is pretty significant, this is just one quote from her book :) I may be wrong, but I think she is a literary scholar interested in the epistemology of medicine. Here are a few links to give you more context on the book:

  1. This book argues that medicine is not itself a science but rather an interpretive practice that relies on clinical reasoning.
  2. She makes it clear that medicine is not a science, but a science-using practice with a collection of well-honed skills involving a special familiarity with death.
  3. Here Montgomery shows, with example after example, just why we should see medicine, not so much as a science but rather as situational reasoning serving a practical end; an endeavour based upon, but distinct from, medical science.
  4. She suggests that “science is a tool, rather than the soul of medicine” and that medicine “is neither a science nor an art. It is a distinctive, practical endeavor whose particular way of knowing . . . qualifies it to be that impossible thing, a science of individuals”.
As you probably already know, we memorize lots of facts and get very little training in statistics and philosophy, so asking a doctor whether the practice of medicine is a science poses a challenging question. I also think it’s a very important question, and addressing it would benefit the field.

This all raises interesting questions. I agree that it would be a mistake to call medicine a science. As is often the case, I like the Wikipedia definition (“Science is a systematic enterprise that builds and organizes knowledge in the form of testable explanations and predictions about the universe”). Medicine uses a lot of science, and there is a science of medicine (the systematic study of what is done in medicine and what are the outcomes of medical decisions), but the practice of medicine proceeds case by case.

It’s similar with the practice of education, or for that matter the practice of scientific research, or the practice of nursing, or the practice of truck driving, or any job: it uses science, and it can be the subject of scientific inquiry, but it is not itself a science.


  1. Z says:

    “If medicine were practiced as if it were a science, even a probabilistic science, my daughter’s breast cancer might never have been diagnosed in time. At 28, she was quite literally off the charts, far too young, an unlikely patient who might have eluded the attention of anyone reasoning “scientifically” from general principles to her improbable case.”

    She seems to equate ‘reasoning scientifically’ with just using the prior probability of a diagnosis given age and medical history and not updating appropriately based on symptoms and test results.

    • Garnett says:

      I agree. It seems the issue is how narrow the clinician’s prior happens to be in each particular situation.

    • Karl says:

      Also, Andrew mentions briefly the cost/benefit tradeoffs here—I think the author is also maybe equating “probabilistic science” with something like “maximizing expected outcomes under uncertainty while trading off for cost.” If you do this, you can change the function you’re maximizing under uncertainty to be more risk-averse—there are always precision/recall knobs to be twiddled, even in the most Principled versions of reasoning here!

    • And equating science with naive empiricism/rationality.

      Science, as a well-defined term, ought to (by definition) encapsulate the optimal solution set to a given problem. Otherwise, saying ‘science is wrong’ would be a reasonable statement, which seems nonsensical to me.
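Karl's point about precision/recall knobs can be made concrete. If missing a cancer costs `cost_fn` and a false alarm costs `cost_fp`, minimizing expected cost means flagging a patient whenever P(disease) exceeds cost_fp / (cost_fp + cost_fn), so a more risk-averse loss function lowers the threshold. The sketch assumes the model's probabilities are calibrated and that these two costs capture everything that matters; the function names, scores, and labels are hypothetical.

```python
def decision_threshold(cost_fp, cost_fn):
    """Probability above which flagging minimizes expected cost.

    Flag when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def precision_recall(probs, labels, threshold):
    """Precision and recall when every case with probability >= threshold is flagged."""
    flagged = [p >= threshold for p in probs]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Hypothetical model outputs and true disease status for six patients.
probs = [0.02, 0.06, 0.10, 0.30, 0.60, 0.90]
labels = [False, True, False, False, True, True]

neutral = decision_threshold(cost_fp=1.0, cost_fn=1.0)      # 0.5
risk_averse = decision_threshold(cost_fp=1.0, cost_fn=19.0)  # 0.05
print(precision_recall(probs, labels, neutral))      # higher precision, lower recall
print(precision_recall(probs, labels, risk_averse))  # recall 1.0, precision drops
```

The probability model is the same in both lines; only the costs, and therefore the threshold, change.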

  2. jrkrideau says:

    Medicine is not a science just as engineering is not a science. Both apply scientific knowledge often with amazing results but the typical doctor or engineer is not researching to create new knowledge.

  3. Oncodoc says:

    I think that there is a difference between the analytic approach to a problem and the synthetic process in medicine. Analysis literally means to cut apart, to study a problem by isolating its components into smaller pieces that can be studied better. This contrasts with the putting-together, synthetic process of diagnosis. The ideal diagnostician behaves like Sherlock Holmes, who was actually based on a real doctor who was a remarkable synthesizer (putter-together) of information. The patient complains of chest pain; I note that she is tall, slender, and young; when I shake her hand I note long fingers, and I put these facts together to make a diagnosis of Marfan’s syndrome with aortic dissection as the cause of the pain. A woman in her twenties tells me about a breast lump; I note that she is of Ashkenazi descent and order further studies on the mass. In doing these things I do have to have an understanding of pre-test probability and epidemiology, and the ability to rank outcomes. Missing the aortic dissection could result in such a disastrous outcome that I rank excluding it higher than a purely objective ranking of all the competing diagnoses would.
    The problem with the synthetic approach is that it is hard to teach. Additionally, it is easy to become fixated on an idea, treating every bit of supporting evidence as a gold nugget and ignoring nonsupportive data.

    • Jonathan (another one) says:

      Of course the parody of this synthetic approach was in the popular TV series House, in which, every week, they proceeded through 3 or 4 dramatically incorrect synthetic decisions, clearly making the patient worse, before arriving at a correct diagnosis.
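Oncodoc's practice of ranking the exclusion of aortic dissection above its raw probability amounts to ordering the differential by the expected harm of a miss (probability times the cost of missing it) rather than by probability alone. A minimal sketch; the diagnoses' probabilities and harm scores are illustrative placeholders, not clinical estimates, and the function names are hypothetical.

```python
# Hypothetical differential for a young patient with chest pain:
# (diagnosis, probability, harm if missed on an arbitrary scale).
differential = [
    ("musculoskeletal pain", 0.70, 1.0),
    ("pericarditis", 0.25, 10.0),
    ("aortic dissection", 0.05, 1000.0),
]


def rank_by_probability(diagnoses):
    """Most likely diagnosis first."""
    return [name for name, *_ in sorted(diagnoses, key=lambda d: d[1], reverse=True)]


def rank_by_expected_harm(diagnoses):
    """Expected harm of not pursuing a diagnosis = P(diagnosis) * harm if missed."""
    return [name for name, *_ in sorted(diagnoses, key=lambda d: d[1] * d[2], reverse=True)]


print(rank_by_probability(differential))    # musculoskeletal pain first
print(rank_by_expected_harm(differential))  # aortic dissection first
```

The two orderings use the same probabilities; the difference comes entirely from how badly each miss would turn out.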

  4. Jonathan says:

    My dad was a diagnostician. He told me diagnosis runs in two branches, typically in parallel: what you rule out and what you identify. It starts with symptoms, and there is a data problem there: patients don’t remember, don’t express themselves well, and add interpretation without intending to (like “I thought it might be …”). Tests often pick up confounding factors such as stress effects, and they can’t identify what will happen (a mammogram can’t identify a cancer that will appear next week and, unfortunately, spread aggressively). As you go through symptoms, you go through possible diagnoses, and some of that is informed by potential prognoses. The point: at root, it’s more like convicting a criminal than like a probability calculation, because you need to rule things out beyond a reasonable doubt, often even as you treat what you think it is. The logical process is often akin to law because the cost of not ruling out the tail event might be death. (Oddly, I thought of this earlier this week when I was in the ER going through tests to rule things out or in. It turned out to be kind of odd though not serious: a condition typically seen in kids that usually resolves without intervention, as mine is doing. So I’m getting younger!)
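One way to express Jonathan's "beyond a reasonable doubt" framing in probabilistic terms is sequential testing: keep applying negative results until the probability of the dangerous condition falls below a chosen threshold. In this sketch the threshold stands in for "reasonable doubt", each negative result is summarized by its negative likelihood ratio, (1 - sensitivity) / specificity, and results are assumed independent given disease status. The function `rule_out` and all numbers are hypothetical.

```python
def rule_out(prior, negative_lrs, threshold):
    """Apply negative test results in order until P(condition) < threshold.

    negative_lrs: likelihood ratio of each negative result, (1 - sensitivity) / specificity.
    Returns the final probability and the number of tests used.
    """
    prob = prior
    odds = prior / (1.0 - prior)
    tests_used = 0
    for lr in negative_lrs:
        if prob < threshold:
            break  # ruled out "beyond reasonable doubt"; no more testing needed
        odds *= lr
        prob = odds / (1.0 + odds)
        tests_used += 1
    return prob, tests_used


# Hypothetical: 5% pretest probability of a dangerous condition, three available
# tests with negative likelihood ratios 0.1, 0.2, and 0.1, and a 0.2% threshold.
prob, used = rule_out(0.05, [0.1, 0.2, 0.1], threshold=0.002)
print(used, f"{prob:.4f}")
```

If the tests run out before the probability drops below the threshold, the condition has not been ruled out, which is the situation where a clinician might treat while continuing to investigate.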

  5. Dale Lehman says:

    All of the issues raised are interesting and worth discussing, but I think it would be better to focus on what (I believe) is the heart of the quote. Science can provide predictions (diagnoses or treatment) for individual patients, but these will always be probabilistic. The doctor, in the case cited, was treating an individual patient. In this case, the science, even well-done science, suggested that this 28-year-old did not have breast cancer. The doctor, in this case, decided otherwise. We can always argue, post hoc, that the prior was no breast cancer and the posterior (given the clinician’s analysis) suggested breast cancer. But to do so merely becomes tautological and sweeps the issue under the rug.

    No legitimate application of science will ever provide a 100% certain prediction for an individual case. Physicians treat individual cases. The question is: how much should the practicing physician rely on the science? If the probability of diagnosis X is 80%, but the physician has a “gut feeling” that Y is the correct diagnosis, should he/she go with X or Y? This is an issue upon which science can cast some light – but cannot answer. I worry about the advances in “personalized medicine” that are likely to raise these issues repeatedly. Even if we replace the physician with an algorithm, we will have to program that algorithm to produce a diagnosis. What probability would lead to X vs. Y? What probability would dictate that further tests be done? etc. etc.

    Clearly, the answers involve ethics, economics, medical knowledge (both anecdotal and scientific), and plenty of science. The science should be valid – sloppy and poor science should not be included in the decision making. But no amount of “good” science can answer the question “X or Y?” A decision must be made. Philosophers may have input at least as valuable as that of scientists, perhaps more, for that decision. For example, who should make that decision – the patient, the physician, the insurer, the government, or some combination of these? Let’s not pretend that if we only do science well, it will be able to answer the question of how to diagnose this patient. It is relevant, but it cannot answer the question alone, no matter how solid the science is.

    • jack says:

      The doctor’s intuition is just a simple predictive model in which he uses the available evidence to make a judgment. And most humans are really bad at that, most doctors included (of course, exceptional doctors are exceptional, but they are also expensive, and only rich people or patients with rare conditions get to see them). For the average-Joe doctor, the one who is not a genius and simply went through medical school, with all the limitations that implies, it has been widely shown that we can do better by combining human judgment with simple algorithms in almost every case.

      • Dale Lehman says:

        I can’t let this go uncontested. “Almost every case” is surely an overstatement. And, to declare the doctor’s intuition (even the average doctor’s intuition) a “simple predictive model” is similarly exaggerated. It is so simple that it was easy to program computers to intuit like humans?

        • jack says:

          That’s what we’re trying to do, but better. For most medical conditions, we are already there, but so far it’s far more expensive than the doctors; that’s why it’s not widespread (and there are legal issues too, mostly because doctors have protected their market in a way that benefits them and not the patients).
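Dale Lehman's questions (what probability leads to X versus Y, and what probability dictates further tests) are what the threshold approach of clinical decision analysis tries to formalize: compare the expected utility of not treating, testing first, and treating, as a function of the probability of disease. The utilities, the test's sensitivity and specificity, and the test's own harm below are hypothetical, as are the function names; as Dale notes, choosing those numbers, and deciding who gets to choose them, is exactly the part that science alone cannot settle.

```python
def expected_utilities(p, u, sens, spec, test_harm):
    """Expected utility of three strategies when P(disease) = p.

    u holds utilities for each (action, true state) pair; the test is assumed
    to have a fixed disutility test_harm and to be followed by treatment if positive.
    """
    no_treat = p * u["none_sick"] + (1 - p) * u["none_well"]
    treat = p * u["treat_sick"] + (1 - p) * u["treat_well"]
    test_first = (
        p * (sens * u["treat_sick"] + (1 - sens) * u["none_sick"])
        + (1 - p) * ((1 - spec) * u["treat_well"] + spec * u["none_well"])
        - test_harm
    )
    return {"no treatment": no_treat, "test first": test_first, "treat": treat}


def best_action(p, u, sens, spec, test_harm):
    """Strategy with the highest expected utility at probability p."""
    eus = expected_utilities(p, u, sens, spec, test_harm)
    return max(eus, key=eus.get)


# Hypothetical utilities on a 0-1 scale (1 = healthy and untreated).
utilities = {"treat_sick": 0.90, "treat_well": 0.95, "none_sick": 0.50, "none_well": 1.00}
for prob in (0.01, 0.30, 0.90):
    print(prob, best_action(prob, utilities, sens=0.9, spec=0.9, test_harm=0.005))
```

Sweeping the probability from 0 to 1 would reveal two cutoffs, a testing threshold and a treatment threshold, and both move whenever the utilities do.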

  6. Dzhaughn says:

    To make sense of the quotation, I would replace “If medicine were practiced as if it were a science, even a probabilistic science,” with “If medicine were practiced on a decision-theoretic basis,” or, put another way, “practiced on a value-maximizing and equal basis.”

    That is, under plausible conditions, I might believe that any currently conceivable medical system that provided equal treatment to all patients and generally diagnosed and treated cases like her daughter’s breast cancer would entail mass poverty and would make us overall very much less happy.

    The plausible conditions depend on the details of her daughter’s case: if the daughter had pain and a tumor she could feel, that’s one thing; if her doctor just gives everyone extra tests for breast cancer because she can game or shame the insurance company into paying for them, that’s another.

    Separately, we might question the author’s belief in the counterfactual outcome. She seems confident that her daughter would have died at a young age from this cancer, but why? That’s what the doctors said. Well, they would say that, wouldn’t they? But let’s trust they are being professional. Suddenly, probabilistic models are compelling value estimates!

  7. Kevin Dick says:

    Also, there is a garden of forking paths issue. Sure, if she looks hard enough, she can find examples where going with the probabilities would have led to worse outcomes. But she doesn’t make an equal effort to find examples where going with the probabilities would have led to better outcomes.

    Personal story. A neurosurgeon wanted to do a CAT scan on a family member 90 days after a concussion because of the outside chance of a delayed brain bleed, with no clinical symptoms other than ongoing headaches (which a significant fraction of concussion sufferers experience). I said to wait while I ran the probabilities. Even ignoring the cost, the excess risk of cancer death from the CAT scan was an order of magnitude larger than the chance of uncovering a brain bleed, which would probably produce more severe symptoms in the meantime and would likely still be treatable.

    When presented with this analysis (including the primary literature that produced the probabilities), the neurosurgeon said, “Oh, we can do an MRI instead.” One wonders how many brain cancers (the biggest chunk of excess cancers from head CTs) neurosurgeons will be treating in the next 30 years due to their unthinkingly profligate use of head CTs today. I’m sorry, but I would much rather have a piece of software produce the optimal diagnostic and treatment plan than a human. Even better, give me both.

    • Dale Lehman says:

      Garden of forked paths, indeed! Your personal story sounds like proof of why medical practice is not science. The probabilities you “ran” are elusive and subject to considerable uncertainty. I highly doubt the order of magnitude difference you cite. I have no doubt that an algorithm can outperform some (perhaps many) clinicians – and that indeed may be the case in your personal story. But if you look hard enough, you will find examples where the clinician is better. And, more importantly, whatever evidence you found will still result in a case where there are risks of the CT scan, risks of not doing the CT scan, and a decision that must be made under imperfect information. Your faith in software to make these decisions is precisely what worries me about the future of medicine. I will agree with your final sentence, however – “give me both.”

      • Martha (Smith) says:

        Dale: Your comments do not address the (to me obvious) question that Kevin’s example raises: Why not MRI as a default rather than CT? (I would guess perhaps cost? Or not? I don’t really know.)

        • Dale Lehman says:

          I don’t know the answer to your question and I don’t claim any expertise in this area. What I found (and can’t vouch for) is: “A CT Scan usually is best if a suspicion for a stroke or a bleed within the brain is being considered. If there was a previous head injury, then a cranial CT scan may be the scan of choice. MRI is superior than CT scan in evaluating soft tissue changes within the brain and nearby structures. It may note inflammation or swelling which may suggest further evaluation.”

          In the case under discussion, I am not claiming that the medical advice was correct. It may well not have been. It may have been influenced by ignorance, psychological biases, or monetary incentives. And Kevin may indeed be correct about the choice that was made. But I don’t think we should be claiming (or even hinting) that good science will provide the answer for what to do in cases like the one Kevin describes. Certainly, good science provides information that is useful and necessary for making good choices, but those choices are likely to remain ambiguous. It is far from clear to me who should even make those choices.

  8. Carlos Ungil says:

    > Early diagnosis can save lives but it can also have costs in dollars and health when there is misdiagnosis.

    It can also have costs in dollars and health outweighing the benefit even when the diagnosis is correct.

  9. ghenly says:

    This is another area where Paul Meehl made important contributions, with his work investigating the comparative accuracy of clinical vs. statistical/actuarial prediction in psychology. I haven’t looked at this literature for 30+ years, but my recollection is that clinicians had fared poorly up to that point in time. Ironically, one study motivated by Meehl’s conjectures found that clinicians who were provided with the actuarial prediction still made less accurate predictions than the statistical/actuarial model.

  10. jerlich says:

    I haven’t read the context of the original quote, but the author may be referring to the current practice in medicine of adopting new treatments (or tests) only if they are shown (using a flawed hypothesis test) to have a “significant” effect in the population. If breast cancer is actually 20 diseases but is currently considered one, then a new treatment may cure one fifth of a study group (one specific form of breast cancer) yet fail to show an effect in the population as a whole, and would then be rejected.

    Because experiments are expensive, and we may not have the current tools to figure out what is special about a subgroup of patients that do respond to a treatment, people with “rare” causes of common symptoms are not well served by the system.

    Barring unlimited resources, I’m not sure there is a better way.

    • Clyde Schechter says:

      Breast cancer is indeed heterogeneous. Back in the 1960s, pretty much all breast cancers were treated the same way: radical mastectomy, maybe with some radiation on top of that. But since that time, considerable progress has been made in identifying subtypes of breast cancer that respond better to different approaches. The first breakthrough came with understanding the difference between tumors whose cells have estrogen or progesterone receptors and those whose cells do not. Tumors with these receptors respond well to estrogen-inhibiting treatment and are less responsive to standard chemotherapy. Tumors without them respond little, if at all, to estrogen inhibitors, and tend to respond well to standard chemotherapy.

      More recently, a drug called trastuzumab has been found to be extremely effective in treating breast cancers where the cells have a HER2 receptor on their surface. Current research is focusing on identifying genomic variation among breast tumors, and understanding the pathways by which different mutations work to drive the malignancy. It is expected that as we learn more about this we will be able to develop and deploy highly tumor-specific treatments that are optimized for that particular type of tumor. This is the program of the much-hyped “personalized medicine.” Whether this will actually work in practice remains to be seen.

      Interestingly, it raises many challenges for research design. If a treatment that works only on relatively uncommon subtypes of tumor has only a moderate effect, mounting a study to identify that effect with sufficient certainty could prove difficult. Even now, treatment for a large subset of breast cancers is so highly effective that there is little room left for improvement and the required sample sizes are simply infeasible. This sort of situation could prove an even bigger stumbling block to testing narrowly targeted therapies.
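Clyde Schechter's point about infeasible sample sizes (and jerlich's example of a treatment that works for only one subtype) can be quantified with the usual normal-approximation sample-size formula for comparing two means, n per arm = 2 (z_{1-α/2} + z_{1-β})² σ² / δ². The sketch assumes the treatment helps only a fraction of patients and does nothing for the rest, with approximately the same outcome standard deviation in every subgroup, so the average effect in an unselected trial is diluted to that fraction of the subgroup effect. The specific numbers are hypothetical.

```python
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided, two-arm comparison of means."""
    z = NormalDist().inv_cdf
    return 2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2


subgroup_effect = 0.5   # hypothetical effect (in SD units) among responders
responder_share = 0.2   # hypothetical: one subtype in five responds
sigma = 1.0

n_targeted = n_per_arm(subgroup_effect, sigma)
n_unselected = n_per_arm(subgroup_effect * responder_share, sigma)
print(round(n_targeted), round(n_unselected))  # diluting the effect 5x multiplies n by 25
```

Because n scales with 1/δ², an effect confined to one patient in five needs roughly 25 times as many patients to detect in an unselected population, which is why identifying the responsive subtype first matters so much.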

  11. Glen M. Sizemore says:

    “…that impossible thing, a science of individuals”.

    GS: Yeah…impossible when you don’t use single-subject designs where applicable, and in medicine (and a NATURAL science of behavior) SSDs are applicable…with a vengeance. I’ve noticed that people here are not much interested in my claims about SSDs – perhaps the goal isn’t really ameliorating some of the problems with replication and the fact that so much that is published is a pile of…well, I won’t finish that thought.


    • Andrew says:


      We’ve discussed the benefits of within-subject designs many times on this blog.

      • Glen M. Sizemore says:

        Do you mean within-subject designs where the data are averaged across subjects? Not what I’m talking about.

        • Andrew says:


          I mean analyses where multiple measurements, maybe even multiple treatments, are performed on each person, just not at the same time. This can be done with just one person, or you can perform a single-subject experiment on many different people, in which case you can fit a multilevel model. It would not be appropriate to simply average the data across people. You’d analyze all the data together but not with simple averaging.

          Psychologists know all about within-subject and between-subject designs. Often people use between-subject designs because of concerns about “poisoning the well” in within-subject designs: the first measurement can affect the later measurement. But Eric Loken and I and others have argued that the gain in precision from within-person comparisons can in practice be much more important than the bias that is introduced. We’ve seen so many between-subject designs that are so noisy as to be useless.

          • Garnett says:

            Andrew: Can you point to the papers in which you, Eric Loken, and others have discussed this type of design?

            • Andrew says:


              Here’s a paper that some colleagues and I wrote analyzing a within-subject design in pharmacokinetics.

              I don’t recall writing any papers on the general topic of between- and within-subject designs, but it’s come up a lot on the blog and in talks that I’ve given; for example, see slides 33-34 here.

              I thought you might want a general reference so I googled *within subject between subject designs* and came across this paper from 2012 by economists Gary Charness, Uri Gneezy, and Michael Kuhn which looks pretty good.
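The precision-versus-bias argument Andrew makes above can be checked with a small simulation. It assumes a deliberately simple model: each person has a stable baseline, every measurement adds independent noise, and in the within-subject design the second measurement carries a fixed carryover bias standing in for "poisoning the well." The function name and parameter values are hypothetical. When between-person variation is large relative to measurement noise, the within-subject estimate has a much smaller error even after the bias is included.

```python
import numpy as np


def simulate_designs(n_people, effect, person_sd, noise_sd, carryover, n_sims=2000, seed=0):
    """Root-mean-square error of the treatment-effect estimate under two designs."""
    rng = np.random.default_rng(seed)
    between_err, within_err = [], []
    for _ in range(n_sims):
        # Between-subject: 2 * n_people people, each measured once under one condition.
        baseline = rng.normal(0.0, person_sd, size=2 * n_people)
        y = baseline + rng.normal(0.0, noise_sd, size=2 * n_people)
        y[n_people:] += effect
        between_err.append(y[n_people:].mean() - y[:n_people].mean() - effect)
        # Within-subject: n_people people, each measured under control then treatment;
        # the second measurement is shifted by a carryover bias.
        baseline = rng.normal(0.0, person_sd, size=n_people)
        y0 = baseline + rng.normal(0.0, noise_sd, size=n_people)
        y1 = baseline + effect + carryover + rng.normal(0.0, noise_sd, size=n_people)
        within_err.append((y1 - y0).mean() - effect)

    def rmse(errors):
        return float(np.sqrt(np.mean(np.square(errors))))

    return rmse(between_err), rmse(within_err)


between_rmse, within_rmse = simulate_designs(
    n_people=20, effect=0.5, person_sd=1.0, noise_sd=0.3, carryover=0.1
)
print(f"between: {between_rmse:.3f}, within: {within_rmse:.3f}")
```

Both designs here use 40 measurements in total; the within-subject design wins because each person serves as their own control, cancelling the baseline variation that dominates the between-subject comparison.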

  12. Glen M. Sizemore says:

    I’m still a little confused…I am talking about experiments where each subject’s dependent measure(s) is (are) frequently measured (often daily in the laboratory), under constant conditions, and allowed to reach a stable-state. Some independent-variable is changed and the data allowed to reach a stable-state again. Then, the original conditions are reinstated (i.e., a so-called ABA design). An effect is judged to have occurred when the stable-state of the B condition is different from the A conditions, and the A conditions do not differ. Often it is the ranges that are important: it is not unusual to find that the ranges of the measures in the A and B conditions do not overlap, and this is true for all subjects (generally N is very small). Needless to say, there is a lot more I could say, but I would like to know if we are talking about the same thing.


    • Andrew says:


      The sort of experiment you talk about is done all the time in pharmacometrics. In that field, it would be nearly impossible to learn anything useful from just one outcome measurement per person. What Eric and I and others have claimed is that such designs would make a big difference in many areas of research, and that psychology researchers in particular have made a mistake by focusing on between-person designs, which sometimes appear to be safe and conservative but are not.

      • Glen M. Sizemore says:

        “The sort of experiment you talk about is done all the time in pharmacometrics.”

        GS: Really? “All the time”? Hmm…I never heard the term “pharmacometrics” even though I have been involved with behavioral pharmacology for more than 35 years. It must be a statistician’s term. In any event, though behavioral pharmacology benefited greatly from the use of SSDs (beginning with the time when Peter Dews walked into Fred Skinner’s lab), the overwhelming majority of behavioral pharmacology papers report data pooled across individuals and use NHST – especially today. Indeed, most journals require NHST. AFAIK, the major outlets for drug data of the sort I am talking about remain the Journal of the Experimental Analysis of Behavior and Behavioral Pharmacology, the latter being closely connected with Dews. However, when I published a paper there about 15 years ago, I had to battle one reviewer (despite the fact that the editor was on my side) to get the paper published with individual-subject data.
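The ABA decision rule Glen Sizemore describes can be written as a short check. This is one possible operationalization, not a standard taken from the behavior-analysis literature: it treats the last few sessions of each phase as the stable state (a real stability criterion would be more careful), declares an effect for a subject when the B range overlaps neither A range while the two A ranges overlap each other, and requires this for every subject. The function names and data (daily response rates) are hypothetical.

```python
def value_range(sessions, window):
    """Min and max over the last `window` sessions, treated here as the stable state."""
    stable = sessions[-window:]
    return min(stable), max(stable)


def ranges_overlap(r1, r2):
    """True if the closed intervals r1 and r2 share at least one value."""
    return r1[0] <= r2[1] and r2[0] <= r1[1]


def aba_effect(a1, b, a2, window=4):
    """Effect for one subject: B differs from both A phases, and A1 and A2 agree."""
    ra1, rb, ra2 = value_range(a1, window), value_range(b, window), value_range(a2, window)
    return (not ranges_overlap(ra1, rb)) and (not ranges_overlap(ra2, rb)) and ranges_overlap(ra1, ra2)


# Hypothetical daily response rates for two subjects: (A1, B, A2) phases.
subjects = [
    ([62, 55, 50, 52, 49, 51], [40, 31, 30, 28, 31, 29], [45, 48, 50, 51, 49, 50]),
    ([70, 71, 69, 72, 70, 71], [58, 52, 50, 49, 51, 50], [66, 69, 70, 68, 71, 70]),
]
print(all(aba_effect(a1, b, a2) for a1, b, a2 in subjects))  # True for these data
```

Note that no averaging across subjects occurs: each subject's data are judged on their own, and the conclusion requires the effect to replicate within every subject.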

  13. Anoneuoid says:

    Ok, but surely medical research is supposed to be science. Does anyone argue about that?

    Also this discussion seems to imply that science is all about measuring frequencies and guessing what will happen based on that. That is wrong, a lot (most?) science got done before that was even a thing.

    • Mark says:

      Medical research is “scientific”. But much of it would not qualify as pure science. For example, it is enough for clinical trials to show safety and efficacy, regardless of whether the mechanism of action is really understood. That kind of medical research doesn’t create any fundamental understanding about the nature of the physical world. Arguably it’s not even “applied science”. Overall, medical research is more like engineering research, rather than science research. (Though, not really exactly like engineering research either, because the requirements being met are more like a given.)

      • Anoneuoid says:

        it is enough for clinical trials to show safety and efficacy, regardless of whether the mechanism of action is really understood

        I agree this is something people say to justify their behaviour, but I think it is wrong. You easily end up wasting each other’s time with research like this that ends up measuring who knows what (e.g., how hungry you can make rats) rather than whether a treatment works for a disease. I know at least one area where this has certainly been going on for 30-40 years.

        • Mark says:

          The point is that through a clinical trial you can learn whether a treatment works for a disease, even if the trial doesn’t help you understand why it works. If we have evidence that a treatment works, it would be unethical to not offer it. The “wrong behavior” would be to let people suffer because you think there is not sufficient explanatory depth for a clearly observable phenomenon.

  14. Keith O'Rourke says:

    > continuously-developing network of conjectures and models and data
    With an unknowable expiry date that should never be ignored.

  15. There is so much in the original post that can be commented on (as shown by the multitude of responses). I just wanted to add a few points as someone who has spent a lot of time working in this area (probabilistic models of medical decision making).

    Unfortunately, the best clinician coupled with the best diagnostic software will occasionally make the wrong diagnosis, and a patient will suffer. As the less-than-miraculous results of genetic testing have shown, the human body (and its diseases) is too complex for us, at our current state of knowledge, to understand more than a small fraction of the relationships among all the variables. An interesting example of how people view risk is the debate over prophylactic mastectomies based on probabilities derived from statistical studies of genetic factors and family histories.

    Screening for breast cancer has been studied very thoroughly over the years with many sophisticated outcome models and analyses. With our current level of diagnostic accuracy and treatment efficacy, we are virtually at a point of equipoise between the costs and benefits of screening (the balance changes for different age groups).

    I would like to highlight what I view as the difference between “personal” and “public” decision making. A lot of decision theory addresses the latter. Even individual measurements of utility, such as standard gamble, pool the results in order to make public health decisions. Using the same approach for individual decisions is much less studied.

    Finally, the biases of physicians have been documented many times, which is why I think algorithmic software has a place in our medical system, with one very large caveat: the difficulty of quantifying and collecting the knowledge that physicians use.

  16. Brad Stiritz says:

    Very thoughtful post, Andrew. Here’s a pretty scary run-down that you may find interesting as well.
