More on the role of hypotheses in science

Just to be clear before going on: when I say “hypotheses,” I’m talking about scientific hypotheses, which can at times be very specific (as in physics, with Maxwell’s equations, relativity theory) but typically have some looseness to them (a biological model of how a particular drug works, a political science model of changes in public opinion, etc.). I’m not talking about the “null hypothesis” discussed in classical statistics textbooks.

Last year we posted a discussion of the article, “A hypothesis is a liability,” by Itai Yanai and Martin Lercher. Yanai and Lercher had written:

There is a hidden cost to having a hypothesis. It arises from the relationship between night science and day science, the two very distinct modes of activity in which scientific ideas are generated and tested, respectively. . . .

My reaction was that I understand that a lot of scientists think of science as being like this, an alternation between inspiration and criticism, exploratory data analysis and confirmatory data analysis, creative “night science” and rigorous “day science.” Indeed, in Bayesian Data Analysis we talk about the separate steps of model building, model fitting, and model checking.

But . . . I didn’t think we should enthrone this separation.

Yanai and Lercher contrast “the expressed goal of testing a specific hypothesis” with the mindset of “exploration, where we look at the data from as many angles as possible.” They continue:

In this mode, we take on a sort of playfulness with the data, comparing everything to everything else. We become explorers, building a map of the data as we start out in one direction, switching directions at crossroads and stumbling into unanticipated regions. Essentially, night science is an attitude that encourages us to explore and speculate. . . .

What’s missing here is a respect for the ways in which hypotheses, models, and theories can help us be more effective explorers.

My point here is not to slam Yanai and Lercher; as noted, my colleagues and I have expressed similar views in our books. It’s just that, the more I think about it, the more I am moving away from a linear or even a cyclical view of scientific or statistical practice. Rather than say, “First night science, then day science,” or even “Alternate night, day, night, day, etc. to refine our science,” I’d prefer to integrate the day and night approaches, with the key link being experimentation.

But my perspective is just one way of looking at things. Another angle comes from Teppo Felin, who writes:

A small, interdisciplinary group of us wrote a response to Yanai & Lercher’s Genome Biology piece. It looks like their original piece has become a bit of a social media hit, with 50k+ downloads and lots of attention (according to Altmetric).

You can find our response [by Teppo Felin, Jan Koenderink, Joachim Krueger, Denis Noble, and George Ellis] here: “The data-hypothesis relationship.” We’re extremely surprised at their use (and interpretation) of the gorilla example, as well as the argument more generally. Yanai & Lercher in turn wrote a response to that, here. And then we in turn wrote another response, titled “Data bias.”

We’re definitely fighting an uphill battle with our argument. Data is “hot” these days and theory passé. And the audience of Genome Biology is largely computer scientists and geneticists. They absolutely loved the “hidden gorilla” setup of the original Yanai-Lercher article last year.

I’m just shocked that this type of gimmicky, magic-like experimental approach continues to somehow be seen as valid and insightful. It’s just a form of scientific entrapment, to prove human folly and bias. Now we’ve had gorillas hidden in CT scans, in health data, and even in pictures of space. The supposed arguments and conclusions drawn from these studies are just plain wrong.

Here’s what Felin et al. have to say:

Data is critical to science. But data itself is passive and inert. Data is not meaningful until it encounters an active, problem-solving observer. And in science, data gains relevance and becomes data in response to human questions, hypotheses, and theories. . . . Y&L’s arguments suffer from a common bias where data is somehow seen as independent of hypothesis and theory. . . .

39 thoughts on “More on the role of hypotheses in science”

  1. I think you could look at the data exploration/hypothesis duality as being much the same as writing/editing a book. Writing takes the most creativity (perhaps). During editing, you are critiquing (we could say testing) the material, but you could still have bouts of creativity during which you revise or add to the book.

    The two modes need to go together to produce a good book, and the same for research.

  2. I’d like to think that this discussion merely comprises two different ways of treating the same intellectual stance. It’s my hope that Y&L are not really such simpletons as to believe that banishing hypotheses from viewing data is either possible or desirable; rather, that “hypothesis bad” is sort of an instructive archetype for reminding us of how too much love for your initial impression can fog up your view of what data really tell you. In a complementary sense, the response article and your posts admonish us not to be too simplistic in thinking about scientific inquiry as so dichotomous. Absolutely, the one style of thought had better inform the other. The process MUST be synthetic or we get nowhere. As a nonprofessional, it appears to me that the two representations are much like comparing the utility of rigorous null hypotheses with Bayesian priors. Sometimes we are better off acting as if we have no advance knowledge of what may be happening; more often, admitting that we do have SOME insights in advance is more helpful. The key admonishment is to not let the priors blind us to data indications that the speculation is wrong or imperfect. Hopefully, models inform data, data inform models. That seems straightforward.

  3. “In this mode, we take on a sort of playfulness with the data, comparing everything to everything else. We become explorers, building a map of the data as we start out in one direction, switching directions at crossroads and stumbling into unanticipated regions.”

    Would you buy a map from these guys? How about this:

    “In this mode, we stop talking about zombies and get serious with the data, comparing everything to everything else. Once we have a handle on all relevant factors, we build a comprehensive, objective model that challenges our preconceived notions in every conceivable way.”

  4. Often we are best off putting things too stridently, so that by overstating the case, we strengthen the point. To say that “kind of/sort of” hypotheses can cloud data interpretation is not nearly as powerful as saying hypotheses are bad. Of course, the former is true and the latter not. Archetypes and schematics help us understand, even if we know that they are not truly representative of reality. They exist and are weaponized to clarify better than one could do by being purely descriptive.

    Metaphors are better than similes. Let us hope that the Y&L paper was a parable and not a user’s manual.

  5. “How odd it is that anyone should not see that all observation must be for or against some view if it is to be of any service!” Charles Darwin

  6. “To test this [that mental focus on a specific hypothesis would prevent a discovery], we [Y&L] made up a dataset and asked students to analyze it”

    Uh oh.

  7. “Hypothesis only” fundamentalists belong to PhD dissertation committees, not the real world where data consists of massive convenience samples.

    Wrt massive amounts of information, the formation of explicit hypotheses is not only onerous but also frequently impossible.

    In such cases, the data itself is the hypothesis.

      • Daniel:

        I’m thinking there’s a connection between your point and the idea of workflow, or performing multiple analyses on data. For example: I just went on Baby Name Voyager and started typing in names. This was as close to hypothesis-free data analysis as you can get. But after I saw a few patterns, I started to form hypotheses. For example, I typed in Stephanie and saw how the name frequency has dropped so fast during the past twenty years. Then I had a hypothesis: could it be alternative spellings? So I tried Stefany etc. Then I got to wondering about Stephen. That wasn’t a hypothesis, exactly, more of a direction to look. I had a meta-hypothesis that I might learn something by looking at the time trend for Stephen. I saw a big drop since the 1950s. Also for Steven (recall that earlier hypothesis about alternative spellings). And so on.

        My point is that a single static data analysis (for example, looking up Stephanie in the Baby Name Voyager) can be motivated by curiosity or a meta-hypothesis that I might learn something interesting, but as I start going through workflow, hypothesizing is inevitably involved.
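
        For what it’s worth, here’s a minimal sketch of that kind of workflow in code; the file name, columns, and pandas-based approach are all hypothetical, just to make the sequence of lookups and emerging hypotheses concrete:

        ```python
        # Sketch of the exploratory workflow described above.
        # Assumes a hypothetical file "names.csv" with columns: year, name, count.
        import matplotlib.pyplot as plt
        import pandas as pd

        names = pd.read_csv("names.csv")

        def trend(name):
            """Yearly counts for one name; each lookup is a single small analysis."""
            rows = names[names["name"] == name]
            return rows.set_index("year")["count"].sort_index()

        # Start from curiosity (or a meta-hypothesis that something will turn up)...
        trend("Stephanie").plot(label="Stephanie", legend=True)

        # ...then a hypothesis emerges (alternative spellings?), and a new direction
        # (the male forms), so more lookups follow.
        for n in ["Stefany", "Stephen", "Steven"]:
            trend(n).plot(label=n, legend=True)

        plt.show()
        ```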

        • Exactly. I think the issue may be that “hypothesis” as a word has been ruined by “hypothesis testing.” So people think “a statistical hypothesis” is the same as “a choice of random number generator” to check the data against.

          That’s just the wrong way to think about it. A “hypothesis” is “an idea about what might be true about the world”. So “maybe Stephanie has morphed into an alternative spelling” is a hypothesis, but it has nothing to do with random number generators.

          Even “if I ask my friends maybe one of them will have a great recipe for Jamaican jerk chicken patties” is a statistical hypothesis.

          You might object “What isn’t a statistical hypothesis?” and that’d be anything you can’t answer by collecting data. “God exists” or “Blue is a really cute color” for example (though we could say “a large fraction of people think blue is a really cute color” and it’s now statistical).

      • “There’s no such thing as hypothesis free data analysis.”

        Not that I think it’s a good idea, but obviously – as we’ve seen over and over here – people can run massive statistical comparisons and mine for correlations, even if they’re not using NHST.

        Of course you *can* do it. The question is: what is the probability of meaningful relationships emerging? It must be nonzero, but it can’t be very high, since the fact that these potential relationships require data mining to uncover already strongly indicates that they’re either spurious or very weak.

        • These are just examples of unproductive analyses based on vague hypotheses though. Not all analyses are of equal quality, but they all involve some idea like “there is probably a correlation between several variables in this dataset” which is itself a statistical hypothesis, not a very useful one…

      • Daniel-

        I take your point but would suggest that ‘agnostic’ models can be and have been built with data that may or may not be ‘relevant’ to a predictive algorithm.

        Jim Simons, mathematician and founder of one of the most successful hedge funds, has written (in his autobiography) about how the people he hires are invariably highly creative thinkers, many of whom integrate massive amounts of information from many, many sources with no prior knowledge of its relevance, in an effort at gaining a competitive, predictive market edge.

        This would constitute truly exploratory data analysis and modeling.

      • “There’s no such thing as hypothesis free data analysis.”

        Maybe there’s no such thing as hypothesis-free data analysis because there’s no such thing as hypothesis-free data collection. Maybe what we choose to see, to measure, to collect, to count – maybe all data gathering is based on an idea, a preconception, a ‘hypothesis’ about what is relevant. Maybe we begin with an implicit assumption about what is relevant, meaningful, important. Genome data is not free of these preconceptions – what genome information is gathered, from whom, at what time, under what conditions is all based on ideas, however implicit. And if we start with ‘data collection based on hypotheses’, maybe the rest of the road is predetermined.

      • I always go back to Petr Skrabanek’s paper “The Emptiness of the Black Box” when this issue comes up. Nobody is being “playful” when they decide which dimension of data to mine for a correlation. The fact that they have chosen just one of the many available for their mining operation reveals that they are being disingenuous. The head fake is to pretend to be surprised when the straw man is thereafter toppled. It’s just a clever way of wishing into being a high “surprisal factor”.

    • “where data consists of massive convenience samples.”

      Here’s a suggestion: the more “convenient” the data, the lower the probability that useful information will emerge through analysis of it.

      • jim said,
        “Here’s a suggestion: the more “convenient” the data, the lower the probability that useful information will emerge through analysis of it.”

        Hmm. — what Jim said is what I’d call a hypothesis (not a suggestion), or (to use the math term) a conjecture.

  8. Y&L seems like a Rawlsian “veil of ignorance” view: look at the data without knowing what preconceived notions we bring to it. For ethical issues this can provide a constructive way to examine alternative views; for analyzing data I am less convinced. I can see it as more useful AFTER an analysis, as a way of playing devil’s advocate. But I think it is of much more limited use when first examining data. Perhaps for extremely controversial topics (such as gun control, abortion, etc.) it may serve to prevent our biases from overwhelming the analysis, but more generally (as others have said), data cannot speak for itself.

  9. For what it’s worth, C.S. Peirce kept coming back to possibilities, actualities, and comprehensibles (ways to make sense of the first two).

    At first these were thought of as distinct, but all thought was some mix of all three. Later they became phases of thought and, using today’s quantum metaphors, disappeared and reappeared seemingly at random from one of the three to another of the three.

  10. Coming from a history-of-physics background, this data-only idea seems artificially naive. Aristarchus, Copernicus, Galileo, Kepler, Newton, Faraday, Maxwell, Einstein, Schroedinger,… Try to imagine any of them, even Kepler, just letting the data talk to them as if they had no prior hunches. Nothing would have happened. Or try the other limit, absolutely knowing the full framework to use (Ptolemy): also nothing. It’s a dialectical mess of hypotheses (starting with whatever mental firmware leads us to construct “objects” from sense impressions) and observed surprises (muons,…). Why try to cram it into a one-sided formula?

  11. The Y&L article makes much more sense if you see it as some kind of psychological defence mechanism. You are young and ambitious, you become a scientist to improve the world, but the only thing you do is boring number crunching and null hypothesis testing. This is a useless waste of intelligence, but you can’t admit it because everybody else is doing the same. So you have to invent some kind of night science, the exciting science, the intellectually challenging science. You have to invent Dr. Jekyll to justify Mr. Hyde.

    This is actually pretty depressing.

  12. I think a better way to view this alleged duality is to reinstate C.S. Peirce’s distinction between induction and abduction. Abduction is the inference to a hypothesis worth pursuing. It is not “hypothesis free” whatever that means, but the hypothesis does not sit in the same relation to the evidence as in induction. Rather, we view the evidence that we have as supporting the idea that the hypothesis is worth the cost of real study and experimentation.

  13. Two comments:
    1) I forget the name of the person who made the comment below (that seems relevant to this discussion); I just recall that she was a math professor at Bryn Mawr (no, not Emmy Noether).

    “I have eyes to see where I’m going, and feet to get me there. That is the role of intuition and proof in mathematics, no more and no less.”

    2) Thoughts I have had lately about drug research practices (in particular, in response to discussions about vitamin D): I think that drug (and other medical) research often gets stuck in a “formulaic” paradigm focused excessively on RCT’s. For example, drug RCT’s often test one particular dosage of the drug. But I can’t help but think that for many drugs, appropriate dosage may depend on patient weight or BMI or some other measure of body size (or some measure of drug assimilation) — yet I rarely (if ever) have heard of clinical trials (or other research on drugs) that make drug dose vary depending on such measures. Intuitively, it seems reasonable that appropriate drug dosage might depend on one of these factors. Yet I haven’t heard of “plausible reasoning” that would support this (or would indicate which measure of body size would be most relevant). I would guess that suitable plausible reasoning would depend (at least in part) on things like knowing the physiology of how the drug disperses in the body.

    • Martha:

      Regarding your item 2, I think this is done in pharmacology all the time. There’s a lot of work done on modeling of dosage, and when a clinical trial is designed, they will design the dose recommendation too, which can depend on patient characteristics.

        • > Then why do we get stuff like this

          Seems pretty unfair to try to body-slam this point with one counter-example, especially when the linked article says:

          > That 100-microgram dose ultimately became the one authorized for mass use in dozens of countries. But Moderna scientists later showed that a half-dose seemed to be just as good as the standard dose at stimulating immune protection.

          Sounds like dosing work to me.

          Also what does dimensionless dosing mean? Is dose itself dimensionless (it’s just one unit of a thing — and the details of that are somewhere else)?

        • I was NOT trying to body-slam anything. I was hoping someone from inside the sausage factory would give me insight into why fairly basic principles seem, from the outside, to be ignored when it comes to dosing.

          Dimensionless dosing is where you express the dose as a ratio between the amount of the drug given (per unit time if it’s repeated) and the important controlling quantity for the process involved. This could be the total mass of the person, or the mass that’s transported per unit time across the kidney membranes, or the mass of the liver, or what have you. In the end the ratio is dimensionless (a pure number without units). Since the choice of what units to measure something in is arbitrary, there must be a way to express the dose such that, whatever units you use, the dose is invariant to those units. That’s what a dimensionless number is.
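
          As a tiny illustration of that unit-invariance point (a sketch with made-up numbers, not any real drug), the same dimensionless ratio comes out regardless of the units chosen:

          ```python
          # A dimensionless dose is a ratio of the dose rate to a controlling
          # quantity, so its value doesn't depend on the units chosen.
          # All numbers here are hypothetical.

          # Dose rate and clearance expressed in mg per day:
          dose_rate_mg_per_day = 200.0
          clearance_mg_per_day = 400.0
          ratio_a = dose_rate_mg_per_day / clearance_mg_per_day

          # The same two quantities expressed in g per hour instead:
          dose_rate_g_per_hr = 200.0 / 1000 / 24
          clearance_g_per_hr = 400.0 / 1000 / 24
          ratio_b = dose_rate_g_per_hr / clearance_g_per_hr

          assert abs(ratio_a - ratio_b) < 1e-12  # same pure number, 0.5
          ```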

        • The units of dosing depend upon the pharmacokinetics and pharmacodynamics of the particular drug – it’s complicated. Here are some examples:
          1. Morphine – interacts with a nervous system receptor. Receptor numbers fall with age so lower dose given to the elderly. Receptors down-regulated in opioid users so higher doses needed. Body mass not so important.
          2. Gentamicin – distributed to total body water so dosing based on ideal body weight (not much water in adipose tissue). Need to achieve a steady state level above that needed to be bacteriocidal. Prolonged high levels are toxic to humans and excreted by the kidney so dosing interval adjusted in kidney failure.
          3. Benzodiazepines – binds to receptors that are easily saturated so ceiling effect where higher doses have no greater effect.
          4. Vaccines – also ceiling effect where higher doses produce no greater immune response
          5. Penicillin – non-toxic so very large doses can be given with impunity.
          6. Some drugs accumulate in extra-cellular fluid space so dose is based on body surface area (which correlates with ECF volume).

          All this just off the top of my head.
          Normally dosing is determined with some care in early drug development (Phase 2 & 3).
          With vaccines one would normally try a number of different doses and then track persistence of antibodies, maybe for 1 to 2 years. This approach was obviously not possible with this stupid pandemic, so I guess they picked a large dose to be sure, unnecessarily large as it turned out.

        • Nick,

          Thanks for the elaboration and examples. To elaborate a bit on the particular situation (vitamin D) that I am concerned with: There does seem to have been some recent concern that appropriate dosage should reflect body characteristics. In particular, the recommended maximum dosage has been increased, to take into account the problem that obese individuals needed dosage greater than what was previously prescribed. However, the way this is stated doesn’t indicate that the “maximum safe” dosages may not be safe for everyone. Indeed, some researchers describe some versions of vitamin D as having a narrow margin between help and harm.

        • mg/kg not dimensionless enough for ya? (Or maybe EVER was about mRNA vaccines specifically?)

          Anyway, from a scientific perspective you may want to include the time dimension somewhere. And from a scientific perspective weight-based dosage is sometimes worse than body-surface-area-based dosage or whatever.

        • mg/kg is sort of dimensionless (technically 1 mg/kg is 1e-6 kg/kg); that’s the one area where we do see dimensionless dosing. I guess what I meant is something like

          (dose/doseinterval) / (Ak * Cb * D)

          where Ak is the area of the kidney tubule surfaces, Cb is the effective / desired concentration in the blood, and D is the membrane transport coefficient (mass/area/difference in concentration/time). If your drug is active in the blood and excreted through the kidneys, you’d use this measure, and you’d want a dose of something of order 1, because at dose 1 the steady state is that the concentration in the blood would be Cb, the desired active concentration. Obviously, since there are some dynamics, you’d probably be dosing at 2 or 3 perhaps, but that’s still O(1).

          I would bet that the typical guy who’s writing Stan code to do pharmacokinetics might have a sense of the importance of a dimensionless formulation like this (say, Nick Adams above), but I don’t think this is how things are explained to doctors, or pharmacists, or especially the nurses who are giving out the doses. I don’t think people are given formulas in which, say, Ak can be calculated from sex, weight, height, and a measure of kidney dysfunction, or where D is tabulated for different types of drugs.

          Similarly if you’re interested in a drug that’s metabolized in the liver you would probably calculate dosage as something like

          (dose/interval) / (E * (Cl/Cb) * Cbeff * Vl)

          where E is the enzymatic activity (mass/time/concentration), (Cl/Cb) is the concentration in the liver relative to the blood, Cbeff is the effective concentration of the drug in the blood, and Vl is the volume of the liver.

          Obviously there are dynamics, so some of these things aren’t constant; for example, you’d pick some “typical” value for (Cl/Cb), and likewise for E. You’d have a statistical formula for the volume of the liver as a function of age, sex, and weight. You’d have a tabulated value of E for each drug, with modifications for different health conditions.

          In the end, it’d be the responsibility of a dosing program to take in the necessary information about the patient, calculate E, (Cl/Cb), and Vl, know the Cbeff for the drug, and then output the dose/interval information (see the sketch at the end of this comment).

          Instead, what I’ve typically seen is “take one of these three times a day if you’re an adult,” and the same thing goes for an 18-year-old, 97 lb college student and a highly active 35-year-old male athlete weighing 195 lbs.

          Maybe it’s just that I’m fortunate enough to not have needed the kinds of drugs that have careful dosing, like IV antibiotics or something.

          It still seems to me like they should have been scaling the dose of the vaccine by something like an estimate of the total number of immune system cells in the body, estimated from bone marrow volume or something similar. Then it’d be basically moles/cell, which is dimensionless (a count divided by a count, times a constant).
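
          To make the “dosing program” idea concrete, here is a minimal sketch built around the kidney-clearance ratio (dose/interval) / (Ak * Cb * D) above. The symbols come from the comment; every function, coefficient, and numeric value below is made up purely for illustration and is not how any real pharmacokinetic software works:

          ```python
          # Sketch of the hypothetical dosing program described above, using the
          # kidney-clearance ratio (dose/interval) / (Ak * Cb * D).
          # All formulas and numbers below are invented for illustration.

          def kidney_tubule_area(sex, weight_kg, height_cm, dysfunction=0.0):
              """Hypothetical statistical formula for Ak (m^2), reduced by dysfunction."""
              base = 0.5 if sex == "F" else 0.6               # made-up baseline areas
              scale = (weight_kg / 70.0) ** 0.5 * (height_cm / 170.0) ** 0.5
              return base * scale * (1.0 - dysfunction)

          def dose_per_interval(ak, cb, d, dimensionless_dose=2.0):
              """
              Invert the ratio: dose/interval = dimensionless_dose * Ak * Cb * D.
              dimensionless_dose is O(1); 2 or 3 to cover the dynamics, per the comment.
              """
              return dimensionless_dose * ak * cb * d

          # Example patient plus made-up drug constants:
          ak = kidney_tubule_area(sex="F", weight_kg=55, height_cm=160, dysfunction=0.1)
          cb = 2.0   # desired blood concentration, mg/L (hypothetical)
          d = 0.8    # membrane transport coefficient, L/(m^2 * day) (hypothetical)

          print(f"dose rate ~ {dose_per_interval(ak, cb, d):.2f} mg/day")
          ```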

        • Sorry, there are some dimensional inconsistencies in my formulas above; it’s late and I’m tired. But the point is that there are thoughtful ways to make dimensionless ratios that are meaningful for the process, and it’s well understood in physics and chemistry that all processes are governed by quantities that must be expressible in dimensionless form (because the choice of what you consider to be your units can’t possibly affect the world).

        • As has been mentioned before re: Vaccination, there was not enough time to determine an effective minimum dose, so they erred closer to the side of maximum safe dose.

          In practice, for actual doctors prescribing medications, the prescribing information is based on whatever measurement is reasonable and that clinical trials have indicated is important. So certain medicines are dosed by mg/kg, and others aren’t. Those mg/kg dosing guidelines are figured out based on the type of pharmacokinetics you described (determined by drug concentration measurements, in sera or whatever site is the target of the medication, during clinical trials).

          At the bedside, doctors modify dosages according to recommendations in the dosing information, based on how the medication is processed in the body and the functioning of those organs in the patient. Doses are just as often decreased, due to the need to prevent harm to poorly functioning organs… the dosing recs may be calculated with a wide margin for safe concentrations in the average population, but someone with damaged kidneys or a failing liver needs a much smaller margin.

          I know my husband used to modify dosages literally at the bedside based on liver function tests (he touted this need as the reason he was good at mental arithmetic). Since electronic records started I don’t think that he needs to do that anymore (or can make fellows or residents do it).

          WRT vaccine dosages, they don’t individualize the dose of vaccines that are given every year. Theoretically, the evoked immune response, and the immune system pathway that is triggered to create the response, are well understood in something like the flu virus and its vaccine. And with the possible exception of people with allergies or immune dysfunctions, everyone gets the same dose of vaccine. Well, not true… everyone over 6 gets the same dose.

          We may have been able to figure out some ml/kg dose of vaccine, but not until way after we already had most people vaccinated. And I’m not sure there is a way to correlate immune response with any easily measurable body characteristic. We think we know so much about all of these things, but we know far less than we think we do. So while dimensionless units may be theoretically possible, they are in practice actually impossible.
