This one’s important: How to better analyze cancer drug trials using multilevel models.

Paul Alper points us to this news article, “Cancer Conundrum—Too Many Drug Trials, Too Few Patients,” by Gina Kolata, who writes:

With the arrival of two revolutionary treatment strategies, immunotherapy and personalized medicine, cancer researchers have found new hope — and a problem that is perhaps unprecedented in medical research.

There are too many experimental cancer drugs in too many clinical trials, and not enough patients to test them on. . . . there are more than 1,000 immunotherapy trials underway, and the number keeps growing. “It’s hard to imagine we can support more than 1,000 studies,” said Dr. Daniel Chen, a vice president at Genentech, a biotechnology company. . . .

Take melanoma: There are more than 85,000 cases a year in the United States, according to Dr. Norman Sharpless, director of the Lineberger Comprehensive Cancer Center at the University of North Carolina, who was recently named director of the National Cancer Institute. . . . “We used to have trials not long ago that had 700 patients per arm,” Dr. Sharpless said, referring to the treatment groups in a study. “That’s almost undoable now.”

Today, “trials can be eight patients.”

This reminds me of my general view that conventional clinical trials are a bad model for research, that the idea of a definitive randomized study does not work so well in the modern world.

In the article, “The failure of null hypothesis significance testing when studying incremental changes, and what to do about it,” I argue that when effects are small, black-box randomized trials are simply not going to work. On one hand, you’ll need huge sample sizes to reliably detect small effects; on the other hand, the world is changing, and by the time you get that huge sample size your target may have moved.

In that paper I discuss two examples from social and behavioral science, but I have every reason to believe the same issues arise in medical research. And, indeed, Kolata’s article emphasizes that many of the treatments being considered are very similar to each other. That suggests the right approach is a coordinated set of trials analyzed using a multilevel model, rather than a set of independent trials analyzed separately, which invites a play-the-winner rule followed by the winner’s curse: the treatment that happens to perform best in a noisy environment ends up overrated.
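
Here’s a quick sketch of the difference, with all numbers invented for illustration: many similar treatments, each evaluated in its own small noisy trial, and then either take the apparent winner at face value or shrink the estimates toward the common mean as a multilevel model would (here with the variance components simply assumed known):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical setup: 100 similar treatments, each with a small true effect
    # drawn from a common distribution, each tested in its own small noisy trial.
    n_treatments = 100
    tau = 0.05                    # sd of true effects across treatments
    sigma = 0.25                  # standard error of each trial's estimate
    true_effects = rng.normal(0.10, tau, n_treatments)
    estimates = true_effects + rng.normal(0, sigma, n_treatments)

    # Play-the-winner: take the largest raw estimate at face value.
    winner = np.argmax(estimates)
    print("winner's raw estimate:    %.3f" % estimates[winner])
    print("winner's true effect:     %.3f" % true_effects[winner])

    # Partial pooling with the variance components treated as known, for
    # illustration only: shrink each estimate toward the grand mean by
    # tau^2 / (tau^2 + sigma^2), roughly what a multilevel model does.
    shrinkage = tau**2 / (tau**2 + sigma**2)
    pooled = estimates.mean() + shrinkage * (estimates - estimates.mean())
    print("winner's pooled estimate: %.3f" % pooled[winner])

Run this a few times: the apparent winner’s raw estimate overshoots its true effect by much more than the partially pooled estimate does, which is the winner’s curse that a coordinated analysis helps avoid.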

So there’s a convergence of practical and statistical issues. As usual, the problem with the conventional approach is not just p-values, it’s the whole null hypothesis significance testing attitude that just doesn’t fit with the problems under study and the decisions that need to be made.

P.S. Alper adds:

Possibly because it was covered by my insurance, possibly because ultrasound is noninvasive, and possibly because John McCain was found to have glioblastoma above his eye, earlier this week I biked to a local hospital to have a growth on my forehead looked at. While the technician went out to fetch the radiologist, I noticed that the text on the ultrasound screen said “left eye” when in fact it was over my right eye. As has been noted by many others, hospitals are dangerous places and a surprising number of surgeries are done on the wrong body part.

Damn!

P.P.S. Kolata’s article is excellent but I do have one complaint: every expert quoted in the article is a doctor. The topic is medical research, so, sure, doctors are experts. But these questions are not just medical: they involve statistics, they involve economics, they involve politics too. So I don’t think docs should be the only people interviewed.

34 thoughts on “This one’s important: How to better analyze cancer drug trials using multilevel models.”

  1. At least in my corner of medicine, it is hardly ever a “play-the-winner” scenario where medications are concerned. Due to the cost of healthcare, studies are primarily run by the pharmaceutical companies. These studies are expensive: more than $5,000 per patient before the cost of the medicine or intervention. Instead of comparing against the current winner, the treatment used as the control is the oldest and least expensive (and likely not the most effective). So everyone uses the stand-by comparison of the oldest drug in a non-inferiority design, even though no one prescribes that drug by itself or as first-line therapy.
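
    For readers who haven’t seen the design: non-inferiority is judged by whether the confidence interval for the new-minus-old difference in outcomes stays above a pre-specified margin. A bare-bones sketch, with made-up counts, a made-up margin, and a simple normal approximation:

        import math

        # Hypothetical trial: response rates for the new drug vs. the old stand-by,
        # with a pre-specified non-inferiority margin of 10 percentage points.
        n_new, x_new = 400, 248      # new drug: 248/400 responders
        n_old, x_old = 400, 260      # old comparator: 260/400 responders
        margin = -0.10               # new drug may be at most 10 points worse

        p_new, p_old = x_new / n_new, x_old / n_old
        diff = p_new - p_old
        se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
        lower = diff - 1.96 * se     # lower bound of the two-sided 95% CI

        # Non-inferiority is claimed only if even the lower bound clears the margin.
        print("difference = %.3f, 95%% CI lower bound = %.3f" % (diff, lower))
        print("non-inferior" if lower > margin else "non-inferiority not shown")

    Nothing here is from a real trial; the point is only that the comparison is against a fixed margin relative to the old drug, not against whichever treatment is currently performing best.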

    When it’s not industry-sponsored, the patient (and insurance) usually pays for the treatment and is randomized, so recruitment is a very difficult challenge. This results in large, multicenter trials where each center may have only a few patients enrolled. These will hardly ever be coordinated with other trials, given the difficulty of recruiting enough patients for one trial and the administrative hassles that go into managing one trial, let alone coordinating with others.

    If the treatment is surgery, then there are even surgeon degrees of freedom, which are never(?) accounted for. I know that my skill at surgeries A and B likely differs greatly, and so the risks of complications and the outcomes likely differ too.

  2. Many of the newly developed drugs have well-characterised mechanisms of action (sometimes even the mechanisms for which they were designed), so the trials are not always about the individual drugs, but about the usefulness of modulation of the biological mechanism (the “target”) as a treatment. It would be a mistake to assume that the question relates more to statistics than it does to biology.

    • Do they? It is quite easy to NHST your way to an entirely incorrect mechanism. I’ve mentioned before that to this day most preclinical papers fail to perform the most basic step of quantitative analysis like plotting x vs y when they claim x is causing y.

      Once they do have such a plot, then the next step is to figure out why the y = f(x) curve looks the way it does (i.e., what would be some good candidates for f, and what process would explain that?). Then how often is the x vs. y curve generated by an independent group to double-check that it is actually something worth explaining? These last two steps are nearly non-existent.

      • Take immunotherapy, for example. How many studies are ruling out activity via Fc receptors? It’s relatively well known that you need to do this for staining, but afaik it is rare when looking at a biological effect. How many of these antibodies are actually raised towards proteins with natively unstructured domains, so that they may interact with any amyloid in the tissue?

      • Yes and no. I used to be in this field before switching specialty and I anticipated the problems discussed in the article, among other problems that I don’t think are widely appreciated yet.

        Often, these drugs have a very well-characterized proximal mechanism that has been linked to a target biological pathway with a predicted causal effect on the disease process. Sometimes this is very straightforward: for instance, a group finds an activating mutation in a known oncogene in some types of cancer and wants to test a drug that directly inhibits it. The mutation would have been picked because it has a known inhibitor or is otherwise part of the “druggable genome” and because of studies that either overexpress or inhibit the mutation using some other mechanism (other drug classes not suitable for use in humans, RNAi, CRISPR-Cas9, etc). Or for immunotherapy, something like the PD-1 inhibitors target a fundamental component of the immunological synapse. So often, a direct target of the drug is known, and there is a strong link between that target and some part of the disease process. The most attractive targets have usually been identified by multiple competing groups.

        Of course, there’s a lot of screening going on to get to this point, with many forking paths and other statistical and biological problems. But there is a very large incentive to perform mechanistic validation of the top hits with multiple lines of evidence in order to publish in the highest-impact journals. Some of these validation experiments are of the classical biology type (the effect size is supposedly so large that quantification or statistics is not even done, though this is slowly improving) and are usually done in the academic setting, at least initially, before either spinning off a start-up company or being acquired by a larger company for further development or clinical studies. It’s not as if all there is supporting the candidate drug is some dose-response or other y = f(x) curve.

        The mechanism may be wrong, or at least incomplete, for several reasons. The first is shoddy biology – this is a matter of skill, time, and resources: using appropriate controls and model systems, and so forth. Mechanistic studies in biology are very hard to do convincingly. When working in a “hot” area, there is always the risk of getting scooped, not to mention that the costs of the experiments can blow up very quickly, so there are strong pressures to be efficient. All too often this means skipping some important validation experiments. Sometimes this is because the controls just weren’t considered, because it is “standard practice in the field”, or because the researcher would rather wait and see if a reviewer asks for it. This is a shame when done intentionally but it is also pragmatic, so not terribly uncommon. It is also important to get the ball rolling early because of how long it takes to translate preclinical findings into an FDA-approved treatment.

        Another is off-target effects, which are sometimes as important as, or more important than, the intended effect. For instance, many of the small molecule drugs used in oncology actually inhibit multiple targets because they bind with various affinities to a relatively well-conserved epitope in a whole class of proteins (a “dirty” drug). This can be beneficial in a “two birds with one stone” sort of way, and in fact “dirty” drugs tend to be the most efficacious. Being too pure can also be a problem, such as with the well-known COX-2 inhibitor story (i.e. Vioxx and Celebrex). There has been a lot of work in the past 10 or more years on this topic.

        If we set that aside and assume the biology was solid and that off-target effects are negligible, there are other problems. While the proximal mechanism is usually completely worked out, biological systems are so complex that it is not always clear what this means for the entire system. Also, hitting the target or off-target effects in non-target cells can change the tumor microenvironment in either beneficial or harmful ways. Cancer cells in a tumor often have several different phenotypic states, and the drug may have paradoxical effects in, for instance, the cancer stem-like cells. Or there could be an effect in the (myo)fibroblasts or other cells in the tumor milieu. There are a lot of academic groups working on these issues to at least improve predictions about these other effects, which is important but not enough.

        An understanding beyond the proximal steps of the proposed mechanism of action is essentially impossible to completely work out. There is just too much biology that is not well-characterized on both the cellular and molecular levels, and some of the steps are difficult to study with current biological tools. This challenge is why biology can be fun and interesting. It is also why clinical trials are absolutely necessary no matter how much pre-clinical work has been done. Done right, examining the responses and side effects that were observed can help refine some parts of the mechanism or uncover new biology.

        In the case of shoddy biology, even those first few steps can be wrong.

        And as target populations get smaller and smaller (as mentioned in the article), it will become more difficult to learn from these clinical studies. It’s been recognized for years that traditional clinical trial designs are not suitable for “precision medicine”, but what to do moving forward is in my opinion still unresolved. Certainly, the fact that several treatments with similar proximal mechanisms are being tested in a variety of different but related populations is a perfect set-up for a meta-analytic approach, but I think there are some real dangers here when it comes to interpretation (among other issues).

        I personally am not a fan of the approach that some of the leaders in the field are taking, like basket trials, though admittedly it has been several years since I was in the field so maybe there have been some breakthroughs in trial design I am not familiar with. I think that there is a lot more pre-clinical work that should be done so that the limited number of patients that are both eligible and willing to participate in clinical research can be used more efficiently. In other words, trials should be informative beyond just the primary endpoint. Bench to bedside is meant to be a cycle.

        • Often, these drugs have a very well-characterized proximal mechanism that has been linked to a target biological pathway with a predicted causal effect on the disease process.

          Perhaps it’s just that I don’t use the term “well-characterized”. What does it mean to you? To me it would mean:

          1) Some direct replications have been done so that we can be confident that if x is done we will see y. There is still danger of extrapolating, but as long as the methods are repeated we know what will be observed
          – I suspect instead there will be very few, if any, direct replications (these are “non-novel”) and a lot of “oh you used cells from 1 month old rats grown on substrate R, we used cells from 2 month old rats grown on substrate S” and the like to explain any differences.

          2) Some kind of quantitative model of what is happening, even for the most controlled scenarios.
          – There are N1 molecules of receptor A on cell type Z under these conditions, each covering X nm^2 of the surface at density D mol/nm^2. Ligand L was maintained in solution at concentrations C1-Cn, which correspond to receptor occupation fractions f1-fn. For each occupied receptor, N2 molecules of signal transducer B are generated, which go on to activate N3, N4, etc. effector enzymes C, D, etc.
          – It doesn’t need to be perfect, just good enough to give us order-of-magnitude values so this stuff can be sanity-checked (see the sketch below). In most cases I suspect we’ll have no idea what these reaction rates, concentrations, etc. should be. Basically we are missing all the info required to come up with a model to begin with.

          That would be a start; then it becomes time to do the same for “off-target effects”, etc.
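
          As a toy version of the sanity check in (2), here is a sketch with made-up placeholder numbers (one-site equilibrium binding and a single amplification factor standing in for N1, Kd, C1-Cn, and N2) showing the kind of order-of-magnitude table that would let anyone check the arithmetic:

              import numpy as np

              # Every value below is an invented placeholder, not a measured quantity.
              n_receptors = 5e4        # receptor A molecules per cell (N1)
              kd = 1e-9                # ligand dissociation constant, molar
              ligand_conc = np.array([1e-10, 1e-9, 1e-8, 1e-7])   # C1..Cn, molar
              amplification = 20       # transducer B molecules per occupied receptor (N2)

              occupancy = ligand_conc / (ligand_conc + kd)   # one-site binding: f = C / (C + Kd)
              transducer_b = amplification * occupancy * n_receptors

              for c, f, b in zip(ligand_conc, occupancy, transducer_b):
                  print("C = %.0e M   occupancy = %.2f   B molecules/cell ~ %.1e" % (c, f, b))

          Even a table like this makes it possible to ask whether the implied numbers are physically plausible, which is the sanity check being asked for.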

        • Here is an example of a quantitative bio model I like:

          …most of the chemical parameters remain unknown. Because many of the reaction coefficients in Fig. 1 B are also unknown…

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1366631/

          I would say for a drug that is supposed to work via directly or indirectly influencing rhoA activity, we need to know all those parameters for it to be well characterized. And that is at a minimum, since perhaps once the parameters are constrained we find out this model does not work at all. We probably need to check competing models that will require other parameters, etc. It would be an interesting project to check the literature and see how much knowledge has been accumulated about these parameters compared to 2005.

  3. I mostly agree with you and your general sentiment. It’s hard to speak in generalities about all of drug development, but I can at least give you the perspective of a cancer biologist. I’ve seen you comment on this blog about cancer modeling before, so hopefully this is useful. I’m most directly familiar with kinases and signal transduction in cancer – I worked on the EGFR family receptor tyrosine kinases, which has very strong links to cancer and many successful FDA-approved inhibitors that work through very different mechanisms for lung, breast, and colorectal cancers among others. For context, my PhD thesis was in cancer biology, I also have an MD, and I work at an academic research hospital.

    The biological meaning of “well-characterized” is simply having mapped out the connections (whether direct or indirect), which in itself is very challenging to do convincingly. This initial step takes between 1 and 4 years for most labs. New modulators are found all the time, so these connections should be considered more of a broad sketch than a definitive answer (despite whatever the authors may say). It’s just a working model for the mechanism of action.

    A typical approach would be this. Let’s assume some sort of prior information (literature review, screen, patient mutation data, etc) led the cancer biologist to identify protein HIT as a druggable candidate for cancer CA. First, using a typical cell line for CA that’s easy to manipulate, there will be an attempt to link HIT to some sort of phenotype: cell proliferation rates, cell death, differentiation, cell invasion, etc. This would be done by either overexpressing HIT (many different ways to do this) or knocking it down (usually genetically), and conventionally done in triplicate (biology in general has the poorly defined definitions of “technical” and “biological” replication). Suppose in this case, HIT overexpression increases proliferation rates, and knockdown with RNAi decreases proliferation rates. The next step would be to look at the state of whatever is known to be downstream of HIT, as well as the known mediators of proliferation in that cell line. So let’s say that transducer B and the MAPK pathway both seem to be activated when HIT is overexpressed, and inactivated when it is repressed. Next, they knockdown B and show that the MAPK pathway is no longer activated when HIT is overexpressed, and that both B knockdown or MAPK inhibition are able to rescue the HIT overexpressing cells from the increase in proliferation. In this case, the working model would be HIT -> B -> MAPK -> proliferation. The HIT -> B -> MAPK is what I meant by the proximal mechanism. This would not be enough to get published in anything other than the lowest tier journals, and even those journals would typically want a second independent cell line to be used. In order to get into a mid-to-upper level journal, they would need to demonstrate the phenotype and basic pathway in at least a mouse model, and now it’s becoming more technically feasible to test human-in-mouse xenografts (i.e. a panel of real patient-derived tumors). For a druggable target in cancer, there should also be some analysis of clinical data: histology, genomics, gene expression patterns, etc, with some correlation to a clinical outcome suggesting a target population. For something promising enough to go to clinical trials, if it’s novel there will likely be other labs that will try to refine the biology (e.g. after IDH1 mutations were identified, multiple groups raced to publish in Cell/Science/Nature about the mechanism), and if it’s not novel, then there is usually a large related body of work already on the subject.

    There are a lot of problems with that typical approach, but taken together it is unlikely that it is all spurious from NHST or other purely statistical issues because of how many different lines of evidence are needed prior to testing in humans in modern times. It’s like the swiss cheese analogy used in medical errors: there are holes, sometimes big holes, in each layer, but they all would have to line up for something to slip through. A cancer biologist (if they trust the data) would conclude that HIT inhibition reduces cell proliferation in cancer CA via the MAPK pathway. And at this level, there are usually more important qualitative criticisms than quantitative criticisms: i.e. how strong is the evidence linking HIT -> B -> MAPK, are there pieces missing in the middle, what kind of feedback exists, what about MAPK-independent effects and do those crosstalk with the MAPK pathway downstream, and so forth. Also, the pathway may be different in other cell types or even phenotypic states of the cancer cells. For this sort of classical cancer biology, this high degree of “biological uncertainty” seems to overwhelm the statistical uncertainty or quantitative uncertainty.

    The quantitative modeling of those connections is often missing or at least incomplete, and I agree with you that it would be informative to at least have some basic modeling of that working model. It can be helpful for understanding whether there are significant pieces missing above, and as a sanity check like you say. Not enough biologists have experience with systems/computational biology or any quantitative modeling for that matter. Quantitative modeling like with the RhoGTPase paper above is useful for understanding how subnetworks can act like computational modules and lead to nonlinear phenotypic responses. This was supposed to be how systems biology would revolutionize the field, but while it’s improved understanding of network architecture, a lot of that potential has yet to be realized. And not for lack of trying; computational biology was very hot (in terms of grant funding) when it was emerging, it’s just that there are too many unknown unknowns. The methods have come a long way and there is a lot of modern literature on exactly this sort of approach, but it has been challenging to translate this type of knowledge into clinical benefit.

    From a biologist’s perspective, building an empirically-validated quantitative model would be very expensive (time and money) depending on how much of the network is being modeled, almost certainly missing some factors of variable importance, unlikely to be generalizable, and still not enough to make reliable predictions. So the working model is left as qualitative: the drug works (at least in part) through this mechanism, but the details may not be complete, and there could even be other more significant mechanisms of action that haven’t been elucidated yet. Quantitative modeling may help refine and improve the working model, but not likely to completely invalidate it, as long as the initial biology was correct.
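
    As a concrete (and deliberately crude) illustration of what “some basic modeling of that working model” could look like, here is the HIT -> B -> MAPK -> proliferation cascade written as mass-action ODEs; every rate constant is an invented placeholder, so this is only a scaffold for plugging in measured values, not a claim about the real pathway:

        from scipy.integrate import solve_ivp

        # Invented placeholder rate constants (arbitrary units); the point is only
        # to write the working model HIT -> B -> MAPK -> proliferation as ODEs.
        k_hb, k_bm, k_mp = 0.5, 0.3, 0.1    # activation rates along the cascade
        d_b, d_m = 0.2, 0.2                 # deactivation/turnover of B and MAPK

        def cascade(t, y, hit_level):
            b, mapk, prolif = y
            db = k_hb * hit_level - d_b * b     # HIT activates B
            dmapk = k_bm * b - d_m * mapk       # B activates MAPK
            dprolif = k_mp * mapk               # MAPK drives the proliferation signal
            return [db, dmapk, dprolif]

        for hit_level, label in [(1.0, "baseline HIT"), (5.0, "HIT overexpressed"), (0.1, "HIT knockdown")]:
            sol = solve_ivp(cascade, (0, 50), [0.0, 0.0, 0.0], args=(hit_level,), rtol=1e-6)
            print("%-18s cumulative proliferation signal = %6.1f" % (label, sol.y[2, -1]))

    Even this cartoon immediately raises the question of what the rates actually are, which is where measured parameters, or the lack of them, start to matter.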

    Usually, things like the number of molecules/cell and binding coefficients have been worked out biochemically, but that’s not enough to predict behavior of the system. Take the EGFR family as an example, a prototypic receptor tyrosine kinase, which has 4 family members that can either homodimerize or heterodimerize when activated, and then activates a large number of signaling pathways. For simplicity, assume just EGFR is expressed, and only 1 ligand expressed, and we are only concerned with EGFR activation of MAPK signaling. The basic biochemical steps are known: Ligand binds to the extracellular domain, which dimerizes, causing the kinase domain to dimerize asymmetrically, which then phosphorylates the C-terminal tail at a large number of sites, creating binding sites for adaptor proteins, which assemble a variety of complexes that then create other signals. This all happens during internalization and trafficking of the receptor complex. The number of EGFR molecules expressed on the surface are known for many cell types (though with clustering in lipid rafts), as is the rate of synthesis, recycling, and degradation. Ligand affinities, binding constants between the extracellular domains, binding constants for the intracellular domains, level of kinase activity, and so forth have also been characterized by multiple labs using a variety of techniques. However, the biochemistry of EGFR-EGFR interactions is still being worked out by multiple labs after more than 30 years of intensive study. For instance, it is not proven how extracellular dimerization results in kinase-domain dimerization, or how the C-terminal tail (which is intrinsically denatured) then gets phosphorylated with different patterns with different qualitative effects. Both qualitative and quantitative pieces are missing.

    Binding of a ligand to EGFR does not produce a linear response in EGFR activation, which is surprisingly difficult to define in a general way, and ligands with different affinities can produce qualitatively different activation states in a way that is not fully understood. There are also different splicing isoforms and other complexities in a real cancer cell. In order to develop a quantitative model of EGFR signaling to MAPK, you would have to gloss over a lot of this biology. That’s still useful and can show interesting phenomenon about the system. See for instance https://www.nature.com/articles/srep38244 (sorry if it’s behind a paywall for you). But it’s not essential prior to clinical testing given how many other layers of biological (and also medical) uncertainty will still remain.
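
    One way to see how a nonlinear ligand response can fall out of even a cartoon model: keep only one-site binding plus the requirement that two ligand-bound receptors pair into an active dimer, and ignore everything else described above. The constants here are invented placeholders:

        import numpy as np

        # Crude sketch: fraction of receptors ligand-bound from one-site binding,
        # and a dimer proxy that scales with the square of that fraction (two
        # bound receptors must pair up). Both constants are invented.
        kd = 1e-9                            # molar, hypothetical
        ligand = np.logspace(-11, -7, 5)     # molar
        occupancy = ligand / (ligand + kd)
        dimer_proxy = occupancy ** 2         # grows faster than occupancy at low ligand

        for c, f, a in zip(ligand, occupancy, dimer_proxy):
            print("L = %.0e M   occupancy = %.3f   active-dimer proxy = %.4f" % (c, f, a))

    Even at this cartoon level, doubling the ligand at low concentration roughly quadruples the dimer proxy while barely changing it near saturation, which is the kind of behavior a purely qualitative summary glosses over.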

    Personally, I have problems with the standard approach to drug development, but that’s perhaps for another time and not directly related to this topic. I’m generally skeptical of the biology. I’m also not discussing the “me too” drugs that pharmaceutical companies are always trying to develop. I could go on and on, but I’ve spent enough time here and I hope I have at least explained how a typical cancer biologist thinks about this topic, and that this was helpful.

    P.S.: Regarding your other comment above about antibodies, it is standard practice to compare the biological activity of an experimental antibody with isotype control antibodies that should have similar nonspecific and Fc-receptor binding as the experimental antibody, as well as immunohistological evidence of which parts of tissue are binding the antibody. It is also common to have molecular-level evidence as well: immunoprecipitation, immuno-electron microscopy, FRET, etc. In the case of EGFR-family antibodies used in cancer treatment, there are mutations that cause resistance to those antibodies by changing the epitope, arguing for a specific antibody-antigen effect. Amyloid and intrinsically unstructured domains are two different things, but in general, an antibody to an intrinsically unstructured domain will only have a strong affinity to that domain by inducing a conformational state in and near the epitope, and this can have variable degrees of sequence-specificity. How specific the antibody is depends on many factors, but these are usually used as tools for basic biology, and I’m not aware of any clinically tested antibody raised against an intrinsically disordered epitope. It would still be possible to be highly specific, and no reason to suspect it will have higher binding to amyloid.

    • Thanks for taking the time to compose your thoughts. This is a great example of tacit knowledge, which is hard to find in the literature and is why “Google Scholar University” is far inferior to actually working in a field.

    • > It’s like the swiss cheese analogy used in medical errors: there are holes, sometimes big holes, in each layer, but they all would have to line up for something to slip through.

      Nice analogy, I have not seen it before. Thanks for taking the time to write that, it is an interesting read.

      • Wow! Thank you for an outstanding pair of comments. I would like to think that my brief comment at the start was intended to capture all that you added, but it didn’t. This is the bit that captures what I had in mind: “For this sort of classical cancer biology, this high degree of “biological uncertainty” seems to overwhelm the statistical uncertainty or quantitative uncertainty.”

        • > this high degree of “biological uncertainty”
          But the only known cure for high biological uncertainty is randomization that works (for group comparisons) at rate sqrt(n per group)…

        • You misunderstand the intent of the quote. The biological uncertainty is uncertainty regarding the relevance of findings from one biological model to predicting the behaviour in another. Statistics tell us nothing about that, and randomisation does not help.

      • The swiss cheese analogy is also a good way to think about catastrophic events such as submarine accidents and the bursting of the 2008 housing bubble. When everything lines up, watch out.

    • Overall I see where you were coming from. I was trained to think the same way, but have concluded it is very wrongheaded. Also I think this is the fault of NHST, which has its tentacles much deeper into your thought process than you realize. I’ll split up the response and hopefully get through it all.

      Paragraphs 1-4 could be “Problems with the typical Biomedical Approach”, Paragraphs 5-6 I’ll call “Application of quantitative modelling to biology”, 7-8 will be “EGFR signaling”, and 10 is “Issues with antibodies as research tools”.

      Problems with the typical Biomedical Approach

      First, using a typical cell line for CA that’s easy to manipulate, there will be an attempt to link HIT to some sort of phenotype: cell proliferation rates, cell death, differentiation, cell invasion, etc. This would be done by either overexpressing HIT (many different ways to do this) or knocking it down (usually genetically), and conventionally done in triplicate (biology in general has the poorly defined definitions of “technical” and “biological” replication). Suppose in this case, HIT overexpression increases proliferation rates, and knockdown with RNAi decreases proliferation rates. The next step would be to look at the state of whatever is known to be downstream of HIT

      You’ve skipped over the entire hard part. How are “cell proliferation rates, cell death, differentiation, cell invasion, etc” being measured?

      What will happen is you will think you are measuring something like cell proliferation but actually you are measuring ATP levels, staining with a certain antibody, volume of a tumor, or whatever. The actual measurement may be altered for reasons other than, or in addition to, proliferation.

      There are a lot of problems with that typical approach, but taken together it is unlikely that it is all spurious from NHST or other purely statistical issues because of how many different lines of evidence are needed prior to testing in humans in modern times.

      The problem with NHST isn’t spurious results, it’s that only a “null model” is tested. This model is (usually; if it’s not, then I have no problem with it at this time) different from the “research model”, much more precise than the research model, and its rejection is used to draw conclusions about the research model. In some magical way the precision and testing of the null model get conferred onto the vague and untested research model. The mathematical details of how this is done (p-values vs. Bayes factors vs. whatever happens in the brain while looking at a graph) are pretty much irrelevant.

      A cancer biologist (if they trust the data) would conclude that HIT inhibition reduces cell proliferation in cancer CA via the MAPK pathway.

      It isn’t a matter of trusting the data. It is a matter of interpreting the data and distinguishing between different explanations for the observations.

      It’s like the swiss cheese analogy used in medical errors: there are holes, sometimes big holes, in each layer, but they all would have to line up for something to slip through.

      I’ve heard similar before and don’t believe this is true in practice. It is quite easy to just keep running assays and eventually come up with one of these qualitative pathways (eg, HIT -> B -> MAPK)[1], always finding some (legitimate) excuse to ignore data you dislike. Without attaching numbers to the reaction rates, concentrations, etc it is impossible for anyone to sanity check this.

      Also, I take it as a principle that the expression level of every single gene is going to have a non-zero effect on proliferation, etc. This may be very small and very indirect, but it will be there. So it isn’t even a matter of “does this pathway exist”, it is a matter of “how relatively common is this pathway”. The wrong question is being asked of the data.

      Finally, interpreting such data in terms of a “pathway” (as opposed to a network) is probably inadequate to begin with. Sure, “you have to start somewhere”, “bio is messy”, etc. I don’t see how that means terms like “well-characterized” should be thrown around.

      Also, the pathway may be different in other cell types or even phenotypic states of the cancer cells. For this sort of classical cancer biology, this high degree of “biological uncertainty” seems to overwhelm the statistical uncertainty or quantitative uncertainty.

      Sure, this happens too. However, I would say uncertainty about the relationship between what was measured and what the researcher wants to know is a bigger, more fundamental problem to be dealt with first. The reason extrapolation is so dangerous is that there is no good model for what is going on. The ability to extrapolate precisely and accurately may even be my definition of a “well characterized” phenomenon.

      [1] See the discussion and figure 3A here: https://www.ncbi.nlm.nih.gov/pubmed/12242150

    • Application of quantitative modelling to biology

      This was supposed to be how systems biology would revolutionize the field, but while it’s improved understanding of network architecture, a lot of that potential has yet to be realized. And not for lack of trying; computational biology was very hot (in terms of grant funding) when it was emerging, it’s just that there is too many unknown unknowns.

      The way physics is set up you have the theorists and the experimentalists. The experimentalists collect data to test and distinguish between the models of theorists. In this way not everyone needs to be an expert in everything, the people collecting the data are less likely to have a strong bias for/against a given theory, etc.

      In bio you have the modellers who do their job. Then you have the experimentalists that aren’t collecting the data needed to constrain/test/compare the models (eg reaction rates, number of cells in a tissue, division rates, stuff like that). Instead they are out there doing NHST saying A is positively correlated to B, C is negatively correlated to D, etc. So there is really no effort being made to reduce the number of known unknowns, let alone turn the unknown unknowns into known unknowns. The people supposed to be doing that have been distracted into ruling out null hypotheses and thinking they are making discoveries.

      The methods have come a long way and there is a lot of modern literature on exactly this sort of approach, but it has been challenging to translate this type of knowledge into clinical benefit.

      I agree. Without an effort to constrain the models with data it is unlikely anyone will benefit.

      From a biologists perspective, building an empirically-validated quantitative model would be very expensive (time and money) depending on how much of the network is being modeled, almost certainly missing some factors of variable importance, unlikely to be generalizable, and still not enough to make reliable predictions.

      There is plenty of money sloshing around for this, just right now most of it is wasted on these sloppy NHST studies.[1,2] If instead we devote that to constraining various biological parameters, relying on direct replications by independent groups and on multiple ways to estimate the same values, I’m confident that great progress will be made quickly. This actual scientific approach has never even been tried but is already dismissed.

      Quantitative modeling may help refine and improve the working model, but not likely to completely invalidate it, as long as the initial biology was correct.

      I disagree. Quantitative modelling is crucial to distinguishing between different explanations for what you are observing. It is just infeasible to rule out every other way that A could be leading to an increase/decrease of B; there are too many. Then you have every way that this relationship could change due to context (i.e., be “modulated”, or even reversed). While I don’t know, since there is very little data, I suspect that many of the qualitative models will turn out to be “not even wrong”. E.g., HIT -> B -> MAPK, sure. But also (see the toy comparison sketched after this list):
      HIT -> C -> MAPK
      HIT -> MAPK -> B
      HIT -> B -> C -> MAPK
      HIT -> B -| MAPK
      HIT -> B -| GTPase
      HIT -| B -> GTPase

      Etc. It will all depend on the context and timeframe looked at. It seems you are trying to argue on the one hand that “biology is so complex”, yet also say that we have “well characterized” the effects of throwing some new chemical into the mix. I think this has not been done in the case of the most controlled and simple biological systems, let alone a living organism.
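
      Here is the toy comparison mentioned above, with every rate an invented placeholder: two of those wirings, HIT -> B -> MAPK and HIT -> B -> C -> MAPK, make the same qualitative predictions for HIT overexpression, HIT knockdown, and even B knockdown, so only quantitative measurements (rates, time courses, levels of the intermediate) could tell them apart:

          # Steady-state levels under simple linear mass-action kinetics; all rate
          # constants are made up, chosen only to show that both wirings move MAPK
          # in the same direction for the same perturbations.
          def mapk_two_step(hit, b_present=1.0, k1=0.5, d1=0.25, k2=0.2, d2=0.2):
              b = b_present * k1 * hit / d1        # HIT -> B
              return k2 * b / d2                   # B -> MAPK

          def mapk_three_step(hit, b_present=1.0, k1=0.5, d1=0.25,
                              k3=0.3, d3=0.3, k2=0.2, d2=0.2):
              b = b_present * k1 * hit / d1        # HIT -> B
              c = k3 * b / d3                      # B -> C
              return k2 * c / d2                   # C -> MAPK

          for hit, b_on, label in [(1.0, 1.0, "baseline"), (5.0, 1.0, "HIT overexpressed"),
                                   (0.1, 1.0, "HIT knockdown"), (5.0, 0.1, "HIT up, B knocked down")]:
              print("%-24s two-step: %5.2f   three-step: %5.2f"
                    % (label, mapk_two_step(hit, b_on), mapk_three_step(hit, b_on)))

      With these made-up rates the two wirings happen to give identical steady-state outputs, so overexpression/knockdown experiments alone cannot separate them; time courses or direct measurement of the intermediate C could.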

      Anyway, I think I can now paraphrase what is meant by “well-characterized”. It means “well-characterized relative to other things that have been researched the same way”. It is recognized that this level of characterization is only the very tiny tip of an iceberg of better understanding, but it just seems impossible to do better given current political/financial constraints.

      [1] http://www.sciencemag.org/news/2015/06/study-claims-28-billion-year-spent-irreproducible-biomedical-research
      [2] I’m sure you can find all the papers on low levels of cancer research reproducibility, etc

      • I’ve been holding off on this as it will reveal me to be an idiot ( https://www.youtube.com/watch?v=vW3ti9mLnDg ); but I don’t think the solution to the cancer problem lies in assembling a bunch of regressions that collectively, and thus vaguely, point in a direction slightly starboard or port of our current heading.

        Did you see the story about small cell lung cancer tumors creating their own stomachs and intestines? Genes are a toolbox and not Newtonian billiard balls. If true then cancer is more like “The Thing” than the red at the center of a target. Accordingly, when it comes to cancer it’s Psi and not Theta that counts; and the solution is coherence and not averaging.

      • Sorry for the delayed response, but I have been busy recently.

        Just to clarify, my intention was to outline the thought process of a typical cancer biologist (not my own), so that the underlying logic is made more apparent. Your last paragraph is spot on. I think, however, there is still some miscommunication here.

        Regarding the improper use of the null model in biology, I completely agree with you. There is a lot of “yes/no/partially so” in the way that results are interpreted, as well as experiments designed to show “necessary/sufficient” conditions. A lot of classical molecular and cellular biology lacks a formal decision-theoretic framework (other than NHST here and there). Instead, interpreting the significance of any given assay, or even the strength of a paper, are big grey areas left up to the individual.

        It’s a problem that contributes to all of the problems that you bring up. I think we’ve seen time and again that many researchers, after glimpsing a hint of a discovery that would be really interesting or groundbreaking in their experiments, will often lower their standards for evidence when making these decisions. Even for those who are well-intentioned, this can lead to ignoring contrary evidence in an irrational way (like ignoring an assay as a “technical” failure when it may have been correct), rejecting criticisms from others, overstating conclusions, and the like. Add the pressures of tenure or other promotions, grants, finding jobs for your students, reputation, sunk cost fallacy, etc and you get a lot of misguided, faulty, and irreproducible science. This can happen in any field of science, and definitely also plagues biomedical research.

        On the other hand, my arguments above were predicated on there being strong biological evidence. By this, I mean that every finding is supported by several different lines of evidence. This is indeed “the entire hard part” and I skipped over it intentionally. For something like cell proliferation, for instance, I would expect to see at a minimum the doubling rates from a cell growth assay, a cell cycle analysis, expression of cell proliferation markers, and a cell tracing experiment (e.g. genetic or BrdU labeling). All of this would be in addition to a mechanistic link to known cell proliferation pathways (e.g. MAPK above). Assuming those studies were done correctly, it would be fairly convincing evidence that HIT -> cell proliferation. There may (and undoubtedly will) be other effects, but there are two direct measurements of cell growth (doubling and cell tracing), two indirect measurements (cell cycle and markers), and a plausible mechanism. It is generally understood that the plausible mechanism is incomplete and thus highly context dependent, for instance there are many examples of genes that are tumorigenic in one setting and tumor suppressors in another.
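
        To make the first of those lines of evidence concrete, here is a minimal sketch, with hypothetical counts and a simple log-linear fit, of how a doubling time gets pulled out of a growth assay:

            import numpy as np

            # Hypothetical cell counts from a growth assay, measured daily in triplicate.
            days = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
            counts = np.array([
                [1.0e4, 1.1e4, 0.9e4],
                [2.1e4, 1.9e4, 2.0e4],
                [3.8e4, 4.2e4, 4.0e4],
                [8.1e4, 7.6e4, 7.9e4],
                [1.6e5, 1.5e5, 1.7e5],
            ])

            # Fit log(mean count) vs. time; the slope is the exponential growth rate.
            growth_rate, _ = np.polyfit(days, np.log(counts.mean(axis=1)), 1)
            print("estimated doubling time: %.2f days" % (np.log(2) / growth_rate))

        Comparing that estimate (with its uncertainty) across control, HIT-overexpressing, and knockdown lines is the sort of direct measurement the proliferation claim would rest on, with the other assays listed above probing the same conclusion from different directions.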

        As a side note, in the toy example in my other comment, again assuming that the biology was done properly, the HIT -> B -> MAPK -> proliferation pathway is supported by several lines of evidence showing HIT -> proliferation (regardless of ATP levels, volume, etc) since increasing HIT increased proliferation and decreasing HIT decreased proliferation, then showing that MAPK signaling was necessary for these effects on proliferation, and showing the B was necessary for both the proliferation and MAPK findings. Quantitative modeling will not invalidate those results unless they were done improperly (and there are more direct ways to test that). But it certainly can help refine and build the working model complementary to direct wetlab experiments.

        This context dependency is a result of the “biological uncertainty” that is inherent in the field. We are not able to open up a biological system like a radio and know every single component and how they are put together. Some of the direct interactions between highly studied proteins are still not fully understood. This is why most biologists would have a problem with spending resources designing experiments to “constrain biological parameters.” Without knowing all of the pieces and what parameters to measure, not to mention how to measure them, any measured parameter will also be context dependent. In my EGFR example above, just establishing the biological parameters for each step in receptor dimerization and activation would be extremely challenging without glossing over several layers of regulation.

        I like your point about experimentalists and theorists in physics, and I agree that biology would benefit from a more coordinated approach. Any time a model is built, there needs to be a clear purpose, and the assumptions need to be justified for that purpose. In this case, the assumptions would include which context to study and which parameters should be measured in that context for that purpose, and the experimentalists would work on how those measurements would be done. I think this is very intriguing and I actually work with physicists right now doing something similar for medical imaging. I’m not in a place to do it now, but maybe in the future we could collaborate and test how this would work in practice?

        • Thanks, lots here that is good and still I hope to get to the EGFR stuff. At this point I followed the lit back to when the oligomerization (now apparently just dimerization?) idea was first proposed in the late 1970s. Some of the reasoning is quite convoluted (eg cyanogen bromide treated EGF has less affinity than non-treated EGF but if some specific antibody is added to cells after a certain number of passages in the same culture then the rate of DNA synthesis is still increased by a similar amount, therefore we are seeing intermolecular activation…), so I am still trying to parse it well enough to come up with an alternative interpretation of those early results (after which we can see how to play with it to make the later results fit). It may take a bit.

          This context dependency is a result of the “biological uncertainty” that is inherent in the field. We are not able to open up a biological system like a radio and know every single component and how they are put together.

          The main claim of the paper was that even if you could “open up a biological system like a radio”, the methodology would still fail to figure it out:

          One of these arguments postulates that the cell is too complex to use engineering approaches. I disagree with this argument for two reasons. First, the radio analogy suggests that an approach that is inefficient in analyzing a simple system is unlikely to be more useful if the system is more complex. Second, the complexity is a term that is inversely related to the degree of understanding. Indeed, the insides of even my simple radio would overwhelm an average biologist (this notion has been proven experimentally), but would be an open book to an engineer. The engineers seem to be undeterred by the complexity of the problems they face and solve them by systematically applying formal approaches that take advantage of the ever-expanding computer power. As a result, such complex systems as an aircraft can be designed and tested completely in silico, and computer-simulated characters in movies and video games can be made so eerily life-like. Perhaps, if the effort spent on formalizing description of biological processes would be close to that spent on designing video games, the cells would appear less complex and more accessible to therapeutic intervention.

          https://www.ncbi.nlm.nih.gov/pubmed/12242150

          maybe in the future we could collaborate and test how this would work in practice?

          I am very hesitant to take on any bio projects I am not in total control over, but who knows what could make sense in the future.

        • I was not trying to get into discussing EGFR directly, only to give an example of how there are still fundamental uncertainties about how it works. Methods have come a long way since the 70s, with cloning and site directed mutagenesis, crystal structures of key domains, molecular dynamics simulations, FRET, single-molecule imaging, and so forth. There is some evidence of higher order oligomerization but it remains controversial, so more work has been done on dimerization since it’s easier to study. For example, cross-linking experiments could give a false positive because the receptors cluster in lipid rafts at high local densities, and symmetry in crystal structures can be an artifact of the crystallization process. See https://doi.org/10.1038/ncomms13307 (should be open access) for a recent attempt to experimentally show oligomerization.

          On a related note, there is a tendency in biomedical science to discount or ignore studies from over 10 or so years ago. I fell into this trap for a while too, and it’s even taught in college (e.g. you need to cite at least 3 references from the past 5 years in your report). It’s too bad, because there’s a lot of treasure to be mined by looking back at the old literature with a modern understanding.

        • Yes +1 !!

      Something I was/am puzzled about to this day. I have also heard that reviewers/editors/journals sometimes ask for “more recent” references.

          It could be an important part of the giant academic/publishing scam in the way that this practice may result in:

          1) impact factor (IF) abuse and/or support (e.g. if I am not mistaken, IF is only measured over a few years, which seems very strange to me. If anything, I reason journals/authors should get credit for publishing science that holds up over time, not just 2/3/4/5 years)

          2) stating that science is always “self-correcting”, and “developing”, and “needs more research”, while it could be the case that it is merely running in circles or putting a new name/label on already familiar/researched things (e.g. “grit”/”conscientiousness”)

          This to me makes no sense, and it is yet another reason why journals/editors/peer-reviewers/universities may be directly responsible for many things that are wrong in science.
