In general I prefer plotting the function to reporting coefficients. When I work with nonlinear functions I often use radial basis expansions, because they’re so flexible. Rather than emphasizing the formula I’m using, which is really just a generic thing, I’d tend to draw spaghetti plots of posterior draws of the function. In a 2D context, I might do small multiple plots of the surface, or draw curves at discrete levels of say the B value.

So, you’re estimating f(I,B) as a continuous function, but for display purposes you might plot f(I1,B), f(I2,B) each as functions of B alone.

To me it makes sense to start with the idea that you want to estimate a 2D function f(I,B) and then maybe make the conscious choice of doing some simplification. It shouldn’t be the case that we start with the simplified ideas and then treat them as essentially “the way it’s done”.
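A minimal numpy sketch of that display idea (the RBF parameterization, centers, and "posterior draws" here are all hypothetical stand-ins, not anything from the post): given draws of coefficients for a radial-basis expansion of f(I,B), evaluate each draw along I at a few fixed B levels — those per-draw curves are what the spaghetti plot would show.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical radial basis expansion of a 2D function f(I, B).
centers = rng.uniform(0, 1, size=(20, 2))  # RBF centers in (I, B) space

def basis(I, B, scale=0.3):
    X = np.column_stack([I, B])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2))    # (n_points, n_centers)

# Pretend these are posterior draws of the basis coefficients.
draws = rng.normal(size=(100, centers.shape[0]))  # (n_draws, n_centers)

# Evaluate each draw along I at discrete B levels: f(I, B1), f(I, B2), ...
I_grid = np.linspace(0, 1, 50)
for B_level in (0.2, 0.5, 0.8):
    Phi = basis(I_grid, np.full_like(I_grid, B_level))  # (50, n_centers)
    curves = draws @ Phi.T                              # (n_draws, 50)
    # plt.plot(I_grid, curves.T, alpha=0.1) would draw the spaghetti plot
    print(B_level, curves.shape)
```

The same evaluation on a (I, B) grid, rather than at fixed B levels, would give the small-multiple surface plots.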

]]>Yeah, in theory I don’t disagree that you should think about estimating f(I,B) directly. I myself built a package to estimate heterogeneous treatment effects using neural nets: https://github.com/Ibotta/mr_uplift.

But binning can be effective in practice for a few reasons.

1) Smaller parameter space. Fitting a spline might be better in theory, but if you have limited data it might be more efficient to bin the data first.

2) Interpretability / SQL predictions. It might be easier to explain to people what the model is doing with if/then logic than with 2.3x^5 - 12x^4 + …
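As a toy sketch of that if/then point (the data and bin edges are my own invention): bin a continuous x, predict the per-bin mean, and the fitted model reads off directly as a SQL CASE expression.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 500)
y = np.sin(x) + rng.normal(0, 0.3, 500)  # some nonlinear signal plus noise

# Bin x and fit the simplest possible per-bin model: the bin mean.
edges = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
bin_idx = np.clip(np.digitize(x, edges) - 1, 0, len(edges) - 2)
bin_means = np.array([y[bin_idx == b].mean() for b in range(len(edges) - 1)])

# The fitted "model" translates directly into if/then (CASE WHEN) logic:
sql = "CASE " + " ".join(
    f"WHEN x < {hi:g} THEN {m:.3f}" for hi, m in zip(edges[1:], bin_means)
) + " END"
print(sql)
```

A polynomial or spline fit to the same data would predict better pointwise, but it can’t be handed to someone as four WHEN clauses.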

So if there’s an interaction, I’ll assume for example that both I and B affect H in a nonlinear way… and it’s perfectly possible to simply fit a nonlinear 2D function. The discretization seems like kind of a poor man’s technique for when your software won’t allow you to fit anything other than linear functions. By discretizing, say, B, you can then fit linear functions to “low, medium, and high,” which lets you fit a kind of piecewise function to the behavior… but you’re better off just thinking about f(I,B) as a nonlinear function and fitting it directly.
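A minimal sketch of fitting f(I,B) directly, with no discretization of either variable (the true surface, basis, and ridge penalty here are all my own illustration): a 2D radial-basis expansion with a small ridge penalty recovers a smooth nonlinear interaction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
I = rng.uniform(0, 1, n)
B = rng.uniform(0, 1, n)
# Hypothetical nonlinear interaction surface plus noise.
H = np.sin(3 * I) * np.cos(3 * B) + rng.normal(0, 0.1, n)

# 2D radial basis expansion on a 6x6 grid of centers.
gi, gb = np.meshgrid(np.linspace(0, 1, 6), np.linspace(0, 1, 6))
centers = np.column_stack([gi.ravel(), gb.ravel()])

def basis(I, B, scale=0.25):
    X = np.column_stack([I, B])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * scale**2))

Phi = basis(I, B)
lam = 1e-3  # ridge penalty on the basis coefficients
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ H)

fitted = Phi @ w
rmse = np.sqrt(np.mean((fitted - np.sin(3 * I) * np.cos(3 * B)) ** 2))
print(f"RMSE vs true surface: {rmse:.3f}")
```

The same fitted surface can then be sliced at fixed B levels for display, as suggested above.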

]]>I think the point he’s making is that the interaction can be nonlinear. One way to estimate that would be to bin, in order to capture that nonlinearity.

]]>I’m still not getting it… is “whether they have some illness” the outcome or a covariate?

I could see the outcome being say hospitalization, as a function of illness and vit B…

H ~ f(I,B)

or the illness itself as a function of vit B

I ~ g(B)

Either way, I don’t see why B would need to be discretized.

]]>Fair enough. Let’s take the Vitamin B example. We have repeated measurements of 100 individuals for whether they have some illness, which we model as an IID varying intercept. We also have estimates of Vitamin B levels, a continuous measurement which is of primary interest for the study. We suspect that Vitamin B interacts differently with different individuals, which ordinarily might be handled through a varying slope, but here, it would require discretizing the Vitamin B variable in some way to create the slopes.

]]>+1

]]>Nice example to illustrate the point.

]]>Can you give me an example problem, something more concrete? Like sticking with your Vitamin B example. Perhaps then I could better understand what you mean.

]]>If you aren’t struggling to fit your model, it isn’t complex enough to meaningfully describe your real-world problem.

]]>Well, some software does this automatically and some does not. My point was that people don’t know the “right” way (if there is a consistently best one) to interact a continuous variable with varying intercepts and are probably reluctant to discretize toward that end.

]]>If you’re going with a smoothing spline, it seems like you might as well fit the smoothing spline to the unbinned data. I think binning is mostly useful as a way for people who don’t know how to specify nonlinear models well to nevertheless fit nonlinear models.

]]>If you want to model that sort of interaction, one option could be to discretize the continuous variable into a few meaningful bins that could serve as potential slopes to vary against the modeled group (e.g., deficient, average, too much instead of a raw measurement range). But many have been drilled to believe that “binning” continuous variables is per se bad and results in loss of information, not gain. In truth, especially with the rise of penalized smoothing splines, discretization can often produce more accurate results, particularly if it is capturing non-linearities or allowing an interaction that would otherwise be missed. But I suspect it goes against the training of many analysts to even try it.
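A toy illustration of that claim (the simulated dose–response and the three cutpoints are entirely my own): with a U-shaped effect of a vitamin level, a straight-line fit finds essentially nothing, while three crude bins (deficient / average / too much) recover the pattern.

```python
import numpy as np

rng = np.random.default_rng(3)
b = rng.uniform(0, 1, 600)                            # vitamin level
risk = 4 * (b - 0.5) ** 2 + rng.normal(0, 0.1, 600)   # U-shaped effect + noise

# A straight-line fit: slope near zero, the U-shape is invisible.
slope = np.polyfit(b, risk, 1)[0]

# Three crude bins: deficient / average / too much.
labels = np.digitize(b, [1 / 3, 2 / 3])               # 0, 1, 2
bin_means = np.array([risk[labels == k].mean() for k in range(3)])

print(f"linear slope: {slope:.3f}")        # near zero
print("bin means:", np.round(bin_means, 3))  # high, low, high
```

Of course a spline would also find the U-shape; the point is that binning, against the usual advice, already beats the default linear specification here.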

It would be helpful for a paper (or blog post!) to confront this mentality directly and show when it turns out to be good advice and when it actually stands in the way of a better model.

]]>I think your point about mechanistic modeling is important. A statistical model can detect “effects”, but ultimately it is just a fancy way of redescribing the data. Too often (as in a lot of branches of social science and medical research), the statistical model is treated as the end goal, when really it is just the starting point.

]]>+1

I’m reminded of Tamiflu. If you take it early enough after you get the flu, the disease does not develop. If you are even a few hours late, the drug is useless and you get sick. But because drugs are scored by average treatment effect, and we don’t know how long folks were sick before they took it, the efficacy of Tamiflu is described as “shortens the duration of illness by 12 hours” or something like that. The odds that your illness will be reduced by 12 hours are close to nil.

]]>Agree with the post and the comments. One observation:

I work in medicine, often trying to model treatment outcomes for patients. As you might imagine (or, maybe not, if you aren’t familiar with how medicine “really” works) the heterogeneity in response is GIGANTIC for most treatments.

I often say that the easiest way for us to improve outcomes is to stop treating patients who won’t get better. While this seems obvious (and almost trivial/circular in reasoning), it is shockingly difficult to implement. Even if we can identify the “poor” candidates (which, we often can) it is nearly impossible to get the MDs *AND* the patients to go along with this approach.

Some of this may be societal – lots of folks just want a pill/surgery/device to make the problem go away, and are willing to try anything, even if it has an extremely low likelihood of working. Some of this is economic – Hospitals/Clinics/MDs get paid to treat people, not counsel them on the fact that there is no treatment that will work.

In any case, I am 100% convinced that there are incredibly useful lessons from the Social Sciences modeling – where RCTs and scientifically proven mechanisms are rare – that are directly applicable to the clinical medicine world. The reality is that only a tiny fraction of conditions have solid evidence (or even crappy NHST type evidence ;)) supporting specific treatments for individual patients.

]]>Peter:

I agree with what you wrote. I think one thing that you could add is that it’s difficult to estimate variability in treatment effects (recall the magic number 16), and in statistics we’re often trained to think that if something can’t be measured or estimated precisely, it can be safely ignored.

]]>Link added; thanks.

]]>Indeed… it’s not really the varying treatment effects, it’s the mechanistic modeling. Knowing that something varies but not why leads you to average treatment effects… but knowing that something varies and having ideas about why leads you to mechanistic modeling of the why, and now we’re doing science.
