This is just confusing if you literally think parameters are Greek letters. E[F] is a parameter as well.

]]>Putting it all together, can we say that

a 'strategy' to identify a causal estimand from your hypothetical causal model is a combination of:

1) the "data collection design" (whether to stratify/block, and the sample size) + 2) "unverifiable assumptions about counterfactuals" (e.g., presence/absence of links in a DAG, selection on observables)

The analyst can stop after steps 1+2 if they only care about non-parametric identification with a valid 'identification strategy' as above.

But if the analyst makes a further assumption, 3) picking a particular parametric function, they are now considering "likelihood identification" of a parameter.

Here the parameter should be related to the causal estimand through the choices made in 1+2+3.
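As a concrete sketch of step 2's "selection on observables" (a toy simulation of my own, not from the comment above; all variable names and numbers are made up): when the confounder is observed, the backdoor adjustment formula E[Y | do(T=t)] = sum_c P(C=c) E[Y | T=t, C=c] identifies the causal estimand without any parametric model.

```python
import numpy as np

# Toy setup: binary confounder C raises both treatment probability and
# the outcome. The true causal effect of T on Y is 1.0 by construction.
rng = np.random.default_rng(1)
n = 200_000
C = rng.binomial(1, 0.5, size=n)
T = rng.binomial(1, 0.2 + 0.6 * C)          # confounded treatment assignment
Y = 1.0 * T + 2.0 * C + rng.normal(size=n)  # true effect of T is 1.0

# Backdoor adjustment: average the within-stratum treated-vs-control
# contrasts, weighted by the marginal distribution of C.
ate = 0.0
for c in (0, 1):
    p_c = np.mean(C == c)
    e1 = Y[(T == 1) & (C == c)].mean()
    e0 = Y[(T == 0) & (C == c)].mean()
    ate += p_c * (e1 - e0)

print(ate)  # close to the true effect of 1.0
```

The adjusted estimate recovers the causal effect only because the "selection on observables" assumption holds by construction here; with an unobserved confounder the same arithmetic would converge to the wrong number.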

]]>Nice.

]]>(In causal inference, we’re usually interested in non-parametric identification, i.e. whether there exists a function of the data that converges to the causal effect of interest without any parametric assumptions on the data generating process. That’s why I didn’t mention anything about the likelihood that’s necessary for causal identification.)

]]>They’re somewhat related. In both cases, identification means that you can construct an estimate from data that converges to what you’re interested in as you collect more data.

In the case of a statistical parameter, this happens when your likelihood function has a unique maximum.

In the case of causal inference, this happens when certain typically unverifiable assumptions about counterfactuals hold (e.g. no confounding, or some set of instrumental variable assumptions, etc.)
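A minimal sketch of the statistical side of this (my own toy example, not from the thread): if the model is y ~ Normal(theta1 + theta2, 1), only the sum theta1 + theta2 is identified, so the likelihood is flat along lines of constant sum and has no unique maximum in (theta1, theta2).

```python
import numpy as np

# Simulate data whose mean is theta1 + theta2 = 3 (the individual
# components are not separately identified).
rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=1.0, size=1000)

def log_lik(theta1, theta2, y):
    # Normal log likelihood up to an additive constant.
    mu = theta1 + theta2
    return -0.5 * np.sum((y - mu) ** 2)

# Two very different parameter values with the same sum...
ll_a = log_lik(1.0, 2.0, y)
ll_b = log_lik(-4.0, 7.0, y)

# ...give exactly the same likelihood: the data cannot tell them apart,
# so the maximum likelihood estimate of (theta1, theta2) is not unique.
print(np.isclose(ll_a, ll_b))
```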

]]>I’ve explored these issues here https://onlinelibrary.wiley.com/doi/full/10.1111/evo.12406 and here https://www.biorxiv.org/content/early/2017/12/04/133785. In the 2014 paper, I try to model the size of the error due to confounding based on simple models of data in observational designs. In the second, I try to really distinguish between the two different parameters that are estimated (theta – the conditional “effect” or regression parameter and beta – the causal effect parameter). Pound away (I welcome constructive criticism!).

]]>Typo – bias-variance _trade-off_ methods

]]>Krzys:

Yes, it's just an unbelievably contorted semantic minefield – we had to deal with it somewhat here, "formal bias-variance methods in hierarchical (put as many terms as you want that are used for the same method)" – https://www.researchgate.net/publication/10601089_On_the_bias_produced_by_quality_scores_in_meta-analysis_and_a_hierarchical_view_of_proposed_solutions

Bob Carpenter will also have to deal with it in his talk http://statmodeling.stat.columbia.edu/2018/03/16/bobs-talk-berkeley-thursday-22-march-3-pm/

On second thought, perhaps I should not minimize the importance of understanding the differences between confounding bias and selection bias. If we have data on confounders, we can adjust in the analysis, but with selection bias it would be almost impossible to correct in the analysis: a few fancy models by inference, but lots of juggling and assumption-making. It is a great threat to the external validity of a study; therefore, we invest so many resources in preventing selection bias up front, in the design and data collection phases of the study.

]]>Yes, of course, there are many ways selection bias can be introduced, such as Berksonian bias, missing data, loss to follow-up, etc., but they are called selection bias, not confounding. It is about how these terminologies are defined in epidemiology/causal inference, not whether selection bias confounds the association. That's why I called it nitpicking: we understand the context, but the terminology has implications when communicating in public.

]]>Sure, selection bias can lead to confounding variables. There are different forms of selection, not all determined by the researcher.

]]>But we are estimating—we’re estimating expectations of functions of the model parameters,

$latex \displaystyle \mathbb{E}\left[ f(\theta) \right] = \int_{\Theta} f(\theta) \, p(\theta | y) \, \mathrm{d}\theta$

Now the confusing part is that even though this marginalizes out the estimate of $latex \theta$, if we take $latex f$ to be the identity function, the expectation is the posterior mean. The posterior mean is the standard Bayesian point estimate of $latex \theta$, because it minimizes expected squared error in the estimate. But if we take $latex f(\theta)$ to be something like an indicator function, e.g., $latex f(\theta) = \mathrm{I}[\theta_1 > \theta_2]$, the expectation is an event probability. In this case, we are estimating the event probability, not the parameters.
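To make the two cases concrete, here is a small Monte Carlo sketch (the normal "posterior" is a stand-in I made up purely for illustration): the same posterior draws yield a point estimate when f is the identity and an event probability when f is an indicator.

```python
import numpy as np

# Stand-in posterior draws for a two-component parameter theta = (theta_1,
# theta_2); a real application would use MCMC draws from p(theta | y).
rng = np.random.default_rng(42)
draws = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(100_000, 2))

# f = identity: E[theta | y] is the posterior mean, the standard
# Bayesian point estimate (it minimizes expected squared error).
posterior_mean = draws.mean(axis=0)

# f = indicator: E[ I(theta_1 > theta_2) | y ] is an event probability.
# Here we are estimating the probability of the event, not the parameters.
prob_1_gt_2 = np.mean(draws[:, 0] > draws[:, 1])

print(posterior_mean)  # close to (1.0, 0.5) under this fake posterior
print(prob_1_gt_2)     # Pr(theta_1 > theta_2 | y), a number in (0, 1)
```

Both quantities are computed from the identical set of draws; only the choice of f changes what is being estimated.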

]]>I agree with Anders substantially. Statistical biases may also entail causal biases. Here I don't know why it is couched as 'statistical' vs. 'causal'. I raise this because it's not necessarily in the actual study that one identifies the types of biases, including causal ones, that are implicated. It is in informal conversations that their detection may be more feasible.

]]>I was referring to this sentence: "….. versus selection bias (confounding, nonrandom samples, etc)."

]]>This is a great post.

Are you able to comment on the relation between an identification strategy for a causal estimand and parameter identification in a statistical model?

Or are these just unfortunately unrelated concepts that share a common keyword?

]]>Well said!

]]>Ayse:

In the above post I wrote, “the linked post above where we discuss the problems of selection bias.” And in the linked post we did discuss the problems of selection bias. Perhaps the confusion is that there were many linked posts.

]]>Ditto

]]>Suppose the “theta” you are discussing above is an observable parameter. In causal inference, we are not interested in theta, but in some unobservable causal parameter, for example the average treatment effect. If you have confounding bias, the problem is NOT that E(theta.hat|theta) is not equal to theta – it very well may be equal, but that question is perpendicular to whether there is confounding. Rather, the key issue is whether theta is equal to the average treatment effect, i.e. whether your identification of the causal effect is biased.

Statistical biases and causal biases have very different properties, and it is very confusing when they are discussed as if they are the same phenomenon. For reasonable statistical estimators, the statistical bias is generally small and quantifiable, and it may be sensible to argue that statistical unbiasedness is “not a good thing” – this is not something I take a position on. However, it is very misleading when this argument is used in favor of reducing the emphasis on unbiased identification.

Not only do biases due to confounding and selection bias tend to be much larger than biases due to biased statistical estimators, it is also generally not possible to quantify their magnitude in a given study. Therefore, for any given estimate, you generally have no idea how wrong it could be. In my view, this makes it impossible to interpret the results of the study, or to use it in decision making.

Randomized trials guarantee that you get unbiased identification of the intention-to-treat effect. This ensures both that the bias is 0, and that you can know that the bias is 0. In my view, the latter is more important. If a study design existed such that you could guarantee that the bias was at most "X", then that design would have most of the advantages of a randomized trial: what I am defending is my ability to have a reasonably accurate impression of how wrong/biased the results could be. That is going to be almost impossible in the presence of confounding bias and selection bias.

Note that even in randomized trials, you could in principle estimate the ITT effect using a biased statistical estimator. In practice, there is no point in doing this, since you can just use the non-parametric sample ITT effect as an unbiased statistical estimator. The point I am trying to make is that the advantage of randomization is NOT to avoid deviations of E(theta.hat|theta) from theta.
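A toy simulation (my own construction; the variable names and effect sizes are made up) contrasting the two designs: the naive difference in means converges to a confounded theta under observational sampling, and to the average treatment effect under randomization.

```python
import numpy as np

# Binary confounder C raises both the chance of treatment T and the
# outcome Y. The true average treatment effect is 1.0 by construction.
rng = np.random.default_rng(7)
n = 200_000
C = rng.binomial(1, 0.5, size=n)

# Observational design: treatment probability depends on the confounder.
T_obs = rng.binomial(1, 0.2 + 0.6 * C)
Y_obs = 1.0 * T_obs + 2.0 * C + rng.normal(size=n)

# Randomized design: treatment assigned by coin flip, independent of C.
T_rct = rng.binomial(1, 0.5, size=n)
Y_rct = 1.0 * T_rct + 2.0 * C + rng.normal(size=n)

naive_obs = Y_obs[T_obs == 1].mean() - Y_obs[T_obs == 0].mean()
naive_rct = Y_rct[T_rct == 1].mean() - Y_rct[T_rct == 0].mean()

print(naive_obs)  # noticeably larger than 1.0: theta != ATE (confounded)
print(naive_rct)  # close to the ATE of 1.0
```

In the observational arm, E(theta.hat | theta) is equal to theta; the estimator is statistically unbiased for the wrong target. The randomized arm is the case where theta coincides with the causal estimand.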

]]>