Mimicking Randomized Controlled Trials to Learn End-to-End Patient Representations through Self-Supervised Covariate Balancing for Causal Treatment Effect Estimation

29 Sep 2021  ·  Gino Tesei, Stefanos Giampanis, Beau Norgeot ·

A causal effect can be defined as a comparison of outcomes that result from two or more alternative actions, with only one of the action-outcome pairs actually being observed. The gold standard for causal effect measurement is the Randomized Controlled Trial (RCT), in which a target population is explicitly defined and each study sample is randomly assigned to either the treatment or control cohort. The great potential to derive actionable insights from causal relationships has led to a growing body of machine-learning research applying causal effect estimators to Real World Data (RWD) in the fields of healthcare, education, and economics. The primary difference between causal effect studies utilizing RWD and RCTs is that for RWD the study occurs after the treatment, and therefore we have no control over the treatment assignment mechanism. This can lead to massive differences in covariate distributions between control and treatment samples, making a comparison of causal effects confounded and unreliable. Classical approaches have sought to solve this problem piecemeal, first estimating the treatment assignment and then the treatment effect separately. Recent work extended part of these approaches to a new family of representation-learning-based algorithms, revealing that the lower bound of the expected treatment effect estimation error is determined by two factors: the standard generalization error of the representation and the distance between the treated and control distributions induced by the representation. Here we argue that to achieve minimal dissimilarity in learning such distributions, as happens for RCTs, a specific auto-balancing self-supervised objective should be used. Experiments on real and simulated data revealed that our approach consistently produces less biased errors than previously published state-of-the-art methods.
We demonstrate that our reduction in error can be directly attributed to the ability to learn representations that explicitly reduce such dissimilarity. Additionally, we show that the error improvements between our approach and previously published state-of-the-art methods widen as a function of sample dissimilarity between the treated and untreated covariate distributions. Thus, by learning representations that induce distributions analogous to RCTs, we provide empirical evidence supporting the error bound dissimilarity hypothesis, as well as a new state-of-the-art model for causal effect estimation.
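To illustrate the two-term error bound the abstract references (standard generalization error plus the distance between treated and control distributions induced by the representation), here is a minimal NumPy sketch of a covariate-balancing objective in the style of counterfactual-regression methods. The function names, the linear-kernel MMD choice for the distribution distance, and the `alpha` trade-off weight are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

def mmd_linear(phi_treated, phi_control):
    # Squared Maximum Mean Discrepancy with a linear kernel:
    # the squared distance between the mean embeddings of the two groups.
    # (Illustrative choice; other distribution distances could be used.)
    diff = phi_treated.mean(axis=0) - phi_control.mean(axis=0)
    return float(diff @ diff)

def balanced_loss(y_pred, y_true, phi, treated, alpha=1.0):
    # Factual prediction error (generalization-error term) plus an
    # alpha-weighted penalty on the representation-induced distance
    # between treated and control samples (dissimilarity term).
    factual = float(np.mean((y_pred - y_true) ** 2))
    dist = mmd_linear(phi[treated == 1], phi[treated == 0])
    return factual + alpha * dist

# Toy example: two treated and two control samples in a 2-d representation.
phi = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
treated = np.array([1, 1, 0, 0])
y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = y_true.copy()  # perfect factual fit: only the balance term remains
loss = balanced_loss(y_pred, y_true, phi, treated, alpha=0.5)
```

Minimizing such a loss over a learned representation `phi` pushes the treated and control embedding distributions toward each other, mimicking the balance that randomization provides in an RCT.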
