Targeted Machine Learning for Causal Inference

Fri, Feb 7, 2020, 12:00 pm
399 Julis Romo Rabinowitz
Center for Statistics and Machine Learning
Department of Politics


We review targeted minimum loss estimation (TMLE), which provides a general template for constructing asymptotically efficient plug-in estimators of a target estimand in infinite-dimensional models. TMLE maximizes a parametric likelihood along a so-called least favorable parametric model through an initial estimator (e.g., an ensemble super learner) of the relevant functional of the data distribution. The asymptotic normality and efficiency of the TMLE rely on the asymptotic negligibility of a second-order term, which typically requires the initial estimator to converge at a rate faster than $n^{-1/4}$. We propose a new estimator of the data distribution and its functionals, the Highly Adaptive LASSO (HAL), which converges at the sufficient rate $n^{-1/3}$ regardless of the dimensionality of the data or model, under almost no additional regularity conditions. This allows us to propose a general TMLE that is asymptotically efficient in great generality. We demonstrate the practical performance of HAL and its corresponding TMLE for the average causal effect in dimensions up to 10. We also present a nonparametric bootstrap method for inference that accounts for the higher-order contributions to the HAL-TMLE. Finally, we discuss recent advances in our understanding of undersmoothed HAL as a general MLE.
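To make the targeting step concrete, here is a minimal sketch of a TMLE for the average causal effect with a binary treatment A, binary outcome Y, and covariates W. It is an illustration under stated assumptions, not the speaker's implementation: the initial estimators are plain main-terms logistic regressions fit by Newton's method (standing in for a super learner or HAL), and the helper names `fit_logistic` and `tmle_ate`, the propensity truncation level `bound=0.01`, and the simulated data are all hypothetical. Standard errors come from the sample variance of the estimated efficient influence curve rather than from the HAL bootstrap described above.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, y, offset=None, n_iter=50):
    """Newton-Raphson logistic regression; X must contain an intercept column if one is wanted."""
    if offset is None:
        offset = np.zeros(len(y))
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(offset + X @ beta)
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def tmle_ate(W, A, Y, bound=0.01):
    """Hypothetical TMLE sketch for E[Q(1,W) - Q(0,W)]; returns (estimate, standard error)."""
    n = len(Y)
    ones = np.ones((n, 1))
    # Initial outcome regression Q(A, W) on the logit scale (stand-in for super learner / HAL).
    XQ = np.hstack([ones, A[:, None], W])
    bQ = fit_logistic(XQ, Y)
    QA = XQ @ bQ
    Q1 = np.hstack([ones, ones, W]) @ bQ
    Q0 = np.hstack([ones, np.zeros((n, 1)), W]) @ bQ
    # Propensity score g(W) = P(A = 1 | W), truncated away from 0 and 1 (assumed bound).
    Xg = np.hstack([ones, W])
    g = np.clip(expit(Xg @ fit_logistic(Xg, A)), bound, 1.0 - bound)
    # Clever covariate H(A, W) = A / g(W) - (1 - A) / (1 - g(W)).
    H1, H0 = 1.0 / g, -1.0 / (1.0 - g)
    HA = np.where(A == 1, H1, H0)
    # Targeting step: fit the fluctuation parameter eps in
    # logit Q_eps = logit Q + eps * H by logistic regression with offset logit Q.
    eps = fit_logistic(HA[:, None], Y, offset=QA)[0]
    QA_star = expit(QA + eps * HA)
    Q1_star = expit(Q1 + eps * H1)
    Q0_star = expit(Q0 + eps * H0)
    # Plug-in estimate from the updated (targeted) outcome regression.
    psi = np.mean(Q1_star - Q0_star)
    # Estimated efficient influence curve, used for a Wald-type standard error.
    ic = HA * (Y - QA_star) + Q1_star - Q0_star - psi
    se = np.std(ic, ddof=1) / np.sqrt(n)
    return psi, se

# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(0)
n = 5000
W = rng.normal(size=(n, 2))
A = rng.binomial(1, expit(0.4 * W[:, 0] - 0.3 * W[:, 1])).astype(float)
Y = rng.binomial(1, expit(-0.5 + 1.0 * A + 0.8 * W[:, 0] + 0.5 * W[:, 1])).astype(float)
psi, se = tmle_ate(W, A, Y)
print(f"ATE estimate {psi:.3f}, 95% CI ({psi - 1.96 * se:.3f}, {psi + 1.96 * se:.3f})")
```

The single fluctuation parameter `eps` is what distinguishes TMLE from a plain plug-in estimator: it moves the initial fit along a one-dimensional submodel chosen so that the updated fit solves the efficient influence curve estimating equation, which is what removes the first-order bias left by the initial estimator.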


Mark van der Laan, Ph.D., is a Professor of Biostatistics and Statistics at UC Berkeley. His research interests include statistical methods in genomics (i.e., computational biology), survival analysis, censored data, targeted maximum likelihood estimation in semiparametric models, causal inference, data-adaptive loss-based super learning, and multiple testing. His research group developed loss-based super learning in semiparametric models, based on cross-validation, as a generic optimal tool for estimating infinite-dimensional parameters, as in nonparametric density estimation and prediction from censored and uncensored data. Building on this super learning methodology, his research group developed targeted maximum likelihood estimation of a target parameter of the data-generating distribution in semiparametric models as a new generic optimal methodology for statistical inference. These general statistical approaches have been applied in a wide variety of settings, such as the analysis of clinical trials, the assessment of (causal) effects in observational studies, and the analysis of large genomic data sets.
