Sparse statistical estimators are increasingly prevalent due to their ease of interpretability and superior out-of-sample performance. However, sparse estimation problems with an L0 constraint, restricting the support of the estimators, are challenging (typically NP-hard, but not always) non-convex optimization problems. Consequently, academics and practitioners commonly turn to convex L1 proxies, such as Lasso and its variants, as a remedy. Although the L1 models are solved fast, they may lead to biased and/or dense estimators and require substantial cross-validation for calibration.
In this talk, we focus on two estimation problems: i) sparse regression and ii) sparse and smooth signal recovery. The first one is known to be NP-hard; we show that the second one is equivalent to a submodular minimization problem and, hence, it is polynomially solvable. For both problems, we derive a sequence of strong convex relaxations. These relaxations are based on the ideal (convex-hull) formulations for rank-one/pairwise quadratic terms with indicator variables. The new relaxations can be formulated as conic quadratic or semidefinite optimization problems in an extended space; they are stronger and more general than the state-of-the-art models with the reverse Huber penalty and the minimax concave penalty functions. Furthermore, the proposed rank-one strengthening can be interpreted as an unbiased, non-separable, non-convex, sparsity-inducing regularizer, which dynamically adjusts its penalty according to the shape of the estimation error function without inducing bias for the sparse solutions. Computational experiments with benchmark datasets show that the proposed conic formulations are solved fast and result in near-optimal estimators for non-convex L0-problems. Moreover, the resulting estimators also outperform L1 approaches from a statistical perspective, achieving high prediction accuracy and good interpretability.
This talk is based on papers with Andres Gomez & Shaoning Han.
Alper Atamturk is the Earl J. Isaac Chair in the Science and Analysis of Decision Making, professor and chair of the Department of Industrial Engineering and Operations Research at the University of California, Berkeley. He serves as the chair of INFORMS Optimization Society, Berkeley site director of the National Artificial Intelligence Institute for Advances in Optimization, director of Berkeley Computational Optimization Lab, co-editor of Mathematical Programming, associate editor for Discrete Optimization, Mathematical Programming Computation, and Journal of Risk. Alper is a Fellow of the Institute for Operations Research and Management Science, and a Vannevar Bush Faculty Fellow of the US Department of Defense. He serves as an advisor to a number of companies, universities, and government agencies.
- Center for Statistics and Machine Learning