Elementary Estimators for High-Dimensional Linear Regression

Eunho Yang  EUNHO@CS.UTEXAS.EDU
Department of Computer Science, The University of Texas, Austin, TX 78712, USA

Aurélie C. Lozano  ACLOZANO@US.IBM.COM
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA

Pradeep Ravikumar  PRADEEPR@CS.UTEXAS.EDU
Department of Computer Science, The University of Texas, Austin, TX 78712, USA

Abstract

We consider the problem of structurally constrained high-dimensional linear regression. This has attracted considerable attention over the last decade, with state-of-the-art statistical estimators based on solving regularized convex programs. While these typically non-smooth convex programs can be solved in polynomial time by state-of-the-art optimization methods, scaling them to very large-scale problems is an ongoing and rich area of research. In this paper, we attempt to address this scaling issue at the source, by asking whether one can build simpler, possibly closed-form estimators that nonetheless come with statistical guarantees comparable to those of regularized likelihood estimators. We answer this question in the affirmative, with variants of the classical ridge and OLS (ordinary least squares) estimators for linear regression. We analyze our estimators in the high-dimensional setting, and moreover provide empirical corroboration of their performance on simulated as well as real-world microarray data.

1. Introduction

We consider the problem of high-dimensional linear regression, where the number of variables p could potentially be even larger than the number of observations n. Under such high-dimensional regimes, it is now well understood that consistent estimation is typically not possible unless one imposes low-dimensional structural constraints upon the regression parameter vector. Popular structural constraints include sparsity, where very few entries of the high-dimensional regression parameter are assumed to be non-zero, group-sparse constraints, and low-rank structure for matrix-structured parameters, among others.

The development of consistent estimators for such structurally constrained high-dimensional linear regression has attracted considerable recent attention. A key class of estimators is based on regularized maximum likelihood; in the case of linear regression with Gaussian noise, these take the form of regularized least squares estimators. For the case of sparsity, a popular instance is constrained basis pursuit, or LASSO (Tibshirani, 1996), which solves an ℓ1-regularized (or equivalently ℓ1-constrained) least squares problem, and has been shown to have strong statistical guarantees, including prediction error consistency (van de Geer & Bühlmann, 2009), consistency of the parameter estimates in ℓ2 or some other norm (van de Geer & Bühlmann, 2009; Meinshausen & Yu, 2009; Candes & Tao, 2006), as well as variable selection consistency (Meinshausen & Bühlmann, 2006; Wainwright, 2009; Zhao & Yu, 2006). For the case of group-sparse structured linear regression, ℓ1/ℓq-regularized least squares (with q ≥ 2) has been proposed (Tropp et al., 2006; Zhao et al., 2009; Yuan & Lin, 2006; Jacob et al., 2009), and shown to have strong statistical guarantees, including convergence rates in ℓ2-norm (Lounici et al., 2009; Baraniuk et al., 2008) as well as model selection consistency (Obozinski et al., 2008; Negahban & Wainwright, 2009). For the matrix-structured least squares problem, nuclear norm regularized estimators have been studied, for instance, in (Recht et al., 2010; Bach, 2008). For other structurally constrained least squares problems, see (Huang et al., 2011; Bach et al., 2012; Negahban et al., 2012) and references therein. All of these estimators solve convex programs, though with non-smooth components due to the respective regularization functions. The state-of-the-art optimization methods for solving these programs are iterative, and can approach the optimal solution within any finite accuracy with computational complexity that scales polynomially with the number of variables.
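To make the contrast above concrete, here is a minimal sketch in generic notation (the design matrix X ∈ R^{n×p}, response y ∈ R^n, and regularization parameter λ_n are placeholder symbols assumed here, not the paper's own notation): the LASSO is the convex but non-smooth program on the left, typically solved by iterative methods, whereas the classical OLS and ridge estimators referenced in the abstract admit closed forms.

$$
\hat{\theta}_{\text{lasso}} \in \arg\min_{\theta \in \mathbb{R}^p} \; \frac{1}{2n}\,\|y - X\theta\|_2^2 + \lambda_n \|\theta\|_1,
\qquad
\hat{\theta}_{\text{OLS}} = (X^\top X)^{-1} X^\top y,
\qquad
\hat{\theta}_{\text{ridge}} = (X^\top X + \epsilon I_p)^{-1} X^\top y, \;\; \epsilon > 0.
$$

Note that when p > n the Gram matrix X^⊤X is rank-deficient, so the OLS inverse does not exist, and plain ridge regression by itself does not produce sparse or otherwise structurally constrained estimates; this is what makes closed-form variants of these classical estimators with high-dimensional guarantees non-trivial.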