
M-Estimation under High-Dimensional Asymptotics. DLD, Andrea Montanari (slide transcript)

2014-05-01. Outline: M-estimation; Our Paper; Isometry Between (M)-estimation & Lasso.


  1. M-Estimation under High-Dimensional Asymptotics
     DLD, Andrea Montanari
     Outline: M-estimation; Our Paper; Isometry Between (M)-estimation & Lasso

  2. Classical M-estimation
     Huber's 1964 paper in the Annals of Mathematical Statistics: "an out-of-the-park grand-slam home run" (Richard Olshen).

  3. [Figure slide; no text content.]

  4. M-estimation Basics
     Location model: Y_i = θ + Z_i, i = 1, ..., n, with errors Z_i ~ F, not necessarily Gaussian.
     "Loss" function ρ(t), e.g. t², |t|, −log f(t), ...
     (M):  min_θ Σ_{i=1}^n ρ(Y_i − θ)
     Asymptotic distribution: √n (θ̂_n − θ) ⇒ N(0, V) as n → ∞.
     Asymptotic variance, with ψ = ρ′:
         V(ψ, F) = ∫ψ² dF / (∫ψ′ dF)²
     Information bound: V(ψ, F) ≥ 1 / I(F).
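To make the location model concrete, here is a minimal Python sketch solving the M-equation Σ_i ψ(Y_i − θ) = 0 by fixed-point iteration. The choice of Huber's ψ with k = 1.345 and the heavy-tailed t₃ errors are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def huber_psi(t, k=1.345):
    """psi = rho' for the Huber loss: identity near 0, clipped at +/- k."""
    return np.clip(t, -k, k)

def m_estimate_location(y, psi=huber_psi, n_iter=200):
    """Solve sum_i psi(y_i - theta) = 0 by fixed-point iteration.

    The update theta <- theta + mean(psi(y - theta)) is a contraction
    for Huber's psi, since 0 <= psi' <= 1.
    """
    theta = np.median(y)  # robust starting point
    for _ in range(n_iter):
        theta = theta + psi(y - theta).mean()
    return theta

rng = np.random.default_rng(0)
# Location model Y_i = theta + Z_i with heavy-tailed (t_3) errors
y = 2.0 + rng.standard_t(df=3, size=5000)
theta_hat = m_estimate_location(y)
```

With the sample median as start, a couple of hundred iterations are far more than needed; the point is only that the estimate is defined implicitly by the ψ-equation, not by a closed form.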

  5. The One-Step Viewpoint
     [The slide reproduces the first page of: P. J. Bickel, "One-Step Huber Estimates in the Linear Model", Journal of the American Statistical Association, Vol. 70, No. 350 (June 1975), Theory and Methods Section, p. 428. Recoverable content of that page:]
     Huber (1964) introduced the class of (M) estimates of location: solutions θ̂ of Σ_i ψ(X_i − θ) = 0, where X_i = θ + E_i with iid errors E_i ~ F symmetric about 0. Under mild regularity conditions such estimates are consistent and asymptotically normal with variance K(ψ, F)/n, where K(ψ, F) = ∫ψ² dF / (∫ψ′ dF)². Huber's proposal
         ψ_K(t) = t for |t| ≤ K,   ψ_K(t) = K sgn(t) for |t| > K
     has a desirable minimax robustness property, but for finite K the estimate can only be computed iteratively. Bickel shows, in the context of the linear model and under mild conditions, that the one-step Gauss-Newton approximation started from a √n-consistent estimate behaves asymptotically like the exact root. To obtain scale equivariance one uses ψ_σ(x) = ψ(x/σ), with σ̂ a location-invariant, scale-equivariant estimate such as the normalized interquartile range or the symmetrized interquartile range about the sample median.

  6. Regression M-estimation: the One-Step Viewpoint
     Regression model: Y_i = X_i′ θ + Z_i, Z_i ~ iid F, i = 1, ..., n.
     Objective function of (M): R(ϑ) = Σ_{i=1}^n ρ(Y_i − X_i′ ϑ)
     (M):  min_ϑ R(ϑ)
     One-step estimate: for θ̃_n any √n-consistent estimate of θ,
         θ̂¹ = θ̃_n − [Hess R |_{θ̃_n}]^{−1} ∇R |_{θ̃_n}.
     Effectiveness: with θ̂ the exact solution of the M-equation,
         E (θ̂¹ − θ̂)(θ̂¹ − θ̂)′ = o(n^{−1}).
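The one-step estimate can be sketched directly from its definition. Here Huber's ψ and an OLS starting point (standing in for "any √n-consistent estimate") are my illustrative choices, not fixed by the slide:

```python
import numpy as np

def huber_psi(t, k=1.345):
    return np.clip(t, -k, k)

def huber_psi_prime(t, k=1.345):
    return (np.abs(t) <= k).astype(float)

def one_step(X, y, theta_init, k=1.345):
    """One Newton step on R(theta) = sum_i rho(y_i - x_i' theta).

    grad R = -X' psi(r),  Hess R = X' diag(psi'(r)) X,  r = y - X theta.
    """
    r = y - X @ theta_init
    grad = -X.T @ huber_psi(r, k)
    hess = X.T @ (huber_psi_prime(r, k)[:, None] * X)
    return theta_init - np.linalg.solve(hess, grad)

rng = np.random.default_rng(1)
n, p = 1000, 5
X = rng.normal(size=(n, p))
theta = np.arange(1.0, p + 1.0)
y = X @ theta + rng.standard_t(df=3, size=n)

theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # sqrt(n)-consistent start
theta_1 = one_step(X, y, theta_ols)
```

One step from the OLS start already nearly solves the M-equation: the gradient of R shrinks by a large factor, which is the content of the o(n^{−1}) equivalence on the slide.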

  7. Driving Idea of Classical Asymptotics
     The M-estimate is asymptotically equivalent to a single step of Newton's method for finding a zero of ∇R, started at the true underlying parameter. This goes back to Fisher's "Method of Scoring" for the MLE.

  8. Derivation of the Asymptotic Variance Formula
     Approximation to the one-step:
         θ̂¹ = θ + B(ψ, F)^{−1} (X′X)^{−1} X′ (ψ(Z_i)) + o_p(n^{−1/2}),  where B(ψ, F) = ∫ψ′ dF.
     Observe that
         Var((X′X)^{−1} X′ (ψ(Z_i))) ∼ (X′X)^{−1} A(ψ, F),  where A(ψ, F) = ∫ψ² dF.
     Hence if X_{i,j} ~ N(0, 1/n),
         Var(θ̂_i − θ_i) → A(ψ, F) / B(ψ, F)² = V(ψ, F).
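A quick numerical check of V(ψ, F) = A(ψ, F)/B(ψ, F)² for Huber's ψ with F the standard normal. The tuning constant k = 1.345 (the usual 95%-efficiency choice) is my addition; since I(Φ) = 1, the information bound from slide 4 says V ≥ 1 here:

```python
import numpy as np

k = 1.345
z = np.linspace(-8.0, 8.0, 400001)              # dense grid for quadrature
dz = z[1] - z[0]
phi = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)    # standard normal density

psi = np.clip(z, -k, k)                          # Huber psi
psi_prime = (np.abs(z) <= k).astype(float)

A = np.sum(psi**2 * phi) * dz                    # A(psi, F) = int psi^2 dF
B = np.sum(psi_prime * phi) * dz                 # B(psi, F) = int psi' dF
V = A / B**2
```

The result V ≈ 1.053 sits just above the information bound V ≥ 1, i.e. the Huber estimate with this k has about 95% efficiency at the Gaussian while protecting against heavier tails.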

  9. Asymptotics for Regression, I
     [Figure slide; no text content.]

  10. Asymptotics for Regression, II
      PJ Huber, Annals of Statistics 1973.
      [Figure slide; no further text content.]

  11. [Figure slide; no text content.]

  12. [Figure slide; no text content.]
