The additive model revisited
  1. The additive model revisited. Sara van de Geer. January 8, 2013 (but first something else). (Les Houches) Additive model January 8, 2013 1 / 30


  3. Contents:
     - Sharp oracle inequalities
     - Structured sparsity
     - Compatibility (restricted eigenvalue condition)
     - Semiparametric approach: partial linear models, nonparametric models

  4. Sharp oracle inequalities. Let S \in \mathcal{S} be an index set and \{F_S\}_{S \in \mathcal{S}} a collection of models. Let L(X, f) be a loss function and R(f) := E L(X, f). The estimator \hat f satisfies a sharp oracle inequality if, with large probability,
        R(\hat f) \le \min_{S \in \mathcal{S}} \Big\{ \min_{f \in F_S} R(f) + \mathrm{Remainder}(S) \Big\}.
     Non-sharp oracle inequalities are of the form: with large probability,
        R(\hat f) - R(f^0) \le \min_{S \in \mathcal{S}} \Big\{ (1 + \delta) \min_{f \in F_S} \big( R(f) - R(f^0) \big) + \mathrm{Remainder}_\delta(S) \Big\},
     where \delta > 0 and f^0 := \arg\min_{f \in \cup_{S \in \mathcal{S}} F_S} R(f).

  5. Sharp oracle inequalities with structured sparsity penalties. High-dimensional linear model:
        Y = X \beta^0 + \epsilon,
     with Y \in R^n, X an n \times p matrix, and \beta^0 \in R^p. We believe that \beta^0 can be well approximated by a "structured sparse" \beta. Let \Omega be a given norm on R^p. Norm-penalized estimator:
        \hat\beta := \hat\beta_\Omega := \arg\min_{\beta \in R^p} \big\{ \|Y - X\beta\|_2^2 / n + 2\lambda \Omega(\beta) \big\}.
     Aim: (sharp) sparsity oracle inequalities for \hat\beta.
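For the special case \Omega = \| \cdot \|_1, the norm-penalized estimator above can be computed by proximal gradient descent (ISTA). A minimal numpy sketch, assuming the objective \|Y - X\beta\|_2^2 / n + 2\lambda \|\beta\|_1 from the slide; the solver choice is ours, not the talk's:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=2000):
    """Minimize ||Y - X b||_2^2 / n + 2 * lam * ||b||_1 by proximal gradient."""
    n, p = X.shape
    beta = np.zeros(p)
    # Lipschitz constant of the gradient of the smooth part
    L = 2.0 * np.linalg.eigvalsh(X.T @ X / n).max()
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (Y - X @ beta) / n
        beta = soft_threshold(beta - grad / L, 2.0 * lam / L)
    return beta

# toy data: sparse beta^0, n > p so recovery is easy to eyeball
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[:2] = [3.0, -2.0]
Y = X @ beta0 + 0.1 * rng.standard_normal(n)
beta_hat = lasso_ista(X, Y, lam=0.05)
```

The active coordinates are recovered up to the usual shrinkage bias of order \lambda, and the remaining coordinates are thresholded to (near) zero.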

  6. Notation: for \beta \in R^p and S \subset \{1, \ldots, p\},
        \beta_{j,S} := \beta_j 1\{j \in S\}.
     Example: the \ell_1-norm
        \Omega(\beta) := \|\beta\|_1 := \sum_{j=1}^p |\beta_j|   (the Lasso).
     The \ell_1-norm is decomposable:
        \|\beta\|_1 = \|\beta_S\|_1 + \|\beta_{S^c}\|_1   for all \beta and all S.
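The decomposability identity is easy to check numerically; a tiny sketch with an arbitrary index set S:

```python
import numpy as np

# Numerical check of l1 decomposability: ||b||_1 = ||b_S||_1 + ||b_{S^c}||_1.
rng = np.random.default_rng(1)
beta = rng.standard_normal(8)
S = np.array([0, 2, 5])                        # an arbitrary index set S
mask = np.zeros(8, dtype=bool); mask[S] = True
beta_S, beta_Sc = beta * mask, beta * ~mask    # beta_{j,S} = beta_j 1{j in S}
lhs = np.abs(beta).sum()
rhs = np.abs(beta_S).sum() + np.abs(beta_Sc).sum()
```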

  7. Definition. The norm \Omega is weakly decomposable for S if there exists a norm \Omega^{S^c} on R^{p - |S|} such that for all \beta \in R^p,
        \Omega(\beta) \ge \Omega(\beta_S) + \Omega^{S^c}(\beta_{S^c}).
     Definition. S is an allowed set (for \Omega) if \Omega is weakly decomposable for S.

  8. Example. The group Lasso norm:
        \Omega(\beta) := \|\beta\|_{2,1} := \sum_{t=1}^T \sqrt{|G_t|} \, \|\beta_{G_t}\|_2,   \beta \in R^p,
     where G_1, \ldots, G_T is a partition of \{1, \ldots, p\} into disjoint groups. It is (weakly) decomposable for S = \cup_{t \in \mathcal{T}} G_t with \Omega^{S^c} = \Omega. Thus, for any \beta,
        S := \cup \{ G_t : \|\beta_{G_t}\|_2 \neq 0 \}
     is an allowed set.

  9. Example (from Micchelli et al. (2010)). Let \mathcal{A} \subset [0, \infty)^p be a convex cone. Define
        \Omega(\beta) := \Omega(\beta; \mathcal{A}) := \min_{a \in \mathcal{A}} \frac{1}{2} \sum_{j=1}^p \Big( \frac{\beta_j^2}{a_j} + a_j \Big).
     Let \mathcal{A}_S := \{ a_S : a \in \mathcal{A} \}.
     Definition. We call \mathcal{A}_S an allowed set if \mathcal{A}_S \subset \mathcal{A}.
     Lemma. Suppose \mathcal{A}_S is an allowed set. Then S is allowed, i.e. \Omega is weakly decomposable for S.
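For the full cone \mathcal{A} = (0, \infty)^p the coordinates separate and \min_{a_j > 0} (\beta_j^2 / a_j + a_j) / 2 = |\beta_j| (attained at a_j = |\beta_j|), so \Omega(\beta; \mathcal{A}) recovers the \ell_1-norm. A brute-force sketch over a grid, assuming this unconstrained cone:

```python
import numpy as np

def omega_cone(beta, a_grid):
    """Omega(beta; A) = min_{a in A} sum_j (beta_j^2 / a_j + a_j) / 2 for the
    cone A = (0, inf)^p, where coordinates separate and each a_j can be
    minimized independently over the grid."""
    vals = 0.5 * (beta[:, None] ** 2 / a_grid[None, :] + a_grid[None, :])
    return vals.min(axis=1).sum()

beta = np.array([2.0, -0.5, 1.0])
a_grid = np.linspace(1e-4, 5.0, 200001)   # dense grid covering each |beta_j|
val = omega_cone(beta, a_grid)
# minimum is attained near a_j = |beta_j|, giving ||beta||_1 = 3.5
```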

  10. We use the notation \|v\|_n^2 := v^T v / n, v \in R^n.
      Definition. Suppose S is an allowed set and let L > 0 be a constant. The \Omega-eigenvalue (for S) is
        \delta_\Omega(L, S) := \min \big\{ \|X\beta_S - X\beta_{S^c}\|_n : \Omega(\beta_S) = 1, \ \Omega^{S^c}(\beta_{S^c}) \le L \big\}.
      The \Omega-effective sparsity is
        \Gamma_\Omega^2(L, S) := \frac{1}{\delta_\Omega^2(L, S)}.

  11. The dual norm of \Omega is denoted by \Omega_*, that is,
        \Omega_*(w) := \sup_{\Omega(\beta) \le 1} |w^T \beta|,   w \in R^p.
      We moreover let \Omega_*^{S^c} be the dual norm of \Omega^{S^c}.
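For \Omega = \| \cdot \|_1 the dual norm is the sup-norm, \Omega_*(w) = \|w\|_\infty, attained at a signed coordinate vector. A small numerical sanity check:

```python
import numpy as np

# Dual of the l1 norm: sup_{||b||_1 <= 1} |w^T b| = ||w||_inf.
rng = np.random.default_rng(2)
w = rng.standard_normal(6)
# random points on the l1 unit sphere never beat the best coordinate vector
B = rng.standard_normal((10000, 6))
B /= np.abs(B).sum(axis=1, keepdims=True)   # normalize to ||b||_1 = 1
sup_random = np.abs(B @ w).max()
dual = np.abs(w).max()
# the sup is attained at b = sign(w_k) e_k with k = argmax |w_k|
k = int(np.argmax(np.abs(w)))
e = np.zeros(6); e[k] = np.sign(w[k])
attained = abs(e @ w)
```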

  12. A sharp oracle inequality.
      Theorem. Let \beta \in R^p be arbitrary and let S \supset \{ j : \beta_j \neq 0 \} be an allowed set. Define
        \lambda^S := \Omega_* \big( (\epsilon^T X)_S / n \big),   \lambda^{S^c} := \Omega_*^{S^c} \big( (\epsilon^T X)_{S^c} / n \big).
      Suppose \lambda > \lambda^{S^c}. Define
        L_S := \frac{\lambda + \lambda^S}{\lambda - \lambda^{S^c}}.
      Then
        \|X(\hat\beta - \beta^0)\|_n^2 \le \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda^S)^2 \Gamma_\Omega^2(L_S, S).
      Related results: Bach (2010).

  13. What about convergence of the \Omega-estimation error?

  14. Theorem. Let \beta \in R^p be arbitrary and let S \supset \{ j : \beta_j \neq 0 \} be an allowed set. Define
        \lambda^S := \Omega_* \big( (\epsilon^T X)_S / n \big),   \lambda^{S^c} := \Omega_*^{S^c} \big( (\epsilon^T X)_{S^c} / n \big).
      Suppose \lambda > \lambda^{S^c}. Define, for some 0 \le \delta < 1,
        L_S := \Big( \frac{\lambda + \lambda^S}{\lambda - \lambda^{S^c}} \Big) \Big( \frac{1 + \delta}{1 - \delta} \Big).
      Then
        \|X(\hat\beta - \beta^0)\|_n^2 + \delta (\lambda - \lambda^{S^c}) \Omega^{S^c}(\hat\beta_{S^c}) + \delta (\lambda + \lambda^S) \Omega(\hat\beta_S - \beta)
          \le \|X(\beta - \beta^0)\|_n^2 + (1 + \delta)(\lambda + \lambda^S)^2 \Gamma_\Omega^2(L_S, S).

  15. Special case where \Omega = \| \cdot \|_1.
      Theorem (Koltchinskii et al. (2011)). Let
        \lambda_0 := \|\epsilon^T X\|_\infty / n.
      Define, for \lambda > \lambda_0,
        L := \frac{\lambda + \lambda_0}{\lambda - \lambda_0}.
      Then
        \|X(\hat\beta - \beta^0)\|_n^2 \le \min_{\beta \in R^p} \big\{ \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda_0)^2 \Gamma^2(L, S_\beta) \big\},
      where S_\beta := \{ j : \beta_j \neq 0 \} is the support of \beta.

  16. Compatibility (restricted eigenvalue condition). Recall that for the \ell_1-norm,
        \Gamma^2(L, S) = \frac{1}{\delta^2(L, S)},
      with
        \delta(L, S) := \min \big\{ \|X\beta_S - X\beta_{S^c}\|_n : \|\beta_S\|_1 = 1, \ \|\beta_{S^c}\|_1 \le L \big\}.
      We have
        \Gamma^2(L, S) \le \frac{|S|}{\kappa^2(L, S)},
      where \kappa^2(L, S) is the restricted eigenvalue (Bickel et al. (2009)).

  17. Consider the case S = \{1\}, and write X_1 := X_S, X_2 := X_{S^c}. Let X_1 \hat P X_2 be the projection (in R^n) of X_1 on X_2 and X_1 \hat A X_2 := X_1 - X_1 \hat P X_2 the antiprojection. Define
        \hat\gamma^0 := \arg\min \{ \|\gamma\|_1 : X_1 \hat P X_2 = X_2 \gamma \}.
      Then clearly
        \delta(L, \{1\}) = \|X_1 \hat A X_2\|_n   for all L \ge \|\hat\gamma^0\|_1.
      When n < p one readily sees that
        \delta(L, \{1\}) = 0   for all L \ge \|\hat\gamma^0\|_1.
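The n < p degeneracy is easy to see numerically: with fewer rows than remaining columns, X_2 generically spans R^n, so X_1 = X_2 \gamma exactly for some \gamma and the antiprojection (hence \delta(L, \{1\})) vanishes. A sketch (using the \ell_2-min-norm exact solution from lstsq; the slide's \hat\gamma^0 is the \ell_1-minimal one, but any exact solution shows \delta = 0 for L large enough):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 50
X = rng.standard_normal((n, p))
X1, X2 = X[:, :1], X[:, 1:]                 # S = {1}: first column vs the rest
gamma, *_ = np.linalg.lstsq(X2, X1, rcond=None)   # exact fit since n < p - 1
# ||X_1 - X_2 gamma||_n: the antiprojection norm, zero up to rounding
residual = np.linalg.norm(X1 - X2 @ gamma) / np.sqrt(n)
```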

  18. Suppose now that the rows of X are i.i.d. with sub-Gaussian distribution Q. Let X_1 P X_2 be the projection of X_1 on X_2 in L_2(Q) and X_1 A X_2 := X_1 - X_1 P X_2. Let \| \cdot \| be the L_2(Q)-norm. Define
        \gamma^0 := \arg\min \{ \|\gamma\|_1 : X_1 P X_2 = X_2 \gamma \}.
      Then, for \sqrt{\log p / n} small, with large probability,
        \delta(L, S) \ge (1 - \epsilon) \|X_1 A X_2\|   for all L \ge \|\gamma^0\|_1,
      and moreover,
        (X_1 A X_2)^T (X_1 P X_2) / n \asymp \sqrt{\frac{\log p}{n}}.

  19. Oracle inequalities for parameters of interest. High-dimensional linear model:
        Y = X_1 \beta_1^0 + X_2 \beta_2^0 + \epsilon,
      with \beta_1^0 \in R^q, \beta_2^0 \in R^{p-q}, and the entries of \epsilon i.i.d. sub-Gaussian. Suppose the rows of X are i.i.d. with sub-Gaussian distribution Q. We are interested in estimating \beta_1^0. Lasso estimator:
        \hat\beta = (\hat\beta_1, \hat\beta_2) := \arg\min_{\beta_1, \beta_2} \big\{ \|Y - X_1\beta_1 - X_2\beta_2\|_2^2 / n + \lambda \|\beta_1\|_1 + \lambda \|\beta_2\|_1 \big\}.

  20. Notation. Let X_1 P X_2 be the projection of X_1 on X_2 in L_2(Q), and define \tilde X_1 := X_1 - X_1 P X_2 = X_1 A X_2. Let \Sigma_1 := E \tilde X_1^T \tilde X_1 / n, and let \tilde\Lambda_1^2 be its smallest eigenvalue. Define
        C^0 := \arg\min \big\{ \|C\|_{1,\infty} : X_1 P X_2 = X_2 C \big\},
      where \|C\|_{1,\infty} := \max_{1 \le k \le q} \|\gamma_k\|_1, C := (\gamma_1, \ldots, \gamma_q).

  21. Condition 1: 1 / \tilde\Lambda_1 = O(1).
      Condition 2: \|\beta^0\|_1 = O(1) and s_1 := \|\beta_1^0\|_0 \vee 1 = o\big( \sqrt{n / \log p} \big).

  22. Theorem. Take \lambda \asymp \sqrt{\log p / n}. Then \|\hat\beta - \beta^0\|_1 = O_P(1). If moreover \|C^0\|_{1,\infty} = O(1) (i.e., \ell_1-smoothness of the projection), then
        \|\hat\beta_1 - \beta_1^0\|_1 = O_P\big( s_1 \sqrt{\log p / n} \big) = o_P(1).
      Special case: q = 1 (recall q = \dim(\beta_1)). Then s_1 = 1 and hence
        |\hat\beta_1 - \beta_1^0| = O_P\big( \sqrt{\log p / n} \big).

  23. The high-dimensional partial linear model. Joint work with Patric Müller.
      Additive model:
        Y = X\beta^0 + g^0(Z) + \epsilon,   with \epsilon \perp (X, Z).
      We assume that the entries of (X, Z) \in R^p \times \mathcal{Z} are i.i.d. with distribution Q and that the entries of \epsilon are i.i.d. sub-Gaussian. We assume that g^0 has a given "smoothness" m > 1/2 and that \beta^0 is sparse, with X\beta^0 "smoother" than g^0. Estimator:
        (\hat\beta, \hat g) := \arg\min_{\beta, g} \big\{ \|Y - X\beta - g(Z)\|_2^2 / n + \lambda \|\beta\|_1 + \mu^2 J^2(g) \big\},
      where J is some (semi-)norm on the space of functions on \mathcal{Z}.
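A common way to approximate such a joint criterion is backfitting: alternate an \ell_1-penalized (Lasso) step for \beta with a smoothing step for g. The sketch below is an illustrative stand-in, not the procedure of van de Geer and Müller: it uses a Nadaraya-Watson kernel smoother in place of the J-penalized smoother, and all names and tuning values (h, n_outer) are our own assumptions.

```python
import numpy as np

def soft(z, t):
    """Elementwise soft-thresholding, the l1 proximal operator."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def kernel_smooth(z, r, h):
    """Nadaraya-Watson smoother of responses r against z (bandwidth h);
    a stand-in for the J-penalized smoother in the talk."""
    W = np.exp(-0.5 * ((z[:, None] - z[None, :]) / h) ** 2)
    return (W @ r) / W.sum(axis=1)

def partial_linear_fit(X, Y, z, lam, h=0.05, n_outer=30, n_ista=200):
    """Backfitting sketch for Y = X beta + g(z) + eps: alternate a Lasso step
    (objective ||R - X b||^2 / n + lam ||b||_1, solved by ISTA) for beta with
    a kernel-smoothing step for g."""
    n, p = X.shape
    beta, ghat = np.zeros(p), np.zeros(n)
    L = 2.0 * np.linalg.eigvalsh(X.T @ X / n).max()
    for _ in range(n_outer):
        R = Y - ghat                                 # Lasso step for beta
        for _ in range(n_ista):
            grad = -2.0 * X.T @ (R - X @ beta) / n
            beta = soft(beta - grad / L, lam / L)
        ghat = kernel_smooth(z, Y - X @ beta, h)     # smoothing step for g
    return beta, ghat

# toy partial linear data: sparse linear part plus a smooth g^0
rng = np.random.default_rng(4)
n, p = 300, 5
X = rng.standard_normal((n, p))
z = rng.uniform(0, 1, n)
beta0 = np.array([2.0, 0.0, 0.0, 0.0, 0.0])
g0 = np.sin(2 * np.pi * z)
Y = X @ beta0 + g0 + 0.1 * rng.standard_normal(n)
beta_hat, g_hat = partial_linear_fit(X, Y, z, lam=0.05)
```

Because X and Z are independent here, the alternation converges quickly; when X and Z are correlated, the projection X P Z of the following slide governs how well \beta^0 can be separated from g^0.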

  24. Notation. Write \tilde X := X A Z := X - X P Z, where X P Z := E(X | Z). The smallest eigenvalue of E \tilde X^T \tilde X / n is denoted by \tilde\Lambda^2. The largest eigenvalue of E (X P Z)^T (X P Z) / n is denoted by \Lambda_P^2. \| \cdot \| is the L_2(Q)-norm.
