The additive model revisited
Sara van de Geer
Les Houches, January 8, 2013
(but first something else)
Contents
- Sharp oracle inequalities
- Structured sparsity
- Compatibility (restricted eigenvalue condition)
- Semiparametric approach
- Partial linear models
- Nonparametric models
Sharp oracle inequalities

Let $S \in \mathcal{S}$ be some index set and $\{\mathcal{F}_S\}_{S \in \mathcal{S}}$ be a collection of models. Moreover, let $L(X, f)$ be a loss function and $R(f) := E L(X, f)$. We say that the estimator $\hat f$ satisfies a sharp oracle inequality if, with large probability,
$$R(\hat f) \le \min_{S \in \mathcal{S}} \Big\{ \min_{f \in \mathcal{F}_S} R(f) + \mathrm{Remainder}(S) \Big\}.$$
Non-sharp oracle inequalities are of the form: with large probability,
$$R(\hat f) - R(f^0) \le \min_{S \in \mathcal{S}} \Big\{ (1 + \delta) \min_{f \in \mathcal{F}_S} \big( R(f) - R(f^0) \big) + \mathrm{Remainder}_\delta(S) \Big\},$$
where $\delta > 0$ and $f^0 := \arg\min_{f \in \cup_{S \in \mathcal{S}} \mathcal{F}_S} R(f)$.
Sharp oracle inequalities with structured sparsity penalties

High-dimensional linear model:
$$Y = X\beta^0 + \epsilon,$$
with $Y \in \mathbb{R}^n$, $X$ an $n \times p$ matrix, and $\beta^0 \in \mathbb{R}^p$. We believe that $\beta^0$ can be well approximated by a "structured sparse" $\beta$. Let $\Omega$ be some given norm on $\mathbb{R}^p$.

Norm-penalized estimator:
$$\hat\beta := \hat\beta_\Omega := \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \|Y - X\beta\|_2^2 / n + 2\lambda \Omega(\beta) \Big\}.$$

Aim: (sharp) sparsity oracle inequalities for $\hat\beta$.
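For $\Omega = \|\cdot\|_1$ the norm-penalized estimator is the Lasso, and it can be computed by proximal gradient descent (iterated soft-thresholding). A minimal numpy sketch, with the factor $2\lambda$ matching the display above; function names are illustrative:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=5000):
    # Minimize ||Y - X b||_2^2 / n + 2 * lam * ||b||_1 by proximal gradient (ISTA)
    n, p = X.shape
    step = n / (2 * np.linalg.norm(X, 2) ** 2)   # 1 / Lipschitz constant of the smooth part
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -2 * X.T @ (Y - X @ b) / n
        b = soft_threshold(b - step * grad, 2 * lam * step)
    return b

# Orthogonal design (X^T X / n = I): the solution is soft_threshold(X^T Y / n, lam)
X = np.sqrt(3) * np.eye(3)
Y = np.sqrt(3) * np.array([1.0, 0.2, -0.5])
print(lasso_ista(X, Y, lam=0.3))   # approx [0.7, 0., -0.2]
```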
Notation: for $\beta \in \mathbb{R}^p$ and $S \subset \{1, \ldots, p\}$,
$$\beta_{j,S} := \beta_j \, 1\{j \in S\}.$$

Example: the $\ell_1$-norm
$$\Omega(\beta) := \|\beta\|_1 := \sum_{j=1}^p |\beta_j| \quad \rightsquigarrow \quad \text{Lasso}.$$
The $\ell_1$-norm is decomposable:
$$\|\beta\|_1 = \|\beta_S\|_1 + \|\beta_{S^c}\|_1 \quad \forall \beta \ \forall S.$$
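A quick numeric check of the decomposability identity (values are illustrative):

```python
import numpy as np

beta = np.array([1.5, -2.0, 0.0, 0.7, -0.1])
S = [0, 3]                                    # an arbitrary index set
in_S = np.isin(np.arange(beta.size), S)

beta_S  = np.where(in_S, beta, 0.0)           # beta_{j,S} = beta_j * 1{j in S}
beta_Sc = np.where(in_S, 0.0, beta)

# The l1-norm is decomposable: equality, not just >=
lhs = np.abs(beta).sum()
rhs = np.abs(beta_S).sum() + np.abs(beta_Sc).sum()
print(np.isclose(lhs, rhs))   # True
```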
Definition. We say that the norm $\Omega$ is weakly decomposable for $S$ if there exists a norm $\Omega^{S^c}$ on $\mathbb{R}^{p - |S|}$ such that for all $\beta \in \mathbb{R}^p$,
$$\Omega(\beta) \ge \Omega(\beta_S) + \Omega^{S^c}(\beta_{S^c}).$$

Definition. We say that $S$ is an allowed set (for $\Omega$) if $\Omega$ is weakly decomposable for $S$.
Example. The group Lasso norm:
$$\Omega(\beta) := \|\beta\|_{2,1} := \sum_{t=1}^T \sqrt{|G_t|} \, \|\beta_{G_t}\|_2, \quad \beta \in \mathbb{R}^p,$$
where $G_1, \ldots, G_T$ is a partition of $\{1, \ldots, p\}$ into disjoint groups. It is (weakly) decomposable for $S = \cup_{t \in \mathcal{T}} G_t$ with $\Omega^{S^c} = \Omega$. Thus, for any $\beta$,
$$S := \cup \{G_t : \|\beta_{G_t}\|_2 \neq 0\}$$
is an allowed set.
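Since $\Omega$ sums over the groups, decomposability for a union of groups holds with equality; a small numeric check (the partition is illustrative):

```python
import numpy as np

def group_lasso_norm(beta, groups):
    # Omega(beta) = sum_t sqrt(|G_t|) * ||beta_{G_t}||_2
    return sum(np.sqrt(len(G)) * np.linalg.norm(beta[G]) for G in groups)

groups = [[0, 1], [2, 3, 4], [5]]             # a partition of {0,...,5}
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.2, -3.0])

# S = union of the first two groups; Omega^{S^c} = Omega on the remaining groups
S_groups, Sc_groups = groups[:2], groups[2:]
total = group_lasso_norm(beta, groups)
split = group_lasso_norm(beta, S_groups) + group_lasso_norm(beta, Sc_groups)
print(np.isclose(total, split))   # True: decomposability holds with equality here
```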
Example (from Micchelli et al. (2010)). Let $\mathcal{A} \subset [0, \infty)^p$ be some convex cone. Define
$$\Omega(\beta) := \Omega(\beta; \mathcal{A}) := \min_{a \in \mathcal{A}} \frac{1}{2} \sum_{j=1}^p \Big( \frac{\beta_j^2}{a_j} + a_j \Big).$$
Let $\mathcal{A}_S := \{a_S : a \in \mathcal{A}\}$.

Definition. We call $\mathcal{A}_S$ an allowed set if $\mathcal{A}_S \subset \mathcal{A}$.

Lemma. Suppose $\mathcal{A}_S$ is an allowed set. Then $S$ is allowed, i.e. $\Omega$ is weakly decomposable for $S$.
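For the full cone $\mathcal{A} = (0, \infty)^p$, the coordinate-wise minimum of $(\beta_j^2 / a_j + a_j)/2$ is attained at $a_j = |\beta_j|$ with value $|\beta_j|$, so $\Omega(\cdot; \mathcal{A})$ recovers the $\ell_1$-norm. A numeric sanity check (assumes scipy; the optimization bounds are illustrative):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def omega_cone(beta, eps=1e-12):
    # Omega(beta; A) with A = (0, inf)^p: the cone constraint is vacuous,
    # so minimize (beta_j^2 / a_j + a_j) / 2 coordinate by coordinate
    total = 0.0
    for b in beta:
        res = minimize_scalar(lambda a: 0.5 * (b**2 / a + a),
                              bounds=(eps, 10.0), method="bounded")
        total += res.fun
    return total

beta = np.array([1.0, -0.5, 2.0])
print(np.isclose(omega_cone(beta), np.abs(beta).sum(), atol=1e-5))  # True
```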
We use the notation $\|v\|_n^2 := v^T v / n$, $v \in \mathbb{R}^n$.

Definition. Suppose $S$ is an allowed set and let $L > 0$ be some constant. The $\Omega$-eigenvalue (for $S$) is
$$\delta_\Omega(L, S) := \min \Big\{ \|X\beta_S - X\beta_{S^c}\|_n : \ \Omega(\beta_S) = 1, \ \Omega^{S^c}(\beta_{S^c}) \le L \Big\}.$$
The $\Omega$-effective sparsity is
$$\Gamma_\Omega^2(L, S) := \frac{1}{\delta_\Omega^2(L, S)}.$$
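For $\Omega = \|\cdot\|_1$, $p = 2$ and $S = \{1\}$, the minimization can be done on a grid. With two unit-norm columns of correlation $\rho$, the unconstrained minimizer is $\beta_2 = \rho$, giving $\delta(L, S) = \sqrt{1 - \rho^2}$ once $L \ge |\rho|$, so the effective sparsity $\Gamma^2 = 1/(1 - \rho^2)$ blows up as the columns become collinear. A sketch (all values illustrative):

```python
import numpy as np

# Two unit-norm columns with correlation rho; S = {1} (the first column).
rho = 0.8
n = 2
X = np.sqrt(n) * np.array([[1.0, rho],
                           [0.0, np.sqrt(1 - rho**2)]])   # X^T X / n = [[1, rho], [rho, 1]]

def delta(L, grid=20001):
    # Grid search over beta_2 with |beta_2| <= L; beta_1 = 1 suffices by symmetry
    b2 = np.linspace(-L, L, grid)
    vals = np.linalg.norm(X[:, [0]] - X[:, [1]] * b2, axis=0) / np.sqrt(n)
    return vals.min()

print(round(delta(L=2.0), 4))   # 0.6 = sqrt(1 - rho**2)
```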
The dual norm of $\Omega$ is denoted by $\Omega_*$, that is,
$$\Omega_*(w) := \sup_{\Omega(\beta) \le 1} |w^T \beta|, \quad w \in \mathbb{R}^p.$$
We moreover let $\Omega_*^{S^c}$ be the dual norm of $\Omega^{S^c}$.
A sharp oracle inequality

Theorem. Let $\beta \in \mathbb{R}^p$ be arbitrary and let $S \supset \{j : \beta_j \neq 0\}$ be an allowed set. Define
$$\lambda_S := \Omega_*\big( (\epsilon^T X)_S / n \big), \quad \lambda_{S^c} := \Omega_*^{S^c}\big( (\epsilon^T X)_{S^c} / n \big).$$
Suppose $\lambda > \lambda_{S^c}$. Define
$$L_S := \frac{\lambda + \lambda_S}{\lambda - \lambda_{S^c}}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 \le \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda_S)^2 \, \Gamma_\Omega^2(L_S, S).$$

Related results: Bach (2010).
What about convergence of the $\Omega$-estimation error?
Theorem. Let $\beta \in \mathbb{R}^p$ be arbitrary and let $S \supset \{j : \beta_j \neq 0\}$ be an allowed set. Define
$$\lambda_S := \Omega_*\big( (\epsilon^T X)_S / n \big), \quad \lambda_{S^c} := \Omega_*^{S^c}\big( (\epsilon^T X)_{S^c} / n \big).$$
Suppose $\lambda > \lambda_{S^c}$. Define for some $0 \le \delta < 1$
$$L_S := \frac{\lambda + \lambda_S}{\lambda - \lambda_{S^c}} \cdot \frac{1 + \delta}{1 - \delta}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 + \delta(\lambda - \lambda_{S^c}) \, \Omega^{S^c}(\hat\beta_{S^c}) + \delta(\lambda + \lambda_S) \, \Omega(\hat\beta_S - \beta)$$
$$\le \|X(\beta - \beta^0)\|_n^2 + (1 + \delta)(\lambda + \lambda_S)^2 \, \Gamma_\Omega^2(L_S, S).$$
Special case where $\Omega = \|\cdot\|_1$

Theorem (Koltchinskii et al. (2011)). Let, for $S \subset \{1, \ldots, p\}$,
$$\lambda_0 := \|\epsilon^T X\|_\infty / n.$$
Define for $\lambda > \lambda_0$
$$L := \frac{\lambda + \lambda_0}{\lambda - \lambda_0}.$$
Then
$$\|X(\hat\beta - \beta^0)\|_n^2 \le \min_{\beta \in \mathbb{R}^p} \Big\{ \|X(\beta - \beta^0)\|_n^2 + (\lambda + \lambda_0)^2 \, \Gamma^2(L, \|\beta\|_0) \Big\}.$$
Compatibility (restricted eigenvalue condition)

Recall that for the $\ell_1$-norm
$$\Gamma^2(L, S) = \frac{1}{\delta^2(L, S)},$$
with
$$\delta(L, S) := \min \Big\{ \|X\beta_S - X\beta_{S^c}\|_n : \ \|\beta_S\|_1 = 1, \ \|\beta_{S^c}\|_1 \le L \Big\}.$$
We have
$$\Gamma^2(L, S) \le \frac{|S|}{\kappa^2(L, S)},$$
where $\kappa^2(L, S)$ is the restricted eigenvalue (Bickel et al. (2009)).
Consider the case $S = \{1\}$, and write $X_1 := X_S$, $X_2 := X_{S^c}$. Let $X_1 \hat P X_2$ be the projection (in $\mathbb{R}^n$) of $X_1$ on $X_2$, and let
$$X_1 \hat A X_2 := X_1 - X_1 \hat P X_2$$
be the antiprojection. Define
$$\hat\gamma^0 := \arg\min \big\{ \|\gamma\|_1 : \ X_1 \hat P X_2 = X_2 \gamma \big\}.$$
Then clearly
$$\delta(L, \{1\}) = \|X_1 \hat A X_2\|_n \quad \forall \ L \ge \|\hat\gamma^0\|_1.$$
When $n < p$ one readily sees that
$$\delta(L, \{1\}) = 0 \quad \forall \ L \ge \|\hat\gamma^0\|_1,$$
since then $X_2$ generically spans $\mathbb{R}^n$ and the antiprojection vanishes.
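The $n < p$ statement can be illustrated directly: with more columns than rows, the least-squares projection of $X_1$ on $X_2$ absorbs $X_1$ entirely. A small numpy sketch (dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 10
X = rng.standard_normal((n, p))
X1, X2 = X[:, 0], X[:, 1:]

# Projection of X1 on the column space of X2 (in R^n) via least squares
gamma, *_ = np.linalg.lstsq(X2, X1, rcond=None)
anti = X1 - X2 @ gamma                      # the antiprojection X1 - X1 \hat P X2

# With n < p, X2 generically spans R^n, so the antiprojection vanishes
print(np.linalg.norm(anti) / np.sqrt(n) < 1e-8)   # True
```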
Suppose now that the rows of $X$ are i.i.d. with sub-Gaussian distribution $Q$. Let $X_1 P X_2$ be the projection of $X_1$ on $X_2$ in $L_2(Q)$, and $X_1 A X_2 := X_1 - X_1 P X_2$. Let $\|\cdot\|$ be the $L_2(Q)$-norm. Define
$$\gamma^0 := \arg\min \big\{ \|\gamma\|_1 : \ X_1 P X_2 = X_2 \gamma \big\}.$$
Then with large probability, for $L \sqrt{\log p / n}$ small,
$$\delta(L, S) \ge (1 - \epsilon) \|X_1 A X_2\| \quad \forall \ L \ge \|\gamma^0\|_1,$$
and moreover,
$$(X_1 A X_2)^T (X_1 P X_2) / n \asymp \sqrt{\frac{\log p}{n}}.$$
Oracle inequalities for parameters of interest

High-dimensional linear model:
$$Y = X_1 \beta_1^0 + X_2 \beta_2^0 + \epsilon,$$
$\beta_1^0 \in \mathbb{R}^q$, $\beta_2^0 \in \mathbb{R}^{p-q}$, and the entries of $\epsilon$ i.i.d. sub-Gaussian. Suppose the rows of $X$ are i.i.d. with sub-Gaussian distribution $Q$. We are interested in estimating $\beta_1^0$.

Lasso estimator:
$$\hat\beta = (\hat\beta_1, \hat\beta_2) := \arg\min_{\beta_1, \beta_2} \Big\{ \|Y - X_1\beta_1 - X_2\beta_2\|_2^2 / n + \lambda \|\beta_1\|_1 + \lambda \|\beta_2\|_1 \Big\}.$$
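A small simulation in this setting with $q = 1$: the Lasso with $\lambda \asymp \sqrt{\log p / n}$ recovers the coefficient of interest up to that order. The objective is solved by plain proximal gradient; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 300
X = rng.standard_normal((n, p))
beta0 = np.zeros(p); beta0[0] = 2.0          # beta_1^0 = 2, all other coefficients 0
Y = X @ beta0 + 0.5 * rng.standard_normal(n)

lam = np.sqrt(np.log(p) / n)                 # lambda ~ sqrt(log p / n)
step = n / (2 * np.linalg.norm(X, 2) ** 2)
b = np.zeros(p)
for _ in range(2000):                        # proximal gradient for the Lasso objective
    v = b - step * (-2 * X.T @ (Y - X @ b) / n)
    b = np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)

print(abs(b[0] - beta0[0]))   # small: O_P(sqrt(log p / n)) in theory
```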
Notation

Let $X_1 P X_2$ be the projection of $X_1$ on $X_2$ in $L_2(Q)$, and define $\tilde X_1 := X_1 - X_1 P X_2 = X_1 A X_2$. Let $\Sigma_1 := E \tilde X_1^T \tilde X_1 / n$, and let $\tilde\Lambda_1^2$ be its smallest eigenvalue. Define
$$C^0 := \arg\min \big\{ \|C\|_{1,\infty} : \ X_1 P X_2 = X_2 C \big\},$$
where $\|C\|_{1,\infty} := \max_{1 \le k \le q} \|\gamma_k\|_1$, $C := (\gamma_1, \ldots, \gamma_q)$.
Condition 1. $1 / \tilde\Lambda_1 = O(1)$.

Condition 2. $\|\beta^0\|_1 = O(1)$ and $s_1 := \|\beta_1^0\|_0 \vee 1 = o\big( \sqrt{n / \log p} \big)$.
Theorem. Take $\lambda \asymp \sqrt{\log p / n}$. Then $\|\hat\beta - \beta^0\|_1 = O_P(1)$. If moreover $\|C^0\|_{1,\infty} = O(1)$ (i.e. $\ell_1$-smoothness of the projection), then
$$\|\hat\beta_1 - \beta_1^0\|_1 = O_P\Big( s_1 \sqrt{\frac{\log p}{n}} \Big) = o_P(1).$$

Special case: $q = 1$ (recall $q = \dim(\beta_1)$). Then $s_1 = 1$ and hence
$$|\hat\beta_1 - \beta_1^0| = O_P\Big( \sqrt{\frac{\log p}{n}} \Big).$$
The high-dimensional partial linear model

Joint work with Patric Müller.

Model:
$$Y = X\beta^0 + g^0(Z) + \epsilon, \quad \epsilon \perp (X, Z).$$
We assume that the rows of $(X, Z) \in \mathbb{R}^p \times \mathcal{Z}$ are i.i.d. with distribution $Q$ and that the entries of $\epsilon$ are i.i.d. sub-Gaussian. We assume that $g^0$ has a given "smoothness" $m > 1/2$ and that $\beta^0$ is sparse, with $X\beta^0$ "smoother" than $g^0$.

Estimator:
$$(\hat\beta, \hat g) := \arg\min_{\beta, g} \Big\{ \|Y - X\beta - g(Z)\|_2^2 / n + \lambda \|\beta\|_1 + \mu^2 J^2(g) \Big\},$$
where $J$ is some (semi-)norm on the space of functions on $\mathcal{Z}$.
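A minimal numerical sketch of this kind of estimator, not the exact procedure analyzed here: $\mu^2 J^2(g)$ is replaced by a ridge penalty on polynomial basis coefficients, the joint minimization is done by alternating a Lasso step in $\beta$ with a penalized least-squares step in $g$, and all dimensions and tuning values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50
X = rng.standard_normal((n, p))
Z = rng.uniform(-1, 1, n)
beta0 = np.zeros(p); beta0[0] = 1.5
Y = X @ beta0 + np.sin(np.pi * Z) + 0.3 * rng.standard_normal(n)   # g^0(z) = sin(pi z)

B = np.vander(Z, 8, increasing=True)          # polynomial basis for g (stand-in for J)
lam, mu2 = 0.1, 1e-3

b = np.zeros(p); g = np.zeros(n)
step = n / (2 * np.linalg.norm(X, 2) ** 2)
for _ in range(200):                          # blockwise descent
    for _ in range(20):                       # a few proximal-gradient (Lasso) steps for beta
        v = b - step * (-2 * X.T @ (Y - g - X @ b) / n)
        b = np.sign(v) * np.maximum(np.abs(v) - lam * step, 0.0)
    # ridge-penalized least squares for g in the basis B (proxy for mu^2 J^2(g))
    coef = np.linalg.solve(B.T @ B / n + mu2 * np.eye(B.shape[1]),
                           B.T @ (Y - X @ b) / n)
    g = B @ coef

print(abs(b[0] - beta0[0]))   # the parametric part is recovered reasonably well
```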
Notation

We write $\tilde X := X A Z := X - X P Z$, where $X P Z := E(X \mid Z)$. The smallest eigenvalue of $E \tilde X^T \tilde X / n$ is denoted by $\tilde\Lambda^2$. The largest eigenvalue of $E (X P Z)^T (X P Z) / n$ is denoted by $\Lambda_P^2$. $\|\cdot\|$ is the $L_2(Q)$-norm.