

  1. Penalized Fits to a Multiway Layout with Multivariate Responses
     Rudolf Beran, University of California, Davis
     Workshop on Model Selection and Related Areas, University of Vienna, 24 July 2008

  2. Multivariate Linear Model
     $Y = CM + E$, where
     • the rows of the $n \times d$ matrix $Y$ are $d$-variate responses;
     • the $n \times p$ design matrix $C$ has rank $p \le n$;
     • the $p \times d$ matrix $M$ is unknown;
     • the $n \times d$ error matrix $E = V\Sigma^{1/2}$, where $\Sigma$ is an unknown p.d. covariance matrix and the elements of $V$ are iid with mean $0$, variance $1$, and finite 4th moment.
     The least squares estimator of $M$ is $\hat{M}_{ls} = C^{+}Y$.
     Let $y = \mathrm{vec}(Y)$, $m = \mathrm{vec}(M)$, $e = \mathrm{vec}(E)$ and $\tilde{C} = I_d \otimes C$. The vectorized model asserts $y = \tilde{C}m + e$. The least squares estimator of $m$ is $\hat{m}_{ls} = \tilde{C}^{+}y = \mathrm{vec}(\hat{M}_{ls})$.
     For now, assume $\Sigma = I_d$.
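The equivalence between the matrix and vectorized least squares fits rests on the identity $\mathrm{vec}(CM) = (I_d \otimes C)\mathrm{vec}(M)$ for column-stacking vec. A minimal numpy sketch (dimensions and seed are arbitrary illustrations, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 6, 3, 2                         # illustrative dimensions

C = rng.standard_normal((n, p))           # design matrix, rank p
M = rng.standard_normal((p, d))           # unknown mean matrix
Y = C @ M + 0.1 * rng.standard_normal((n, d))  # responses

# Column-stacking vec, matching vec(C M) = (I_d kron C) vec(M)
vec = lambda A: A.reshape(-1, order="F")

# Least squares, matrix form: M_ls = C^+ Y
M_ls = np.linalg.pinv(C) @ Y

# Least squares, vectorized form: m_ls = C_tilde^+ y
C_tilde = np.kron(np.eye(d), C)
m_ls = np.linalg.pinv(C_tilde) @ vec(Y)

# The two routes agree: m_ls = vec(M_ls)
print(np.allclose(m_ls, vec(M_ls)))  # True
```

Note that numpy's default row-major `reshape` would silently break the Kronecker identity; `order="F"` gives the column-stacking vec the formulas assume.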

  3. Quadratic Loss and Risk
     Let $\hat{\eta}$ be any estimator of $\eta = \tilde{C}m = \mathrm{E}(y)$. The loss of $\hat{\eta}$ is $L(\hat{\eta}, \eta) = p^{-1}|\hat{\eta} - \eta|^2$ and the corresponding risk is $R(\hat{\eta}, \eta) = \mathrm{E}\,L(\hat{\eta}, \eta)$. Equivalently, these are loss and risk functions on estimators of $m$ through the 1-to-1 map $\hat{\eta} = \tilde{C}\hat{m}$.
     The least squares estimator $\hat{\eta}_{ls} = \tilde{C}\hat{m}_{ls} = \tilde{C}\tilde{C}^{+}y$ has risk $R(\hat{\eta}_{ls}, \eta) = d$.
     Biased estimators of $\eta$ can reduce risk substantially: Stein (1956), James and Stein (1961), Stein (1966); also papers on symmetric linear estimators such as Stein (1981), Li and Hwang (1984), Buja, Hastie and Tibshirani (1989), Kneip (1994), Beran (2007) . . .
     Penalized least squares (PLS) generates promising, biased, candidate symmetric linear estimators of $\eta$.
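The value $R(\hat\eta_{ls}, \eta) = d$ follows because the least squares fit projects $y$ onto a $pd$-dimensional subspace, so under $\Sigma = I_d$ the expected squared error is $pd$, and dividing by $p$ gives $d$. A Monte Carlo check with arbitrary illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, d = 12, 4, 3                       # illustrative dimensions
C = rng.standard_normal((n, p))
C_tilde = np.kron(np.eye(d), C)
eta = C_tilde @ rng.standard_normal(p * d)   # true mean of y

# eta_ls = C_tilde C_tilde^+ y projects y onto a pd-dimensional subspace,
# so with Sigma = I_d its risk is p^{-1} * pd = d.
H = C_tilde @ np.linalg.pinv(C_tilde)
losses = []
for _ in range(2000):
    y = eta + rng.standard_normal(n * d)     # Sigma = I_d errors
    losses.append(np.sum((H @ y - eta) ** 2) / p)
print(np.mean(losses))   # close to d = 3
```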

  4. General Structure of PLS for the Multivariate Linear Model
     Let $\mathcal{S}$ be an index set of fixed cardinality. Let $\{Q_s : s \in \mathcal{S}\}$ be $p \times p$ p.s.d. penalty matrices. Let $N = \{N_s : s \in \mathcal{S}\}$ be $d \times d$ p.s.d. affine penalty weights.
     PLS criterion: $G(m, N) = |y - \tilde{C}m|^2 + m'Q(N)m$, where $Q(N) = \sum_{s \in \mathcal{S}}(N_s \otimes Q_s)$.
     The PLS estimators of $m$ and $\eta$ are then
     $\hat{m}_{pls}(N) = \mathrm{argmin}_m\, G(m, N) = [\tilde{C}'\tilde{C} + Q(N)]^{-1}\tilde{C}'y$,
     $\hat{\eta}_{pls}(N) = \tilde{C}\hat{m}_{pls} = \tilde{C}[\tilde{C}'\tilde{C} + Q(N)]^{-1}\tilde{C}'y$,
     a symmetric linear estimator (generalized ridge).
     These estimators can be derived as Bayes estimators in a normal error version of the multivariate linear model. Kimeldorf and Wahba (1970) make the general point.
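The generalized-ridge form of $\hat\eta_{pls}(N)$ is easy to verify numerically for a single penalty term. The penalty matrix $Q_s$ and weight $N_s$ below are hypothetical illustrations, not choices made in the talk:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d = 8, 4, 2
C = rng.standard_normal((n, p))
C_tilde = np.kron(np.eye(d), C)
y = rng.standard_normal(n * d)

# One hypothetical penalty term: p.s.d. Q_s with p.s.d. weight N_s
A = rng.standard_normal((p, p))
Q_s = A.T @ A
N_s = np.array([[2.0, 0.5],
                [0.5, 1.0]])
Q_N = np.kron(N_s, Q_s)                   # Q(N) = sum_s N_s kron Q_s

# PLS estimators: m_pls(N) and eta_pls(N) = C_tilde m_pls(N)
G = C_tilde.T @ C_tilde + Q_N
m_pls = np.linalg.solve(G, C_tilde.T @ y)
eta_pls = C_tilde @ m_pls

# eta_pls is a symmetric linear (generalized ridge) fit: eta_pls = H y
# with H = C_tilde G^{-1} C_tilde' symmetric.
H = C_tilde @ np.linalg.solve(G, C_tilde.T)
print(np.allclose(H, H.T), np.allclose(H @ y, eta_pls))  # True True
```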

  5. • When $d = 1$, the penalty weights are non-negative scalars. E.g. Wood (2000) and Beran (2005) use multiple penalty terms with scalar weights.
     • Functional data analysis treats penalized estimation of a function $m$ of continuous covariates. E.g. Wahba, Wang, Gu, Klein, Klein (1995), Li (2000), Ramsay and Silverman (2002).
     To be considered:
     • Data-based choice of the affine penalty weights $\{N_s : s \in \mathcal{S}\}$;
     • Supporting asymptotic theory for the foregoing, as $p \to \infty$;
     • Penalty matrices $\{Q_s : s \in \mathcal{S}\}$ suitable for the multiway layout with $d$-variate responses;
     • Modifications for the case of a general unknown covariance matrix $\Sigma$.

  6. Canonical Form and Risk of $\hat{\eta}_{pls}(N)$
     Let $\tilde{R} = I_d \otimes C'C$, a $pd \times pd$ matrix of full rank. Let $\tilde{U} = I_d \otimes C(C'C)^{-1/2}$, an $nd \times pd$ matrix. Then $\tilde{C} = I_d \otimes C = \tilde{U}\tilde{R}^{1/2}$ and $\tilde{U}'\tilde{U} = I_{pd}$. Hence,
     $\hat{\eta}_{pls}(N) = \tilde{C}[\tilde{C}'\tilde{C} + Q(N)]^{-1}\tilde{C}'y = \tilde{U}S(N)\tilde{U}'y$,
     where $S(N) = [I_{pd} + \tilde{R}^{-1/2}Q(N)\tilde{R}^{-1/2}]^{-1}$ is symmetric.
     Because $\mathcal{R}(\tilde{C}) = \mathcal{R}(\tilde{U})$ and $\tilde{U}'\tilde{U} = I_{pd}$, $\eta = \tilde{C}m = \tilde{U}\xi$, with $\xi = \tilde{U}'\eta$. Let $z = \tilde{U}'y$. Then $\hat{\eta}_{pls}(N) = \tilde{U}S(N)z$. This is the canonical form of $\hat{\eta}_{pls}(N)$.
     The risk of $\hat{\eta}_{pls}(N)$ is thus
     $R(\hat{\eta}_{pls}(N), \eta) = p^{-1}\mathrm{E}|S(N)z - \xi|^2 = p^{-1}[\mathrm{tr}(T(N)) + \mathrm{tr}(\bar{T}(N)\xi\xi')]$,
     where $T(N) = S^2(N)$ and $\bar{T}(N) = [I_{pd} - S(N)]^2$.
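The factorization $\tilde{C} = \tilde{U}\tilde{R}^{1/2}$ and the canonical form $\hat\eta_{pls}(N) = \tilde{U}S(N)\tilde{U}'y$ can be checked directly; the single penalty term $Q(N) = I_{pd}$ below is a toy choice made only so the sketch stays short:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, d = 7, 3, 2
C = rng.standard_normal((n, p))
C_tilde = np.kron(np.eye(d), C)
y = rng.standard_normal(n * d)

# Symmetric square roots of C'C via its eigendecomposition
w, V = np.linalg.eigh(C.T @ C)
half = V @ np.diag(w ** 0.5) @ V.T        # (C'C)^{1/2}
inv_half = V @ np.diag(w ** -0.5) @ V.T   # (C'C)^{-1/2}

U_tilde = np.kron(np.eye(d), C @ inv_half)   # nd x pd, orthonormal columns
R_half = np.kron(np.eye(d), half)            # R_tilde^{1/2}
assert np.allclose(U_tilde.T @ U_tilde, np.eye(p * d))
assert np.allclose(U_tilde @ R_half, C_tilde)   # C_tilde = U R^{1/2}

# Toy penalty Q(N) = I_pd (hypothetical, for illustration only)
Q_N = np.eye(p * d)
R_inv_half = np.kron(np.eye(d), inv_half)
S_N = np.linalg.inv(np.eye(p * d) + R_inv_half @ Q_N @ R_inv_half)

# Canonical form U S(N) U' y equals the direct generalized-ridge formula
eta_direct = C_tilde @ np.linalg.solve(C_tilde.T @ C_tilde + Q_N, C_tilde.T @ y)
eta_canon = U_tilde @ S_N @ (U_tilde.T @ y)
print(np.allclose(eta_direct, eta_canon))  # True
```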

  7. Estimated Risk
     The estimated risk of $\hat{\eta}_{pls}(N)$ is
     $\hat{R}(N) = p^{-1}[\mathrm{tr}(T(N)) + \mathrm{tr}(\bar{T}(N)(zz' - I_{pd}))]$
     (cf. Mallows (1973), Stein (1981)). Let $\hat{N} = \mathrm{argmin}_N\, \hat{R}(N)$. E.g., parametrize each $N_s$ via its Cholesky factorization $N_s = L_sL_s'$ with diagonal entries $\{l_{s,i,i} \ge 0\}$.
     The adaptive PLS estimators of $\eta$ and of $m$ are
     $\hat{\eta}_{apls} = \hat{\eta}_{pls}(\hat{N})$ and $\hat{m}_{apls} = \tilde{C}^{+}\hat{\eta}_{apls}$.
     Supporting Asymptotics
     Let $|\cdot|_{sp}$ denote the spectral matrix norm: $|B|_{sp} = \sup_{x \ne 0}[|Bx|/|x|]$.
     • Let $W(N)$ denote either the loss or the estimated risk of $\hat{\eta}_{pls}(N)$. Let $\mathcal{N} = \{N : \max_{s \in \mathcal{S}}|N_s|_{sp} \le b\}$. Then, for every finite $a > 0$,
     $\lim_{p \to \infty} \sup_{p^{-1}|\eta|^2 \le a} \mathrm{E}[\sup_{N \in \mathcal{N}} |W(N) - R(\hat{\eta}_{pls}(N), \eta)|] = 0$.
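In the scalar-weight case $d = 1$, minimizing the estimated risk over the penalty weight is a one-dimensional search. A sketch working directly in the canonical coordinates $z = \xi + \text{noise}$ of slide 6; the shrinkage spectrum $d_i$ and the mean $\xi$ below are hypothetical choices, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 200

# Canonical coordinates: z = xi + noise. Candidate family
# S(nu) = diag(1/(1 + nu * d_i)) indexed by a scalar weight nu (d = 1 case).
D = np.arange(p, dtype=float)             # hypothetical penalty spectrum
xi = 5.0 * np.exp(-0.05 * np.arange(p))   # hypothetical canonical mean
z = xi + rng.standard_normal(p)

def est_risk(nu):
    """Estimated risk p^{-1}[tr T(N) + tr Tbar(N)(zz' - I)] (Mallows/Stein)."""
    s = 1.0 / (1.0 + nu * D)
    return (np.sum(s ** 2) + np.sum((1 - s) ** 2 * (z ** 2 - 1))) / p

def true_risk(nu):
    s = 1.0 / (1.0 + nu * D)
    return (np.sum(s ** 2) + np.sum((1 - s) ** 2 * xi ** 2)) / p

grid = np.linspace(0.0, 2.0, 401)
nu_hat = grid[np.argmin([est_risk(nu) for nu in grid])]

# Least squares (nu = 0) has risk 1; the adaptive choice does much better
# and sits near the oracle minimum over the grid.
oracle = min(true_risk(nu) for nu in grid)
print(true_risk(0.0), true_risk(nu_hat), oracle)
```

Grid search stands in here for the Cholesky-parametrized numerical minimization the slide describes; with matrix weights $N_s$ the search is simply higher-dimensional.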

  8. • For every finite $a > 0$,
     $\lim_{p \to \infty} \sup_{p^{-1}|\eta|^2 \le a} |R(\hat{\eta}_{apls}, \eta) - \min_{N \in \mathcal{N}} R(\hat{\eta}_{pls}(N), \eta)| = 0$.
     • Let $V$ denote either the loss or risk of $\hat{\eta}_{apls}$. Then, for every finite $a > 0$,
     $\lim_{p \to \infty} \sup_{p^{-1}|\eta|^2 \le a} \mathrm{E}|\hat{R}(\hat{N}) - V| = 0$.
     The loss, risk and estimated risk of the candidate estimator $\hat{\eta}_{pls}(N)$ converge together, as $p \to \infty$, uniformly over $N \in \mathcal{N}$. Estimated risk is here a trustworthy surrogate for loss or risk.
     The risk of $\hat{\eta}_{apls}$ converges, as $p \to \infty$, to the minimal risk achievable by the PLS candidate estimators.
     The plug-in risk estimator $\hat{R}(\hat{N})$ converges to the loss or risk of $\hat{\eta}_{apls}$ as $p \to \infty$.

  9. Complete $k_0$-way Layout with Multivariate Responses
     Now the $d$-dimensional responses depend on $k_0$ covariates. Covariate $k$ has $p_k$ distinct levels $x_{k,1} < x_{k,2} < \ldots < x_{k,p_k}$.
     Let $\mathcal{I}$ denote the set of all $k_0$-tuples $i = (i_1, i_2, \ldots, i_{k_0})$, where $1 \le i_k \le p_k$ for $1 \le k \le k_0$. Thus, $i_k$ indexes the levels of covariate $k$ and $\mathcal{I}$ lists all $p = \prod_{k=1}^{k_0} p_k$ possible covariate-level combinations. We put the elements of $\mathcal{I}$ in mirror-dictionary order.
     We observe $Y = CM + E$, with the assumptions on $E$ as before. Here $C$ is the $n \times p$ data-incidence matrix of 0's and 1's that suitably replicates rows of the $p \times d$ matrix $M$ into the rows of $\mathrm{E}(Y) = CM$. The design is complete: $\mathrm{rank}(C) = p$.
     Row $i \in \mathcal{I}$ of $M$ equals $f(x_{1,i_1}, x_{2,i_2}, \ldots, x_{k_0,i_{k_0}})$, where $f$ is an unknown vector-valued function.
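A data-incidence matrix for a small complete layout can be sketched as follows. The cell ordering here is plain lexicographic for brevity, whereas the talk orders $\mathcal{I}$ in mirror-dictionary order; all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Complete two-way layout (k0 = 2): p = p1 * p2 cells, n0 observations per
# cell, n = n0 * p.
p1, p2, d, n0 = 3, 4, 2, 2
p = p1 * p2

# Data-incidence matrix of 0's and 1's: each row of C selects the row of M
# for that observation's covariate-level combination, so E(Y) = C M simply
# replicates rows of M.
C = np.vstack([np.eye(p)] * n0)

M = rng.standard_normal((p, d))          # stands in for the unknown mean
EY = C @ M
print(np.allclose(EY[:p], M), np.linalg.matrix_rank(C) == p)  # True True
```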

  10. Constructing Penalty Matrices $\{Q_s : s \in \mathcal{S}\}$
      We devise a scheme that penalizes individually the main effects and interactions in the MANOVA decomposition of $M$.
      For $1 \le k \le k_0$, define the $p_k \times 1$ vector $u_k = p_k^{-1/2}(1, 1, \ldots, 1)'$. Let $A_k$ be an annihilator: a matrix such that $A_ku_k = 0$.
      Let $\mathcal{S}$ denote the set of all subsets of $\{1, 2, \ldots, k_0\}$, including $\emptyset$. Let $Q_{s,k} = u_ku_k'$ if $k \notin s$; and $Q_{s,k} = A_k'A_k$ if $k \in s$. Define
      $Q_s = \bigotimes_{k=1}^{k_0} Q_{s,k_0-k+1}, \quad s \in \mathcal{S}$.
      Special case: $A_k = I_{p_k} - u_ku_k'$. Denote $Q_s$ in this case by $P_{AN,s}$. The matrices $\{P_{AN,s} : s \in \mathcal{S}\}$ are mutually orthogonal, orthogonal projections such that $\sum_{s \in \mathcal{S}} P_{AN,s} = I_p$.
      MANOVA decomposition: $M = \sum_{s \in \mathcal{S}} P_{AN,s}M$.
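For a two-way layout ($k_0 = 2$) and the special case $A_k = I - u_ku_k'$, the four matrices $P_{AN,s}$ can be built explicitly and their projection properties verified. The kron factors appear in reverse covariate order, matching the mirror-dictionary convention; the layout sizes are illustrative:

```python
import numpy as np

p1, p2 = 3, 4

def parts(pk):
    """Return (u_k u_k', I - u_k u_k') for covariate k with p_k levels."""
    J = np.full((pk, pk), 1.0 / pk)      # u_k u_k'
    return J, np.eye(pk) - J             # mean part, annihilator part

J1, A1 = parts(p1)
J2, A2 = parts(p2)

# One P_AN,s per subset s of {1, 2}; covariate 2's factor comes first
P = {
    frozenset(): np.kron(J2, J1),
    frozenset({1}): np.kron(J2, A1),
    frozenset({2}): np.kron(A2, J1),
    frozenset({1, 2}): np.kron(A2, A1),
}

p = p1 * p2
assert np.allclose(sum(P.values()), np.eye(p))      # sum_s P_AN,s = I_p
for s1 in P:
    for s2 in P:
        expect = P[s1] if s1 == s2 else np.zeros((p, p))
        assert np.allclose(P[s1] @ P[s2], expect)   # orthogonal projections
print("MANOVA projections verified")
```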

  11. From the foregoing definitions, $P_{AN,s}Q_s = Q_sP_{AN,s} = Q_s$ for every $s \in \mathcal{S}$; and $P_{AN,s_1}Q_{s_2} = Q_{s_2}P_{AN,s_1} = 0$ if $s_1 \ne s_2$. Thus,
      $m'(N_s \otimes Q_s)m = |Q_s^{1/2}MN_s^{1/2}|^2 = |Q_s^{1/2}(P_{AN,s}M)N_s^{1/2}|^2$.
      The penalty term in the PLS criterion is seen to operate on the summands in the MANOVA decomposition of $M$:
      $m'Q(N)m = \sum_{s \in \mathcal{S}} m'(N_s \otimes Q_s)m = \sum_{s \in \mathcal{S}} |Q_s^{1/2}(P_{AN,s}M)N_s^{1/2}|^2$.
      Spectral Form of the Penalty Matrices $\{Q_s\}$
      $A_k'A_k = U_k\Lambda_kU_k'$, where $\Lambda_k = \mathrm{diag}\{\lambda_{k,i_k} : 1 \le i_k \le p_k\}$ and $0 = \lambda_{k,1} \le \lambda_{k,2} \le \ldots \le \lambda_{k,p_k}$. The first column of $U_k$ is chosen to be $u_k$.
      Then $u_ku_k' = U_kE_kU_k'$, where $E_k = \mathrm{diag}\{e_{k,i_k} : 1 \le i_k \le p_k\}$, with $e_{k,1} = 1$ and $e_{k,i_k} = 0$ if $i_k \ge 2$.
      Hence, $Q_{s,k} = U_k\Gamma_{s,k}U_k'$, where $\Gamma_{s,k} = \mathrm{diag}\{\gamma_{s,k,i_k} : 1 \le i_k \le p_k\}$, with $\gamma_{s,k,i_k} = e_{k,i_k}$ if $k \notin s$; $\gamma_{s,k,i_k} = \lambda_{k,i_k}$ if $k \in s$.

  12. Write $U_k = [u_{k,1}, \ldots, u_{k,p_k}]$. Then $Q_{s,k} = \sum_{i_k=1}^{p_k} \gamma_{s,k,i_k}P_{k,i_k}$, where $P_{k,i_k} = u_{k,i_k}u_{k,i_k}'$ is a rank-one orthogonal projection.
      For $i \in \mathcal{I}$, let $P_i = \bigotimes_{k=1}^{k_0} P_{k_0-k+1,\,i_{k_0-k+1}}$ and $\gamma_{s,i} = \prod_{k=1}^{k_0} \gamma_{s,k,i_k}$.
      Let $\mathcal{I}_s = \{i \in \mathcal{I} : i_k = 1 \text{ if } k \notin s \text{ and } i_k \ge 2 \text{ if } k \in s\}$. This defines a partition of $\mathcal{I}$. Then,
      $Q_s = \bigotimes_{k=1}^{k_0} Q_{s,k_0-k+1} = \sum_{i \in \mathcal{I}_s} \gamma_{s,i}P_i$.
      Here, $\gamma_{\emptyset,i} = 1$ if $i \in \mathcal{I}_{\emptyset}$ and $\gamma_{s,i} = \prod_{k \in s} \lambda_{k,i_k}$ if $s \ne \emptyset$ and $i \in \mathcal{I}_s$.
      Note: the $\{P_i\}$ are mutually orthogonal projections such that $\sum_{i \in \mathcal{I}} P_i = I_p$. The MANOVA projection $P_{AN,s} = \sum_{i \in \mathcal{I}_s} P_i$.
      Next steps
      • Structure of the PLS estimators in balanced layouts.
      • Construction of suitable annihilator matrices.
      • Extension of PLS estimators to a general covariance matrix $\Sigma$.
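The rank-one expansion $Q_s = \sum_{i \in \mathcal{I}_s} \gamma_{s,i}P_i$ can be verified numerically for one subset $s$ on a two-way layout, again with the projection annihilator $A_k = I - u_ku_k'$ (so $\lambda_{k,1} = 0$ and $\lambda_{k,i_k} = 1$ for $i_k \ge 2$). Indices below are 0-based, so the slide's $i_k = 1$ corresponds to column 0; sizes are illustrative:

```python
import numpy as np

def factor(pk):
    """Orthonormal U_k with first column spanning u_k, plus eigenvalues of A'A."""
    u = np.full(pk, pk ** -0.5)
    B = np.eye(pk)
    B[:, 0] = u
    U, _ = np.linalg.qr(B)               # column 0 spans u_k (up to sign)
    lam = np.ones(pk)
    lam[0] = 0.0                         # eigenvalues of A'A = I - u u'
    return U, lam

p1, p2 = 3, 4
U1, lam1 = factor(p1)
U2, lam2 = factor(p2)

# Take s = {1}: penalize covariate 1, average out covariate 2
Q_s1 = U1 @ np.diag(lam1) @ U1.T         # k = 1 in s:  A_1'A_1
Q_s2 = np.outer(U2[:, 0], U2[:, 0])      # k = 2 not in s:  u_2 u_2'
Q_s = np.kron(Q_s2, Q_s1)                # mirror-dictionary order

# I_s = {i : i_1 >= 2, i_2 = 1}; here gamma_{s,i} = lambda_{1,i_1}
expansion = np.zeros_like(Q_s)
for i1 in range(1, p1):
    P_i = np.kron(np.outer(U2[:, 0], U2[:, 0]),
                  np.outer(U1[:, i1], U1[:, i1]))
    expansion += lam1[i1] * P_i
print(np.allclose(Q_s, expansion))  # True
```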

  13. Balanced $k_0$-way Layout with Multivariate Responses
      In a balanced layout, $C'C = n_0I_p$ for some $n_0 \ge 1$. Then $\hat{m}_{ls} = (\tilde{C}'\tilde{C})^{-1}\tilde{C}'y = n_0^{-1}\tilde{C}'y$ (averaging responses over replications) and, for $Q(N) = \sum_{s \in \mathcal{S}}(N_s \otimes Q_s)$,
      $\hat{m}_{pls} = [\tilde{C}'\tilde{C} + Q(N)]^{-1}\tilde{C}'y = [I_{pd} + n_0^{-1}Q(N)]^{-1}\hat{m}_{ls}$.
      Using also $Q_s = \sum_{i \in \mathcal{I}_s} \gamma_{s,i}P_i$ yields
      $I_{pd} + n_0^{-1}Q(N) = \sum_{s \in \mathcal{S}} \sum_{i \in \mathcal{I}_s} [(I_d + n_0^{-1}\gamma_{s,i}N_s) \otimes P_i]$.
      Hence, for a balanced layout,
      $\hat{m}_{pls}(N) = \sum_{s \in \mathcal{S}} \sum_{i \in \mathcal{I}_s} [(I_d + n_0^{-1}\gamma_{s,i}N_s)^{-1} \otimes P_i]\,\hat{m}_{ls}$.
      In matrix form, $\hat{M}_{pls}(N) = \sum_{s \in \mathcal{S}} \sum_{i \in \mathcal{I}_s} P_i\hat{M}_{ls}(I_d + n_0^{-1}\gamma_{s,i}N_s)^{-1}$.
      The annihilators determine the projections $\{P_i\}$ and the $\{\gamma_{s,i}\}$ in the affine shrinkage factors. Estimated risk also simplifies.
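The equivalence between the direct formula $[I_{pd} + n_0^{-1}Q(N)]^{-1}\hat{m}_{ls}$ and the affine shrinkage form can be checked on a small balanced two-way layout. The penalty weights $N_s$ below are hypothetical, and the projection annihilators $A_k = I - u_ku_k'$ make every $\gamma_{s,i}$ equal 0 or 1, so the shrinkage collapses to one affine factor per MANOVA term:

```python
import numpy as np

rng = np.random.default_rng(6)

# Balanced two-way layout, d = 2, n0 = 3 replications per cell
p1, p2, d, n0 = 3, 4, 2, 3
p = p1 * p2
C = np.vstack([np.eye(p)] * n0)           # balanced incidence: C'C = n0 I_p

def parts(pk):
    J = np.full((pk, pk), 1.0 / pk)       # u_k u_k'
    return J, np.eye(pk) - J

J1, A1 = parts(p1)
J2, A2 = parts(p2)
# With A_k = I - u_k u_k', each Q_s equals the projection P_AN,s
# (covariate 2's factor first, per mirror-dictionary order)
Q = {(): np.kron(J2, J1), (1,): np.kron(J2, A1),
     (2,): np.kron(A2, J1), (1, 2): np.kron(A2, A1)}

# Hypothetical p.s.d. affine penalty weights N_s, one per subset
N = {s: (k + 1.0) * np.eye(d) + 0.3 for k, s in enumerate(Q)}

Y = C @ rng.standard_normal((p, d)) + rng.standard_normal((n0 * p, d))
M_ls = np.linalg.pinv(C) @ Y              # averages over replications

# Direct formula: vec(M_pls) = [I_pd + n0^{-1} Q(N)]^{-1} vec(M_ls)
vec = lambda A: A.reshape(-1, order="F")
Q_N = sum(np.kron(N[s], Q[s]) for s in Q)
m_direct = np.linalg.solve(np.eye(p * d) + Q_N / n0, vec(M_ls))

# Shrinkage form: M_pls = sum_s P_AN,s M_ls (I_d + n0^{-1} N_s)^{-1}
M_shrink = sum(Q[s] @ M_ls @ np.linalg.inv(np.eye(d) + N[s] / n0) for s in Q)
print(np.allclose(m_direct, vec(M_shrink)))  # True
```

The shrinkage form inverts only $d \times d$ matrices, one per MANOVA term, rather than a $pd \times pd$ matrix, which is the computational payoff of the balanced case.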
