Model selection and estimation for latent variable models
Presented by Emi Tanaka, School of Mathematics and Statistics
dr.emi.tanaka@gmail.com @statsgen
June 26th 2019 @ EcoSta2019
Motivation
Many scientific disciplines (e.g. medicine, economics and agriculture) use latent variable (LV) models, or their special case, factor analytic (FA) models.
Data:
➤ $n = 64$ corn hybrids
➤ $p = 6$ trials
➤ 1-3 replicates at each trial
➤ Trait: yield
Which corn hybrid is the best?
[Figure: the transposed data matrix $Y^\top$]
Factor analytic model
A dimension reduction: $p$ variables are written in terms of a smaller number, $k$, of underlying, unobserved factors, where $k \le p$ (usually much less).
Key symbol: the $p \times k$ factor loading matrix $\Lambda$.
Key point: like a linear regression, but the common factors are not observed.
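For reference, the model this slide describes can be written out for a single hybrid $i$. This is a sketch that borrows the symbols used on the later slides ($\mu$, $\Lambda$, $f$, $\epsilon$, $\Psi$); the per-hybrid index $i$ is introduced here for illustration only.

```latex
\begin{aligned}
y_i &= \mu + \Lambda f_i + \epsilon_i, \qquad i = 1, \dots, n,\\
f_i &\sim N(0, I_k), \qquad \epsilon_i \sim N(0, \Psi), \qquad \Psi \text{ diagonal},\\
\operatorname{var}(y_i) &= \Lambda\Lambda^\top + \Psi .
\end{aligned}
```

The last line is the dimension reduction in covariance terms: the $p \times p$ covariance is described by the $p \times k$ loadings plus $p$ specific variances.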
Factor analytic model - multivariate form
Putting it all together in matrix form or ...
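Univariate form
Thinking of it as a linear mixed model:
$$y = X\beta + Zu + e, \qquad \begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix} \right)$$
Here we have
➤ $y = \mathrm{vec}(Y^\top)$,
➤ $X = 1_n \otimes I_p$ and $\beta = \mu$,
➤ $Z = [\, I_n \otimes \Lambda \;\; I_{np} \,]$ and $u = (f^\top, \epsilon^\top)^\top$,
➤ $G = \mathrm{diag}(I_{nk}, I_n \otimes \Psi)$.
What about $e$ and $R$?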
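A small numerical check of these design matrices may help. The sketch below uses base R only, with toy dimensions and made-up values for $\Lambda$ and $\Psi$ (all assumptions, not the corn data). It reads $Z$ as the column-wise concatenation of $I_n \otimes \Lambda$ and $I_{np}$, which is what makes $Zu = (I_n \otimes \Lambda) f + \epsilon$ conformable, and confirms that $ZGZ^\top = I_n \otimes (\Lambda\Lambda^\top + \Psi)$.

```r
# Toy dimensions (illustrative only; the slides use n = 64, p = 6).
n <- 3; p <- 4; k <- 2
set.seed(2019)
Lambda <- matrix(rnorm(p * k), nrow = p, ncol = k)  # p x k loadings (made up)
Psi    <- diag(runif(p, 0.5, 1.5))                  # diagonal specific variances

X <- kronecker(matrix(1, n, 1), diag(p))            # 1_n (x) I_p
Z <- cbind(kronecker(diag(n), Lambda), diag(n * p)) # [I_n (x) Lambda, I_np]

# G = diag(I_nk, I_n (x) Psi), assembled block by block.
G <- rbind(
  cbind(diag(n * k), matrix(0, n * k, n * p)),
  cbind(matrix(0, n * p, n * k), kronecker(diag(n), Psi))
)

# Variance contributed by the random effects, computed two ways.
var_Zu   <- Z %*% G %*% t(Z)
expected <- kronecker(diag(n), Lambda %*% t(Lambda) + Psi)
print(isTRUE(all.equal(var_Zu, expected)))
```

The block-diagonal result reflects that hybrids are independent of each other while the trials within a hybrid are correlated through the shared factors.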
All cells are not made equal
[Figure: the transposed data matrix $Y^\top$]
Most of the cells have 3 replicates, but some have 2 replicates and one has only 1 observation.
Two-stage analysis
➤ Quite often data are processed to fit the rectangular structure.
➤ In this case, the "observations" in the data matrix are estimates.
➤ Estimates may have different precisions.
➤ These precisions may be used as weights for the second step.
➤ We take the diagonal entries, $c_{ii}$, of the $np \times np$ precision matrix $C_{yy}^{-1}$ as weights for the next step, i.e. take $R$ as a known diagonal matrix with diagonal entries as $c_{ii}$.
➤ Alternatively, we can do a one-stage analysis (better, and it can also handle missing values), but not in this talk.
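The first stage can be sketched in base R as follows. Everything here is hypothetical: the plot-level data are simulated, and the weight is assumed to be the precision of a cell mean (number of replicates divided by a pooled within-site residual variance). The stage-one model actually used for the corn data is not shown on the slides.

```r
set.seed(1)
# Hypothetical replicated trial data: 4 hybrids x 2 sites, up to 3 replicates.
plots <- expand.grid(rep = 1:3, hybrid = paste0("G", 1:4), site = c("S1", "S2"))
plots <- plots[-c(3, 14, 15), ]  # unequal replication: G1 has 2 reps at S1, 1 at S2
plots$yield <- rnorm(nrow(plots), mean = 100, sd = 10)

# Stage 1: cell means and replicate counts for each hybrid-by-site cell.
cell_mean <- aggregate(yield ~ hybrid + site, data = plots, FUN = mean)
cell_n    <- aggregate(yield ~ hybrid + site, data = plots, FUN = length)

# Pooled residual variance within each site (one-way model on hybrid).
resid_var <- sapply(split(plots, plots$site), function(d) {
  fit <- lm(yield ~ hybrid, data = d)
  sum(residuals(fit)^2) / df.residual(fit)
})

stage1 <- data.frame(
  site    = cell_mean$site,
  hybrid  = cell_mean$hybrid,
  yield   = cell_mean$yield,
  # Assumed weight: precision of the cell mean = replicates / residual variance.
  weights = unname(cell_n$yield / resid_var[as.character(cell_mean$site)])
)
print(stage1)
```

Under this assumption a cell with one observation receives a third of the weight of a fully replicated cell at the same site, which is how unequal replication carries into the second stage.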
Processed data
$$y = \underbrace{X}_{1_n \otimes I_p}\beta + \underbrace{Z}_{I_{np}}\,\underbrace{[(I_n \otimes \Lambda) f + \epsilon]}_{u} + e$$
$$\begin{bmatrix} u \\ e \end{bmatrix} \sim N\left( \begin{bmatrix} 0_{np} \\ 0_{np} \end{bmatrix}, \begin{bmatrix} I_n \otimes (\Lambda\Lambda^\top + \Psi) & 0_{np \times np} \\ 0_{np \times np} & \hat{R} \end{bmatrix} \right)$$
corndata
```r
# A tibble: 384 x 4
   site  hybrid yield weights
   <fct> <fct>  <dbl>   <dbl>
 1 S1    G01    144.  0.00999
 2 S2    G01     67.5 0.00993
 3 S3    G01    105.  0.00857
 4 S4    G01    154.  0.0113
 5 S5    G01    110.  0.0143
 6 S6    G01     88.3 0.00361
 7 S1    G02    156.  0.00999
 8 S2    G02     79.8 0.00662
 9 S3    G02     61.7 0.00857
10 S4    G02    138.  0.0113
# … with 374 more rows
```
➤ Our R-package platent fits this model with the FA order selected by our OFAL algorithm (Hui et al. 2018, Biometrics).
➤ The current capability is limited to the above model.
```r
library(platent) # still in development

fit_ofal(yield ~ site + id(hybrid):rr(site) + id(hybrid):diag(site),
         weights = weights, data = corndata)

fit_ofal(yield ~ site + id(hybrid):fa(site),
         weights = weights, data = corndata)
```
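OFAL algorithm
We need to estimate:
➤ fixed effects $\beta$ and
➤ variance parameters $\theta = (\mathrm{vec}(\Lambda)^\top, \mathrm{diag}(\Psi)^\top)^\top$.
Estimating fixed effects
For given $\theta$, use the BLUE for $\beta$:
$$\hat\beta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$$
where $\mathrm{var}(y) = V$ is a function of $\theta$. Here: $V = I_n \otimes (\Lambda\Lambda^\top + \Psi) + \hat{R}$.
Estimating variance parameters
Recall $\Lambda$ is a $p \times k$ factor loading matrix:
$$\Lambda = \begin{bmatrix} \lambda_{11} & \lambda_{12} & \cdots & \lambda_{1k} \\ \lambda_{21} & \lambda_{22} & \cdots & \lambda_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{k1} & \lambda_{k2} & \cdots & \lambda_{kk} \\ \vdots & \vdots & & \vdots \\ \lambda_{p1} & \lambda_{p2} & \cdots & \lambda_{pk} \end{bmatrix}$$
What $k$, i.e. how many factors, should we have?
Assume that $\Lambda_0$ is a $p \times d$ pseudo factor loading matrix where $k \le d \le p$.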
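The BLUE step can be sketched in base R for a toy problem. The dimensions, the loadings, and the diagonal matrix standing in for $\hat{R}$ are all made up for illustration. The code solves the generalised least squares equations directly and uses `solve(V, ...)` rather than forming $V^{-1}$ explicitly.

```r
set.seed(1)
n <- 5; p <- 3
Lambda <- matrix(c(1.0, 0.5, -0.8), nrow = p)   # p x 1 loadings (made up)
Psi    <- diag(c(0.4, 0.6, 0.5))                # diagonal specific variances
R_hat  <- diag(runif(n * p, 0.5, 2))            # stand-in for the known stage-one matrix
X      <- kronecker(matrix(1, n, 1), diag(p))   # 1_n (x) I_p

# V = I_n (x) (Lambda Lambda^T + Psi) + R_hat
V <- kronecker(diag(n), Lambda %*% t(Lambda) + Psi) + R_hat

# Simulate y with mean X mu and variance V.
mu <- c(100, 80, 120)
y  <- X %*% mu + t(chol(V)) %*% rnorm(n * p)

# BLUE: (X^T V^-1 X)^-1 X^T V^-1 y
Vinv_X   <- solve(V, X)
Vinv_y   <- solve(V, y)
beta_hat <- solve(t(X) %*% Vinv_X, t(X) %*% Vinv_y)
print(beta_hat)
```

In an iterative fit this step would be repeated each time $\theta$, and hence $V$, is updated.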
Estimating variance parameters
REML or ML estimate: the typical (frequentist) approach
$$\hat\theta_{\mathrm{ML/REML}} = \arg\max_{\theta}\; \ell(\theta \mid y, X, Z, \beta)$$
OFAL estimate: our approach via penalised likelihood
$$\hat\theta_{\mathrm{OFAL}} = \arg\max_{\theta} \left\{ \ell(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \sqrt{\sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{0,ij}^2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{0,ij}| \right\}$$
where
➤ $s$ is a tuning parameter,
➤ $\omega_{g,l}$ is a group-wise adaptive weight for the $l$th column of $\Lambda_0$, and
➤ $\omega_{e,ij}$ is an element-wise adaptive weight for the $(i,j)$th entry of $\Lambda_0$.
$$\omega_g = \begin{bmatrix} \omega_{g,1} \\ \omega_{g,2} \\ \vdots \\ \omega_{g,d} \end{bmatrix}, \qquad \Omega_e = \begin{bmatrix} \omega_{e,11} & \omega_{e,12} & \cdots & \omega_{e,1d} \\ \omega_{e,21} & \omega_{e,22} & \cdots & \omega_{e,2d} \\ \vdots & \vdots & \ddots & \vdots \\ \omega_{e,d1} & \omega_{e,d2} & \cdots & \omega_{e,dd} \\ \vdots & \vdots & & \vdots \\ \omega_{e,p1} & \omega_{e,p2} & \cdots & \omega_{e,pd} \end{bmatrix}$$
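To make the two penalty terms concrete, here is a minimal base R sketch that evaluates the penalty for a given pseudo loading matrix. The function name `ofal_penalty` and its arguments are hypothetical and are not part of platent.

```r
# Evaluate the OFAL-type penalty for a p x d matrix Lambda0.
# s: tuning parameter; omega_g: length-d group weights; Omega_e: p x d element weights.
ofal_penalty <- function(Lambda0, s, omega_g, Omega_e) {
  d <- ncol(Lambda0)
  # Group term: for each l, the Euclidean norm of columns l, ..., d taken together.
  group_norms <- sapply(seq_len(d), function(l) {
    sqrt(sum(Lambda0[, l:d, drop = FALSE]^2))
  })
  # Element term: weighted sum of absolute loadings.
  s * sum(omega_g * group_norms) + s * sum(Omega_e * abs(Lambda0))
}

# Tiny example: second column already zero, unit weights.
Lambda0 <- matrix(c(3, 4, 0, 0), nrow = 2)
pen <- ofal_penalty(Lambda0, s = 2, omega_g = c(1, 1), Omega_e = matrix(1, 2, 2))
print(pen)
```

In the example the group norms are 5 and 0 and the absolute loadings sum to 7, so the penalty is 2 * 5 + 2 * 7 = 24.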
OFAL Demonstration
Say $s\,\omega_{e,15} \to \infty$, then you would expect $\lambda_{15} \to 0$.
[Figure: $\Lambda_0$]
OFAL Demonstration
Say $s\,\omega_{g,5} \to \infty$, then you would expect $\sqrt{\sum_{l=1}^{p} (\lambda_{l5}^2 + \lambda_{l6}^2)} \to 0$.
A sum of squares is zero only if each element is zero, so $\lambda_{l5} \to 0$ and $\lambda_{l6} \to 0$ for $l = 1, \dots, p$.
[Figure: $\Lambda_0$]
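The same argument applies to every group, which is why the penalty selects the order. The sketch below introduces the set notation $\mathcal{G}_l$ purely for illustration; the example on this slide appears to use $d = 6$, which is inferred from the two columns shown.

```latex
\begin{aligned}
\mathcal{G}_l &= \{\lambda_{ij} : i = 1,\dots,p,\ j = l,\dots,d\}, \qquad
\mathcal{G}_1 \supset \mathcal{G}_2 \supset \cdots \supset \mathcal{G}_d,\\[4pt]
\sqrt{\sum_{i=1}^{p}\sum_{j=l}^{d}\lambda_{ij}^2} = 0
&\;\Longrightarrow\; \lambda_{ij} = 0 \ \text{for all } i \text{ and all } j \ge l
\;\Longrightarrow\; \text{at most } l-1 \text{ non-zero columns}.
\end{aligned}
```

With $d = 6$ and $l = 5$, columns 5 and 6 vanish together and at most four factors remain.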
EM algorithm
E-Step
$$Q(\theta) = \mathbb{E}\left( \ell(\theta) \mid y, X, \hat\theta^{(r)}, \hat\beta^{(r)} \right)$$
➤ with respect to the conditional density $f(u \mid y)$;
➤ where $\ell$ is the complete log-likelihood (or residual log-likelihood);
➤ $\hat\beta^{(r)}$ and $\hat\theta^{(r)}$ are the estimates of $\beta$ and $\theta$, respectively, at the $r$th iteration.
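For intuition, the conditional moments that an E-step needs can be written down explicitly in the simplest case. The base R sketch below assumes the plain Gaussian FA model for one observation vector, $y_i = \mu + \Lambda f_i + \epsilon_i$, and ignores the known stage-one matrix $\hat{R}$, so it is a simplification of the model in this talk rather than the computation used in platent. All numerical values are made up.

```r
set.seed(42)
p <- 4; k <- 2
Lambda <- matrix(rnorm(p * k), p, k)   # current loading estimate (made up)
Psi    <- diag(runif(p, 0.5, 1.5))     # current specific variances (made up)
mu     <- rep(0, p)
Sigma  <- Lambda %*% t(Lambda) + Psi   # marginal variance of y_i

y_i <- c(1.2, -0.3, 0.8, 0.1)          # one hypothetical observation vector

# Conditional moments of the factors given y_i, from joint normality.
B        <- t(Lambda) %*% solve(Sigma)     # k x p regression of f on y
f_mean   <- B %*% (y_i - mu)               # E(f | y)
f_var    <- diag(k) - B %*% Lambda         # var(f | y)
f_second <- f_var + f_mean %*% t(f_mean)   # E(f f^T | y), which enters Q(theta)
print(f_mean)
```

These first and second conditional moments are what make the expected complete log-likelihood a tractable function of the loadings in the Gaussian case.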
M-Step
$$\hat\theta^{(r+1)} = \arg\max_{\theta} \left\{ Q(\theta) - s \sum_{l=1}^{d} \omega_{g,l} \left( \sum_{i=1}^{p} \sum_{j=l}^{d} \lambda_{ij}^2 \right)^{1/2} - s \sum_{i=1}^{p} \sum_{j=1}^{d} \omega_{e,ij} |\lambda_{ij}| \right\}$$
If $\hat\theta$ is a local maximiser of the above, then there exists a local maximiser $(\tilde\theta, \tilde\tau)$ of the below such that $\hat\theta = \tilde\theta$. See proof in Hui et al. (2018).
$$(\tilde\theta^{(r+1)}, \tilde\tau^{(r+1)}) = \arg\max_{\theta,\, \tau \ge 0} \left\{ Q(\theta) - \sum_{l=1}^{d} \frac{n s^2 \omega_{g,l}}{4 \tau_l} \sum_{k=l}^{d} \sum_{j=1}^{p} \lambda_{jk}^2 - s \sum_{j=1}^{p} \sum_{k=1}^{d} \omega_{e,jk} |\lambda_{jk}| - \sum_{l=1}^{d} \omega_{g,l} \tau_l \right\}$$
➤ Reformulates the problem into an elastic-net type regularisation problem.
➤ Employ coordinate-wise optimisation to obtain loading estimates.
Hui, Tanaka & Warton (2018) Order Selection and Sparsity in Latent Variable Models via the Ordered Factor LASSO. Biometrics
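The coordinate-wise idea can be illustrated with a generic elastic-net update. This is a base R sketch for a single loading under the assumption that, with everything else held fixed, the objective is a quadratic in that loading plus an absolute-value term and a squared term. It is not the exact update from Hui et al. (2018), and the function names and arguments are hypothetical.

```r
# Soft-thresholding operator: shrinks z towards zero by gamma.
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Minimiser over b of 0.5 * a * b^2 - g * b + l1 * |b| + l2 * b^2, with a > 0.
# a, g: curvature and linear coefficient of the smooth part;
# l1: lasso weight; l2: ridge weight.
enet_coord_update <- function(a, g, l1, l2) {
  soft_threshold(g, l1) / (a + 2 * l2)
}

# A loading with a strong signal is shrunk; a weak one is set exactly to zero.
b_strong <- enet_coord_update(a = 2, g = 3.0, l1 = 1, l2 = 0.5)
b_weak   <- enet_coord_update(a = 2, g = 0.5, l1 = 1, l2 = 0.5)
print(c(b_strong, b_weak))
```

Cycling such updates over all loadings until convergence is the usual coordinate-descent pattern. The lasso part produces exact zeros, and the ridge part plays the role of the group term after reformulation.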
Adaptive Weights & Tuning Parameter Selection
Adaptive Weights
➤ Fit the model with unpenalised likelihood first to obtain an estimate, $\tilde\Lambda_0$, of $\Lambda_0$.
➤ Perform the eigendecomposition $\tilde\Lambda_0 \tilde\Lambda_0^\top = Q D Q^\top$. Take $\Lambda^*$ as the first $d$ columns of $Q D^{1/2}$.
➤ Construct adaptive weights as
$$\omega_{g,l} = \left( \sum_{j=l}^{d} \sum_{i=1}^{p} (\lambda^*_{ij})^2 \right)^{-1/2} = \left( \sum_{k=l}^{d} D_{kk} \right)^{-1/2} \quad \text{and} \quad \omega_{e,ij} = |\lambda^*_{ij}|^{-1}.$$
Tuning Parameter
➤ The tuning parameter $s$ may be selected using an information criterion (e.g. AIC, BIC, EBIC). We used ERIC*.
* Hui, Warton & Foster (2015) Tuning Parameter Selection for the Adaptive Lasso Using ERIC. JASA
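The weight construction can be sketched in base R. The matrix `Lambda_tilde` below is random and merely stands in for an unpenalised estimate, and all object names are hypothetical. The sketch takes the group-wise weight to be indexed by column $l$ and the element-wise weight by entry $(i, j)$.

```r
set.seed(7)
p <- 6; d <- 3
Lambda_tilde <- matrix(rnorm(p * d), p, d)   # stand-in for an unpenalised estimate

# Eigendecomposition of the p x p matrix Lambda_tilde Lambda_tilde^T (rank d).
eig <- eigen(Lambda_tilde %*% t(Lambda_tilde), symmetric = TRUE)
Q <- eig$vectors[, 1:d, drop = FALSE]
D <- eig$values[1:d]
Lambda_star <- Q %*% diag(sqrt(D), d, d)     # first d columns of Q D^{1/2}

# Group-wise weights: inverse norm of columns l, ..., d of Lambda_star.
omega_g <- sapply(1:d, function(l) 1 / sqrt(sum(Lambda_star[, l:d]^2)))
# Element-wise weights: inverse absolute loadings.
Omega_e <- 1 / abs(Lambda_star)

print(omega_g)
```

Because the eigenvalues are sorted in decreasing order, later columns carry less variation and receive larger group weights, so they are penalised more heavily and removed first.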
Performance & Future
➤ Simulation results suggest competitive performance. See Hui et al. (2018).
➤ I only show this for an FA (Gaussian) model here, but our paper also shows results from a negative binomial generalised linear latent variable model.
➤ We need more research into:
  ➤ adaptive weight construction;
  ➤ computationally efficient approaches; and
  ➤ high-dimensional problems & other non-normal responses.
These slides, made using the xaringan R-package, can be found at bit.ly/ecosta2019
Our methods paper: Hui, Tanaka & Warton (2018) Biometrics.
Follow the platent R-package development at http://github.com/emitanaka/platent
Comments/feedback welcome! dr.emi.tanaka@gmail.com @statsgen
Acknowledgement
Big thanks go to my collaborator Dr. Francis Hui! His EcoSta2019 talk is this afternoon at 4.10pm in Room S1A01.