Multivariate estimation of genetic parameters – Quo vadis?


  1. Multivariate estimation of genetic parameters – Quo vadis?
  Karin Meyer, Animal Genetics and Breeding Unit, University of New England, Armidale. AAABG 2011.

  2. REML - quo vadis?
  "Quo vadis?" is a Latin phrase meaning "Where are you going?" or "Whither goest thou?" (Wikipedia).
  Disclaimer: Any similarity to previous AAABG title(s) is pure coincidence!
  [Shown: excerpt from an earlier AAABG paper] Statistical methods I: MIXED MODELS IN ANIMAL BREEDING: WHERE TO NOW? A.R. Gilmour, Cargo Vale, CARGO, NSW 2800, formerly Orange Agricultural Institute, NSW Department of Primary Industries. SUMMARY: Over the past 60 years, mixed models have underpinned huge gains in plant and animal production through genetic improvement. Charles Henderson (1912-1989) established mixed models for estimating breeding values (BLUP) using the popularly called Henderson's Mixed Model and provided early methods (Henderson's Methods I, II and III) for estimating variance parameters. Robin Thompson then published the widely acclaimed REML method for variance ...

  3. REML - quo vadis? | Outline
  Introduction; Penalized REML: improved estimator, penalties, tuning factor; Simulation results; Application; Finale. (Word cloud: www.wordle.net)

  4.-5. REML - quo vadis? | Introduction | Motivation
  Multivariate estimation involving more than a few traits is increasingly common → more traits of interest; technical capability (hard- and software).
  The number of parameters increases quadratically: q traits → q(q+1)/2 covariances, and SAMPLING VARIATION increases with the number of parameters/traits.
  Ways to alleviate sampling variation:
  Data: large → gigantic samples.
  Model: parsimony → fewer parameters than covariances, e.g. RR, reduced rank, structured matrices.
  Estimation → use additional information: Bayesian → prior; REML → impose a penalty P on the likelihood.

  6. REML - quo vadis? | Introduction | Key message
  Yes, we can: reduce the effects of sampling variation in multivariate analyses → 'better' estimates.
  Achieve this through a 'simple' extension of standard REML → a penalty.

  7. Penalized REML

  8.-9. REML - quo vadis? | Penalized REML | Improved estimator
  What is a 'better' estimator? Quality: some measure of deviation from the true value, a loss, e.g. MSE = sampling variance + bias².
  For a covariance matrix:
  Entropy loss (James & Stein 1961): L₁(Σ, Σ̂) = tr(Σ⁻¹Σ̂) − log|Σ⁻¹Σ̂| − q
  Quadratic loss: L₂(Σ, Σ̂) = tr(Σ⁻¹Σ̂ − I)²
  'Improved': modify the estimator to reduce loss, trading sampling variance against bias. (A small numerical sketch of the two losses follows.)
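A minimal NumPy sketch of the two loss functions (function names are mine, not from the talk); both losses are zero when Σ̂ = Σ and grow as the estimate diverges from the true matrix:

```python
import numpy as np

def entropy_loss(Sigma, Sigma_hat):
    """Entropy loss L1 of James & Stein (1961): tr(M) - log|M| - q, M = Sigma^{-1} Sigma_hat."""
    q = Sigma.shape[0]
    M = np.linalg.solve(Sigma, Sigma_hat)
    return np.trace(M) - np.linalg.slogdet(M)[1] - q

def quadratic_loss(Sigma, Sigma_hat):
    """Quadratic loss L2: tr[(Sigma^{-1} Sigma_hat - I)^2]."""
    D = np.linalg.solve(Sigma, Sigma_hat) - np.eye(Sigma.shape[0])
    return np.trace(D @ D)

# Example: a 10% inflated estimate of a 2x2 covariance matrix.
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
print(entropy_loss(Sigma, 1.1 * Sigma), quadratic_loss(Sigma, 1.1 * Sigma))
```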

  10. REML - quo vadis? | Penalized REML | Improved estimator
  Penalized maximum likelihood. 'Standard' (RE)ML → maximize log L(θ) w.r.t. θ.
  Penalized: log L_P(θ) = log L(θ) − ½ ψ P, with tuning factor ψ and penalty P = f(θ).
  Crucial questions:
  What kind of penalty? → P should reduce sampling variation.
  How to determine the tuning factor? → relative emphasis: data ↔ penalty.

  11. REML - quo vadis? | Penalized REML | Improved estimator
  Toy example: the penalty changes the shape of the likelihood!
  Paternal half-sib (PHS) design: s = 100 sires, n = 10 progeny per sire, q = 1 trait, true h² = 0.6.
  Penalty P = (h² − 0.3)², tuning factor ψ = 0, ..., 100.
  [Figure: profile log L against heritability (0.2 to 0.8) for ψ = 0, ..., 100, with h² = 0.6 marked; a sketch reproducing the example follows.]
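A minimal sketch of the toy example (not the author's code; it assumes a balanced design and, as a simplification, fixes the total variance at its ANOVA estimate while profiling over h²). As ψ grows, the maximizer of the penalized log-likelihood moves from near 0.6 towards the penalty's target of 0.3:

```python
import numpy as np

rng = np.random.default_rng(1)
s, n = 100, 10                               # sires, progeny per sire (PHS design)
h2_true = 0.6
v_s_true = h2_true / 4.0                     # sire variance; total variance set to 1
v_e_true = 1.0 - v_s_true
y = (rng.normal(0.0, np.sqrt(v_s_true), (s, 1))      # sire effects
     + rng.normal(0.0, np.sqrt(v_e_true), (s, n)))   # residuals

ssw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum()   # within-sire SS
ssb = n * ((y.mean(axis=1) - y.mean()) ** 2).sum()       # between-sire SS
v_p = (ssw + ssb) / (s * n - 1)              # total variance, held fixed below

def logL(h2):
    """Balanced one-way REML log-likelihood (up to a constant)."""
    v_s = h2 / 4.0 * v_p                     # half-sibs: h2 = 4 v_s / v_p
    v_e = v_p - v_s
    return -0.5 * (s * (n - 1) * np.log(v_e) + (s - 1) * np.log(v_e + n * v_s)
                   + ssw / v_e + ssb / (v_e + n * v_s))

h2_grid = np.linspace(0.05, 0.95, 181)
for psi in (0, 10, 100):
    pen = np.array([logL(x) for x in h2_grid]) - 0.5 * psi * (h2_grid - 0.3) ** 2
    print(f"psi = {psi:3d}: penalized-REML estimate of h2 = {h2_grid[pen.argmax()]:.2f}")
```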

  12.-13. REML - quo vadis? | Penalized REML | Penalties
  Choosing a penalty: rationale. Estimating genetic parameters means partitioning the total variance → sampling correlations between σ̂_Gij and σ̂_Eij.
  Observation: Σ̂_P is estimated more accurately than Σ̂_G. Idea: 'borrow strength' from Σ̂_P.
  How:
  1. Modify the eigenvalues of Σ̂_P⁻¹ Σ̂_G → λ_i, the "canonical" eigenvalues ∈ [0, 1] (see the sketch after this slide).
  2. Assume Σ̂_G follows an inverse Wishart (IW) distribution with scale Σ̂_P.
  → General principle: P ∝ minus the log of a prior density → empirical Bayes approach.
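To illustrate step 1, a small sketch (the matrices are hypothetical numbers, not from the talk) computing the "canonical" eigenvalues via the generalized eigenproblem Σ̂_G t = λ Σ̂_P t; since Σ_P = Σ_G + Σ_E, these eigenvalues are heritabilities of linear combinations of traits and hence lie in [0, 1]:

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical 3-trait phenotypic and genetic covariance matrices.
Sigma_P = np.array([[4.0, 1.0, 0.5],
                    [1.0, 3.0, 0.8],
                    [0.5, 0.8, 2.0]])
Sigma_G = np.array([[1.2, 0.4, 0.2],
                    [0.4, 0.9, 0.3],
                    [0.2, 0.3, 0.6]])

# Generalized eigenproblem Sigma_G t = lam Sigma_P t gives the eigenvalues
# of Sigma_P^{-1} Sigma_G, i.e. the canonical eigenvalues.
lam = eigh(Sigma_G, Sigma_P, eigvals_only=True)
print(lam)                                   # all values fall in [0, 1]
```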

  14.-16. REML - quo vadis? | Penalized REML | Penalties
  Penalty on eigenvalues. Eigenvalues of Σ̂ are overdispersed (Lawley 1956); overdispersion → sampling variation.
  [Figure: distribution of sample eigenvalues for 8 traits, y_i ∼ N(0, I), S = Σ_i y_i y_i′ / (n − 1), n = 500, 1000 replicates; the sample eigenvalues scatter from below 0.8 to above 1.2 around the true value of 1.]
  Modify the eigenvalues to reduce sampling variation: regress the λ̂_i towards their mean λ̄, e.g. λ* = λ̄ + 0.4 (λ̂ − λ̄).
  'Bending' (Hayes & Hill 1981): Σ̂*_G = β Σ̂_G + (1 − β) λ̄ Σ̂_P.
  Equivalent penalties for (RE)ML: P_λ ∝ Σ_i (λ̂_i − λ̄)² and, on the log scale, P_ℓλ ∝ Σ_i (log λ̂_i − log λ̄)².
  (A small bending sketch follows.)
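A sketch of the bending idea using the slide's regression factor of 0.4 (function and variable names are mine). The reconstruction of Σ_G from modified canonical eigenvalues relies on the generalized eigenvectors being scaled so that T′ Σ_P T = I, which scipy.linalg.eigh does by default:

```python
import numpy as np
from scipy.linalg import eigh

def bend(Sigma_G, Sigma_P, beta=0.4):
    """Regress the canonical eigenvalues of Sigma_P^{-1} Sigma_G towards their
    mean (bending in the spirit of Hayes & Hill 1981), then rebuild Sigma_G."""
    lam, T = eigh(Sigma_G, Sigma_P)          # Sigma_G t = lam Sigma_P t, T' Sigma_P T = I
    lam_star = lam.mean() + beta * (lam - lam.mean())
    T_inv = np.linalg.inv(T)
    return T_inv.T @ np.diag(lam_star) @ T_inv   # Sigma_G* = T^-T diag(lam*) T^-1

G = np.array([[1.0, 0.9], [0.9, 1.0]])       # nearly singular genetic matrix
P = np.array([[2.0, 0.5], [0.5, 2.0]])
print(np.linalg.eigvalsh(bend(G, P)))        # bent matrix is better conditioned
```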

  17. REML - quo vadis? | Penalized REML | Penalties
  Penalty on eigenvalues, cont. The quadratic penalty implies λ_i ∼ N(λ̄, σ²).
  Alternative: allow for a non-common mean → assume the λ_i are distributed as order statistics on the unit interval → Beta distributions:
  P_β ∝ Σ_i [(i − 1) log(λ_i) + (q − i) log(1 − λ_i)]
  [Figure: PDFs of the order statistics for q = 5.]
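The i-th order statistic of q independent Uniform(0,1) draws follows a Beta(i, q − i + 1) distribution, whose log density is (i − 1) log λ + (q − i) log(1 − λ) up to a constant; summing over i gives the slide's P_β. A sketch, where the negative sign (following the slide's "P ∝ minus log prior density" principle) and the ascending sort are my assumptions:

```python
import numpy as np

def beta_penalty(lam):
    """Order-statistics penalty: treat the i-th smallest canonical eigenvalue
    as the i-th order statistic of q Uniform(0,1) draws, i.e. Beta(i, q-i+1);
    return minus the summed log prior density (up to constants)."""
    lam = np.sort(np.asarray(lam, dtype=float))   # ascending order
    q = lam.size
    i = np.arange(1, q + 1)
    return -np.sum((i - 1) * np.log(lam) + (q - i) * np.log(1.0 - lam))

print(beta_penalty([0.07, 0.28, 0.55]))           # hypothetical canonical eigenvalues
```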

  18. REML - quo vadis? | Penalized REML | Penalties
  Penalty on matrix divergence. Assume Σ̂_G has an inverse Wishart prior and substitute Σ̂_P (unpenalized) for the scale parameter:
  P_Σ ∝ C log|Σ̂_G| + tr(Σ̂_G⁻¹ Σ̂_P), with C = (ψ + q + 1)/ψ ≈ 1
  → shrink Σ̂_G towards Σ̂_P.
  By analogy, for the genetic correlation matrix R̂_G: P_R ∝ C log|R̂_G| + tr(R̂_G⁻¹ R̂_P) → shrink R̂_G towards R̂_P.
  Advantages: easy to implement for the standard REML parameterization; the extension to penalize multiple matrices is straightforward.
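A sketch of the divergence penalty as I read the slide (the proportionality constant is dropped, and names are mine); the same function serves the correlation version by passing R̂_G and R̂_P:

```python
import numpy as np

def iw_penalty(G_hat, P_hat, psi):
    """Inverse-Wishart-motivated divergence penalty
    C log|G_hat| + tr(G_hat^{-1} P_hat), with C = (psi + q + 1) / psi (~1 for
    large psi). Pass correlation matrices (R_G, R_P) for the R version."""
    q = G_hat.shape[0]
    C = (psi + q + 1.0) / psi
    return C * np.linalg.slogdet(G_hat)[1] + np.trace(np.linalg.solve(G_hat, P_hat))
```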

  19. REML - quo vadis? | Penalized REML | Tuning factor
  How to choose that tuning factor?
  Pick ψ a priori → 'degree of belief', e.g. based on sample size.
  Or estimate ψ from the data by cross-validation:
  Split the data into 'training' and 'validation' sets.
  Obtain Σ̂_G and Σ̂_E for a range of ψ → training set.
  Evaluate log L(Σ̂_G, Σ̂_E) for all ψ → validation set.
  Pick the ψ which maximizes log L(Σ̂_G, Σ̂_E).
  K-fold cross-validation: K data subsets, K analyses (skeleton below).
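A skeleton of the K-fold procedure; `fit_penalized_reml` and `validation_logL` are placeholders for the user's own penalized-REML fit and likelihood evaluation, not functions from any package:

```python
import numpy as np

def choose_psi(data, psi_grid, fit_penalized_reml, validation_logL, K=5, seed=0):
    """K-fold cross-validation for the tuning factor psi.
    fit_penalized_reml(train, psi) -> (Sigma_G, Sigma_E) fits the penalized
    model; validation_logL(test, Sigma_G, Sigma_E) evaluates the likelihood
    of the held-out records at those estimates."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(data)), K)
    score = np.zeros(len(psi_grid))
    for k in range(K):
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        for p, psi in enumerate(psi_grid):
            Sigma_G, Sigma_E = fit_penalized_reml(data[train_idx], psi)
            score[p] += validation_logL(data[folds[k]], Sigma_G, Sigma_E)
    return psi_grid[int(np.argmax(score))]   # psi with best out-of-sample logL
```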
