
GENOMIC SELECTION WORKSHOP: Hands-on Practical Sessions (GBLUP-RR)



  1. GENOMIC SELECTION WORKSHOP: Hands-on Practical Sessions (GBLUP-RR). Paulino Pérez (1), José Crossa (2). (1) ColPos-México; (2) CIMMyT-México. September 2014, SLU, Sweden.

  2. Contents
     1. General comments
     2. GBLUP-Ridge Regression
     3. Application examples
     4. Biplot from marker effects
     5. Extension of BRR to include infinitesimal effect

  3. General comments. Remember:
     1. A simple model used frequently in plant breeding states that the phenotypic value of an individual ($P$) is the sum of the genetic value ($G$) and the residual environmental effect ($E$):
        $$P = G + E, \tag{1}$$
        where $G$ includes additive, dominance, and epistatic effects.
     2. A model that includes solely additive effects ($A$) can be easily derived from (1) and expressed as
        $$P = A + E', \tag{2}$$
        where $E'$ includes the effects that are not additive.

  4. General comments (continued). The breeding value ($BV$) of an individual can be computed from the narrow-sense heritability ($h^2$):
     $$BV_i = \mu + h^2 (y_i - \mu),$$
     where $\mu$ is the mean phenotypic value of the population and $y_i$ is the phenotypic value of individual $i$. To compute this, information on parents and offspring is needed.
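As a small numerical illustration of the formula above, the following R sketch shrinks each phenotype toward the population mean by the factor $h^2$. The phenotypes and the heritability value are hypothetical, chosen only for illustration.

```r
# Hypothetical phenotypic values and narrow-sense heritability (illustration only)
y  <- c(5.2, 6.1, 4.8, 7.0, 5.9)
h2 <- 0.4

mu <- mean(y)                 # population mean phenotype
bv <- mu + h2 * (y - mu)      # BV_i = mu + h2 * (y_i - mu)

print(round(bv, 2))
```

Each breeding value lies between the population mean and the observed phenotype; the smaller $h^2$ is, the stronger the shrinkage toward the mean.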

  5. General comments (continued). In Genomic Selection (GS), genetic values are approximated using linear regression (Meuwissen et al., 2001):
     $$y_i = g_i + e_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + e_i. \tag{3}$$
     [Figure: Relationships between marker genotypes ($x_{1i}$: 0 and 1) and phenotypes ($y_i$) of the individuals (open circles) in a training population. If the marker genotype is correlated with the phenotype, segregation is modelled using the bold line (taken from Nakaya and Isobe, 2012).]

  6. General comments (continued). In GS it is possible to obtain Genomic Estimated Breeding Values (GEBVs). This is done by adding up the marker effects estimated in a training population, weighted by the individual's marker genotypes:
     $$\mathrm{GEBV}_i = \sum_{j=1}^{p} x_{ij}\hat{\beta}_j. \tag{4}$$
     Next we show how to obtain the predictions $\hat{y}_i$ (and in some cases $\hat{\beta}_j$) using several models.
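The sum in (4) is a matrix-vector product. The R sketch below shows it on a toy example; the genotype matrix and the marker effects are hypothetical values, not estimates from any real training population.

```r
# Hypothetical 0/1 marker genotypes: 4 individuals (rows) x 3 markers (columns)
X <- matrix(c(1, 0, 1,
              0, 0, 1,
              1, 1, 0,
              0, 1, 1), nrow = 4, byrow = TRUE)

# Hypothetical marker effects, as if estimated in a training population
beta_hat <- c(0.5, -0.2, 0.3)

# GEBV_i = sum_j x_ij * beta_hat_j, computed for all individuals at once
gebv <- as.vector(X %*% beta_hat)
print(gebv)
```

For instance, the first individual carries markers 1 and 3, so its GEBV is 0.5 + 0.3 = 0.8.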

  7. General comments (continued). Figure 1: Graphical representation of parametric and non-parametric methods commonly used in whole-genome prediction. In this presentation we will focus on Ridge Regression.

  8. General comments (continued). Figure 2: Prior densities of the regression coefficients of markers, $p(\beta_j)$ against $\beta_j$, for four priors: Gaussian, double exponential, scaled-$t$ (5 df), and BayesC ($\pi = 0.25$).

  9. GBLUP-Ridge Regression (GBLUP-RR). This is the most basic model used in GS. Let
     $$y_i = g_i + e_i = \mu + \sum_{j=1}^{p} x_{ij}\beta_j + e_i.$$
     Marker effects are obtained by solving the following optimization problem for a given $\lambda$:
     $$\min_{\beta} \left\{ \Big(y - \sum_j X_j\beta_j\Big)'\Big(y - \sum_j X_j\beta_j\Big) + \lambda \sum_j \beta_j^2 \right\}, \tag{5}$$
     where $\lambda > 0$ is a regularization parameter. Notes:
     1. $\lambda$ is unknown and can be selected using cross-validation.
     2. We need to minimize a "penalized sum of squares".
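The note on cross-validation can be sketched in base R as follows. This is only an illustration on simulated data: the data, the grid of $\lambda$ values, the number of folds, the helper `ridge_fit`, the centring of the marker columns, and the use of the training mean as the estimate of $\mu$ are all assumptions made for the example, not part of the workshop material.

```r
set.seed(1)

# Simulated data (illustration only): n lines, p markers coded 0/1
n <- 100; p <- 50
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
X <- scale(X, center = TRUE, scale = FALSE)   # centred columns (assumption)
y <- as.vector(2 + X %*% rnorm(p, sd = 0.3) + rnorm(n))

# Ridge solution for a given lambda; mu is estimated by the training mean
ridge_fit <- function(X, y, lambda) {
  mu <- mean(y)
  beta <- solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y - mu))
  list(mu = mu, beta = as.vector(beta))
}

k <- 5
folds <- sample(rep(1:k, length.out = n))     # same folds for every lambda
lambdas <- 10^seq(-2, 3, length.out = 20)     # candidate penalties

# Mean squared prediction error over the k held-out folds, per lambda
cv_mse <- sapply(lambdas, function(lambda) {
  mean(sapply(1:k, function(f) {
    trn <- folds != f
    fit <- ridge_fit(X[trn, , drop = FALSE], y[trn], lambda)
    pred <- fit$mu + X[!trn, , drop = FALSE] %*% fit$beta
    mean((y[!trn] - pred)^2)
  }))
})

best_lambda <- lambdas[which.min(cv_mse)]
best_lambda
```

Using the same fold assignment for every candidate $\lambda$ keeps the comparison between penalties fair, since all of them are scored on identical held-out sets.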

  10. GBLUP-Ridge Regression (continued). The optimization problem has a closed-form solution,
      $$\hat{\beta} = (X'X + \lambda I)^{-1} X'\tilde{y},$$
      where $\tilde{y} = y - \mu 1$. Unfortunately, we need to know the value of $\lambda$ to use this solution. The problem can be solved easily in the Bayesian framework. Let $\beta \sim N(0, \sigma^2_\beta I)$, $e \sim N(0, \sigma^2_e I)$, and $u = X\beta$; then model (3) can be written as
      $$y = \mu 1 + u + e. \tag{6}$$
      Note that $u \sim N(0, \sigma^2_\beta XX')$. Model (6) is known as GBLUP.
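The link between ridge regression and GBLUP can be checked numerically. With $\mu$ treated as known and $\lambda = \sigma^2_e / \sigma^2_\beta$ (the standard correspondence between the ridge penalty and the variance ratio), the BLUP of $u$ under model (6) is $XX'(XX' + \lambda I)^{-1}\tilde{y}$, which should coincide with $X\hat{\beta}$. The sketch below uses simulated data, a hypothetical value of $\lambda$, and the sample mean in place of $\mu$; all three are assumptions for illustration.

```r
set.seed(2)

# Simulated data (illustration only): more markers than individuals
n <- 30; p <- 200
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
y <- as.vector(1 + X %*% rnorm(p, sd = 0.1) + rnorm(n))

lambda <- 5                 # hypothetical value of sigma2_e / sigma2_beta
y_t <- y - mean(y)          # y tilde, with mu replaced by the sample mean

# Ridge regression: solve a p x p system for the marker effects
beta_hat <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y_t))
u_rr <- as.vector(X %*% beta_hat)

# GBLUP form: solve an n x n system for the genetic values directly
K <- tcrossprod(X)          # XX'
u_gblup <- as.vector(K %*% solve(K + lambda * diag(n), y_t))

max(abs(u_rr - u_gblup))    # difference is at the level of rounding error
```

When $p$ is much larger than $n$, the GBLUP form is cheaper because it works with an $n \times n$ matrix instead of a $p \times p$ one.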

  11. GBLUP-Ridge Regression: training and testing sets. Note also that the covariance matrix of $u$ involves the product $XX'$, which is proportional to the Genomic Relationship Matrix proposed by VanRaden (2008). We will assume that $u \sim N(0, \sigma^2_u G)$ with $G = XX'/k$. The mixed-model equations for (6) are
      $$\begin{bmatrix} 1'1\,\sigma_e^{-2} & 1'\sigma_e^{-2} \\ 1\,\sigma_e^{-2} & I\sigma_e^{-2} + G^{-1}\sigma_u^{-2} \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} 1'y\,\sigma_e^{-2} \\ y\,\sigma_e^{-2} \end{bmatrix}. \tag{7}$$
      $\hat{u}$ and $\hat{\mu}$ are obtained by solving the mixed-model equations, assuming that the variance components $\sigma^2_e$ and $\sigma^2_u$ are known.
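A minimal base R sketch of setting up and solving the mixed-model equations is given below. The data are simulated, the variance components are fixed at hypothetical values and treated as known, and $k = p$ is chosen as the scaling constant in $G = XX'/k$; these are assumptions for illustration only. The coefficient matrix uses $G^{-1}$, so $G$ must be non-singular, which holds here because $p$ is much larger than $n$.

```r
set.seed(3)

# Simulated data (illustration only)
n <- 30; p <- 200
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
G <- tcrossprod(X) / p                 # G = XX'/k with k = p (assumption)

varU <- 1.0; varE <- 0.5               # variance components, treated as known
u <- as.vector(X %*% rnorm(p, sd = sqrt(varU / p)))   # so that Var(u) = varU * G
y <- 3 + u + rnorm(n, sd = sqrt(varE))

one <- rep(1, n)

# Left-hand side: coefficient matrix of the mixed-model equations
LHS <- rbind(cbind(n / varE,   t(one) / varE),
             cbind(one / varE, diag(n) / varE + solve(G) / varU))

# Right-hand side
RHS <- c(sum(y) / varE, y / varE)

sol <- solve(LHS, RHS)
mu_hat <- sol[1]                       # estimate of the intercept
u_hat  <- sol[-1]                      # predicted genetic values
```

The solution agrees with the generalized least squares estimate of $\mu$ and the BLUP of $u$ computed from $V = \sigma^2_u G + \sigma^2_e I$, which is a convenient way to check an implementation.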

  12. GBLUP-Ridge Regression (continued). If we have individuals for training and testing, we can partition $G$, $u$, $y$, and $1$ as follows:
      $$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad 1 = \begin{bmatrix} 1_1 \\ 1_2 \end{bmatrix},$$
      where subscript 1 denotes individuals in the training set and subscript 2 individuals in the testing set. $\hat{\mu}$ and $\hat{u}_1$ are obtained as the solution of the mixed-model equations
      $$\begin{bmatrix} 1_1'1_1\,\sigma_e^{-2} & 1_1'\sigma_e^{-2} \\ 1_1\,\sigma_e^{-2} & I_{11}\sigma_e^{-2} + G_{11}^{-1}\sigma_u^{-2} \end{bmatrix} \begin{bmatrix} \hat{\mu} \\ \hat{u}_1 \end{bmatrix} = \begin{bmatrix} 1_1'y_1\,\sigma_e^{-2} \\ y_1\,\sigma_e^{-2} \end{bmatrix}.$$
      The predictions for individuals in the testing set are given by
      $$\hat{y}_2 = \hat{\mu} 1_2 + G_{21} G_{11}^{-1} \hat{u}_1.$$
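The training/testing prediction can be sketched in base R as follows. As before, the data are simulated, the variance components are hypothetical and treated as known, the scaling $k = p$ is assumed, and the random split into 45 training and 15 testing individuals is arbitrary.

```r
set.seed(4)

# Simulated data (illustration only)
n <- 60; p <- 300
X <- matrix(rbinom(n * p, 1, 0.5), n, p)
G <- tcrossprod(X) / p                  # G = XX'/k with k = p (assumption)
varU <- 1.0; varE <- 0.5                # treated as known
y <- as.vector(3 + X %*% rnorm(p, sd = sqrt(varU / p)) + rnorm(n, sd = sqrt(varE)))

# Random partition: set 1 = training, set 2 = testing
tst <- sample(n, 15)
trn <- setdiff(seq_len(n), tst)
G11 <- G[trn, trn]
G21 <- G[tst, trn]
y1  <- y[trn]
n1  <- length(trn)
one1 <- rep(1, n1)

# Mixed-model equations restricted to the training set
LHS <- rbind(cbind(n1 / varE,   t(one1) / varE),
             cbind(one1 / varE, diag(n1) / varE + solve(G11) / varU))
RHS <- c(sum(y1) / varE, y1 / varE)
sol <- solve(LHS, RHS)
mu_hat <- sol[1]
u1_hat <- sol[-1]

# Predictions for the testing set: mu_hat * 1_2 + G21 G11^{-1} u1_hat
y2_hat <- as.vector(mu_hat + G21 %*% solve(G11, u1_hat))

cor(y2_hat, y[tst])                     # predictive correlation in the testing set
```

Only the training phenotypes enter the equations; the testing individuals contribute through their genomic relationships with the training set, $G_{21}$.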

  13. Application examples: wheat dataset. Data for $n = 599$ wheat lines evaluated in 4 environments, from the wheat improvement program at CIMMyT. The dataset includes $p = 1279$ molecular markers ($x_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$), coded as 0/1. The pedigree information is also available. Let's load the dataset in R:
      1. Start R.
      2. Install the BGLR package (if not yet installed).
      3. Load the package.
      4. Load the data.

  14. Application examples (continued). Figure 3: Loading the BGLR package and the wheat dataset.

  15. Application examples (continued). You can explore the molecular marker (MM) matrix and the pedigree matrix within R:

          fix(wheat.X)
          fix(wheat.A)

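  16. Application examples (continued). Let's assume that we want to predict the grain yield for environment 1 using ridge regression or, equivalently, the GBLUP. We do not know the values of $\sigma^2_e$ and $\lambda$, so we obtain estimates from the data. We will use the function BGLR. The R code below fits the RR model using a Bayesian approach with non-informative priors for $\sigma^2_e$ and $\sigma^2_\beta$:

          rm(list = ls())             # start from a clean workspace
          library(BGLR)
          data(wheat)                 # loads the wheat data, including wheat.X and wheat.Y
          X <- wheat.X                # marker matrix (599 x 1279, coded 0/1)
          Y <- wheat.Y                # grain yield, one column per environment
          setwd('/tmp/')              # BGLR writes its output files to the working directory
          # Linear predictor: Bayesian Ridge Regression on the markers
          ETA <- list(list(X = X, model = "BRR"))
          # Fit the model for environment 1, keeping every 10th sample after burn-in
          fmR <- BGLR(y = Y[, 1], ETA = ETA, nIter = 10000, burnIn = 5000, thin = 10)
          # Fitted values against observed phenotypes
          plot(fmR$yHat, Y[, 1])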
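Once the model is fitted, the estimated variance components and marker effects can be read from the returned object. The sketch below assumes the BGLR package is installed and refits the same BRR model with a shorter chain than above, only so that the example runs quickly; the chain length, the `saveAt` prefix, and the object name `fm` are choices made for this illustration. The ratio $\hat{\sigma}^2_e / \hat{\sigma}^2_\beta$ is the standard ridge/BLUP correspondence for $\lambda$.

```r
library(BGLR)
data(wheat)   # loads the wheat data, including wheat.X and wheat.Y

# Shorter chain than in the workshop code, only to keep the sketch quick to run
fm <- BGLR(y = wheat.Y[, 1],
           ETA = list(list(X = wheat.X, model = "BRR")),
           nIter = 2000, burnIn = 500, thin = 5,
           saveAt = file.path(tempdir(), "brr_"),   # keep output files out of the working directory
           verbose = FALSE)

varE <- fm$varE               # posterior mean of the residual variance
varB <- fm$ETA[[1]]$varB      # posterior mean of the marker-effect variance
lambda_hat <- varE / varB     # implied ridge penalty

b_hat <- fm$ETA[[1]]$b        # posterior means of the marker effects
gebv  <- as.vector(wheat.X %*% b_hat)   # GEBVs as in equation (4)

cor(fm$yHat, wheat.Y[, 1])    # within-sample fit, not predictive accuracy
```

Because all lines are used for fitting, the correlation above overstates predictive ability; an honest assessment requires a training/testing partition.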
