Genomic Prediction and Selection for Multi-Environments

J. Crossa (1) j.crossa@CGIAR.org
P. Pérez (2) perpdgo@gmail.com
G. de los Campos (3) gcampos@gmail.com

(1) CIMMyT-México, (2) ColPos-México, (3) Michigan-USA

June, 2015. CIMMYT, México-SAGPDB
Contents

1. The problem
2. Models
3. Model fitting
4. Cross validation
5. Application examples (Part 1)
6. Model extensions with environmental covariates
The problem

- In most agronomic traits, the effects of genes are modulated by environmental conditions, generating G × E.
- Researchers working in plant breeding have developed multiple methods for accounting for, and exploiting, G × E in multi-environment trials.
- Genomic selection is gaining ground in plant breeding. Most applications so far are based on single-environment/single-trait models.
- Preliminary evidence (e.g., Burgueño et al., 2012) suggests that there is great scope for improving prediction accuracy using multi-environment models.
- The ideas can be taken one step further by incorporating information on environmental covariates.
Models

Model 1 (EL, Environment + Line, no pedigree):
$$y_{ij} = \mu + E_i + L_j + e_{ij}$$

Model 2 (EA, Environment + Line, with markers):
$$y_{ij} = \mu + E_i + g_j + e_{ij}$$

Model 3 (Environment, Line, and marker × environment interaction):
$$y_{ij} = \mu + E_i + g_j + Eg_{ij} + e_{ij}$$

Here $y_{ij}$ is the phenotype of line $j$ in environment $i$.
Assumptions

It is assumed that $E_i \sim N(0, \sigma^2_E)$ and $g \sim N(0, \sigma^2_g G)$, with $G$ being the genomic relationship matrix. The interaction term between genotypes and environments, $Eg_{ij}$, follows
$$Eg \sim N\left(0, \sigma^2_{Eg}\,(Z_g G Z_g^T) \circ (Z_E Z_E^T)\right),$$
where $Z_g$ connects genotypes with phenotypes, $Z_E$ connects phenotypes with environments, $\circ$ stands for the Hadamard (element-wise) product between two matrices, and $\sigma^2_{Eg}$ is the variance of the interaction term.
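The structure of the interaction covariance can be checked on a toy example. The sketch below uses base R only; the three lines, two environments, and simulated markers are assumptions made for illustration and are not the wheat data. It shows that the Hadamard product sets the covariance to zero for records from different environments and keeps the genomic covariance for records within the same environment.

```r
# Toy data (assumption): 3 lines, each observed in 2 environments (6 records).
set.seed(1)
line_ids <- c("L1", "L2", "L3")
Y <- expand.grid(VAR = line_ids, ENV = c("E1", "E2"), stringsAsFactors = TRUE)

# Simulated 0/1 markers, used only to obtain some relationship matrix G.
X <- matrix(rbinom(3 * 20, size = 1, prob = 0.5), nrow = 3,
            dimnames = list(line_ids, NULL))
Xc <- scale(X, center = TRUE, scale = FALSE)
G <- tcrossprod(Xc) / ncol(Xc)

# Incidence matrices linking records to lines (Z_g) and to environments (Z_E).
Zg <- model.matrix(~ VAR - 1, data = Y)
ZE <- model.matrix(~ ENV - 1, data = Y)

# Covariance structure of the interaction term, up to its variance component.
ZGZ  <- Zg %*% G %*% t(Zg)
ZEZE <- tcrossprod(ZE)
K    <- ZGZ * ZEZE   # `*` is the element-wise (Hadamard) product in R

# TRUE where two records come from the same environment.
env <- as.character(Y$ENV)
same_env <- outer(env, env, "==")

# Records from different environments have zero interaction covariance.
all(K[!same_env] == 0)
# Within an environment, K equals the genomic covariance between the lines.
all.equal(K[same_env], ZGZ[same_env])
```

Because `ZEZE` contains only zeros and ones, `K` is block diagonal once the records are sorted by environment, with one copy of the genomic covariance per environment.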
Model fitting: Description of Data Objects

- Y, a data frame containing the elements described below;
- Y$yield (n x 1), a numeric vector with centered and standardized yield;
- Y$VAR (n x 1), a factor giving the IDs for the varieties;
- Y$ENV (n x 1), a factor giving the IDs for the environments;
- A, a symmetric positive semi-definite matrix containing the pedigree or marker-based relationships (dimensions equal to number of lines by number of lines). We assume that rownames(A)=colnames(A) gives the IDs of the lines.
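Before fitting, it can help to confirm that the objects match this description. The helper below is a minimal sketch: `check_inputs` is a hypothetical function written for this illustration (it is not part of BGLR), and the toy `Y_toy` and `A_toy` objects are made up.

```r
# Hypothetical helper (not part of BGLR): basic consistency checks for Y and A.
check_inputs <- function(Y, A) {
  stopifnot(is.data.frame(Y),
            all(c("yield", "VAR", "ENV") %in% names(Y)),
            is.numeric(Y$yield))
  stopifnot(is.matrix(A),
            nrow(A) == ncol(A),
            !is.null(rownames(A)),
            identical(rownames(A), colnames(A)),
            isSymmetric(unname(A)))
  # Every variety with a phenotype must have a row (and column) in A.
  missing_ids <- setdiff(as.character(Y$VAR), rownames(A))
  if (length(missing_ids) > 0) {
    stop("Varieties missing from A: ", paste(missing_ids, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy usage with made-up data: 3 varieties, 4 records.
ids <- c("a", "b", "c")
A_toy <- diag(3)
dimnames(A_toy) <- list(ids, ids)
Y_toy <- data.frame(yield = c(0.1, -0.3, 0.2, 0.5),
                    VAR   = factor(c("a", "b", "c", "a")),
                    ENV   = factor(c(1, 1, 1, 2)))
ok <- check_inputs(Y_toy, A_toy)
```

The check on variety IDs matters because the model-fitting code sets the factor levels of `Y$VAR` from `rownames(A)`; an ID absent from `A` would silently become `NA`.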
Model fitting: Model 1 (EL, Environment + Line, no pedigree)

    library(BGLR)

    # incidence matrix for main eff. of environments.
    ZE<-model.matrix(~factor(Y$ENV)-1)

    # incidence matrix for main eff. of lines.
    Y$VAR<-factor(x=Y$VAR,levels=rownames(A),ordered=TRUE)
    ZVAR<-model.matrix(~Y$VAR-1)

    # Model Fitting
    ETA<-list(
      ENV=list(X=ZE,model="BRR"),
      VAR=list(X=ZVAR,model="BRR"))

    fm1<-BGLR(y=Y$yield,ETA=ETA,saveAt="M1_",nIter=6000,burnIn=1000)
Model fitting: Model 2 (EA, Environment + Line, with markers)

    # genomic relationship matrix from centered and scaled markers
    X<-scale(X,center=TRUE,scale=TRUE)
    G<-tcrossprod(X)/ncol(X)
    G<-G/mean(diag(G))

    # lower-triangular Cholesky factor, G = L L'
    # (chol() requires G to be positive definite)
    L<-t(chol(G))
    ZL<-ZVAR%*%L

    ETA<-list(
      ENV=list(X=ZE,model="BRR"),
      Grm=list(X=ZL,model="BRR")
    )

    fm2<-BGLR(y=Y$yield,ETA=ETA,saveAt="M2_",nIter=6000,burnIn=1000)
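The reason for multiplying `ZVAR` by the Cholesky factor can be verified numerically. If b has independent effects with common variance, then ZL b has covariance proportional to ZL ZL' = ZVAR G ZVAR', which is the covariance of the genomic term in Model 2. The sketch below checks this identity in base R; the simulated markers and the two-records-per-line layout are assumptions for illustration. The markers are left uncentered here so that `G` is positive definite in this toy case.

```r
# Toy data (assumption): 4 lines with 50 simulated markers.
set.seed(2)
n_lines <- 4
X <- matrix(rnorm(n_lines * 50), nrow = n_lines)
G <- tcrossprod(X) / ncol(X)
G <- G / mean(diag(G))

# chol() returns the upper-triangular factor R with G = R'R,
# so L = t(R) is lower triangular and G = L L'.
L <- t(chol(G))

# Two records per line: ZVAR maps the 8 records to the 4 lines.
line_id <- factor(rep(seq_len(n_lines), times = 2))
ZVAR <- model.matrix(~ line_id - 1)
ZL <- ZVAR %*% L

# Covariance structure implied by the regression on ZL versus the one
# written directly in terms of G.
cov_from_ZL <- tcrossprod(ZL)
cov_from_G  <- ZVAR %*% G %*% t(ZVAR)
max_abs_diff <- max(abs(cov_from_ZL - cov_from_G))
```

`max_abs_diff` should be zero up to floating-point error, which is why a ridge-type prior on the columns of `ZL` can stand in for a Gaussian term with covariance proportional to `G`.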
Model fitting: Model 3 (Environment, Line, and marker × environment interaction)

    # covariance structure of the interaction term:
    # (Zg G Zg') Hadamard (ZE ZE')
    ZGZ<-tcrossprod(ZL)
    ZEZE<-tcrossprod(ZE)
    K<-ZGZ*ZEZE

    # small constant added to the diagonal, then rescaled to mean diagonal 1
    diag(K)<-diag(K)+1/200
    K<-K/mean(diag(K))

    ETA<-list(
      ENV=list(X=ZE,model="BRR"),
      Grm=list(X=ZL,model="BRR"),
      EGrm=list(K=K,model="RKHS")
    )

    fm3<-BGLR(y=Y$yield,ETA=ETA,
              saveAt="M3_",nIter=6000,burnIn=1000)
Cross validation

1. CV1: Prediction of performance of newly developed lines (i.e., lines that have not been evaluated in any field trials).
2. CV2: Prediction in incomplete field trials; here the aim is to predict the performance of lines that have been evaluated in some environments but not in others.

See Figure 1 on the next slide.
[Figure 1: Two hypothetical cross-validation schemes (CV1 and CV2) for five lines (Lines 1-5) and five environments (E1-E5). Source: Jarquín et al. (2014).]
Application examples (Part 1): Wheat dataset (CIMMyT)

Data for n = 599 wheat lines evaluated in 4 environments, from the wheat improvement program at CIMMyT. The dataset includes p = 1279 molecular markers ($x_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, p$), coded as 0/1. The pedigree information is also available.

[Figure 2: Grain yield by environment. Panels: yield by environment (environments 1, 2, 4, 5) and histogram of Y$yield.]
Application examples (Part 1): Data preparation

    #Load genotypic data
    load("pedigree_markers.RData")

    #Load phenotypic data
    pheno=read.table(file="599_yield_raw-1.prn",header=TRUE)
    pheno=pheno[,c(2,5,6)]

    #Average yield for each environment-by-line combination
    index=paste(pheno$env,pheno$gen1,sep="@")
    yavg=tapply(pheno$GY,index,"mean")

    #Recover environment and line IDs from the combined labels
    tmp=names(yavg)
    tmp2=strsplit(tmp,"@")
    gen=character()
    env=character()
    for(i in 1:length(tmp2))
    {
      env[i]=tmp2[[i]][1]
      gen[i]=tmp2[[i]][2]
    }

    Y=data.frame(yield=yavg,VAR=gen,ENV=env)

    #Sort records by environment, then by line
    index=order(as.character(Y$ENV),as.character(Y$VAR))
    Y=Y[index,]
Data preparation (continued)

    #Sort A by line ID and apply the same ordering to the rows of X
    #(this assumes the rows of X are in the same order as the rows of A)
    index=order(colnames(A))
    A=A[index,index]
    X=X[index,]

    save(Y,A,X,file="standarized_data.RData")
Application examples (Part 1): Code for cross-validation schemes

    #CV=1: assigns lines to folds
    #CV=2: assigns entries of a line to folds
    CV<-2
    nFolds<-5
    sets<-rep(NA,nrow(Y))
    set.seed(123)
    IDs<-as.character(unique(Y$VAR))

    if(CV==1)
    {
      #all records of a line go to the same fold
      folds<-sample(1:nFolds,size=length(IDs),replace=TRUE)
      for(i in 1:nrow(Y)){
        sets[i]<-folds[which(IDs==Y$VAR[i])]
      }
    }

    if(CV==2)
    {
      #records of a line are spread across folds
      IDy<-as.character(Y$VAR)
      for(i in IDs){
        tmp=which(IDy==i)
        ni=length(tmp)
        tmpFold<-sample(1:nFolds,size=ni,replace=ni>nFolds)
        sets[tmp]<-tmpFold
      }
    }
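The difference between the two schemes can be checked on a small balanced layout. The sketch below reproduces both assignments on made-up data (10 lines in 4 environments, an assumption for illustration) and counts how many distinct folds each line falls into. The CV1 loop is written with `match()`, which gives the same mapping as the row-by-row loop above in vectorized form.

```r
# Toy layout (assumption): 10 lines, each observed in 4 environments.
set.seed(123)
Y <- expand.grid(VAR = paste0("line", 1:10),
                 ENV = paste0("E", 1:4),
                 stringsAsFactors = FALSE)
nFolds <- 5
IDs <- unique(Y$VAR)

# CV1: one fold per line, so all records of a line share that fold.
folds <- sample(1:nFolds, size = length(IDs), replace = TRUE)
setsCV1 <- folds[match(Y$VAR, IDs)]

# CV2: the records of each line are spread over different folds.
setsCV2 <- rep(NA_integer_, nrow(Y))
for (id in IDs) {
  rows <- which(Y$VAR == id)
  ni <- length(rows)
  setsCV2[rows] <- sample(1:nFolds, size = ni, replace = ni > nFolds)
}

# Number of distinct folds each line appears in under each scheme.
foldsPerLineCV1 <- tapply(setsCV1, Y$VAR, function(s) length(unique(s)))
foldsPerLineCV2 <- tapply(setsCV2, Y$VAR, function(s) length(unique(s)))
```

Under CV1 every line appears in exactly one fold, so a held-out line has no records left in the training set. Under CV2, with four records per line and five folds, sampling is without replacement, so each line's records land in four different folds and a held-out record always has training records from the same line in other environments.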
Application examples (Part 1): Fitting the model and extracting results

    ###################################################
    #Model 1
    ###################################################
    library(BGLR)

    # incidence matrix for main eff. of environments.
    ZE<-model.matrix(~factor(Y$ENV)-1)

    # incidence matrix for main eff. of lines.
    Y$VAR<-as.factor(Y$VAR)
    ZVAR<-model.matrix(~Y$VAR-1)

    # Model Fitting
    ETA<-list(
      ENV=list(X=ZE,model="BRR"),
      VAR=list(X=ZVAR,model="BRR"))

    # mask the phenotypes of the testing fold (here fold 1)
    y=Y$yield
    testing=(sets==1)
    y[testing]=NA

    fm1<-BGLR(y=y,ETA=ETA,saveAt="M1_",nIter=6000,burnIn=1000)

    # remove the sample files written by BGLR
    unlink("*.dat")

    #Extract the predictions
    predictions=data.frame(Env=Y$ENV[testing],
                           Individual=Y$VAR[testing],
                           y=Y$yield[testing],
                           yHat=fm1$yHat[testing])
Fitting the model and extracting results (continued)

    #write.table(predictions,file=paste("predictions.csv",sep=""),
    #            row.names=FALSE,sep=",")

    #doBy version
    library(doBy)
    predictions=orderBy(~Env,data=predictions)
    lapplyBy(~Env,data=predictions,function(x){cor(x$yHat,x$y)})

Output (within-environment correlation between predicted and observed yield in the testing fold):

    > lapplyBy(~Env,data=predictions,function(x){cor(x$yHat,x$y)})
    $`1`
    [1] 0.01630911

    $`2`
    [1] 0.6108203

    $`4`
    [1] 0.564435

    $`5`
    [1] 0.289207
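The same within-environment correlations can be obtained without the doBy package. The sketch below is a base R alternative using `split()` and `sapply()`; the `predictions` data frame here is simulated for illustration (an assumption), with the same column names as the one built from the fitted model.

```r
# Simulated stand-in for the predictions data frame (assumption):
# 25 testing records in each of environments 1, 2, 4 and 5.
set.seed(3)
predictions <- data.frame(Env = rep(c("1", "2", "4", "5"), each = 25),
                          y   = rnorm(100))
predictions$yHat <- 0.5 * predictions$y + rnorm(100)

# Split the records by environment and compute the correlation between
# predicted and observed values within each environment.
cor_by_env <- sapply(split(predictions, predictions$Env),
                     function(d) cor(d$yHat, d$y))
cor_by_env
```

The result is a named numeric vector with one correlation per environment, which is convenient when collecting results over several folds or models.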
Application examples (Part 1): Results for one fold

[Figure 3: Results from CV1. Correlation (vertical axis) for models M1, M2, and M3.]
[Figure 4: Results from CV2. Correlation (vertical axis) for models M1, M2, and M3.]