J. Anim. Breed. Genet. ISSN 0931-2668

ORIGINAL ARTICLE

Using the genomic relationship matrix to predict the accuracy of genomic selection

M.E. Goddard 1,2, B.J. Hayes 2 & T.H.E. Meuwissen 3

1 Department of Agriculture and Food Systems, University of Melbourne, Melbourne, Vic., Australia
2 Biosciences Research Division, Victorian Department of Primary Industries, Bundoora, Vic., Australia
3 Norwegian University of Life Sciences, Ås, Norway

Keywords
Genomic selection; relationship matrix.

Correspondence
M. Goddard, Biosciences Research Division, Victorian Department of Agriculture, 1 Park Drive, Bundoora, Vic. 3083, Australia. Tel: +61 39032 7091; Fax: +61 39032 7158; E-mail: mike.goddard@dpi.vic.gov.au

Received: 6 February 2011; accepted: 18 August 2011

Summary

Estimated breeding values (EBVs) using data from genetic markers can be predicted using a genomic relationship matrix, derived from animals' genotypes, and best linear unbiased prediction. However, if the accuracy of the EBVs is calculated in the usual manner (from the inverse element of the coefficient matrix), it is likely to be overestimated owing to sampling errors in elements of the genomic relationship matrix. We show here that the correct accuracy can be obtained by regressing the relationship matrix towards the pedigree relationship matrix so that it is an unbiased estimate of the relationships at the QTL controlling the trait. This method shows how the accuracy increases as the number of markers used increases because the regression coefficient (of genomic relationship towards pedigree relationship) increases. We also present a deterministic method for predicting the accuracy of such genomic EBVs before data on individual animals are collected. This method estimates the proportion of genetic variance explained by the markers, which is equal to the regression coefficient described above, and the accuracy with which marker effects are estimated.
The latter depends on the variance in relationship between pairs of animals, which equals the mean linkage disequilibrium over all pairs of loci. The theory was validated using simulated data and data on fat concentration in the milk of Holstein cattle.

© 2011 Blackwell Verlag GmbH • J. Anim. Breed. Genet. 128 (2011) 409–421 doi:10.1111/j.1439-0388.2011.00964.x

Introduction

The matrix of relationships among a group of individuals can be used to predict their breeding values, to manage inbreeding and in genetic conservation. This relationship matrix can be calculated from the pedigree, but it is also possible to calculate the relationship matrix from genotypes at genetic markers such as single-nucleotide polymorphisms (SNPs). Elements of the genomic relationship matrix are estimates of the realized proportion of the genome that two individuals share, whereas the pedigree-derived relationship matrix is the expectation of this proportion. This genomic relationship matrix can be used in genomic selection to estimate breeding values.

Genomic selection refers to the use of a large number of genetic markers, such as SNPs, covering the whole genome to predict the genetic value of individuals (Meuwissen et al. 2001). The individuals might be people whose genetic risk of developing a complex disease is being predicted, or they might be domestic animals or plants in which estimates of
their breeding value will be used to select parents to breed the next generation. In cattle, the availability of high-throughput, high-density genotyping with SNP chips has led to the widespread adoption of genomic selection in dairy cattle breeding programmes, where it is predicted to double the rate of genetic improvement (Schaeffer 2006; Dalton 2009).

Traditionally, livestock have been selected on the basis of estimated breeding values (EBVs) calculated from data on phenotype and pedigree using a statistical technique called best linear unbiased prediction (BLUP) (Henderson 1984). A desirable feature of this method is that the accuracy of the EBVs can be calculated as part of the statistical analysis. This is not the case with many methods used for genomic selection. Currently, the most trusted method for assessing the accuracy of genomic EBVs is an empirical test in which a sample of animals have genomic EBVs calculated, and then additional phenotypic data are collected to assess how accurately the EBVs predict these new data. This is time-consuming, fails to predict the accuracy of individual EBVs and is wasteful in that the new data are used only to estimate accuracy and not to improve the prediction of breeding value. Other cross-validation techniques can also be used but they are also time-consuming and do not yield the accuracy of the final prediction, or individual accuracies. In practice, some authors have used the inverse of the mixed model or BLUP equations (e.g. VanRaden 2008; Hayes et al. 2009a,b) but, as we show in this paper, this can overestimate the accuracy.

It would be very useful to be able to predict the accuracy of EBVs calculated using genomic selection in two situations. Firstly, after the data have been collected and are being analysed, it would be useful to calculate accuracies of EBVs as part of the statistical analysis, as is done for traditional EBVs, including for individuals without their own phenotypes. In this situation, we are interested in calculating the accuracy of the EBVs of individual animals. Secondly, when planning a selection programme using genomic selection, it would be useful to be able to predict the accuracy of alternative designs so that the best one could be implemented. In this situation, we wish to predict the accuracy of EBVs of classes of animals, which we might then use in deterministic simulations of alternative breeding programmes. This paper presents methods for both of these situations.

After the data have been collected and are being analysed, it should be possible to predict the accuracy from the properties of the statistical method (as in VanRaden 2008 and Harris & Johnson 2010). However, this requires that the statistical model matches the true situation. For instance, if the model makes assumptions about the distribution of the effects of genes affecting the trait (QTL), this should match the real distribution. If we assume that there are a very large number of QTL whose effects follow a normal distribution with constant variance, the analysis (called BLUP by Meuwissen et al. 2001) is robust to departures from this assumption and the accuracy is little affected even if the distribution of QTL effects does not follow a normal distribution. However, empirical tests of the accuracy derived from the inverse of the BLUP equations often find that it is overestimated by this method, dramatically so if it is used to predict breeding values in one breed based on data from another breed (Hayes et al. 2009a). An anomaly of the method is that it does not predict increasing accuracy as the number of markers is increased and this explains why it overestimates accuracy as shown below. In this paper, we describe how to calculate the accuracy of genomic EBVs after the data have been collected, taking into account the number of markers used.

The accuracy of genomic EBVs expected before data are collected has been considered by Goddard (2009) for unrelated animals and by Hayes et al. (2009b) for simple family structures such as groups of full-sibs and half-sibs. Their deterministic method treats the genome as if it were a series of small chromosomal segments, each of which is inherited independently. Here, we treat chromosomes as continuous and show that a similar prediction results.

The objectives of this paper are twofold: (i) to derive a method for calculating the accuracy of EBVs calculated using a known genomic relationship matrix; and (ii) to derive a method to predict this accuracy before the data on individual animals are collected. In the Materials and Methods section, we first develop the theory to predict accuracy and then describe the simulation and real data in which it is tested. We only consider the genomic selection method called BLUP by Meuwissen et al. (2001).

Materials and methods

Theory

Calculation of accuracy after data are collected

Consider a group of T animals whose breeding values are controlled by Q QTL. At the jth QTL, the genotypes (00, 01, 11) have frequency $(1 - p_j)^2$, $2p_j$