Marker Based Infinitesimal Model for Quantitative Trait Analysis Shizhong Xu Department of Botany and Plant Sciences University of California Riverside, CA 92521
Outline • Quantitative trait and the infinitesimal model • Infinitesimal model using marker information • Adaptive infinitesimal model • Simulation studies • Rice and beef cattle data analyses
Quantitative Trait
Quantitative Genetics Model Phenotype = Genotype + Environment
Infinitesimal Model • Infinite number of genes • Infinitely small effect of each gene • Effect of an individual gene is not recognizable • Collective effect of all genes is studied using pedigree information (genetic relationship) • Best linear unbiased prediction (BLUP)
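Marker Based Infinitesimal Model
$$y_j = \sum_{k=1}^{p} Z_{jk}\gamma_k + \epsilon_j$$
$$y_j = \sum_{k=1}^{\infty} Z_{jk}\gamma_k + \epsilon_j$$
$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \epsilon_j$$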
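Different from Longitudinal Data Analysis
Infinitesimal model (the response is a single value per individual):
$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \epsilon_j$$
Longitudinal data (the response is itself a function of $t$):
$$y_j(t) = \gamma(t) + \epsilon_j(t)$$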
Numerical Integration
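Bin Effect Model
$$y_j = \sum_{k=1}^{\infty} Z_{jk}\gamma_k + \epsilon_j$$
$$y_j = \int_0^L Z_j(\lambda)\,\gamma(\lambda)\,d\lambda + \epsilon_j$$
$$y_j \approx \sum_{k=1}^{m} Z_j(\lambda_k)\,\gamma(\lambda_k)\,\Delta_k + \epsilon_j$$
$$y_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \epsilon_j$$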
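The step from the integral to the bin sum is ordinary numerical integration. The sketch below assumes a midpoint rule with equal-sized bins; the slides do not say which rule or evaluation point is used. `z_example` and `gamma_example` are hypothetical stand-ins for $Z_j(\lambda)$ and $\gamma(\lambda)$, and the residual is left out.

```python
def bin_sum(z_of, gamma_of, length, m):
    """Approximate the integral of Z_j(lambda) * gamma(lambda) over [0, length]
    by a sum over m equal bins, evaluating both functions at bin midpoints."""
    delta = length / m                      # bin size Delta_k (equal bins assumed)
    total = 0.0
    for k in range(m):
        lam_k = (k + 0.5) * delta           # midpoint of bin k (assumed evaluation point)
        total += z_of(lam_k) * gamma_of(lam_k) * delta
    return total


# Hypothetical example: one recombination breakpoint at lambda = 0.5
# and an effect function that grows linearly along the genome.
def z_example(lam):
    return 1.0 if lam < 0.5 else -1.0


def gamma_example(lam):
    return lam


for m in (2, 10, 100):
    print(m, bin_sum(z_example, gamma_example, 1.0, m))
```

For this toy case the exact integral is -0.25. The sums with an even number of bins reproduce it up to rounding because the breakpoint falls on a bin boundary; a bin that contains the breakpoint would be approximated less well by a single midpoint value.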
Bin Effects
$$\bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$
[Figure: dense markers along the genome grouped into consecutive bins.]
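The bin genotype is the plain average of the marker codes that fall in the bin. A minimal sketch for one individual, assuming the markers are already in map order and the bins are given as marker counts; `bin_means` and the example codes are hypothetical:

```python
def bin_means(genotypes, bin_sizes):
    """Average marker genotype codes within consecutive bins.

    genotypes: Z_j(h) codes for one individual, in map order.
    bin_sizes: p_k, the number of markers in each bin.
    """
    if sum(bin_sizes) != len(genotypes):
        raise ValueError("bin sizes must add up to the number of markers")
    means = []
    start = 0
    for p_k in bin_sizes:
        block = genotypes[start:start + p_k]
        means.append(sum(block) / p_k)      # (1 / p_k) * sum of Z_j(h) in bin k
        start += p_k
    return means


z_j = [1, 1, 1, 0, 0, -1, -1, -1, -1, 0]    # made-up genotype codes
print(bin_means(z_j, [5, 5]))               # [0.6, -0.8]
```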
Recombination Breakpoint Data
Marker: $$\bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$
Breakpoint: $$\bar{Z}_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$
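$$\bar{Z}_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$
$$\bar{Z}_{jk} = \frac{8}{10}\times 1 + \frac{2}{10}\times 0 = 0.8$$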
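With breakpoint data the integral reduces to a length-weighted average of the genotype codes on either side of each breakpoint. A minimal sketch, assuming each individual's chromosome is described by segments of constant genotype code; the segment representation and the function name are hypothetical:

```python
def bin_mean_from_breakpoints(bin_start, bin_end, segments):
    """Length-weighted average genotype code within one bin.

    segments: list of (seg_start, seg_end, code) with constant code per segment,
    in the same map units as bin_start and bin_end.
    """
    size = bin_end - bin_start              # Delta_k
    if size <= 0:
        raise ValueError("bin must have positive size")
    total = 0.0
    for seg_start, seg_end, code in segments:
        overlap = min(bin_end, seg_end) - max(bin_start, seg_start)
        if overlap > 0:
            total += overlap * code         # contribution of this segment to the integral
    return total / size


# Bin of size 10 with a breakpoint at 8: code 1 before it, code 0 after it.
print(bin_mean_from_breakpoints(0, 10, [(0, 8, 1), (8, 10, 0)]))  # 0.8
```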
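What Does a Bin Effect Represent?
$$y_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \epsilon_j$$
$$\bar{Z}_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$
$$\gamma_k = \Delta_k \cdot \frac{1}{\Delta_k}\int_0^{\Delta_k} \gamma(\lambda)\,d\lambda = \int_0^{\Delta_k} \gamma(\lambda)\,d\lambda$$
where $\Delta_k$ is the size of bin $k$ and $\lambda$ is a uniform variable, so the bin effect is the bin size times the average effect within the bin.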
Assumptions of the Infinitesimal Model • High linkage disequilibrium within a bin • Homogeneous genetic effect within a bin
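High Linkage Disequilibrium
$$\bar{Z}_{jk} = \frac{1}{\Delta_k}\int_0^{\Delta_k} Z_j(\lambda)\,d\lambda$$
$\Delta_k$ reflects the number of crossovers within the bin and is inversely related to linkage disequilibrium.
$$\lim_{\Delta_k \to 0} \operatorname{var}(\bar{Z}_{jk}) = \frac{1}{2}, \quad \text{high linkage disequilibrium } (F_2)$$
$$\lim_{\Delta_k \to \infty} \operatorname{var}(\bar{Z}_{jk}) = 0, \quad \text{low linkage disequilibrium}$$
Larger $\operatorname{var}(\bar{Z}_{jk})$ means higher power.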
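Range of Var(Z)
$$\lim_{\Delta_k \to 0} \operatorname{var}(\bar{Z}_{jk}) = \lim_{\Delta_k \to 0} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = \lim_{\Delta_k \to 0} \frac{1}{2}e^{-2\Delta_k} = \frac{1}{2}$$
$$\lim_{\Delta_k \to \infty} \operatorname{var}(\bar{Z}_{jk}) = \lim_{\Delta_k \to \infty} \frac{2\Delta_k - 1 + e^{-2\Delta_k}}{4\Delta_k^2} = 0$$
$$0 < \operatorname{var}(\bar{Z}_{jk}) < 0.5 \quad \text{for} \quad 0 < \Delta_k < \infty$$
Choose the bin size so that $\operatorname{var}(\bar{Z}_{jk})$ is as close to 0.5 as possible, but with the number of bins small enough to be handled by a program for a given sample size.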
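The trade-off between bin size and var(Z) can be tabulated directly. The sketch below assumes the closed form $(2\Delta - 1 + e^{-2\Delta})/(4\Delta^2)$ for an F2 population, with $\Delta$ measured in Morgans; the unit is an assumption, since the slides do not state it. The function name is hypothetical.

```python
import math


def var_bin_genotype(delta):
    """var of the bin-average genotype for a bin of size delta (Morgans, assumed)."""
    if delta <= 0:
        raise ValueError("delta must be positive")
    # 2*delta - 1 + exp(-2*delta); expm1 reduces (but does not remove)
    # cancellation error when delta is small.
    numerator = 2.0 * delta + math.expm1(-2.0 * delta)
    return numerator / (4.0 * delta ** 2)


# Bin sizes in cM, converted to Morgans before evaluation.
for cm in (1, 5, 20, 100, 2400):
    print(f"bin size {cm:>5} cM   var = {var_bin_genotype(cm / 100):.4f}")
```

Small bins keep the variance near 0.5 but multiply the number of bin effects to estimate, which is the compromise the slide describes.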
Adaptive Model Relaxes the Two Assumptions • High linkage disequilibrium within a bin - prevent var(Z) from being zero • Homogeneous genetic effect within a bin - make all effects positive
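Redefine the Bin Size by the Number of Markers Within a Bin
$$y_j = \sum_{k=1}^{m} \bar{Z}_{jk}\gamma_k + \epsilon_j$$
$$\bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h)$$
$$\gamma_k = \sum_{h=1}^{p_k} \gamma(h) = p_k\bar{\gamma}_k$$
where $p_k$ is the number of markers in bin $k$.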
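Weighted Average Effect of a Bin
Unweighted: $$\bar{Z}_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} Z_j(h); \qquad \gamma_k = \sum_{h=1}^{p_k} \gamma(h)$$
Weighted: $$\bar{Z}^*_{jk} = \frac{1}{p_k}\sum_{h=1}^{p_k} w_h Z_j(h); \qquad \gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\gamma(h)$$
$$y_j = \sum_{k=1}^{m} \bar{Z}^*_{jk}\gamma^*_k + \epsilon_j$$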
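Weight System
Define $$c_k = \frac{1}{p_k}\sum_{h=1}^{p_k} |\hat{b}_h| = \operatorname{mean}(|\hat{b}|)$$
where $\hat{b}_h$ is the least squares estimate of the effect of marker $h$ within bin $k$.
The weight for marker $h$ is defined as
$$w_h = c_k^{-1}\hat{b}_h = \frac{p_k\hat{b}_h}{\sum_{h'=1}^{p_k} |\hat{b}_{h'}|} = \frac{\hat{b}_h}{\operatorname{mean}(|\hat{b}|)}$$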
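A minimal sketch of the weight system for a single bin, taking the least squares estimates as given inputs (how they are obtained is not shown here). The function names and the example values are hypothetical.

```python
def marker_weights(b_hat):
    """Return (c_k, weights) for one bin from marker effect estimates b_hat."""
    p_k = len(b_hat)
    c_k = sum(abs(b) for b in b_hat) / p_k          # mean absolute estimate
    if c_k == 0:
        raise ValueError("all marker estimates in the bin are zero")
    return c_k, [b / c_k for b in b_hat]            # w_h = b_hat_h / c_k, sign preserved


def weighted_bin_genotype(z, weights):
    """(1 / p_k) * sum of w_h * Z_j(h) for one individual and one bin."""
    return sum(w * g for w, g in zip(weights, z)) / len(z)


b_hat = [2.0, -1.0, 3.0]                            # made-up estimates
c_k, w = marker_weights(b_hat)
print(c_k, w)                                       # 2.0 [1.0, -0.5, 1.5]
print(weighted_bin_genotype([1, 1, -1], w))
```

Each weight carries the sign of its marker estimate, and the absolute weights average to one within the bin.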
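Weighted Var(Z*) > 0
With $Z^*_j(h) = w_h Z_j(h)$ and $r_{hl}$ the recombination fraction between markers $h$ and $l$:
$$\operatorname{var}(\bar{Z}^*_{jk}) = \frac{1}{p_k^2}\left[\sum_{h=1}^{p_k} \operatorname{var}\!\left(Z^*_j(h)\right) + 2\sum_{h=1}^{p_k}\sum_{l>h}^{p_k} \operatorname{cov}\!\left(Z^*_j(h), Z^*_j(l)\right)\right]$$
$$= \frac{1}{p_k^2}\left[\frac{1}{2}\sum_{h=1}^{p_k} w_h^2 + 2\sum_{h=1}^{p_k}\sum_{l>h}^{p_k} \frac{1}{2} w_h w_l (1 - 2r_{hl})\right]$$
$$= \frac{1}{p_k^2}\cdot\frac{1}{2}\sum_{h=1}^{p_k} w_h^2 > 0, \quad \text{when no linkage disequilibrium } (1 - 2r_{hl} = 0)$$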
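Homogenization of Marker Effects Within Bin
$$\gamma^*_k = \sum_{h=1}^{p_k} w_h^{-1}\gamma(h) = c_k\sum_{h=1}^{p_k} \frac{\gamma(h)}{\hat{b}_h} = c_k p_k \alpha = \alpha\sum_{h=1}^{p_k} |\hat{b}_h|$$
where $\gamma(h)/\hat{b}_h = \alpha$ (a constant).
$$\gamma^*_k = \alpha\sum_{h=1}^{p_k} |\hat{b}_h| \neq 0 \quad \text{as long as one } \hat{b}_h \neq 0$$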
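A small numeric illustration of why the weighting matters: within a bin, effects of opposite sign can cancel in the unweighted sum but not in the weighted one. The sketch assumes the idealized case on the slide, where every true effect is a constant multiple of its estimate, and that no estimate is exactly zero (so that every inverse weight exists). All values and names are made up.

```python
import math


def bin_effects(gamma, b_hat):
    """Return (unweighted, weighted) bin effect for one bin.

    Assumes every b_hat entry is nonzero so that 1 / w_h is defined.
    """
    c_k = sum(abs(b) for b in b_hat) / len(b_hat)               # mean absolute estimate
    unweighted = sum(gamma)                                     # sum of gamma(h)
    weighted = sum(g * c_k / b for g, b in zip(gamma, b_hat))   # sum of gamma(h) / w_h
    return unweighted, weighted


ratio = 0.5                                  # the constant gamma(h) / b_hat_h
b_hat = [2.0, -2.0, 1.0, -1.0]
gamma = [ratio * b for b in b_hat]           # effects of mixed sign
unweighted, weighted = bin_effects(gamma, b_hat)
print(unweighted, weighted)
print(math.isclose(weighted, ratio * sum(abs(b) for b in b_hat)))
```

Here the unweighted bin effect cancels to zero, while the weighted bin effect equals the constant times the sum of absolute estimates.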
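Measurement of Prediction (Cross Validation)
$$MSE = \frac{1}{n}\sum_{j=1}^{n} (y_j - \hat{y}_j)^2, \quad \text{Mean Squared Error}$$
$$MSY = \frac{1}{n}\sum_{j=1}^{n} (y_j - \bar{y})^2, \quad \text{Phenotypic Variance}$$
$$R^2 = \frac{MSY - MSE}{MSY}, \quad \text{Squared Correlation}$$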
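The three quantities can be computed from observed and predicted phenotypes as follows. This is a minimal sketch that takes the cross-validated predictions as given; producing them (fitting the model without the individuals being predicted) is not shown. The function name and the numbers are hypothetical.

```python
def prediction_summary(y, y_hat):
    """Return (MSE, MSY, R2) for observed y and cross-validated predictions y_hat."""
    n = len(y)
    if n == 0 or n != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    mse = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat)) / n
    y_bar = sum(y) / n
    msy = sum((obs - y_bar) ** 2 for obs in y) / n
    if msy == 0:
        raise ValueError("phenotypic variance is zero")
    return mse, msy, (msy - mse) / msy


y = [1.0, 2.0, 3.0, 4.0]          # made-up phenotypes
y_hat = [1.5, 2.0, 2.5, 4.0]      # made-up predictions
print(prediction_summary(y, y_hat))
```

R2 defined this way can be negative when the predictions are worse than simply using the phenotypic mean.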
Simulation Experiment • Genome size = 2,500 cM • Number of markers = 120,000 • Marker interval = 0.02 cM • Cross validation (MSE) • Design I = 20 QTL • Design II = Clustered polygenic model • Design III = Polygenic model • Design IV = Design I with 2,500 x100 cM
True QTL Effect
[Figure: true QTL effects (Effect) plotted against genome position (Position, cM), 0 to 2,500 cM.]
Estimated Bin Effects
[Figure: twelve panels of estimated bin effects (Effect) plotted against genome position (Position, cM), one panel per bin size Δ, with m bins and p markers per bin.]
(a) Δ = 1 cM, m = 2400, p = 50
(b) Δ = 2 cM, m = 1200, p = 100
(c) Δ = 5 cM, m = 480, p = 250
(d) Δ = 10 cM, m = 240, p = 500
(e) Δ = 20 cM, m = 120, p = 1000
(f) Δ = 40 cM, m = 60, p = 2000
(g) Δ = 100 cM, m = 24, p = 5000
(h) Δ = 150 cM, m = 16, p = 7500
(i) Δ = 300 cM, m = 8, p = 15000
(j) Δ = 600 cM, m = 4, p = 30000
(k) Δ = 1200 cM, m = 2, p = 60000
(l) Δ = 2400 cM, m = 1, p = 120000
True and Estimated QTL Effect
[Figure: two panels of Effect against Position (cM), 0 to 2,500 cM. Top: true values. Bottom: estimated bin effects for Δ = 20 cM, m = 120, p = 1000.]
[Figure: four panels of mean squared error against bin size (cM), 0 to 100 cM, each with curves for n = 200, 300, 400, 500 and 1000.]
(a) σ² = 10, h² = 0.638
(b) σ² = 20, h² = 0.581
(c) σ² = 50, h² = 0.457
(d) σ² = 100, h² = 0.337
Figure 1. Mean squared error expressed as a function of bin size for Design I. The mean squared errors were obtained from 100 replicated simulations. The overall proportion of the phenotypic variance contributed by the 20 simulated QTL was calculated using $h^2 = 64.41 / (64.41 + 26.53 + \sigma^2)$. Each panel contains the results for five different sample sizes (n) and represents one of the four scenarios. The phenotypic variance of the simulated trait is indicated by the light horizontal line in each panel.
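The heritabilities in the four panel titles are consistent with reading the caption's formula as h² = 64.41 / (64.41 + 26.53 + σ²), where σ² is the variance given in each panel title. The sketch below checks that arithmetic. The constant names are hypothetical, and the slide does not explain what the 26.53 term represents.

```python
GENETIC_VARIANCE = 64.41    # numerator quoted in the caption
EXTRA_TERM = 26.53          # second denominator term in the caption (not explained there)


def heritability(residual_variance):
    """h2 for a given residual variance, following the caption's formula."""
    return GENETIC_VARIANCE / (GENETIC_VARIANCE + EXTRA_TERM + residual_variance)


for sigma2 in (10, 20, 50, 100):
    print(sigma2, round(heritability(sigma2), 3))
```

The four values round to 0.638, 0.581, 0.457 and 0.337, matching panels (a) to (d).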