Implementation of robust methods for locating quantitative trait loci in R
Andreas Baierl and Andreas Futschik
Institute of Statistics and Decision Support Systems, University of Vienna

Robust Methods for QTL Mapping in R (Andreas Baierl)

Overview
• Introduction to QTL mapping
• Analysis of QTL data
  – modified BIC
  – Robust methods
• Implementation and Simulations in R

Locating quantitative trait loci (QTL)
Quantitative trait: a character that is influenced by many genes. Many relevant traits are quantitative: height, yield, ...
Quantitative trait locus (QTL): a gene (functional sequence of bases) that influences a certain quantitative trait.
Relevant questions:
- How many genes influence a trait (how many QTL)?
- Find the exact positions of the QTL
- (Estimate the size of the genetic effects)

Background
• A gene can take different forms (alleles); evolution occurred in small steps
• The contribution of genetic effects to the total (phenotypic) variation of a trait (heritability) determines the rate at which characters respond to selection (environmental variance reduces the efficiency of the response)
  trait value = genetic influence + environmental influence
• Partitioning genotypic variance into components with different impact on selection: additive and non-additive gene effects (epistasis)
  -> dependency on background population
  evolutionary reason: stabilization of phenotype
phenotype: the form taken by some character in a specific individual
genotype: the genetic makeup of an individual
Data from experimental crosses
[Figure: crossing scheme with parental lines A A and a a (F0), the F1, and the two designs BACKCROSS and F2 INTERCROSS]

Data matrix for backcross design
Indiv.  QT    marker.1  marker.2  ...  marker.m
1       34.3  AA        Aa        ...  AA
2       65.4  Aa        AA        ...  *
3       23.2  Aa        *         ...  Aa
4       45.4  AA        AA        ...  Aa
...     ...   ...       ...       ...  ...
~ 50-500 markers, ~ 200-1000 individuals

Genetic map
[Figure: genetic map; x-axis: Chromosome (1-20), y-axis: Location (cM), 0-100]
• Distance between markers is usually estimated from recombination frequency
• If a marker is close to a QTL, the marker genotype will be associated with the QTL genotype (there would be a 1-1 correspondence if there were no recombinations)
• No linkage between chromosomes

Analysis of QTL data
Find NUMBER, POSITIONS, EFFECT TYPES and SIZES of QTL
Challenges:
• large number of possible models (main effects + interactions = m + m(m-1)/2 ~ 100 + 5,000)
  -> efficient search strategy
  -> correct for test multiplicity
• deviation from normality of the conditional distribution of the trait given marker genotypes (especially when heavy tails or outliers)
• recover unobserved / wrong / missing genotype information
• confounding of effect types
• selection bias for effect sizes, especially for small effects
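
The genetic-map notes above say that nearby markers carry associated genotypes and that the association weakens as recombinations accumulate. A minimal Python sketch of that idea for one backcross chromosome follows. It is not taken from the slides: Haldane's map function, the 10 cM marker spacing and the function names are assumptions chosen for illustration.

```python
import math
import random


def haldane_recombination(distance_cm):
    """Recombination fraction for a map distance in cM.

    Haldane's map function is an assumption; the slides do not name one.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def simulate_backcross_chromosome(n_individuals, positions_cm, rng):
    """Simulate genotypes ('AA' or 'Aa') at ordered marker positions."""
    data = []
    for _ in range(n_individuals):
        genotype = rng.choice(["AA", "Aa"])  # first marker: 50/50 in a backcross
        row = [genotype]
        for left, right in zip(positions_cm, positions_cm[1:]):
            # A recombination between two adjacent markers flips the genotype.
            if rng.random() < haldane_recombination(right - left):
                genotype = "Aa" if genotype == "AA" else "AA"
            row.append(genotype)
        data.append(row)
    return data


def agreement(data, j, k):
    """Fraction of individuals with the same genotype at markers j and k."""
    return sum(row[j] == row[k] for row in data) / len(data)


rng = random.Random(1)
positions = list(range(0, 101, 10))  # 11 markers, 10 cM apart (assumed spacing)
data = simulate_backcross_chromosome(2000, positions, rng)
near = agreement(data, 0, 1)   # markers 10 cM apart
far = agreement(data, 0, 10)   # markers 100 cM apart
print(f"agreement at 10 cM: {near:.2f}, at 100 cM: {far:.2f}")
```

Markers on different chromosomes would be simulated by independent calls, which matches the "no linkage between chromosomes" note.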
Methods for QTL mapping
• marker based
  – univariate: ANOVA on single markers
  – multiple: multiple regression
• estimation of QTL location
  – univariate: interval mapping, composite interval mapping
  – multiple: conditional interval mapping, multiple interval mapping, Bayesian (Sen & Churchill)
• strict Bayesian approach

Multiple regression approach
The trait value of individual i is modelled as an overall mean, plus main effects of the markers in I, plus pairwise interaction effects of the marker pairs in U, plus an error term ε_i.
• X_ij: genotype of the i-th individual (out of n) at the j-th marker (out of m)
  X_ij = ½ if the individual has genotype AA (homozygous)
  X_ij = -½ if the individual has genotype Aa (heterozygous)
• I: subset of the set N = {1,...,m}
• U: subset of N x N
• ε_i: random error term with distribution f

Model selection
aim: identify the correct model, not minimise prediction error
-> criterion for inclusion and exclusion of variables
• cross validation / bootstrap
• AIC: n log(RSS) + 2k
  minimises prediction error
• BIC: n log(RSS) + k log(n)
  more conservative than AIC, but BIC still chooses too many QTL, especially for small n
n: sample size
k = p + q = number of main effects (p) and interaction effects (q) under consideration
RSS: residual sum of squares (assuming normal error distribution!)
every model has the same probability of being selected -> a large model is more likely to be selected
-> efficient search strategy: forward selection + backward elimination step

Behaviour of BIC depending on n & # of predictors
[Figure: type 1 error under M0 (y-axis, 0.00-0.30) against n (x-axis, 0-10000); one curve per number of predictors: 1, 5, 10, 20, 30, 100]
BIC_Mi: BIC of the 1-dimensional model M_i
N: number of 1-dim models
n: sample size
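
The BIC formula and the scan over 1-dimensional models described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the markers are simulated as independent, the effect size of 1.5 on marker 2 is arbitrary, and the helper names are hypothetical. Each candidate term (a marker, or the product of two markers for an interaction) is fitted on its own by simple least squares and scored with n log(RSS) + k log(n).

```python
import itertools
import math
import random


def rss_one_term(x, y):
    """Residual sum of squares of the least-squares fit y = a + b * x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0.0:  # constant column: the term explains nothing
        return syy
    return syy - sxy ** 2 / sxx


def bic(rss, n, k):
    """BIC as written on the slide: n log(RSS) + k log(n)."""
    return n * math.log(rss) + k * math.log(n)


rng = random.Random(7)
n, m = 200, 5
# Genotype coding from the slide: +1/2 for AA, -1/2 for Aa.
X = [[rng.choice([-0.5, 0.5]) for _ in range(m)] for _ in range(n)]
# Assumed truth for the demo: one main effect of size 1.5 on marker 2.
y = [1.5 * row[2] + rng.gauss(0.0, 1.0) for row in X]

# Candidate 1-dimensional models: m main effects + m(m-1)/2 interactions.
candidates = {("main", j): [row[j] for row in X] for j in range(m)}
for u, v in itertools.combinations(range(m), 2):
    candidates[("epi", u, v)] = [row[u] * row[v] for row in X]

mean_y = sum(y) / n
bic_null = bic(sum((yi - mean_y) ** 2 for yi in y), n, 0)
scores = {term: bic(rss_one_term(x, y), n, 1) for term, x in candidates.items()}
best = min(scores, key=scores.get)
print("candidates:", len(candidates), "best term:", best)
```

A forward-selection step in the sense of the slide would add `best` to the model if its score improves on the current one, then repeat the scan with the remaining terms.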
modified BIC
Additional penalty term dependent on the number of predictors under consideration (Bogdan et al 2004)
The mBIC penalty is parameterised by:
• E(p): expected number of main effects
• E(q): expected number of epistasis (= interaction) effects
E(p) = E(q) = 2.2 controls the Type I error at a level of 5% (for n = 200)

Comparison of mBIC and BIC
[Figure: type 1 error under M0 (y-axis, 0.00-0.20) against n (x-axis, 0-5000) for BIC and mBIC; 5 predictors (+10 two-way interaction terms)]

Deviations from Normality
• Typically, non-parametric methods based on ranks are used
• Here we use robust regression techniques, in particular M-estimators: minimise another measure of distance instead of the residual sum of squares. Popular alternatives are:
[Figure: four panels showing the ρ functions rho.huber (k=0.05), rho.huber (k=1.3), rho.bisquare and rho.hampel, each plotted over -6 to 6]

Robust model selection criterion
• still consistent under quite general conditions on the error distribution (Martin, 1980)
• but the performance of BIC_ρ depends on ρ and on the error distribution: Jurečková and Sen (1996) derived the limiting distribution for [equation]
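
As a rough illustration of how an additional penalty of this kind behaves, the Python sketch below assumes the form published in Bogdan et al. (2004): BIC plus 2p log(l - 1) + 2q log(u - 1), with l = m / E(p) and u = N_e / E(q), where N_e = m(m-1)/2 is the number of possible two-way interactions. The function and argument names are hypothetical, and the exact expression shown on the original slide is not recoverable from the source.

```python
import math


def bic(rss, n, p, q):
    """Standard BIC with k = p + q terms: n log(RSS) + k log(n)."""
    return n * math.log(rss) + (p + q) * math.log(n)


def mbic(rss, n, p, q, m, expected_p=2.2, expected_q=2.2):
    """Modified BIC, assuming the penalty form of Bogdan et al. (2004).

    p, q: number of main and interaction effects in the model
    m:    number of markers available for selection
    """
    n_epi = m * (m - 1) // 2   # number of candidate two-way interactions
    l = m / expected_p
    u = n_epi / expected_q
    return (bic(rss, n, p, q)
            + 2 * p * math.log(l - 1)
            + 2 * q * math.log(u - 1))


# Extra cost per added term for m = 22 markers and n = 200 individuals.
extra_main = mbic(100.0, 200, 1, 0, 22) - bic(100.0, 200, 1, 0)
extra_epi = mbic(100.0, 200, 0, 1, 22) - bic(100.0, 200, 0, 1)
print(f"extra penalty per main effect: {extra_main:.2f}")
print(f"extra penalty per interaction: {extra_epi:.2f}")
```

With these values l = 10 and u = 105, so each main effect costs about 2 log 9 ≈ 4.39 more than under BIC and each interaction about 2 log 104 ≈ 9.29 more. Interactions are penalised harder because there are many more of them to choose from.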
Limiting Distribution
We showed that [equation] has the following property: [equation], with [equation] and error distribution f(x)

Values for normalisation constant c_e
• for L2: c_e = 1

Robust mBIC
In practice, c_e and therefore the error distribution f(x) have to be estimated.
This leads to a robust version of the mBIC: [equation] with [equation]

Simulation Setup
• 2 chromosomes with 11 markers each (m = 22)
• 200 individuals (n = 200)
• 1 additive effect
• 1 epistasis effect
• error distributions: Normal, Laplace, Cauchy, Tukey, χ²
• estimators: L2, Huber (k=0.05) ~ L1, Huber (k=1.3), Bisquare, Hampel
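
The estimators listed above differ in how strongly they downweight large residuals. The following Python sketch shows the idea for a Huber M-estimator fitted to a single marker by iteratively reweighted least squares. It is an illustrative stand-in, not a reimplementation of any R routine: the tuning constant k = 1.345, the MAD-based scale, the simulated data and the function names are all assumptions made for the demo.

```python
import random
import statistics


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(x, y, k=1.345, n_iter=50):
    """Huber M-estimate of (a, b) by iteratively reweighted least squares."""
    w = [1.0] * len(y)  # start from ordinary least squares
    for _ in range(n_iter):
        a, b = weighted_line(x, y, w)
        res = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for the normal case.
        scale = statistics.median(abs(r) for r in res) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside [-k*scale, k*scale], shrinking outside.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
    return weighted_line(x, y, w)


rng = random.Random(3)
# One marker coded -1/2 (Aa) or +1/2 (AA); assumed true effect 2.0.
x = [-0.5 if i % 2 == 0 else 0.5 for i in range(200)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
for i in range(1, 20, 2):  # ten gross outliers, all in the AA group
    y[i] += 50.0

a_ols, b_ols = weighted_line(x, y, [1.0] * len(y))
a_huber, b_huber = huber_line(x, y)
print(f"true effect 2.0, least squares {b_ols:.2f}, Huber {b_huber:.2f}")
```

Choosing a very small k makes almost every residual fall in the linear part of ρ, which is why Huber with k = 0.05 behaves approximately like L1 regression.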
Simulation Results
Percentage correctly identified effects and false discovery rate
[Figure: Percentage (y-axis, 0.0-0.6) by error distribution (x-axis: Normal, Laplace, Cauchy, Tukey, Chisq, Chisq.med); series: Huber-0.05, Huber-1.3, Bisquare, Hampel, L2-mBIC, L2-BIC]

Implementation in R
• Robust regression using procedure rlm of package MASS
• program structure:
  – parameter specification
  – generate realisation of genetic setup
  – estimation of error distribution and c_e
  – in each forward step: estimate likelihood for m + m(m-1)/2 models
  – generate output
• simulations:
  – 1000 replications
  – n = 200-500, m = 20-120

References
• Baierl, A., Bogdan, M., Frommlet, F., Futschik, A., 2006. On locating multiple interacting quantitative trait loci in intercross designs. To appear in Genetics.
• Bogdan, M., Ghosh, J. K., Doerge, R. W., 2004. Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167: 989-999.
• Broman, K. W., Speed, T. P., 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses. J Roy Stat Soc B 64: 641-656.
• Jurečková, J., Sen, P. K., 1996. Robust statistical procedures: asymptotics and interrelations. Wiley, New York.
• Sen and Churchill, 2001. A statistical framework for quantitative trait mapping. Genetics 159: 371-387.