
Genomics, Transcriptomics and Proteomics in Clinical Research - PowerPoint PPT Presentation



  1. Genomics, Transcriptomics and Proteomics in Clinical Research
     Statistical Learning for Analyzing Functional Genomic Data
     Axel Benner, German Cancer Research Center, Heidelberg, Germany. June 16, 2006.

     Clinical applications of functional genomic data:
     - Functional diagnostics: single biomarkers, signatures
     - Discovery of therapeutic targets: candidate targets, pathway analysis
     - Prognostic factor studies: response to treatment, toxicity, survival
     - Insight in pharmacological mechanisms
     - Custom drug selection: predictive factors for response/resistance to a certain therapy, indicators of adverse events

     Explanation vs. Prediction
     - Target: Explanation. Implies that there is some likelihood of a "true" model. Model selection: few input variables are relevant. Occam's razor: 'do not make more assumptions than needed'.
     - Target: Prediction. Statistical learning. Model selection: quality of prediction. Topic here: large scale problems.

     Large scale problems
     - New biomolecular techniques: number of input variables (genes, clones, etc.) in the 1,000s to 10,000s; number of observations in the 10s to 100s.
     - Number of observations << number of input variables → more unknown parameters than estimation equations → infinitely many solutions.
     - Models can be fit perfectly to the data → no bias but high variance (see the sketch after this slide).
     - Use statistical learning methods to handle these problems!
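The last point is easy to see in a small simulation. The following is an illustrative sketch only (the data, dimensions and variable names are mine, not from the presentation): with far more inputs than observations, unpenalized least squares fits the training data exactly but predicts poorly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical p >> n setting: many more input variables (e.g. genes) than observations.
n, d = 20, 1000
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:5] = 2.0                      # only a few variables are truly relevant
y = X @ beta_true + rng.standard_normal(n)

# Unpenalized least squares: with d >> n the normal equations are underdetermined,
# so one of infinitely many solutions interpolates the training data exactly.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print("training residual sum of squares:", np.sum((y - X @ beta_hat) ** 2))  # ~0

# ... but prediction on new data is poor: no bias, high variance.
X_new = rng.standard_normal((1000, d))
y_new = X_new @ beta_true + rng.standard_normal(1000)
print("test mean squared error:", np.mean((y_new - X_new @ beta_hat) ** 2))
```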

  2. Statistical Learning: Control of Model Complexity
     - Restriction methods: the class of functions of the input vectors is limited.
     - Selection methods: include only those basis functions of the input vectors that contribute 'significantly' to the fit of the model; examples are variable selection methods and stepwise greedy approaches like boosting.
     - Regularization methods: restrict the coefficients of the model, e.g. ridge regression.

     Penalized maximum likelihood estimation
     - Maximizing the log-likelihood can result in fitting noise in the data.
     - A shrinkage approach will often result in estimates of the regression coefficients that, while biased, have lower mean squared error and are closer to the true parameters.
     - A good approach to shrinkage is penalized maximum likelihood estimation (le Cessie & van Houwelingen, 1990). A general form of the penalized log-likelihood is
       $$\sum_{i=1}^{n} \log L\bigl(y_i;\, g(x_i^T\beta)\bigr) \;-\; \sum_{j=1}^{d} p_\lambda(|\beta_j|).$$
     - From the log-likelihood a so-called 'penalty' is subtracted that discourages regression coefficients from becoming large.

     Penalty functions
     - Well-known penalty functions are the L_q-norm penalties $p_\lambda(|\theta|) = \lambda|\theta|^q$.
     - L_2 (ridge regression), with thresholding rule $\hat\theta(z) = \frac{1}{1+\lambda}\,z$: continuous, but biased and no sparse solutions.
     - L_1 (LASSO), with thresholding rule $\hat\theta(z) = \mathrm{sgn}(z)\,(|z|-\lambda)_+$: continuous and sparse, but no unbiased solutions. (Both rules are implemented in the sketch after this slide.)
     - A good penalty function should result in an estimator with the following three properties (Fan & Li, 2001):
       Unbiasedness: the resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid excessive estimation bias.
       Sparsity: a small coefficient is estimated as zero, to reduce model complexity.
       Continuity: the resulting estimator is continuous in the data, to avoid instability in model prediction.
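A minimal numpy sketch of the two L_q thresholding rules just stated (the function names and example values are mine, not from the slides):

```python
import numpy as np

def ridge_threshold(z, lam):
    """L2 (ridge) rule: theta_hat(z) = z / (1 + lambda). Shrinks, but never sets a coefficient to zero."""
    return z / (1.0 + lam)

def soft_threshold(z, lam):
    """L1 (LASSO) rule: theta_hat(z) = sgn(z) * (|z| - lambda)_+. Shrinks and produces sparse solutions."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([-3.0, -0.5, 0.0, 0.4, 2.0])
lam = 1.0
print(ridge_threshold(z, lam))   # [-1.5  -0.25  0.    0.2   1.  ]
print(soft_threshold(z, lam))    # [-2.   -0.    0.    0.    1.  ]
```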

  3. Penalty functions
     - Convex penalties (e.g. quadratic penalties) make trade-offs between bias and variance, can create unnecessary bias when the true parameters are large, and cannot produce parsimonious models.
     - Nonconcave penalties select variables and estimate their coefficients simultaneously. An example is the hard thresholding penalty (HARD; Antoniadis, 1997)
       $$p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2\, I(|\theta| < \lambda),$$
       with thresholding rule $\hat\theta(z) = z \cdot I(|z| > \lambda)$.

     Related approaches
     - Bridge regression (Frank & Friedman, 1993), which minimizes $\sum_i \bigl(y_i - \beta_0 - \sum_j \beta_j x_{ij}\bigr)^2$ subject to $\sum_{j=1}^{d} |\beta_j|^\gamma \le t$ with $\gamma \ge 0$.
     - Nonnegative garrote (Breiman, 1995), which minimizes $\sum_i \bigl(y_i - \beta_0 - \sum_j c_j \hat\beta_j x_{ij}\bigr)^2$ under the constraint $\sum_j c_j \le s$, where the $\hat\beta_j$ are the full-model OLS coefficients.
     - Elastic net (Zou & Hastie, 2005), where the penalty is a convex combination of the lasso and ridge penalties.
     - Relaxed lasso (Meinshausen, 2005).

     SCAD penalty
     - The Smoothly Clipped Absolute Deviation penalty (SCAD; Fan, 1997) satisfies all three requirements (unbiasedness, sparsity, continuity).
     - It is defined via its derivative
       $$p'_\lambda(|\theta|) = \lambda\left\{ I(|\theta| \le \lambda) + \frac{(a\lambda - |\theta|)_+}{(a-1)\lambda}\, I(|\theta| > \lambda) \right\}, \qquad a > 2,$$
       with thresholding rule (implemented in the sketch after this slide)
       $$\hat\theta(z) = \begin{cases} \mathrm{sgn}(z)\,(|z| - \lambda)_+, & |z| \le 2\lambda \\ \{(a-1)z - \mathrm{sgn}(z)\, a\lambda\}/(a-2), & 2\lambda < |z| \le a\lambda \\ z, & |z| > a\lambda. \end{cases}$$

     [Figure: selected penalty and thresholding functions for the HARD, LASSO and SCAD penalties; upper row shows the penalties (λ = 1.5, 1.5 and 1), lower row the corresponding thresholding rules as functions of z.]
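A small numpy sketch of the SCAD derivative and thresholding rule above. The helper names and the default a = 3.7 are my additions (Fan & Li suggest a = 3.7; the slides only require a > 2):

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative p'_lambda(|theta|) of the SCAD penalty (Fan, 1997), for a > 2."""
    t = np.abs(theta)
    return lam * np.where(t <= lam, 1.0,
                          np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding: soft-thresholding near zero, a linear transition, then the identity."""
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)
    mid = ((a - 1.0) * z - np.sign(z) * a * lam) / (a - 2.0)
    return np.where(az <= 2.0 * lam, soft,
                    np.where(az <= a * lam, mid, z))

z = np.linspace(-10, 10, 9)
print(scad_threshold(z, lam=1.0))   # large |z| are returned unshrunk (near-unbiasedness)
```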

  4. SCAD Penalty
     - SCAD improves on the LASSO by reducing estimation bias.
     - SCAD possesses an oracle property: the true regression coefficients that are zero are automatically estimated as zero, and the remaining coefficients are estimated as well as if the correct submodel were known in advance.
     - Hence, SCAD is an ideal procedure for variable selection, at least from a theoretical point of view.

     Penalized proportional hazards regression
     - Penalized partial likelihood:
       $$l(\beta) - \sum_{j=1}^{d} p_\lambda(|\beta_j|) \;\to\; \max_\beta$$
       with
       $$l(\beta) = \sum_{k=1}^{N} \Bigl[ x_{(k)}^T\beta - \log\Bigl\{ \sum_{i \in R_k} \exp(x_i^T\beta) \Bigr\} \Bigr],$$
       where n = number of observations, N = number of events, and R_k = risk set for event k, k = 1, ..., N.

     SCAD Regression (Fan & Li, 2002)
     - Use 'LQA', a local quadratic approximation of the penalty for β_j close to β_{j0}:
       $$p_\lambda(|\beta_j|) \approx p_\lambda(|\beta_{j0}|) + \tfrac{1}{2}\,\bigl\{ p'_\lambda(|\beta_{j0}|)/|\beta_{j0}| \bigr\}\,\bigl(\beta_j^2 - \beta_{j0}^2\bigr).$$
     - With the LQA, the penalized log-likelihood for β close to β_0 becomes
       $$l(\beta_0) + \nabla l(\beta_0)^T(\beta - \beta_0) + \tfrac{1}{2}(\beta - \beta_0)^T \nabla^2 l(\beta_0)(\beta - \beta_0) - \tfrac{n}{2}\,\beta^T \Sigma_\lambda(\beta_0)\,\beta,$$
       with $\Sigma_\lambda(\beta_0) = \mathrm{diag}\{ p'_\lambda(|\beta_{10}|)/|\beta_{10}|, \dots, p'_\lambda(|\beta_{d0}|)/|\beta_{d0}| \}$.
     - Solve the quadratic maximization problem by the Newton-Raphson algorithm (see the sketch after this slide):
       $$\beta_1 = \beta_0 - \bigl[\nabla^2 l(\beta_0) - n\Sigma_\lambda(\beta_0)\bigr]^{-1}\,\bigl[\nabla l(\beta_0) - n\Sigma_\lambda(\beta_0)\,\beta_0\bigr].$$
     - Estimate the covariance matrix by the sandwich formula:
       $$\widehat{\mathrm{cov}}(\hat\beta_1) = \bigl[\nabla^2 l(\hat\beta_1) - n\Sigma_\lambda(\hat\beta_1)\bigr]^{-1}\,\widehat{\mathrm{cov}}\bigl(\nabla l(\hat\beta_1)\bigr)\,\bigl[\nabla^2 l(\hat\beta_1) - n\Sigma_\lambda(\hat\beta_1)\bigr]^{-1}.$$

     [Figure: the SCAD penalty p_λ(β) plotted as a function of β (Fan & Li, 2002).]
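A self-contained sketch of the LQA Newton-Raphson update. All names here are assumptions, and for concreteness the illustration uses a Gaussian log-likelihood for a linear model rather than the Cox partial likelihood of the slide; the update formula itself is the one shown above.

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    # SCAD derivative p'_lambda(|theta|); same a = 3.7 assumption as in the earlier sketch.
    t = np.abs(theta)
    return lam * np.where(t <= lam, 1.0, np.maximum(a * lam - t, 0.0) / ((a - 1.0) * lam))

def lqa_newton_step(beta0, grad, hess, n, lam, eps=1e-8):
    """One LQA Newton-Raphson update for a SCAD-penalized log-likelihood.

    grad, hess: gradient and Hessian of the unpenalized log-likelihood l at beta0.
    The LQA weight p'(|b|)/|b| grows without bound as |b| -> 0, so coefficients
    already shrunk to (essentially) zero stay there; eps only guards the division.
    """
    sigma = np.diag(scad_deriv(beta0, lam) / np.maximum(np.abs(beta0), eps))
    lhs = hess - n * sigma
    rhs = grad - n * sigma @ beta0
    return beta0 - np.linalg.solve(lhs, rhs)

# Illustration with a Gaussian log-likelihood (linear model, unit error variance),
# where grad = X'(y - X beta) and hess = -X'X.
rng = np.random.default_rng(1)
n, d = 100, 10
X = rng.standard_normal((n, d))
beta_true = np.array([3.0, -2.0] + [0.0] * (d - 2))
y = X @ beta_true + rng.standard_normal(n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from the unpenalized fit
for _ in range(50):
    grad = X.T @ (y - X @ beta)
    hess = -X.T @ X
    beta = lqa_newton_step(beta, grad, hess, n, lam=0.3)
print(np.round(beta, 2))   # true-zero coefficients are shrunk towards zero, large ones barely change
```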
