Fast estimation of posterior change-point probabilities for CNV data

Fast estimation of posterior change-point probabilities for CNV data. The Minh Luong, Yves Rozenholc, Gregory Nuel, MAP5, Université Paris Descartes. July 5, 2012. Luong et al, MAP5: Fast estimation of posterior change-point probabilities


  1. Fast estimation of posterior change-point probabilities for CNV data
     The Minh Luong, Yves Rozenholc, Gregory Nuel
     MAP5, Université Paris Descartes
     July 5, 2012

  2. Introduction
     - Change-point methods have applications in econometrics, engineering, network security, signal processing, music classification, and bioinformatics.
     - Example: copy number variation (CNV) analysis, used to identify regions where DNA mutations are related to disease susceptibility.
     - High-resolution data: tens of thousands of clones per chromosome.
       - Array comparative genomic hybridization (aCGH)
       - Single nucleotide polymorphism (SNP) arrays
     - Figure: array CGH profile. Source: Redon and Carter, Methods Mol Biol. 2009; 529: 37-49.

  3. Examples of R packages for change-point analysis
     - Unsupervised hidden Markov model (HMM) approaches
       - Willenbrock and Fridlyand (2005): aCGH package
       - Marioni et al. (2006): snapCGH package
     - Non-HMM segmentation approaches
       - Venkatraman and Olshen (2004): DNAcopy package
       - Hupé et al. (2004): GLAD package
     - Likelihood-based approaches with penalization criteria
       - Picard et al. (2005): cghseg package
     - Change-point uncertainty (MCMC)
       - Erdman et al. (2008): bcp package

  4. Motivation
     - Few exact non-MCMC methods exist for assessing the uncertainty of change-point estimates.
     - Methods for finding exact posterior probabilities of change-points have $O(n^2)$ complexity:
       - frequentist: Guédon (2007)
       - Bayesian: Rigaill (2011)
     - High-resolution data in genomics technologies ($> 10{,}000$ observations per chromosome):
       - Smaller inter-segmental differences: uncertainty needs to be characterized.
       - More data: efficient estimates are needed; $O(n^2)$ is not feasible.
     - Next-generation sequencing: methods need to be adaptable to non-normal data.

  5. Segmentation approach to change-point detection
     - Dataset: $X = (X_1, X_2, \ldots, X_n)$, real-valued observations.
     - Hidden state space: $S = (S_1, S_2, \ldots, S_n)$, the corresponding segment indices.
     - Distribution: $P(X_i \mid S_i = k, \theta_k) \sim g_{\theta_k}(\cdot)$ when $X_i$ belongs to segment $k$.
     - Problem of interest: find $P(S_i \mid X; \theta)$ when the segments are unknown given the data.
     - Figure: Segment-based change-point detection ($K = 5$); observations $Y$ shown with segment indices $S$: 1 to 5.
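The segment model on this slide can be illustrated by simulating data from it. The following is a minimal sketch, assuming a normal emission density for $g_{\theta_k}$ with a common standard deviation; the function name `simulate_segments` and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def simulate_segments(change_points, means, n, sd=1.0, seed=0):
    """Draw X_i ~ Normal(means[S_i - 1], sd), where S_i is the segment index of i.

    change_points lists the last position (1-based) of every segment except
    the final one, so K = len(change_points) + 1 segments are produced.
    The normal density is an assumption standing in for g_theta_k.
    """
    if len(means) != len(change_points) + 1:
        raise ValueError("need exactly one mean per segment")
    rng = random.Random(seed)
    ends = set(change_points)
    s, x, k = [], [], 1
    for i in range(1, n + 1):
        s.append(k)                           # S_i: segment index of observation i
        x.append(rng.gauss(means[k - 1], sd)) # X_i drawn from the segment's density
        if i in ends:                         # a change-point closes segment k here
            k += 1
    return x, s

x, s = simulate_segments(change_points=[4, 9], means=[0.0, 2.0, -1.0], n=12)
print(s)  # [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3]
```

In the detection problem the vector `s` is hidden and only `x` is observed.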

  6. Constrained hidden Markov model for segmentation
     - HMM algorithms are used to estimate posterior probabilities with linear complexity.
     - $S$: Markov chain over $\{1, 2, \ldots, K, K+1\}$; $M_K$: set of possible $S$.
     - $\{S \in M_K\}$: $K$ states in $n$ observations.
     - The constraints on the HMM correspond exactly to a segmentation change-point model:
       - Find the best partitioning $S \in M_K$ into $K$ non-overlapping intervals, with the distribution homogeneous within each segment.
       - $S_1 = 1$, $S_n = K$; junk state: $K + 1$.
       - Only transitions of 0 or +1 are allowed: $S_i - S_{i-1} \in \{0, 1\}$.
     - Transition probabilities:
       $$P(S_i = k+1 \mid S_{i-1} = k) = \eta_k(i), \qquad P(S_i = k \mid S_{i-1} = k) = 1 - \eta_k(i)$$
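The 0/+1 constraint gives the transition matrix a simple banded shape. The sketch below builds it under two assumptions that are not stated on the slide: the probabilities $\eta_k$ are taken as constant in $i$, and the junk state $K+1$ is treated as absorbing. The function name `constrained_transitions` is hypothetical.

```python
def constrained_transitions(eta):
    """Transition matrix over states 1..K+1 for a time-constant eta (assumption).

    eta[k-1] is the probability of moving from state k to state k+1.
    Row/column index k-1 stands for state k; the last row is the junk state.
    """
    K = len(eta)
    P = [[0.0] * (K + 1) for _ in range(K + 1)]
    for k in range(K):
        P[k][k] = 1.0 - eta[k]   # stay in the current segment
        P[k][k + 1] = eta[k]     # move to the next state (+1 only)
    P[K][K] = 1.0                # junk state K+1 assumed absorbing
    return P

P = constrained_transitions([0.1, 0.2, 0.3])
for row in P:
    print(row)
```

Every entry outside the diagonal and the first superdiagonal is zero, which is why each forward or backward step touches only two predecessor states and the whole pass stays linear in $n$ for fixed $K$.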

  7. Adapted forward-backward algorithm
     - Forward and backward quantities, for observation $i$ and state $k$:
       $$F_i(k) = P(X_{1:i} = x_{1:i}, S_i = k)$$
       $$B_i(k) = P(X_{i+1:n} = x_{i+1:n}, S_n = K \mid S_i = k)$$
     - Initialization:
       $$F_1(1) = g_{\theta_1}(x_1)$$
       $$B_{n-1}(K-1) = \eta_{K-1}(n)\, g_{\theta_K}(x_n), \qquad B_{n-1}(K) = (1 - \eta_K(n))\, g_{\theta_K}(x_n)$$
     - Recursion, with $\eta_k(i) = P(S_i = k+1 \mid S_{i-1} = k)$:
       $$F_i(k) = \left[ F_{i-1}(k)\,(1 - \eta_k(i)) + \mathbb{1}_{k>1}\, F_{i-1}(k-1)\, \eta_{k-1}(i) \right] g_{\theta_k}(x_i)$$
       $$B_{i-1}(k) = (1 - \eta_k(i))\, g_{\theta_k}(x_i)\, B_i(k) + \mathbb{1}_{k<K}\, \eta_k(i)\, g_{\theta_{k+1}}(x_i)\, B_i(k+1)$$
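The recursions can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: it assumes normal emission densities, transition probabilities $\eta_k$ that are constant in $i$, the convention $\eta_k = P(S_i = k+1 \mid S_{i-1} = k)$, and the terminal condition $B_n(K) = 1$. The names `normal_pdf` and `forward_backward` are hypothetical.

```python
import math

def normal_pdf(x, mu, sd=1.0):
    """Normal density, used here as an assumed emission density g_theta_k."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def forward_backward(x, means, eta, sd=1.0):
    """Forward and backward tables for the constrained HMM (0-based i and k).

    F[i][k] = P(X_{1:i+1} = x_{1:i+1}, S_{i+1} = k+1)
    B[i][k] = P(X_{i+2:n} = x_{i+2:n}, S_n = K | S_{i+1} = k+1)
    eta[k] is the (time-constant, assumed) probability of leaving state k+1.
    """
    n, K = len(x), len(means)
    g = [[normal_pdf(xi, mu, sd) for mu in means] for xi in x]
    F = [[0.0] * K for _ in range(n)]
    B = [[0.0] * K for _ in range(n)]
    F[0][0] = g[0][0]                      # constraint S_1 = 1
    for i in range(1, n):
        for k in range(K):
            stay = F[i - 1][k] * (1.0 - eta[k])
            move = F[i - 1][k - 1] * eta[k - 1] if k > 0 else 0.0
            F[i][k] = (stay + move) * g[i][k]
    B[n - 1][K - 1] = 1.0                  # constraint S_n = K
    for i in range(n - 1, 0, -1):
        for k in range(K):
            stay = (1.0 - eta[k]) * g[i][k] * B[i][k]
            move = eta[k] * g[i][k + 1] * B[i][k + 1] if k < K - 1 else 0.0
            B[i - 1][k] = stay + move
    return F, B

x = [0.1, -0.2, 0.3, 2.1, 1.9, 2.2]
F, B = forward_backward(x, means=[0.0, 2.0], eta=[0.2, 0.2])
# The sum over k of F_i(k) * B_i(k) should be the same for every i.
evidence = [sum(f * b for f, b in zip(F[i], B[i])) for i in range(len(x))]
print(evidence)
```

For sequences of realistic length the raw products underflow, so a practical version would rescale each step or work in log space.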

  8. Posterior probabilities from forward-backward algorithm
     - Posterior probability of state $k$ for observation $i$:
       $$P(S_i = k \mid X_{1:n} = x_{1:n}) = \frac{F_i(k)\, B_i(k)}{F_1(1)\, B_1(1)}$$
     - Posterior probability of observation $i$ being the $k$th change-point, with $\eta_k(i) = P(S_i = k+1 \mid S_{i-1} = k)$:
       $$P(CP_k = i \mid X_{1:n} = x_{1:n}) = P(S_i = k, S_{i+1} = k+1 \mid X_{1:n} = x_{1:n}) = \frac{F_i(k)\, \eta_k(i+1)\, g_{\theta_{k+1}}(x_{i+1})\, B_{i+1}(k+1)}{F_1(1)\, B_1(1)}$$
     - Posterior transition probability from the $(k-1)$th to the $k$th state:
       $$P(S_i = k \mid S_{i-1} = k-1, X_{1:n} = x_{1:n}) = \frac{\eta_{k-1}(i)\, g_{\theta_k}(x_i)\, B_i(k)}{B_{i-1}(k-1)}$$
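The change-point posterior can be read as a factorization of the joint probability along the chain. The following sketch of that reasoning assumes the convention $\eta_k(i) = P(S_i = k+1 \mid S_{i-1} = k)$ and the forward and backward quantities $F_i(k) = P(X_{1:i} = x_{1:i}, S_i = k)$ and $B_i(k) = P(X_{i+1:n} = x_{i+1:n}, S_n = K \mid S_i = k)$.

```latex
\begin{align*}
P(X_{1:n} = x_{1:n},\, S_i = k,\, S_{i+1} = k+1,\, S_n = K)
  &= \underbrace{P(X_{1:i} = x_{1:i},\, S_i = k)}_{F_i(k)}
     \cdot \underbrace{P(S_{i+1} = k+1 \mid S_i = k)}_{\eta_k(i+1)} \\
  &\quad \cdot \underbrace{P(X_{i+1} = x_{i+1} \mid S_{i+1} = k+1)}_{g_{\theta_{k+1}}(x_{i+1})}
     \cdot \underbrace{P(X_{i+2:n} = x_{i+2:n},\, S_n = K \mid S_{i+1} = k+1)}_{B_{i+1}(k+1)}, \\[4pt]
P(X_{1:n} = x_{1:n},\, S_n = K)
  &= \sum_{k=1}^{K} F_i(k)\, B_i(k) \quad \text{for every } i,
     \text{ which at } i = 1 \text{ reduces to } F_1(1)\, B_1(1).
\end{align*}
```

Dividing the first expression by the second gives the posterior probability that the $k$th change-point falls at observation $i$. The reduction at $i = 1$ holds because the constraint $S_1 = 1$ makes $F_1(k) = 0$ for $k > 1$.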
