

  1. A Bayesian Method for Partially Paired High Dimensional Data
     Fei Liu, Feng Liang, Woncheol Jang
     Institute of Statistics and Decision Sciences, Duke University
     SAMSI, Summer 2006

  2. Outline
     ◮ Bayesian methods have been developed for paired high dimensional data such as gene expression data.
     ◮ For partially paired data, however, excluding the unpaired observations from the analysis may lead to significant information loss.
     ◮ Using test statistics with FDR control is a possible solution.
     ◮ We provide a generalized Bayesian method for partially paired high dimensional data.

  3. Statistical Model
     ◮ The data for the j-th gene are arranged as
       X_{1j}, \dots, X_{nj};\; X^*_{1j}, \dots, X^*_{n_1 j};\qquad Y_{1j}, \dots, Y_{nj};\; Y^*_{1j}, \dots, Y^*_{n_2 j}.
       (X_{ij}, Y_{ij}): paired gene expressions.  X^*_{ij}, Y^*_{ij}: unpaired observations.
     ◮ (X_{ij}, Y_{ij})^T \sim N(\mu_j, \Sigma_j), where
       \mu_j = \begin{pmatrix} \mu_j \\ \mu_j + \delta_j \end{pmatrix}, \qquad
       \Sigma_j = \begin{pmatrix} \sigma_1^2 & \rho_j \sigma_1 \sigma_2 \\ \rho_j \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}.
     ◮ For the incomplete data, X^*_{ij} \sim N(\mu_j, \sigma_1^2) and Y^*_{ij} \sim N(\mu_j + \delta_j, \sigma_2^2).
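
     A minimal sketch (mine, not from the slides), assuming Python with NumPy, of simulating one gene's partially paired data under this model; the function name simulate_gene and the default sample sizes are illustrative assumptions.

```python
import numpy as np

def simulate_gene(n=5, n1=2, n2=2, mu=0.0, delta=0.0, sigma1=1.0, sigma2=1.0,
                  rho=0.1, rng=np.random.default_rng(0)):
    mean = np.array([mu, mu + delta])
    cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    paired = rng.multivariate_normal(mean, cov, size=n)   # (X_ij, Y_ij) pairs
    x_star = rng.normal(mu, sigma1, size=n1)              # unpaired controls
    y_star = rng.normal(mu + delta, sigma2, size=n2)      # unpaired treatments
    return paired[:, 0], paired[:, 1], x_star, y_star
```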

  4. Review of FDR Control

                       Accept                Reject                Total
     True null         U                     V (false positive)    m_0
     Non-true null     T (false negative)    S                     m - m_0
     Total             W                     R                     m

     FDR = E(V/R \mid R \neq 0).

     Benjamini and Hochberg (1995) procedure to control FDR at level q^*:
     order the p-values of H_1, \dots, H_m as p_{(1)} \le p_{(2)} \le \dots \le p_{(m)},
     let k = \max\{\, i : p_{(i)} \le \tfrac{i}{m} q^* \,\}, and reject H_{(1)}, \dots, H_{(k)}.
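
     For concreteness, a small sketch of the Benjamini-Hochberg step-up procedure just described; benjamini_hochberg is a hypothetical helper name, not from the talk.

```python
import numpy as np

def benjamini_hochberg(pvals, q_star=0.05):
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = (np.arange(1, m + 1) / m) * q_star
    below = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= (i/m) q*
        reject[order[:k + 1]] = True       # reject H_(1), ..., H_(k)
    return reject
```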

  5. Test Statistics for Partially Paired Data in One-Dimensional Space
     ◮ Lin and Stivers (1974) propose a modified t statistic for small n and |\rho| \le 0.5,
       based on a weighted contrast of the combined sample means \bar{x}_1 and \bar{x}_2
       (with weights involving n + n_1 and n + n_2) and a pooled variance estimate on N - 2
       degrees of freedom; approximately T \sim t_{N-4}, where N = n + n_1 + n_2.
     ◮ Another test statistic is given by the mixed-effects model
       z_{ik} = \mu + \alpha_i + \beta_k + \epsilon_{ik}, \quad \beta_k \sim N(0, \sigma_\beta^2), \quad \epsilon_{ik} \sim N(0, \sigma_\epsilon^2).
       Perform ANOVA to test the fixed effect \alpha_i = 0.
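
     The mixed-effects route can be sketched with statsmodels, treating each pair as a shared subject-level random effect; the data layout, the helper name mixed_effect_pvalue, and the Wald p-value shortcut are my assumptions rather than the talk's procedure, and the fit may emit convergence warnings on samples as small as those in the simulations.

```python
import pandas as pd
import statsmodels.formula.api as smf

def mixed_effect_pvalue(x, y, x_star, y_star):
    rows = []
    for k in range(len(x)):                               # paired: shared subject id
        rows.append(dict(z=x[k], group=0, subject=f"p{k}"))
        rows.append(dict(z=y[k], group=1, subject=f"p{k}"))
    for i, v in enumerate(x_star):                        # unpaired controls
        rows.append(dict(z=v, group=0, subject=f"x{i}"))
    for i, v in enumerate(y_star):                        # unpaired treatments
        rows.append(dict(z=v, group=1, subject=f"y{i}"))
    df = pd.DataFrame(rows)
    fit = smf.mixedlm("z ~ group", df, groups="subject").fit()
    return fit.pvalues["group"]                           # test of the fixed group effect
```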

  6. Scott and Berger (2003)
     Noticing the built-in penalty ("Ockham's razor effect") of the Bayesian method,
     Scott and Berger (2003) propose a Bayesian hierarchical model for multiple comparisons.
     Observe x = (x_1, \dots, x_M):
       x_j \sim N(\mu_j, \sigma^2), \qquad \gamma_j = 1 - \delta(\mu_j = 0)
       f(x \mid \sigma^2, \gamma, \mu) = \prod_{j=1}^{M} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x_j - \gamma_j \mu_j)^2}{2\sigma^2} \right)
       \mu_j \sim N(0, V), \qquad \pi(V, \sigma^2) \propto (V + \sigma^2)^{-2}
       \gamma_j \sim \text{Bernoulli}(p), \qquad p \sim \text{Beta}(\alpha, \beta)
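
     As an illustration of the built-in penalty, here is a sketch (mine) of the inclusion probability P(γ_j = 1 | x_j) when the hyperparameters (V, σ², p) are held fixed instead of being given the priors above; μ_j is integrated out analytically, and the function name and default values are assumptions.

```python
import numpy as np
from scipy.stats import norm

def inclusion_prob(x, V=4.0, sigma2=1.0, p=0.1):
    like1 = norm.pdf(x, loc=0.0, scale=np.sqrt(sigma2 + V))  # marginal if gamma_j = 1
    like0 = norm.pdf(x, loc=0.0, scale=np.sqrt(sigma2))      # marginal if gamma_j = 0
    return p * like1 / (p * like1 + (1 - p) * like0)
```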

  7. EBarrays Method
     Kendziorski et al. (2004) propose a parametric empirical Bayes method to account for
     replicated arrays (and multiple conditions as well). Observe x = (x_1, \dots, x_J),
     where x_j = (x_{j1}, x_{j2}, \dots, x_{jI}).
     ◮ If gene j is not differentially expressed (\delta_j = 0),
       f_0(x_j) = \int \prod_{i=1}^{I} f_{\mathrm{obs}}(x_{ji} \mid \mu)\, \pi(\mu)\, d\mu.
     ◮ If gene j is differentially expressed (\delta_j \neq 0), with x_j split by condition into
       x_j^{(1)} and x_j^{(2)},
       f_1(x_j) = f_0(x_j^{(1)})\, f_0(x_j^{(2)}).
     ◮ The data are marginally distributed as p\, f_1(x_j) + (1 - p)\, f_0(x_j).
     ◮ By Bayes' rule, the posterior probability that \delta_j \neq 0 is
       \frac{p\, f_1(x_j)}{p\, f_1(x_j) + (1 - p)\, f_0(x_j)}.
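
     A sketch (mine) of this posterior probability under a normal observation model with a N(0, τ²) prior on μ, which makes f_0 available in closed form; the actual EBarrays software uses Gamma-Gamma or lognormal-normal families, so treat this only as an illustration of the Bayes-rule computation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_f0(x, sigma2=1.0, tau2=100.0):
    # marginal of replicates sharing one mu, after integrating mu out analytically
    n = len(x)
    cov = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
    return multivariate_normal.logpdf(x, mean=np.zeros(n), cov=cov)

def posterior_de_prob(x_cond1, x_cond2, p=0.01, sigma2=1.0, tau2=100.0):
    log_all = log_f0(np.concatenate([x_cond1, x_cond2]), sigma2, tau2)   # f0: one mean
    log_split = log_f0(x_cond1, sigma2, tau2) + log_f0(x_cond2, sigma2, tau2)  # f1: two means
    num = p * np.exp(log_split)
    return num / (num + (1 - p) * np.exp(log_all))
```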

  8. Mixture Prior
     ◮ Our primary interest is H_0: \delta_j = 0.
     ◮ We propose a mixture prior for \delta_j:
       \pi(\delta_j \mid p, \tau^2) = p\, \phi(\delta_j / \tau) + (1 - p)\, I_{\{0\}}(\delta_j),
       where p is the probability of being differentially expressed.
     ◮ \gamma_j: latent variable, set to 1 if the j-th gene is differentially expressed and 0 otherwise.
       Our interest is P(\gamma_j = 1 \mid \text{Data}).
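
     A small sketch (mine) of drawing (γ_j, δ_j) from this spike-and-slab prior, with the slab taken as N(0, τ²); the function name and the default values of p and τ are assumptions.

```python
import numpy as np

def draw_mixture_prior(J=1000, p=0.01, tau=10.0, rng=np.random.default_rng(1)):
    gamma = rng.random(J) < p                                # differentially expressed with prob p
    delta = np.where(gamma, rng.normal(0.0, tau, size=J), 0.0)  # point mass at 0 otherwise
    return gamma.astype(int), delta
```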

  9. Priors and Posteriors
     ◮ Prior distributions for (\mu, \sigma^2, p, \tau^2):
       \pi(\mu_j) \propto 1
       \pi(\tau^2 \mid \sigma^2) \propto \frac{1}{\sigma^2} \left( 1 + \frac{\tau^2}{\sigma^2} \right)^{-2}
       \pi(p) \propto p^{\alpha - 1} (1 - p)^{\beta - 1}, \text{ i.e. } p \sim \text{Beta}(\alpha, \beta)
     ◮ Improper prior distributions for \rho:
       \pi_1(\rho_j) \propto \frac{1}{1 - \rho_j^2}, \qquad \pi_2(\rho_j) \propto \frac{1}{(1 - \rho_j^2)^2}
       Both \pi_1 and \pi_2 can be shown to yield proper posteriors.
       Bayarri (1981) shows that \pi_1 avoids the "Jeffreys-Lindley" paradox.

  10. Gibbs Sampling
      \Theta = (\mu, \delta, \gamma, \rho, \sigma^2, \tau^2, p), \qquad \text{Data} = (x, y, x^*, y^*).
      Closed-form full conditionals for \mu, \delta, \gamma, p, \sigma^2:
        (\mu_j \mid \Theta_{-\mu}, \text{Data}) \sim N(m_j^{(\mu)}, \sigma_j^{(\mu)})
        (\gamma_j \mid \Theta_{-(\gamma \cup \delta)}, \text{Data}) \sim \text{Bernoulli}(p_j^{(\gamma)})
        (\delta_j \mid \Theta_{-\delta}, \text{Data}) \sim N(m_j^{(\delta)}, \sigma_j^{(\delta)})
        (p \mid \Theta_{-p}, \text{Data}) \sim \text{Beta}\!\left( \alpha + \sum_{j=1}^{J} \gamma_j,\; \beta + J - \sum_{j=1}^{J} \gamma_j \right)
        (\sigma^2 \mid \Theta_{-\sigma^2}, \text{Data}) \sim \text{IG}\!\left( J\left(n + \tfrac{n_1 + n_2}{2}\right),\; \eta \right)
      No closed forms for (\tau^2, \rho).
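
      A structural sketch (mine) of such a sampler: the Beta draw for p matches the closed-form conditional above, while τ² and the ρ_j, which lack closed forms, would be updated by generic random-walk Metropolis steps against their log full conditionals. The gene-specific updates for μ, δ, γ, σ² are omitted, and log_cond is a placeholder the reader would supply (returning -inf outside the parameter's support).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_p(gamma, alpha=1.0, beta=1.0):
    # closed-form Beta full conditional for p given the latent indicators gamma_j
    J = len(gamma)
    return rng.beta(alpha + gamma.sum(), beta + J - gamma.sum())

def metropolis_step(current, log_cond, scale=0.1):
    # one random-walk Metropolis update for a parameter without a closed-form conditional
    proposal = current + rng.normal(0.0, scale)
    log_ratio = log_cond(proposal) - log_cond(current)
    return proposal if np.log(rng.random()) < log_ratio else current

# One sweep of the sampler (mu, delta, gamma, sigma^2 updates omitted):
#   p    = sample_p(gamma)
#   tau2 = metropolis_step(tau2, log_cond=log_tau2_conditional)
#   rho[j] = metropolis_step(rho[j], log_cond=log_rho_conditional_j)
```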

  11. Simulation Study with Normal Distributions
      Simulate data with 1000 genes; for each gene, 5 paired, 2 unpaired control, and 2 unpaired
      treatment samples, with \rho = 0.1, \tau^2 = 100, \sigma^2 = 1.0, p = 0.01.

                                 False Positive    False Negative
      FDR - T test               0/9               2/991
      FDR - random effect        1/11              1/989
      Bayesian Model             0/10              1/990

      Figure: Posterior distribution of p (true value 0.01).
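
      For clarity on how the table can be read, a sketch (mine) that tallies each entry as (errors)/(calls in that column), i.e. false positives out of all rejections and false negatives out of all non-rejections; this reading is inferred from the column totals summing to 1000 genes.

```python
import numpy as np

def error_table(rejected, truly_de):
    rejected = np.asarray(rejected, dtype=bool)
    truly_de = np.asarray(truly_de, dtype=bool)
    fp = np.sum(rejected & ~truly_de)      # false positives among rejections
    fn = np.sum(~rejected & truly_de)      # false negatives among non-rejections
    return f"{fp}/{rejected.sum()}", f"{fn}/{(~rejected).sum()}"
```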

  12. Simulation Study with Normal Distributions (Cont.)
      Figure: Posterior distributions for the \delta's. Panels: Delta_325 (true = -12.68011, P = 1),
      Delta_666 (true = -3.878074, P = 0.081), Delta_84 (true = 0, P = 0).

  13. Simulation Study with t Distributions
      Simulate data with 1000 genes and 9 samples (5 paired, 2 unpaired control, 2 unpaired treatment):
        Data \sim bivariate t_4 with mean (\mu, \mu + \delta) and
        \Sigma = \begin{pmatrix} 1 & 0.1 \\ 0.1 & 1 \end{pmatrix}
        \mu \sim U(-0.01, 0.01); \quad \delta_i \mid \delta_i \neq 0 \sim N(0, \tau^2 = 100); \quad P(\delta_i \neq 0) = 0.01.

                                 False Positive    False Negative
      FDR - T test               1/7               3/993
      FDR - random effect        6/13              2/987
      Bayesian Model             6/13              2/987
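
      A sketch (mine) of generating one gene's paired samples from the bivariate t_4 above via the standard normal / chi-square construction; the function name, seed, and argument defaults are assumptions.

```python
import numpy as np

def bivariate_t4(n, mu, delta, rng=np.random.default_rng(2)):
    mean = np.array([mu, mu + delta])
    sigma = np.array([[1.0, 0.1], [0.1, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), sigma, size=n)   # N(0, Sigma) draws
    w = rng.chisquare(df=4, size=(n, 1)) / 4.0                # chi^2_4 / 4 scaling
    return mean + z / np.sqrt(w)                              # t_4 draws with 0.1 correlation
```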

  14. Simulation Study with t Distributions (Cont.)
      Figure: Posterior distribution of p (true value 0.01).

  15. Simulation Study with t Distributions (Cont.)
      Figure: Posterior distributions for the \delta's. Panels: Delta_6 (true = -8.308207, P = 1),
      Delta_390 (true = -4.34904, P = 0.855), Delta_401 (true = 0, P = 0.178).

  16. Future Work
      ◮ Apply the method to gene expression data.
      ◮ Use different p to achieve different thresholding.
      ◮ EBarrays method with random effects e_k. Observe X_{jk} = (X_{jk1}, X_{jk2}) for gene j and sample k.
        - If gene j is not differentially expressed,
          f_0(X_{jk}) = \int\!\!\int \prod_i \prod_k f_{\mathrm{obs}}(X_{jki} \mid \mu + e_k)\, \pi(\mu)\, \pi(e_k)\, de_k\, d\mu
        - If gene j is differentially expressed,
          f_1(X_{jk}) = \int\!\!\int \prod_i \prod_k f_{\mathrm{obs}}(X_{jki} \mid \mu_i + e_k)\, \pi(\mu_i)\, \pi(e_k)\, de_k\, d\mu_i
