A Bayesian Method for Partially Paired High Dimensional Data
Fei Liu, Feng Liang, Woncheol Jang
Institute of Statistics and Decision Sciences, Duke University
SAMSI, Summer 2006
Outline
◮ Bayesian methods have been developed for paired high dimensional data such as gene expression data.
◮ For partially paired data, however, excluding the unpaired observations from the analysis may lead to a significant loss of information.
◮ Test statistics combined with FDR control are one possible solution.
◮ We propose a generalized Bayesian method for partially paired high dimensional data.
Statistical Model
◮ The data for the j-th gene are arranged as:
  paired: $X_{1j}, \ldots, X_{nj}$ and $Y_{1j}, \ldots, Y_{nj}$;
  unpaired: $X^*_{1j}, \ldots, X^*_{n_1 j}$ and $Y^*_{1j}, \ldots, Y^*_{n_2 j}$.
  $(X_{ij}, Y_{ij})$: paired gene expressions; $X^*_{ij}$, $Y^*_{ij}$: unpaired observations.
◮ For the paired data,
  $$ (X_{ij}, Y_{ij})^T \sim N(\boldsymbol{\mu}_j, \Sigma_j), \quad
     \boldsymbol{\mu}_j = \begin{pmatrix} \mu_j \\ \mu_j + \delta_j \end{pmatrix}, \quad
     \Sigma_j = \begin{pmatrix} \sigma_1^2 & \rho_j \sigma_1 \sigma_2 \\ \rho_j \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}. $$
◮ For the unpaired data,
  $$ X^*_{ij} \sim N(\mu_j, \sigma_1^2), \qquad Y^*_{ij} \sim N(\mu_j + \delta_j, \sigma_2^2). $$
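Below is a minimal simulation sketch of this data layout for a single gene. The function name `simulate_gene` and the default sample sizes are illustrative choices, not part of the model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_gene(mu, delta, sigma1, sigma2, rho, n=5, n1=2, n2=2):
    """Simulate one gene: n paired (X, Y) pairs, n1 unpaired X*, n2 unpaired Y*."""
    mean = np.array([mu, mu + delta])
    cov = np.array([[sigma1**2,             rho * sigma1 * sigma2],
                    [rho * sigma1 * sigma2, sigma2**2]])
    paired = rng.multivariate_normal(mean, cov, size=n)    # rows are (X_ij, Y_ij)
    x_star = rng.normal(mu, sigma1, size=n1)               # unpaired controls X*_ij
    y_star = rng.normal(mu + delta, sigma2, size=n2)       # unpaired treatments Y*_ij
    return paired, x_star, y_star

paired, x_star, y_star = simulate_gene(mu=0.0, delta=3.0, sigma1=1.0,
                                       sigma2=1.0, rho=0.1)
```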
Review of FDR Control

               Accept                Reject                 Total
True null      U                     V (false positives)    m_0
Untrue null    T (false negatives)   S                      m - m_0
Total          W                     R                      m

$$ \mathrm{FDR} = E(V/R \mid R \neq 0) $$

Benjamini and Hochberg (1995) procedure to control FDR at level $q^*$: order the hypotheses by their p-values,
$$ H_1, H_2, \ldots, H_m \;\Longrightarrow\; H_{(1)}, H_{(2)}, \ldots, H_{(m)}, \qquad
   p_1, p_2, \ldots, p_m \;\Longrightarrow\; p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}, $$
then let $k = \max\{\, i : p_{(j)} \le \tfrac{j}{m} q^* \text{ for all } j \le i \,\}$ and reject $H_{(1)}, \ldots, H_{(k)}$.
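A short sketch implementing the rejection rule exactly as stated above; the function name `bh_reject` is made up for the example.

```python
import numpy as np

def bh_reject(pvals, q_star):
    """Reject H_(1), ..., H_(k) with k = max{ i : p_(j) <= (j/m) q* for all j <= i }."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                                   # p_(1) <= ... <= p_(m)
    ok = p[order] <= (np.arange(1, m + 1) / m) * q_star     # p_(j) <= (j/m) q* ?
    holds_for_all = np.cumprod(ok).astype(bool)             # true while every j <= i passes
    reject = np.zeros(m, dtype=bool)
    if holds_for_all.any():
        k = np.flatnonzero(holds_for_all)[-1] + 1
        reject[order[:k]] = True                            # reject the k smallest p-values
    return reject

# Example usage: reject = bh_reject(pvalues, q_star=0.05)
```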
Test Statistics for Partially Paired Data in One-Dimensional Space
◮ Lin and Stivers (1974) propose the following test statistic for small $n$ and $|\rho| \le 0.5$:
  $$ T = \frac{\bar{x}_1^{(n+n_1)} - \bar{x}_2^{(n+n_2)}}
            {\sqrt{\left(\dfrac{1}{n+n_1} + \dfrac{1}{n+n_2} - \dfrac{2nr}{(n+n_1)(n+n_2)}\right)(a^*_{11} + b_{22})/(N-2)}}, $$
  where $T \sim t_{N-4}$ approximately and $N = n + n_1 + n_2$.
◮ Another test statistic comes from the mixed effect model (see the sketch below):
  $$ z_{ik} = \mu + \alpha_i + \beta_k + \epsilon_{ik}, \qquad
     \beta_k \sim N(0, \sigma^2_\beta), \quad \epsilon_{ik} \sim N(0, \sigma^2_\epsilon). $$
  Perform ANOVA to test the fixed effect $\alpha_i = 0$.
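For the mixed-effect approach, here is a rough sketch of how the model could be fit for a single gene with statsmodels. The simulated data frame, its column names, and the effect sizes are all illustrative assumptions; the slides do not specify an implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Long-format data for one gene: z_ik indexed by condition i (fixed effect alpha_i)
# and sample k (random effect beta_k). Paired samples appear under both conditions,
# unpaired samples under only one.
rows = []
for k in range(5):                                   # 5 paired samples
    beta_k = rng.normal()
    rows.append(("control",   f"s{k}", beta_k + rng.normal()))
    rows.append(("treatment", f"s{k}", beta_k + 2.0 + rng.normal()))
for k in range(5, 7):                                # 2 unpaired controls
    rows.append(("control",   f"s{k}", rng.normal() + rng.normal()))
for k in range(7, 9):                                # 2 unpaired treatments
    rows.append(("treatment", f"s{k}", rng.normal() + 2.0 + rng.normal()))
df = pd.DataFrame(rows, columns=["condition", "sample", "z"])

# Random intercept per sample; the Wald test for the 'condition' coefficient in the
# summary corresponds to testing the fixed treatment effect alpha_i = 0.
fit = smf.mixedlm("z ~ condition", df, groups=df["sample"]).fit()
print(fit.summary())
```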
Scott and Berger (2003)
Noticing the built-in penalty ("Ockham's razor" effect) of the Bayesian method, Scott and Berger (2003) propose a Bayesian hierarchical model for multiple comparisons. Observe $x = (x_1, \ldots, x_M)$:
$$ x_j \sim N(\mu_j, \sigma^2), \qquad \gamma_j = 1 - \delta(\mu_j = 0), $$
$$ f(x \mid \sigma^2, \gamma, \mu) = \prod_{j=1}^M \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\left\{ -\frac{(x_j - \gamma_j \mu_j)^2}{2\sigma^2} \right\}, $$
$$ \mu_j \sim N(0, V), \qquad \pi(V, \sigma^2) \propto (V + \sigma^2)^{-2}, $$
$$ \gamma_j \sim \mathrm{Bernoulli}(p), \qquad p \sim \mathrm{Beta}(\alpha, \beta). $$
EBarrays Method
Kendziorski et al. (2004) propose a parametric empirical Bayes method to account for replicated arrays (and multiple conditions). Observe $x = (x_1, \ldots, x_J)$, where $x_j = (x_{j1}, x_{j2}, \ldots, x_{jI})$.
◮ If gene $j$ is not differentially expressed ($\delta_j = 0$),
  $$ f_0(x_j) = \int \left( \prod_{i=1}^I f_{\mathrm{obs}}(x_{ji} \mid \mu) \right) \pi(\mu) \, d\mu. $$
◮ If gene $j$ is differentially expressed ($\delta_j \neq 0$),
  $$ f_1(x_j) = f_0(x_{j1}) f_0(x_{j2}), $$
  where $x_{j1}$ and $x_{j2}$ denote the observations from the two conditions.
◮ Marginally, the data are distributed as $p f_1(x_j) + (1 - p) f_0(x_j)$.
◮ By Bayes' rule, the posterior probability of $\delta_j \neq 0$ is (see the numerical sketch below)
  $$ \frac{p f_1(x_j)}{p f_1(x_j) + (1 - p) f_0(x_j)}. $$
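A schematic numerical version of this posterior probability, assuming, purely for illustration, a normal observation model $f_{\mathrm{obs}}(x \mid \mu) = N(\mu, \sigma_0^2)$ and a normal prior $\pi(\mu) = N(0, \tau_0^2)$; these are not the parametric families actually used in EBarrays, and the hyperparameter values are made up.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

sigma0, tau0, p = 1.0, 3.0, 0.1     # illustrative (not EBarrays') hyperparameters

def f0(x):
    """Marginal density of replicates x under one common mean mu ~ N(0, tau0^2)."""
    def integrand(mu):
        return np.prod(stats.norm.pdf(x, loc=mu, scale=sigma0)) * \
               stats.norm.pdf(mu, loc=0.0, scale=tau0)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

def posterior_prob_de(x1, x2):
    """P(delta_j != 0 | x_j) = p f1 / (p f1 + (1 - p) f0), with f1 = f0(x1) f0(x2)."""
    f1 = f0(x1) * f0(x2)
    f0_all = f0(np.concatenate([x1, x2]))
    return p * f1 / (p * f1 + (1 - p) * f0_all)

print(posterior_prob_de(np.array([0.1, -0.2, 0.3]), np.array([2.1, 1.8, 2.4])))
```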
Mixture Prior
◮ Our primary interest is testing $H_0: \delta_j = 0$.
◮ We propose a mixture distribution for $\delta_j$ (a normal component plus a point mass at zero):
  $$ \pi(\delta_j \mid p, \tau^2) = p \, \frac{1}{\tau}\phi(\delta_j/\tau) + (1 - p) \, I_{\{0\}}(\delta_j), $$
  where $p$ is the probability of being differentially expressed.
◮ $\gamma_j$: latent variable, equal to 1 if the $j$-th gene is differentially expressed and 0 otherwise. The quantity of interest is $P(\gamma_j = 1 \mid \text{Data})$.
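A small sketch of drawing $(\gamma_j, \delta_j)$ from this spike-and-slab prior; the values of $J$, $p$, and $\tau$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
J, p, tau = 1000, 0.01, 10.0                 # illustrative values (tau^2 = 100)

gamma = rng.binomial(1, p, size=J)           # gamma_j = 1: differentially expressed
delta = np.where(gamma == 1,
                 rng.normal(0.0, tau, size=J),   # slab: N(0, tau^2)
                 0.0)                            # spike: point mass at zero
```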
Priors and Posteriors
◮ Prior distributions for $(\mu, \sigma^2, p, \tau^2)$:
  $$ \pi(\mu_j) \propto 1, \qquad
     \pi(\tau^2 \mid \sigma^2) \propto \frac{1}{\sigma^2}\left(1 + \frac{\tau^2}{\sigma^2}\right)^{-2}, \qquad
     \pi(p) \propto p^{\alpha-1}(1-p)^{\beta-1} \equiv \mathrm{Beta}(\alpha, \beta). $$
◮ Improper prior distributions for $\rho$:
  $$ \pi_1(\rho_j) \propto \frac{1}{1 - \rho_j^2}, \qquad
     \pi_2(\rho_j) \propto \frac{1}{(1 - \rho_j^2)^2}. $$
  Both $\pi_1$ and $\pi_2$ can be shown to yield proper posteriors. Bayarri (1981) shows that $\pi_1$ avoids the Jeffreys-Lindley paradox.
Gibbs Sampling
$\Theta = (\mu, \delta, \gamma, \rho, \sigma^2, \tau^2, p)$, $\text{Data} = (x, y, x^*, y^*)$.
Closed-form full conditionals are available for $\mu$, $\delta$, $\gamma$, $p$, $\sigma^2$:
$$ (\mu_j \mid \Theta_{-\mu}, \text{Data}) \sim N(m_j^{(\mu)}, \sigma_j^{(\mu)}) $$
$$ (\gamma_j \mid \Theta_{-(\gamma \cup \delta)}, \text{Data}) \sim \mathrm{Bernoulli}(p_j^{(\gamma)}) $$
$$ (\delta_j \mid \Theta_{-\delta}, \text{Data}) \sim N(m_j^{(\delta)}, \sigma_j^{(\delta)}) $$
$$ (p \mid \Theta_{-p}, \text{Data}) \sim \mathrm{Beta}\Big(\alpha + \sum_{j=1}^J \gamma_j, \; \beta + J - \sum_{j=1}^J \gamma_j\Big) $$
$$ (\sigma^2 \mid \Theta_{-\sigma^2}, \text{Data}) \sim \mathrm{IG}\Big(J\big(n + \tfrac{n_1 + n_2}{2}\big), \; \eta\Big) $$
No closed forms are available for $(\tau^2, \rho)$; see the Metropolis-within-Gibbs sketch below.
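Since $\rho_j$ has no closed-form conditional, one possible way to update it is a random-walk Metropolis step inside each Gibbs sweep, conditioning on the current values of $\mu_j$, $\delta_j$, $\sigma_1$, $\sigma_2$ and the paired data. The sketch below assumes the prior $\pi_1(\rho) \propto 1/(1-\rho^2)$; the helper names and step size are hypothetical and the exact conditionals are not taken from the slides.

```python
import numpy as np
from scipy import stats

def log_cond_rho(rho, x, y, mu, delta, s1, s2):
    """Log full conditional of rho_j up to a constant: bivariate normal likelihood
    of the paired data times the prior pi_1(rho) propto 1 / (1 - rho^2)."""
    if not -1.0 < rho < 1.0:
        return -np.inf
    cov = np.array([[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]])
    loglik = stats.multivariate_normal(mean=[mu, mu + delta], cov=cov) \
                  .logpdf(np.column_stack([x, y])).sum()
    return loglik - np.log(1.0 - rho**2)

def update_rho(rho, x, y, mu, delta, s1, s2, step=0.1, rng=None):
    """One random-walk Metropolis update for rho_j inside the Gibbs sweep."""
    rng = rng or np.random.default_rng()
    prop = rho + step * rng.normal()
    log_accept = (log_cond_rho(prop, x, y, mu, delta, s1, s2)
                  - log_cond_rho(rho, x, y, mu, delta, s1, s2))
    return prop if np.log(rng.uniform()) < log_accept else rho
```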
Simulation Study with Normal Distributions
Simulate data for 1000 genes with 5 paired samples, 2 unpaired controls, 2 unpaired treatments, and $\rho = 0.1$, $\tau^2 = 100$, $\sigma^2 = 1.0$, $p = 0.01$.

                      False Positive    False Negative
FDR - t test          0/9               2/991
FDR - random effect   1/11              1/989
Bayesian Model        0/10              1/990

Figure: Posterior distribution of p (true value 0.01).
Simulation Study with Normal Distributions (Cont...)
Figure: Posterior distributions of selected $\delta_j$'s:
$\delta_{325}$ (true = -12.68011, P = 1), $\delta_{666}$ (true = -3.878074, P = 0.081), $\delta_{84}$ (true = 0, P = 0).
Simulation Study with t Distributions
Simulate data for 1000 genes with 9 samples each (5 paired, 2 unpaired controls, 2 unpaired treatments):
$$ \text{Data} = \text{bivariate } t_4 \text{ with mean } 0 \text{ and } \Sigma = \begin{pmatrix} 1 & 0.1 \\ 0.1 & 1 \end{pmatrix}, \text{ shifted by } (\mu, \mu + \delta), $$
$$ \mu \sim U(-0.01, 0.01); \qquad \delta_i \mid \delta_i \neq 0 \sim N(0, \tau^2 = 100); \qquad P(\delta_i \neq 0) = 0.01. $$

                      False Positive    False Negative
FDR - t test          1/7               3/993
FDR - random effect   6/13              2/987
Bayesian Model        6/13              2/987
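One possible way to generate data matching this setup is sketched below. It assumes the unpaired observations follow the univariate $t_4$ marginals of the bivariate distribution (the slide does not spell this out), and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
J = 1000                                                   # genes
Sigma = np.array([[1.0, 0.1], [0.1, 1.0]])

mu = rng.uniform(-0.01, 0.01, size=J)                      # baseline means
de = rng.random(J) < 0.01                                  # P(delta_j != 0) = 0.01
delta = np.where(de, rng.normal(0.0, 10.0, size=J), 0.0)   # slab sd = tau = 10

# 5 paired (X, Y) observations per gene: bivariate t_4 noise shifted by (mu_j, mu_j + delta_j)
paired = np.empty((J, 5, 2))
for j in range(J):
    noise = stats.multivariate_t(loc=[0.0, 0.0], shape=Sigma, df=4).rvs(
        size=5, random_state=rng)
    paired[j] = noise + np.array([mu[j], mu[j] + delta[j]])

# Unpaired observations: marginals of the bivariate t_4 are univariate t_4
x_star = stats.t(df=4).rvs(size=(J, 2), random_state=rng) + mu[:, None]
y_star = stats.t(df=4).rvs(size=(J, 2), random_state=rng) + (mu + delta)[:, None]
```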
Simulation Study with t Distributions (Cont...)
Figure: Posterior distribution of p (true value 0.01).
Simulation Study with t Distributions (Cont...)
Figure: Posterior distributions of selected $\delta_j$'s:
$\delta_{6}$ (true = -8.308207, P = 1), $\delta_{390}$ (true = -4.34904, P = 0.855), $\delta_{401}$ (true = 0, P = 0.178).
Future Work
◮ Apply the method to real gene expression data.
◮ Use different values of p to achieve different thresholds.
◮ Extend the EBarrays method with a random effect $e_k$. Observe $X_{jk} = (X_{jk1}, X_{jk2})$ for gene $j$ and sample $k$.
  - If gene $j$ is not differentially expressed,
    $$ f_0(X_{jk}) = \int\!\!\int \prod_i \prod_k f_{\mathrm{obs}}(X_{jki} \mid \mu + e_k) \, \pi(\mu) \, \pi(e_k) \, de_k \, d\mu. $$
  - If gene $j$ is differentially expressed,
    $$ f_1(X_{jk}) = \int\!\!\int \prod_i \prod_k f_{\mathrm{obs}}(X_{jki} \mid \mu_i + e_k) \, \pi(\mu_i) \, \pi(e_k) \, de_k \, d\mu_i. $$