Background Gene Expression Analysis Replicated Microarray Analysis CSci 8980: Advanced Topics in Graphical Models Application: Gene Expression Analysis Instructor: Arindam Banerjee November 20, 2007

Microarray Technology

Microarray Data
$T$ gene expression profiles under $M$ conditions
$x_i$ is the profile for the $i$-th gene
$x_{im}$ is the value for gene $i$, condition $m$
Each profile is assumed to be generated by one cluster
Total of $Q$ clusters; later $Q \to \infty$
Clustering is an assignment $(c_1, \ldots, c_T)$
Each $c_i \in \{1, \ldots, Q\}$
Want the posterior probability over assignments
Idea: use a Bayesian infinite mixture model

Infinite Mixture Model for Gene Expression
Level 1: Data generation
    $p(x_i \mid c_i = j, \{\mu_h, \sigma_h^2\}_{h=1}^{Q}) = N(x_i; \mu_j, \sigma_j^2 I)$
Level 2: Priors for parameters
    $p(\mu_j \mid \lambda, r) = N(\mu_j; \lambda, r^{-1} I)$
    $p(\sigma_j^{-2} \mid \beta, w) = f_G(\sigma_j^{-2}; \beta/2, \beta w/2)$
Level 2: Prior for clustering
    $c_i \sim \mathrm{Discrete}(\pi)$
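
The two levels above can be read as a recipe for generating data. The following is a minimal sketch of that recipe for a finite $Q$, not the lecture's own code. The function name `sample_finite_mixture` and its arguments are hypothetical, the mixing weights `pi` are passed in as given, and the Gamma density $f_G(\cdot; a, b)$ is assumed to use a shape $a$ and a rate $b$ (the slides do not say which convention is meant).

```python
import numpy as np

def sample_finite_mixture(T, M, pi, lam, r, beta, w, seed=0):
    """Draw T profiles of length M from the finite-Q mixture (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pi = np.asarray(pi, dtype=float)
    Q = pi.size
    # Level 2: cluster means mu_j ~ N(lam, (1/r) I).
    mu = lam + rng.standard_normal((Q, M)) / np.sqrt(r)
    # Level 2: precisions sigma_j^{-2} ~ Gamma(shape=beta/2, rate=beta*w/2).
    # ASSUMPTION: second Gamma argument is a rate; NumPy takes scale = 1/rate.
    precision = rng.gamma(shape=beta / 2.0, scale=2.0 / (beta * w), size=Q)
    sigma = 1.0 / np.sqrt(precision)
    # Level 2: cluster assignments c_i ~ Discrete(pi).
    c = rng.choice(Q, size=T, p=pi)
    # Level 1: x_i ~ N(mu_{c_i}, sigma_{c_i}^2 I).
    x = mu[c] + sigma[c, None] * rng.standard_normal((T, M))
    return x, c, mu, sigma

x, c, mu, sigma = sample_finite_mixture(
    T=50, M=4, pi=[0.5, 0.3, 0.2], lam=0.0, r=1.0, beta=2.0, w=1.0
)
print(x.shape, np.bincount(c, minlength=3))
```

Each row of `x` is one simulated profile $x_i$, and `c` holds the cluster label that generated it.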

Infinite Mixture Model for Gene Expression (Contd.)
Prior for hyper-parameters
    $p(w \mid \sigma_x^2) = f_G(w; 1/2, 1/(2\sigma_x^2))$
    $p(\beta) = f_G(\beta; 1/2, 1/2)$
    $p(r \mid \sigma_x^2) = f_G(r; 1/2, 1/(2\sigma_x^2))$
    $p(\lambda \mid \mu_x, \sigma_x^2) = f_N(\lambda \mid \mu_x, \sigma_x^2 I)$
$(\mu_x, \sigma_x^2)$ are the empirical mean and variance of the data
Prior on the cluster-prior $\pi$
    $\pi \sim \mathrm{Dirichlet}(\alpha/Q, \ldots, \alpha/Q)$
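
A standard consequence of the symmetric Dirichlet prior is that $\pi$ can be integrated out, giving $p(c_i = j \mid c_{-i}) = (n_{-i,j} + \alpha/Q)/(T - 1 + \alpha)$, where $n_{-i,j}$ counts the other genes in cluster $j$. The sketch below evaluates this for growing $Q$ to illustrate what happens in the limit. The function name `assignment_prior` is hypothetical, and this is an illustration of the textbook result rather than anything taken from the slides.

```python
import numpy as np

def assignment_prior(counts_minus_i, Q, alpha):
    """Prior p(c_i = j | c_-i) with pi integrated out, for a finite number Q of clusters.

    counts_minus_i holds n_{-i,j} for the occupied clusters only.
    Returns the probability of each occupied cluster and the total
    probability of all currently empty clusters combined.
    """
    counts = np.asarray(counts_minus_i, dtype=float)
    n_others = counts.sum()  # this is T - 1
    occupied = (counts + alpha / Q) / (n_others + alpha)
    n_empty = Q - counts.size
    empty_total = n_empty * (alpha / Q) / (n_others + alpha)
    return occupied, empty_total

for Q in (10, 1000, 10**6):
    occupied, empty_total = assignment_prior([3, 2], Q=Q, alpha=1.0)
    print(Q, occupied, empty_total)
```

As `Q` grows, the occupied-cluster probabilities approach $n_{-i,j}/(T - 1 + \alpha)$ and the combined mass on empty clusters approaches $\alpha/(T - 1 + \alpha)$.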

Gibbs Sampler
As $Q \to \infty$, we get a Dirichlet process mixture (DPM)
Gibbs sampler for the model
    $p(c_i = j \mid c_{-i}, x_i, \mu_j, \sigma_j^2) \propto \frac{n_{-i,j}}{T - 1 + \alpha} N(x_i; \mu_j, \sigma_j^2 I)$
    $p(c_i \neq c_j, j \neq i \mid c_{-i}, x_i, \mu_x, \sigma_x^2) \propto \frac{\alpha}{T - 1 + \alpha} \int N(x_i \mid \mu, \sigma^2 I)\, p(\mu, \sigma^2 \mid \ldots)\, \ldots$
Pairwise probability of being generated by the same pattern
    $P_{ij} = \frac{\#\,\text{samples after `burn-in' with } c_i = c_j}{S - B}$
Distance
    $D_{ij} = 1 - P_{ij}$
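
Two pieces of this slide are easy to sketch in code: the unnormalised weight for joining an existing cluster, and the pairwise summary computed from the sampler's output. The sketch below makes some assumptions. The function names are hypothetical, `samples` is taken to be an array with one row of assignments $(c_1, \ldots, c_T)$ per Gibbs sweep, and $S$ and $B$ are read as the total number of sweeps and the number discarded as burn-in. The new-cluster term is omitted because its integrand is not fully given above.

```python
import numpy as np

def existing_cluster_log_weights(x_i, mus, sigma2s, counts_minus_i, alpha):
    """Log of n_{-i,j} / (T - 1 + alpha) * N(x_i; mu_j, sigma_j^2 I) for each cluster j."""
    x_i = np.asarray(x_i, dtype=float)                # shape (M,)
    mus = np.asarray(mus, dtype=float)                # shape (K, M)
    sigma2s = np.asarray(sigma2s, dtype=float)        # shape (K,)
    counts = np.asarray(counts_minus_i, dtype=float)  # shape (K,), all > 0
    M = x_i.size
    sq_dist = ((x_i - mus) ** 2).sum(axis=1)
    # Log density of an isotropic Gaussian in M dimensions.
    log_normal = -0.5 * (M * np.log(2.0 * np.pi * sigma2s) + sq_dist / sigma2s)
    return np.log(counts) - np.log(counts.sum() + alpha) + log_normal

def coclustering_distance(samples, burn_in):
    """Return (P, D): P[i, j] is the fraction of kept sweeps with c_i == c_j, D = 1 - P."""
    kept = np.asarray(samples)[burn_in:]              # shape (S - B, T)
    same = kept[:, :, None] == kept[:, None, :]       # shape (S - B, T, T)
    P = same.mean(axis=0)
    return P, 1.0 - P

samples = [[0, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
P, D = coclustering_distance(samples, burn_in=1)
print(P)
print(D)
```

Because `P` depends only on whether two labels agree, it is unaffected by the arbitrary relabelling of clusters between sweeps. The matrix `D` can then be passed to any distance-based clustering routine.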

Model for Replicated Data
Generate replicates of each expression profile
Account for variability in gene expression data
For $G$ replicates, $k = 1, \ldots, G$
    $p(x_{ik} \mid y_i, \psi_i) = N(x_{ik}; y_i, \psi_i^2 I)$
    $p(y_i \mid c_i = j, \mu_j, \sigma_j) = N(y_i; \mu_j, \sigma_j^2 I)$
Integrating out the mean expression profile $y_i$ (proportionality is in $\mu_j, \sigma_j^2$)
    $p(x_{i1}, \ldots, x_{iG} \mid c_i = j, \mu_j, \sigma_j^2, \psi_i) \propto N(\bar{x}_i; \mu_j, (\sigma_j^2 + \psi_i^2/G) I)$, where $\bar{x}_i = \frac{1}{G} \sum_k x_{ik}$
Gibbs sampler used for inference
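
The variance of the replicate mean can be checked by simulation: averaging $G$ replicates leaves the between-gene variance $\sigma_j^2$ untouched and shrinks the replicate noise to $\psi_i^2/G$. The sketch below is a simplified check under stated assumptions, not the lecture's procedure. The function name is hypothetical, and a single cluster with one shared $\psi^2$ is used so that the empirical variance can be compared with $\sigma^2 + \psi^2/G$.

```python
import numpy as np

def simulate_replicate_means(n, G, mu, sigma2, psi2, seed=0):
    """Simulate n genes from one cluster with G replicates each; return the replicate means."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    M = mu.size
    # Mean expression profile: y_i ~ N(mu, sigma2 * I).
    y = mu + np.sqrt(sigma2) * rng.standard_normal((n, M))
    # Replicates: x_ik ~ N(y_i, psi2 * I), for k = 1..G.
    x = y[:, None, :] + np.sqrt(psi2) * rng.standard_normal((n, G, M))
    # Replicate mean x_bar_i, averaged over the replicate axis.
    return x.mean(axis=1)

xbar = simulate_replicate_means(n=200_000, G=4, mu=[0.0, 1.0], sigma2=0.5, psi2=2.0)
print("empirical variance per condition:", xbar.var(axis=0))
print("sigma2 + psi2 / G:", 0.5 + 2.0 / 4)
```

The two printed quantities should agree closely, up to Monte Carlo error.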

Experimental Results
Move to the paper for results

Results

Gibbs Sampler
For the replicated model
    $p(c_i = q \mid C_{-i}, x_{i1}, \ldots, x_{iG}, \mu_q, \sigma_q^2, \psi_i, \alpha) \propto \frac{n_{-i,q}}{T - 1 + \alpha} N(\bar{x}_i; \mu_q, (\sigma_q^2 + \psi_i^2/G) I)$
    $p(c_i \neq c_j, j \neq i \mid C_{-i}, x_{ik}, \psi_i, \alpha) \propto \frac{\alpha}{T - 1 + \alpha} \int N(\bar{x}_i; \mu, (\sigma^2 + \psi_i^2/G) I)\, p(\mu, \sigma^2 \mid \lambda, \tau, \ldots)\, \ldots$
The integral is implemented as follows