Analysing Gene Expression Data Using Gaussian Processes Lorenz Wernisch School of Crystallography Birkbeck College London – p.1/35
Overview Gene regulatory networks, microarrays Time-series analysis by linear regression Bayesian inference, Occam’s razor Extension to nonlinear models Gaussian processes Applications Filtering with Gaussian processes – p.2/35
Gene Regulatory Networks _� +� Gene expression levels depend on external stimuli and activity of genes (transcription factors) Microarrays measure the mRNA levels of genes Construction of gene networks from microarray data – p.3/35
A. thaliana : APRR family Time-series of A. APRR9 thaliana log expression levels Constant light APRR7 13 time points APRR5 every 4 hours from 26 to 74 hrs APRR3 Data by Kieron TOC1/APRR1 Edwards and Andrew Millar 26 30 34 38 42 46 50 54 58 62 66 70 74 time/hrs APRR family, possible modulators for light sensitivity of main circadian clock series – p.4/35
Networks from time-series data A A A . . . A B B B . . . B C C C C . . . t = 0 t = 1 t = 2 Static graph representing dependencies between genes has cycles Cycles unrolled in time: acyclic graph Network topology repeated over time slices – p.5/35
Linear time-series model x t = Φ x t − 1 + µ + w t x t is N -vector of RNA levels at time t (of N genes) w t is N -vector of biological noise added at t µ is N -vector of constant trend, ie constitutive expression If there is no constant trend , µ = 0 , Φ can be estimated by standard regression: Φ ′ = ( X t − 1 X ′ t − 1 ) − 1 X t − 1 X ′ t where X t and X t − 1 are N × ( T − 1) matrices with time vectors x 2 , . . . , x T and x 1 , . . . , x T − 1 as columns – p.6/35
Estimating matrix for APPR family Estimation by standard (least squares) regression: APRR9 APRR7 APRR5 APRR3 TOC1 APRR9 -0.59 -0.06 0.78 0.39 0.48 APRR7 0.56 0.35 0.34 0.29 0.21 APRR5 -0.80 0.15 -0.26 0.46 0.43 APRR3 -0.34 -0.94 -0.12 -0.13 0.05 TOC1 -0.11 -0.05 0.66 0.46 0.30 Problem : each gene connected to each other One could test for significance of nonzero parameters: problems of significance tests, significance levels, multiple testing, . . . – p.7/35
Bayesian models are simple Automatic complexity control, 0.030 Occam’s razor : data probability 0.020 Complex model covers many data sets: small probability each 0.010 Simple model few data sets: large probability each 0.000 −100 −50 0 50 100 [MacKay, Neal] data space Automatic relevance determination : assume Gaussian distribution for each matrix entry a ij with variances σ 2 ij as free parameters, integrate out a ij and maximize P ( D | model , { σ 2 ij } ) [RVMs Tipping] – p.8/35
Linear regression framework t = Φ w + ǫ Probability of data, given parameters ( likelihood ): (2 π ) N/ 2 σ N exp( −| t − Φ w | 2 1 p ( t | w, σ 2 ) = ) 2 σ 2 Gaussian prior on coefficients (weights) w : M � m exp( − α m w 2 1 α 1 / 2 m p ( w | α ) = ) (2 π ) − M/ 2 2 m =1 α m is the precision (the inverse variance 1 /σ 2 m ) – p.9/35
Maximum likelihood type II Integrating out w : (2 π ) N/ 2 | C | 1 / 2 exp( − 1 1 p ( t | α, σ 2 ) = 2 t ′ C − 1 t ) C = σ 2 I + Φ A − 1 Φ ′ Maximum likelihood estimation of hyperparameters α by maximizing p ( t | α, σ 2 ) (type II ML) brings Occam’s razor to bear Tipping et al. suggest analytical solutions for iterative optimization, optimizing for α i in turn Maximization, eg, by conjugate gradients seems to be at least as efficient – p.10/35
Sparse Bayesian estimates for APRR net APRR9 APRR7 APRR5 APRR3 TOC1 APRR9 -0.11 0.27 -0.90 -0.01 0 APRR7 0.00 0.28 0.00 -0.80 0 APRR5 0.28 0.39 0.00 0.00 0 APRR3 0.00 0.41 0.59 0.00 0 TOC1 0.00 0.37 0.52 0.00 0 Far fewer nonzero entries than in standard regression! – p.11/35
Reconstruction of APRR traces APRR9 APRR9 log expression levels APRR7 APRR7 APRR5 APRR5 APRR3 APRR3 TOC1/APRR1 TOC1/APRR1 26 30 34 38 42 46 50 54 58 62 66 70 74 26 30 34 38 42 46 50 54 58 62 66 70 74 time/hrs time/hrs Start estimated dynamics on initial conditions with 0 process noise: good agreement – p.12/35
Sparse Bayesian estimates for LHY/TOC1 net LHY TOC1 GI PIF3 LHY 0.66 0.80 -0.78 0.00 TOC1 -0.34 -0.19 0.58 -0.10 GI 0.00 -0.87 0.65 0.00 PIF3 0.00 0.00 0.22 -0.14 LHY in negative feedback with TOC1 Second negative feedback loop involving GI PIF3 just added for good measure – p.13/35
Reconstruction of LHY/TOC1 traces LHY LHY log expression levels TOC1/APRR1 TOC1/APRR1 GI GI PIF3 PIF3 26 30 34 38 42 46 50 54 58 62 66 70 74 26 30 34 38 42 46 50 54 58 62 66 70 74 time/hrs time/hrs Start estimated dynamics on initial conditions with 0 process noise – p.14/35
Nonlinear dependencies Assumed linear depencies of level of 4 gene A on other 2 gene levels G e 0 n e 10 C −2 Genes often operate 5 −4 as switches and −10 Gene B 0 −5 complex gates with 0 −5 Gene A nonlinear 5 interactions (eg −10 10 exclusive or) Need to go beyond linear models: Gaussian processes (GP) – p.15/35
Gaussian process Input values d -dimensional x = ( x 1 , . . . , x N ) , x i ∈ R d Target values t = ( t 1 , . . . , t N ) , t i ∈ R Joint distribution of the output t is multivariate Gaussian N (0 , K ) Covariance matrix K K pq = β 0 + C L ( x p , x q ) + C G ( x p , x q ) + σ 2 ǫ I ( p = q ) β 0 overall constant σ 2 ǫ noise term along diagonal of K I () indicator function – p.16/35
Covariance components Linear covariance part p B − 1 x q C L ( x p , x q ) = x ′ with linear relevance parameters B = diag ( β 1 , . . . , β d ) Squared exponential (Gaussian) covariance part C G ( x p , x q ) = α 0 exp( − 1 2( x p − x q ) ′ A − 1 ( x p − x q )) with nonlinear relevance parameters A = diag ( α 1 , . . . , α d ) and scale parameter α 0 – p.17/35
Compare with linear regression Compare linear covariance part with noise: p B − 1 x q + σ 2 C L ( x p , x q ) = x ′ ǫ I with the covariance matrix of a linear regression with weights integrated out (see above): C = Φ A − 1 Φ ′ + σ 2 ǫ I This is the same if B = diag( α 1 , . . . , α p ) = diag(1 /σ 2 1 , . . . , 1 /σ 2 p ) and the rows of Φ are the input vectors x i – p.18/35
Training of GP Covariance parameters θ MAP maximizing posterior probability: P ( θ | t, x ) ∝ P ( t | x, θ ) P ( θ ) with � � log P ( t | x, θ ) = − 1 t ′ K ( x, θ ) t − log | K ( x, θ ) |− n log 2 π 2 Lognormal prior P ( θ ) with fixed a and b log P ( θ ) = N ( θ | a, b ) Optimization with conjugate gradients (using derivatives) – p.19/35
Conditional mean and variance New input point x ∗ : � � K k ( x ∗ ) ˜ K = k ( x ∗ ) ′ k ( x ∗ , x ∗ ) where k ( x ∗ ) = ( β 0 + C L ( x ∗ , x q ) + C G ( x ∗ , x q )) N q =1 k ( x ∗ , x ∗ ) = β 0 + x ∗′ B − 1 x ∗ + α 0 + σ 2 ǫ f ( x ∗ ) is Gaussian N ( µ ( x ∗ ) , σ 2 ( x ∗ )) µ ( x ∗ ) = k ( x ∗ ) ′ K − 1 t σ 2 ( x ∗ ) = k ( x ∗ , x ∗ ) − k ( x ∗ ) ′ K − 1 k ( x ∗ ) – p.20/35
GP on simulated static data 12 21 10 18 Relevance parameters: 19 3 26 5 7 23 16 10 x 1 x 2 x 3 4 30 28 17 14 6 2 15 11 0 z nonlinear 0.21 0 0 22 8 20 1 9 linear 0 0.35 0 −5 13 27 29 10 −10 5 estimated sd 0.92 −10 5 −5 25 0 y 0 24 x −5 5 −10 10 30 data points with f ( x 1 , x 2 , x 3 ) = 5 sin(0 . 7 x 1 ) + 0 . 5 x 2 + ǫ where ǫ ∼ N (0 , 1) – p.21/35
GP on simulated time-series data Artificial network of 3 8 variables connected by nonlinear relationships 6 3 variables 4 x t +1 = 0 . 35 x t + 5 sin(0 . 3 y t ) + ǫ 1 2 y t +1 = 0 . 4 y t + 5 cos(0 . 3 z t ) + ǫ 2 0 z t +1 = 0 . 4 z t + 0 . 1 y 2 t − 2 + ǫ 3 −2 5 10 15 20 time Stable cycling easy to achieve with nonlinear networks – p.22/35
GP on time-series 10 10 5 5 0 z 0 z −5 −5 10 −10 10 −10 −10 5 −10 5 −5 −5 0 0 y 0 0 y x x −5 −5 5 5 −10 −10 10 10 Variable 1: the linear and nonlinear relevance parameters for input 3 are both 0 – p.23/35
GP on time-series 10 10 5 5 0 0 z z 10 −5 −5 10 5 5 −10 −10 10 0 10 0 x 5 x 5 0 −5 0 −5 y y −5 −5 −10 −10 −10 −10 Variable 2: the linear and nonlinear relevance parameters for input 1 are both 0 – p.24/35
GP on time-series 10 10 5 5 0 z 0 z −5 −5 10 −10 10 −10 −10 5 −10 5 −5 −5 0 0 y 0 0 y x x −5 −5 5 5 −10 −10 10 10 Variable 3: the linear and nonlinear relevance parameters for input 1 are both 0 – p.25/35
Gene network: LHY dependency Nonlinear relevance: 2 0.01, 0.01, 0.73, 0.01 1 Linear relevance: z 0.81, 1.13, 0.45, 0.00 0 Estimated sd 0.18 2 −1 −2 1 No dependency of LHY on −1 0 PIF3 y 0 x −1 1 −2 2 Nonlinear dependency of LHY on TOC1 and GI, LHY and PIF3 were set to 0 – p.26/35
Recommend
More recommend