A reversible infinite HMM using normalised random measures
Konstantina Palla, David A. Knowles, Zoubin Ghahramani
23rd of June 2014
MOTIVATION

Assume a Markov chain X_1, ..., X_t, ..., X_T which is reversible:

P(X_1, ..., X_t, ..., X_T) = P(X_T, ..., X_t, ..., X_1)

Applications
• Modelling physical systems, e.g. conformational transitions of a macromolecule at fixed temperature.
• Chemical dynamics of protein folding.

Tasks
• Find the transition operator (transition matrix) of the reversible Markov chain.
• Put a prior on the reversible Markov chain.

This work proposes a Bayesian non-parametric prior for reversible Markov chains.
REVERSIBLE MARKOV CHAINS

Problem: put a prior on reversible Markov chains. What does that mean?

Reversible chains and the random walk on a weighted graph
G(V, E, W) is a weighted undirected graph with
• vertex set V = {i, r, q, ...}
• edge set E = {e_ir, e_iq, e_rq, ...}
• weight set W = {J_ir, J_rq, J_iq, ...}

A discrete-time random walk on G is a Markov chain with X_t ∈ V and transition matrix

P(i, j) := J_ij / Σ_k J_ik

Goal: put a prior on the transition matrix P (or, equivalently, on the weights J).

(Figure: triangle graph on the vertices i, r, q with edge weights J_ir, J_iq, J_rq.)
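As a concrete sketch (illustrative code, not from the slides): given a symmetric weight matrix J on a small graph, the random-walk transition matrix P is obtained by normalising each row. The weight values below are arbitrary.

```python
import numpy as np

# Symmetric edge weights J on the triangle graph with vertices (i, r, q).
# The specific values are illustrative.
J = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])

# Random-walk transition matrix: P(i, j) = J_ij / sum_k J_ik.
P = J / J.sum(axis=1, keepdims=True)

# Each row of P is a probability distribution over the next state.
print(P.sum(axis=1))  # -> [1. 1. 1.]
```

Symmetry of J is exactly what makes the resulting chain reversible, as the later slides show.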
BASIC THEORY

Seminal work by Diaconis, Freedman and Coppersmith.

Markov exchangeability
A process on a countable space S is Markov exchangeable if the probability of observing a path X_1, ..., X_t, ..., X_T is a function only of X_1 and the transition counts

C(i, j) := |{t : X_t = i, X_{t+1} = j, 1 ≤ t < T}| for all i, j ∈ S.

Representation theorem (Diaconis and Freedman, 1980)
A process is Markov exchangeable and returns to every state visited infinitely often (recurrent) if and only if it is a mixture of recurrent Markov chains:

P(X_2, ..., X_t, ..., X_T | X_1) = ∫_P Π_{t=1}^{T−1} P(X_t, X_{t+1}) µ(dP | X_1)

where P is the set of stochastic matrices on S × S and the mixing measure µ(· | X_1) on P is uniquely determined.

Problem: determine the prior µ. Not always easy.
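Markov exchangeability depends on a path only through X_1 and the counts C(i, j). A minimal sketch of computing those counts (illustrative code; the paths are toy examples chosen to have equal counts):

```python
from collections import Counter

def transition_counts(path):
    """C(i, j) = number of t with X_t = i and X_{t+1} = j, 1 <= t < T."""
    return Counter(zip(path[:-1], path[1:]))

# Two paths with the same start state and the same transition counts are
# assigned the same probability by a Markov exchangeable process.
path_a = [1, 2, 1, 3, 1, 2]
path_b = [1, 3, 1, 2, 1, 2]
print(transition_counts(path_a) == transition_counts(path_b))  # -> True
```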
RELATED WORK

Random walk with reinforcement
• Idea: simulate from the prior µ.
• Increase an edge's weight by +1 each time the edge is crossed.
• As the total number of steps T → ∞,
  (1/T) [J_ir, J_rq, J_iq] → [L_ir, L_rq, L_iq] ∼ µ,
  where µ, a measure over the edge weights, is the underlying prior.
• The process is Markov exchangeable and recurrent → a mixture of recurrent MCs.

Examples
• Edge Reinforced Random Walk (ERRW), Diaconis and Freedman [1980], Diaconis and Rolles [2006]: a conjugate prior for the transition matrix of reversible MCs.
• The edge-reinforced scheme of Bacallado et al. [2013] extends the ERRW to a countably infinite state space; the process is reversible, but the prior is difficult to characterise.
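A minimal simulation sketch of edge reinforcement on the triangle graph (illustrative; the initial weights, step count, and seed are arbitrary choices): each time an edge is crossed its weight grows by 1, and for large T the normalised weights approximate a draw from the prior µ.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                              # vertices i, r, q
J = np.ones((n, n)) - np.eye(n)    # initial edge weights, all 1; no self-loops
state = 0
T = 50_000
for _ in range(T):
    probs = J[state] / J[state].sum()
    nxt = rng.choice(n, p=probs)
    J[state, nxt] += 1.0           # reinforce the crossed (undirected) edge
    J[nxt, state] += 1.0
    state = nxt

L = J / T  # normalised weights: approximately a draw from mu for large T
print(np.round(L, 2))
```

Because the reinforcement is applied symmetrically, J stays symmetric throughout, matching the undirected-graph picture above.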
RELATED WORK

Two routes to a prior over reversible Markov chains:
1. Explicitly characterise the measure µ over the transition matrix.
2. Define an edge-reinforcement scheme.

Proposed work: explicitly construct the prior µ over the weights (or, equivalently, over the transition matrix).
A MODEL FOR REVERSIBLE MARKOV CHAINS

General idea: define the prior over the weights hierarchically using the Gamma process.

Gamma process ΓP(α_0 H)
A completely random measure on X with Lévy measure

ν(dw, dx) = ρ(dw) H(dx) = α_0 w^{−1} e^{−α_0 w} dw H(dx)

on the space X × [0, ∞). H is the base measure and α_0 the concentration parameter.

G_0 := Σ_{i=1}^∞ w_i δ_{X_i} ∼ ΓP(α_0 H)

A countably infinite collection of pairs {X_i, w_i}_{i=1}^∞ sampled from a Poisson process with intensity ν.
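The slides define the Gamma process through its Lévy measure. A common finite-dimensional sketch (an approximation, not the exact Poisson-process construction; all parameter values are illustrative) draws K weights w_i ~ Gamma(α_0/K, rate α_0) at i.i.d. atom locations, so that the total mass G_0(X) ~ Gamma(α_0, α_0) for every K:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gamma_process(alpha0, K):
    """Finite-K approximation to G_0 ~ GammaP(alpha0 * H), H uniform on [0, 1].

    Each weight w_i ~ Gamma(shape=alpha0/K, rate=alpha0); as K -> infinity
    this converges in distribution to the Gamma process.
    """
    atoms = rng.uniform(0.0, 1.0, size=K)                      # locations X_i ~ H
    weights = rng.gamma(shape=alpha0 / K, scale=1.0 / alpha0,  # NumPy uses scale = 1/rate
                        size=K)
    return atoms, weights

atoms, w = sample_gamma_process(alpha0=5.0, K=1000)
print(w.sum())  # total mass; its expectation is 1 under this parameterisation
```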
A MODEL FOR REVERSIBLE MARKOV CHAINS

Define the prior over the weights hierarchically using the Gamma process.

Model
1. First level: ΓP over the space X

   G_0 = Σ_{i=1}^∞ w_i δ_{x_i} ∼ ΓP(α_0, µ_0)

   Set of states S := {x_i ; x_i ∈ X, i ∈ N}, countably infinite.

2. Second level: ΓP over the space S × S

   G = Σ_{i=1}^∞ Σ_{j=1}^∞ J_ij δ_{x_i, x_j} ∼ ΓP(α, µ),
   J_ij | α, w_i, w_j ∼ Gamma(α w_i w_j, α)

   with base measure atomic on S × S: µ(x_i, x_j) = G_0(x_i) G_0(x_j).

Non-reversible: directed edges, J_ij ≠ J_ji.

(Figure: graphical model α_0, µ_0 → G_0 and α → G; directed triangle graph on i, r, q with weights J_ir, J_ri, J_iq, J_qi, J_rq, J_qr.)
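Putting the two levels together for a finite set of K states (an illustrative finite-K sketch with arbitrary hyperparameters; in this directed version J_ij and J_ji are drawn independently):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha0, alpha, K = 5.0, 2.0, 4

# First level: state weights w_i (finite approximation to G_0).
w = rng.gamma(shape=alpha0 / K, scale=1.0 / alpha0, size=K)
w = np.maximum(w, 1e-12)  # guard against numerical underflow to zero

# Second level: directed edge weights J_ij | w ~ Gamma(alpha * w_i * w_j, rate alpha).
J = rng.gamma(shape=alpha * np.outer(w, w), scale=1.0 / alpha)

print(np.allclose(J, J.T))  # -> False: directed, non-reversible version
```

Large w_i values inflate every J_ij in row and column i, so heavy states attract heavy edges, which is what couples the two levels of the hierarchy.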
A MODEL FOR REVERSIBLE MARKOV CHAINS

Reversibility
Impose symmetry: J_ij = J_ji ∼ Gamma(α w_i w_j, α).

Proof: it is sufficient to verify detailed balance,

π_i P(i, j) = π_j P(j, i), where π_i = Σ_k J_ik / Σ_j Σ_k J_jk, with 0 < Σ_j Σ_k J_jk < ∞.

Corollary: π is the invariant measure of the chain.

We call the model the Symmetric Hierarchical Gamma Process (SHGP).
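The detailed-balance argument is easy to verify numerically: with symmetric J, π_i P(i, j) = J_ij / Σ_{jk} J_jk = π_j P(j, i). A sketch with arbitrary illustrative weights:

```python
import numpy as np

rng = np.random.default_rng(3)

# Symmetrise arbitrary positive weights so that J_ij = J_ji.
A = rng.gamma(shape=1.0, scale=1.0, size=(5, 5))
J = (A + A.T) / 2.0

row = J.sum(axis=1)
P = J / row[:, None]   # P(i, j) = J_ij / sum_k J_ik
pi = row / row.sum()   # pi_i = sum_k J_ik / sum_{jk} J_jk

# Detailed balance: the flow matrix pi_i P(i, j) is symmetric.
flows = pi[:, None] * P
print(np.allclose(flows, flows.T))  # -> True
# Corollary: pi is invariant, pi P = pi.
print(np.allclose(pi @ P, pi))      # -> True
```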
A MODEL FOR REVERSIBLE MARKOV CHAINS

Properties
• Irreducibility. A MC is irreducible if for all i, j ∈ S there exists t ∈ N such that P^t_ij > 0.
  The SHGP is irreducible: J_ij, Σ_k J_ik ∈ (0, ∞), so P_ij = J_ij / Σ_k J_ik > 0 a.s. for all i, j ∈ S.
• Recurrence. A state i is positive recurrent if E(τ_ii) < ∞, where τ_ij := min{t > 1 : X_t = j | X_1 = i}.
  The SHGP is positive recurrent by the following:

Theorem (Levin et al. [2006])
An irreducible Markov chain is positive recurrent iff there exists a probability distribution π such that π = πP.
A MODEL FOR REVERSIBLE MARKOV CHAINS

Representation theorem
A process is Markov exchangeable and returns to every state visited infinitely often (recurrent) if and only if it is a mixture of recurrent Markov chains:

P(X_2, ..., X_t, ..., X_T | X_1) = ∫_P Π_{t=1}^{T−1} P(X_t, X_{t+1}) µ(dP | X_1)

where P is the set of stochastic matrices on S × S and µ(· | X_1) on P is the mixing measure.

SHGP
• Explicitly defined prior µ; hierarchical construction of the weights.
• The SHGP is a mixture of recurrent, reversible Markov chains.
• The SHGP is recurrent, Markov exchangeable and reversible.
THE SHGP HIDDEN MARKOV MODEL

(Figure: graphical model α_0, µ_0 → G_0; α → G; G drives the hidden chain X_1, X_2, X_3, ..., X_T, which emits Y_1, Y_2, Y_3, ..., Y_T through the emission matrix E.)

Finite number of states K; the countably infinite model is recovered as K → ∞.
• X_t ∈ {1, ..., K} — hidden state sequence.
• Y_t, t = 1, ..., T — observed sequence with observation model F(· | E):
  Y_t | X_t, E ∼ iid F(· | E_{X_t})
• {E_k, k = 1, ..., K} — state emission parameters; F: multinomial, Poisson and Gaussian observation models.

Prior:
G_0 = Σ_{i=1}^K w_i δ_{x_i}, with w_i ∼ Gamma(α_0 µ_0(x_i), α_0)
G = Σ_{i=1}^K Σ_{j=1}^K J_ij δ_{x_i, x_j}, with J_ij = J_ji ∼ Gamma(α w_i w_j, α)
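A generative sketch of the truncated SHGP-HMM with a Gaussian observation model (all hyperparameter values and the emission parameterisation are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
alpha0, alpha, K, T = 5.0, 2.0, 10, 200

# SHGP prior on the weights (finite truncation level K).
w = rng.gamma(alpha0 / K, 1.0 / alpha0, size=K) + 1e-12
J = rng.gamma(alpha * np.outer(w, w), 1.0 / alpha)
J = np.triu(J) + np.triu(J, 1).T           # impose J_ij = J_ji
P = J / J.sum(axis=1, keepdims=True)       # transition matrix

# Emission parameters E_k: here a mean and a standard deviation per state.
E = np.column_stack([rng.normal(0.0, 3.0, K),    # means
                     rng.gamma(2.0, 1.0, K)])    # standard deviations

# Sample a hidden path and observations Y_t | X_t ~ N(E[X_t, 0], E[X_t, 1]).
X = np.empty(T, dtype=int)
X[0] = rng.integers(K)
for t in range(1, T):
    X[t] = rng.choice(K, p=P[X[t - 1]])
Y = rng.normal(E[X, 0], E[X, 1])
print(Y.shape)  # -> (200,)
```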
EXPERIMENTS

We ran the SHGP hidden Markov model on two real-world datasets with reversible underlying systems.

Comparison against:
• the non-reversible SHGP HMM
• the infinite HMM (HDP)
CHIP-SEQ DATA FROM NEURAL STEM CELLS
• ChIP-seq measures which proteins, with which chemical modifications, are bound to DNA along the genome.
• Y is a T × L matrix of counts, T = 2·10^4 and L = 6: how many reads for the protein of interest l map to bin t.
• Poisson (multivariate) likelihood model F.

Figure: ChIP-seq read counts over a small section (300 bins of 100bp) of the whole chromosome region, for the L = 6 proteins of interest: H3K27ac, H3K27me3, H3K4me1, H3K4me3, p300, Pol2.
CHIP-SEQ DATA FROM NEURAL STEM CELLS

Task: predict held-out values in Y.

Table: ChIP-seq results for 10 runs using different hold-out patterns (20%), a truncation level of K = 30, 1000 iterations and a burn-in of 700.

Model           Algorithm     Train error       Test error        Train log lik.     Test log lik.
Reversible      HMC           0.9122 ± 0.0032   1.1158 ± 0.0097   −1.0488 ± 0.0009   −3.2422 ± 0.0023
Non-reversible  HMC           0.9127 ± 0.0033   1.1167 ± 0.0095   −1.0494 ± 0.0009   −3.2478 ± 0.0022
iHMM            Beam sampler  0.9383 ± 0.0061   1.1365 ± 0.0107   −1.0727 ± 0.0041   −3.3047 ± 0.0027
CHIP-SEQ DATA FROM NEURAL STEM CELLS

The SHGP recovers known types of regulatory regions:
• promoters
• enhancers

Figure: Learnt L × K emission matrix for the ChIP-seq dataset. Element E_lk is the Poisson rate parameter for protein l in state k; brighter indicates higher values.
SINGLE ION CHANNEL RECORDINGS DATASET
• Patch-clamp recording is a method for measuring conformational changes in ion channels; these changes are accompanied by changes in electrical potential (the measurements).
• Y is a 1 × T matrix, T = 10^4: a 10 kHz recording of electrical potential measurements of a single alamethicin channel.
• Gaussian likelihood model F:
  Y_t | X_t, E ∼ N(Y_t; µ, σ), where µ = E(X_t, 1) and σ = E(X_t, 2), with K × 2 emission matrix E.

Table: Ion channel results across 10 different random hold-out patterns, a truncation of K = 15, 1000 iterations and a burn-in of 700.

Model           Algorithm     Train error     Test error      Train log lik.  Test log lik.
Reversible      HMC           0.023 ± 0.001   0.030 ± 0.002   2.204 ± 0.055   2.034 ± 0.058
Non-reversible  HMC           0.027 ± 0.007   0.033 ± 0.007   2.108 ± 0.084   1.970 ± 0.078
iHMM            Beam sampler  0.038 ± 0.005   0.045 ± 0.004   2.134 ± 0.070   2.008 ± 0.058
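For the ion-channel model the emission matrix E is K × 2, one mean and one standard deviation per state. A sketch of the per-observation Gaussian log-likelihood used to score data (toy values; note the code uses 0-based columns where the slide's notation is 1-based):

```python
import numpy as np

def gaussian_loglik(Y, X, E):
    """log N(Y_t; mu, sigma) with mu = E[X_t, 0] and sigma = E[X_t, 1]."""
    mu, sigma = E[X, 0], E[X, 1]
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - (Y - mu) ** 2 / (2.0 * sigma**2)

# Toy K = 2 emission matrix: state 0 -> N(0, 1), state 1 -> N(5, 0.5).
E = np.array([[0.0, 1.0],
              [5.0, 0.5]])
X = np.array([0, 1, 1, 0])          # hidden states
Y = np.array([0.1, 4.9, 5.2, -0.3])  # observations
print(gaussian_loglik(Y, X, E).mean())
```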