Biostatistics 615/815 Lecture 22: Advanced Hidden Markov Models
Hyun Min Kang
December 4th, 2012

Revisiting Hidden Markov Model

[Figure: an HMM drawn as a chain over time points $1, 2, 3, \ldots, T$. The hidden states $q_1, q_2, q_3, \ldots, q_T$ are linked by the transitions $a_{12}, a_{23}, \ldots, a_{(T-1)T}$; each state $q_t$ emits the observed data $o_t$ with probability $b_{q_t}(o_t)$; $\pi$ is the initial state distribution.]

Statistical analysis with HMM

HMM for a deterministic problem
• Given
  • parameters $\lambda = \{\pi, A, B\}$
  • and data $o = (o_1, \cdots, o_T)$
• Forward-backward algorithm
  • Compute $\Pr(q_t \mid o, \lambda)$
• Viterbi algorithm
  • Compute $\arg\max_q \Pr(q \mid o, \lambda)$

HMM for a stochastic process / algorithm
• Generate random samples of $o$ given $\lambda$
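
Generating random samples of $o$ given $\lambda$ can be sketched as follows. This is a minimal illustration and not part of the course's HMM615 class: the names `drawIndex` and `sampleObservations` are hypothetical, and the parameters are passed as plain `std::vector` objects, with `A[i][j]` standing for $a_{ij}$ and `B[i][k]` for $b_i(k)$.

```cpp
#include <cassert>
#include <random>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// Draw an index k with probability probs[k] (hypothetical helper).
int drawIndex(const Vec& probs, std::mt19937& rng) {
  std::discrete_distribution<int> dist(probs.begin(), probs.end());
  return dist(rng);
}

// Sample T observations from an HMM with initial distribution pi,
// transition matrix A and emission matrix B.
std::vector<int> sampleObservations(const Vec& pi, const Mat& A, const Mat& B,
                                    int T, std::mt19937& rng) {
  assert(T >= 0 && pi.size() == A.size() && A.size() == B.size());
  std::vector<int> obs;
  obs.reserve(T);
  int q = 0;
  for (int t = 0; t < T; ++t) {
    // first state comes from pi, later states from the row of A for the previous state
    q = (t == 0) ? drawIndex(pi, rng) : drawIndex(A[q], rng);
    obs.push_back(drawIndex(B[q], rng));  // emit o_t given q_t
  }
  return obs;
}
```

The hidden state path is discarded here; returning it alongside the observations would be a small change if the true states are needed for checking an inference routine.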

Deterministic Inference using HMM

• If we know the exact set of parameters, the inference is deterministic given data
  • No stochastic process involved in the inference procedure
  • Inference is deterministic just as estimation of sample mean is deterministic
• The computational complexity of the inference procedure is exponential using naive algorithms
• Using dynamic programming, the complexity can be reduced to $O(n^2 T)$.
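
The gap between the naive and the dynamic programming approach can be illustrated by computing $\Pr(o \mid \lambda)$ both ways. The sketch below is an illustration under assumptions, not the lecture's code: `likelihoodNaive` and `likelihoodForward` are hypothetical names. The first sums over all $n^T$ state paths, the second uses the forward recursion in $O(n^2 T)$ time, and both should return the same value.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// Naive: sum Pr(o, q | lambda) over every one of the n^T state paths q.
double likelihoodNaive(const Vec& pi, const Mat& A, const Mat& B,
                       const std::vector<int>& o) {
  const int n = (int)pi.size();
  const int T = (int)o.size();
  assert(n > 0 && T > 0);
  std::vector<int> q(T, 0);  // current path, advanced like an odometer in base n
  double total = 0.0;
  while (true) {
    double p = pi[q[0]] * B[q[0]][o[0]];
    for (int t = 1; t < T; ++t) p *= A[q[t-1]][q[t]] * B[q[t]][o[t]];
    total += p;
    int t = T - 1;
    while (t >= 0 && ++q[t] == n) { q[t] = 0; --t; }
    if (t < 0) break;  // every digit wrapped around: all paths visited
  }
  return total;
}

// Dynamic programming: forward recursion, O(n^2 T) time and O(n) memory.
double likelihoodForward(const Vec& pi, const Mat& A, const Mat& B,
                         const std::vector<int>& o) {
  const int n = (int)pi.size();
  const int T = (int)o.size();
  assert(n > 0 && T > 0);
  Vec alpha(n), next(n);
  for (int i = 0; i < n; ++i) alpha[i] = pi[i] * B[i][o[0]];
  for (int t = 1; t < T; ++t) {
    for (int j = 0; j < n; ++j) {
      double s = 0.0;
      for (int i = 0; i < n; ++i) s += alpha[i] * A[i][j];
      next[j] = s * B[j][o[t]];
    }
    alpha.swap(next);
  }
  double total = 0.0;
  for (int i = 0; i < n; ++i) total += alpha[i];
  return total;
}
```

The naive version is only usable for very short sequences, which is the point of the comparison.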

Using Stochastic Process for HMM Inference

Using random process for the inference
• Randomly sampling $o$ from $\Pr(o \mid \lambda)$.
• Estimating $\arg\max_\lambda \Pr(o \mid \lambda)$.
  • No analytic algorithm available
  • Simplex, E-M algorithm, or Simulated Annealing can be applied
• Estimating the distribution $\Pr(\lambda \mid o)$.
  • Gibbs Sampling

Recap : The E-M Algorithm

Expectation step (E-step)
• Given the current estimates of parameters $\theta^{(t)}$, calculate the conditional distribution of latent variable $z$.
• Then the expected log-likelihood of data given the conditional distribution of $z$ can be obtained

$$Q(\theta \mid \theta^{(t)}) = E_{z \mid x, \theta^{(t)}} \left[ \log p(x, z \mid \theta) \right]$$

Maximization step (M-step)
• Find the parameter that maximizes the expected log-likelihood

$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})$$

Baum-Welch for estimating $\arg\max_\lambda \Pr(o \mid \lambda)$

Assumptions
• Transition matrix is identical across time points
  • $a_{ij} = \Pr(q_{t+1} = j \mid q_t = i) = \Pr(q_t = j \mid q_{t-1} = i)$
• Emission matrix is identical across time points
  • $b_i(j) = \Pr(o_t = j \mid q_t = i) = \Pr(o_{t-1} = j \mid q_{t-1} = i)$
• This is NOT the only possible configuration of HMM
  • For example, $a_{ij}$ can be parameterized as a function of $t$.
  • Multiple sets of $o$ independently drawn from the same distribution can be provided.
  • Other assumptions will result in different formulation of E-M algorithm

E-step of the Baum-Welch Algorithm

1 Run the forward-backward algorithm given $\lambda^{(\tau)}$

$$\alpha_t(i) = \Pr(o_1, \cdots, o_t, q_t = i \mid \lambda^{(\tau)})$$
$$\beta_t(i) = \Pr(o_{t+1}, \cdots, o_T \mid q_t = i, \lambda^{(\tau)})$$
$$\gamma_t(i) = \Pr(q_t = i \mid o, \lambda^{(\tau)}) = \frac{\alpha_t(i)\beta_t(i)}{\sum_k \alpha_t(k)\beta_t(k)}$$

2 Compute $\xi_t(i,j)$ using $\alpha_t(i)$ and $\beta_t(i)$

$$\xi_t(i,j) = \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)}) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\Pr(o \mid \lambda^{(\tau)})} = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{(k,l)} \alpha_t(k)\, a_{kl}\, b_l(o_{t+1})\, \beta_{t+1}(l)}$$
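
The two E-step quantities can be computed directly from the definitions above. The following is a minimal sketch, assuming unscaled $\alpha$ and $\beta$ (so it is only suitable for short sequences) and plain `std::vector` containers instead of the lecture's matrix class; the names `Posteriors` and `eStep` are hypothetical.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

struct Posteriors {
  Mat gamma;            // gamma[t][i] = Pr(q_t = i | o, lambda)
  std::vector<Mat> xi;  // xi[t][i][j] = Pr(q_t = i, q_{t+1} = j | o, lambda)
};

Posteriors eStep(const Vec& pi, const Mat& A, const Mat& B,
                 const std::vector<int>& o) {
  const int n = (int)pi.size();
  const int T = (int)o.size();
  assert(n > 0 && T > 0);

  // forward pass: alpha[t][i] = Pr(o_1..o_t, q_t = i)
  Mat alpha(T, Vec(n, 0.0));
  for (int i = 0; i < n; ++i) alpha[0][i] = pi[i] * B[i][o[0]];
  for (int t = 1; t < T; ++t)
    for (int j = 0; j < n; ++j) {
      for (int i = 0; i < n; ++i) alpha[t][j] += alpha[t-1][i] * A[i][j];
      alpha[t][j] *= B[j][o[t]];
    }

  // backward pass: beta[t][i] = Pr(o_{t+1}..o_T | q_t = i), beta at the last time is 1
  Mat beta(T, Vec(n, 1.0));
  for (int t = T - 2; t >= 0; --t)
    for (int i = 0; i < n; ++i) {
      beta[t][i] = 0.0;
      for (int j = 0; j < n; ++j)
        beta[t][i] += A[i][j] * B[j][o[t+1]] * beta[t+1][j];
    }

  // Pr(o | lambda), the common denominator of gamma and xi
  double lik = 0.0;
  for (int i = 0; i < n; ++i) lik += alpha[T-1][i];
  assert(lik > 0.0);

  Posteriors p;
  p.gamma.assign(T, Vec(n, 0.0));
  p.xi.assign(T - 1, Mat(n, Vec(n, 0.0)));
  for (int t = 0; t < T; ++t)
    for (int i = 0; i < n; ++i) {
      p.gamma[t][i] = alpha[t][i] * beta[t][i] / lik;
      if (t + 1 < T)
        for (int j = 0; j < n; ++j)
          p.xi[t][i][j] =
              alpha[t][i] * A[i][j] * B[j][o[t+1]] * beta[t+1][j] / lik;
    }
  return p;
}
```

A useful sanity check on any implementation is that $\sum_i \gamma_t(i) = 1$ and $\sum_j \xi_t(i,j) = \gamma_t(i)$ for every $t < T$.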

E-step : $\xi_t(i,j)$

$$\xi_t(i,j) = \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)})$$

• Quantifies joint state probability between consecutive states
• Needed to estimate transition probability
• Requires $O(n^2 T)$ memory to store entirely.
• Only $O(n^2)$ is necessary for running Baum-Welch algorithm
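
The $O(n^2)$ memory bound follows because Baum-Welch only needs $\sum_t \xi_t(i,j)$, so each $\xi_t$ can be computed into one reusable buffer and added to a running sum. Below is a minimal sketch of that idea, assuming the $\alpha$ and $\beta$ tables are already available as $T \times n$ arrays; the function name `sumXiStreaming` is hypothetical.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// Accumulate sum over t = 1..T-1 of xi_t(i,j) with a single n x n buffer.
// Each xi_t is renormalized to sum to 1, so alpha and beta may be raw or
// rescaled per time point.
Mat sumXiStreaming(const Mat& alpha, const Mat& beta, const Mat& A,
                   const Mat& B, const std::vector<int>& o) {
  const int T = (int)o.size();
  const int n = (int)A.size();
  assert(T > 0 && (int)alpha.size() == T && (int)beta.size() == T);
  Mat xi(n, Vec(n, 0.0));     // reused for every t
  Mat sumXi(n, Vec(n, 0.0));  // running sum over t
  for (int t = 0; t + 1 < T; ++t) {
    double norm = 0.0;
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) {
        xi[i][j] = alpha[t][i] * A[i][j] * B[j][o[t+1]] * beta[t+1][j];
        norm += xi[i][j];
      }
    assert(norm > 0.0);
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j) sumXi[i][j] += xi[i][j] / norm;
  }
  return sumXi;
}
```

Since every $\xi_t$ sums to one over $(i,j)$, the entries of the returned matrix add up to $T-1$, which gives a quick check.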

M-step of the Baum-Welch Algorithm

Let $\lambda^{(\tau+1)} = (\pi^{(\tau+1)}, A^{(\tau+1)}, B^{(\tau+1)})$

$$\pi^{(\tau+1)}(i) = \frac{\sum_{t=1}^{T} \Pr(q_t = i \mid o, \lambda^{(\tau)})}{T} = \frac{\sum_{t=1}^{T} \gamma_t(i)}{T}$$

$$a_{ij}^{(\tau+1)} = \frac{\sum_{t=1}^{T-1} \Pr(q_t = i, q_{t+1} = j \mid o, \lambda^{(\tau)})}{\sum_{t=1}^{T-1} \Pr(q_t = i \mid o, \lambda^{(\tau)})} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$b_i(k)^{(\tau+1)} = \frac{\sum_{t=1}^{T} \Pr(q_t = i, o_t = k \mid o, \lambda^{(\tau)})}{\sum_{t=1}^{T} \Pr(q_t = i \mid o, \lambda^{(\tau)})} = \frac{\sum_{t=1}^{T} \gamma_t(i)\, I(o_t = k)}{\sum_{t=1}^{T} \gamma_t(i)}$$

A detailed derivation can be found at
• Welch, "Hidden Markov Models and the Baum-Welch Algorithm", IEEE Information Theory Society Newsletter, Dec 2003
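
The three updates translate into a few accumulation loops. The sketch below follows the formulas on this slide, including the update of $\pi$ as the average of $\gamma_t(i)$ over $t$ (a common textbook variant uses $\gamma_1(i)$ alone). It is an illustration under assumptions: `Params`, `safeRatio`, and `mStep` are hypothetical names, `gamma` is a $T \times n$ table, and `sumXi[i][j]` holds $\sum_{t=1}^{T-1} \xi_t(i,j)$.

```cpp
#include <cassert>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

struct Params { Vec pi; Mat A; Mat B; };

// Ratio that falls back to 0 when a state has no posterior mass (assumed policy).
static double safeRatio(double num, double den) {
  return den > 0.0 ? num / den : 0.0;
}

Params mStep(const Mat& gamma, const Mat& sumXi, const std::vector<int>& o,
             int nObs) {
  const int T = (int)gamma.size();
  assert(T > 0 && (int)o.size() == T && nObs > 0);
  const int n = (int)gamma[0].size();

  Vec sumGammaAll(n, 0.0);   // sum over t = 1..T   of gamma_t(i)
  Vec sumGammaHead(n, 0.0);  // sum over t = 1..T-1 of gamma_t(i)
  Mat sumObsGamma(n, Vec(nObs, 0.0));  // sum_t gamma_t(i) I(o_t = k)
  for (int t = 0; t < T; ++t)
    for (int i = 0; i < n; ++i) {
      sumGammaAll[i] += gamma[t][i];
      if (t + 1 < T) sumGammaHead[i] += gamma[t][i];
      sumObsGamma[i][o[t]] += gamma[t][i];
    }

  Params p;
  p.pi.assign(n, 0.0);
  p.A.assign(n, Vec(n, 0.0));
  p.B.assign(n, Vec(nObs, 0.0));
  for (int i = 0; i < n; ++i) {
    p.pi[i] = sumGammaAll[i] / T;
    for (int j = 0; j < n; ++j)
      p.A[i][j] = safeRatio(sumXi[i][j], sumGammaHead[i]);
    for (int k = 0; k < nObs; ++k)
      p.B[i][k] = safeRatio(sumObsGamma[i][k], sumGammaAll[i]);
  }
  return p;
}
```

Returning 0 for a state with no posterior mass is only one possible policy; keeping the previous estimate for that row is another reasonable choice.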

Additional function to HMM615.h

class HMM615 {
  ...
  // assign newVal to dst, after computing the relative differences between them
  // note that dst is call-by-reference, and newVal is call-by-value
  static double update(double& dst, double newVal) {
    // calculate the relative differences
    double relDiff = fabs((dst-newVal)/(newVal+ZEPS));
    dst = newVal; // update the destination value
    return relDiff;
  }
  ...
};
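
One way `update` might be used is to overwrite a whole parameter vector while collecting a convergence measure. The sketch below is a standalone illustration: the value of `ZEPS` is an assumption (its definition is not shown in the lecture), and `updateAll` is a hypothetical helper that sums the relative differences.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double ZEPS = 1e-10;  // assumed value, guards against division by zero

// Same logic as HMM615::update, written as a free function for illustration.
double update(double& dst, double newVal) {
  double relDiff = std::fabs((dst - newVal) / (newVal + ZEPS));
  dst = newVal;  // overwrite the old estimate
  return relDiff;
}

// Hypothetical helper: replace every entry of dst and return the summed
// relative change, which an E-M loop can compare against a tolerance.
double updateAll(std::vector<double>& dst, const std::vector<double>& newVals) {
  assert(dst.size() == newVals.size());
  double total = 0.0;
  for (std::size_t i = 0; i < dst.size(); ++i)
    total += update(dst[i], newVals[i]);
  return total;
}
```

Whether to sum the relative differences or take their maximum is a design choice; a sum is stricter when there are many parameters.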

Handling large number of states

class HMM615 {
  ...
  void normalize(std::vector<double>& v) { // additional function
    double sum = 0;
    for(int i=0; i < (int)v.size(); ++i) sum += v[i];
    for(int i=0; i < (int)v.size(); ++i) v[i] /= sum;
  }

  void forward() {
    for(int i=0; i < nStates; ++i)
      alphas.data[0][i] = pis[i] * emis.data[i][outs[0]];
    for(int t=1; t < nTimes; ++t) {
      for(int i=0; i < nStates; ++i) {
        alphas.data[t][i] = 0;
        for(int j=0; j < nStates; ++j) {
          alphas.data[t][i] += (alphas.data[t-1][j] * trans.data[j][i] * emis.data[i][outs[t]]);
        }
      }
      normalize(alphas.data[t]); // **ADD THIS LINE**
    }
  }
  ...
};
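
Normalizing $\alpha_t$ at every time point keeps the values from underflowing on long sequences. A related variant, sketched below as an assumption rather than as the lecture's method, also keeps the normalizing constants: the sum of their logarithms equals $\log \Pr(o \mid \lambda)$, so the likelihood is not lost by rescaling. The function name `logLikelihoodScaled` is hypothetical, and unlike the slide it rescales at the first time point as well.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

typedef std::vector<double> Vec;
typedef std::vector<Vec> Mat;

// Forward recursion with per-time-point rescaling.
// Returns log Pr(o | lambda) as the sum of the log scale factors.
double logLikelihoodScaled(const Vec& pi, const Mat& A, const Mat& B,
                           const std::vector<int>& o) {
  const int n = (int)pi.size();
  const int T = (int)o.size();
  assert(n > 0 && T > 0);
  Vec alpha(n), next(n);
  double logLik = 0.0;
  for (int i = 0; i < n; ++i) alpha[i] = pi[i] * B[i][o[0]];
  for (int t = 0; t < T; ++t) {
    if (t > 0) {
      for (int j = 0; j < n; ++j) {
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += alpha[i] * A[i][j];
        next[j] = s * B[j][o[t]];
      }
      alpha.swap(next);
    }
    double c = 0.0;  // scale factor: Pr(o_t | o_1..o_{t-1})
    for (int i = 0; i < n; ++i) c += alpha[i];
    assert(c > 0.0);
    for (int i = 0; i < n; ++i) alpha[i] /= c;
    logLik += std::log(c);
  }
  return logLik;
}
```

With a sequence of several thousand observations, the unscaled forward probabilities would underflow to zero in double precision, while the value returned here stays finite.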

Additional function to Matrix615.h

template <class T>
void Matrix615<T>::fill(T val) {
  int nr = rowNums();
  for(int i=0; i < nr; ++i) {
    std::fill(data[i].begin(),data[i].end(),val);
  }
}

Additional function to Matrix615.h

// print the content of matrix
template <class T>
void Matrix615<T>::print(std::ostream& o) {
  int nr = rowNums();
  int nc = colNums();
  for(int i=0; i < nr; ++i) {
    for(int j=0; j < nc; ++j) {
      if ( j > 0 ) o << "\t";
      o << data[i][j];
    }
    o << std::endl;
  }
}

Baum-Welch algorithm : initialization

// return a pair of (# iter, relative diff) given tolerance
std::pair<int,double> HMM615::baumWelch(double tol) {
  // temporary variables to use internally
  Matrix615<double> xis(nStates,nStates);         // xi_t(i,j) = Pr(q_t = i, q_{t+1} = j | o, lambda)
  Matrix615<double> sumXis(nStates,nStates);      // sum_t xis(i,j)
  Matrix615<double> sumObsGammas(nStates,nObs);   // sum_t gammas(i)I(o_t=j)
  std::vector<double> sumGammas(nStates);         // sum_t gammas(i)
  double tmp, sum, relDiff = 1.;
  int iter;
  for(iter=0; (iter < MAX_ITERATION) && ( relDiff > tol ); ++iter) {
    relDiff = 0;
    // E-step : compute Pr(q|o,lambda)
    forwardBackward();