SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models - PowerPoint PPT Presentation


  1. SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models
     Yasuhiro Fujiwara (NTT Cyber Space Labs)
     Yasushi Sakurai (NTT Communication Science Labs)
     Masashi Yamamuro (NTT Cyber Space Labs)
     Speaker: Yasushi Sakurai

  2. Motivation
     • HMM (Hidden Markov Model)
       – Mental task classification
         • Understand human brain functions with EEG signals
       – Biological analysis
         • Predict organism functions with DNA sequences
       – Many other applications
         • Speech recognition, image processing, etc.
     • Goal
       – Fast and exact identification of the highest-likelihood model for large datasets

  3. Mini-introduction to HMM
     • Observation sequence $X = (x_1, x_2, \ldots, x_n)$ is a probabilistic function of states
     • Consists of three sets of parameters:
       – Initial state probability: $\pi = \{\pi_i\}$ $(1 \le i \le m)$
         • State at time $t = 1$ is $u_i$
       – State transition probability: $a = \{a_{ij}\}$ $(1 \le i, j \le m)$
         • Transition from state $u_i$ to $u_j$
       – Symbol probability: $b(v) = \{b_i(v)\}$ $(1 \le i \le m)$
         • Output symbol $v$ in state $u_i$
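To make the notation concrete, here is a minimal sketch of how these three parameter sets might be held in code. The class name and array shapes are assumptions of this sketch, not something the deck specifies; observations are treated as integer symbol indices.

```python
import numpy as np

# Minimal container for the parameters named on slide 3 (shapes assumed):
#   pi : initial state probabilities pi_i, shape (m,)
#   a  : state transition probabilities a_ij, shape (m, m)
#   b  : symbol probabilities b_i(v), shape (m, s) for s symbols
class HMM:
    def __init__(self, pi, a, b):
        self.pi = np.asarray(pi, dtype=float)
        self.a = np.asarray(a, dtype=float)
        self.b = np.asarray(b, dtype=float)
```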

  4. Mini-introduction to HMM
     • HMM types
       – Ergodic HMM
         • Every state can be reached from every other state
       – Left-right HMM
         • Transitions to lower-numbered states are prohibited
         • Always begins with the first state
         • Transitions are limited to a small number of states
     (Figure: Ergodic HMM vs. Left-right HMM)

  5. Mini-introduction to HMM
     • Viterbi path in the trellis structure
       – Trellis structure: states lie on the vertical axis, the sequence is aligned along the horizontal axis
       – Viterbi path: the state sequence that gives the likelihood
     (Figure: trellis structure with states $u_1, \ldots, u_m$ on the vertical axis, observations $x_1, \ldots, x_n$ on the horizontal axis, and the Viterbi path highlighted)

  6. Mini-introduction to HMM
     • Viterbi algorithm
       – Dynamic programming approach
       – Maximize the probabilities from the previous states
         $$P = \max_{1 \le i \le m} (p_{in})$$
         $$p_{it} = \begin{cases} \max_{1 \le j \le m}\left(p_{j(t-1)} \cdot a_{ji}\right) \cdot b_i(x_t) & (2 \le t \le n) \\ \pi_i \cdot b_i(x_1) & (t = 1) \end{cases}$$
       – $p_{it}$: the maximum probability of state $u_i$ at time $t$
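A direct transcription of this recursion into Python, reusing the HMM container sketched above. Probabilities are multiplied exactly as on the slide; a production implementation would more likely work in log space to avoid underflow on long sequences.

```python
import numpy as np

def viterbi_likelihood(hmm, x):
    """Likelihood P of the Viterbi path for a sequence x of symbol indices.

    Follows the slide's recursion:
      p_{i1} = pi_i * b_i(x_1)
      p_{it} = max_j (p_{j,t-1} * a_{ji}) * b_i(x_t)   for 2 <= t <= n
      P      = max_i p_{in}
    """
    p = hmm.pi * hmm.b[:, x[0]]                       # p_{i1}
    for t in range(1, len(x)):
        # entry (j, i) of p[:, None] * hmm.a is p_{j,t-1} * a_{ji}
        p = (p[:, None] * hmm.a).max(axis=0) * hmm.b[:, x[t]]
    return p.max()                                    # P
```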

  7. Problem Definition
     • Given
       – HMM dataset
       – Sequence $X = (x_1, x_2, \ldots, x_n)$ of arbitrary length
     • Find
       – Highest-likelihood model, estimated with respect to $X$, from the dataset

  8. Why not 'Naive'
     • Naive solution
       1. Compute the likelihood for every model using the Viterbi algorithm
       2. Then choose the highest-likelihood model
     • But:
       – High search cost: $O(nm^2)$ time for every model
         • Prohibitive for large HMM datasets
       – $m$: number of states, $n$: sequence length of $X$
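The naive baseline is then just an argmax over the dataset; a sketch reusing the Viterbi routine above (how the models are loaded is outside the deck's scope):

```python
# Naive search: one O(n m^2) Viterbi run for every model in the dataset.
def naive_search(models, x):
    return max(models, key=lambda hmm: viterbi_likelihood(hmm, x))
```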

  9. Our Solution, SPIRAL
     • Requirements:
       – High-speed search
         • Identify the model efficiently
       – Exactness
         • Accuracy is not sacrificed
       – No restriction on model type
         • Achieve high search performance for any type of model

  10. Likelihood Approximation
      (Figure: reminder of the naive approach)

  11. Likelihood Approximation
      • Create compact models (reduce the model size)
        – For given $m$ states and granularity $g$,
        – Create $m/g$ states by merging 'similar' states
      (Figure: an $m$-state model reduced to an $m/g$-state model)

  12. Likelihood Approximation
      • Use the vector $F_i$ of state $u_i$ for clustering
        $$F_i = (\pi_i;\ a_{i1}, \ldots, a_{im}, a_{1i}, \ldots, a_{mi};\ b_i(v_1), \ldots, b_i(v_s))$$
        – $s$: number of symbols
      • Merge all the states $u_i$ in a cluster $C$ and create a new state $u_C$
      • Choose the highest probability among the probabilities of $u_i$ to obtain the upper-bounding likelihood:
        $$\pi'_C = \max_{u_i \in C} \pi_i \qquad a'_{Cj} = \max_{u_i \in C,\, u_j \notin C} a_{ij}$$
        $$a'_{CC} = \max_{u_i, u_k \in C} a_{ik} \qquad a'_{jC} = \max_{u_i \in C,\, u_j \notin C} a_{ji} \qquad b'_C(v) = \max_{u_i \in C} b_i(v)$$
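As a sketch, the merged parameters reduce to element-wise maxima once a clustering of the states is fixed. How the clusters are obtained from the $F_i$ vectors (which clustering algorithm, which distance) is left open here; `clusters` is simply a list of index lists, an assumption of this sketch.

```python
import numpy as np

def compact_model(hmm, clusters):
    """Upper-bounding compact model for a given state clustering.

    `clusters` is a list of lists of original state indices covering all
    states. Every merged probability is the max over the member states,
    matching slide 12 (the cluster-to-cluster case below covers a'_{Cj},
    a'_{jC}, and a'_{CC} uniformly).
    """
    pi = np.array([hmm.pi[c].max() for c in clusters])
    b = np.array([hmm.b[c].max(axis=0) for c in clusters])
    k = len(clusters)
    a = np.zeros((k, k))
    for i, C in enumerate(clusters):
        for j, D in enumerate(clusters):
            a[i, j] = hmm.a[np.ix_(C, D)].max()   # max over u in C, u' in D
    return HMM(pi, a, b)
```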

  13. Likelihood Approximation
      • Compute the approximate likelihood $P'$ from the compact model
        $$P' = \max_{1 \le i \le m'} (p'_{in})$$
        $$p'_{it} = \begin{cases} \max_{1 \le j \le m'}\left(p'_{j(t-1)} \cdot a'_{ji}\right) \cdot b'_i(x_t) & (2 \le t \le n) \\ \pi'_i \cdot b'_i(x_1) & (t = 1) \end{cases}$$
        – $p'_{it}$: maximum probability of state $u_i$ at time $t$ in the compact model
      • Upper-bounding likelihood
        – For the approximate likelihood $P'$, $P' \ge P$ holds
        – Exploit this property to guarantee exactness in search processing
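Because the compact model is itself an ordinary HMM, the same Viterbi routine computes $P'$. The bound can be sanity-checked directly; a sketch reusing the functions above, with `hmm`, `x`, and `clusters` assumed to be in scope:

```python
small = compact_model(hmm, clusters)
p_approx = viterbi_likelihood(small, x)   # P'
p_exact = viterbi_likelihood(hmm, x)      # P
assert p_approx >= p_exact                # P' >= P (up to float rounding)
```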

  14. Likelihood Approximation
      • Advantages
        – The best model cannot be pruned
          • The approximation gives the upper-bounding likelihood of the original model
        – Supports any model type
          • No probabilistic constraint is imposed on the approximation

  15. Multi-granularities
      • The likelihood approximation involves a trade-off between accuracy and computation time
        – As the model size increases, accuracy improves
        – But the likelihood computation cost increases
      • Q: How to choose granularity $g$?

  16. Multi-granularities
      • The likelihood approximation involves a trade-off between accuracy and computation time
        – As the model size increases, accuracy improves
        – But the likelihood computation cost increases
      • Q: How to choose granularity $g$?
      • A: Use multiple granularities
        – $h + 1$ $(h = \lfloor \log_2 m \rfloor)$ distinct granularities that form a geometric progression $g_i = 2^i$ $(i = 0, 1, 2, \ldots, h)$
        – Geometrically increase the model size
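In code, the granularity schedule is a one-liner (a sketch):

```python
import math

def granularities(m):
    """g_i = 2^i for i = 0..h, with h = floor(log2 m), as on slide 16."""
    h = int(math.log2(m))
    return [2 ** i for i in range(h + 1)]

# granularities(100) -> [1, 2, 4, 8, 16, 32, 64]; g = 1 is the original model.
```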

  17. Multi-granularities
      • Compute the approximate likelihood $P'$ from the coarsest model as the first step
        – Coarsest model has $\lfloor m/2^h \rfloor$ (= 1) states
      • Prune the model if $P' < \theta$, otherwise move on to the next (finer) granularity
        – $\theta$: threshold

  18. Multi-granularities
      • Compute the approximate likelihood $P'$ from the second coarsest model
        – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
      • Prune the model if $P' < \theta$

  19. Multi-granularities
      • Threshold $\theta$
        – Exploit the fact that we have found a good model of high likelihood
        – $\theta$: exact likelihood of the best-so-far candidate during search processing
        – $\theta$ is updated and increases when a promising model is found
        – Use $\theta$ for model pruning

  20. Multi-granularities
      • Compute the approximate likelihood $P'$ from the second coarsest model
        – Second coarsest model has $\lfloor m/2^{h-1} \rfloor$ states
      • Prune the model if $P' < \theta$, otherwise continue with a finer granularity
        – $\theta$: exact likelihood of the best-so-far candidate

  21. Multi-granularities
      • Compute the likelihood $P'$ from a more accurate model
      • Prune the model if $P' < \theta$

  22. Multi-granularities
      • Repeat until the finest granularity (the original model)
      • Update the answer candidate and the best-so-far likelihood if $P \ge \theta$

  23. Multi-granularities
      • Optimize the trade-off between accuracy and computation time
        – Low-likelihood models are pruned by coarse-grained models
        – Fine-grained approximation is applied only to high-likelihood models
      • Efficiently find the best model for a large dataset
        – Exact likelihood computations are limited to the minimum necessary
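Putting slides 16 through 23 together, the whole search might look like the sketch below. `cluster_states(hmm, g)` is a hypothetical helper that groups the $m$ states into about $m/g$ clusters (the deck leaves the clustering procedure to the paper); everything else reuses the functions sketched above.

```python
def spiral_search(models, x):
    """Multi-granularity search; exactness is guaranteed by P' >= P."""
    best, theta = None, -1.0          # theta: best-so-far exact likelihood
    for hmm in models:
        pruned = False
        # coarsest (g = 2^h) down to g = 2; g = 1 is the original model
        for g in reversed(granularities(len(hmm.pi))):
            if g == 1:
                break
            small = compact_model(hmm, cluster_states(hmm, g))
            if viterbi_likelihood(small, x) < theta:  # P' < theta
                pruned = True                         # safe: P <= P' < theta
                break
        if not pruned:
            p = viterbi_likelihood(hmm, x)            # exact P at g = 1
            if p >= theta:
                best, theta = hmm, p                  # theta only increases
    return best
```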

  24. Transition Pruning
      • The trellis structure has too many transitions
      • Q: How to exclude unlikely paths?

  25. Transition Pruning
      • The trellis structure has too many transitions
      • Q: How to exclude unlikely paths?
      • A: Use two properties
        – Likelihood is monotone non-increasing (likelihood computation)
        – Threshold is monotone non-decreasing (search processing)

  26. Transition Pruning
      • In likelihood computation, compute the estimate $e_{it}$:
        $$e_{it} = \begin{cases} p_{it} \cdot \prod_{j=t+1}^{n} \left( a_{\max} \cdot b_{\max}(x_j) \right) & (1 \le t \le n-1) \\ p_{in} & (t = n) \end{cases}$$
        where $a_{\max} = \max_{1 \le i,j \le m} a_{ij}$ and $b_{\max}(v) = \max_{1 \le i \le m} b_i(v)$
        – $e_{it}$: a conservative (upper-bound) estimate, computed from $p_{it}$, of the likelihood of any path through state $u_i$ at time $t$
      • If $e_{it} < \theta$, prune all paths that pass through $u_i$ at time $t$
        – $\theta$: exact likelihood of the best-so-far candidate

  27. Transition Pruning
      • Terminate the likelihood computation if all the paths are excluded
      • Especially efficient for long sequences
      • Applicable to approximate likelihood computation
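A sketch of the pruned likelihood computation for slides 26 and 27. The precomputed suffix products implement the $e_{it}$ estimate; zeroing a state's probability removes every path through it, and the loop stops early once all paths are excluded. Returning 0.0 to signal "below $\theta$" is a convention of this sketch, not of the paper.

```python
import numpy as np

def viterbi_with_pruning(hmm, x, theta):
    """Viterbi that drops state u_i at time t whenever e_{it} < theta."""
    n = len(x)
    a_max = hmm.a.max()
    b_max = hmm.b.max(axis=0)             # b_max(v) for each symbol v
    # suffix[t] = prod over j = t+1..n of a_max * b_max(x_j)
    # (0-indexed; suffix[n-1] = 1, matching the t = n case on slide 26)
    suffix = np.ones(n)
    for t in range(n - 2, -1, -1):
        suffix[t] = suffix[t + 1] * a_max * b_max[x[t + 1]]
    p = hmm.pi * hmm.b[:, x[0]]
    p[p * suffix[0] < theta] = 0.0        # prune states with e_{i1} < theta
    for t in range(1, n):
        if not p.any():
            return 0.0                    # all paths excluded: terminate early
        p = (p[:, None] * hmm.a).max(axis=0) * hmm.b[:, x[t]]
        p[p * suffix[t] < theta] = 0.0    # prune states with e_{it} < theta
    return p.max()
```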

  28. Accuracy and Complexity
      • SPIRAL needs the same order of memory space, while it can be up to $m^2$ times faster

                    Accuracy               Memory space     Computation time
        Viterbi     Guarantees exactness   $O(m^2 + ms)$    $O(nm^2)$
        SPIRAL      Guarantees exactness   $O(m^2 + ms)$    At least $O(n)$, at most $O(nm^2)$

  29. Experimental Evaluation
      • Setup
        – Intel Core 2 1.66GHz, 2GB memory
      • Datasets
        – EEG, Chromosome, Traffic
      • Evaluation
        – Mainly computation time
        – Ergodic HMM
        – Compared against the Viterbi algorithm and beam search
          • Beam search: a popular technique, but it does not guarantee exactness

  30. Experimental Evaluation
      • Evaluation
        – Wall clock time versus number of states
        – Wall clock time versus number of models
        – Effect of likelihood approximation
        – Effect of transition pruning
        – SPIRAL vs. beam search

  31. Experimental Evaluation
      • Wall clock time versus number of states
        – EEG: up to 200 times faster

  32. Experimental Evaluation
      • Wall clock time versus number of states
        – Chromosome: up to 150 times faster
