Latent Dynamics Workshop 2010
Decoding in Latent Conditional Models: A Practically Fast Solution for an NP-hard Problem
Xu Sun (孫栩), University of Tokyo
2010.06.16
Outline
• Introduction
• Related Work & Motivations
• Our proposals
• Experiments
• Conclusions
Latent dynamics
• Latent structures (latent dynamics here) are important in information processing
  – Natural language processing
  – Data mining
  – Vision recognition
• Modeling latent dynamics: latent-dynamic conditional random fields (LDCRF)
Latent dynamics
• Latent structures (latent dynamics here) are important in information processing
• Parsing: learn refined grammars with latent info
[Parse tree of "He heard the voice": S over NP (PRP "He") and VP (VBD "heard", NP: DT "the" NN "voice") and the final period]
Latent dynamics
• Latent structures (latent dynamics here) are important in information processing
• Parsing: learn refined grammars with latent info
[The same parse tree with every nonterminal split by a latent annotation -x: S-x, NP-x, VP-x, PRP-x, VBD-x, DT-x, NN-x]
More common cases: linear-chain latent dynamics
• The previous example is a tree structure
• More common cases could be linear-chain latent dynamics
  – Named entity recognition
  – Phrase segmentation
  – Word segmentation
[Example: phrase segmentation of "These are her flowers." labeled seg seg seg noSeg [Sun+ COLING 08]]
A solution without latent annotation: Latent-dynamic CRFs
• A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]
• No need to annotate latent info
[Example: phrase segmentation of "These are her flowers." labeled seg seg seg noSeg [Sun+ COLING 08]]
Current problem & our target
• A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]; no need to annotate latent info
• Current problem: inference (decoding) is an NP-hard problem
• Our target: an almost-exact inference method with fast speed
Outline
• Introduction
• Related Work & Motivations
• Our proposals
• Experiments
• Conclusions
Traditional methods
• Traditional sequential labeling models
  – Hidden Markov Model (HMM) [Rabiner IEEE 89]
  – Maximum Entropy Model (MEM) [Ratnaparkhi EMNLP 96]
  – Conditional Random Fields (CRF) [Lafferty+ ICML 01] ← arguably the most accurate one; we will use it as one of the baselines
  – Collins Perceptron [Collins EMNLP 02]
• Problem: not able to model latent structures
Conditional random field (CRF) [Lafferty+ ICML 01]
[Linear-chain graph: labels y_1 … y_n, each connected to its observation x_1 … x_n]
$$P(y \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(y, x) \Big)$$
Problem: CRF does not model latent info
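A minimal sketch of the formula above, assuming precomputed weighted feature scores (the names emit, trans, and logsumexp are illustrative, not from the talk):

```python
import numpy as np

def logsumexp(a, axis=None):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out) if axis is None else np.squeeze(out, axis=axis)

def crf_log_prob(emit, trans, y):
    """log P(y|x) for a linear-chain CRF with precomputed scores.

    emit:  (n, L) array; emit[t, l] = weighted feature score at position t
           for label l (the sum lambda_k * F_k of the slide's formula).
    trans: (L, L) array of weighted transition scores.
    y:     length-n label sequence (list of label indices).
    """
    n, L = emit.shape
    # Unnormalized log-score of the given label path.
    score = emit[0, y[0]] + sum(
        trans[y[t - 1], y[t]] + emit[t, y[t]] for t in range(1, n))
    # log Z(x) via the forward algorithm.
    alpha = emit[0].copy()
    for t in range(1, n):
        alpha = emit[t] + logsumexp(alpha[:, None] + trans, axis=0)
    return score - logsumexp(alpha)
```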
Latent-Dynamic CRFs [Morency+ CVPR 07]
[Graph comparison: an LDCRF inserts a layer of latent variables h_1 … h_n between the labels y_1 … y_n and the observations x_1 … x_n; a conditional random field connects y directly to x]
Latent-Dynamic CRFs [Morency+ CVPR 07]
[Same graph comparison as the previous slide]
We can think of it (informally) as “CRF + unsupervised learning on latent info”
Latent-Dynamic CRFs [Morency+ CVPR 07]
$$P(y \mid x, \Lambda) = \sum_{h \in H_y} P(h \mid x, \Lambda), \qquad H_y = \{ h : h_j \in H_{y_j} \text{ for all } j \}$$
$$P(h \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(h, x) \Big)$$
Good performance reports:
* Outperforming HMM, MEMM, SVM, CRF, etc.
* Syntactic parsing [Petrov+ NIPS 08]
* Syntactic chunking [Sun+ COLING 08]
* Vision object recognition [Morency+ CVPR 07; Quattoni+ PAMI 08]
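To make the marginalization over latent paths concrete, here is a brute-force sketch (exponential in sequence length, for illustration only; path_prob and states_of are assumed helpers, not from the talk):

```python
from itertools import product

def ldcrf_label_prob(path_prob, states_of, y):
    """P(y|x) in an LDCRF: sum P(h|x) over every latent path h whose
    per-position states lie in the sets H_{y_j}.

    path_prob: assumed helper, h -> P(h|x) under a trained model.
    states_of: dict mapping each label to its disjoint latent-state set.
    y:         label sequence.
    """
    total = 0.0
    # Every position j may take any latent state in H_{y_j}.
    for h in product(*(states_of[label] for label in y)):
        total += path_prob(h)
    return total
```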
Outline
• Introduction
• Related Work & Motivations
• Our proposals
• Experiments
• Conclusions
Inference problem
[LDCRF graph: y_1 … y_n over latent h_1 … h_n over x_1 … x_n]
• Problem: exact inference (finding the label sequence with max probability) is NP-hard; no fast solution exists
• Recent fast solutions are only approximation methods:
  * Best Hidden Path [Matsuzaki+ ACL 05]
  * Best Marginal Path [Morency+ CVPR 07]
Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
[Lattice: latent states Seg-0/1/2 and noSeg-0/1/2 at each of the five tokens of "These are her flowers ."]
Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
[Lattice with the single best latent path highlighted and projected to labels]
Result: Seg Seg Seg NoSeg Seg
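A sketch of BHP decoding under the usual log-domain lattice assumptions (array names are illustrative): Viterbi over the latent states, then project each state to its label. Fast, but the projected sequence need not maximize P(y|x).

```python
import numpy as np

def best_hidden_path(emit, trans, label_of):
    """Best Hidden Path decoding: Viterbi over the latent lattice,
    then project the single best latent path to labels.

    emit:     (n, H) log-domain latent-state emission scores.
    trans:    (H, H) log-domain latent-state transition scores.
    label_of: assumed helper mapping a latent-state index to its label.
    """
    n, H = emit.shape
    delta = emit[0].copy()                 # best score ending in each state
    back = np.zeros((n, H), dtype=int)
    for t in range(1, n):
        cand = delta[:, None] + trans      # cand[a, b] = best score via a -> b
        back[t] = np.argmax(cand, axis=0)
        delta = emit[t] + np.max(cand, axis=0)
    path = [int(np.argmax(delta))]         # backtrace the best latent path
    for t in range(n - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [label_of(s) for s in path]
```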
Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
[Same lattice: latent states Seg-0/1/2 and noSeg-0/1/2 over "These are her flowers ."]
Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
[Lattice annotated with a marginal probability at every node; at each position, the label whose latent states carry the largest summed marginal is picked]
Result: Seg Seg Seg NoSeg Seg
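A corresponding sketch of BMP decoding, assuming per-node marginals are already available (e.g., from forward-backward over the lattice); again only an approximation of argmax_y P(y|x):

```python
def best_marginal_path(marginal, states_of, labels):
    """Best Marginal Path decoding: at every position choose the label
    whose latent states carry the largest summed marginal probability.

    marginal:  (n, H) array of per-node marginals P(h_t = s | x).
    states_of: dict label -> list of that label's latent-state indices.
    labels:    the label set (e.g. ["Seg", "NoSeg"]).
    """
    result = []
    for t in range(marginal.shape[0]):
        # Sum each label's latent-state marginals at position t, take the max.
        result.append(max(labels, key=lambda l: marginal[t, states_of[l]].sum()))
    return result
```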
Our target
1) Exact inference
2) Comparable speed to existing approximation methods
[LDCRF graph: y_1 … y_n over latent h_1 … h_n over x_1 … x_n]
• Problem: exact inference (finding the sequence with max probability) is NP-hard; no fast solution exists
• Challenge/difficulty: an exact & practically fast solution to an NP-hard problem
Outline
• Introduction
• Related Work & Motivations
• Our proposals
• Experiments
• Conclusions
Essential ideas [Sun+ EACL 09]
• Fast & exact inference from a key observation
  – A key observation on the probability distribution
  – Dynamic top-n search
  – Fast decision on the optimal result from the top-n candidates
Key observation
• Natural problems (e.g., NLP problems) are not completely ambiguous
• Normally, only a few candidate results are highly probable
• Therefore, the probability distribution of latent models can be sharp
Key observation
• Probability distribution on latent models is sharp
Example: label sequences for "These are her flowers ." with their probabilities:
  seg noSeg seg seg seg     P = 0.2
  seg seg seg noSeg seg     P = 0.3
  seg seg seg seg seg       P = 0.2
  seg seg noSeg noSeg seg   P = 0.1
  seg noSeg seg noSeg seg   P = …
  …                         P = …
Key observation
• Probability distribution on latent models is sharp
• Challenge: the number of probable candidates is unknown & changing
• Need a method which can automatically adapt itself to different cases
[Same candidate list as before; the four listed candidates already cover 0.3 + 0.2 + 0.2 + 0.1 = 0.8 of the mass, so any unseen candidate has P(unknown) ≤ 0.2 and can be compared against the current best]
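Why a sharp distribution permits an early exact decision, restated as a one-line bound (my derivation from the numbers on the slide):

```latex
c = \sum_{i=1}^{k} P(y^{(i)} \mid x)
\quad\Longrightarrow\quad
P(y \mid x) \,\le\, 1 - c \;\; \text{for every not-yet-seen } y
```

So as soon as $\max_i P(y^{(i)} \mid x) > 1 - c$, the current best candidate is provably the exact optimum; in the example above, $0.3 > 1 - 0.8 = 0.2$.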
A demo on lattice
[Lattice: latent states Seg-0/1/2 and noSeg-0/1/2 at each token of "These are her flowers ."]
(1) Admissible heuristics for A* search
[The same lattice, before heuristic values are computed]
(1) Admissible heuristics for A* search
[Every lattice node (i, j) is assigned a heuristic value h_ij by a Viterbi pass run right to left]
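A sketch of the heuristic computation, assuming log-domain emission and transition scores (names illustrative). The right-to-left Viterbi value is the exact best suffix score, so it never underestimates a completion and is therefore admissible:

```python
import numpy as np

def backward_viterbi_heuristic(emit, trans):
    """h[t, s] = exact best log-score of a path suffix that starts in
    latent state s at position t (including emit[t, s]), computed by a
    right-to-left Viterbi pass.  An exact bound on the best completion
    is trivially admissible for A* search over the lattice.
    """
    n, H = emit.shape
    h = np.zeros((n, H))
    h[n - 1] = emit[n - 1]
    for t in range(n - 2, -1, -1):
        # Best continuation: transition into the best next state.
        h[t] = emit[t] + np.max(trans + h[t + 1][None, :], axis=1)
    return h
```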
(1) Admissible heuristics for A* search
[Lattice with heuristic values h_00 … h_45 now attached to every node]
(2) Find 1st latent path h1: A* search
[A* search over the lattice, guided by the heuristic values, returns the highest-scoring latent path h1]
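A sketch of the A* step under the same assumptions; because the heuristic is the exact suffix score, complete paths pop off the queue in score order, so the same search can also supply the 2nd, 3rd, … best latent paths needed later:

```python
import heapq

def astar_latent_paths(emit, trans, h):
    """Yield complete latent paths in descending score order by A*
    search with the backward-Viterbi heuristic h.  With an exact
    heuristic, the k-th complete path popped from the queue is the
    k-th best latent path.  heapq is a min-heap, so priorities are
    negated.
    """
    n, H = emit.shape
    # For a start node s: g = emit[0, s] and f = h[0, s] (h includes emit).
    heap = [(-h[0, s], emit[0, s], (s,)) for s in range(H)]
    heapq.heapify(heap)
    while heap:
        neg_f, g, path = heapq.heappop(heap)
        t = len(path) - 1
        if t == n - 1:
            yield g, path                     # complete; popped in score order
            continue
        for s in range(H):
            g2 = g + trans[path[-1], s] + emit[t + 1, s]
            f2 = g + trans[path[-1], s] + h[t + 1, s]   # exact upper bound
            heapq.heappush(heap, (-f2, g2, path + (s,)))
```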
(3) Get y1 & P(y1): Forward-Backward algo.
[Project h1 to its label sequence y1 and compute P(y1) with the forward-backward algorithm]
(3) Get y1 & P(y1): Forward-Backward algo.
[Forward-backward over the lattice gives P(seg, noSeg, seg, seg, seg) = 0.2]
P(y*) = 0.2; P(unknown) = 1 − 0.2 = 0.8
Is P(y*) > P(unknown)? Not yet (0.2 < 0.8), so the search continues with the next latent path.
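Putting the steps together, a sketch of the whole decoding loop, reusing astar_latent_paths from the sketch above (seq_prob stands in for the constrained forward-backward computation of P(y|x); all names are illustrative):

```python
def exact_decode(emit, trans, h, label_of, seq_prob):
    """Enumerate latent paths best-first, project each to its label
    sequence, score new sequences with seq_prob, and stop as soon as
    the best sequence found provably beats all unseen mass.

    seq_prob: assumed helper, y -> P(y|x) (a forward-backward pass over
              the lattice restricted to y's latent states).
    """
    seen, covered = set(), 0.0
    best_y, best_p = None, 0.0
    for _, path in astar_latent_paths(emit, trans, h):
        y = tuple(label_of(s) for s in path)
        if y in seen:
            continue                 # a different latent path, same labels
        seen.add(y)
        p = seq_prob(y)
        covered += p
        if p > best_p:
            best_y, best_p = y, p
        # Everything not yet seen has total mass <= 1 - covered, so once
        # the current best exceeds that bound it is exactly optimal.
        if best_p > 1.0 - covered:
            break
    return best_y, best_p
```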
(4) Find 2nd latent path h2: A* search
[A* search resumes and returns the 2nd-best latent path h2]
(5) Get y2 & P(y2): Forward-backward algo.
[h2 is projected to y2 and P(y2) is computed by forward-backward; the stopping test is applied again]