

  1. Latent dynamics workshop 2010
     Decoding in Latent Conditional Models: A Practically Fast Solution for an NP-hard Problem
     Xu Sun (孫 栩), University of Tokyo, 2010.06.16

  2. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  3. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
       – Natural language processing
       – Data mining
       – Vision recognition
     • Modeling latent dynamics: latent-dynamic conditional random fields (LDCRF)

  4. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
     • Parsing: learn refined grammars with latent info
     [Figure: parse tree of "He heard the voice": S over NP VP .; PRP "He", VBD "heard", NP = DT "the" + NN "voice"]

  5. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
     • Parsing: learn refined grammars with latent info
     [Figure: the same parse tree with a latent annotation -x attached to every nonterminal, e.g., S-x, NP-x, VP-x, PRP-x, VBD-x, DT-x, NN-x]

  6. More common cases: linear-chain latent dynamics
     • The previous example is a tree structure
     • More common cases are linear-chain latent dynamics
       – Named entity recognition
       – Phrase segmentation
       – Word segmentation
     [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  7. A solution without latent annotation: latent-dynamic CRFs
     • A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]
       * No need to annotate latent info
     [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  8. Current problem & our target
     • A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]
       * No need to annotate latent info
     • Current problem: inference (decoding) is an NP-hard problem
     • Our target: an almost-exact inference method with fast speed

  9. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  10. Traditional methods
      • Traditional sequential labeling models
        – Hidden Markov Model (HMM) [Rabiner IEEE 89]
        – Maximum Entropy Model (MEM) [Ratnaparkhi EMNLP 96]
        – Conditional Random Fields (CRF) [Lafferty+ ICML 01]
        – Collins Perceptron [Collins EMNLP 02]: arguably the most accurate one
      • Problem: not able to model latent structures. We will use it as one of the baselines.

  11. Conditional random field (CRF) [Lafferty+ ICML 01]
      [Figure: linear-chain CRF; labels y_1 … y_n over observations x_1 … x_n]
      $$P(y \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(y, x) \Big)$$
      • Problem: CRF does not model latent info
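
To make the formula concrete, here is a minimal brute-force sketch in Python. It is illustrative only: the toy feature layout, weights, and sentence are my own assumptions, and a real implementation would compute Z(x) with the forward algorithm rather than by enumeration.

```python
# Minimal sketch of the CRF probability above (assumed toy model).
import itertools
import math

LABELS = ["seg", "noSeg"]

def score(y, x, weights):
    """Unnormalized log-score: sum_k lambda_k F_k(y, x) for simple
    emission and transition indicator features."""
    s = 0.0
    for t, (label, token) in enumerate(zip(y, x)):
        s += weights.get(("emit", label, token), 0.0)
        if t > 0:
            s += weights.get(("trans", y[t - 1], label), 0.0)
    return s

def crf_probability(y, x, weights):
    """P(y | x) = exp(score(y, x)) / Z(x); Z(x) by brute force here."""
    z = sum(math.exp(score(cand, x, weights))
            for cand in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, weights)) / z

x = ["These", "are", "her", "flowers", "."]
w = {("emit", "seg", "These"): 1.0, ("trans", "seg", "seg"): 0.5}  # toy weights
print(crf_probability(("seg", "seg", "seg", "noSeg", "seg"), x, w))
```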

  12. Latent-Dynamic CRFs [Morency+ CVPR 07]
      [Figure: a latent-dynamic CRF inserts a layer of hidden states h_1 … h_n between the labels y_1 … y_n and the observations x_1 … x_n; a plain conditional random field connects y_1 … y_n directly to x_1 … x_n]

  13. Latent-Dynamic CRFs [Morency+ CVPR 07]
      • We can think of it (informally) as "CRF + unsupervised learning on latent info"
      [Figure: the same comparison of a latent-dynamic CRF (with hidden layer h_1 … h_n) and a plain conditional random field]

  14. Latent-Dynamic CRFs [Morency+ CVPR 07]
      $$P(y \mid x, \Lambda) = \sum_{h : \forall j,\; h_j \in H_{y_j}} P(h \mid x, \Lambda), \qquad P(h \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(h, x) \Big)$$
      • Good performance reports
        * Outperforming HMM, MEMM, SVM, CRF, etc.
        * Syntactic parsing [Petrov+ NIPS 08]
        * Syntactic chunking [Sun+ COLING 08]
        * Vision object recognition [Morency+ CVPR 07; Quattoni+ PAMI 08]
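
A sketch of this marginalization in the same brute-force style: P(y|x) sums P(h|x) over every latent path h whose states project onto y. The disjoint state sets mirror the lattice figures later in the deck; the weight layout is an assumption.

```python
# Minimal sketch of the LDCRF probability above (assumed toy model).
import itertools
import math

# Each label owns a disjoint set of latent states, as in the lattices below.
LATENT = {"Seg": ["Seg-0", "Seg-1", "Seg-2"],
          "noSeg": ["noSeg-0", "noSeg-1", "noSeg-2"]}
ALL_STATES = [s for states in LATENT.values() for s in states]

def path_score(h, x, weights):
    """Unnormalized log-score of one latent path h."""
    s = 0.0
    for t, (state, token) in enumerate(zip(h, x)):
        s += weights.get(("emit", state, token), 0.0)
        if t > 0:
            s += weights.get(("trans", h[t - 1], state), 0.0)
    return s

def ldcrf_probability(y, x, weights):
    """P(y | x): total mass of latent paths that project onto y."""
    # Z(x): every latent path in the full lattice.
    z = sum(math.exp(path_score(h, x, weights))
            for h in itertools.product(ALL_STATES, repeat=len(x)))
    # Numerator: only paths with h_j in H_{y_j} at every position.
    num = sum(math.exp(path_score(h, x, weights))
              for h in itertools.product(*[LATENT[lab] for lab in y]))
    return num / z
```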

  15. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  16. Inference problem
      [Figure: the latent-dynamic CRF graphical model]
      • Recent fast solutions are only approximation methods:
        * Best Hidden Path [Matsuzaki+ ACL 05]
        * Best Marginal Path [Morency+ CVPR 07]
      • Problem: exact inference (finding the label sequence with maximum probability) is NP-hard! No fast solution exists.

  17. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
      [Figure: latent lattice for "These are her flowers ." with states Seg-0, Seg-1, Seg-2, noSeg-0, noSeg-1, noSeg-2 at each position]

  18. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
      [Figure: the same lattice with the single best latent path highlighted]
      Result: Seg Seg Seg NoSeg Seg
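
BHP finds the single most probable latent path with Viterbi and projects its states onto labels, which is why the deck lists it as an approximation: the label sequence of the best single latent path need not be the most probable label sequence overall. A minimal sketch, assuming toy scoring functions log_emit and log_trans (names are mine):

```python
# Minimal sketch of Best Hidden Path decoding (assumed helper names).
def viterbi_best_hidden_path(x, states, label_of, log_emit, log_trans):
    # delta[t][s]: best log-score of any latent path ending in state s at t.
    delta = [{s: log_emit(s, x[0]) for s in states}]
    back = []
    for t in range(1, len(x)):
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[-1][p] + log_trans(p, s))
            row[s] = delta[-1][prev] + log_trans(prev, s) + log_emit(s, x[t])
            ptr[s] = prev
        delta.append(row)
        back.append(ptr)
    # Backtrack the best latent path, then project states onto labels.
    s = max(states, key=lambda q: delta[-1][q])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [label_of(s) for s in path]
```

Here label_of would map latent states back to labels, e.g. `lambda s: s.split("-")[0]` for states like Seg-1.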

  19. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
      [Figure: the same latent lattice for "These are her flowers ."]

  20. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
      [Figure: the lattice annotated with each state's marginal probability; at every position, the label whose latent states carry the largest total marginal mass is selected]
      Result: Seg Seg Seg NoSeg Seg
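
BMP instead sums, at each position, the marginal probabilities of the latent states belonging to each label (the numbers in the lattice figure) and outputs the per-position winners. A minimal sketch; the marginals would come from a standard forward-backward pass, and all names are assumptions:

```python
# Minimal sketch of Best Marginal Path decoding (assumed helper names).
def best_marginal_path(marginals, labels, label_of):
    """marginals: one dict per position mapping latent state -> P(h_t = s | x).
    Returns the label with the largest summed marginal at each position."""
    result = []
    for dist in marginals:
        mass = {lab: 0.0 for lab in labels}
        for state, p in dist.items():
            mass[label_of(state)] += p
        result.append(max(mass, key=mass.get))
    return result
```

For instance, if at some position the Seg states carry marginals 0.1 + 0.1 + 0.2 and the noSeg states 0.1 + 0.2 + 0.0 (hypothetical numbers), Seg wins with total mass 0.4. Like BHP this is an approximation: maximizing each position independently need not maximize the probability of the whole label sequence.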

  21. Our target
      1) Exact inference
      2) Comparable speed to existing approximation methods
      [Figure: the latent-dynamic CRF graphical model]
      • Problem: exact inference (finding the label sequence with maximum probability) is NP-hard! No fast solution exists.
      • Challenge/difficulty: an exact & practically fast solution to an NP-hard problem

  22. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  23. Essential ideas [Sun+ EACL 09]
      • Fast & exact inference from a key observation
        – A key observation on probability distributions
        – Dynamic top-n search
        – Fast decision on the optimal result from the top-n candidates

  24. Key observation
      • Natural problems (e.g., NLP problems) are not completely ambiguous
      • Normally, only a few result candidates are highly probable
      • Therefore, the probability distribution of latent models can be sharp

  25. Key observation
      • The probability distribution of latent models is sharp
      [Figure: label-sequence candidates for "These are her flowers ." sorted by probability, e.g., "seg seg seg noSeg seg" with P = 0.3, "seg noSeg seg seg seg" with P = 0.2, "seg seg seg seg seg" with P = 0.2, "seg seg noSeg noSeg seg" with P = 0.1, and a long tail of low-probability candidates]

  26. Key observation
      • The probability distribution of latent models is sharp
      • Challenge: the number of probable candidates is unknown & changing
      • Need a method which can automatically adapt itself to different cases
      [Figure: the same candidate list; the probabilities found so far bound every unseen candidate, e.g., P(unknown) ≤ 0.2, which can be compared against the best candidate found (P = 0.3)]
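
The deck's overall procedure (detailed step by step in the next slides) can be summarized as: pull latent paths best-first, project each onto a label sequence, compute that sequence's exact probability, and stop once the best sequence found provably beats all unseen mass. A compact sketch, with helper names that are my own:

```python
# Sketch of the deck's exact decoding loop (assumed helper names).
def exact_decode(latent_paths_best_first, prob_of_labels, label_of):
    """latent_paths_best_first: iterator of latent paths, best first (A*).
    prob_of_labels(y): exact P(y | x) via a restricted forward pass."""
    best_y, best_p, covered, seen = None, 0.0, 0.0, set()
    for h in latent_paths_best_first:
        y = tuple(label_of(s) for s in h)
        if y in seen:
            continue  # this label sequence was already scored exactly
        seen.add(y)
        p_y = prob_of_labels(y)
        covered += p_y
        if p_y > best_p:
            best_y, best_p = y, p_y
        # Every unseen label sequence has probability at most 1 - covered,
        # so once the best found exceeds that bound it is provably optimal.
        if best_p > 1.0 - covered:
            return best_y, best_p
    return best_y, best_p
```

Because the distribution is sharp, the bound usually triggers after only a few candidates, which is what makes the exact search practically fast.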

  27. A demo on the lattice
      [Figure: latent lattice for "These are her flowers ." with states Seg-0, Seg-1, Seg-2, noSeg-0, noSeg-1, noSeg-2]

  28. (1) Admissible heuristics for A* search
      [Figure: the empty latent lattice]

  29. (1) Admissible heuristics for A* search
      [Figure: the lattice with a heuristic score filled in for every state at every position]
      • Viterbi algorithm (right to left)
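
A sketch of this right-to-left Viterbi pass: h[t][s] is the best achievable log-score from state s at position t to the end of the sentence, so it never underestimates any completion and is therefore admissible for A*. The scoring-function names are assumptions:

```python
# Sketch of step (1): heuristics by a right-to-left Viterbi pass
# (assumed toy scoring functions log_emit and log_trans).
def backward_viterbi_heuristics(x, states, log_emit, log_trans):
    """h[t][s]: best achievable log-score from state s at position t
    to the end of the sentence; exact, hence admissible for A*."""
    n = len(x)
    h = [dict() for _ in range(n)]
    for s in states:
        h[n - 1][s] = 0.0  # nothing remains to score at the last position
    for t in range(n - 2, -1, -1):
        for s in states:
            h[t][s] = max(log_trans(s, q) + log_emit(q, x[t + 1]) + h[t + 1][q]
                          for q in states)
    return h
```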

  30. (1) Admissible heuristics for A* search
      [Figure: the fully annotated heuristic lattice]

  31. (2) Find the 1st latent path h1: A* search
      [Figure: A* search over the heuristic-annotated lattice; the best latent path is highlighted]
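
A sketch of the A* step using the heuristics from step (1). Because the heuristic is the exact best completion score, the first complete path popped from the queue is optimal, and continuing to pop yields the 2nd-best, 3rd-best, … latent paths needed in steps (4) and (5). Names are mine:

```python
# Sketch of step (2): A* over the latent lattice, guided by the
# heuristics h from step (1) (assumed names, as before).
import heapq

def astar_latent_paths(x, states, log_emit, log_trans, h):
    """Yields complete latent paths in order of decreasing score."""
    pq = []  # entries: (-(g + h), g, partial_path)
    for s in states:
        g = log_emit(s, x[0])
        heapq.heappush(pq, (-(g + h[0][s]), g, (s,)))
    while pq:
        _, g, path = heapq.heappop(pq)
        t = len(path) - 1
        if t == len(x) - 1:
            yield path, g  # complete paths pop in best-first order
            continue
        for q in states:
            g2 = g + log_trans(path[-1], q) + log_emit(q, x[t + 1])
            heapq.heappush(pq, (-(g2 + h[t + 1][q]), g2, path + (q,)))
```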

  32. (3) Get y1 & P(y1): forward-backward algorithm
      [Figure: the lattice restricted to the latent states consistent with y1, the label projection of h1]

  33. (3) Get y1 & P(y1): forward-backward algorithm
      [Figure: the restricted lattice]
      • P(seg, noSeg, seg, seg, seg) = 0.2, so P(y*) = 0.2
      • P(unknown) = 1 - 0.2 = 0.8
      • Is P(y*) > P(unknown)? Not yet, so the search continues.
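
P(y1) is computed exactly by restricting the forward pass to the latent states consistent with y1 and normalizing by the full Z(x). A minimal log-space sketch without numerical scaling, for exposition only; all names are assumptions:

```python
# Sketch of step (3): exact P(y | x) via a restricted forward pass
# (assumed names; no log-sum-exp scaling, so toy-sized inputs only).
import math

def restricted_log_z(x, states_at, log_emit, log_trans):
    """Log of the total exp-score of all paths through states_at(t)."""
    alpha = {s: log_emit(s, x[0]) for s in states_at(0)}
    for t in range(1, len(x)):
        prev = alpha
        alpha = {}
        for s in states_at(t):
            alpha[s] = log_emit(s, x[t]) + math.log(
                sum(math.exp(prev[p] + log_trans(p, s)) for p in prev))
    return math.log(sum(math.exp(v) for v in alpha.values()))

def label_sequence_prob(y, x, latent_of, all_states, log_emit, log_trans):
    """P(y | x) = (mass of paths consistent with y) / Z(x)."""
    num = restricted_log_z(x, lambda t: latent_of(y[t]), log_emit, log_trans)
    den = restricted_log_z(x, lambda t: all_states, log_emit, log_trans)
    return math.exp(num - den)
```

In the slide's example this yields P(y1) = 0.2, leaving P(unknown) = 0.8, so the stopping test fails and the search moves on to h2.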

  34. (4) Find the 2nd latent path h2: A* search
      [Figure: A* continues and pops the next-best latent path from the lattice]

  35. (5) Get y2 & P(y2): forward-backward algorithm
      [Figure: the lattice restricted to the label projection of h2]
