

  1. Latent dynamics workshop 2010
     Decoding in Latent Conditional Models: A Practically Fast Solution for an NP-hard Problem
     Xu Sun (孫 栩), University of Tokyo, 2010.06.16

  2. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  3. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
       – Natural language processing
       – Data mining
       – Vision recognition
     • Modeling latent dynamics: latent-dynamic conditional random fields (LDCRF)

  4. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
     • Parsing: learn refined grammars with latent info
     [Figure: parse tree of "He heard the voice": S over NP VP .; PRP "He", VBD "heard", NP = DT "the" + NN "voice"]

  5. Latent dynamics
     • Latent structures (latent dynamics here) are important in information processing
     • Parsing: learn refined grammars with latent info
     [Figure: the same parse tree with a latent annotation -x attached to every nonterminal, e.g., S-x, NP-x, VP-x, PRP-x, VBD-x, DT-x, NN-x]

  6. More common cases: linear-chain latent dynamics
     • The previous example is a tree structure
     • More common cases are linear-chain latent dynamics
       – Named entity recognition
       – Phrase segmentation
       – Word segmentation
     [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  7. A solution without latent annotation: latent-dynamic CRFs
     • A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]
       * No need to annotate latent info
     [Figure: phrase segmentation of "These are her flowers." with labels seg seg seg noSeg; Sun+ COLING 08]

  8. Current problem & our target
     • A solution: latent-dynamic conditional random fields (LDCRFs) [Morency+ CVPR 07]
       * No need to annotate latent info
     • Current problem: inference (decoding) is an NP-hard problem
     • Our target: an almost-exact inference method with fast speed

  9. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  10. Traditional methods
      • Traditional sequential labeling models
        – Hidden Markov Model (HMM) [Rabiner IEEE 89]
        – Maximum Entropy Model (MEM) [Ratnaparkhi EMNLP 96]
        – Conditional Random Fields (CRF) [Lafferty+ ICML 01]
        – Collins Perceptron [Collins EMNLP 02]: arguably the most accurate one
      • Problem: not able to model latent structures. We will use it as one of the baselines.

  11. Conditional random field (CRF) [Lafferty+ ICML 01]
      [Figure: linear-chain CRF; labels y_1 … y_n over observations x_1 … x_n]
      $$P(y \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(y, x) \Big)$$
      • Problem: CRF does not model latent info
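
To make the formula concrete, here is a minimal brute-force sketch in Python. It is illustrative only: the toy feature layout, weights, and sentence are my own assumptions, and a real implementation would compute Z(x) with the forward algorithm rather than by enumeration.

```python
# Minimal sketch of the CRF probability above (assumed toy model).
import itertools
import math

LABELS = ["seg", "noSeg"]

def score(y, x, weights):
    """Unnormalized log-score: sum_k lambda_k F_k(y, x) for simple
    emission and transition indicator features."""
    s = 0.0
    for t, (label, token) in enumerate(zip(y, x)):
        s += weights.get(("emit", label, token), 0.0)
        if t > 0:
            s += weights.get(("trans", y[t - 1], label), 0.0)
    return s

def crf_probability(y, x, weights):
    """P(y | x) = exp(score(y, x)) / Z(x); Z(x) by brute force here."""
    z = sum(math.exp(score(cand, x, weights))
            for cand in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x, weights)) / z

x = ["These", "are", "her", "flowers", "."]
w = {("emit", "seg", "These"): 1.0, ("trans", "seg", "seg"): 0.5}  # toy weights
print(crf_probability(("seg", "seg", "seg", "noSeg", "seg"), x, w))
```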

  12. Latent-Dynamic CRFs [Morency+ CVPR 07]
      [Figure: a latent-dynamic CRF inserts a layer of hidden states h_1 … h_n between the labels y_1 … y_n and the observations x_1 … x_n; a plain conditional random field connects y_1 … y_n directly to x_1 … x_n]

  13. Latent-Dynamic CRFs [Morency+ CVPR 07]
      • We can think of it (informally) as "CRF + unsupervised learning on latent info"
      [Figure: the same comparison of a latent-dynamic CRF (with hidden layer h_1 … h_n) and a plain conditional random field]

  14. Latent-Dynamic CRFs [Morency+ CVPR 07]
      $$P(y \mid x, \Lambda) = \sum_{h : \forall j,\; h_j \in H_{y_j}} P(h \mid x, \Lambda), \qquad P(h \mid x, \Lambda) = \frac{1}{Z(x, \Lambda)} \exp\Big( \sum_k \lambda_k F_k(h, x) \Big)$$
      • Good performance reports
        * Outperforming HMM, MEMM, SVM, CRF, etc.
        * Syntactic parsing [Petrov+ NIPS 08]
        * Syntactic chunking [Sun+ COLING 08]
        * Vision object recognition [Morency+ CVPR 07; Quattoni+ PAMI 08]
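
A sketch of this marginalization in the same brute-force style: P(y|x) sums P(h|x) over every latent path h whose states project onto y. The disjoint state sets mirror the lattice figures later in the deck; the weight layout is an assumption.

```python
# Minimal sketch of the LDCRF probability above (assumed toy model).
import itertools
import math

# Each label owns a disjoint set of latent states, as in the lattices below.
LATENT = {"Seg": ["Seg-0", "Seg-1", "Seg-2"],
          "noSeg": ["noSeg-0", "noSeg-1", "noSeg-2"]}
ALL_STATES = [s for states in LATENT.values() for s in states]

def path_score(h, x, weights):
    """Unnormalized log-score of one latent path h."""
    s = 0.0
    for t, (state, token) in enumerate(zip(h, x)):
        s += weights.get(("emit", state, token), 0.0)
        if t > 0:
            s += weights.get(("trans", h[t - 1], state), 0.0)
    return s

def ldcrf_probability(y, x, weights):
    """P(y | x): total mass of latent paths that project onto y."""
    # Z(x): every latent path in the full lattice.
    z = sum(math.exp(path_score(h, x, weights))
            for h in itertools.product(ALL_STATES, repeat=len(x)))
    # Numerator: only paths with h_j in H_{y_j} at every position.
    num = sum(math.exp(path_score(h, x, weights))
              for h in itertools.product(*[LATENT[lab] for lab in y]))
    return num / z
```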

  15. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  16. Inference problem
      [Figure: the latent-dynamic CRF graphical model]
      • Recent fast solutions are only approximation methods:
        * Best Hidden Path [Matsuzaki+ ACL 05]
        * Best Marginal Path [Morency+ CVPR 07]
      • Problem: exact inference (finding the label sequence with maximum probability) is NP-hard! No fast solution exists.

  17. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
      [Figure: latent lattice for "These are her flowers ." with states Seg-0, Seg-1, Seg-2, noSeg-0, noSeg-1, noSeg-2 at each position]

  18. Related work 1: Best hidden path (BHP) [Matsuzaki+ ACL 05]
      [Figure: the same lattice with the single best latent path highlighted]
      Result: Seg Seg Seg NoSeg Seg
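
BHP finds the single most probable latent path with Viterbi and projects its states onto labels, which is why the deck lists it as an approximation: the label sequence of the best single latent path need not be the most probable label sequence overall. A minimal sketch, assuming toy scoring functions log_emit and log_trans (names are mine):

```python
# Minimal sketch of Best Hidden Path decoding (assumed helper names).
def viterbi_best_hidden_path(x, states, label_of, log_emit, log_trans):
    # delta[t][s]: best log-score of any latent path ending in state s at t.
    delta = [{s: log_emit(s, x[0]) for s in states}]
    back = []
    for t in range(1, len(x)):
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[-1][p] + log_trans(p, s))
            row[s] = delta[-1][prev] + log_trans(prev, s) + log_emit(s, x[t])
            ptr[s] = prev
        delta.append(row)
        back.append(ptr)
    # Backtrack the best latent path, then project states onto labels.
    s = max(states, key=lambda q: delta[-1][q])
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return [label_of(s) for s in path]
```

Here label_of would map latent states back to labels, e.g. `lambda s: s.split("-")[0]` for states like Seg-1.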

  19. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
      [Figure: the same latent lattice for "These are her flowers ."]

  20. Related work 2: Best marginal path (BMP) [Morency+ CVPR 07]
      [Figure: the lattice annotated with each state's marginal probability; at every position, the label whose latent states carry the largest total marginal mass is selected]
      Result: Seg Seg Seg NoSeg Seg
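
BMP instead sums, at each position, the marginal probabilities of the latent states belonging to each label (the numbers in the lattice figure) and outputs the per-position winners. A minimal sketch; the marginals would come from a standard forward-backward pass, and all names are assumptions:

```python
# Minimal sketch of Best Marginal Path decoding (assumed helper names).
def best_marginal_path(marginals, labels, label_of):
    """marginals: one dict per position mapping latent state -> P(h_t = s | x).
    Returns the label with the largest summed marginal at each position."""
    result = []
    for dist in marginals:
        mass = {lab: 0.0 for lab in labels}
        for state, p in dist.items():
            mass[label_of(state)] += p
        result.append(max(mass, key=mass.get))
    return result
```

For instance, if at some position the Seg states carry marginals 0.1 + 0.1 + 0.2 and the noSeg states 0.1 + 0.2 + 0.0 (hypothetical numbers), Seg wins with total mass 0.4. Like BHP this is an approximation: maximizing each position independently need not maximize the probability of the whole label sequence.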

  21. Our target
      1) Exact inference
      2) Comparable speed to existing approximation methods
      [Figure: the latent-dynamic CRF graphical model]
      • Problem: exact inference (finding the label sequence with maximum probability) is NP-hard! No fast solution exists.
      • Challenge/difficulty: an exact & practically fast solution to an NP-hard problem

  22. Outline • Introduction • Related Work & Motivations • Our proposals • Experiments • Conclusions

  23. Essential ideas [Sun+ EACL 09]
      • Fast & exact inference from a key observation
        – A key observation on probability distributions
        – Dynamic top-n search
        – Fast decision on the optimal result from the top-n candidates

  24. Key observation
      • Natural problems (e.g., NLP problems) are not completely ambiguous
      • Normally, only a few result candidates are highly probable
      • Therefore, the probability distribution of latent models can be sharp

  25. Key observation
      • The probability distribution of latent models is sharp
      [Figure: label-sequence candidates for "These are her flowers ." sorted by probability, e.g., "seg seg seg noSeg seg" with P = 0.3, "seg noSeg seg seg seg" with P = 0.2, "seg seg seg seg seg" with P = 0.2, "seg seg noSeg noSeg seg" with P = 0.1, and a long tail of low-probability candidates]

  26. Key observation
      • The probability distribution of latent models is sharp
      • Challenge: the number of probable candidates is unknown & changing
      • Need a method which can automatically adapt itself to different cases
      [Figure: the same candidate list; the probabilities found so far bound every unseen candidate, e.g., P(unknown) ≤ 0.2, which can be compared against the best candidate found (P = 0.3)]
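
The deck's overall procedure (detailed step by step in the next slides) can be summarized as: pull latent paths best-first, project each onto a label sequence, compute that sequence's exact probability, and stop once the best sequence found provably beats all unseen mass. A compact sketch, with helper names that are my own:

```python
# Sketch of the deck's exact decoding loop (assumed helper names).
def exact_decode(latent_paths_best_first, prob_of_labels, label_of):
    """latent_paths_best_first: iterator of latent paths, best first (A*).
    prob_of_labels(y): exact P(y | x) via a restricted forward pass."""
    best_y, best_p, covered, seen = None, 0.0, 0.0, set()
    for h in latent_paths_best_first:
        y = tuple(label_of(s) for s in h)
        if y in seen:
            continue  # this label sequence was already scored exactly
        seen.add(y)
        p_y = prob_of_labels(y)
        covered += p_y
        if p_y > best_p:
            best_y, best_p = y, p_y
        # Every unseen label sequence has probability at most 1 - covered,
        # so once the best found exceeds that bound it is provably optimal.
        if best_p > 1.0 - covered:
            return best_y, best_p
    return best_y, best_p
```

Because the distribution is sharp, the bound usually triggers after only a few candidates, which is what makes the exact search practically fast.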

  27. A demo on the lattice
      [Figure: latent lattice for "These are her flowers ." with states Seg-0, Seg-1, Seg-2, noSeg-0, noSeg-1, noSeg-2]

  28. (1) Admissible heuristics for A* search
      [Figure: the empty latent lattice]

  29. (1) Admissible heuristics for A* search
      [Figure: the lattice with a heuristic score filled in for every state at every position]
      • Viterbi algorithm (right to left)
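
A sketch of this right-to-left Viterbi pass: h[t][s] is the best achievable log-score from state s at position t to the end of the sentence, so it never underestimates any completion and is therefore admissible for A*. The scoring-function names are assumptions:

```python
# Sketch of step (1): heuristics by a right-to-left Viterbi pass
# (assumed toy scoring functions log_emit and log_trans).
def backward_viterbi_heuristics(x, states, log_emit, log_trans):
    """h[t][s]: best achievable log-score from state s at position t
    to the end of the sentence; exact, hence admissible for A*."""
    n = len(x)
    h = [dict() for _ in range(n)]
    for s in states:
        h[n - 1][s] = 0.0  # nothing remains to score at the last position
    for t in range(n - 2, -1, -1):
        for s in states:
            h[t][s] = max(log_trans(s, q) + log_emit(q, x[t + 1]) + h[t + 1][q]
                          for q in states)
    return h
```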

  30. (1) Admissible heuristics for A* search
      [Figure: the fully annotated heuristic lattice]

  31. (2) Find the 1st latent path h1: A* search
      [Figure: A* search over the heuristic-annotated lattice; the best latent path is highlighted]
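
A sketch of the A* step using the heuristics from step (1). Because the heuristic is the exact best completion score, the first complete path popped from the queue is optimal, and continuing to pop yields the 2nd-best, 3rd-best, … latent paths needed in steps (4) and (5). Names are mine:

```python
# Sketch of step (2): A* over the latent lattice, guided by the
# heuristics h from step (1) (assumed names, as before).
import heapq

def astar_latent_paths(x, states, log_emit, log_trans, h):
    """Yields complete latent paths in order of decreasing score."""
    pq = []  # entries: (-(g + h), g, partial_path)
    for s in states:
        g = log_emit(s, x[0])
        heapq.heappush(pq, (-(g + h[0][s]), g, (s,)))
    while pq:
        _, g, path = heapq.heappop(pq)
        t = len(path) - 1
        if t == len(x) - 1:
            yield path, g  # complete paths pop in best-first order
            continue
        for q in states:
            g2 = g + log_trans(path[-1], q) + log_emit(q, x[t + 1])
            heapq.heappush(pq, (-(g2 + h[t + 1][q]), g2, path + (q,)))
```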

  32. (3) Get y1 & P(y1): forward-backward algorithm
      [Figure: the lattice restricted to the latent states consistent with y1, the label projection of h1]

  33. (3) Get y1 & P(y1): forward-backward algorithm
      [Figure: the restricted lattice]
      • P(seg, noSeg, seg, seg, seg) = 0.2, so P(y*) = 0.2
      • P(unknown) = 1 - 0.2 = 0.8
      • Is P(y*) > P(unknown)? Not yet, so the search continues.
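
P(y1) is computed exactly by restricting the forward pass to the latent states consistent with y1 and normalizing by the full Z(x). A minimal log-space sketch without numerical scaling, for exposition only; all names are assumptions:

```python
# Sketch of step (3): exact P(y | x) via a restricted forward pass
# (assumed names; no log-sum-exp scaling, so toy-sized inputs only).
import math

def restricted_log_z(x, states_at, log_emit, log_trans):
    """Log of the total exp-score of all paths through states_at(t)."""
    alpha = {s: log_emit(s, x[0]) for s in states_at(0)}
    for t in range(1, len(x)):
        prev = alpha
        alpha = {}
        for s in states_at(t):
            alpha[s] = log_emit(s, x[t]) + math.log(
                sum(math.exp(prev[p] + log_trans(p, s)) for p in prev))
    return math.log(sum(math.exp(v) for v in alpha.values()))

def label_sequence_prob(y, x, latent_of, all_states, log_emit, log_trans):
    """P(y | x) = (mass of paths consistent with y) / Z(x)."""
    num = restricted_log_z(x, lambda t: latent_of(y[t]), log_emit, log_trans)
    den = restricted_log_z(x, lambda t: all_states, log_emit, log_trans)
    return math.exp(num - den)
```

In the slide's example this yields P(y1) = 0.2, leaving P(unknown) = 0.8, so the stopping test fails and the search moves on to h2.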

  34. (4) Find the 2nd latent path h2: A* search
      [Figure: A* continues and pops the next-best latent path from the lattice]

  35. (5) Get y2 & P(y2): forward-backward algorithm
      [Figure: the lattice restricted to the label projection of h2]
