Neural Encoding with Structured Decoding
Pushpendre Rastogi, 3rd-year CS PhD student, pushpendre@jhu.edu
Johns Hopkins University
CLSP Student Seminar, Spring 2016
Outline
1 Introduction
2 Best of Both Worlds: Neural Encoding with Structured Decoding
3 Acknowledgements and References
Introduction: Two Themes
1 Improving Neural Network Architectures.
Background: What is the task?
String transduction: convert an input string to an output string.
Examples
• Morphological transduction: convert an imperative word in German to its past-participle form.
  a b r e i b t → a b g e r i e b e n
• Lemmatization: lemmatize a word in Tagalog.
  b i n a w a l a n → b a w a l
• Annotate a string:
  Bob is a builder → Noun Verb Det Noun
What do we offer?
[Figure: accuracy (75 to 100) of four methods (BiLSTM, Seq2Seq, WFST, Attention) on four transduction tasks: 13SIA, 2PIE, 2PKE, rP.]
The Idea
Use a Neural Sequence Encoder to weight the arcs of a Weighted FST.
Background: Weighted Finite State Transducers, Deterministic
[Figure: a chain transducer with states 0, 1, 2, 3 and arcs s:s, a:a, y:y.]
What is a State? The states of an FST/WFST are its memory.
Previous work weights this transducer.
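To make the figure concrete, here is a minimal Python sketch (mine, illustrative, not the talk's code) of that deterministic chain machine: state i means "the first i characters of the input have been read", and each arc copies one character.

```python
def chain_automaton(s):
    """Build the trivial acceptor for s: arcs map (state, in, out) -> next state."""
    arcs = {}
    for i, ch in enumerate(s):
        arcs[(i, ch, ch)] = i + 1        # arc i --ch:ch--> i+1
    return arcs, 0, len(s)               # arcs, start state, final state

arcs, start, final = chain_automaton("say")
print(arcs)  # {(0, 's', 's'): 1, (1, 'a', 'a'): 2, (2, 'y', 'y'): 3}
```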
Background: Weighted Finite State Transducers, Non-Deterministic
[Figure: an edit transducer with states s, a, y, $ and arcs such as s:s, y:y, ε:s, ε:a, s:a, a:s, y:s, d:ε, d:y, i:y. Only a few of the possible states and edit arcs are shown.]
What's in a Path? A path is an alignment.
(ε:s s:a a:s y:s) → say:sass
(ε:s s:a a:ε y:y) → say:say
(ε:ε s:s a:a y:y) → say:say
(ε:s s:a a:s y:y) → say:sasy
Previous work weights this transducer.
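The "a path is an alignment" point can be checked by brute force. A hedged sketch (illustrative, not the talk's implementation): enumerate every edit path turning one string into another, where ('x', 'y') is a substitution or copy, ('x', '') a deletion, and ('', 'y') an insertion. Each yielded path corresponds to one path through the transducer F above.

```python
def alignments(x, y):
    """Recursively yield all edit paths that transduce x into y."""
    if not x and not y:
        yield []
        return
    if x:                                   # delete x[0]
        for rest in alignments(x[1:], y):
            yield [(x[0], '')] + rest
    if y:                                   # insert y[0]
        for rest in alignments(x, y[1:]):
            yield [('', y[0])] + rest
    if x and y:                             # substitute (or copy) x[0] -> y[0]
        for rest in alignments(x[1:], y[1:]):
            yield [(x[0], y[0])] + rest

for path in list(alignments("say", "say"))[:4]:
    print(path)                             # four of the many say:say alignments
```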
Background: Neural Bi-Directional Sequence Encoder
[Figure: forward states α0, α1, α2 computed left to right over the character embeddings e_s, e_a, e_y of the input "say", via the recurrence α1 = f(α0, e_s) and so on; backward states β1, β2, β3 computed right to left.]
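A minimal NumPy sketch of this encoder, under stated assumptions: f here is a plain tanh recurrent cell and the indexing only loosely follows the figure; the actual encoder would use an LSTM-style cell. All names (W, U, embed) are illustrative.

```python
import numpy as np

d = 4                                        # toy hidden/embedding size
rng = np.random.default_rng(0)
W, U = rng.normal(size=(d, d)), rng.normal(size=(d, d))
embed = {c: rng.normal(size=d) for c in "say"}

def f(h, e):                                 # one recurrent step
    return np.tanh(W @ h + U @ e)

x = "say"
alpha = [np.zeros(d)]                        # alpha_0
for ch in x:                                 # left to right: alpha_i = f(alpha_{i-1}, e_{x_i})
    alpha.append(f(alpha[-1], embed[ch]))

beta = [np.zeros(d)]                         # right to left
for ch in reversed(x):
    beta.append(f(beta[-1], embed[ch]))
beta.reverse()                               # beta[i] now (roughly) summarizes x[i:]

print(len(alpha), len(beta))                 # 4 4
```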
Background: Existing models
Weighted Finite State Transducers [Moh97, Eis02]
Pros: The states in an FST can be tailored to the task. Can compute the probability of a string (sketched below).
Cons: Traditionally, arc weights are linear functionals of arc features.
• ROI on feature engineering may be low.
• The model may become slow if there are too many features.
• The local features may not be expressive enough.
Neural Encoders and Decoders [SVL14]
Pros: Produce reasonable results with zero feature engineering.
Cons: Require a lot of training data to perform well. Cannot return the probability of a string.
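Why can a WFST compute the probability of a string when a plain encoder-decoder cannot? Because the forward algorithm sums the weights of all accepting paths. A hedged sketch over a toy weighted acceptor (the machine and its weights are assumed here purely for illustration):

```python
from collections import defaultdict

# Toy weighted acceptor: (src, label, dst, weight).
arcs = [
    (0, 's', 1, 1.0),
    (1, 'a', 2, 0.6),
    (1, 'a', 1, 0.4),    # nondeterministic choice on 'a'
    (2, 'y', 3, 1.0),
    (1, 'y', 3, 1.0),
]
start, final = 0, 3

def string_weight(s):
    """Total weight of s: multiply along each path, add across paths."""
    forward = defaultdict(float)
    forward[start] = 1.0
    for ch in s:
        nxt = defaultdict(float)
        for src, lab, dst, w in arcs:
            if lab == ch and forward[src] != 0.0:
                nxt[dst] += forward[src] * w
        forward = nxt
    return forward[final]

print(string_weight("say"))    # 1.0 = 0.6 (via state 2) + 0.4 (arc 1->3 on 'y')
print(string_weight("saay"))   # ~0.4, summing the two surviving paths
```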
Neural Encoding with Structured Decoding
[Figure: the automaton I encoding "say"; the edit transducer F (only a few of the possible states and edit arcs are shown); and their composition G = I ∘ F (only a few states, but all arcs between them are shown). Previous work weights I and F; our work weights G.]
Why do we do this?
• Weighting F ≡ weighting edits per type.
• Weighting G ≡ weighting edits per token.
• Neural features encode the entire sentence.
• We get a context-dependent output-side language model.
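A hedged sketch of the composition G = I ∘ F: states of G pair a position in I with a state of F. For brevity this sketch collapses F to a single state with three kinds of edit arcs (the slide's F additionally remembers the last output character, which is what gives the output-side language model), so G's states here are just input positions.

```python
def edit_arcs(x, alphabet):
    """Yield arcs of a simplified G = I o F as (src, in_sym, out_sym, dst); '' is epsilon."""
    n = len(x)
    for i in range(n + 1):
        for b in alphabet:
            yield (i, '', b, i)              # insertion eps:b loops at position i
        if i < n:
            yield (i, x[i], '', i + 1)       # deletion x[i]:eps advances
            for b in alphabet:
                yield (i, x[i], b, i + 1)    # substitution/copy x[i]:b advances

for arc in list(edit_arcs("say", "say"))[:8]:
    print(arc)
```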
Neural Encoding with Structured Decoding
[Figure: the composed transducer G = I ∘ F with the bidirectional encoder states α0, α1, α2 and β1, β2, β3 attached to the input positions, so that every arc of G can be weighted using neural encodings of the whole input. Only a few states, but all arcs between them are shown. Our work weights this transducer.]
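Putting the two halves together, a minimal sketch of the slide's core idea: score each arc of G with features of the entire input string. An arc leaving input position i sees the forward state α_i and backward state β_i plus an embedding of the edit it performs, so the same edit type gets a different weight at every token position. The names here (w, edit_embed, the random stand-in encoder states) are my illustrative assumptions, not the talk's notation.

```python
import numpy as np

d = 4
rng = np.random.default_rng(1)
alpha = [rng.normal(size=d) for _ in range(4)]  # stand-ins for encoder states of "say"
beta = [rng.normal(size=d) for _ in range(4)]
edit_embed = {('s', 's'): rng.normal(size=d),   # a couple of edit-type embeddings
              ('s', ''): rng.normal(size=d)}
w = rng.normal(size=3 * d)                      # linear scorer over concatenated features

def arc_weight(i, edit):
    """Weight for an arc of G leaving input position i that performs `edit`."""
    feats = np.concatenate([alpha[i], beta[i], edit_embed[edit]])
    return float(w @ feats)                     # same edit, different weight per position

print(arc_weight(0, ('s', 's')), arc_weight(1, ('s', '')))
```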