Parallelizable StackLSTM
Shuoyang Ding, Philipp Koehn
NAACL 2019 Structured Prediction Workshop
Minneapolis, MN, United States
June 7th, 2019
Outline
• What is StackLSTM?
• Parallelization Problem
• Homogenizing Computation
• Experiments
What is StackLSTM?
A Partial Tree
Good Edge?
LSTM?
:(
StackLSTM
• An LSTM whose states are stored in a stack
• Computation is conditioned on the stack operation
Dyer et al. (2015); Ballesteros et al. (2017)
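To make the idea concrete, here is a minimal sequential sketch of a StackLSTM in PyTorch. The class, its interface, and the "push"/"pop" string ops are assumptions for illustration, not the implementation of Dyer et al.:

```python
import torch
import torch.nn as nn

class StackLSTM(nn.Module):
    """Minimal sketch: hidden states live on a stack; each transition
    either pushes a new state or pops back to an earlier one."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.hidden_size = hidden_size

    def forward(self, inputs, ops):
        # inputs: list of (input_size,) tensors; ops: list of "push"/"pop"
        h0 = torch.zeros(1, self.hidden_size)
        stack = [(h0, h0.clone())]          # bottom-of-stack state
        for x, op in zip(inputs, ops):
            if op == "push":
                h, c = self.cell(x.unsqueeze(0), stack[-1])
                stack.append((h, c))        # new state becomes the stack top
            else:
                stack.pop()                 # pop: discard the stack top
        return stack[-1][0]                 # hidden state at the stack top
```

Each Push runs one LSTM step on top of the current stack top; each Pop simply discards the top state, so the next Push continues from an earlier point in the history.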
StackLSTM
Push ,
Pop
Push 61
Push years
Push old
Pop
Pop
Pop
Push ,
Pop
Push will
Push join
:)
Parallelization Problem
LSTM
Batched LSTM
Batched… StackLSTM?
:(
Wouldn’t it be nice if…
Homogenizing Computation
Push
• read the stack top hidden state h_{p(t)};
• perform LSTM forward computation with x(t) and h_{p(t)};
• write the new hidden state to h_{p(t) + 1};
• update the stack top pointer: p(t+1) = p(t) + 1.
Pop
• update the stack top pointer: p(t+1) = p(t) - 1.
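In buffer terms, the two operations look like the following sketch. The preallocated hbuf/cbuf buffers and helper names are our assumptions, following the slide notation:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(100, 200)                      # illustrative sizes
hbuf = [torch.zeros(1, 200) for _ in range(64)]   # preallocated h buffer
cbuf = [torch.zeros(1, 200) for _ in range(64)]   # preallocated c buffer

def push(hbuf, cbuf, cell, x, p):
    h, c = cell(x, (hbuf[p], cbuf[p]))  # read stack top h_{p(t)}, run LSTM
    hbuf[p + 1], cbuf[p + 1] = h, c     # write new state to p(t) + 1
    return p + 1                        # p(t+1) = p(t) + 1

def pop(p):
    return p - 1                        # only the pointer moves
```

Push touches the LSTM cell and the buffers; Pop is pure pointer arithmetic. That asymmetry is what the next two observations remove.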
Observation 1
Push: read the stack top hidden state h_{p(t)}; perform LSTM forward computation with x(t) and h_{p(t)}; write the new hidden state to h_{p(t) + 1}; update the stack top pointer p(t+1) = p(t) + 1.
Pop: update the stack top pointer p(t+1) = p(t) - 1.
The computation performed for the Pop operation is a subset of the Push operation.
Use op = +1 for Push and op = -1 for Pop, so both operations share the same pointer update: p(t+1) = p(t) + op.
Observation 2
Is it safe to perform the remaining Push computations for Pop as well?
A write always happens before the stack top pointer advances.
So if one wants to write anything to a position above the current stack top pointer… just do it!
Anything above the stack top is guaranteed to be overwritten before it can ever be read, so the extra write performed for Pop is harmless.
Push and Pop now perform exactly the same sequence of computations.
Done!
• read the stack top hidden state h_{p(t)};
• perform LSTM forward computation with x(t) and h_{p(t)};
• write the new hidden state to h_{p(t) + 1};
• update the stack top pointer: p(t+1) = p(t) + op.
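A batched sketch of this homogenized step follows; shapes, names, and the op encoding are assumptions based on the slides, and this is a forward-pass illustration rather than the released implementation:

```python
import torch

def batched_step(hbuf, cbuf, cell, x, p, op):
    # hbuf, cbuf: (batch, max_depth, hidden) state buffers
    # x: (batch, input_size) inputs; p: (batch,) stack-top pointers
    # op: (batch,) long tensor with +1 for Push and -1 for Pop
    b = torch.arange(x.size(0))
    h, c = cell(x, (hbuf[b, p], cbuf[b, p]))  # read all stack tops, one LSTM call
    hbuf[b, p + 1] = h                        # write to p(t)+1 unconditionally;
    cbuf[b, p + 1] = c                        # for Pop this slot is never read
    return p + op                             # p(t+1) = p(t) + op
```

Because every batch element now executes the identical read / compute / write / update sequence, the whole batch advances with a single LSTMCell call per transition; during training, the in-place buffer writes would need the usual autograd care.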
Experiments
Benchmark
• Transition-based dependency parsing on the Stanford Dependency Treebank
• PyTorch, single K80 GPU
Hyperparameters
• Largely following Dyer et al. (2015) and Ballesteros et al. (2017), except:
• Adam with ReduceLROnPlateau and warmup (a sketch follows this list)
• Arc-Hybrid without the composition function
• Slightly larger models (200 hidden units, 200 state units, 48-dimensional action embeddings) perform better
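A sketch of that optimizer setup; all numeric values are illustrative placeholders, not the paper's exact settings:

```python
import torch

model = torch.nn.Linear(200, 200)   # stand-in for the actual parser
base_lr, warmup_steps = 1e-3, 500   # assumed values

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
# after warmup, decay the learning rate when the dev metric plateaus
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=2)

def set_warmup_lr(step):
    # linear warmup over the first warmup_steps updates
    if step < warmup_steps:
        for group in optimizer.param_groups:
            group["lr"] = base_lr * (step + 1) / warmup_steps

# per update: set_warmup_lr(step); per epoch: scheduler.step(dev_accuracy)
```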
Speed
Performance
[Figure: parsing accuracy (y-axis, 91 to 93) vs. batch size (x-axis, 8 to 256) for Ours and Ballesteros 2017]
Conclusion
Conclusion
• We propose a parallelization scheme for the StackLSTM architecture.
• Together with a different optimizer, we are able to train parsers of comparable performance within 1 hour.
paper / code / slides: https://github.com/shuoyangd/hoolock