Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing
Daniel Fried and Dan Klein
Parsing by Local Decisions

[Figure: the tree for "The cat took a nap ." built incrementally, as the growing bracket sequence (S (NP The cat ) (VP ...]

The log-likelihood of a parse decomposes into local decisions:

\ell(\theta) = \log p(z \mid y; \theta) = \sum_t \log p(z_t \mid z_{1:t-1}, y; \theta)
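A minimal sketch of this static, gold-sequence objective, assuming a hypothetical transition-based parser interface (init_state, action_logits, apply) that is not any particular system's real API:

import torch

def static_oracle_loss(parser, sentence, gold_actions):
    # Summed negative log-probability of the gold action sequence:
    # -sum_t log p(z_t | z_{1:t-1}, y; theta), conditioning on the *gold* history.
    state = parser.init_state(sentence)
    loss = 0.0
    for gold_action in gold_actions:
        logits = parser.action_logits(state)       # scores for the legal next actions
        log_probs = torch.log_softmax(logits, dim=-1)
        loss = loss - log_probs[gold_action]
        state = state.apply(gold_action)           # advance along the gold derivation
    return loss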
Non-local Consequences

Loss-Evaluation Mismatch
[Figure: true tree z and predicted tree \hat{z} for "The cat took a nap ."; the two differ in bracketing despite sharing most local decisions.]
Training maximizes local log-likelihood, but evaluation uses a sequence-level loss: \Delta(z, \hat{z}) = -\mathrm{F1}(z, \hat{z})

Exposure Bias
True parse z: (S (NP The cat ...
Prediction \hat{z}: (S (NP (VP ??
At test time the model conditions on its own past predictions, which it never saw during training.

[Ranzato et al. 2016; Wiseman and Rush 2016]
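For concreteness, the sequence-level loss above can be computed from labeled bracket overlap. A minimal sketch, ignoring evalb's normalizations and assuming each tree is given as a collection of (label, start, end) spans:

def bracket_f1(gold_spans, pred_spans):
    # Labeled bracketing F1 between a gold and a predicted tree,
    # each represented as a set of (label, start, end) spans.
    gold, pred = set(gold_spans), set(pred_spans)
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

# The sequence-level cost on this slide is then Delta(z, z_hat) = -bracket_f1(z, z_hat).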
Dynamic Oracle Training

Explore at training time; supervise each state with an expert policy.

True parse z: (S (NP The cat ...
Prediction \hat{z} (sample, or greedy): (S (NP (VP ...   -> addresses exposure bias
Oracle action z_t^*: chosen to maximize achievable F1 (typically)   -> addresses loss mismatch

\ell(\theta) = \sum_t \log p(z_t^* \mid \hat{z}_{1:t-1}, y; \theta)

[Goldberg & Nivre 2012; Ballesteros et al. 2016; inter alia]
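A rough sketch of this training loop, again with hypothetical parser and oracle interfaces (oracle.best_action is assumed to return the action maximizing achievable F1 from the current state), not any specific parser's real API:

import torch

def dynamic_oracle_loss(parser, oracle, sentence, gold_tree, sample=True):
    # Follow the model's own (sampled or greedy) actions, but supervise
    # every visited state with the expert/oracle action.
    state = parser.init_state(sentence)
    loss = 0.0
    while not state.is_final():
        logits = parser.action_logits(state)
        log_probs = torch.log_softmax(logits, dim=-1)
        oracle_action = oracle.best_action(state, gold_tree)   # expert supervision
        loss = loss - log_probs[oracle_action]
        if sample:
            model_action = torch.distributions.Categorical(logits=logits).sample().item()
        else:
            model_action = int(torch.argmax(logits))
        state = state.apply(model_action)                      # explore with the model
    return loss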
Dynamic Oracles Help!

Expert policies / dynamic oracles (mostly for dependency parsing): Daumé III et al., 2009; Ross et al., 2011; Choi and Palmer, 2011; Goldberg and Nivre, 2012; Chang et al., 2015; Ballesteros et al., 2016; Stern et al., 2017.

PTB Constituency Parsing F1:

System                                         Static Oracle   Dynamic Oracle
Coavoux and Crabbé, 2016                       88.6            89.0
Cross and Huang, 2016                          91.0            91.3
Fernández-González and Gómez-Rodríguez, 2018   91.5            91.7
What if we don't have a dynamic oracle? Use reinforcement learning.
Reinforcement Learning Helps! (in other tasks)

Machine translation: Auli and Gao, 2014; Ranzato et al., 2016; Shen et al., 2016; Xu et al., 2016; Wiseman and Rush, 2016; Edunov et al., 2017
Several other tasks, including CCG parsing and dependency parsing
Policy Gradient Training

Minimize the expected sequence-level cost (risk):

R(\theta) = \sum_{\hat{z}} p(\hat{z} \mid y; \theta) \, \Delta(z, \hat{z})

\nabla R(\theta) = \sum_{\hat{z}} p(\hat{z} \mid y; \theta) \, \Delta(z, \hat{z}) \, \nabla \log p(\hat{z} \mid y; \theta)

[Figure: true parse z and a predicted tree \hat{z} for "The man had an idea.", compared via \Delta(z, \hat{z})]

p(\hat{z} \mid y; \theta): addresses exposure bias (compute by sampling)
\Delta(z, \hat{z}): addresses loss mismatch (compute F1)
\nabla \log p(\hat{z} \mid y; \theta): compute in the same way as for the true tree

[Williams, 1992]
Policy Gradient Training

\nabla R(\theta) = \sum_{\hat{z}} p(\hat{z} \mid y; \theta) \, \Delta(z, \hat{z}) \, \nabla \log p(\hat{z} \mid y; \theta)

Input y: "The cat took a nap."
k candidate trees \hat{z}: samples from the model, plus the true tree z
[Figure: four candidate trees for the input, with varying bracketings]
\Delta(z, \hat{z}) (negative F1) per candidate: -89, -80, -80, -100 (the true tree)
Gradient for each candidate: \nabla \log p(\hat{z}_1 \mid y; \theta), \nabla \log p(\hat{z}_2 \mid y; \theta), \nabla \log p(\hat{z}_3 \mid y; \theta), \nabla \log p(z \mid y; \theta), each weighted by its cost.
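A minimal sketch of this estimator, assuming hypothetical helpers (parser.sample_tree, parser.tree_log_prob, tree.spans(), and the bracket_f1 sketch above) rather than any real parser API:

def policy_gradient_loss(parser, sentence, gold_tree, k=4):
    # Surrogate loss whose gradient matches the slide:
    # sum over candidates of Delta(z, z_hat) * grad log p(z_hat | y; theta).
    # Candidates are samples from the model plus the true tree.
    candidates = [parser.sample_tree(sentence) for _ in range(k - 1)]
    candidates.append(gold_tree)
    loss = 0.0
    for tree in candidates:
        cost = -bracket_f1(gold_tree.spans(), tree.spans())   # Delta(z, z_hat) = -F1
        log_prob = parser.tree_log_prob(sentence, tree)       # differentiable log p(z_hat | y)
        loss = loss + cost * log_prob
    return loss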
Experiments
Setup

Parsers (each trained with each of the training methods below):
‣ Span-Based [Cross & Huang, 2016]
‣ Top-Down [Stern et al., 2017]
‣ RNNG [Dyer et al., 2016]
‣ In-Order [Liu and Zhang, 2017]

Training:
‣ Static oracle
‣ Dynamic oracle
‣ Policy gradient
English PTB F1
[Bar chart: test F1 (scale 90-93) for static oracle, policy gradient, and dynamic oracle training of the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]
Training Efficiency
[Plot: PTB learning curves for the Top-Down parser; development F1 (89-92) against training epoch (5-45) for static oracle, dynamic oracle, and policy gradient training.]
French Treebank F1
[Bar chart: F1 (scale 80-84) for static oracle, policy gradient, and dynamic oracle training of the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]
Chinese Penn Treebank v5.1 F1
[Bar chart: F1 (scale 83-88) for static oracle, policy gradient, and dynamic oracle training of the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]
Conclusions

‣ Local decisions can have non-local consequences:
  ‣ Loss mismatch
  ‣ Exposure bias
‣ How to deal with the issues caused by local decisions?
  ‣ Dynamic oracles: efficient, but model specific
  ‣ Policy gradient: slower to train, but general purpose
Thank you!
For Comparison: A Novel Oracle for RNNG

Given a partial derivation such as (S (NP The man ) (VP had ..., choose the next action as follows (see the sketch below):

1. Close the current constituent if it is a true constituent of the gold tree, or if it could never become a true constituent.
2. Otherwise, open the outermost unopened true constituent at this position.
3. Otherwise, shift the next word.
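A rough sketch of this decision rule in code, with assumed helpers (state.open_constituent, gold_tree.is_constituent, gold_tree.is_reachable, gold_tree.outermost_unopened_label) standing in for the actual RNNG state bookkeeping:

def rnng_oracle_action(state, gold_tree):
    # 1. Close the current constituent if it is a gold constituent,
    #    or if no completion of it could ever be one.
    current = state.open_constituent()        # innermost currently-open bracket, or None
    if current is not None and (gold_tree.is_constituent(current)
                                or not gold_tree.is_reachable(current)):
        return ("CLOSE",)
    # 2. Otherwise, open the outermost gold constituent starting at this
    #    position that has not been opened yet.
    label = gold_tree.outermost_unopened_label(state)
    if label is not None:
        return ("OPEN", label)
    # 3. Otherwise, shift the next word.
    return ("SHIFT",)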