Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing


  1. Policy Gradient as a Proxy for Dynamic Oracles in Constituency Parsing
     Daniel Fried and Dan Klein

  2. Parsing by Local Decisions
     Figure: parse tree (labels S, NP, VP, NP) for "The cat took a nap."
     Linearized parse: (S (NP The cat ) (VP …
     $L(\theta) = \log p(y \mid x; \theta) = \sum_t \log p(y_t \mid y_{1:t-1}, x; \theta)$
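
The decomposition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the nested-tuple tree encoding, the function names `linearize` and `log_likelihood`, and the constant-probability `step_prob` stand-in for a trained model are all assumptions made for the example.

```python
import math

def linearize(tree):
    """Turn a nested tuple tree into the action sequence '(S', '(NP', 'The', ..., ')'."""
    if isinstance(tree, str):          # a word: one shift-like action
        return [tree]
    label, *children = tree
    actions = ["(" + label]            # open the constituent
    for child in children:
        actions.extend(linearize(child))
    actions.append(")")                # close the constituent
    return actions

def log_likelihood(actions, step_prob):
    """Sum over t of log p(y_t | y_{1:t-1}); step_prob(prefix, action) is a hypothetical model."""
    return sum(math.log(step_prob(actions[:t], actions[t]))
               for t in range(len(actions)))

# Assumed bracketing for the slide's example sentence.
tree = ("S", ("NP", "The", "cat"), ("VP", "took", ("NP", "a", "nap")), ".")
actions = linearize(tree)
# Stand-in model that assigns probability 0.5 to every action.
ll = log_likelihood(actions, lambda prefix, action: 0.5)
```

Each action is scored given only the prefix before it, which is what makes the decisions local.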

  3. Non-local Consequences
     Loss-Evaluation Mismatch: $\Delta(y, \hat{y})$: $-\mathrm{F1}(y, \hat{y})$ (figure: true tree $y$ and predicted tree $\hat{y}$ for "The cat took a nap.")
     Exposure Bias: True parse $y$: (S (NP The cat … ; Prediction $\hat{y}$: (S (NP (VP ??
     [Ranzato et al. 2016; Wiseman and Rush 2016]
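
A minimal sketch of the cost $\Delta(y, \hat{y}) = -\mathrm{F1}(y, \hat{y})$, assuming labeled-bracket F1 over (label, start, end) spans. The function names and the predicted tree are illustrative assumptions. Standard evaluation tools apply further conventions (for example around punctuation) that are not modeled here, and the score is returned as a fraction rather than on a 0 to 100 scale.

```python
from collections import Counter

def labeled_spans(tree):
    """Collect (label, start, end) spans from a bracketed string such as '(S (NP The cat) ...)'."""
    tokens = tree.replace("(", " ( ").replace(")", " ) ").split()
    spans, stack, pos, k = Counter(), [], 0, 0
    while k < len(tokens):
        if tokens[k] == "(":           # '(' is followed by the constituent label
            stack.append((tokens[k + 1], pos))
            k += 2
        elif tokens[k] == ")":         # close the most recently opened constituent
            label, start = stack.pop()
            spans[(label, start, pos)] += 1
            k += 1
        else:                          # a word advances the position
            pos += 1
            k += 1
    return spans

def f1(gold, pred):
    g, p = labeled_spans(gold), labeled_spans(pred)
    matched = sum((g & p).values())    # multiset intersection of spans
    if matched == 0:
        return 0.0
    precision = matched / sum(p.values())
    recall = matched / sum(g.values())
    return 2 * precision * recall / (precision + recall)

def cost(gold, pred):
    return -f1(gold, pred)

gold = "(S (NP The cat) (VP took (NP a nap)) .)"
pred = "(S (NP The cat) (VP took) (NP a nap) .)"   # hypothetical attachment error
```

Here one misplaced bracket lowers both precision and recall, although only a single local decision differed.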

  4. Dynamic Oracle Training
     Explore at training time. Supervise each state with an expert policy.
     True parse $y$: (S (NP The cat …
     Prediction $\hat{y}$ (sample, or greedy): The (S (NP (VP … (addresses exposure bias)
     Oracle $y^*$: The The (NP cat (choose $y^*_t$ to maximize achievable F1, typically; addresses loss mismatch)
     $L(\theta) = \sum_t \log p(y^*_t \mid \hat{y}_{1:t-1}, x; \theta)$
     [Goldberg & Nivre 2012; Ballesteros et al. 2016; inter alia]
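
The training objective above can be sketched generically: roll out the model's own prediction, and at every visited state add the negative log-probability of the oracle's action. Everything below is an assumption for illustration, not the authors' implementation: the callables `policy`, `oracle`, `is_final` and `choose`, and the toy three-symbol task in which the oracle simply returns the target symbol for the current position.

```python
import math

def dynamic_oracle_loss(policy, oracle, is_final, choose):
    """Return (-sum_t log p(y*_t | yhat_{1:t-1}), rollout), following the model's own choices."""
    state, loss = (), 0.0
    while not is_final(state):
        probs = policy(state)                   # dict: action -> probability
        loss -= math.log(probs[oracle(state)])  # supervise with the oracle action
        state = state + (choose(probs),)        # but continue from the model's action
    return loss, state

# Toy setup (hypothetical): the target sequence is a, b, c.
target = ("a", "b", "c")
policy = lambda state: {"a": 0.5, "b": 0.25, "c": 0.25}
oracle = lambda state: target[len(state)]
is_final = lambda state: len(state) == len(target)
greedy = lambda probs: max(probs, key=probs.get)

loss, rollout = dynamic_oracle_loss(policy, oracle, is_final, greedy)
```

The rollout here is `a a a`, so the second and third states are ones a static oracle would never show the model.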

  5. Dynamic Oracles Help!
     Expert Policies / Dynamic Oracles (mostly dependency parsing): Daumé III et al., 2009; Ross et al., 2011; Choi and Palmer, 2011; Goldberg and Nivre, 2012; Chang et al., 2015; Ballesteros et al., 2016; Stern et al. 2017
     PTB Constituency Parsing F1:
     System                                        Static Oracle   Dynamic Oracle
     Coavoux and Crabbé, 2016                      88.6            89.0
     Cross and Huang, 2016                         91.0            91.3
     Fernández-González and Gómez-Rodríguez, 2018  91.5            91.7

  6. What if we don't have a dynamic oracle? Use reinforcement learning.

  7. Reinforcement Learning Helps! (in other tasks)
     Auli and Gao, 2014; Ranzato et al., 2016; Shen et al., 2016 (machine translation)
     Xu et al., 2016 (CCG parsing); Wiseman and Rush, 2016 (several, including dependency parsing); Edunov et al. 2017 (machine translation)

  8. Policy Gradient Training
     Minimize expected sequence-level cost:
     $R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})$
     Figure: true parse $y$ and prediction $\hat{y}$ for "The man had an idea."
     $\nabla R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})\, \nabla \log p(\hat{y} \mid x; \theta)$
     The sum over $\hat{y}$ addresses exposure bias (compute by sampling).
     $\Delta(y, \hat{y})$ addresses loss mismatch (compute F1).
     $\nabla \log p(\hat{y} \mid x; \theta)$: compute in the same way as for the true tree.
     [Williams, 1992]

  9. Policy Gradient Training
     $\nabla R(\theta) = \sum_{\hat{y}} p(\hat{y} \mid x; \theta)\, \Delta(y, \hat{y})\, \nabla \log p(\hat{y} \mid x; \theta)$
     Input, $x$: The cat took a nap.
     k candidates, $\hat{y}$ (figure: four candidate trees, rooted S, S-INV, S, S)
     Candidate     $\Delta(y, \hat{y})$ (negative F1)   Gradient for candidate
     $\hat{y}_1$   −89                                  $\nabla \log p(\hat{y}_1 \mid x; \theta)$
     $\hat{y}_2$   −80                                  $\nabla \log p(\hat{y}_2 \mid x; \theta)$
     $\hat{y}_3$   −80                                  $\nabla \log p(\hat{y}_3 \mid x; \theta)$
     $y$           −100                                 $\nabla \log p(y \mid x; \theta)$
     Each cost multiplies the gradient in its row.
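
The cost-weighted gradient can be worked through on the slide's four costs with a toy model. This is a sketch under loud assumptions: the policy is a softmax over one logit per candidate tree (a real parser factors $p(\hat{y} \mid x; \theta)$ over actions), each candidate is treated as having been sampled once, and the estimate is averaged over the samples. None of these choices is confirmed by the slides.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def policy_gradient_estimate(logits, sampled, costs):
    """Average of cost_i * grad log p(candidate_i) over the sampled candidates.

    For a softmax policy, d log p(i) / d logit_j = 1[i == j] - p_j.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for i, cost in zip(sampled, costs):
        for j in range(len(logits)):
            indicator = 1.0 if j == i else 0.0
            grad[j] += cost * (indicator - probs[j]) / len(sampled)
    return grad

logits = [0.0, 0.0, 0.0, 0.0]            # toy parameters: uniform policy
costs = [-89.0, -80.0, -80.0, -100.0]    # negative F1 values from the slide
grad = policy_gradient_estimate(logits, sampled=[0, 1, 2, 3], costs=costs)

# One gradient-descent step on the expected cost (learning rate is arbitrary).
updated = [theta - 0.1 * g for theta, g in zip(logits, grad)]
```

In this toy setting the update raises the logit of the lowest-cost candidate (the true tree, with cost −100) the most and lowers the logits of the two candidates with cost −80.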

  10. Experiments

  11. Setup
      Parsers × Training
      Parsers: Span-Based [Cross & Huang, 2016]; Top-Down [Stern et al. 2016]; RNNG [Dyer et al. 2016]; In-Order [Liu and Zhang, 2017]
      Training: Static oracle; Dynamic oracle; Policy gradient

  12. English PTB F1
      [Bar chart: F1 (axis 90 to 93) for Static oracle, Policy gradient, and Dynamic oracle training across the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]

  13. Training Efficiency
      PTB learning curves for the Top-Down parser
      [Line plot: Development F1 (axis 89 to 92) against Training Epoch (5 to 45) for static oracle, dynamic oracle, and policy gradient.]

  14. French Treebank F1
      [Bar chart: F1 (axis 80 to 84) for Static oracle, Policy gradient, and Dynamic oracle training across the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]

  15. Chinese Penn Treebank v5.1 F1
      [Bar chart: F1 (axis 83 to 88) for Static oracle, Policy gradient, and Dynamic oracle training across the Span-Based, Top-Down, RNNG-128, RNNG-256, and In-Order parsers.]

  16. Conclusions
      ‣ Local decisions can have non-local consequences
        ‣ Loss mismatch
        ‣ Exposure bias
      ‣ How to deal with the issues caused by local decisions?
        ‣ Dynamic oracles: efficient, model specific
        ‣ Policy gradient: slower to train, but general purpose

  17. Thank you!

  18. For Comparison: A Novel Oracle for RNNG
      True parse: (S (NP The man ) (VP had …
      1. Close current constituent if it's a true constituent…
         (S (NP The man … )
         …or it could never be a true constituent.
         (S (VP ) (NP ) The man
      2. Otherwise, open the outermost unopened true constituent at this position.
         (S (NP The man ) (VP
      3. Otherwise, shift the next word.
         (S (NP The man ) (VP had
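
The three rules can be sketched as follows. This is a simplified reading of the slide, not the authors' implementation, and several things are assumptions: the state encoding (a stack of open `(label, start)` pairs, the number of shifted words `i`, and a set `done` of closed spans), the rule that an empty constituent is never closed, matching opened constituents to true ones by label and start position, the bracketing chosen for "The man had an idea .", and the restriction that no word begins with "(".

```python
def oracle_action(gold, words, stack, i, done):
    """Return ')', '(LABEL', a word to shift, or None when nothing applies."""
    if stack:
        label, start = stack[-1]
        if i > start:  # assumption: an empty constituent is not closed
            is_true = (label, start, i) in gold and (label, start, i) not in done
            can_become_true = any(
                l == label and s == start and e > i and (l, s, e) not in done
                for (l, s, e) in gold)
            if is_true or not can_become_true:
                return ")"                       # rule 1
    opened_here = [l for (l, s) in stack if s == i]
    starting_here = sorted((g for g in gold if g[1] == i and g not in done),
                           key=lambda g: -g[2])  # outermost (longest) first
    for (l, s, e) in starting_here:
        if l in opened_here:
            opened_here.remove(l)                # already opened at this position
        else:
            return "(" + l                       # rule 2
    return words[i] if i < len(words) else None  # rule 3

def follow_oracle(gold, words):
    """Apply the oracle from the initial state and return the action sequence."""
    stack, i, done, actions = [], 0, set(), []
    while True:
        action = oracle_action(gold, words, stack, i, done)
        if action is None:
            return actions
        actions.append(action)
        if action == ")":
            label, start = stack.pop()
            done.add((label, start, i))
        elif action.startswith("("):
            stack.append((action[1:], i))
        else:
            i += 1

words = ["The", "man", "had", "an", "idea", "."]
# Assumed true constituents as (label, start, end) with end exclusive.
gold = [("S", 0, 6), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
actions = follow_oracle(gold, words)
# From an erroneous state "(S (VP (NP The man", the sketch closes NP (a true
# constituent) first, and would then close VP (it can never be true).
after_error = oracle_action(gold, words, [("S", 0), ("VP", 0), ("NP", 0)], 2, set())
```

From the initial state the sketch reproduces the assumed true parse, and from a state containing a wrong constituent it still returns an action, which is the property that makes an oracle dynamic.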
