
Backpropagating through Structured Argmax using a SPIGOT
Hao Peng, Sam Thomson, Noah A. Smith
ACL, July 17, 2018

Overview
Input: Shareholders took their money
Pipeline: input → Parser → arg max → parse of "Shareholders took their money" → Downstream task → Loss $L$
• Ways to feed the parse to the downstream task: head token (Yang and Mitchell, 2017), Tree-RNN (Tai et al., 2015), Graph CNN (Kipf and Welling, 2017), …
• Can the arg max be a layer in the computation graph? It is non-differentiable.

Overview
Aim
• Structured prediction as a layer: an intermediate parser (parameters $\theta$) whose arg max output feeds the downstream task. Open question: how to obtain $\nabla_\theta L$ from the downstream loss $L$?
Motivation
• Structures help. Ji and Smith, 2017; Oepen et al., 2017
• Linguistic structures may not be universally optimal. Williams, 2017
Challenges
• argmax is non-differentiable.
Method
• SPIGOT (Structured Prediction Intermediate Gradients Optimization Technique): a proxy for the gradient through the arg max.

Outline ❖ Background: structured prediction as linear programs ❖ Method: SPIGOT algorithm ❖ Experiments

Structured Prediction Reviewed
Input: Shareholders took their money
Score: a tree is scored by summing the scores of its arcs, $S_\theta(\text{tree}) = \sum_{(\text{head}, \text{mod}) \in \text{arcs}} s_\theta(\text{head}, \text{mod})$
Arc scores: $s_\theta = [\, s_\theta(\text{their}, \text{took}),\ s_\theta(\text{took}, \text{their}),\ s_\theta(\text{took}, \text{money}),\ \ldots,\ s_\theta(\text{their}, \text{money}) \,]^\top$
Arc indicators: $z = [1?, 0?, 1?, 0?, \ldots]^\top$, one binary entry per candidate arc
Output: $\hat z = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree
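The arc-factored scoring and the constrained arg max above can be illustrated on the four-word example. This is only a brute-force sketch for tiny inputs, not the decoder used by the authors: the arc scores are made up, the artificial root node `0` and the single-root requirement are assumptions, and the helper names (`is_tree`, `tree_score`, `structured_argmax`) are hypothetical.

```python
from itertools import product

def is_tree(heads):
    """heads[m - 1] is the head of word m; words are 1..n and 0 is the root."""
    # Assumption: exactly one word attaches to the artificial root.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Every word must reach the root without revisiting a word (no cycles).
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def tree_score(heads, score):
    """Sum of the arc scores s(head, mod) over the arcs of the tree."""
    return sum(score[(h, m)] for m, h in enumerate(heads, start=1))

def structured_argmax(n, score):
    """Highest-scoring head assignment that forms a tree (brute force)."""
    candidates = (
        heads for heads in product(range(n + 1), repeat=n)
        if all(h != m for m, h in enumerate(heads, start=1)) and is_tree(heads)
    )
    return max(candidates, key=lambda heads: tree_score(heads, score))

words = ["<root>", "Shareholders", "took", "their", "money"]
n = len(words) - 1
# Hypothetical scores: 1.0 for four hand-picked arcs, 0.0 for every other arc.
score = {(h, m): 0.0 for h in range(n + 1) for m in range(1, n + 1) if h != m}
for arc in [(0, 2), (2, 1), (2, 4), (4, 3)]:
    score[arc] = 1.0

best = structured_argmax(n, score)
print(best, tree_score(best, score))  # (2, 0, 4, 2) 4.0
```

The result reads as: "took" is the root, "Shareholders" and "money" attach to "took", and "their" attaches to "money". Enumeration grows exponentially with sentence length, so it only serves to make the objective concrete.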

Linear Programming Formulation
$\hat z = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree
where $s_\theta = [\, s_\theta(\text{their}, \text{took}),\ s_\theta(\text{took}, \text{their}),\ s_\theta(\text{took}, \text{money}),\ \ldots,\ s_\theta(\text{their}, \text{money}) \,]^\top$
• The tree constraint is expressed as linear constraints $Az \le b$ with $z_i \in \{0, 1\}$.
• Relaxation: $z_i \in [0, 1]$.
Roth and Yih, 2004; Martins et al., 2009
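To make the relaxation concrete, the sketch below builds indicator vectors for two trees over a three-word toy input and averages them. It checks only one family of linear constraints (each word has incoming arc mass 1, an equality that can be written as two rows of $Az \le b$); the full set of tree constraints is not reproduced here, and the arc ordering, the scores, and the helper names are assumptions.

```python
# Candidate arcs (head, mod) for a 3-word toy input; 0 is an artificial root.
WORDS = (1, 2, 3)
ARCS = [(h, m) for m in WORDS for h in (0,) + WORDS if h != m]

def indicator(heads):
    """Binary vector z with z_i = 1 iff arc i belongs to the tree."""
    chosen = {(h, m) for m, h in enumerate(heads, start=1)}
    return [1.0 if arc in chosen else 0.0 for arc in ARCS]

def satisfies_single_head(z, tol=1e-9):
    """Linear constraint family: the incoming arcs of each word sum to 1."""
    for word in WORDS:
        incoming = sum(z_i for z_i, (_, m) in zip(z, ARCS) if m == word)
        if abs(incoming - 1.0) > tol:
            return False
    return True

def objective(z, s):
    """The linear objective z^T s."""
    return sum(z_i * s_i for z_i, s_i in zip(z, s))

z_a = indicator((0, 1, 2))  # chain: root -> 1 -> 2 -> 3
z_b = indicator((2, 0, 2))  # word 2 is the root and heads words 1 and 3
z_mix = [0.5 * a + 0.5 * b for a, b in zip(z_a, z_b)]

s = [float(i) for i in range(len(ARCS))]  # hypothetical arc scores

print(satisfies_single_head(z_mix))        # the fractional point is feasible here
print(any(0.0 < v < 1.0 for v in z_mix))   # but it is no longer binary
print(objective(z_mix, s), 0.5 * (objective(z_a, s) + objective(z_b, s)))
```

Because the objective is linear, the fractional point scores exactly the average of the two trees, which is why a linear program over the relaxed set still attains its optimum at a vertex.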

Outline ❖ Background: structured prediction as linear programs ❖ Method: SPIGOT algorithm ❖ Experiments

Backprop
Forward: arc scores $s_\theta$ → $\hat z = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree → Downstream task → Loss $L$
Backward:
• Backprop through the downstream task gives $\nabla_{\hat z} L$.
• Backprop through the parser turns $\nabla_s L$ into $\nabla_\theta L$.
• The step in between, from $\nabla_{\hat z} L$ to $\nabla_s L$, crosses the arg max, so a proxy is used there.

Backprop
We have: $\nabla_{\hat z} L$
We need: $\nabla_s L$
• Chain rule (Leibniz, 1676): $\nabla_s L = J \, \nabla_{\hat z} L$, with $J$ the Jacobian of $\hat z$ with respect to $s$.
• For $\hat z = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, the Jacobian is not defined.
• Straight-through Estimator (STE; Hinton, 2012; Bengio et al., 2013): $\nabla_s L \triangleq \nabla_{\hat z} L$
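The straight-through rule can be sketched in a few lines. To stay small, the example swaps the structured arg max for an unstructured one-hot arg max and uses a made-up squared-error downstream loss and learning rate; all of these, and the function names, are assumptions for illustration only.

```python
def one_hot_argmax(s):
    """Forward pass: hard decision, one-hot vector at the highest score."""
    best = max(range(len(s)), key=lambda i: s[i])
    return [1.0 if i == best else 0.0 for i in range(len(s))]

def downstream_loss_and_grad(z, target):
    """Toy loss L = 0.5 * ||z - target||^2 and its gradient with respect to z."""
    diff = [z_i - t_i for z_i, t_i in zip(z, target)]
    return 0.5 * sum(d * d for d in diff), diff

def ste_backward(grad_z):
    """STE: reuse the gradient w.r.t. z_hat as the gradient w.r.t. s."""
    return list(grad_z)

s = [0.2, 0.1, -0.3]            # hypothetical scores
target = [0.0, 1.0, 0.0]        # hypothetical downstream preference
z_hat = one_hot_argmax(s)
loss, grad_z = downstream_loss_and_grad(z_hat, target)
grad_s = ste_backward(grad_z)

lr = 0.1                        # hypothetical learning rate
s_new = [s_i - lr * g_i for s_i, g_i in zip(s, grad_s)]
print(z_hat, loss)              # [1.0, 0.0, 0.0] 1.0
print(one_hot_argmax(s_new))    # [0.0, 1.0, 0.0]
```

One gradient step on the scores is enough here to flip the hard decision toward the option the downstream loss prefers, even though the arg max itself contributed no derivative.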

Some Geometry…
Straight-through Estimator (STE): $\nabla_s L \triangleq \nabla_{\hat z} L$
• Feasible set: $Az \le b$
• $\hat z = [1, 0, 1, \cdots, 0]^\top$ (the parse of "Shareholders took their money")
• $-\nabla_{\hat z} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$
• $p = \hat z - \nabla_{\hat z} L$

Some Geometry…
SPIGOT
• Feasible set: $Az \le b$
• $\hat z = [1, 0, 1, \cdots, 0]^\top$ (the parse of "Shareholders took their money")
• $-\nabla_{\hat z} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$
• $p = \hat z - \nabla_{\hat z} L$
• $q = \operatorname{proj}(p)$, the projection of $p$ onto the feasible set
• $\nabla_s L \triangleq \hat z - q$
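The three SPIGOT equations can be traced numerically. The sketch below is a simplification: it projects onto the box $[0, 1]^n$ only (the relaxed $z_i \in [0, 1]$ part of the constraints) and ignores the tree constraints in $Az \le b$, and it uses a four-entry toy vector built from the entries visible on the slide. The function names are hypothetical.

```python
def project_to_box(p):
    """Euclidean projection onto [0, 1]^n: clip every coordinate."""
    return [min(1.0, max(0.0, p_i)) for p_i in p]

def spigot_grad(z_hat, grad_z, project):
    """SPIGOT proxy: p = z_hat - grad_z, q = proj(p), grad_s = z_hat - q."""
    p = [z_i - g_i for z_i, g_i in zip(z_hat, grad_z)]
    q = project(p)
    return [z_i - q_i for z_i, q_i in zip(z_hat, q)]

z_hat = [1.0, 0.0, 1.0, 0.0]        # toy hard decision
grad_z = [0.3, -0.5, -0.4, -0.2]    # so that -grad_z = [-0.3, 0.5, 0.4, 0.2]

grad_s = spigot_grad(z_hat, grad_z, project_to_box)
print([round(g, 6) for g in grad_s])  # [0.3, -0.5, 0.0, -0.2]
```

The third coordinate shows the difference from STE: the raw step would push $z_3$ from 1 to 1.4, outside the feasible set, so after projection that coordinate receives zero gradient instead of $-0.4$.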

Some Geometry…
SPIGOT
[Figure: two example configurations, each marking $\hat z$, $\hat z - \nabla_{\hat z} L$, and the resulting $-\nabla_s L$.]

Algorithm
Forward:
• Input: Shareholders took their money
• Parser $\theta$ produces the arc scores $s_\theta$.
• $\hat z = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree
• Downstream task $\phi$ consumes $\hat z$ and produces the loss $L$.
Backward:
• Backprop through the downstream task: $\nabla_{\hat z} L$
• $p = \hat z - \nabla_{\hat z} L$
• $q = \operatorname{proj}(p)$ (project onto the feasible set)
• $\nabla_s L \triangleq \hat z - q$
• Backprop through the parser: $\nabla_\theta L$
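The forward and backward passes can be put together in a minimal end-to-end sketch. Everything specific here is an assumption made to keep the example small: the "parser" is a linear scorer `s = W x`, the structure is a one-of-three choice so the feasible set is the probability simplex (projected onto with the standard sort-based routine), the downstream loss is a linear cost on $\hat z$, and the learning rate is arbitrary.

```python
def one_hot_argmax(s):
    best = max(range(len(s)), key=lambda i: s[i])
    return [1.0 if i == best else 0.0 for i in range(len(s))]

def project_to_simplex(p):
    """Euclidean projection onto {q : q_i >= 0, sum(q) = 1} (sort-based)."""
    u = sorted(p, reverse=True)
    cumulative, tau = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        threshold = (cumulative - 1.0) / k
        if u_k - threshold > 0:
            tau = threshold  # keep the threshold of the last valid k
    return [max(p_i - tau, 0.0) for p_i in p]

def spigot_step(W, x, cost, lr):
    """One training step of the toy parser W under a linear downstream cost."""
    # Forward: scores, hard decision, downstream loss L = cost . z_hat.
    s = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    z_hat = one_hot_argmax(s)
    loss = sum(c_i * z_i for c_i, z_i in zip(cost, z_hat))
    grad_z = list(cost)  # gradient of the linear loss w.r.t. z_hat
    # SPIGOT proxy for the gradient w.r.t. the scores.
    p = [z_i - g_i for z_i, g_i in zip(z_hat, grad_z)]
    q = project_to_simplex(p)
    grad_s = [z_i - q_i for z_i, q_i in zip(z_hat, q)]
    # Backprop through the linear scorer: dL/dW[i][j] = grad_s[i] * x[j].
    W_new = [[w_ij - lr * g_i * x_j for w_ij, x_j in zip(row, x)]
             for row, g_i in zip(W, grad_s)]
    return W_new, z_hat, loss

W = [[0.5, 0.0], [0.0, 0.0], [0.2, 0.0]]  # hypothetical initial parameters
x = [1.0, 0.5]                            # hypothetical input features
cost = [2.0, 0.0, 1.0]                    # downstream task prefers option 1

losses = []
for _ in range(3):
    W, z_hat, loss = spigot_step(W, x, cost, lr=0.5)
    losses.append(loss)
print(losses, z_hat)  # [2.0, 0.0, 0.0] [0.0, 1.0, 0.0]
```

After the first update the parser already picks the option with the lowest downstream cost; from then on $p$ projects back onto $\hat z$ itself, so the proxy gradient is zero and the parameters stop moving.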

Connections to Related Work
[Figure: SPIGOT and STE side by side, each marking $\hat z$, $\hat z - \nabla_{\hat z} L$, and $-\nabla_s L$.]
• Structured Attention (Kim et al., 2017): $\hat z = \operatorname{softmax}(\ldots)$
• SPIGOT: $\hat z = \arg\max(\ldots)$
Methods compared: Pipeline, STE, Structured Att., SPIGOT
Criteria: hard decision on $\hat z$; backprop (Structured Att. through marginals, SPIGOT through a projection)

Applications
Joint learning (Swayamdipta et al., 2016)
• Parser $\theta$: training data for the intermediate task gives loss $L_1$ and gradient $\nabla_\theta L_1$.
• Downstream task $\phi$: takes the arg max parse of "Shareholders took their money" and gives loss $L_2$, with gradients $\nabla_\phi L_2$ and $\nabla_\theta L_2$.
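One way to read the gradient flow on this slide is as the following pair of updates. This assumes the two losses are simply summed and that plain gradient descent with a hypothetical learning rate $\eta$ is used; neither detail is stated on the slide.

```latex
\begin{aligned}
\theta &\leftarrow \theta - \eta \left( \nabla_\theta L_1 + \nabla_\theta L_2 \right) \\
\phi   &\leftarrow \phi - \eta \, \nabla_\phi L_2
\end{aligned}
```

The term $\nabla_\theta L_2$ is the one that has to pass through the arg max, so it is where the SPIGOT proxy $\nabla_s L \triangleq \hat z - q$ would be used; $\nabla_\theta L_1$ comes directly from the parser's own training data.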
