Backpropagating through Structured Argmax using a SPIGOT

Hao Peng, Sam Thomson, Noah A. Smith. ACL, July 17, 2018.


  1. Backpropagating through Structured Argmax using a SPIGOT Hao Peng, Sam Thomson, Noah A. Smith @ACL July 17, 2018

  2. Overview. [Diagram: "Shareholders took their money" → Parser → arg max → parse of "Shareholders took their money" → Downstream task → Loss L]

  3. Overview. [Diagram: "Shareholders took their money" → Parser → arg max → parse of "Shareholders took their money" → Downstream task → Loss L] Downstream uses of the parse: head token (Yang and Mitchell, 2017), Tree-RNN (Tai et al., 2015), Graph CNN (Kipf and Welling, 2017), …

  4. Overview. [Diagram: "Shareholders took their money" → Parser → arg max → parse of "Shareholders took their money" → Downstream task → Loss L] The parser's arg max: a layer in the computation graph?

  5. Overview. [Diagram: "Shareholders took their money" → Parser → arg max → parse of "Shareholders took their money" → Downstream task → Loss L] The parser's arg max: a layer in the computation graph? The arg max is non-differentiable.

  6. Overview. [Diagram: "Shareholders took their money" → Intermediate parser θ → arg max → parse of "Shareholders took their money" → Downstream task → Loss L; the gradient $\nabla_\theta L$ is marked "?"]
     Aim
     • Structured prediction as a layer.
     Motivation
     • Structures help (Ji and Smith, 2017; Oepen et al., 2017).
     • Linguistic structures may not be universally optimal (Williams, 2017).

  7. Overview. [Diagram: "Shareholders took their money" → Intermediate parser θ → arg max → parse of "Shareholders took their money" → Downstream task → Loss L; the gradient $\nabla_\theta L$ is marked "?"]
     Aim
     • Structured prediction as a layer.
     Motivation
     • Structures help (Ji and Smith, 2017; Oepen et al., 2017).
     • Linguistic structures may not be universally optimal (Williams, 2017).
     Challenges
     • argmax is non-differentiable.

  8. Overview. [Diagram: "Shareholders took their money" → Intermediate parser θ → arg max → parse of "Shareholders took their money" → Downstream task → Loss L; the gradient $\nabla_\theta L$ is now labeled "A proxy"]
     Aim
     • Structured prediction as a layer.
     Motivation
     • Structures help (Ji and Smith, 2017; Oepen et al., 2017).
     • Linguistic structures may not be universally optimal (Williams, 2017).
     Challenges
     • argmax is non-differentiable.
     Method
     • SPIGOT: Structured Prediction Intermediate Gradients Optimization Technique.

  9. Outline ❖ Background: structured prediction as linear programs ❖ Method: SPIGOT algorithm ❖ Experiments

  10. Structured Prediction Reviewed. Input: "Shareholders took their money". Output: a parse of "Shareholders took their money".

  11. Structured Prediction Reviewed. Input: "Shareholders took their money". Score: $S_\theta(\text{parse}) = \sum_{\text{arcs}} s_\theta(\text{head}, \text{mod})$, the sum of per-arc scores over the head-modifier arcs of the parse.

  12. Structured Prediction Reviewed. Input: "Shareholders took their money". Score vector: $s_\theta = [\, s_\theta(\text{their}, \text{took}),\ s_\theta(\text{took}, \text{their}),\ s_\theta(\text{took}, \text{money}),\ \ldots,\ s_\theta(\text{their}, \text{money}) \,]^\top$. Arc indicators: $z = [\, 1?,\ 0?,\ 1?,\ 0?,\ \ldots \,]^\top$. Output: $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree.
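The scoring on this slide can be sketched on a toy case. The block below is a minimal illustration, not the parser from the talk: the arc list, the arc scores, and the reading of each pair as (head, modifier) are all invented for the example, and the candidate trees are enumerated by hand rather than decoded with a real parsing algorithm.

```python
import numpy as np

# Hypothetical candidate arcs, read here as (head, modifier), for the words
# "took their money" with "took" as the root. Names and scores are invented.
arcs = [("took", "their"), ("took", "money"), ("their", "money"), ("money", "their")]
s_theta = np.array([0.2, 1.5, 0.1, 0.9])  # one score per arc

# Each row is an indicator vector z over the arcs. These are the three ways
# to give "their" and "money" exactly one head each without forming a cycle.
candidate_trees = np.array([
    [1, 1, 0, 0],  # took -> their, took -> money
    [0, 1, 0, 1],  # took -> money, money -> their
    [1, 0, 1, 0],  # took -> their, their -> money
])

def structured_argmax(candidates, scores):
    """Return the candidate z that maximizes z^T s (brute force)."""
    totals = candidates @ scores  # score of a tree = sum of its arc scores
    return candidates[np.argmax(totals)]

z_hat = structured_argmax(candidate_trees, s_theta)
```

Brute-force enumeration only works for tiny inputs; it is used here to make the objective $z^\top s_\theta$ concrete.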

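  13. Linear Programming Formulation. $\hat{z} = \arg\max_z z^\top [\, s_\theta(\text{their}, \text{money}),\ s_\theta(\text{took}, \text{their}),\ s_\theta(\text{took}, \text{money}),\ \ldots,\ s_\theta(\text{their}, \text{took}) \,]^\top$ s.t. $z$ forms a tree, where $\hat{z}$ is the parse of "Shareholders took their money" and the tree constraint is written as $Az \le b$ (Roth and Yih, 2004; Martins et al., 2009).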

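  14. Linear Programming Formulation. $\hat{z} = \arg\max_z z^\top [\, s_\theta(\text{their}, \text{money}),\ s_\theta(\text{took}, \text{their}),\ s_\theta(\text{took}, \text{money}),\ \ldots,\ s_\theta(\text{their}, \text{took}) \,]^\top$ s.t. $z$ forms a tree, where $\hat{z}$ is the parse of "Shareholders took their money" and the tree constraint is written as $Az \le b$ (Roth and Yih, 2004; Martins et al., 2009). Relaxation: $z_i \in \{0, 1\}$ becomes $z_i \in [0, 1]$.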

  15. Outline ❖ Background: structured prediction as linear programs ❖ Method: SPIGOT algorithm ❖ Experiments

  16. Backprop. [Diagram: $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores; $\hat{z}$, the parse of "Shareholders took their money" → Downstream task → Loss L. Target: $\nabla_\theta L$.]

  17. Backprop. [Diagram: $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores; $\hat{z}$, the parse of "Shareholders took their money" → Downstream task → Loss L. Target: $\nabla_\theta L$. Backprop through the downstream task yields $\nabla_{\hat{z}} L$.]

  18. Backprop. [Diagram: $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores; $\hat{z}$, the parse of "Shareholders took their money" → Downstream task → Loss L. Backprop through the downstream task yields $\nabla_{\hat{z}} L$; backprop from $\nabla_s L$ yields $\nabla_\theta L$.]

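  19. Backprop. [Diagram: $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores; $\hat{z}$, the parse of "Shareholders took their money" → Downstream task → Loss L. Backprop through the downstream task yields $\nabla_{\hat{z}} L$; backprop from $\nabla_s L$ yields $\nabla_\theta L$. The step from $\nabla_{\hat{z}} L$ to $\nabla_s L$ is labeled "Proxy".]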

  20. Backprop. We have: $\nabla_{\hat{z}} L$. We need: $\nabla_s L$.

  21. Backprop. We have: $\nabla_{\hat{z}} L$. We need: $\nabla_s L$. Chain rule (Leibniz, 1676): "$\nabla_s L = J \, \nabla_{\hat{z}} L$".

  22. Backprop. We have: $\nabla_{\hat{z}} L$. We need: $\nabla_s L$. Chain rule (Leibniz, 1676): "$\nabla_s L = J \, \nabla_{\hat{z}} L$". For $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, the Jacobian $J$ is not defined.

  23. Backprop. We have: $\nabla_{\hat{z}} L$. We need: $\nabla_s L$. Chain rule (Leibniz, 1676): "$\nabla_s L = J \, \nabla_{\hat{z}} L$". Straight-through Estimator (STE; Hinton, 2012; Bengio et al., 2013): $\nabla_s L \triangleq \nabla_{\hat{z}} L$.
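The straight-through rule can be sketched in a few lines. This is a minimal illustration under a simplifying assumption: a one-hot arg max over three scores stands in for the tree-constrained arg max of the talk, and the score and gradient values are made up.

```python
import numpy as np

def argmax_forward(s):
    """Forward pass: hard one-hot arg max (stand-in for the structured arg max)."""
    z_hat = np.zeros_like(s)
    z_hat[np.argmax(s)] = 1.0
    return z_hat

def ste_backward(grad_z_hat):
    """Backward pass under STE: pretend the arg max was the identity,
    so the gradient w.r.t. the scores is defined to be the gradient w.r.t. z_hat."""
    return grad_z_hat.copy()

s = np.array([0.7, 0.8, 0.2])            # illustrative scores
z_hat = argmax_forward(s)
grad_z_hat = np.array([0.3, -0.5, -0.4])  # illustrative downstream gradient
grad_s = ste_backward(grad_z_hat)
```

Note that nothing in this backward pass looks at the constraints on $z$; the gradient is copied through unchanged.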

  24. Some Geometry… Straight-through Estimator (STE): $\nabla_s L \triangleq \nabla_{\hat{z}} L$. [Figure: the polytope $Az \le b$ with the point $\hat{z} = [1, 0, 1, \cdots, 0]^\top$, the parse of "Shareholders took their money".]

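  25. Some Geometry… Straight-through Estimator (STE): $\nabla_s L \triangleq \nabla_{\hat{z}} L$. [Figure: the polytope $Az \le b$ with the point $\hat{z} = [1, 0, 1, \cdots, 0]^\top$, the parse of "Shareholders took their money", and the direction $-\nabla_{\hat{z}} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$.]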

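  26. Some Geometry… Straight-through Estimator (STE): $\nabla_s L \triangleq \nabla_{\hat{z}} L$. [Figure: the polytope $Az \le b$ with the point $\hat{z} = [1, 0, 1, \cdots, 0]^\top$, the parse of "Shareholders took their money"; the direction $-\nabla_{\hat{z}} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$; and the point $p = \hat{z} - \nabla_{\hat{z}} L$, drawn with its own arcs over "Shareholders took their money".]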

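  27. Some Geometry… SPIGOT. [Figure: the polytope $Az \le b$ with the point $\hat{z} = [1, 0, 1, \cdots, 0]^\top$, the parse of "Shareholders took their money"; the direction $-\nabla_{\hat{z}} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$; the point $p = \hat{z} - \nabla_{\hat{z}} L$, drawn with its own arcs over "Shareholders took their money"; and a point $q$.]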

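  28. Some Geometry… SPIGOT: $p = \hat{z} - \nabla_{\hat{z}} L$, $q = \operatorname{proj}(p)$, $\nabla_s L \triangleq \hat{z} - q$. [Figure: the polytope $Az \le b$ with the point $\hat{z} = [1, 0, 1, \cdots, 0]^\top$, the parse of "Shareholders took their money"; the direction $-\nabla_{\hat{z}} L = [-0.3, 0.5, 0.4, \ldots, 0.2]$; the points $p$ and $q$; and the direction $-\nabla_s L$.]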

  29. Some Geometry… [Figure: two panels, one labeled SPIGOT, each showing $\hat{z}$, $\hat{z} - \nabla_{\hat{z}} L$, and $-\nabla_s L$.]
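The three SPIGOT equations from the geometry slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: the probability simplex is used as a hypothetical stand-in for the polytope $Az \le b$ (projecting onto a real parsing polytope needs a dedicated solver), `project_simplex` is the standard sort-based Euclidean projection, and the numbers are invented.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1} (sort-based method).
    Assumption: the simplex stands in for the polytope Az <= b."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index kept in the support
    tau = css[rho] / (rho + 1)                # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)

def spigot_backward(z_hat, grad_z_hat, project=project_simplex):
    """SPIGOT proxy gradient: p = z_hat - grad, q = proj(p), grad_s := z_hat - q."""
    p = z_hat - grad_z_hat
    q = project(p)
    return z_hat - q

z_hat = np.array([1.0, 0.0, 0.0])         # a vertex of the simplex
grad_z_hat = np.array([0.3, -0.5, -0.4])  # illustrative downstream gradient
grad_s = spigot_backward(z_hat, grad_z_hat)
# Here p = [0.7, 0.5, 0.4] lies outside the simplex, q = [0.5, 0.3, 0.2],
# and grad_s = [0.5, -0.3, -0.2]; STE would have returned grad_z_hat itself.
```

In this sketch, stepping from $\hat{z}$ along $-\nabla_s L$ lands exactly on $q$, which is feasible by construction; stepping along the raw $-\nabla_{\hat{z}} L$ lands on $p$, which here is not.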

  30. Algorithm. Input "Shareholders took their money" → Parser θ → $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores → $\hat{z}$, the parse of "Shareholders took their money".

  31. Algorithm. Input "Shareholders took their money" → Parser θ → $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores → $\hat{z}$, the parse of "Shareholders took their money" → Downstream task φ → Loss L.

  32. Algorithm. Input "Shareholders took their money" → Parser θ → $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores → $\hat{z}$, the parse of "Shareholders took their money" → Downstream task φ → Loss L. Backprop through the downstream task yields $\nabla_{\hat{z}} L$.

  33. Algorithm. Input "Shareholders took their money" → Parser θ → $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores → $\hat{z}$, the parse of "Shareholders took their money" → Downstream task φ → Loss L. Backprop through the downstream task yields $\nabla_{\hat{z}} L$. Projection step, yielding $\nabla_s L$: $p = \hat{z} - \nabla_{\hat{z}} L$, $q = \operatorname{proj}(p)$, $\nabla_s L \triangleq \hat{z} - q$.

  34. Algorithm. Input "Shareholders took their money" → Parser θ → $\hat{z} = \arg\max_z z^\top s_\theta$ s.t. $z$ forms a tree, with $s_\theta$ the vector of arc scores → $\hat{z}$, the parse of "Shareholders took their money" → Downstream task φ → Loss L. Backprop through the downstream task yields $\nabla_{\hat{z}} L$. Projection step, yielding $\nabla_s L$: $p = \hat{z} - \nabla_{\hat{z}} L$, $q = \operatorname{proj}(p)$, $\nabla_s L \triangleq \hat{z} - q$. Backprop from $\nabla_s L$ through the parser yields $\nabla_\theta L$.
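The full loop on this slide (forward through parser and downstream task, backprop to $\nabla_{\hat{z}} L$, projection step, backprop to $\nabla_\theta L$) can be sketched end to end on a toy model. Everything concrete here is an assumption made for illustration: the "parser" is a linear scorer `W`, the "downstream task" is a linear map `V` with squared-error loss, the arg max is one-hot, the simplex replaces the polytope $Az \le b$, and `training_step` and the learning rate are hypothetical names and values.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (stand-in polytope)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    tau = css[rho] / (rho + 1)
    return np.maximum(v - tau, 0.0)

def training_step(W, V, x, y, lr=0.1):
    # Forward: parser scores s = W x, hard arg max, downstream loss.
    s = W @ x
    z_hat = np.zeros_like(s)
    z_hat[np.argmax(s)] = 1.0
    residual = V @ z_hat - y
    loss = 0.5 * residual @ residual
    # Ordinary backprop through the downstream task (parameters V).
    grad_z_hat = V.T @ residual
    grad_V = np.outer(residual, z_hat)
    # SPIGOT step: proxy gradient for the scores.
    q = project_simplex(z_hat - grad_z_hat)
    grad_s = z_hat - q
    # Ordinary backprop from the scores into the parser (parameters W).
    grad_W = np.outer(grad_s, x)
    return W - lr * grad_W, V - lr * grad_V, loss, grad_s

W = np.array([[0.5, 0.1], [0.2, 0.3], [0.0, 0.1]])  # toy parser parameters
V = np.eye(3)                                        # toy downstream parameters
x = np.array([1.0, 2.0])
y = np.array([2.0, 0.0, 0.0])  # the downstream target favors structure 0
W_new, V_new, loss, grad_s = training_step(W, V, x, y)
```

With these made-up numbers the parser initially picks structure 1; after one update its highest score moves to structure 0, the one the downstream loss favors.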

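  35. Connections to Related Work. [Figure: two panels, SPIGOT and STE, each showing $\hat{z}$, $\hat{z} - \nabla_{\hat{z}} L$, and $-\nabla_s L$.] [Table comparing Pipeline, STE, Structured Att., and SPIGOT on four properties: hard decision on $\hat{z}$, backprop, marginal, projection.] Structured Attention: Kim et al., 2017.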

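  36. Connections to Related Work. SPIGOT: $\hat{z} = \arg\max(\ldots)$. Structured Attention: $\hat{z} = \operatorname{softmax}(\ldots)$. [Figure: SPIGOT panel showing $\hat{z} - \nabla_{\hat{z}} L$ and $-\nabla_s L$.] [Table comparing Pipeline, STE, Structured Att., and SPIGOT on four properties: hard decision on $\hat{z}$, backprop, marginal, projection.] Structured Attention: Kim et al., 2017.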

  37. Applications. Joint learning (Swayamdipta et al., 2016). [Diagram: training data with loss $L_1$; "Shareholders took their money" → Parser θ → arg max → parse of "Shareholders took their money"; the parser receives $\nabla_\theta L_1$.]

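  38. Applications. Joint learning (Swayamdipta et al., 2016). [Diagram: training data with loss $L_1$; "Shareholders took their money" → Parser θ → arg max → parse of "Shareholders took their money" → Downstream task φ → Loss $L_2$; the parser receives $\nabla_\theta L_1$ and $\nabla_\theta L_2$, the downstream task receives $\nabla_\phi L_2$.]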
