Human-AI Collaboration for Neural Text Generation with Interpretable Neural Networks. Sebastian Gehrmann. Thesis Defense, Oct 18, 2019. Committee Members: Barbara Grosz, Sasha Rush, Stuart Shieber.


  1. Human-AI Collaboration
 for Neural Text Generation
 with Interpretable Neural Networks. Sebastian Gehrmann.
 Thesis Defense, Oct 18, 2019.
 Committee Members: Barbara Grosz, Sasha Rush, Stuart Shieber

  2. This is Jesse, a journalist. Jesse has a ton of work.

  3. Maybe AI can help reduce the workload? Introduce AI-reen, a text-generation model.

  4. Jesse could give some of the workload to AI-reen. Doing so, Jesse would give up her agency over that work.

  5. But AI-reen is biased and makes mistakes! Jesse still needs to provide oversight over its work.

  6. By collaborating with AI-reen, Jesse could gain 
 the benefits of automation without losing her agency.

  7. [Diagram: Problem, Explain Suggestion, Provide Feedback, Update Suggestion, Accept, Accepted solution]

  8. They want to collaboratively summarize a document. Source

  9. Both have an idea how to summarize it.

  10. If AI-reen were human, it could communicate its reasoning. But its predictions are not interpretable. ???

  11. Even if it could explain its suggestion,
 it can’t incorporate feedback from Jesse. AI-reen: “I picked this phrase, because…” Jesse: “I don’t like it.”

  12. Interpretability is necessary, but we also need controllability. AI-reen: “I picked this phrase, because…” Jesse: “How about … instead?”

  13. Let’s empower humans to collaborate with AI!
 Summarization [EMNLP ’18]
 Data2Text [INLG ’18]
 Section Title Generation [NAACL ’19]
 TL;DR Generation [INLG ’19]
 LSTMVis [InfoVis ’17]
 Phenotyping Saliency [PloS one, ’17]
 Seq2Seq-Vis [VAST ’18]
 Model Selection [DeepStruct ’19]
 Modeling Capacity [Formal Languages ’19]
 Collaborative Semantic Inference [VAST ’19]
 Detecting Fake Text with GLTR [ACL Demo ’19]
 Automated Mediation [Behavior & Technology ’19]

  14. Outline: (1) Background: Sequence Modeling for NLP; (2) Incorporating Content Selection into a Summarization Model; (3) How to Understand Predictions?; (4) Collaborating with the Model to Summarize

  15. $p(y_{t+1} \mid y_1, \ldots, y_t)$. Each next word (?) is predicted from the preceding words: The small dog owns a yellow ball . ($y_1, \ldots, y_8$)
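
As a minimal sketch of the next-word distribution on this slide, the snippet below scores a sentence with the chain rule. The `BIGRAM` table and its probabilities are hypothetical and condition only on the previous word; a neural language model would condition on the full prefix $y_1, \ldots, y_t$.

```python
import math

# Hypothetical toy model: conditions only on the previous word (a bigram table).
# A neural language model would condition on the full prefix y_1, ..., y_t.
BIGRAM = {
    "<s>": {"The": 0.6, "A": 0.4},
    "The": {"small": 0.5, "dog": 0.5},
    "small": {"dog": 1.0},
    "dog": {"owns": 1.0},
}


def next_word_prob(prefix, word):
    """p(y_{t+1} = word | y_1, ..., y_t), approximated from the last word only."""
    return BIGRAM.get(prefix[-1], {}).get(word, 0.0)


def sequence_log_prob(words):
    """Chain rule: sum of log p(y_{t+1} | y_1, ..., y_t) over the sequence."""
    prefix = ["<s>"]
    total = 0.0
    for word in words:
        p = next_word_prob(prefix, word)
        if p == 0.0:
            return float("-inf")  # the toy table assigns no mass to this word
        total += math.log(p)
        prefix.append(word)
    return total


log_p = sequence_log_prob(["The", "small", "dog", "owns"])
```

The log-probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow on longer sequences.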

  16. The small dog owns a yellow ball . ($y_1, \ldots, y_8$) [Elman ’90, Hochreiter & Schmidhuber ’97]

  18. $p$ over candidate next words (…, large, small, child, dog). The small dog owns a yellow ball . ($y_1, \ldots, y_8$) [Bengio ’03]

  19. Source ($x_1, \ldots, x_8$): The small dog owns a yellow ball .
 Target ($y_1, \ldots, y_8$): Der kleine Hund besitzt einen gelben Ball .
 Conditioning on the source turns $p(y_{t+1} \mid y_1, \ldots, y_t)$ into $p(y_{t+1} \mid y_1, \ldots, y_t, x)$, for example $p(y_3 \mid y_1, y_2, x) = p(\text{next word} \mid \text{Der kleine}, \text{The small dog...})$.

  20. Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Der kleine [Bahdanau et al. ’14, Sutskever et al. ’14]

  21. Attention $p(a_t \mid x, y_{1:t})$. Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Der kleine [Bahdanau et al. ’14, Sutskever et al. ’14]

  22. Context: $\sum_{s=1}^{S} a_t^s x_s$. Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Der kleine [Bahdanau et al. ’14, Sutskever et al. ’14]
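
The context computation on this slide can be sketched as a weighted sum over source positions. This is an illustrative sketch only: the raw scores and the two-dimensional encoder states are made-up values, and normalizing scores with a softmax is an assumption about how the attention distribution is obtained.

```python
import math


def softmax(scores):
    """Turn raw attention scores into a distribution over source positions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def context_vector(attention, encoder_states):
    """Weighted sum over source positions: sum_s a_t^s * x_s."""
    dim = len(encoder_states[0])
    return [
        sum(a * x[d] for a, x in zip(attention, encoder_states))
        for d in range(dim)
    ]


# Hypothetical values: three source positions with 2-dimensional encoder states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention = softmax([2.0, 0.5, 0.5])
context = context_vector(attention, encoder_states)
```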

  23. $p$ over candidate next words (…, das, Hund, Kind, große). Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Der kleine [Bahdanau et al. ’14, Sutskever et al. ’14]

  24. Consider an abstractive summarization problem with input $x_1, \ldots, x_S$ and summary $y_1, \ldots, y_T$. Train a summarizer to maximize $p(y \mid x)$. [Gehrmann, Deng, and Rush, EMNLP ’18]
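
Written out with the per-token conditional from the background slides, and assuming $y_{1:0}$ denotes the empty prefix, the training objective decomposes into a sum over summary positions:

```latex
\log p(y \mid x) = \sum_{t=0}^{T-1} \log p(y_{t+1} \mid y_{1:t}, x)
```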

  25. Attention $p(a_t \mid x, y_{1:t})$ and $p$ over candidate next words (…, dog, The, a, ball). Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Dog owns [Vinyals et al. ’15, Filippova et al. ’15, Gu et al. ’16, See et al. ’17]

  26. The copy mechanism uses a binary soft switch $z_t$
 that determines whether the model copies or generates. $p(y_{t+1} \mid x, y_{1:t}) = p(\ldots \mid x, y_{1:t}) \times \ldots + p(\ldots \mid x, y_{1:t}) \times \ldots$, with $p$ over candidate next words (…, das, Hund, Kind, große).

  27. The copy mechanism uses a binary soft switch $z_t$
 that determines whether the model copies or generates.
 $$p(y_{t+1} \mid x, y_{1:t}) = p(z_t = 1 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 1, x, y_{1:t}) + p(z_t = 0 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 0, x, y_{1:t})$$
 Here $p(z_t = 1 \mid x, y_{1:t}) = \sigma(W h_t + b)$, $p(y_{t+1} \mid z_t = 1, x, y_{1:t})$ reuses the attention $p(a_t \mid x, y_{1:t})$, $p(z_t = 0 \mid x, y_{1:t}) = 1 - \sigma(W h_t + b)$, and $p(y_{t+1} \mid z_t = 0, x, y_{1:t})$ is the standard model prediction.
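
A minimal sketch of the copy/generate mixture, under stated assumptions: `switch_logit` is a scalar standing in for $W h_t + b$, the vocabulary distribution is a plain dictionary, and all numbers are invented for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def copy_generate_mixture(source_tokens, attention, vocab_probs, switch_logit):
    """Return p(y_{t+1} | x, y_{1:t}) as a dict over words.

    source_tokens: source words x_1..x_S
    attention:     attention weights over source positions (sum to 1)
    vocab_probs:   dict word -> standard model prediction (sums to 1)
    switch_logit:  scalar standing in for W h_t + b (an assumption)
    """
    p_copy = sigmoid(switch_logit)  # p(z_t = 1 | x, y_{1:t})
    # Generation branch, weighted by p(z_t = 0 | x, y_{1:t}) = 1 - p_copy.
    mixed = {w: (1.0 - p_copy) * p for w, p in vocab_probs.items()}
    # Copy branch: reuse the attention weights as a distribution over source words.
    for token, a in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + p_copy * a
    return mixed


dist = copy_generate_mixture(
    source_tokens=["dog", "ball"],
    attention=[0.75, 0.25],
    vocab_probs={"dog": 0.5, "cat": 0.5},
    switch_logit=0.0,
)
```

Because both branches are proper distributions and the switch weights sum to one, the mixture is itself a distribution. A source word that is missing from the vocabulary ("ball" here) still receives probability through the copy branch.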

  28. Just because a model can copy, should it?

  29. [Diagram labels: Summarizer, Copy Mechanism, Text]

  31. [Diagram labels: Text, Summarizer, Copy Mechanism, Copy Mechanism, Text]

  32. Abstractive summarizers over-extract. “Angela Merkel and her husband, chemistry professor Joachim Sauer,
 are spotted on their annual easter trip
 to the island of ischia, near Naples.”

  33. The model fails at content selection! Consider the content selection as
 word-level extractive summarization. Let $t_1, \ldots, t_S$ denote binary indicators
 of whether a source word is used in a summary.
 Train a model to maximize $p(t \mid x)$.

  34. How to generate supervised data? Source: “The small dog owns a large yellow ball.
 The big dog from next door chases the ball.” Summary: “Big dog chases small dog’s ball.”
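
One plausible way to derive the binary indicators from a source and summary pair is a greedy longest-match alignment. The slide text does not spell out the procedure, so the heuristic below is an assumption, and `find_first` and `label_source` are hypothetical helper names. Tokens are lowercased and pre-split for simplicity.

```python
def find_first(source, ngram):
    """Index of the first occurrence of ngram in source, or -1 if absent."""
    n = len(ngram)
    for i in range(len(source) - n + 1):
        if source[i:i + n] == ngram:
            return i
    return -1


def label_source(source, summary):
    """Return binary tags t_1..t_S: 1 if the source word is aligned to the summary."""
    tags = [0] * len(source)
    j = 0
    while j < len(summary):
        best_len, best_pos = 0, -1
        # Grow the n-gram starting at summary position j while it still occurs in the source.
        for n in range(1, len(summary) - j + 1):
            pos = find_first(source, summary[j:j + n])
            if pos < 0:
                break
            best_len, best_pos = n, pos
        if best_len == 0:
            j += 1  # this summary word never appears in the source
            continue
        for i in range(best_pos, best_pos + best_len):
            tags[i] = 1
        j += best_len
    return tags


source = ("the small dog owns a large yellow ball . "
          "the big dog from next door chases the ball .").split()
summary = "big dog chases small dog 's ball .".split()
tags = label_source(source, summary)
selected = [w for w, t in zip(source, tags) if t == 1]
```

Matching the longest run first keeps contiguous phrases such as "big dog" together and avoids tagging every stray occurrence of a frequent word.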

  41. Content Selector: a model based on ELMo predicts $t$ for the source “The small dog owns a large yellow ball.
 The big dog from next door chases the ball.”

  42. Control copied content with Bottom-Up Attention by restricting what can be copied to important content. [Diagram: Source, Content Selection, Masked Source, Bottom-Up Attention, Summary]

  43. Control copied content with Bottom-Up Attention by restricting what can be copied to important content. Let $q_s$ denote the selection probability from the content selector. Let $\epsilon$ denote an importance threshold. Modify the copy-attention such that
 $$p(\tilde{a}_t^s \mid x, y_{1:t}) = \begin{cases} p(a_t^s \mid x, y_{1:t}) & q_s > \epsilon \\ 0 & \text{otherwise} \end{cases}$$
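
The masking rule can be sketched in a few lines. The function name and example values are hypothetical, and the optional renormalization step is an assumption that goes beyond the equation on the slide; it is included because the masked weights no longer sum to one.

```python
def bottom_up_attention(attention, selection_probs, epsilon, renormalize=True):
    """Zero out copy-attention on source words with selection probability q_s <= epsilon.

    attention:       p(a_t^s | x, y_{1:t}) for each source position s
    selection_probs: q_s from the content selector
    epsilon:         importance threshold
    renormalize:     rescale surviving weights to sum to 1 (assumption, not on the slide)
    """
    masked = [a if q > epsilon else 0.0 for a, q in zip(attention, selection_probs)]
    total = sum(masked)
    if renormalize and total > 0.0:
        masked = [m / total for m in masked]
    return masked


# Hypothetical values for three source positions.
masked = bottom_up_attention([0.5, 0.3, 0.2], [0.9, 0.1, 0.6], epsilon=0.5)
```

Passing `renormalize=False` returns exactly the piecewise definition from the slide.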

  44. Bottom-Up Attention $p(\tilde{a}_t \mid x, y_{1:t})$ and $p$ over candidate next words (…, dog, The, a, ball). Encoder input ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Decoder output so far ($y_1, y_2$): Dog owns

  45. +2 ROUGE. The improvements were consistent across two evaluated datasets.

  46. Without Bottom-Up: “Angela Merkel and her husband, chemistry professor Joachim Sauer,
 are spotted on their annual easter trip
 to the island of ischia, near Naples.”
 With Bottom-Up: “Angela Merkel and her husband
 are spotted on their easter trip.”

  47. There is still work to be done…

  48. Summarization models struggle in real-world scenarios! How do we make the generation of a summary collaborative?

  49. The Users of Interpretability and Collaboration: Architect, Trainer, End User [Strobelt*, Gehrmann, et al., InfoVis ’17]

  50. The Target of Interpretability and Collaboration: Model ($\theta$), Decision ($\hat{y}$) [Gehrmann*, Strobelt*, et al., VAST ’19]

  51. The Coupling of Model and Interface: (a) Passive Observation, (b) Interactive Observation, (c) Interactive Collaboration [Gehrmann*, Strobelt*, et al., VAST ’19]

  52. (a) Passive Observation, (b) Interactive Observation; target: $\theta$ [Wongsuphasawat et al., VAST ’17]

  53. (b) Interactive Observation, (c) Interactive Collaboration; target: $\hat{y}$ [Strobelt*, Gehrmann*, et al., VAST ’18]
