Human-AI Collaboration for Neural Text Generation with Interpretable Neural Networks. Sebastian Gehrmann. Thesis Defense, Oct 18, 2019. Committee Members: Barbara Grosz, Sasha Rush, Stuart Shieber.
This is Jesse, a journalist. Jesse has a ton of work.
Maybe AI can help reduce the workload? Introduce AI-reen, a text-generation model.
Jesse could give some of the workload to AI-reen. Doing so, Jesse would give up her agency over that work.
But AI-reen is biased and makes mistakes! Jesse still needs to provide oversight over its work.
By collaborating with AI-reen, Jesse could gain the benefits of automation without losing her agency.
[Diagram: collaboration loop. Problem → Explain Suggestion → Provide Feedback → Update Suggestion → Accept → Accepted solution]
They want to collaboratively summarize a document. Source
Both have an idea how to summarize it.
If AI-reen were human, it could communicate its reasoning. But its predictions are not interpretable. ???
Even if it could explain its suggestion, it can’t incorporate feedback from Jesse. “I picked this phrase, because…” “I don’t like it.”
Interpretability is necessary, but we also need controllability. “I picked this phrase, because…” “How about … instead?”
Let’s empower humans to collaborate with AI! Summarization [EMNLP ’18]; Data2Text [INLG ’18]; Section Title Generation [NAACL ’19]; TL;DR Generation [INLG ’19]; LSTMVis [InfoVis ’17]; Phenotyping Saliency [PloS one, ’17]; Seq2Seq-Vis [VAST ’18]; Model Selection [DeepStruct ’19]; Modeling Capacity [Formal Languages ’19]; Collaborative Semantic Inference [VAST ’19]; Detecting Fake Text with GLTR [ACL Demo ’19]; Automated Mediation [Behavior & Technology ’19]
Outline
1. Background: Sequence Modeling for NLP
2. Incorporating Content Selection into a Summarization Model
3. How to Understand Predictions?
4. Collaborating with the Model to Summarize
$p(y_{t+1} \mid y_1, \ldots, y_t)$ [Figure: each word of "The small dog owns a yellow ball ." ($y_1, \ldots, y_8$) is predicted in turn from the words before it]
[Figure: "The small dog owns a yellow ball ." as tokens $y_1, \ldots, y_8$] [Elman ’90, Hochreiter & Schmidhuber ’97]
[Figure: distribution $p$ over candidate next words (…, large, small, child, dog) for "The small dog owns a yellow ball ." ($y_1, \ldots, y_8$)] [Bengio ’03]
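The next-word factorization on the slides above can be illustrated with a small sketch. The lookup table `TOY_MODEL` and its probabilities are made up for illustration and stand in for a trained network; only the chain-rule scoring is the point.

```python
import math

# Hypothetical next-word distributions p(y_{t+1} | y_1..y_t), keyed by prefix.
# The numbers are invented for illustration and do not come from any model.
TOY_MODEL = {
    (): {"The": 0.6, "A": 0.4},
    ("The",): {"small": 0.5, "large": 0.3, "dog": 0.2},
    ("The", "small"): {"dog": 0.7, "child": 0.3},
}


def sequence_log_prob(words, model):
    """Sum log p(y_{t+1} | y_1..y_t) over all positions (chain rule)."""
    total = 0.0
    for t, word in enumerate(words):
        prefix = tuple(words[:t])  # all words generated so far
        total += math.log(model[prefix][word])
    return total


log_p = sequence_log_prob(["The", "small", "dog"], TOY_MODEL)
print(math.exp(log_p))  # product of the three conditional probabilities
```

A neural language model replaces the lookup table with a function of the prefix, but the sequence score is assembled the same way.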
Source ($x_1, \ldots, x_8$): The small dog owns a yellow ball . Target ($y_1, \ldots, y_8$): Der kleine Hund besitzt einen gelben Ball . The language model $p(y_{t+1} \mid y_1, \ldots, y_t)$ becomes $p(y_{t+1} \mid y_1, \ldots, y_t, x)$. Example: $p(y_3 \mid y_1, y_2, x)$ = p(next word | Der kleine, The small dog...)
[Figure: encoder-decoder model. The encoder reads the source "The small dog owns a yellow ball ." ($x_1, \ldots, x_8$); the decoder has generated "Der kleine" ($y_1, y_2$)] [Bahdanau et al. ’14, Sutskever et al. ’14]
Attention: $p(a_t \mid x, y_{1:t})$ [Figure: encoder-decoder model with attention over the source "The small dog owns a yellow ball ." ($x_1, \ldots, x_8$) after generating "Der kleine" ($y_1, y_2$)] [Bahdanau et al. ’14, Sutskever et al. ’14]
Context: $\sum_{s=1}^{S} a_t^s x_s$ [Figure: the attention weights combine the encoder states $x_1, \ldots, x_8$ of "The small dog owns a yellow ball ." into a context for the decoder after "Der kleine" ($y_1, y_2$)] [Bahdanau et al. ’14, Sutskever et al. ’14]
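As a rough illustration of the context computation above, the following sketch normalizes attention scores with a softmax and takes the weighted sum of the encoder states. The dot-product scoring function and the toy vectors are assumptions made here for brevity; the slide does not specify how the scores are computed.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def attention_context(decoder_state, encoder_states):
    """Return attention weights a_t and the context sum_s a_t^s * x_s."""
    # Assumption: dot-product scores between decoder and encoder states.
    scores = [
        sum(d * e for d, e in zip(decoder_state, x_s))
        for x_s in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * x_s[i] for w, x_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder states for three source positions (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attention_context([0.0, 0.0], encoder_states)
print(weights, context)
```

With an all-zero decoder state every score is equal, so the weights are uniform and the context is simply the mean of the encoder states.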
[Figure: the decoder outputs a distribution $p$ over candidate next words (…, das, Hund, Kind, große) given the source "The small dog owns a yellow ball ." ($x_1, \ldots, x_8$) and the prefix "Der kleine" ($y_1, y_2$)] [Bahdanau et al. ’14, Sutskever et al. ’14]
Consider an abstractive summarization problem, with input $x_1, \ldots, x_S$ and summary $y_1, \ldots, y_T$. Train a summarizer to maximize $p(y \mid x)$. [Gehrmann, Deng, and Rush, EMNLP ’18]
Attention: $p(a_t \mid x, y_{1:t})$ [Figure: encoder-decoder model summarizing "The small dog owns a yellow ball ." ($x_1, \ldots, x_8$); after generating "Dog owns" ($y_1, y_2$), the distribution $p$ covers source words (…, dog, The, a, ball)] [Vinyals et al. ’15, Filippova et al. ’15, Gu et al. ’16, See et al. ’17]
The copy mechanism uses a binary soft switch $z_t$ that determines whether the model copies or generates.
$p(y_{t+1} \mid x, y_{1:t}) = p(z_t = 1 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 1, x, y_{1:t}) + p(z_t = 0 \mid x, y_{1:t}) \times p(y_{t+1} \mid z_t = 0, x, y_{1:t})$
Here $p(z_t = 1 \mid x, y_{1:t}) = \sigma(W h_t + b)$ and $p(z_t = 0 \mid x, y_{1:t}) = 1 - \sigma(W h_t + b)$. The copy term $p(y_{t+1} \mid z_t = 1, x, y_{1:t})$ reuses the attention $p(a_t \mid x, y_{1:t})$, and $p(y_{t+1} \mid z_t = 0, x, y_{1:t})$ is the standard model prediction.
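The mixture above can be sketched in a few lines. This is a minimal illustration, assuming the switch logit (standing in for $W h_t + b$), the vocabulary distribution, and the attention weights are already given; all names and numbers are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def copy_generate_mixture(p_vocab, attention, source_tokens, switch_logit):
    """Mix the generation and copy distributions with the soft switch z_t.

    p_vocab:       dict word -> p(y_{t+1} | z_t = 0, ...), the standard prediction
    attention:     list of attention weights over source positions (sums to 1)
    source_tokens: source words aligned with the attention weights
    switch_logit:  scalar standing in for W h_t + b
    """
    p_copy = sigmoid(switch_logit)  # p(z_t = 1 | x, y_{1:t})
    # Generation term: (1 - sigma) * standard model prediction.
    mixed = {w: (1.0 - p_copy) * p for w, p in p_vocab.items()}
    # Copy term: sigma * attention mass, summed per source word.
    for a, token in zip(attention, source_tokens):
        mixed[token] = mixed.get(token, 0.0) + p_copy * a
    return mixed


# Toy values, invented for illustration only.
mixed = copy_generate_mixture(
    p_vocab={"dog": 0.5, "cat": 0.5},
    attention=[0.25, 0.75],
    source_tokens=["dog", "ball"],
    switch_logit=0.0,  # sigmoid(0) = 0.5, so both terms get equal weight
)
print(mixed)
```

Because the copy term places mass on source tokens, a word such as "ball" that is absent from the generation distribution can still be produced.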
Just because a model can copy, should it?
Summarizer Copy Mechanism Text
Text Summarizer Copy Mechanism Copy Mechanism Text
Abstractive summarizers over-extract. “Angela Merkel and her husband, chemistry professor Joachim Sauer, are spotted on their annual easter trip to the island of ischia, near Naples. ”
The model fails at content selection! Consider the content selection as word-level extractive summarization. Let $t_1, \ldots, t_S$ denote binary indicators of whether a source word is used in a summary. Train a model to maximize $p(t \mid x)$.
How to generate supervised data? The small dog owns a large yellow ball. The big dog from next door chases the ball. Big dog chases small dog ’ s ball.
[Figure: a content selector model based on ELMo predicts the tags $t$ for the source "The small dog owns a large yellow ball. The big dog from next door chases the ball."]
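One way to derive the word-level labels from a source and summary pair like the one above is to align summary words to the source. The sketch below is a simplified greedy variant written for illustration: it tags the earliest, longest source span that matches the summary at each position. The tokenization, lowercasing, and the function name `tag_copied_words` are assumptions, not the exact procedure used for the reported results.

```python
def tag_copied_words(source, summary):
    """Return a 0/1 tag per source token marking words aligned to the summary."""
    tags = [0] * len(source)
    i = 0
    while i < len(summary):
        best_len, best_start = 0, -1
        for start in range(len(source)):
            length = 0
            # Extend the match while source and summary keep agreeing.
            while (i + length < len(summary)
                   and start + length < len(source)
                   and source[start + length] == summary[i + length]):
                length += 1
            if length > best_len:  # strict '>' keeps the earliest occurrence
                best_len, best_start = length, start
        if best_len == 0:
            i += 1  # summary word does not occur in the source
        else:
            for k in range(best_start, best_start + best_len):
                tags[k] = 1
            i += best_len
    return tags


source = ("the small dog owns a large yellow ball . "
          "the big dog from next door chases the ball .").split()
summary = "big dog chases small dog 's ball .".split()
tags = tag_copied_words(source, summary)
selected = [word for word, tag in zip(source, tags) if tag]
print(selected)
```

The resulting tags can serve as training targets for a tagger such as the content selector, with the summary words that never occur in the source ("'s" here) simply ignored.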
Control copied content with Bottom-Up Attention by restricting what can be copied to important content. [Diagram: Source → Content Selection → Masked Source → Bottom-Up Attention → Summary]
Let $q_s$ denote the selection probability from the content selector. Let $\epsilon$ denote an importance threshold. Modify the copy-attention such that
$p(\tilde{a}_t^s \mid x, y_{1:t}) = \begin{cases} p(a_t^s \mid x, y_{1:t}) & q_s > \epsilon \\ 0 & \text{otherwise} \end{cases}$
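The masking rule can be sketched directly. The function name and toy values below are hypothetical, and the optional renormalization step is an assumption added so that the masked weights again sum to one; the slide itself only states the thresholding.

```python
def bottom_up_attention(attention, selection_probs, epsilon, renormalize=True):
    """Zero out copy-attention on source words the content selector rejects.

    attention:       p(a_t^s | x, y_{1:t}) for each source position s
    selection_probs: q_s from the content selector, one per source position
    epsilon:         importance threshold
    """
    masked = [a if q > epsilon else 0.0
              for a, q in zip(attention, selection_probs)]
    total = sum(masked)
    # Assumption: rescale the surviving weights so they form a distribution.
    if renormalize and total > 0.0:
        masked = [m / total for m in masked]
    return masked


# Toy values, invented for illustration only.
attention = [0.5, 0.3, 0.2]
selection_probs = [0.9, 0.1, 0.6]
raw = bottom_up_attention(attention, selection_probs, 0.5, renormalize=False)
masked = bottom_up_attention(attention, selection_probs, 0.5)
print(raw, masked)
```

Since the mask is applied only to the copy-attention, the content selector can be trained separately and combined with an already trained summarizer at inference time.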
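Bottom-Up Attention: $p(\tilde{a}_t \mid x, y_{1:t})$ [Figure: encoder-decoder model summarizing "The small dog owns a yellow ball ." ($x_1, \ldots, x_8$); after generating "Dog owns" ($y_1, y_2$), the masked attention restricts the copy distribution $p$ over source words (…, dog, The, a, ball)]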
+2 ROUGE. The improvements were consistent across the two evaluated datasets.
Without Bottom-Up: “Angela Merkel and her husband, chemistry professor Joachim Sauer, are spotted on their annual easter trip to the island of ischia, near Naples.” With Bottom-Up: “Angela Merkel and her husband are spotted on their easter trip.”
There is still work to be done…
Summarization models struggle in real-world scenarios! How do we make the generation of a summary collaborative?
The Users of Interpretability and Collaboration: Architect, Trainer, End User [Strobelt*, Gehrmann, et al., InfoVis ’17]
The Target of Interpretability and Collaboration: Model ($\theta$), Decision ($\hat{y}$) [Gehrmann*, Strobelt*, et al., VAST ’19]
The Coupling of Model and Interface: (a) Passive Observation, (b) Interactive Observation, (c) Interactive Collaboration [Gehrmann*, Strobelt*, et al., VAST ’19]
(a) Passive Observation, (b) Interactive Observation; target: $\theta$ [Wongsuphasawat et al., VAST ’17]
(b) Interactive Observation, (c) Interactive Collaboration; target: $\hat{y}$ [Strobelt*, Gehrmann*, et al., VAST ’18]