Algorithms for NLP CS 11-711 · Fall 2020 Lecture 14: Graph-based dependency parsing Emma Strubell
Announcements
■ No recitation on Friday (Tartan Community Day).
Dependency parsing
■ Transition-based (shift-reduce) parsing:
  ■ Greedy choice of local transitions guided by a good classifier.
  ■ Examples: MaltParser [Nivre et al. 2008], Stack LSTM [Dyer et al. 2015]
[Figure: a transition-based parser with an input buffer (w1, w2, …, wn), a stack (s1 … sn), an oracle, and the dependency relations built so far]
■ Graph-based dependency parsing:
  ■ Given scores for every pair of words, find the (globally) highest-scoring set of edges.
  ■ Examples: MSTParser [McDonald et al. 2005], TurboParser [Martins et al. 2009], Deep Biaffine [Dozat et al. 2017]
[Figure: fully connected directed graph over root, Book, that, flight, with a score on each edge]
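A minimal sketch of greedy transition-based parsing, assuming the arc-standard transition system (one of several transition systems). The trained classifier is replaced by a hypothetical `oracle_from_gold` helper that reads off gold heads, so this illustrates only the shift-reduce mechanics (stack, buffer, one greedy action per step), not a learned model. All names (`greedy_parse`, `oracle_from_gold`, the `SHIFT`/`LEFT`/`RIGHT` action strings) are illustrative.

```python
from typing import Callable, Dict, List, Tuple

Arc = Tuple[int, int]  # (head, dependent); token 0 is the artificial root
Chooser = Callable[[List[int], List[int], List[Arc]], str]


def greedy_parse(n: int, choose: Chooser) -> List[Arc]:
    """Arc-standard shift-reduce parsing of tokens 1..n, one greedy action at a time."""
    stack: List[int] = [0]
    buffer: List[int] = list(range(1, n + 1))
    arcs: List[Arc] = []
    while buffer or len(stack) > 1:
        action = choose(stack, buffer, arcs)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))    # move the next word onto the stack
        elif action == "LEFT" and len(stack) > 2:
            dep = stack.pop(-2)            # second-from-top becomes a dependent of top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT" and len(stack) > 1:
            dep = stack.pop()              # top becomes a dependent of second-from-top
            arcs.append((stack[-1], dep))
        else:
            raise ValueError(f"illegal action {action!r}")
    return arcs


def oracle_from_gold(gold_heads: Dict[int, int]) -> Chooser:
    """Hypothetical stand-in for the classifier: returns the action implied by gold heads."""
    def choose(stack: List[int], buffer: List[int], arcs: List[Arc]) -> str:
        if len(stack) > 2 and gold_heads[stack[-2]] == stack[-1]:
            return "LEFT"
        if len(stack) > 1:
            top = stack[-1]
            # Only attach `top` once it has collected all of its own dependents.
            done = all((top, d) in arcs for d, h in gold_heads.items() if h == top)
            if gold_heads[top] == stack[-2] and done:
                return "RIGHT"
        return "SHIFT"
    return choose


# "Book that flight" (1, 2, 3); gold heads: Book=root, that=flight, flight=Book
print(greedy_parse(3, oracle_from_gold({1: 0, 2: 3, 3: 1})))  # [(3, 2), (1, 3), (0, 1)]
```

This static oracle only handles projective trees: for a non-projective gold tree it gets stuck asking to SHIFT from an empty buffer, and `greedy_parse` raises `ValueError`.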
Graph-based dependency parsing
■ Edge-factored (or arc-factored) approaches:
  ■ Score of a tree decomposes as sum of edge scores:
    $$\Psi(y, w; \theta) = \sum_{i \xrightarrow{r} j \in y} \psi(i \xrightarrow{r} j, w, \theta)$$
■ Start with a fully-connected directed graph.
■ How to infer the highest-scoring tree?
  ■ Find a maximum directed spanning tree: Chu and Liu (1965) and Edmonds (1967) algorithm
[Figure: fully connected directed graph over root, Book, that, flight, with a score on each edge]
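To make the arc-factored score concrete, here is a small sketch that ignores the relation label r and uses hypothetical scores for root, Book, that, flight (nodes 0 to 3; the numbers are assumed for illustration). `tree_score` sums edge scores exactly as in the decomposition of Ψ; `brute_force_best_tree` is a naive inference baseline that enumerates every head assignment (n^n candidates), which is why a maximum spanning tree algorithm is needed. All function names are illustrative.

```python
from itertools import product
from typing import Dict, Optional, Tuple

Arc = Tuple[int, int]  # (head, dependent); node 0 is the artificial root

# Hypothetical edge scores psi(head -> dependent) for "root Book that flight".
PSI: Dict[Arc, float] = {
    (0, 1): 12, (0, 2): 4, (0, 3): 4,
    (1, 2): 5, (1, 3): 7,
    (2, 1): 6, (2, 3): 8,
    (3, 1): 5, (3, 2): 7,
}


def tree_score(heads: Dict[int, int], psi: Dict[Arc, float]) -> float:
    """Arc-factored score: the sum of psi over the tree's (head, dependent) edges."""
    return sum(psi[(h, d)] for d, h in heads.items())


def is_tree(heads: Dict[int, int]) -> bool:
    """True if following head links from every word reaches the root without a cycle."""
    for d in heads:
        seen = set()
        node = d
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def brute_force_best_tree(n: int, psi: Dict[Arc, float]) -> Tuple[Optional[Dict[int, int]], float]:
    """Try every head assignment for words 1..n (n**n of them) and keep the best tree."""
    words = list(range(1, n + 1))
    choices = [[h for h in range(n + 1) if h != d] for d in words]
    best, best_score = None, float("-inf")
    for combo in product(*choices):
        heads = dict(zip(words, combo))
        if is_tree(heads):
            score = tree_score(heads, psi)
            if score > best_score:
                best, best_score = heads, score
    return best, best_score


print(brute_force_best_tree(3, PSI))
```

With these scores the search returns heads {1: 0, 2: 3, 3: 1} (root → Book, Book → flight, flight → that) with total score 26.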
Chu-Liu-Edmonds algorithm

function MaxSpanningTree(G=(V,E), root, score) returns spanning tree
    F ← []
    T' ← []
    score' ← []
    for each v ∈ V \ {root} do
        // select best incoming edge for each node
        bestInEdge ← argmax_{e=(u,v) ∈ E} score[e]
        F ← F ∪ bestInEdge
        // subtract its score from all incoming edges
        for each e=(u,v) ∈ E do
            score'[e] ← score[e] − score[bestInEdge]
    if T=(V,F) is a spanning tree then return it
    else
        C ← a cycle in F
        G' ← Contract(G, C)
        T' ← MaxSpanningTree(G', root, score')
        T ← Expand(T', C)
        return T

function Contract(G, C) returns contracted graph
function Expand(T, C) returns expanded graph
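A compact recursive rendering of the pseudocode above, offered as a sketch under stated assumptions: nodes are integers with the root given explicitly, the graph is a dict mapping (head, dependent) pairs to scores, every non-root node has at least one incoming edge, and the contracted node gets the fresh label max(nodes) + 1. Contraction reweights edges entering the cycle as in the score' line; expansion keeps every cycle edge except the one into the node where the chosen entering edge lands. It favors readability over the faster variants of the algorithm, and the scores are hypothetical values for root, Book, that, flight.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[int, int]  # (head u, dependent v)


def find_cycle(best_in: Dict[int, int], root: int) -> Optional[Set[int]]:
    """Return the nodes of one cycle among the chosen head links, or None."""
    for start in best_in:
        path: List[int] = []
        node = start
        while node != root and node not in path:
            path.append(node)
            node = best_in[node]
        if node != root:  # walked back onto the path: a cycle
            return set(path[path.index(node):])
    return None


def max_spanning_tree(nodes: List[int], edges: Dict[Edge, float], root: int) -> Set[Edge]:
    """Chu-Liu-Edmonds: highest-scoring spanning arborescence rooted at `root`."""
    # Select the best incoming edge for each non-root node.
    best_in: Dict[int, int] = {}
    for (u, v), s in edges.items():
        if v == root or u == v:
            continue
        if v not in best_in or s > edges[(best_in[v], v)]:
            best_in[v] = u
    cycle = find_cycle(best_in, root)
    if cycle is None:  # greedy choices already form a spanning tree
        return {(u, v) for v, u in best_in.items()}

    # Contract the cycle into a fresh node c. Edges entering it are reweighted:
    # score'[(u, v)] = score[(u, v)] - score[bestInEdge(v)].
    c = max(nodes) + 1
    new_edges: Dict[Edge, float] = {}
    origin: Dict[Edge, Edge] = {}  # contracted edge -> edge it stands for
    for (u, v), s in edges.items():
        if u in cycle and v in cycle:
            continue
        if v in cycle:
            key, s2 = (u, c), s - edges[(best_in[v], v)]
        elif u in cycle:
            key, s2 = (c, v), s
        else:
            key, s2 = (u, v), s
        if key not in new_edges or s2 > new_edges[key]:
            new_edges[key], origin[key] = s2, (u, v)
    new_nodes = [x for x in nodes if x not in cycle] + [c]
    sub_tree = max_spanning_tree(new_nodes, new_edges, root)

    # Expand: map edges back, then keep every cycle edge except the one into
    # the node where the chosen entering edge lands.
    tree = {origin[e] for e in sub_tree}
    entering = next(origin[e] for e in sub_tree if e[1] == c)
    tree |= {(best_in[v], v) for v in cycle if v != entering[1]}
    return tree


# Hypothetical scores psi(head -> dependent); nodes 0..3 = root, Book, that, flight.
PSI: Dict[Edge, float] = {
    (0, 1): 12, (0, 2): 4, (0, 3): 4,
    (1, 2): 5, (1, 3): 7,
    (2, 1): 6, (2, 3): 8,
    (3, 1): 5, (3, 2): 7,
}
print(sorted(max_spanning_tree([0, 1, 2, 3], PSI, 0)))  # [(0, 1), (1, 3), (3, 2)]
```

Here the greedy step picks root → Book, flight → that and that → flight, which forms a cycle between that and flight; after contraction the best way into the cycle is Book → flight, so the edge that → flight is dropped on expansion.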