Latent Structures for Coreference Resolution
Sebastian Martschat and Michael Strube
Heidelberg Institute for Theoretical Studies gGmbH
Coreference Resolution

Coreference resolution is the task of determining which mentions in a text refer to the same entity.
An Example

Vicente del Bosque admits it will be difficult for him to select David de Gea in Spain's squad if the goalkeeper remains on the sidelines at Manchester United. de Gea's long-anticipated transfer to Real Madrid fell through on Monday due to miscommunication between the Spanish club and United and he will stay at Old Trafford until at least January.
Outline

Motivation
Structures for Coreference Resolution
Experiments and Analysis
Conclusions and Future Work
General Paradigm

Vicente del Bosque admits it will be difficult for him to select David de Gea in Spain's squad if the goalkeeper remains on the sidelines at Manchester United. de Gea's long-anticipated transfer to Real Madrid fell through on Monday due to miscommunication between the Spanish club and United and he will stay at Old Trafford until at least January.

Consolidate pairwise decisions for anaphor-antecedent pairs.
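As a concrete illustration of the consolidation step, here is a minimal sketch (not the authors' implementation) of one common strategy: taking the transitive closure of all positive anaphor-antecedent decisions via union-find.

```python
# Sketch: consolidating pairwise coreference decisions into entities.
# One common consolidation strategy (the slides do not fix a particular
# one) is the transitive closure of positive links, computed by union-find.

def consolidate(num_mentions, positive_pairs):
    """Group mentions into entities given positive (anaphor, antecedent) pairs."""
    parent = list(range(num_mentions))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for anaphor, antecedent in positive_pairs:
        parent[find(anaphor)] = find(antecedent)

    # Map each mention to an entity identifier (the root of its cluster).
    return [find(i) for i in range(num_mentions)]

# Toy example: mentions 0..7; the decisions say m7 corefers with m4,
# and m4 corefers with m1.
entities = consolidate(8, [(7, 4), (4, 1)])
# m1, m4 and m7 now share one entity id; all other mentions are singletons.
```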
Mention Pairs

[Figure: the example text with the anaphor m_7 paired against each preceding candidate m_1, ..., m_6; every pair (m_i, m_7) receives an individual binary decision, here + for (m_4, m_7) and - for all others.]
Mention Ranking

[Figure: the example text with the anaphor m_7; the candidate antecedents m_1, ..., m_6 are ranked against each other and the single best-scoring antecedent (m_4) is selected.]
Antecedent Trees

[Figure: the example text with mentions m_1, ..., m_19; every mention selects at most one antecedent, so the antecedent arcs form a tree spanning all mentions of the document.]
Unifying Approaches

• the approaches operate on structures that are not annotated in the training data
• we can view these structures as latent structures
→ devise a unified representation of the approaches in terms of these structures
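To make the unified view concrete, the following sketch (all names are illustrative, not from the authors' code) encodes each approach's structure as a set of (anaphor, antecedent) arcs; the approaches then differ only in which arc sets they allow.

```python
# Every approach predicts a latent graph over mentions, encoded here as a
# list of (anaphor, antecedent) arcs. Index 0 stands for the dummy mention
# m0, meaning "non-anaphoric".

# Mention ranking / antecedent trees: each anaphor has exactly one
# outgoing arc to its chosen antecedent.
ranking_structure = [(1, 0), (2, 1), (3, 0), (4, 1)]

# Mention pairs: every anaphor-antecedent pair carries an explicit
# +/- label, so arcs come with labels attached.
pair_structure = [(2, 1, "+"), (3, 1, "-"), (3, 2, "-"), (4, 1, "+")]

def entities_from_arcs(arcs):
    """Read off the entity mapping z from a one-antecedent-per-anaphor
    latent structure h (antecedent 0 = discourse-new)."""
    entity = {0: 0}
    for anaphor, antecedent in sorted(arcs):
        # Antecedents precede their anaphors, so their entity is known.
        entity[anaphor] = entity[antecedent] if antecedent != 0 else anaphor
    return entity

print(entities_from_arcs(ranking_structure))
# mentions 2 and 4 join mention 1's entity; mentions 1 and 3 start new ones
```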
Final Goal

Learn a mapping f : X → H × Z

• x ∈ X: structured input (documents containing mentions and linguistic information)
• h ∈ H: document-level latent structure we actually predict (mention pairs, antecedent trees, ...); we employ graph-based latent structures
• z ∈ Z: mapping of mentions to entity identifiers, inferred via the latent h ∈ H

Latent structures are a subclass of directed labeled graphs G = (V, A, L):

• nodes V: the mentions plus a dummy mention m_0 for anaphoricity detection
• arcs A: a subset of all backward arcs (each arc points from a mention to an earlier mention)
• labels L: labels for the arcs

The graph can be split into substructures which are handled individually.
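A minimal sketch of such a graph G = (V, A, L) as a data structure (a hypothetical encoding for illustration, not the authors' implementation):

```python
from dataclasses import dataclass, field

@dataclass
class LatentGraph:
    """Latent structure G = (V, A, L) over the mentions of one document."""
    num_mentions: int                           # mentions m1..mn in textual order
    arcs: set = field(default_factory=set)      # A: (anaphor, antecedent) pairs
    labels: dict = field(default_factory=dict)  # L: optional labels on arcs

    def add_arc(self, anaphor, antecedent, label=None):
        # V implicitly contains a dummy mention m0 (index 0) for
        # anaphoricity detection; arcs must be "backward", i.e. the
        # antecedent strictly precedes the anaphor.
        assert 0 <= antecedent < anaphor <= self.num_mentions
        self.arcs.add((anaphor, antecedent))
        if label is not None:
            self.labels[(anaphor, antecedent)] = label

g = LatentGraph(num_mentions=3)
g.add_arc(2, 1, label="+")   # m2 takes m1 as antecedent
g.add_arc(3, 0)              # m3 is discourse-new (arc to dummy m0)
```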
Linear Models

Employ an edge-factored linear model:

f(x) = argmax over (h, z) ∈ H_x × Z_x of Σ_{a ∈ h} ⟨θ, φ(x, a, z)⟩

[Figure: the graph over m_0, ..., m_3 split into per-anaphor substructures; each arc is described by binary features such as sentDist=2 and anaType=PRO with learned weights (e.g. -0.5, 0.8), and each substructure receives a score (e.g. 0, 3.7, 10) by summing its arc scores.]
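Because the model factors over arcs, decoding a mention-ranking structure reduces to an independent argmax per anaphor. A sketch under that assumption (feature extraction is stubbed out; the feature names follow the slide's figure and the weights are toy values):

```python
# Edge-factored decoding for mention-ranking substructures: each arc is
# scored as <theta, phi(x, a, z)>, and each anaphor's best antecedent is
# chosen independently.

def score_arc(theta, features):
    """<theta, phi>: sum the weights of the arc's active binary features."""
    return sum(theta.get(f, 0.0) for f in features)

def decode_ranking(theta, phi, num_mentions):
    """For each anaphor, pick the highest-scoring antecedent (0 = dummy m0)."""
    best = {}
    for anaphor in range(1, num_mentions + 1):
        candidates = range(anaphor)  # all earlier mentions plus dummy m0
        best[anaphor] = max(
            candidates,
            key=lambda ant: score_arc(theta, phi(anaphor, ant)),
        )
    return best

# Toy weights and a stub feature function, purely for illustration.
theta = {"sentDist=0": 0.8, "anaType=PRO": -0.5, "dummy": 0.1}
phi = lambda ana, ant: ["dummy"] if ant == 0 else ["sentDist=0", "anaType=PRO"]
print(decode_ranking(theta, phi, 3))
# → {1: 0, 2: 1, 3: 1}: m1 is discourse-new, m2 and m3 link to m1
```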
Learning: Perceptron

Input: training set D, cost function c, number of epochs n

function PERCEPTRON(D, c, n)
    set θ = (0, ..., 0)
    for epoch = 1, ..., n do
        for (x, z) ∈ D do
            ĥ_opt = argmax_{h ∈ H_{x,z}} ⟨θ, φ(x, h, z)⟩
            (ĥ, ẑ) = argmax_{(h,z) ∈ H_x × Z_x} ⟨θ, φ(x, h, z)⟩ + c(x, ĥ_opt, h, z)
            if ĥ does not encode z then
                set θ = θ + φ(x, ĥ_opt, z) − φ(x, ĥ, ẑ)

Output: weight vector θ
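The pseudocode above can be sketched in Python, specialized to mention ranking (a simplification: the argmaxes over H_{x,z} and H_x × Z_x reduce to per-anaphor argmaxes over antecedents; all helper names and the toy data are illustrative assumptions, not the authors' code):

```python
# Latent structured perceptron, specialized to mention ranking.

def perceptron(data, phi, cost, num_epochs):
    """data: list of (x, gold), where gold maps each anaphor to the set of
    its true antecedents ({0} if it is discourse-new)."""
    theta = {}

    def score(features):
        return sum(theta.get(f, 0.0) for f in features)

    def update(features, delta):
        for f in features:
            theta[f] = theta.get(f, 0.0) + delta

    for _ in range(num_epochs):
        for x, gold in data:
            for anaphor, true_ants in gold.items():
                # h_opt: best structure consistent with the gold clustering z.
                h_opt = max(true_ants, key=lambda a: score(phi(x, anaphor, a)))
                # (h_hat, z_hat): best cost-augmented structure overall.
                h_hat = max(range(anaphor),
                            key=lambda a: score(phi(x, anaphor, a))
                                          + cost(x, anaphor, h_opt, a))
                if h_hat not in true_ants:  # predicted h does not encode z
                    update(phi(x, anaphor, h_opt), +1.0)
                    update(phi(x, anaphor, h_hat), -1.0)
    return theta

# Toy run: m2's true antecedent is m1, m3 is discourse-new; memorizing
# features and a zero cost function, purely for illustration.
toy_data = [(None, {2: {1}, 3: {0}})]
toy_phi = lambda x, anaphor, antecedent: [f"arc={anaphor}->{antecedent}"]
zero_cost = lambda x, anaphor, h_opt, a: 0.0
weights = perceptron(toy_data, toy_phi, zero_cost, num_epochs=5)
# after training, the gold arc 2->1 outscores the dummy arc 2->0
```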