Entity- & Topic-Based Information Ordering Ling 573 Systems and Applications May 5, 2016
Roadmap Entity-based cohesion model: Models entity-based transitions Topic-based cohesion model: Models sequences of topic transitions Ordering as optimization
Entity-Centric Cohesion Continuing to talk about same thing(s) lends cohesion to discourse Incorporated variously in discourse models Lexical chains: Link mentions across sentences Fewer lexical chains crossing → shift in topic Salience hierarchies, information structure Subject > Object > Indirect > Oblique > …. Centering model of coreference Combines grammatical role preference with Preference for types of reference/focus transitions
Entity-Based Ordering Idea: Leverage patterns of entity (re)mentions Intuition: Captures local relations b/t sentences, entities Models cohesion of evolving story Pros: Largely delexicalized Less sensitive to domain/topic than other models Can exploit state-of-the-art syntax, coreference tools
Entity Grid Need compact representation of: Mentions, grammatical roles, transitions Across sentences Entity grid model: Rows: sentences Columns: entities Values: grammatical role of mention in sentence Roles: (S)ubject, (O)bject, X (other), __ (no mention) Multiple mentions? Take the highest role
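As a concrete illustration, here is a minimal sketch of grid construction. The input format (per-sentence lists of already-resolved (entity, role) mentions) is an assumption for illustration; in the actual pipeline these would come from coreference and grammatical-role tools.

```python
# Minimal entity-grid construction sketch. Input format is assumed:
# one list of (entity, role) mentions per sentence, produced upstream
# by coreference resolution and grammatical-role extraction.

RANK = {"_": 0, "X": 1, "O": 2, "S": 3}  # role precedence for "take highest"

def build_grid(sentence_mentions, entities):
    """Rows = sentences, columns = entities; an entity mentioned more
    than once in a sentence keeps its highest-ranked role."""
    grid = []
    for mentions in sentence_mentions:
        row = {e: "_" for e in entities}
        for entity, role in mentions:
            if RANK[role] > RANK[row[entity]]:
                row[entity] = role
        grid.append([row[e] for e in entities])
    return grid

# build_grid([[("Microsoft", "S"), ("suit", "O")], [("Microsoft", "X")]],
#            ["Microsoft", "suit"])  ->  [["S", "O"], ["X", "_"]]
```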
Grids → Features Intuitions: Some columns dense: focus of text (e.g. MS) Likely to take certain roles: e.g. S, O Others sparse: likely other roles (X) Local transitions reflect structure, topic shifts Local entity transitions: {S,O,X,_}^n Continuous column subsequences (role n-grams?) Compute probability of sequence over grid: # occurrences of that type / total # of transitions of that length
Vector Representation Document vector: Length: # of transition types Values: Probabilities of each transition type Can vary by transition types: E.g. most frequent; all transitions of some length, etc
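A sketch of the feature computation over such a grid, continuing the toy representation above (the enumeration order of transition types is an implementation choice, not specified by the slides):

```python
from collections import Counter
from itertools import product

def transition_vector(grid, n=2, roles="SOX_"):
    """Probability of each length-n role transition: count of that
    transition type over all length-n column subsequences, divided by
    the total number of such subsequences in the grid."""
    counts, total = Counter(), 0
    for column in zip(*grid):            # one column per entity
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
            total += 1
    return [counts[t] / total if total else 0.0
            for t in product(roles, repeat=n)]
```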
Dependencies & Comparisons Tools needed: Coreference: Link mentions Full automatic coref system vs Noun clusters based on lexical match Grammatical role: Extraction based on dependency parse (+passive rule) vs Simple present vs absent (X, _) Salience: Distinguish focused vs not? By frequency Build different transition models by salience group
Experiments & Analysis Trained SVM: Salient: >= 2 occurrences; Transition length: 2 Train/Test: is the summary with the higher manual score also ranked higher by the system? Feature comparison: DUC summaries
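One standard way to realize the pairwise ranking training, sketched here with scikit-learn as a reduction of ranking to binary classification over difference vectors (not necessarily the authors' exact SVM setup; `training_pairs` is a placeholder):

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_data(pairs):
    """pairs: (transition vector of better-rated summary,
               transition vector of worse-rated summary)."""
    X, y = [], []
    for better, worse in pairs:
        X.append(np.subtract(better, worse)); y.append(1)
        X.append(np.subtract(worse, better)); y.append(0)
    return np.array(X), np.array(y)

# X, y = pairwise_data(training_pairs)
# ranker = LinearSVC().fit(X, y)   # sign of w . (v1 - v2) orders a pair
```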
Discussion Best results: Use richer syntax and salience models But NOT coreference (though difference not significant) Why? Automatic summaries in training, unreliable coref Worst results: Significantly worse with both simple syntax and no salience Extracted sentences still parse reliably Still not horrible: 74% vs 84% Much better than LSA model (52.5%) Learning curve shows 80-100 pairs good enough
State-of-the-Art Comparisons Two comparison systems: Latent Semantic Analysis (LSA) Barzilay & Lee (2004)
Comparison I LSA model: Motivation: Lexical gaps Pure surface word match misses similarity Discover underlying concept representation Based on distributional patterns Create term × document matrix over large news corpus Perform SVD, truncate to 100-dimensional dense word vectors Score summary as: Sentence represented as mean of its word vectors Average of cosine similarity scores of adjacent sentences Local “concept” similarity score
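A sketch of this baseline under stated assumptions: `term_doc` is a terms × documents count matrix over the news corpus and `vocab` maps words to its row indices (both names illustrative):

```python
import numpy as np

def lsa_word_vectors(term_doc, k=100):
    """term_doc: terms x documents count matrix (rows follow vocab)."""
    U, s, _ = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k] * s[:k]              # k-dim dense word vectors

def lsa_coherence(sentences, word_vecs, vocab):
    """Mean cosine similarity of adjacent sentence vectors, where a
    sentence vector is the mean of its in-vocabulary word vectors."""
    def sent_vec(sent):
        vs = [word_vecs[vocab[w]] for w in sent if w in vocab]
        return np.mean(vs, axis=0)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    vecs = [sent_vec(s) for s in sentences]
    return float(np.mean([cos(vecs[i], vecs[i + 1])
                          for i in range(len(vecs) - 1)]))
```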
“Catching the Drift” Barzilay and Lee, 2004 (NAACL best paper) Intuition: Stories: Composed of topics/subtopics Unfold in systematic sequential way Can represent ordering as sequence modeling over topics Approach: HMM over topics
Strategy Lightly supervised approach: Learn topics in unsupervised way from data Assign sentences to topics Learn sequences from document structure Given clusters, learn sequence model over them No explicit topic labeling, no hand-labeling of sequence
Topic Induction How can we induce a set of topics from doc set? Assume we have multiple documents in a domain Unsupervised approach? Clustering Similarity measure? Cosine similarity over word bigrams Assume some irrelevant/off-topic sentences Merge clusters with few members into “etcetera” cluster Result: m topics, defined by clusters
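A rough sketch of the induction step, with scikit-learn's complete-link agglomerative clustering (≥ 1.2 for the `metric` argument) standing in for the paper's clustering; `n_clusters` and `min_size` are illustrative parameters:

```python
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import AgglomerativeClustering

def induce_topics(sentences, n_clusters=10, min_size=4):
    """Cluster sentences by cosine similarity over word-bigram counts;
    small clusters are merged into the 'etcetera' cluster (label -1)."""
    X = CountVectorizer(ngram_range=(2, 2)).fit_transform(sentences)
    labels = AgglomerativeClustering(
        n_clusters=n_clusters, metric="cosine", linkage="complete"
    ).fit_predict(X.toarray())
    sizes = Counter(labels)
    return [lbl if sizes[lbl] >= min_size else -1 for lbl in labels]
```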
Sequence Modeling Hidden Markov Model States = Topics State m: special insertion state Transition probabilities: Evidence for ordering? Document ordering: sentence from topic a appears before sentence from topic b $p(s_j \mid s_i) = \frac{D(c_i, c_j) + \delta_2}{D(c_i) + \delta_2 m}$ where $D(c_i, c_j)$ = # documents in which a sentence from cluster $c_i$ immediately precedes one from $c_j$, and $D(c_i)$ = # documents containing sentences from $c_i$
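The estimate itself is one line of code; `D_pair`, `D_single`, and `delta2` below mirror the quantities in the formula (names are illustrative):

```python
def transition_prob(i, j, D_pair, D_single, m, delta2=0.1):
    """Smoothed topic-transition estimate from the formula above.
    D_pair[i][j]: # docs where a c_i sentence immediately precedes a
    c_j sentence; D_single[i]: # docs containing c_i; m: # states."""
    return (D_pair[i][j] + delta2) / (D_single[i] + delta2 * m)
```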
Sequence Modeling II Emission probabilities: Standard topic state: Probability of observation given state (topic) = probability of sentence under topic-specific bigram LM Bigram probabilities: $p_{s_i}(w' \mid w) = \frac{f_{c_i}(w w') + \delta_1}{f_{c_i}(w) + \delta_1 |V|}$ Etcetera state: Forced complementary to other states: $p_{s_m}(w' \mid w) = \frac{1 - \max_{i<m} p_{s_i}(w' \mid w)}{\sum_{u \in V} \left(1 - \max_{i<m} p_{s_i}(u \mid w)\right)}$
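Correspondingly, a one-liner for the topic-specific bigram emission (count tables and the smoothing constant are illustrative names):

```python
def emission_prob(w, w2, bigram_counts, unigram_counts, V, delta1=0.01):
    """Smoothed bigram probability under a topic's language model:
    bigram_counts[(w, w2)] and unigram_counts[w] are counts within the
    topic's cluster; V is the vocabulary size."""
    return ((bigram_counts.get((w, w2), 0) + delta1)
            / (unigram_counts.get(w, 0) + delta1 * V))
```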
Sequence Modeling III Viterbi re-estimation: Intuition: Refine clusters and parameters based on sequence info Iterate: Run Viterbi decoding over original documents Assign each sentence to the cluster most likely to generate it Use new clustering to recompute transition/emission probabilities Until stable (or a fixed number of iterations)
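Schematically, the loop looks like the sketch below; `decode` and `reestimate` are caller-supplied stand-ins for Viterbi decoding and parameter re-estimation, since the full HMM machinery is out of scope here:

```python
def refine(documents, hmm, decode, reestimate, max_iters=10):
    """documents: lists of sentences. decode(hmm, doc) returns one topic
    label per sentence; reestimate(clusters) returns a new hmm."""
    clusters = None
    for _ in range(max_iters):
        new_clusters = {}
        for doc in documents:
            for sent, state in zip(doc, decode(hmm, doc)):
                new_clusters.setdefault(state, []).append(sent)
        if new_clusters == clusters:     # assignments stable: stop early
            break
        clusters = new_clusters
        hmm = reestimate(clusters)       # recompute transition/emission
    return hmm, clusters
```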
Sentence Ordering Comparison Restricted domain text: Separate collections of earthquake, aviation accidents LSA predictions: which order has higher score Topic/content model: highest probability under HMM
Summary Coherence Scoring Comparison Domain independent: Too little data per domain to estimate topic-content model Train: 144 pairwise summary rankings Test: 80 pairwise summary rankings Entity grid model (best): 83.8% LSA model: 52.5% Likely issue: Bad auto summaries highly repetitive → High inter-sentence similarity