Entity- & Topic-Based Information Ordering (Ling 573: Systems and Applications)


  1. Entity- & Topic-Based Information Ordering
     Ling 573: Systems and Applications
     May 5, 2016

  2. Roadmap
     - Entity-based cohesion model: models entity-based transitions
     - Topic-based cohesion model: models sequences of topic transitions
     - Ordering as optimization

  3. Entity-Centric Cohesion
     - Continuing to talk about the same thing(s) lends cohesion to discourse
     - Incorporated variously in discourse models:
       - Lexical chains: link mentions across sentences
         - Fewer lexical chains crossing a boundary → shift in topic
       - Salience hierarchies, information structure
         - Subject > Object > Indirect > Oblique > ...
       - Centering model of coreference: combines grammatical role preference with a preference for types of reference/focus transitions

  4. Entity-Based Ordering
     - Idea: leverage patterns of entity (re)mentions
     - Intuition: captures local relations between sentences and entities; models the cohesion of an evolving story
     - Pros:
       - Largely delexicalized: less sensitive to domain/topic than other models
       - Can exploit state-of-the-art syntax and coreference tools

  5. Entity Grid
     - Need a compact representation of mentions, grammatical roles, and transitions across sentences
     - Entity grid model:
       - Rows: sentences
       - Columns: entities
       - Values: grammatical role of the mention in that sentence
       - Roles: (S)ubject, (O)bject, X (other), _ (no mention)
       - Multiple mentions in one sentence? Take the highest-ranked role
     - A construction sketch follows below
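
A minimal sketch of building such a grid, assuming mentions have already been coreference-resolved and role-labeled upstream; the input format, the role ranking, and the toy example are illustrative, not the original implementation:

```python
# Sketch: build an entity grid from pre-resolved mentions. The input
# format is an assumption: each sentence is a list of (entity, role)
# pairs with role in {"S", "O", "X"}; a real system derives these from
# a coreference resolver and a parser.

ROLE_RANK = {"S": 3, "O": 2, "X": 1}   # S outranks O outranks X

def build_entity_grid(sentences):
    """Return {entity: [role or "_" per sentence]} (one column per entity)."""
    entities = {e for sent in sentences for e, _ in sent}
    grid = {e: ["_"] * len(sentences) for e in entities}
    for i, sent in enumerate(sentences):
        for entity, role in sent:
            # Multiple mentions in one sentence: keep the highest-ranked role.
            current = grid[entity][i]
            if current == "_" or ROLE_RANK[role] > ROLE_RANK[current]:
                grid[entity][i] = role
    return grid

# Toy example with mentions already coreference-resolved:
sents = [
    [("Microsoft", "S"), ("suit", "O")],
    [("judge", "S"), ("Microsoft", "O")],
    [("Microsoft", "S")],
]
grid = build_entity_grid(sents)
# grid["Microsoft"] == ["S", "O", "S"]; grid["judge"] == ["_", "S", "_"]
```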

  6. Grids → Features
     - Intuitions:
       - Some columns are dense: the focus of the text (e.g., Microsoft); likely to take certain roles, e.g. S, O
       - Other columns are sparse: likely to take other roles (X)
       - Local transitions reflect structure and topic shifts
     - Local entity transitions: {S, O, X, _}^n, i.e., continuous column subsequences (role n-grams)
     - Compute the probability of each transition type over the grid: # occurrences of that type / # of transitions of that length (sketch below)
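
A sketch of turning the grid from the previous snippet into transition probabilities; the length-2 default and the function name are illustrative:

```python
from collections import Counter
from itertools import product

# Sketch: probability of each length-n role transition, computed over
# every column of the grid built above.
def transition_probabilities(grid, n=2):
    counts, total = Counter(), 0
    for roles in grid.values():                 # one column per entity
        for i in range(len(roles) - n + 1):
            counts[tuple(roles[i:i + n])] += 1
            total += 1
    # Enumerate all 4^n transition types so unseen ones get probability 0.
    types = list(product("SOX_", repeat=n))
    return {t: (counts[t] / total if total else 0.0) for t in types}

probs = transition_probabilities(grid, n=2)
# e.g. probs[("S", "O")] = fraction of length-2 transitions that are S -> O
```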

  7. Vector Representation
     - Document vector:
       - Length: number of transition types
       - Values: probabilities of each transition type
     - Can vary the transition types used: e.g., only the most frequent, all transitions up to some length, etc.
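
Continuing the sketch, the document vector is just those probabilities laid out in a fixed transition-type order (the order itself is arbitrary, but it must be shared across documents so features align):

```python
# Document vector over the probabilities computed above; any fixed
# ordering of transition types works as long as all documents share it.
feature_order = sorted(probs)                  # 4^2 = 16 types for n=2
doc_vector = [probs[t] for t in feature_order]
```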

  8. Dependencies & Comparisons
     - Tools needed:
       - Coreference (to link mentions): full automatic coreference system vs. noun clusters based on lexical match
       - Grammatical role: extraction from a dependency parse (plus a passive rule) vs. simple present/absent (X, _)
       - Salience: distinguish focused vs. non-focused entities by frequency; build separate transition models per salience group

  9. Experiments & Analysis
     - Trained an SVM ranker:
       - Salient: >= 2 occurrences; transition length: 2
     - Train/Test: is the ordering with the higher manual score also ranked higher by the system?
     - Feature comparison: DUC summaries
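
One standard way to set up such pairwise ranking with an SVM (a sketch of the general technique, not necessarily the exact ranker used in the paper) is to classify differences of feature vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Sketch of pairwise ranking with a linear SVM: each training pair of
# orderings becomes a difference of entity-grid feature vectors, labeled
# +1 when the first ordering is the better one. (Illustrative setup.)
def train_pairwise_ranker(pairs):
    """pairs: list of (better_vector, worse_vector) tuples."""
    X, y = [], []
    for better, worse in pairs:
        diff = np.asarray(better) - np.asarray(worse)
        X.append(diff)
        y.append(1)
        X.append(-diff)            # mirrored example keeps classes balanced
        y.append(-1)
    model = LinearSVC()
    model.fit(np.array(X), np.array(y))
    return model

# At test time: model.decision_function([v1 - v2]) > 0
# means ordering 1 is ranked higher than ordering 2.
```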

  10. Discussion
     - Best results: richer syntax and salience models, but NOT coreference (though the difference is not significant)
       - Why? Training on automatic summaries makes coreference unreliable
     - Worst results: significantly worse with both simple syntax and no salience
       - Extracted sentences still parse reliably
       - Still not horrible: 74% vs. 84%
     - Much better than the LSA model (52.5%)
     - Learning curve shows 80-100 training pairs are enough

  11. State-of-the-Art Comparisons
     - Two comparison systems:
       - Latent Semantic Analysis (LSA)
       - Barzilay & Lee (2004)

  12. Comparison I
     - LSA model:
       - Motivation: lexical gaps; pure surface word match misses similarity
       - Discover an underlying concept representation based on distributional patterns
     - Create a term × document matrix over a large news corpus
     - Perform SVD to obtain 100-dimensional dense word vectors
     - Score a summary as follows (sketch below):
       - Represent each sentence as the mean of its word vectors
       - Average the cosine similarity scores of adjacent sentences
       - A local “concept” similarity score
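
A sketch of this scoring step, assuming the 100-dimensional word vectors from the SVD are already available as a lookup table and every sentence contains at least one known word:

```python
import numpy as np

# Sketch of the LSA coherence score: each sentence is the mean of its
# word vectors, and the document score is the average cosine similarity
# of adjacent sentences. `word_vectors` is assumed to come from a
# truncated SVD of a term x document matrix (e.g., 100 dimensions).
def lsa_coherence(sentences, word_vectors):
    """sentences: list of token lists; word_vectors: {word: np.ndarray}."""
    sent_vecs = []
    for tokens in sentences:
        vecs = [word_vectors[w] for w in tokens if w in word_vectors]
        sent_vecs.append(np.mean(vecs, axis=0))  # assumes >= 1 known word
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = [cos(sent_vecs[i], sent_vecs[i + 1])
            for i in range(len(sent_vecs) - 1)]
    return sum(sims) / len(sims)
```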

  13. “Catching the Drift”
     - Barzilay and Lee, 2004 (NAACL best paper)
     - Intuition: stories are composed of topics/subtopics that unfold in a systematic, sequential way
     - So ordering can be represented as sequence modeling over topics
     - Approach: HMM over topics

  14. Strategy
     - Lightly supervised approach:
       - Learn topics in an unsupervised way from the data; assign sentences to topics
       - Learn sequences from document structure: given the clusters, learn a sequence model over them
     - No explicit topic labeling, no hand-labeling of sequences

  15. Topic Induction
     - How can we induce a set of topics from a document set?
     - Assume we have multiple documents in a domain
     - Unsupervised approach: clustering
       - Similarity measure: cosine similarity over word bigrams
     - Assume some sentences are irrelevant/off-topic: merge clusters with few members into an “etcetera” cluster
     - Result: m topics, defined by clusters (sketch below)
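
A sketch of the induction step under these assumptions; the complete-link clustering, distance threshold, and minimum cluster size are illustrative stand-ins:

```python
from collections import Counter
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Sketch of topic induction: cluster sentences by cosine similarity over
# word-bigram counts, then merge small clusters into an "etcetera" topic.
# Thresholds and the complete-link choice are illustrative.
def induce_topics(sentences, dist_threshold=0.9, min_size=4):
    """sentences: list of token lists (each assumed to have >= 2 tokens).
    Returns one cluster id per sentence; id 0 is the etcetera cluster."""
    bigram_counts = [Counter(zip(s, s[1:])) for s in sentences]
    vocab = sorted({b for c in bigram_counts for b in c})
    X = np.array([[c[b] for b in vocab] for c in bigram_counts], dtype=float)

    # Complete-link agglomerative clustering under cosine distance.
    Z = linkage(X, method="complete", metric="cosine")
    labels = fcluster(Z, t=dist_threshold, criterion="distance")

    # Clusters with few members are folded into the etcetera cluster.
    sizes = Counter(labels)
    return [int(lab) if sizes[lab] >= min_size else 0 for lab in labels]
```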

  16. Sequence Modeling
     - Hidden Markov Model:
       - States = topics
       - State m: special insertion (“etcetera”) state
     - Transition probabilities:
       - Evidence for ordering: document order, i.e., how often a sentence from topic c_i appears immediately before a sentence from topic c_j
       - Smoothed estimate:

           p(s_j \mid s_i) = \frac{D(c_i, c_j) + \delta_2}{D(c_i) + \delta_2 m}

         where D(c_i, c_j) is the number of documents in which a sentence from cluster c_i immediately precedes one from c_j, and D(c_i) is the number of documents containing sentences from c_i
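
A sketch of this estimate, where each document is represented by its sequence of topic ids; the smoothing constant is a placeholder value:

```python
from collections import Counter

# Sketch of the smoothed transition estimate
#   p(s_j | s_i) = (D(c_i, c_j) + delta2) / (D(c_i) + delta2 * m)
# where each document is its sequence of topic ids and the D(.) counts
# are per-document (a document contributes each pair at most once).
def transition_probs(docs, m, delta2=0.1):          # delta2: placeholder
    D_pair = Counter()    # docs where topic i immediately precedes topic j
    D_single = Counter()  # docs containing topic i
    for doc in docs:
        D_single.update(set(doc))
        D_pair.update({(a, b) for a, b in zip(doc, doc[1:])})
    def p(j, i):
        return (D_pair[(i, j)] + delta2) / (D_single[i] + delta2 * m)
    return p

# p = transition_probs(topic_sequences, m=10)
# p(3, 1) -> probability that topic 3 follows topic 1
```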

  17. Sequence Modeling II
     - Emission probabilities:
       - Standard topic state: probability of the observation given the state (topic), i.e., the probability of the sentence under a topic-specific bigram language model
       - Smoothed bigram probabilities:

           p_{s_i}(w' \mid w) = \frac{f_{c_i}(w w') + \delta_1}{f_{c_i}(w) + \delta_1 |V|}

       - Etcetera state: forced to be complementary to the other states:

           p_{s_m}(w' \mid w) = \frac{1 - \max_{i<m} p_{s_i}(w' \mid w)}{\sum_{u \in V} \left( 1 - \max_{i<m} p_{s_i}(u \mid w) \right)}
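
A sketch of both emission models under these definitions; the smoothing constant is again a placeholder:

```python
from collections import Counter

# Sketch of the emission models. Topic states use a smoothed bigram LM:
#   p_{s_i}(w' | w) = (f_{c_i}(w w') + delta1) / (f_{c_i}(w) + delta1 * |V|)
# and the etcetera state is the normalized complement of the topic states.
def make_bigram_lm(cluster_sentences, vocab, delta1=0.01):
    f_bi, f_uni = Counter(), Counter()
    for s in cluster_sentences:
        f_uni.update(s)
        f_bi.update(zip(s, s[1:]))
    V = len(vocab)
    return lambda w2, w1: (f_bi[(w1, w2)] + delta1) / (f_uni[w1] + delta1 * V)

def make_etcetera_lm(topic_lms, vocab):
    # topic_lms: the LMs for the m-1 ordinary topic states.
    def p(w2, w1):
        num = 1 - max(lm(w2, w1) for lm in topic_lms)
        denom = sum(1 - max(lm(u, w1) for lm in topic_lms) for u in vocab)
        return num / denom
    return p
```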

  18. Sequence Modeling III
     - Viterbi re-estimation:
       - Intuition: refine the clusters, etc., based on sequence information
     - Iterate (sketch below):
       - Run Viterbi decoding over the original documents
       - Assign each sentence to the cluster most likely to generate it
       - Use the new clustering to recompute transition/emission probabilities
     - Until stable (or a fixed number of iterations)
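
A sketch that ties the previous snippets together, including a small Viterbi decoder (the uniform start distribution is a simplifying assumption of this sketch); `labels` holds one topic-id sequence per document, e.g. seeded from the clustering step above:

```python
import math

# Sketch of hard (Viterbi) re-estimation. Reuses transition_probs,
# make_bigram_lm, and make_etcetera_lm from the snippets above.

def sentence_logprob(sent, lm):
    # Log-probability of a sentence under a bigram LM (first word ignored).
    return sum(math.log(lm(w2, w1)) for w1, w2 in zip(sent, sent[1:]))

def viterbi(doc, trans_p, emission_lms):
    """Most likely topic sequence for a document's sentences."""
    m = len(emission_lms)
    best = [(sentence_logprob(doc[0], emission_lms[i]), [i]) for i in range(m)]
    for sent in doc[1:]:
        step = []
        for j in range(m):
            score, path = max((s + math.log(trans_p(j, p[-1])), p)
                              for s, p in best)
            step.append((score + sentence_logprob(sent, emission_lms[j]),
                         path + [j]))
        best = step
    return max(best)[1]

def reestimate(docs, labels, m, vocab, iterations=10):
    """docs: list of documents (each a list of token-list sentences);
    labels: initial topic id per sentence, one list per document."""
    for _ in range(iterations):
        # Re-fit transition/emission models from the current assignment.
        clusters = [[s for doc, lab in zip(docs, labels)
                     for s, t in zip(doc, lab) if t == i] for i in range(m)]
        topic_lms = [make_bigram_lm(c, vocab) for c in clusters[:-1]]
        emission_lms = topic_lms + [make_etcetera_lm(topic_lms, vocab)]
        trans_p = transition_probs(labels, m)
        # Re-assign each sentence via Viterbi decoding.
        new_labels = [viterbi(doc, trans_p, emission_lms) for doc in docs]
        if new_labels == labels:   # stop when the assignment is stable
            break
        labels = new_labels
    return labels, trans_p, emission_lms
```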

  19. Sentence Ordering Comparison
     - Restricted-domain text: separate collections of earthquake and aviation-accident reports
     - LSA prediction: the candidate order with the higher similarity score
     - Topic/content model prediction: the candidate order with the highest probability under the HMM

  20. Summary Coherence Scoring Comparison
     - Domain independent: too little data per domain to estimate the topic/content model
     - Train: 144 pairwise summary rankings; Test: 80 pairwise summary rankings
     - Entity grid model (best): 83.8%
     - LSA model: 52.5%
     - Likely issue: bad automatic summaries are highly repetitive → high inter-sentence similarity
