
Aktuelle Themen der Angewandten Informatik
Semantische Technologien (M-TANI)
Christian Chiarcos, Angewandte Computerlinguistik
chiarcos@informatik.uni-frankfurt.de
18 July 2013

Global coherence: Discourse
Motivation & Theory


  1. How are Discourse Relations declared? • Broadly, there are two ways of specifying discourse relations • Abstract specification – Relations between units are always inferred, and declared by choosing from a pre-defined set of abstract categories. – Lexical elements can serve as partial, ambiguous evidence for inference. • Lexically grounded – Relations can be grounded in lexical elements. – Where lexical elements are absent, relations may be inferred. Joshi et al. (2006)

  2. Example: Evidence • Constraints on the Nucleus – The reader may not believe N to a degree satisfactory to the writer • Constraints on the Satellite – The reader believes S or will find it credible • Constraints on the combination of N+S – The reader’s comprehending S increases their belief of N • Effect (the intention of the writer) – The reader’s belief of N is increased • Assuming a written text and readers and writers; extensions of RST to spoken language discussed later • Definitions of most common relations are available from the RST web site (www.sfu.ca/rst) Taboada & Stede (2009)

  3. RST Relation types • Relations are of different types – Subject matter: they relate the content of the text spans • Cause, Purpose, Condition, Summary – Presentational: more rhetorical in nature. They are meant to achieve some effect on the reader • Motivation, Antithesis, Background, Evidence Taboada & Stede (2009)

  4. Other possible classifications • Relations that hold outside the text – Condition, Cause, Result • vs. those that are only internal to the text – Summary, Elaboration • Relations frequently marked by a discourse marker – Concession (although, however); Condition (if, in case) • vs. relations that are rarely, or never, marked – Background, Restatement, Interpretation • Preferred order of spans: nucleus before satellite – Elaboration – usually first the nucleus (material being elaborated on) and then satellite (extra information) • vs. satellite-nucleus – Concession – usually the satellite (the although-type clause or span) before the nucleus Taboada & Stede (2009)

  5. Relation names (in M&T 1988)
  • Circumstance
  • Solutionhood
  • Elaboration
  • Background
  • Enablement and Motivation: Enablement, Motivation
  • Evidence and Justify: Evidence, Justify
  • Relations of Cause: Volitional Cause, Non-Volitional Cause, Volitional Result, Non-Volitional Result, Purpose
  • Antithesis and Concession: Antithesis, Concession
  • Condition and Otherwise: Condition, Otherwise
  • Interpretation and Evaluation: Interpretation, Evaluation
  • Restatement and Summary: Restatement, Summary
  • Other Relations: Sequence, Contrast
  Other classifications are possible, and longer and shorter lists have been proposed. Taboada & Stede (2009)

  6. Graphical representation • A horizontal line covers a span of text (possibly made up of further spans) • A vertical line signals the nucleus or nuclei • A curve represents a relation; the arrow points from the satellite towards the nucleus Taboada & Stede (2009)

  7. RST Resources • RST web page – www.sfu.ca/rst • RST tool (for annotation / drawing diagrams) – http://www.wagsoft.com/RSTTool/ Taboada & Stede (2009)

  8. How to do an RST analysis • Given a segmentation S of the text into elementary discourse units (edus) – edu size may vary, in RST usually clauses • Procedure (a greedy, bottom-up merge; see the sketch below):
  for each u in S and any of its neighbours u' in S:
      if there is a clear relation r holding between u and u',
          then mark that relation r
          else u might be at the boundary of a higher-level relation; look at relations holding between larger units (spans)
  if a relation r was created between any u1, u2 in S,
      then update S → (S \ {u1, u2}) ∪ {u1∘2}, with the unit u1∘2 as the concatenation of u1 and u2
  iterate until |S| = 1
  Taboada & Stede (2009)
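
  A minimal Python sketch of this merge loop, for illustration only: judge_relation is a hypothetical stand-in for the analyst's (or a classifier's) plausibility judgment and is not part of any RST tool.

    def rst_analysis(edus, judge_relation):
        """Greedy bottom-up RST analysis over a list of EDU strings.

        judge_relation: callable(span_a, span_b) -> relation name or None
        Returns the relations in the order in which spans were merged.
        """
        spans = list(edus)
        analysis = []
        while len(spans) > 1:
            merged = False
            for i in range(len(spans) - 1):
                rel = judge_relation(spans[i], spans[i + 1])
                if rel is not None:
                    analysis.append((rel, spans[i], spans[i + 1]))
                    # replace the two spans by their concatenation (u1∘2)
                    spans[i:i + 2] = [spans[i] + " " + spans[i + 1]]
                    merged = True
                    break
            if not merged:
                # no clear relation between any adjacent spans: the analyst would
                # now look at higher-level relations; here we simply merge the
                # first pair with an underspecified label to guarantee progress
                analysis.append(("Unspecified", spans[0], spans[1]))
                spans[0:2] = [spans[0] + " " + spans[1]]
        return analysis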

  9. RST issues • Annotation is possible … … but not very reliable, slow and expensive • Definitions of units – Vary from researcher to researcher, depending on the level of granularity needed • Relations inventory – conflate different aspects of meaning and impose rigid constraints (tree structure) => multiple analyses possible • Problems in identifying relations – Judgments are plausibility judgments. Two analysts might differ in their analyses • A theory purely of intentions Taboada & Stede (2009)

  10. RST issues • Possible solutions include – alternative, more strictly formalized theories (SDRT) – alternative, simplified models for annotation (Penn Discourse Treebank, PDTB) – data-driven, weakly supervised approaches Taboada & Stede (2009)

  11. RST issues • But first: Discourse Segmentation – how to identify the building blocks of discourse structure (annotation)? Taboada & Stede (2009)

  12. Discourse Segmentation • Separating a document into a linear sequence of subtopics – For example: scientific articles are segmented into Abstract, Introduction, Methods, Results, Conclusions – This is often a simplification of a higher level structure of a discourse, e.g., building blocks for hierarchical models of discourse • Applications of automatic discourse segmentation: – For Summarization: Summarize each segment separately – For Information Retrieval or Information Extraction: Apply to an appropriate segment • Related task: Paragraph segmentation, for example of a speech transcript Rohit Kate (2010)

  13. Unsupervised Discourse Segmentation • Given raw text, segment it into multi-paragraph subtopics • Unsupervised: No training data is given for the task • Cohesion-based approach: Segment into subtopics in which sentences/paragraphs are cohesive with each other; a dip in cohesion marks subtopic boundaries Rohit Kate (2010)

  14. Cohesion • Cohesion : Links between text units due to linguistic devices, i.e., similar expressions • Lexical Cohesion : Use of same or similar words to link text units – Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back." • Non-lexical Cohesion : For example, using the same gesture Rohit Kate (2010)

  15. Cohesion • Cohesion is not to be confused with coherence! • Coherence : Text units are related by meaning relations • But coherence may be indicated by cohesion

  16. Cohesion-based Unsupervised Discourse Segmentation • TextTiling algorithm (Hearst, 1997) – compare adjacent blocks of text – look for shifts in vocabulary • Do pre-processing: Tokenization, remove stop words, stemming • Divide text into pseudo-sentences of equal length (say 20 words) Rohit Kate (2010)

  18. TextTiling Algorithm contd. • Compute lexical cohesion score at each gap between pseudo-sentences • Lexical cohesion score: Similarity of words before and after the gap (take, say, 10 pseudo-sentences before and 10 pseudo-sentences after) • Similarity: Cosine similarity between the word vectors (high if words co-occur) Rohit Kate (2010)

  20. TextTiling Algorithm contd. • Plot the similarity and compute the depth scores of the "similarity valleys": (a − b) + (c − b), where b is the similarity at the valley and a, c are the similarities at the nearest peaks to its left and right • Assign a segment boundary if the depth score is larger than a threshold (e.g., one standard deviation deeper than the mean valley depth) Rohit Kate (2010)

  22. TextTiling Algorithm contd. From (Hearst, 1994) Rohit Kate (2010)
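
  Putting the previous slides together, here is a minimal Python sketch of TextTiling-style gap scoring. It assumes pre-processing (tokenization, stopword removal, stemming) has already produced a flat token list; block size, pseudo-sentence length, and the boundary cutoff are illustrative choices rather than Hearst's exact settings.

    import math
    from collections import Counter

    def cosine(c1, c2):
        """Cosine similarity between two word-count vectors (Counters)."""
        num = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
        den = math.sqrt(sum(v * v for v in c1.values())) * \
              math.sqrt(sum(v * v for v in c2.values()))
        return num / den if den else 0.0

    def text_tiling(tokens, pseudo_len=20, block=10):
        # 1. split into pseudo-sentences of equal length
        pseudo = [tokens[i:i + pseudo_len] for i in range(0, len(tokens), pseudo_len)]
        # 2. lexical cohesion score at each gap: similarity of the word vectors
        #    of `block` pseudo-sentences before and after the gap
        sims = []
        for gap in range(1, len(pseudo)):
            before = Counter(w for p in pseudo[max(0, gap - block):gap] for w in p)
            after = Counter(w for p in pseudo[gap:gap + block] for w in p)
            sims.append(cosine(before, after))
        # 3. depth score of each gap: (a - b) + (c - b), with b the local
        #    similarity and a, c the nearest peaks to the left and right
        depths = []
        for i, b in enumerate(sims):
            j = i
            while j > 0 and sims[j - 1] >= sims[j]:
                j -= 1
            a = sims[j]
            j = i
            while j < len(sims) - 1 and sims[j + 1] >= sims[j]:
                j += 1
            c = sims[j]
            depths.append((a - b) + (c - b))
        # 4. boundary wherever the depth score exceeds a cutoff; the cutoff here
        #    is derived from the mean and standard deviation of the depth scores
        if not depths:
            return []
        mean = sum(depths) / len(depths)
        std = math.sqrt(sum((d - mean) ** 2 for d in depths) / len(depths))
        return [gap for gap, d in enumerate(depths, start=1) if d > mean + std]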

  25. Supervised Discourse Segmentation • Easy to get supervised data for some segmentation tasks – e.g., paragraph segmentation – Useful to find paragraphs in speech recognition output • Model as a classification task: Classify whether a sentence boundary is a paragraph boundary – Use any classifier: SVM, Naïve Bayes, Maximum Entropy, etc. • Or model as a sequence labeling task: Label each sentence boundary as "paragraph boundary" or "not a paragraph boundary" Rohit Kate (2010)

  26. Supervised Discourse Segmentation • Features: – Cohesion features: word overlap, word cosine similarity, anaphora, etc. – Additional features: discourse markers or cue words • Discourse marker or cue phrase/word: A word or phrase that signals discourse structure – For example, "good evening", "joining us now" in broadcast news – "Coming up next" at the end of a segment, "Company Incorporated" at the beginning of a segment, etc. – Either hand-code or determine automatically by feature selection Rohit Kate (2010)
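
  A small Python sketch of how such features might be extracted for one candidate boundary between two sentences. The cue-phrase list and feature names are purely illustrative assumptions, not taken from any particular system; the resulting dictionaries could be fed to any off-the-shelf classifier (e.g., an SVM or logistic regression after vectorization).

    CUE_PHRASES = ["good evening", "joining us now", "coming up next", "in other news"]
    PRONOUNS = {"he", "she", "it", "they", "this", "that", "these", "those"}

    def boundary_features(sentences, i):
        """Features for the candidate boundary between sentences[i] and sentences[i+1]."""
        prev_tokens = sentences[i].lower().split()
        next_tokens = sentences[i + 1].lower().split()
        prev, nxt = set(prev_tokens), set(next_tokens)
        overlap = len(prev & nxt)
        return {
            "word_overlap": overlap,                              # cohesion feature
            "overlap_ratio": overlap / (len(prev | nxt) or 1),
            # a pronoun opening the next sentence suggests anaphoric continuity,
            # i.e. probably NOT a paragraph boundary
            "next_starts_with_pronoun": int(bool(next_tokens) and next_tokens[0] in PRONOUNS),
            "cue_phrase_in_next": int(any(c in sentences[i + 1].lower() for c in CUE_PHRASES)),
            "cue_phrase_in_prev": int(any(c in sentences[i].lower() for c in CUE_PHRASES)),
        }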

  27. Discourse Segmentation Evaluation • Not a good idea to measure precision, recall and F-measure because that won't be sensitive to near misses => WindowDiff (Pevzner & Hearst, 2002) – Slide a window of length k across the reference (correct) and the hypothesized segmentation and count the number of segmentation boundaries in each – WindowDiff metric: Average difference in the number of boundaries in the sliding window Rohit Kate (2010)
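
  A sketch of one common formulation of WindowDiff, assuming both segmentations are given as 0/1 sequences with one entry per gap between sentences (1 = boundary); the window size and the example values are illustrative.

    def window_diff(reference, hypothesis, k):
        """WindowDiff (Pevzner & Hearst, 2002); 0 = identical, lower is better."""
        assert len(reference) == len(hypothesis)
        n = len(reference)
        errors = 0
        for i in range(n - k):
            # count the boundaries inside the window in each segmentation
            if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
                errors += 1              # penalize any disagreement in this window
        return errors / (n - k)

    # A near miss (boundary off by one) is penalized less than a missing boundary:
    ref     = [0, 0, 1, 0, 0, 0, 1, 0, 0]
    near    = [0, 1, 0, 0, 0, 0, 1, 0, 0]
    missing = [0, 0, 0, 0, 0, 0, 1, 0, 0]
    print(window_diff(ref, near, k=3), window_diff(ref, missing, k=3))   # -> ~0.17 vs. 0.5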

  28. Discourse Phenomena • Motivation & Theory – Rhetorical Structure Theory • Building Blocks – Discourse Segmentation: Text Tiling • Theory-based approaches – Segmented Discourse Representation Theory • Annotation-based approaches – Penn Discourse Treebank • Data-driven approaches

  29. Theory-based Approaches • SDRT as an example – Segmented Discourse Representation Theory (Asher 1993, Asher & Lascarides 2003) – dynamic semantics (Discourse Representation Theory, Kamp 1982) – extended with discourse relations Hobbs (1978), Mann & Thompson (1987) – hierarchical discourse structure Polanyi (1985), Webber (1988)

  30. Discourse Analysis with SDRT
  Max pushed John.
  π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
  – π1: discourse segment (utterance)
  – x: variable (discourse referent) for Max
  – y: variable (discourse referent) for John
  – e1: variable (event) described by the utterance
  – n: reference time (present)
  – Max(x), John(y): unary predicates that represent noun attributes
  – e1: push(x,y): binary predicate that reflects the semantics of the verb
  – e1 < n: the event precedes the present time
  Step: parse and create segment

  31. Discourse Analysis with SDRT
  Max pushed John.
  Context so far: π1
  π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
  Step: integrate segment with the (previously empty) context

  32. Discourse Analysis with SDRT
  Max pushed John. He fell.
  π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
  π2: [ z, e2, n | e2: fall(z), e2 < n ]
  Step: process next utterance, construct new segment

  33. Discourse Analysis with SDRT
  Max pushed John. He fell.
  Context: π1, π2
  π1: [ x, y, e1, n | Max(x), John(y), e1: push(x,y), e1 < n ]
  π2: [ z, e2, n | e2: fall(z), e2 < n, z = y ]
  Result(π1, π2), Narration(π1, π2)
  Step: update with the anaphor resolution (z = y), attach the new segment, infer discourse relations
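
  To make the walk-through concrete, here is a toy Python rendering of the resulting structure. It only mirrors the shape of the boxes above (referents, conditions, relations between segment labels); real SDRSs carry full model-theoretic content, so this is an illustration, not an implementation of SDRT.

    from dataclasses import dataclass, field

    @dataclass
    class Segment:
        label: str
        referents: list      # discourse referents, e.g. ["x", "y", "e1", "n"]
        conditions: list     # conditions, e.g. ["Max(x)", "e1: push(x,y)"]

    @dataclass
    class SDRS:
        segments: dict = field(default_factory=dict)
        relations: list = field(default_factory=list)

        def add(self, seg):
            self.segments[seg.label] = seg

        def relate(self, relation, arg1, arg2):
            self.relations.append((relation, arg1, arg2))

    # "Max pushed John. He fell."
    pi1 = Segment("pi1", ["x", "y", "e1", "n"],
                  ["Max(x)", "John(y)", "e1: push(x,y)", "e1 < n"])
    pi2 = Segment("pi2", ["z", "e2", "n"],
                  ["e2: fall(z)", "e2 < n", "z = y"])   # z = y after anaphor resolution

    sdrs = SDRS()
    sdrs.add(pi1)
    sdrs.add(pi2)
    sdrs.relate("Result", "pi1", "pi2")                 # inferred discourse relations
    sdrs.relate("Narration", "pi1", "pi2")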

  34. Discourse Analysis with SDRT • SDRT accounts for – anaphoric reference – lexical disambiguation – bridging – presupposition – ellipsis – coherence • but only if discourse relations can be inferred

  35. Inference of Discourse Relations • SDRT: defeasible (nonmonotonic) inference (Glue logic) – semantic constraints on the new segment ∧ structural constraints on potential attachment points ∧ semantic constraints on the potential attachment point > discourse relation to be applied – Notation: > defeasible inference, → monotone inference (e.g., if a discourse connector signals the relation unambiguously)

  36. Inference of Discourse Relations
  if segment β can be attached to segment α in context t
  and the event described in α involves a pushing event with arguments x and y
  and the event described in β involves a falling event of argument y
  then, normally, the discourse relation between α and β is a Result:
  (⟨t, α, β⟩ ∧ [Push(e_α, x, y)]K_α ∧ [Fall(e_β, y)]K_β) > Result(α, β)
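
  As a toy illustration of the shape of such a default (not of Glue logic itself, which is a defeasible logical system rather than program code), the axiom can be thought of as a pattern over the two segments' event descriptions; the predicate and argument names below are just those of the running example.

    def default_relation(alpha_event, beta_event):
        """alpha_event / beta_event: (predicate, args) pairs, e.g. ("push", ("x", "y"))."""
        pred_a, args_a = alpha_event
        pred_b, args_b = beta_event
        # normally: pushing y results in y falling
        if pred_a == "push" and pred_b == "fall" and args_b[0] == args_a[1]:
            return "Result"
        return None      # no default applies; the relation stays underspecified

    print(default_relation(("push", ("x", "y")), ("fall", ("y",))))   # -> Result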

  40. Inference of Discourse Relations • „GLUE logic“ – accesses • structural and propositional contents of the context • propositional contents of the new segment – employs • generic pragmatic principles (e.g., Gricean) and • specific pragmatic principles (e.g., shared world knowledge) • monotonic axioms (gather discourse clues from logical form) • defeasible (non-monotonic) rules (infer discourse relations) • To operationalize SDRT as it is stated, we need an exhaustive formal model of shared knowledge, formally defined rules to infer every possible discourse relation, etc.

  41. Inference of Discourse Relations • In this form, these resources are not available. • State of the art: – underspecified discourse analysis – discourse relations only for explicit cues – approximate shared knowledge with lexical-semantic resources (FrameNet, etc.) (Bos 2008)

  42. Boxer (Bos 2008) • Based on DRT, augmented with RST/SDRT-like relations – Manually corrected training data: Groningen Meaning Bank (http://gmb.let.rug.nl) – Demo & download: http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer – RDF wrapper: Fred (http://wit.istc.cnr.it/stlab-tools/fred/) • Discourse relations only where explicitly signalled

  43. Fred (Boxer) Max pushed John. He fell. (We don‘t get the coherence relation, and the anaphor isn‘t correctly resolved.) http://wit.istc.cnr.it/stlab-tools/fred/

  44. Fred (Boxer+WikiFier) Max pushed John. He fell. – with Named Entity Recognition plus DBpedia links • the latter are incorrect for the example. http://wit.istc.cnr.it/stlab-tools/fred/

  45. Discourse Phenomena • Motivation & Theory – Rhetorical Structure Theory • Building Blocks – Discourse Segmentation: Text Tiling • Theory-based approaches – Segmented Discourse Representation Theory • Annotation-based approaches – Penn Discourse Treebank • Data-driven approaches

  46. Penn Discourse Treebank • Recently released corpus that is likely to lead to better systems for discourse processing • Coherence relations are encoded in association with discourse connectives • Linked to the Penn Treebank – http://www.seas.upenn.edu/~pdtb/ • Goal: reliable annotation, theory-neutrality – lexical definition of discourse relations – discourse relations only, no discourse structure Rohit Kate (2010)

  47. Lexical Definition of Discourse Relations • Discourse relations associated with “conjunctive elements” (discourse markers) (Halliday & Hasan 1976) – Coordinating and subordinating conjunctions – Conjunctive adjuncts (aka discourse adjuncts), including • Adverbs such as but, so, next, accordingly, actually, instead, etc. • Prepositional phrases (PPs) such as as a result, in addition, etc. • PPs with that or other referential item such as in addition to that, in spite of that, in that case, etc. • Each such element conveys a cohesive relation between – its matrix sentence and – a presupposed predication from the surrounding discourse => represented as a string of text in the preceding text Joshi et al. (2006)

  48. No Discourse Structure, Discourse Relations only • Discourse relations are not associated with discourse structure because some theories explicitly reject any notion of structure in discourse – Whatever relation there is among the parts of a text – the sentences, the paragraphs, or turns in a dialogue – it is not the same as structure in the usual sense, the relation which links the parts of a sentence or a clause. [Halliday & Hasan, 1976, p. 6] – Between sentences, there are no structural relations. [Halliday & Hasan, 1976, p. 27] Joshi et al. (2006)

  49. Corpus and Annotation Representation • Wall Street Journal – 2304 articles, ~1M words – partial overlap with the RST Discourse Treebank (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T07) • Annotation – the text spans of connectives and their arguments – features encoding the semantic classification of connectives, and attribution of connectives and their arguments – if a connective can be inferred, it is annotated as an "implicit" connective (PDTB 2) Joshi et al. (2006)

  50. Explicit Connectives Explicit connectives are the lexical items that trigger discourse relations. • Subordinating conjunctions (e.g., when, because, although, etc.) – The federal government suspended sales of U.S. savings bonds because Congress hasn't lifted the ceiling on government debt. • Coordinating conjunctions (e.g., and, or, so, nor, etc.) – The subject will be written into the plots of prime-time shows, and viewers will be given a 900 number to call. • Discourse adverbials (e.g., then, however, as a result, etc.) – In the past, the socialist policies of the government strictly limited the size of … industrial concerns to conserve resources and restrict the profits businessmen could make. As a result, industry operated out of small, expensive, highly inefficient industrial units. • Only 2 arguments (abstract objects, AOs), labeled Arg1 and Arg2 – Arg2: clause with which the connective is syntactically associated – Arg1: the other argument Joshi et al. (2006)

  51. Implicit Connectives When there is no Explicit connective present to relate adjacent sentences, it may be possible to infer a discourse relation between them due to adjacency. – Some have raised their cash positions to record levels. Implicit=because (causal) High cash positions help buffer a fund when the market falls. – The projects already under construction will increase Las Vegas's supply of hotel rooms by 11,795, or nearly 20%, to 75,500. Implicit=so (consequence) By a rule of thumb of 1.5 new jobs for each new hotel room, Clark County will have nearly 18,000 new jobs. Such discourse relations are annotated by inserting an "Implicit connective" that "best" captures the relation. Joshi et al. (2006)

  52. Arguments • Arg2 is the sentence/clause with which the connective is syntactically associated. • Arg1 is the other argument. • No constraints on relative order. Discontinuous annotation is allowed. • Linear: – The federal government suspended sales of U.S. savings bonds because Congress hasn't lifted the ceiling on government debt. • Interposed: – Most oil companies, when they set exploration and production budgets for this year, forecast revenue of $15 for each barrel of crude produced. – The chief culprits, he says, are big companies and business groups that buy huge amounts of land "not for their corporate use, but for resale at huge profit." … The Ministry of Finance, as a result, has proposed a series of measures that would restrict business investment in real estate even more tightly than restrictions aimed at individuals. Joshi et al. (2006)

  53. Extent of Arguments • Arguments of connectives can be sentential, sub-sentential, multi-clausal or multi-sentential: – Legal controversies in America have a way of assuming a symbolic significance far exceeding what is involved in the particular case. They speak volumes about the state of our society at a given moment. It has always been so. Implicit=for example (exemplification) In the 1920s, a young schoolteacher, John T. Scopes, volunteered to be a guinea pig in a test case sponsored by the American Civil Liberties Union to challenge a ban on the teaching of evolution imposed by the Tennessee Legislature. The result was a world-famous trial exposing profound cultural conflicts in American life between the "smart set," whose spokesman was H.L. Mencken, and the religious fundamentalists, whom Mencken derided as benighted primitives. Few now recall the actual outcome: Scopes was convicted and fined $100, and his conviction was reversed on appeal because the fine was excessive under Tennessee law. Joshi et al. (2006)

  54. Location of Arg1 • Same sentence as Arg2: – The federal government suspended sales of U.S. savings bonds because Congress hasn't lifted the ceiling on government debt. • Sentence immediately previous to Arg2: – Why do local real-estate markets overreact to regional economic cycles? Because real-estate purchases and leases are such major long-term commitments that most companies and individuals make these decisions only when confident of future economic stability and growth. • Previous sentence non-contiguous to Arg2: – Mr. Robinson … said Plant Genetic's success in creating genetically engineered male steriles doesn't automatically mean it would be simple to create hybrids in all crops. That's because pollination, while easy in corn because the carrier is wind, is more complex and involves insects as carriers in crops such as cotton. "It's one thing to say you can sterilize, and another to then successfully pollinate the plant," he said. Nevertheless, he said, he is negotiating with Plant Genetic to acquire the technology to try breeding hybrid cotton. Joshi et al. (2006)

  55. Semantic Classification for Connectives visualized with Protégé 4.1, http://sourceforge.net/p/olia/code/45/tree/trunk/owl/experimental/discourse/PDTB.owl

  56. Other Relations: AltLex, EntRel, NoRel Implicit connectives cannot be inserted between adjacent sentences if one of the following three relations is found – AltLex, EntRel, NoRel • AltLex: A discourse relation is inferred, but insertion of an Implicit connective leads to redundancy because the relation is Alternatively Lexicalized by some non-connective expression: – Ms. Bartlett's previous work, which earned her an international reputation in the non-horticultural art world, often took gardens as its nominal subject. AltLex = (consequence) Mayhap this metaphorical connection made the BPC Fine Arts Committee think she had a literal green thumb. Joshi et al. (2006)

  57. Non-insertability of Implicit Connectives • EntRel: the coherence is due to an entity-based relation. – Hale Milgrim, 41 years old, senior vice president, marketing at Elecktra Entertainment Inc., was named president of Capitol Records Inc., a unit of this entertainment concern. EntRel Mr. Milgrim succeeds David Berman, who resigned last month. • NoRel: Neither a discourse nor an entity-based relation is inferred. – Jacobs is an international engineering and construction concern. NoRel Total capital investment at the site could be as much as $400 million, according to Intel. • EntRel and NoRel do not express discourse relations, hence no semantic classification is provided for them. AltLex is subcategorized like explicit and implicit connectives. Joshi et al. (2006)

  58. Annotation Overview (PDTB 1.0): Explicit Connectives • All WSJ sections (25 sections; 2304 texts) • 100 distinct types • Subordinating conjunctions – 31 types • Coordinating conjunctions – 7 types • Discourse Adverbials – 62 types • 18505 distinct tokens Joshi et al. (2006)

  59. Data Challenges Data sparsity: High annotation costs and limited reliability limit the size of corpora. Annotation compatibility: Different annotation schemes for the same phenomenon are not necessarily comparable.

  61. Data Challenges Limited agreement: If your classifier performs better than the annotators, agreement metrics are uninterpretable. Limited data overlap: Dependencies between discourse phenomena can only be studied if the same primary data is used. More and better data may be available if information could be preprocessed to a larger extent.

  62. Data-driven Approaches • Motivation & Theory – Rhetorical Structure Theory • Building Blocks – Discourse Segmentation: Text Tiling • Theory-based approaches – Segmented Discourse Representation Theory • Annotation-based approaches – Penn Discourse Treebank • Data-driven approaches

  63. A Data-driven Approach • Idea: Employ corpora without discourse annotation (a) to evaluate models and theories of discourse, or (b) to create repositories of discourse information that may be applied in theory-based approaches or to support manual annotation.

  65. Inferring Discourse Relations in SDRT
  if segment β can be attached to segment α
  and the event described in α is a pushing event with arguments x and y
  and the event described in β is a falling event of argument y
  then, normally, the discourse relation between α and β is a Result:
  (⟨t, α, β⟩ ∧ [Push(e_α, x, y)]K_α ∧ [Fall(e_β, y)]K_β) > Result(α, β)

  66. Inferring Discourse Relations in SDRT • rules tailored towards specific event types – not provided by any lexical-semantic resource I am aware of – hard to construct manually • distributional hypothesis – Discourse markers that are compatible with the „normal“ discourse relation for a pair of events should occur more frequently than incompatible discourse markers – So, let‘s just count them …

  67. Data Structures • event pair <event1, event2> • triple <event1, relation word, event2> – event1: event type of the external argument – event2: event type of the internal argument – relation word: 0 or a discourse marker* • e.g., <push, fall> , <push, then, fall>

  68. Events • heuristic: event = lemma of main verb – auxiliaries, modal verbs, etc. are stripped – it would be interesting to develop more • heuristic: event1 = event of preceding sentence – external argument is more likely to be the main event of the preceding utterance than anything else • more remote antecedent candidates are subject to structural constraints

  69. Relation Words • adverbs, conjunctions, phrases, relative clauses, etc. • purely syntactic definition – to avoid preemptive restriction to limited set of relation words – relation word is the string representation of a sentence-initial adverbial argument of the main event in the new segment, a sentence-initial conjunction, or (if neither found) 0
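
  A sketch of how the event and relation-word heuristics of the last two slides could be applied to dependency-parsed text. It assumes sentences are already parsed (the Wacky corpora used below come MaltParser-parsed) and represented as lists of token dicts with "id", "lemma", "cpos", "head", and "deprel" fields in CoNLL style; the tag and label names are illustrative, and only single-token markers are handled, so this is a much simplified version of the syntactic definition above.

    AUX_LEMMAS = {"be", "have", "do", "will", "would", "can", "could",
                  "may", "might", "shall", "should", "must"}

    def main_event(sentence):
        """Heuristic: lemma of the main verb (root), skipping auxiliaries and modals."""
        for tok in sentence:
            if tok["deprel"] == "ROOT" and tok["cpos"].startswith("V"):
                if tok["lemma"] in AUX_LEMMAS:
                    # fall back to a verbal dependent of the auxiliary root
                    for dep in sentence:
                        if dep["head"] == tok["id"] and dep["cpos"].startswith("V"):
                            return dep["lemma"]
                return tok["lemma"]
        return None

    def relation_word(sentence):
        """Sentence-initial conjunction or adverbial, else 0 (no marker)."""
        first = sentence[0]
        if first["cpos"].startswith("CC") or first["cpos"].startswith("IN"):
            return first["lemma"]          # sentence-initial conjunction
        if first["deprel"] == "ADV" or first["cpos"].startswith("RB"):
            return first["lemma"]          # sentence-initial adverbial
        return 0

    def extract_triples(sentences):
        """Yield <event1, relation word, event2> for adjacent sentence pairs."""
        for prev, curr in zip(sentences, sentences[1:]):
            e1, e2 = main_event(prev), main_event(curr)
            if e1 and e2:
                yield (e1, relation_word(curr), e2)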

  70. Weighing the Evidence • Noisy data – external argument heuristically determined • Coarse-grained approximation of events – relevant level of detail of event description may not be covered => Rigid, theoretically well-founded pruning – significance tests • χ² where applicable, t-test otherwise

  71. Significance Tests • Given a relation word R and an event pair <x,y> • How probable is it that the relative frequency of R under the condition <x,y> deviates by chance from the unconditioned relative frequency of R?
                      | R observed    | R not observed
      <x,y>           | freq(R|<x,y>) | freq(<x,y>) − freq(R|<x,y>)
      all event pairs | freq(R)       | Σ_<a,b> freq(<a,b>) − freq(R)

  72. Significance Tests • Given a relation word R and an event pair <x,y> • How probable is it that the relative frequency of R under the condition <x,y> deviates by chance from the unconditioned relative frequency of R ? • If this probability is below 5%, remove the triple. • Remaining triples are highly significant (p<.05).

  73. Correlation • Given a relation word R and an event pair <x,y> • Assume that the distribution of R for <x,y> differs significantly from the distribution of R in general. • P(R|<x,y>) > P(R): positive correlation • P(R|<x,y>) < P(R): negative correlation
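
  A Python sketch of the pruning and correlation step using a chi-square goodness-of-fit test from scipy; this is one way to realize the test sketched in the table above (the t-test fallback mentioned on slide 70 is omitted here). The counts are assumed to come from a triple extraction like the one sketched earlier.

    from scipy.stats import chisquare

    def test_triple(freq_R_given_xy, freq_xy, freq_R, total_pairs, alpha=0.05):
        """Return (keep, direction) for a candidate triple <x, R, y>.

        freq_R_given_xy : count of R occurring with the event pair <x,y>
        freq_xy         : count of the event pair <x,y> (any relation word)
        freq_R          : count of R over all event pairs
        total_pairs     : count of all event pairs
        """
        p_R = freq_R / total_pairs                        # unconditional P(R)
        observed = [freq_R_given_xy, freq_xy - freq_R_given_xy]
        expected = [p_R * freq_xy, (1 - p_R) * freq_xy]   # expected if <x,y> had no effect
        if min(expected) < 5:
            return False, None        # chi-square not applicable; slides use a t-test instead
        _, p_value = chisquare(observed, f_exp=expected)
        if p_value >= alpha:
            return False, None                            # prune non-significant triples
        p_R_given_xy = freq_R_given_xy / freq_xy
        return True, ("positive" if p_R_given_xy > p_R else "negative")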

  74. Data • Huge corpora needed – adjacent sentences only – with some 1000 frequent verbs in a language, every event pair has a probability of 1:10^6 – relation words are optional and manifold, need several instantiations to establish significance => several million sentences needed • Syntax-defined relation words => syntax-annotated corpora

  75. Wacky corpora (http://wacky.sslmit.unibo.it/doku.php) • PukWaC – 2G-token dump of the uk domain – tagged and lemmatized with TreeTagger – parsed with MaltParser • Wackypedia – English Wikipedia (2009), 0.8G-token – same annotations • Consider 80% of both corpora – PukWaC: 72.5M sentences – Wackypedia: 33.2M sentences
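
  A back-of-the-envelope check combining the 1:10^6 event-pair probability from slide 74 with the corpus sizes above; the numbers are only meant to illustrate why corpora of this size are needed.

    pair_probability = 1e-6        # ~1000 frequent verbs -> ~10^6 ordered verb pairs
    for corpus, sentences in [("PukWaC", 72.5e6), ("Wackypedia", 33.2e6)]:
        adjacent_pairs = sentences - 1            # adjacent sentence pairs only
        expected = adjacent_pairs * pair_probability
        print(f"{corpus}: ~{expected:.0f} expected occurrences per event pair")
    # -> PukWaC: ~72, Wackypedia: ~33 occurrences of any given event pair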

  76. Evaluation • Goal – Test whether, despite the simplifications, potentially usable results can be obtained with this methodology • Evaluation of the methodology , as preparation for subsequent experiments

  77. Evaluation Criteria • Significance – Are there significant correlations between event pairs and relation words? • Reproducibility – Can these correlations be confirmed on independent data sets? • Interpretability – Can these correlations be interpreted in terms of theoretically motivated discourse relations?
