Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations

  1. Evaluation Benchmarks and Learning Criteria for Discourse-Aware Sentence Representations. Mingda Chen. Joint work with Zewei Chu and Kevin Gimpel.

  2. Prior work on evaluation benchmarks • Focus on capabilities of representations for stand-alone sentences • Sentiment analysis • Linguistic properties, e.g. verb tense prediction • … • What about the broader context (i.e. discourse) for a sentence?

  3. Our contributions • An evaluation suite for evaluating discourse knowledge encoded in sentence representations. • A benchmark comparison of several pretrained sentence representations. • Novel learning criteria for capturing discourse structures.

  4. Discourse Evaluation (DiscoEval) • Focus on evaluating the role of a sentence in its discourse context. • 7 task groups, covering multiple domains (e.g. Wikipedia, stories, dialogues, and scientific literature). • Probing tasks: pretrained embeddings are kept fixed and only simple classifiers are trained on top of them.
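The probing setup on this slide (fixed embeddings, simple classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy vectors stand in for frozen sentence embeddings, and the plain logistic-regression probe trained with stochastic gradient descent is a generic choice, not necessarily the classifier used in DiscoEval.

```python
import math

def train_probe(features, labels, epochs=200, lr=0.5):
    """Fit a binary logistic-regression probe on fixed feature vectors.

    Only the probe's weights and bias are updated; the input features
    (standing in for frozen sentence embeddings) are never modified.
    """
    dim = len(features[0])
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the probe scores x above the decision boundary, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0

# Hypothetical frozen embeddings (toy 2-d vectors) and binary task labels.
frozen_embeddings = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
labels = [1, 1, 0, 0]

weights, bias = train_probe(frozen_embeddings, labels)
predictions = [predict(weights, bias, x) for x in frozen_embeddings]
```

Because the encoder is never updated, any accuracy the probe reaches reflects information already present in the embeddings rather than knowledge learned during task training.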

  5. Discourse Evaluation (DiscoEval) • In general, we follow SentEval: for tasks involving a pair of sentence representations $x_1, x_2$, the classifier input is $[x_1, x_2, x_1 \odot x_2, |x_1 - x_2|]$, i.e. the concatenation of both vectors, their element-wise product, and their element-wise absolute difference.
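The pair input on this slide can be sketched in a few lines. This is an illustrative sketch, not the DiscoEval code: the function name `pair_features` is hypothetical, plain Python lists stand in for embedding vectors, and the two operators are read as element-wise product and element-wise absolute difference.

```python
def pair_features(x1, x2):
    """Build [x1, x2, x1 * x2, |x1 - x2|] for two equal-length vectors."""
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same dimensionality")
    product = [a * b for a, b in zip(x1, x2)]        # element-wise product
    abs_diff = [abs(a - b) for a, b in zip(x1, x2)]  # element-wise |x1 - x2|
    return list(x1) + list(x2) + product + abs_diff

# Toy 2-d "embeddings"; real sentence embeddings have many more dimensions.
features = pair_features([1.0, -2.0], [3.0, 0.5])
# features == [1.0, -2.0, 3.0, 0.5, 3.0, -1.0, 2.0, 2.5]
```

For d-dimensional embeddings the classifier therefore sees a 4d-dimensional input.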

  9. What is a discourse? • A discourse is a coherent, structured group of sentences that acts as a fundamental type of structure in natural language.

  10. What is a discourse? • Linearly-structured, e.g. sentence ordering. • The timing of introducing entities. • Tree-structured, e.g. RST discourse tree. Example: (1) The European Community's consumer price index rose a provisional 0.6% in September from August (2) and was up 5.3% from September 1988, (3) according to Eurostat, the EC's statistical agency. [Figure: RST discourse tree over units 1 to 3, with relations NS-Attribution and NN-Comparison.] “N” represents “nucleus”, containing basic information for the relation. “S” represents “satellite”, containing additional information about the nucleus.

  11. Discourse Relations • Two human-annotated datasets: Penn Discourse Treebank (PDTB) and RST Discourse Treebank (RST-DT). • PDTB provides discourse markers for adjacent sentences, whereas RST-DT offers document-level discourse trees.

  12. Discourse Relations – PDTB • Use a pair of sentences to predict discourse relations. • We focus on predicting implicit relations (PDTB-I) and explicit relations (PDTB-E). PDTB-E example: (1) In any case, the brokerage firms are clearly moving faster to create new ads than they did in the fall of 1987. (2) But it remains to be seen whether their ads will be any more effective. Label: Comparison.Contrast. PDTB-I example: (1) “A lot of investor confidence comes from the fact that they can speak to us,” he says. (2) [so] “To maintain that dialogue is absolutely crucial.” Label: Contingency.Cause.

  13. Discourse Relations – RST-DT • Text is segmented into basic units, elementary discourse units (EDUs), upon which a discourse tree is built recursively. • We use 18 fine-grained relations. Example: (1) The European Community's consumer price index rose a provisional 0.6% in September from August (2) and was up 5.3% from September 1988, (3) according to Eurostat, the EC's statistical agency. [Figure: RST discourse tree over EDUs 1 to 3, with relations NS-Attribution and NN-Comparison.]
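The recursive structure described on this slide can be sketched as a small tree of dataclasses. The class names `EDU` and `Relation` and the helper functions are hypothetical. The bracketing used here (NN-Comparison joining EDUs 1 and 2, then NS-Attribution attaching EDU 3 as the satellite) is an assumption read off the slide's example, not taken from the RST-DT files.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class EDU:
    """An elementary discourse unit: a leaf of the discourse tree."""
    index: int
    text: str

@dataclass
class Relation:
    """An internal node joining two subtrees under a labelled relation."""
    label: str  # e.g. "NS-Attribution": nuclearity prefix plus relation name
    left: Union["EDU", "Relation"]
    right: Union["EDU", "Relation"]

def edu_indices(node: Union[EDU, Relation]) -> List[int]:
    """Collect EDU indices in left-to-right (text) order by recursion."""
    if isinstance(node, EDU):
        return [node.index]
    return edu_indices(node.left) + edu_indices(node.right)

def nuclearity(label: str) -> Tuple[str, str]:
    """Split a label like "NS-Attribution" into the roles of its two children."""
    prefix = label.split("-", 1)[0]
    return prefix[0], prefix[1]

edu1 = EDU(1, "The European Community's consumer price index rose a "
              "provisional 0.6% in September from August")
edu2 = EDU(2, "and was up 5.3% from September 1988,")
edu3 = EDU(3, "according to Eurostat, the EC's statistical agency.")

# Assumed bracketing: ((1 2) 3)
tree = Relation("NS-Attribution", Relation("NN-Comparison", edu1, edu2), edu3)

order = edu_indices(tree)            # [1, 2, 3]
roles = nuclearity(tree.label)       # ("N", "S")
```

Under this reading, the left child of the root is the nucleus and EDU 3, the attribution clause, is the satellite.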
