Mehdi Hosseini Dr. Caroline Sporleder Saarland University 1 - PowerPoint PPT Presentation

By: David K. Elson and Kathleen R. McKeown, Columbia University
Mehdi Hosseini, Dr. Caroline Sporleder, Saarland University

Abstract
• Quoted speech: a block of text within a paragraph that falls between quotation marks.
• We will see a method for identifying the speakers of quoted speech in natural-language textual stories.

1815 - 1899

Identifying the characters in each scene
• The baseline approach: find the named entities near the quote

Several named entities near the quote
• “Take it,” said Emma, smiling, and pushing the paper towards Harriet – “it is for you. Take your own.”
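To see why the nearest-entity baseline struggles on this sentence, here is a minimal Python sketch. The tokenization, the `<Q1>`/`<Q2>` placeholders and the function name `nearest_entity` are assumptions made for illustration; they are not taken from the paper.

```python
def nearest_entity(tokens, quote_index, entities):
    """Return the known entity closest (in tokens) to the quote position."""
    best, best_dist = None, None
    for i, tok in enumerate(tokens):
        if tok in entities:
            dist = abs(i - quote_index)
            if best_dist is None or dist < best_dist:
                best, best_dist = tok, dist
    return best


# Hypothetical tokenization of the Emma example; <Q1> and <Q2> stand for the two quotes.
tokens = ["<Q1>", "said", "Emma", "smiling", "and", "pushing",
          "the", "paper", "towards", "Harriet", "<Q2>"]
entities = {"Emma", "Harriet"}

first = nearest_entity(tokens, tokens.index("<Q1>"), entities)
second = nearest_entity(tokens, tokens.index("<Q2>"), entities)
print(first, second)
```

For the second quote the closest name is Harriet, although Emma is still the one speaking, which is the kind of error the baseline makes when several names sit near a quote.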

Related Work
• Most work is on the news domain
• Sarmento and Nunes (2009)
• Pouliquen et al. (2007)
• Not well suited to literary narrative, which is less structured than news text in terms of attributed quoted speech.

• Mamede and Chaleira (2004) work with a set of Portuguese children’s stories
• Glass and Bangay (2007) focus on finding the link between the quote, its speech verb and the verb’s agent.

Corpus and its annotation
• Six authors who published in the 19th century
• Four wrote in English, one in Russian (translated by Constance Garnett) and one in French (translated by Eleanor Marx Aveling)
• Four authors contribute novels, two contribute short stories
• Dickens often wrote in serial form, but A Christmas Carol was published as a single novella

• 111,000 words
• 3,176 quoted speech instances

Methodology
• The method for quoted speech attribution:
1. Preprocessing: identify all names and nominals that appear in the passage of text preceding the quote in question.
2. Classification: classify the quote into one of a set of syntactic categories.
3. Learning: extract a feature vector from the passage and send it to a trained model.

Preprocessing: Finding candidate characters
• The first step is to identify the candidate speakers by “chunking” names (Mr. Holmes) and nominals (the clerk)
• Coreferent proper names are linked together as the same entity
• Example: Mr. Sherlock Holmes, Mr. Holmes, Sherlock Holmes, Sherlock, Holmes
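One way to picture the linking of name variants is to generate the forms a full name can take and treat any mention matching one of them as the same entity. The sketch below is an illustration only: the `TITLES` set and the function `name_variants` are hypothetical, and the actual chunker may work quite differently.

```python
# Hypothetical title list, for illustration only.
TITLES = {"Mr.", "Mrs.", "Miss", "Dr."}


def name_variants(full_name):
    """Return the set of shorter forms that may refer to the same character."""
    parts = full_name.split()
    if not parts:
        return set()
    title = parts[0] if parts[0] in TITLES else None
    names = parts[1:] if title else parts
    variants = {full_name}
    if names:
        variants.add(" ".join(names))            # e.g. "Sherlock Holmes"
        variants.update(names)                   # e.g. "Sherlock", "Holmes"
        if title:
            variants.add(f"{title} {names[-1]}")  # e.g. "Mr. Holmes"
    return variants


variants = name_variants("Mr. Sherlock Holmes")
print(sorted(variants))
```

For "Mr. Sherlock Holmes" this yields the five forms listed in the example on the slide.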

• Pronouns are not chunked as character candidates!
• 9% of quotes are attributed to pronouns
• Assign gender to as many names and nominals as possible:
o Gendered titles: Mr.
o Gendered headwords: nephew
o First names: Emma
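The three gender cues can be sketched as a sequence of lexicon lookups. The word lists and the function `guess_gender` below are tiny hypothetical stand-ins; the lexicons actually used are not given on the slides.

```python
# Tiny hypothetical lexicons, for illustration only.
TITLE_GENDER = {"Mr.": "male", "Sir": "male", "Mrs.": "female", "Miss": "female"}
HEADWORD_GENDER = {"nephew": "male", "uncle": "male", "niece": "female", "aunt": "female"}
FIRST_NAME_GENDER = {"Emma": "female", "Harriet": "female", "Sherlock": "male"}


def guess_gender(mention):
    """Return 'male', 'female', or None when no cue applies."""
    tokens = mention.split()
    if not tokens:
        return None
    if tokens[0] in TITLE_GENDER:               # gendered title, e.g. "Mr."
        return TITLE_GENDER[tokens[0]]
    head = tokens[-1].lower()
    if head in HEADWORD_GENDER:                 # gendered headword, e.g. "nephew"
        return HEADWORD_GENDER[head]
    for tok in tokens:                          # known first name, e.g. "Emma"
        if tok in FIRST_NAME_GENDER:
            return FIRST_NAME_GENDER[tok]
    return None                                 # gender left unassigned


for mention in ["Mr. Holmes", "the nephew", "Emma", "the clerk"]:
    print(mention, "->", guess_gender(mention))
```

Mentions such as "the clerk" carry no cue and stay unassigned, which matches the goal of labeling as many candidates as possible rather than all of them.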

Encoding, cleaning, and normalizing
• Before features are extracted for each candidate, the passage between the candidate and the quote is encoded
• The steps include:
1. Replace the quote and character with symbols
2. Replace verbs that indicate verbal expression or thought with a single symbol <EXPRESS_VERB>
3. Remove extraneous information
4. Remove paragraphs, sentences and clauses that carry no information for quoted speech attribution
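Steps 1 and 2 can be sketched with plain string replacement and a small verb list. Only `<EXPRESS_VERB>` appears on the slide; the symbols `<QUOTE>` and `<PERSON>`, the verb list and the function `encode` are assumptions for illustration, and the removal steps 3 and 4 are not covered here.

```python
import re

# Hypothetical list of expression verbs, for illustration only.
EXPRESS_VERBS = {"said", "cried", "replied", "thought"}


def encode(passage, quote, candidate):
    """Replace the quote, the candidate and expression verbs with symbols."""
    text = passage.replace(quote, "<QUOTE>").replace(candidate, "<PERSON>")
    # Keep the symbols whole; split everything else into words and punctuation.
    tokens = re.findall(r"<[A-Z_]+>|\w+|[^\w\s]", text)
    return ["<EXPRESS_VERB>" if t.lower() in EXPRESS_VERBS else t for t in tokens]


encoded = encode('"Bah!" said Scrooge.', '"Bah!"', "Scrooge")
print(encoded)
```

After encoding, different surface sentences such as "said Scrooge" and "cried Emma" collapse to the same symbol sequence, which is what makes later pattern matching possible.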

Dialogue chains
• An author often produces a sequence of quotes by the same speaker, but attributes only the first one
• Example: “Bah!” said Scrooge, “Humbug!”
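The idea of a dialogue chain can be illustrated by carrying the last explicit attribution forward to the quotes that follow it. This is a deliberately naive sketch with a hypothetical function `propagate_chain`; it assumes every unattributed quote continues the previous speaker, which the real method does not take for granted.

```python
def propagate_chain(speakers):
    """Fill unattributed quotes (None) with the most recent explicit speaker."""
    filled, last = [], None
    for speaker in speakers:
        if speaker is not None:
            last = speaker
        filled.append(last)
    return filled


# "Bah!" is attributed explicitly; "Humbug!" is not.
chain = propagate_chain(["Scrooge", None])
print(chain)
```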

Syntactic categories
• The quotes and their passages are classified to leverage two aspects:
1. Dialogue chains
2. The frequent use of expressions
• A pattern-matching algorithm assigns each quote one of five syntactic categories:
1. Added Quote
2. Quote Alone
3. Character Trigram: Quote-Said-Person: “Bah!” said Scrooge.
4. Anaphora Trigram
5. Back Off

• Two categories automatically imply a speaker:
1. Added Quote
2. Character Trigram
• The rest are divided into three datasets:
1. No Apparent Pattern
2. Quote Alone
3. Anaphora Trigram
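As an illustration of how a category can imply a speaker directly, the sketch below looks for the Quote-Said-Person pattern in an already encoded token sequence. `<EXPRESS_VERB>` comes from the slides; `<QUOTE>`, `<PERSON>` and the function `find_character_trigram` are hypothetical, and the other four categories are not modeled.

```python
# Encoded form of the Quote-Said-Person pattern (symbol names partly assumed).
CHARACTER_TRIGRAM = ["<QUOTE>", "<EXPRESS_VERB>", "<PERSON>"]


def find_character_trigram(tokens):
    """Return the index where the trigram starts, or None if it is absent."""
    for i in range(len(tokens) - 2):
        if tokens[i:i + 3] == CHARACTER_TRIGRAM:
            return i
    return None


# Encoded version of: "Bah!" said Scrooge.
start = find_character_trigram(["<QUOTE>", "<EXPRESS_VERB>", "<PERSON>", "."])
print(start)
```

When the pattern is found, the `<PERSON>` token in it can be taken as the speaker without consulting a learned model; quotes without such a pattern go on to the classifiers.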

Feature extraction and learning
• To build the three predictive models mentioned above, a feature vector ʄ is extracted for each candidate-quote pair. The features include:
o Distance between candidate and quote (in words)
o The presence and type of punctuation between the candidate and quote
o Ordinal position of the candidate from the quote among the characters
o Proportion of recent quotes that were spoken by the candidate
o Number of names, quotes, and words in each paragraph
o Number of appearances of the candidate
o For each word near the candidate and quote, whether the word is an expression verb, a punctuation mark, or another person
o Features of the quote itself: length, position in paragraph, the presence or absence of character names within, ...

• ʄ_mean: the average value of each feature across the set
• Replace the absolute value for each candidate (ʄ) with:
o ʄ - ʄ_mean
o ʄ - ʄ_median
o ʄ - ʄ_product
o ʄ - ʄ_max
o ʄ - ʄ_min
• These are sent to the three learners: J48, JRip, and a two-class logistic regression model
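The replacement of absolute feature values by relative ones can be sketched for a single feature as follows. The function `relative_features` and the sample distances are hypothetical; the sketch assumes the statistics are taken over the candidates of one quote.

```python
from math import prod
from statistics import mean, median


def relative_features(values):
    """For one feature, return each candidate's value minus each set statistic."""
    stats = {
        "mean": mean(values),
        "median": median(values),
        "product": prod(values),
        "max": max(values),
        "min": min(values),
    }
    return [{name: v - s for name, s in stats.items()} for v in values]


# Hypothetical distances (in words) from three candidates to the same quote.
distances = [2, 4, 12]
relative = relative_features(distances)
print(relative[0])
```

Expressing each value relative to the rest of the set tells the learner whether a candidate is, for example, the closest one to the quote, not only how far away it is in absolute terms.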

Final Step
• Reconcile the binary results into a single decision for each quote, using one of four methods:
1. Label: Ambiguous, Non-dialogue; misattributions (errors): Overattribution, Underattribution
2. Single Probability: threshold
3. Hybrid: like Label; if more than one candidate, use Single Probability
4. Combined Probability: like Single Probability, but the probability of each candidate is derived from two or three probabilities provided by the classifier: mean, median, product and maximum
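A rough sketch of the probability-based reconciliation is shown below. It assumes that Single Probability picks the highest-scoring candidate if it clears a threshold, and that Combined Probability first merges several per-candidate probabilities; the threshold of 0.5, the sample probabilities and both function names are hypothetical.

```python
from math import prod
from statistics import mean, median

# The four combination rules named on the slide.
COMBINERS = {"mean": mean, "median": median, "product": prod, "maximum": max}


def combined_probability(per_candidate, how="mean"):
    """Merge each candidate's list of probabilities into a single score."""
    combine = COMBINERS[how]
    return {cand: combine(probs) for cand, probs in per_candidate.items()}


def pick_speaker(probabilities, threshold=0.5):
    """Return the most probable candidate, or None if none clears the threshold."""
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= threshold else None


# Hypothetical classifier outputs for one quote.
scores = {"Emma": [0.9, 0.7], "Harriet": [0.4, 0.2]}
speaker = pick_speaker(combined_probability(scores, how="product"))
print(speaker)
```

Returning None when no candidate clears the threshold leaves room for the quote to be treated as having no speaker among the candidates.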

Results and discussion
• High recall of the names-and-nominals chunker (97%)

• High learning results (83% on average)

Thanks for your attention
• Any questions?
