Automatic Disfluency Detection in Multi-party Conversations

Sebastian Germesin
German Research Center for Artificial Intelligence GmbH
Feast, 30th September 2009
www.amiproject.org

Outline
• Motivation
• Theoretical Background
• Data (AMI Corpus)
• Disfluency Detection System
• Hybrid Classification Approach
• Self-arranging Modules
• Experimental Results
• Conclusions & Outlook

Motivation: Example
[Figure not preserved in the extracted text]

Motivation
• Disfluencies in the transcribed speech have to be detected (and cleaned)
  • Readability
  • Transcription
  • Extractive summarization
  • Post-processing
  • The performance of NLP systems drops when they are faced with disfluent speech
• Human detector?
  • Too expensive!
  • Too slow!
⇒ Automatic detection system!

Theoretical Background: Definition
“Disfluencies are syntactical and grammatical [speech] errors that occur in spoken but not in written language.” [Besser, 2006]

Theoretical Background: Terminology (complex disfluency)
“The cat uh the dog sneaks around the corner.”
• Reparandum: “The cat”
• Interregnum: “uh”
• Reparans: “the dog”

Theoretical Background: Terminology (simple disfluency)
“The d dog sneaks around the corner.”
• Reparandum: “d”
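
The terminology can be made concrete with a small data structure. The following is a minimal sketch, not GroDi's actual representation: the `Disfluency` class, the `clean` function and the token spans are hypothetical names introduced for illustration. The spans follow the usual reading of the two example sentences (reparandum "The cat", interregnum "uh", reparans "the dog"; and reparandum "d"), which the slides show only graphically. Cleaning is assumed to mean deleting the reparandum and the interregnum while keeping the reparans.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # half-open token range [start, end)


@dataclass
class Disfluency:
    """One annotated disfluency (hypothetical representation)."""
    reparandum: Span                    # the part the speaker abandons
    interregnum: Optional[Span] = None  # filler between error and repair
    reparans: Optional[Span] = None     # the repair; stays in the clean text


def clean(tokens: List[str], disfluencies: List[Disfluency]) -> List[str]:
    """Drop reparandum and interregnum tokens, keep everything else."""
    drop = set()
    for d in disfluencies:
        drop.update(range(*d.reparandum))
        if d.interregnum is not None:
            drop.update(range(*d.interregnum))
    return [tok for i, tok in enumerate(tokens) if i not in drop]


# Complex disfluency: reparandum, interregnum and reparans are all present.
complex_tokens = "The cat uh the dog sneaks around the corner".split()
complex_case = Disfluency(reparandum=(0, 2), interregnum=(2, 3), reparans=(3, 5))
cleaned_complex = clean(complex_tokens, [complex_case])

# Simple disfluency: only a reparandum.
simple_tokens = "The d dog sneaks around the corner".split()
simple_case = Disfluency(reparandum=(1, 2))
cleaned_simple = clean(simple_tokens, [simple_case])
```

Half-open spans keep the deletion logic to a single `range` call per region and allow several disfluencies in one utterance.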

Theoretical Background: All Types
[Figure not preserved in the extracted text; surviving label: “Simple disfluencies”]

Data: Quantitative
• AMI meeting corpus
  • 135 meetings (~100 hours of speech)
  • 4 participants per meeting
  • Task: design a remote control
  • Free interaction
• Many annotations, e.g.:
  • Transcribed speech
  • Dialogue acts
  • Gestures
  • ...

Data: Quantitative
• 45 meetings enriched with disfluency annotation
  • 31,000 disfluencies
  • 15.8% erroneous words
  • 41.5% disfluent dialogue acts
• 80% (33 meetings) for training
• 20% (12 meetings) for evaluation

Data: Qualitative
• The disfluency types turned out to be heterogeneous in how strict their structure is:
  1. Some disfluencies have a strict structure
     • e.g., Repetition: “The cat the cat plays”
  2. Other disfluencies also have a strict structure, but this structure is very common in natural language
     • e.g., Replacement: “The dog the cat plays”
     • e.g., Fluent: “The dog the cat and the bird play”
  3. Other disfluencies have no obvious structure
     • e.g., Disruption: “The dog the cat and”
     • e.g., Order: “The plays cat”
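
The difference in strictness can be illustrated with a purely surface-based matcher. This is a minimal sketch under stated assumptions, not the REP module of the system: `find_repetitions` and its `max_len` parameter are hypothetical, and the matcher only looks for a token sequence that is immediately followed by an identical copy.

```python
from typing import List, Tuple


def find_repetitions(tokens: List[str], max_len: int = 3) -> List[Tuple[int, int]]:
    """Return half-open spans [start, end) that are immediately repeated.

    Matching is case-insensitive and prefers the longest repeated sequence.
    """
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        for n in range(max_len, 0, -1):
            if i + 2 * n <= len(lowered) and lowered[i:i + n] == lowered[i + n:i + 2 * n]:
                spans.append((i, i + n))  # first copy is the reparandum candidate
                i += n
                break
        else:
            i += 1  # no repetition starts at this token
    return spans


repetition = find_repetitions("The cat the cat plays".split())
replacement = find_repetitions("The dog the cat plays".split())
fluent = find_repetitions("The dog the cat and the bird play".split())
```

The repetition example is caught by exact matching alone. The replacement and the fluent enumeration both yield no match and start with the same words, so a surface pattern cannot tell them apart, which is why the second group needs richer features.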

Automatic System: Design Question
• Can we leverage the heterogeneity of disfluencies for their detection?
→ Yes!
→ Use modules for subsets of disfluencies
→ Use a different feature set for each module (depending on the disfluency types)
→ Find the “optimal” classifier for each module

Automatic System: Hybrid Modules
• SHS: Stuttering, Hesitation, Slip-of-the-Tongue
• REP: Repetition
• DNE: Discourse Marker, Explicit Editing Term
• DEL: Deletion
• REV: Insertion, Replacement, Restart, Other
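
The grouping of disfluency types into modules can be written down as a lookup table. This is only an illustrative sketch: the module names and type lists are taken from the slide, while the names `MODULES`, `TYPE_TO_MODULE` and `module_for` are hypothetical.

```python
from typing import Dict, List, Optional

# Module name -> disfluency types it is responsible for (from the slide).
MODULES: Dict[str, List[str]] = {
    "SHS": ["Stuttering", "Hesitation", "Slip-of-the-Tongue"],
    "REP": ["Repetition"],
    "DNE": ["Discourse Marker", "Explicit Editing Term"],
    "DEL": ["Deletion"],
    "REV": ["Insertion", "Replacement", "Restart", "Other"],
}

# Inverted index: disfluency type -> responsible module.
TYPE_TO_MODULE: Dict[str, str] = {
    dtype: module for module, dtypes in MODULES.items() for dtype in dtypes
}


def module_for(disfluency_type: str) -> Optional[str]:
    """Return the module covering a type, or None if no module covers it."""
    return TYPE_TO_MODULE.get(disfluency_type)
```

With this table, `module_for("Replacement")` yields `"REV"`, while a type without a module, such as `"Order"`, yields `None`.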

How to combine the modules?
[Figure not preserved in the extracted text]

Training Process: Self-arranging Modules
• Immense search space
  • #(modules) * #(classifiers) * placeInSystem
• Solution(s):
  • Old system: chosen manually
  • Current system: automatically trained
    1. Use greedy hill-climbing
       – Use a weight for errors to improve precision!
    2. Reduce the classifier library
       – Taking 10% results in a maximal performance loss of 2.3% (depending on the module)
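
The greedy search over module, classifier and position can be sketched as follows. This is a sketch under stated assumptions rather than GroDi's training code: `greedy_arrange`, `weighted_score`, `toy_evaluate` and all numbers in `TOY_GAINS` are hypothetical, and the real objective is not given on the slides. The sketch assumes that each step inserts one unused module with one classifier at one position, keeps the best-scoring insertion, and stops when no insertion improves the score. `weighted_score` shows one possible way to weight errors so that false positives cost more, which favours precision.

```python
from typing import Callable, List, Sequence, Tuple

Step = Tuple[str, str]  # (module name, classifier name)


def weighted_score(tp: int, fp: int, fn: int, fp_weight: float = 2.0) -> float:
    """Higher is better; a false positive costs fp_weight times a miss."""
    return tp - fp_weight * fp - fn


def greedy_arrange(
    modules: Sequence[str],
    classifiers: Sequence[str],
    evaluate: Callable[[List[Step]], float],
) -> Tuple[List[Step], float]:
    """Greedy hill-climbing over (module, classifier, position) choices."""
    arrangement: List[Step] = []
    best = evaluate(arrangement)
    while True:
        used = {module for module, _ in arrangement}
        candidate, candidate_score = None, best
        for module in modules:
            if module in used:
                continue
            for classifier in classifiers:
                for pos in range(len(arrangement) + 1):
                    trial = arrangement[:pos] + [(module, classifier)] + arrangement[pos:]
                    score = evaluate(trial)
                    if score > candidate_score:
                        candidate, candidate_score = trial, score
        if candidate is None:  # local optimum: no insertion helps
            return arrangement, best
        arrangement, best = candidate, candidate_score


# Made-up scores standing in for an evaluation on held-out data.
TOY_GAINS = {
    ("REP", "J48"): 5.0, ("REP", "MaxEnt"): 3.0,
    ("SHS", "J48"): 2.0, ("SHS", "MaxEnt"): 4.0,
}


def toy_evaluate(arrangement: List[Step]) -> float:
    total = sum(TOY_GAINS[step] for step in arrangement)
    names = [module for module, _ in arrangement]
    # Made-up order effect: running REP before SHS helps a little.
    if "REP" in names and "SHS" in names and names.index("REP") < names.index("SHS"):
        total += 1.0
    return total


best_arrangement, best_score = greedy_arrange(
    ["SHS", "REP"], ["J48", "MaxEnt"], toy_evaluate
)
```

Each step evaluates at most #(modules) * #(classifiers) * #(positions) candidates instead of enumerating every complete arrangement, at the cost of possibly stopping in a local optimum.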

GroDi: Greedy Hill-Climbing
[Figure not preserved in the extracted text]


GroDi: Performance Curve of J48
[Figure not preserved in the extracted text]
Best: J48 "-L -U -M 2 -A"

Experimental Results

System     Train. data   Eval. data   Accuracy   avg. F1   RT-factor
baseline   --            6 m.         90.3 %     85.7 %    0.00
baseline   --            12 m.        88.6 %     83.3 %    0.00
old        22 m.         6 m.         92.9 %     90.5 %    0.42
new        22 m.         6 m.         95.3 %     94.8 %    0.11
new        33 m.         6 m.         95.1 %     94.7 %    0.11
new        33 m.         12 m.        94.5 %     93.5 %    0.11

(m. = meetings)
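
The two less common columns can be illustrated with a short calculation. This is a sketch under assumptions: the real-time factor is taken in its usual sense of processing time divided by audio duration, and "avg. F1" is assumed here to be an unweighted (macro) average of per-class F1 scores, which the slides do not state. The function names and the example counts are hypothetical.

```python
from typing import Iterable, Tuple


def rt_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time relative to audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Iterable[Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores (assumed meaning of avg. F1)."""
    scores = [f1(tp, fp, fn) for tp, fp, fn in counts]
    return sum(scores) / len(scores)


# One hour of audio processed in 396 seconds gives a factor of about 0.11.
example_rt = rt_factor(396.0, 3600.0)

# Made-up per-class counts (tp, fp, fn) for two classes.
example_macro = macro_f1([(8, 2, 2), (5, 5, 0)])
```

A factor below 1 means the system runs faster than real time; at 0.11, an hour of meeting audio takes roughly six and a half minutes to process.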

Conclusions
• Aims:
  • Development of a system that automatically detects a broad set of disfluencies
  • Fully automatic learning process
  • Robust and fast
• Achievements:
  • Stand-alone tool for the detection of disfluencies: GroDi (Get rid of Disfluencies)
  • Self-arranging modules
  • Detection rate: 95% accuracy
  • Real-time factor of 0.11

Outlook
• Develop module(s) for the detection of Mistake, Order, Omission
• Embed other learning approaches, e.g.:
  • Conditional Random Fields
  • HMMs
• Use other corpora, e.g., Switchboard

Thank you!

Demo?

GroDi: Different Module Arrangements
[Figure not preserved in the extracted text]

GroDi
• Used technologies
  • WEKA toolkit for machine learning
  • Maximum Entropy classifier from the Stanford NLP group
  • CRF Tagger from http://crftagger.sourceforge.net/
• Features for machine learning
  • Lexical: words, lexical parallelism, (POS tags)
  • Prosodic: duration, pauses, pitch, energy
  • Dynamic: disfluency types of surrounding words
  • Speaker: age, role in meeting, native language
