Testing the robustness of online word segmentation:
Effects of linguistic diversity and phonetic variation

Luc Boruta 1,2, Sharon Peperkamp 2, Benoît Crabbé 1 & Emmanuel Dupoux 2
luc.boruta@inria.fr

1 ALPAGE, Univ. Paris 7 & INRIA
2 LSCP–DEC, EHESS, ENS & CNRS

CMCL — June 23, 2011
Yet another study on word segmentation...

What this work is not about
• New models of word segmentation.

What this work is about
• The acquisition of word segmentation;
• The acquisition of phonological knowledge;
• Interactions between the two.
Word segmentation vs. allophonic rules

French devoicing allophonic rule
• /r/ → [χ] before a voiceless consonant;
• /r/ → [ʁ] otherwise.

Consequence
• /kanar/ → [kanaχ flotɑ̃], canard flottant;
• /kanar/ → [kanaʁ ʒon], canard jaune.
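Read as a rewrite rule conditioned on the following segment, the rule is easy to emulate on a phonemic transcript. Below is a minimal Python sketch, assuming a small illustrative set of voiceless consonants; the symbol inventory and the helper name are not taken from the paper.

```python
# Minimal sketch of the French devoicing rule applied to a phonemic string.
# The voiceless set is illustrative, not the full French inventory.
VOICELESS = {"p", "t", "k", "f", "s", "ʃ"}

def apply_devoicing(phonemes: str) -> str:
    """Rewrite /r/ as [χ] before a voiceless consonant and as [ʁ] elsewhere."""
    output = []
    for i, segment in enumerate(phonemes):
        if segment == "r":
            following = phonemes[i + 1] if i + 1 < len(phonemes) else ""
            output.append("χ" if following in VOICELESS else "ʁ")
        else:
            output.append(segment)
    return "".join(output)

print(apply_devoicing("kanarflotɑ̃"))  # -> kanaχflotɑ̃ (canard flottant)
print(apply_devoicing("kanarʒon"))     # -> kanaʁʒon   (canard jaune)
```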
Word segmentation

The task
• Input: /əwʊdtʃʌkwʊdtʃʌkwʊd/
• Output: /ə wʊdtʃʌk wʊd tʃʌk wʊd/

Phonemic transcripts = idealized input
• Models are typically evaluated using phonemic transcripts;
• Assumption: kids know how to undo allophony/coarticulation.
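To make the task concrete, here is a hedged sketch of a generic dynamic-programming segmenter: given some score for candidate words, it returns the best-scoring way of cutting an unsegmented phoneme string into words. This only illustrates the search problem, not any of the models evaluated below; the toy scorer and lexicon are invented.

```python
import math

def best_segmentation(utterance: str, word_score) -> list[str]:
    """Best-scoring segmentation of an unsegmented phoneme string, where
    word_score(w) returns a log-probability-like score for the word w.
    Standard word-lattice dynamic programming, not a specific published model."""
    n = len(utterance)
    best = [float("-inf")] * (n + 1)   # best[i]: score of the best parse of utterance[:i]
    best[0] = 0.0
    back = [0] * (n + 1)               # back[i]: start index of the last word in that parse
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start] + word_score(utterance[start:end])
            if score > best[end]:
                best[end], back[end] = score, start
    # Read the word sequence off the backpointers.
    words, end = [], n
    while end > 0:
        words.append(utterance[back[end]:end])
        end = back[end]
    return list(reversed(words))

# Toy scorer: favour words from a small invented lexicon, penalise novel strings.
lexicon = {"ə": 0.1, "wʊdtʃʌk": 0.1, "wʊd": 0.2, "tʃʌk": 0.2}
def word_score(w):
    return math.log(lexicon[w]) if w in lexicon else -10.0 * len(w)

# Prints ['ə', 'wʊdtʃʌk', 'wʊdtʃʌk', 'wʊd'] under this toy scorer; real models
# re-estimate word scores incrementally as they segment the corpus.
print(best_segmentation("əwʊdtʃʌkwʊdtʃʌkwʊd", word_score))
```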
Related work

Rytting, Brew & Fosler-Lussier (2010)
• Input unit: probability vector over a finite set of symbols;
• Symbols: limited to the phonemic inventory.

Daland & Pierrehumbert (2010)
• Input: phonemic transcripts, conversational reduction processes;
• Reduction processes: implemented by hand;
• Transcripts: adult-directed speech.
Which segmentation models?

Desirable properties [Brent, 1999; Gambell & Yang, 2004]
• Start without any knowledge specific to a particular language;
• Learn in an unsupervised manner and operate incrementally.

Which segmentation models?
• MBDP-1: Brent, 1999;
• NGS-u: Venkataraman, 2001;
• Two random baselines (a toy baseline is sketched below).
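The slides do not spell out the random baselines, so the following is only an assumed reading for illustration: a segmenter that inserts a word boundary after each phoneme with a fixed probability. The function name and the boundary probability are invented; the Random and Random+ baselines reported below may be defined differently.

```python
import random

def random_segment(utterance: str, boundary_prob: float = 0.3) -> list[str]:
    """Toy baseline: insert a word boundary after each phoneme with a fixed
    probability (illustrative only)."""
    words, current = [], []
    for i, segment in enumerate(utterance):
        current.append(segment)
        if i == len(utterance) - 1 or random.random() < boundary_prob:
            words.append("".join(current))
            current = []
    return words

random.seed(0)
print(random_segment("əwʊdtʃʌkwʊdtʃʌkwʊd"))
```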
Evaluation

Now-standard evaluation protocol [Brent, 1999; Goldwater et al., 2009]
• Gold standard: orthographic segmentation;
• Precision, recall and F-score on the word segmentation;
• Precision, recall and F-score on the induced lexicon (worked example below).

                             Lexicon   Segmentation
/ə wʊdtʃʌk wʊd tʃʌk wʊd/     ✓         ✓
/ə wʊd tʃʌk wʊdtʃʌk wʊd/     ✓         ✗
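A sketch of these metrics, assuming the usual definitions: segmentation scores compare word tokens in position, lexicon scores compare word types. For brevity it scores a single utterance (the table above); in the protocol, scores are aggregated over the whole corpus and the lexicon is the set of word types induced so far.

```python
def spans(words):
    """Turn a word sequence into (start, end) character spans, i.e. word tokens in position."""
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out

def precision_recall_f(predicted: set, gold: set):
    """Precision, recall and F-score over two sets (word-token spans or word types)."""
    hits = len(predicted & gold)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = ["ə", "wʊdtʃʌk", "wʊd", "tʃʌk", "wʊd"]
pred = ["ə", "wʊd", "tʃʌk", "wʊdtʃʌk", "wʊd"]   # second row of the table above

print(precision_recall_f(spans(pred), spans(gold)))  # segmentation: (0.4, 0.4, 0.4)
print(precision_recall_f(set(pred), set(gold)))      # lexicon: (1.0, 1.0, 1.0)
```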
Experimental setup

CHILDES corpora of child-directed speech [MacWhinney, 2000]
• Derived from transcribed adult-child verbal interactions;
• Phonemic transcriptions, orthographic segmentation.

                    English   French   Japanese
Utterance tokens    10k       10k      10k
Word tokens         33k       51k      27k
Phoneme tokens      96k       121k     103k
Phoneme types       50        35       49
Cross-linguistic evaluation on phonemic corpora

[Bar charts: segmentation F-score and lexicon F-score (0–90) of MBDP-1, NGS-u, Random+ and Random on the French, English and Japanese corpora.]
Cross-linguistic evaluation on phonemic corpora

[Same bar charts as on the previous slide.]

• Blame it on the data?
• Rich morphology (e.g. French clitics)? Hapax rate?
• Relative importance of different cues?
Effects of phonetic variation

Phonemic transcripts = idealized input
• Models are typically evaluated using phonemic transcripts;
• Assumption: kids know how to undo allophony/coarticulation.

Corpora and allophonic rules
• No phonetic transcripts of child-directed speech are available;
• How many allophones do infants have to learn?
• Where is the limit between allophony and mere coarticulation?
Experimental setup

Emulating rich phonetic transcriptions [Boruta, 2011a]
• Apply artificial allophonic rules to phonemic corpora (sketched below);
• Benchmark models at different allophonic complexities;
• Control the size of the allophonic grammar.

Simplifying assumptions [Le Calvez, 2007; Boruta, 2011a]
• We only model monolateral rules, i.e. rules with a single one-sided context: p → a / _ c;
• No two rules introduce the same phone: e.g. one phone cannot be an allophone of both /t/ and /d/.
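A hedged sketch of the corpus transformation: each phoneme is split into several surface allophones conditioned on the following segment, and the number of allophones per phoneme drives the allophonic complexity. Grouping contexts at random is a simplification made here for illustration; the actual rule-generation procedure is the one of Boruta (2011a).

```python
import random

def split_into_allophones(corpus, n_allophones: int, seed: int = 0):
    """Toy transformation: rewrite every phoneme as one of n_allophones surface
    phones, chosen deterministically from the following segment, i.e. a
    monolateral right-context rule p -> a / _ c. Contexts are grouped at random
    here; Boruta (2011a) derives the rules differently."""
    rng = random.Random(seed)
    phonemes = sorted({p for utterance in corpus for p in utterance})
    # Assign every possible right context (including the utterance-final '#')
    # to one of n_allophones groups, independently for each phoneme.
    context_group = {
        p: {c: rng.randrange(n_allophones) for c in phonemes + ["#"]}
        for p in phonemes
    }
    transformed = []
    for utterance in corpus:
        surface = []
        for i, p in enumerate(utterance):
            right = utterance[i + 1] if i + 1 < len(utterance) else "#"
            # Indexing allophones by their phoneme ('r0', 'r1', ...) guarantees
            # that no two phonemes share a surface phone, as assumed above.
            surface.append(f"{p}{context_group[p][right]}")
        transformed.append(surface)
    return transformed

# A corpus is a list of utterances, each a list of phoneme symbols.
corpus = [list("kanarflota"), list("kanarʒon")]
for utterance in split_into_allophones(corpus, n_allophones=2):
    print(" ".join(utterance))
```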
Lexical complexity ∝ allophonic complexity

[Line plot: lexical complexity (roughly 1–3) as a function of allophonic complexity (1–20) for the English, French and Japanese corpora.]
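Assuming lexical complexity is read as the ratio of surface word-form types in the allophonic corpus to word types in the phonemic corpus (my reading of the axis, not a definition quoted from the slides), it can be computed directly from the gold segmentation:

```python
def lexical_complexity(phonemic_words, allophonic_words):
    """Ratio of distinct surface word forms to distinct phonemic word forms.
    Assumed reading of the y-axis: the value grows above 1 as allophonic rules
    create several surface variants of the same underlying word."""
    return len(set(allophonic_words)) / len(set(phonemic_words))

phonemic   = ["kanar", "kanar", "ʒon"]
allophonic = ["kanaχ", "kanaʁ", "ʒon"]   # /kanar/ now has two surface forms
print(lexical_complexity(phonemic, allophonic))  # 1.5
```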
Results: English

[Line plots: segmentation F-score and lexicon F-score (0–70) as a function of allophonic complexity (0–25) for MBDP-1, NGS-u, Random and Random+.]
Results: French

[Line plots: segmentation F-score and lexicon F-score (0–70) as a function of allophonic complexity (0–25) for MBDP-1, NGS-u, Random and Random+.]
Results: Japanese

[Line plots: segmentation F-score and lexicon F-score (0–70) as a function of allophonic complexity (0–12) for MBDP-1, NGS-u, Random and Random+.]
Effects of phonetic variation

[Lexicon F-score plots for English, French and Japanese, repeated from the previous slides.]

Unsurprising results
• No mechanism for ‘explaining away’ allophonic variation;
• Any word form found by the models will be added to the lexicon.