gramophone
A hybrid approach to grapheme-phoneme conversion
Kay-Michael Würzner, Bryan Jurish
{wuerzner,jurish}@bbaw.de
FSMNLP, Universität Düsseldorf, 24th June 2015
2015-06-24 / FSMNLP / Universität Düsseldorf

Overview
The Task
• Finding the pronunciation of a word given its spelling
The Challenge: Ambiguity
• A phoneme may be realized by different characters
• A character may be represented by different phonemes
Our Approach: A combination of
• a hand-crafted rule set controlling segmentation and alignment,
• a conditional random field model for generating transcription candidates, and
• an N-gram language model for selecting the "best" grapheme-phoneme mapping

Outline
1. Grapheme-phoneme conversion and its applications
2. Existing approaches
3. The gramophone approach
   (a) Alignment/Encoding
   (b) Transcription
   (c) Rating
4. Comparative evaluation and error analysis
5. Discussion & Outlook

Grapheme-phoneme conversion: Problem description
• Symbolic representation of the pronunciation of words
• Orthography is ambiguous w.r.t. pronunciation; phonetic alphabets allow for an unambiguous representation
    cow /kaʊ̯/, crow /kɹoʊ̯/ (same spelling "ow", different pronunciation)
• Complex alignment: single characters may be represented by multiple phonemes (and vice versa)
    ph  oe  n  i  x
    f   iː  n  ɪ  ks

Grapheme-phoneme conversion: Applications
Text-to-speech systems (Black & Taylor 1997)
• Improvement of speech signal synthesis by disambiguation of the input text
Spelling correction / "canonicalization" (Jurish 2010)
• Phonetic transcriptions as a normal form for identifying spelling variants
Speech recognition (Galescu & Allen 2002)
• Inverse application of g2p models
Pronunciation dictionaries (TC-Star project; DWDS)
• Generation of transcriptions or transcription candidates, especially in compounding languages

Previous work: Rule-based approaches
• Inspired by The Sound Pattern of English (Chomsky & Halle 1968)
• Equivalent to regular grammars and rewriting systems (Johnson 1972)
• Successful model for g2p converters in many languages
• Used in various text-to-speech systems, e.g.
  - MITalk (Allen et al. 1987)
  - TETOS (Wothke 1993)
  - festival (Taylor et al. 1998)
• Drawbacks:
  - Expertise and effort required in their production and maintenance
  - Treatment of exceptional pronunciations, e.g. in loan words (or, even worse, compounds of foreign and native words)
    Example: Versaillesdiktat /vɛʁzaɪ̯dɪktaːt/ (engl. 'Versailles diktat')

Previous work: Statistical approaches
• Automatic inference of regularities in the correspondence of spellings and pronunciations from data (i.e. word+transcription pairs)
• Many large data sets exist
  - NETTalk
  - CELEX
  - wiktionary
• Many more existing approaches (cf. Reichel et al. 2008)
  - Neural networks (Sejnowski & Rosenberg 1987)
  - Joint-sequence N-gram models (Bisani & Ney 2008)
  - Conditional random fields (Jiampojamarn & Kondrak 2009)
• Drawback:
  - No direct control of results; linguistically implausible transcriptions may be inferred
    Example: Getue ↦ */ɡət͡ʃə/ (engl. 'fuss')

Alignment
Starting point
• Association of transcriptions with entire words
  → Alignment on the grapheme-substring level necessary
• n:m relation between grapheme-phoneme string pairs, $n, m \in \mathbb{N} \setminus \{0\}$
    ph  oe  n  i  x
    |   |   |  |  |
    f   iː  n  ɪ  ks
Approaches
• Numerous existing alignment methods (cf. Reichel 2012)
• Simplify the n:m relation to a more tractable case, $n, m \in \{0, 1\}$
• Application of some Levenshtein-like mechanism (Levenshtein 1966)
    [Figure: one-to-one alignment of "p h o e n i x ε" with "f iː n ɪ k s ε ε", where ε is the empty symbol]
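
The slides only name "some Levenshtein-like mechanism" for the one-to-one case. As a minimal sketch of what such a mechanism could look like, the Python function below computes a standard unit-cost edit-distance alignment and pads with ε wherever a symbol has no partner. The function name, the unit costs, and the identity-based match test are assumptions made for illustration; they are not taken from the presentation.

```python
EPS = "ε"  # empty symbol used to pad the shorter side

def align_one_to_one(graphemes, phonemes):
    """Align two symbol lists so that every column pairs 0 or 1 symbols per side."""
    n, m = len(graphemes), len(phonemes)

    def sub_cost(i, j):
        # Assumed cost: 0 if the symbols are identical, 1 otherwise.
        return 0 if graphemes[i - 1] == phonemes[j - 1] else 1

    # cost[i][j] = cheapest alignment of graphemes[:i] with phonemes[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(i, j),  # pair grapheme with phoneme
                cost[i - 1][j] + 1,                   # grapheme paired with ε
                cost[i][j - 1] + 1,                   # ε paired with phoneme
            )

    # Trace one optimal path back from the bottom-right corner.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub_cost(i, j):
            pairs.append((graphemes[i - 1], phonemes[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((graphemes[i - 1], EPS))
            i -= 1
        else:
            pairs.append((EPS, phonemes[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

if __name__ == "__main__":
    for g, p in align_one_to_one(list("phoenix"), ["f", "iː", "n", "ɪ", "k", "s"]):
        print(g, p)
```

Because letters and phoneme symbols rarely coincide, a plain identity test gives the aligner little to work with, which is one way to see why the slides go on to question this kind of alignment.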

Alignment
Alternatives?
• Deletion is doubtful in the context of grapheme-phoneme correspondence
• Inference of many-to-many alignments is error-prone (Jiampojamarn et al. 2007)
• Linguistically motivated alignment desirable
Constraint-based alignment
• Manual definition of possible mappings between grapheme sequences and phonemic realizations: $M \subset (\Sigma_G^+ \times \Sigma_P^+)$
• Compiled as FST $E = \langle Q, \Sigma_G \cup \{|\}, \Sigma_P \cup \{\ \}, q_0, q_0, \delta \rangle$
  - Add a path $(q_0, q_0, g \cdot |, p \cdot\ )$ for each mapping $(g, p) \in M$
  - '|' and ' ' are reserved delimiter symbols for the grapheme side and the phoneme side, respectively
• Generate all admissible segmentations of a word and its transcription
  - FST $I_G$ with a path $(q_0, q_0, g, g \cdot |)$ for every $g$ in the domain of $M$
  - FST $I_P$ with a path $(q_0, q_0, p, p \cdot\ )$ for every $p$ in the codomain of $M$
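
The effect of the segmentation transducer on the grapheme side can be illustrated without an FST toolkit. The sketch below, in plain Python, enumerates every way to split a word into units from the domain of M and renders each unit followed by '|', as a path through such a transducer would. The inventory `GRAPHEME_UNITS` and both function names are hypothetical; they are chosen only to cover the word "phoenix" and are not the authors' rule set or implementation.

```python
def segmentations(word, units):
    """Return all ways to split `word` into a sequence of items from `units`."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    found = []
    for unit in units:
        if word.startswith(unit):
            for rest in segmentations(word[len(unit):], units):
                found.append([unit] + rest)
    return found

# Hypothetical grapheme inventory (the domain of M), covering "phoenix" only.
GRAPHEME_UNITS = ["p", "h", "ph", "o", "e", "oe", "n", "i", "x"]

def delimited(segmentation):
    """Render a segmentation with each unit followed by the delimiter '|'."""
    return "".join(unit + "|" for unit in segmentation)

if __name__ == "__main__":
    for seg in segmentations("phoenix", GRAPHEME_UNITS):
        print(delimited(seg))
```

With this inventory the word has four admissible segmentations, because "ph" and "oe" can each be read as one unit or as two.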

Alignment
• Construct letter FSTs $W$ and $T$ for a word $w$ and its transcription $t$
• The alignment of $w$ and $t$ is generated by a series of compositions which filters out all non-matching pairings:
    $A_{W,T} = \pi_2(W \circ I_G) \circ E \circ \pi_2(T \circ I_P)$
Example
    M = { u:/u/, u:/uː/, u:/juː/, uu:/uː/ }
    [Figure: the transducers $I_G$, $E$, and $I_P$ for the example M]
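
The filtering effect of the composition can be imitated with a small recursive search. The sketch below is not an FST implementation: it directly enumerates all ways to consume a word and its transcription in parallel using only pairs from M, which should yield the same set of alignments for finite inputs. The mapping set is the example M from the slide; the function name, the input pairs, and the representation of transcriptions as plain strings are assumptions for illustration.

```python
def alignments(word, trans, mappings):
    """Enumerate all alignments of `word` and `trans` licensed by `mappings`.

    `mappings` is a collection of (grapheme_string, phoneme_string) pairs,
    both non-empty, so every recursive call consumes input on both sides.
    """
    if not word and not trans:
        return [[]]  # both sides fully consumed: one (empty) alignment
    found = []
    for g, p in mappings:
        if word.startswith(g) and trans.startswith(p):
            for rest in alignments(word[len(g):], trans[len(p):], mappings):
                found.append([(g, p)] + rest)
    return found

# The example mapping set M from the slide.
M = [("u", "u"), ("u", "uː"), ("u", "juː"), ("uu", "uː")]

if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise M.
    print(alignments("uu", "uː", M))    # only uu:/uː/ covers both sides completely
    print(alignments("uu", "juːu", M))  # u:/juː/ followed by u:/u/
    print(alignments("uu", "a", M))     # no admissible alignment
```

Pairings that cannot be completed on both sides simply drop out, which mirrors how the composition discards non-matching paths.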

Alignment
Extended mappings
• Procedure allows for more complex mappings, i.e. context restrictions
• Treatment of multiple alignments: matinee : /matineː/
    m  a  t  i  ne  e         m  a  t  i  n  ee
    |  |  |  |  |   |         |  |  |  |  |  |
    m  a  t  i  n   eː        m  a  t  i  n  eː
  → Conflicting rules may be disambiguated using lookahead conditions
Segmentation
• $I_G$ is used to generate possible grapheme-level segmentations for subsequent transcription at runtime
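
One way to picture a lookahead condition is as a predicate on the graphemes that follow a rule's left-hand side. The sketch below attaches such a predicate to each rule and shows that the word "matinee" has two alignments without restrictions and one with a restriction. The rule inventory, the choice to restrict the rule ne:/n/, and the decision about which alignment should survive are all assumptions made for illustration; the slides do not state how the authors formulate their lookahead conditions.

```python
def align(word, trans, rules):
    """Enumerate alignments where each rule may inspect the remaining graphemes.

    Each rule is (graphemes, phonemes, lookahead_ok), where lookahead_ok is a
    function of the graphemes that follow the matched grapheme string.
    """
    if not word and not trans:
        return [[]]
    found = []
    for g, p, lookahead_ok in rules:
        remaining = word[len(g):]
        if word.startswith(g) and trans.startswith(p) and lookahead_ok(remaining):
            for rest in align(remaining, trans[len(p):], rules):
                found.append([(g, p)] + rest)
    return found

def anything(remaining):
    """Unrestricted rule: any right context is acceptable."""
    return True

def not_before_e(remaining):
    """Hypothetical lookahead condition: the next grapheme must not be 'e'."""
    return not remaining.startswith("e")

# Hypothetical rule inventory, just large enough for the matinee example.
BASE = [("m", "m"), ("a", "a"), ("t", "t"), ("i", "i"),
        ("n", "n"), ("e", "eː"), ("ee", "eː")]
UNRESTRICTED = [(g, p, anything) for g, p in BASE] + [("ne", "n", anything)]
RESTRICTED = [(g, p, anything) for g, p in BASE] + [("ne", "n", not_before_e)]

if __name__ == "__main__":
    for rules in (UNRESTRICTED, RESTRICTED):
        for alignment in align("matinee", "matineː", rules):
            print(" ".join(g + ":" + p for g, p in alignment))
        print("---")
```

Restricting a single rule is enough here because the two competing alignments differ only in whether "ne" or "ee" is treated as one unit.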
