Improved subword modeling for WFST-based speech recognition
Peter Smit, Sami Virpioja, Mikko Kurimo
Aalto University, Department of Signal Processing and Acoustics
August 23, 2017
Research questions
- How to do sound WFST modeling for subwords?
- How to reconstruct words from subwords?
- What is a good subword vocabulary?
  - Size of vocabulary?
  - Segmentation method?
How big is your vocabulary?

                          # Word forms
WSJ small LM                     5,000
WSJ big LM                      20,000
Native English speaker   20,000–35,000
CMU dict                       134,000
Finnish adult               >1,000,000
Finnish Text Collection     >4,000,000
Is a big vocabulary a problem?

Current systems do support vocabularies of >4M words. But:
- Out-of-vocabulary problems
- Data sparsity: valid words might appear only once
- Dimensionality problems (e.g. RNNLM input/output layers)
Subword modeling
- Split words into smaller units
- Reduces vocabulary size
- Splits can be knowledge-driven (e.g. grammatical morphs) or data-driven (e.g. Morfessor)
Subword marking and reconstruction

Style (abbreviation)       Example                      Vocab size
boundary tag (<w>)         <w> two <w> slipp er s <w>   V + 1
left-marked (+m)           two slipp +er +s             2V
right-marked (m+)          two slipp+ er+ s             2V
left+right-marked (+m+)    two slipp+ +er+ +s           4V
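The three affix-marking styles above are purely mechanical rewrites of a segmented word. A minimal sketch (the function name and interface are my own, not from the talk; the boundary-tag style instead inserts a <w> token between words):

```python
def mark(subwords, style):
    """Apply a subword marking style to one segmented word.

    subwords: pieces of a single word, e.g. ["slipp", "er", "s"]
    style: "+m" (left-marked), "m+" (right-marked),
           or "+m+" (left+right-marked)
    """
    out = []
    n = len(subwords)
    for i, sw in enumerate(subwords):
        # left marker on every non-initial piece, right marker on
        # every non-final piece, depending on the style
        left = "+" if style in ("+m", "+m+") and i > 0 else ""
        right = "+" if style in ("m+", "+m+") and i < n - 1 else ""
        out.append(left + sw + right)
    return out

print(mark(["slipp", "er", "s"], "+m"))   # ['slipp', '+er', '+s']
print(mark(["slipp", "er", "s"], "m+"))   # ['slipp+', 'er+', 's']
print(mark(["slipp", "er", "s"], "+m+"))  # ['slipp+', '+er+', '+s']
```

The vocabulary-size column follows directly: each subword can occur with or without a marker on each marked side, giving 2V for one-sided and 4V for two-sided marking.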
Subword problems
- The decoder output must be restricted to valid sequences: a sentence must not start or end in the middle of a word.
  valid:   two slip+ +per+ +s
  invalid: +two slip+ per+ +s
- Word-position-dependent phonemes
- Longer contexts are needed in language modeling
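Reconstruction and the validity restriction go hand in hand. A sketch for the +m+ style (assuming "+" occurs only as a marker; function name is my own):

```python
def reconstruct(tokens):
    """Join +m+-marked subword tokens back into words,
    rejecting sequences that start or end mid-word."""
    words, current, mid_word = [], "", False
    for tok in tokens:
        leads = tok.startswith("+")
        trails = tok.endswith("+") and len(tok) > 1
        core = tok
        if leads:
            core = core[1:]
        if trails:
            core = core[:-1]
        # a "+..." token is only valid as a word continuation,
        # and a continuation is required after a "...+" token
        if leads != mid_word:
            raise ValueError(f"invalid boundary at {tok!r}")
        current += core
        mid_word = trails
        if not trails:
            words.append(current)
            current = ""
    if mid_word:
        raise ValueError("sequence ends mid-word")
    return words

print(reconstruct(["two", "slip+", "+per+", "+s"]))  # ['two', 'slippers']
```

The same pairing rule is what the lexicon FST below enforces at decode time, so invalid sequences are never hypothesized in the first place.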
Original Lexicon FST (Kaldi)

[Figure: lexicon FST for the words "one" (Wb AHi Ne), "two" (Tb UWe) and "a" (AHs). Each arc carries an input symbol (a phone) and an output symbol (a word or ǫ), e.g. Wb:one, AHi:ǫ, Ne:ǫ; the subscript marks the phone's word position (b = begin, i = internal, e = end, s = singleton). A disambiguation symbol #a and optional silence (SIL) follow each word before looping back to the start state.]
Original Lexicon FST (Kaldi), refactored

[Figure: the same lexicon FST with all word pronunciations factored out into a $words sub-FST. The outer FST keeps only the skeleton (the $words arc, SIL:ǫ, #a:ǫ); the $words nonterminal is later expanded with the FST Replace operation.]
Subword Lexicon FST

[Figure: skeleton FST with four nonterminal arcs, $words, $prefix, $infix and $suffix, expanded with the Replace operation. A word is either a $words unit or a $prefix, zero or more $infix units, and a $suffix. Word boundaries are handled with disambiguation symbols (#a, #b, #c) and an optional SIL:<w> arc.]
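Before the nonterminals are expanded, such a skeleton can be written out as ordinary AT&T text-format arcs. This is a minimal sketch of one plausible topology; state numbering is illustrative and the disambiguation symbols are omitted:

```python
def skeleton_fst():
    """Arcs of a subword lexicon skeleton in AT&T text format.

    State 0 is both start and final state (a word boundary);
    state 1 is word-internal.  $words, $prefix, $infix and
    $suffix are nonterminal labels, later expanded with the
    FST Replace operation (e.g. OpenFst's fstreplace).
    """
    arcs = [
        "0 0 $words $words",    # a complete one-unit word
        "0 1 $prefix $prefix",  # start of a multi-unit word
        "1 1 $infix $infix",    # zero or more word-internal units
        "1 0 $suffix $suffix",  # end of a multi-unit word
        "0 0 SIL <w>",          # optional silence at word boundaries
    ]
    return "\n".join(arcs + ["0"])  # trailing "0" marks state 0 final

print(skeleton_fst())
```

Because only state 0 is final, any accepted path necessarily starts and ends at a word boundary, which is exactly the decoder restriction from the previous slide.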
Replace FSTs, <w> style: <w> two <w> slipp er s <w>

$words                    $prefix
two    Tb UWe             two    Tb UWi
slipp  Sb Li IHi Pe       slipp  Sb Li IHi Pi
er     ERs                er     ERb
s      Zs                 s      Zb

$suffix                   $infix
two    Ti UWe             two    Ti UWi
slipp  Si Li IHi Pe       slipp  Si Li IHi Pi
er     ERe                er     ERi
s      Ze                 s      Zi
Replace FSTs, m+ style: two slipp+ er+ s

$words                    $prefix
two    Tb UWe             slipp+  Sb Li IHi Pi
s      Zs                 er+     ERb

$suffix                   $infix
two    Ti UWe             slipp+  Si Li IHi Pi
s      Ze                 er+     ERi
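The four replace lexicons differ only in which word-position variant of each phone they use. A sketch of the mapping, using the b/i/e/s suffix convention from the figures (function name is my own):

```python
def position_variant(phones, cls):
    """Render a subword pronunciation for one replace lexicon.

    phones: e.g. ["T", "UW"]
    cls: "words", "prefix", "suffix" or "infix"
    Position tags: b = word-begin, i = internal,
                   e = word-end, s = singleton.
    """
    n = len(phones)
    out = []
    for i, p in enumerate(phones):
        first, last = i == 0, i == n - 1
        if cls == "words":        # unit is a whole word
            tag = "s" if first and last else "b" if first else "e" if last else "i"
        elif cls == "prefix":     # unit starts a word
            tag = "b" if first else "i"
        elif cls == "suffix":     # unit ends a word
            tag = "e" if last else "i"
        else:                     # infix: unit is word-internal
            tag = "i"
        out.append(p + tag)
    return " ".join(out)

print(position_variant(["T", "UW"], "words"))   # Tb UWe
print(position_variant(["T", "UW"], "prefix"))  # Tb UWi
print(position_variant(["ER"], "suffix"))       # ERe
```

This reproduces the tables above, e.g. "slipp" (S L IH P) becomes Sb Li IHi Pe in $words but Si Li IHi Pi in $infix.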
Experiment Setup
- AM: Finnish, Kaldi TDNN, 150 hours, 425 speakers, clean read data (SPEECON)
- LM: variable-order n-gram, Finnish Text Collection, 150M tokens, 4M word forms
- Test 1: READ (SPEECON, clean read speech, 20 speakers, 1 hour)
- Test 2: NEWS (broadcast news, 5–10 speakers, 5 hours)
- More experiments in the paper