Cross-lingual topic prediction for speech using translations

Sameer Bansal, Herman Kamper, Adam Lopez, Sharon Goldwater

Automated speech-to-text: translation, information retrieval

Current systems: English audio → ? → downstream task (translation, IR)

Current systems: English audio (“Where is the nearest hospital?”) → Automatic Speech Recognition → English text → downstream task (translation, IR)

~100 languages supported by Google Translate ...

Unwritten languages: Mboshi audio → ASR → Mboshi text? (Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018)
● ~3,000 languages with no writing system
● Traditional ASR-based approaches will not work!

Unwritten languages: Mboshi audio paired with French text translations (Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018). Efforts to collect speech and translations using mobile apps.

Unwritten languages: Mboshi audio → French text (Aikuma: Bird et al. 2014; LIG-Aikuma: Blachon et al. 2016; Godard et al. 2018). Build cross-lingual speech-to-text translation (ST) systems.

Why speech input? “For many Indians, searching by voice rather than text is their first choice.” (https://tnw.to/ieUbS)

Radio content analysis in Uganda (https://bit.ly/2mL4pf6): for 55% of households, radio is the main source of information (Quinn and Hidalgo-Sanchis, 2017).

Radio content analysis in Uganda: collect data from public radio conversations (Quinn and Hidalgo-Sanchis, 2017).

Radio content analysis in Uganda: “Insights about the spread of infectious diseases, small-scale disasters, etc.” Example topics: healthcare, disasters (Quinn and Hidalgo-Sanchis, 2017).

Radio content analysis in Uganda: topic prediction task. Luganda audio → topic? (https://radio.unglobalpulse.net/uganda)

Radio content analysis in Uganda: speech-to-text system. Luganda audio → ASR → “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) → topic?

Radio content analysis in Uganda: Luganda audio → ASR → “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) → topic prediction: healthcare. Keywords indicate topic information.

Radio content analysis in Uganda: Luganda audio → ASR → “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) → topic prediction: healthcare. This relies on the availability of ASR!

Radio content analysis in Uganda: Luganda audio → ASR → “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) → topic prediction: healthcare. Can we predict topics using ST?

Radio content analysis in Uganda: Luganda audio → ASR → “Eddwaliro lyaffe temuli yadde …” (“… they have built health centers”) → topic prediction: healthcare. UN study dataset not available!

Our work: topic prediction for Spanish speech. Spanish audio → ST → English text → topic prediction. ST trained in simulated low-resource settings.

ST performance in low-resource settings (Spanish-English BLEU)
● 160 hours (Weiss et al.): 46
*For comparison, text-to-text = 58
Good performance if trained on 100+ hours

ST performance in low-resource settings (Spanish-English BLEU)
● 160 hours (Weiss et al.): 46
● 20 hours (Bansal et al. 2019): 19
*For comparison, text-to-text = 58
Mediocre performance in low-resource settings

ST performance in low-resource settings (Spanish-English BLEU)
● 160 hours (Weiss et al.): 46
● 20 hours (Bansal et al. 2019): 19
*For comparison, text-to-text = 58
“Good applications for crummy machine translation” (Church & Hovy, 1993)

Sample translations
Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church

Sample translations
Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church
20h: i’m catholics but reality i don’t go to the church
“Crummy” translation

Sample translations
Spanish: soy católica pero no en realidad casi no voy a la iglesia
English: i am catholic but actually i hardly go to church
20h: i’m catholics but reality i don’t go to the church
Topic: religion
Keywords can be useful for topic prediction

Our work: topic prediction for Spanish speech. Spanish audio → ST → English text → topic prediction. Gold topic labels not available!

Learning topic labels: Spanish audio → gold topic label?

Learning topic labels: Spanish audio → gold topic label? Gold translation: “I like to listen to jazz”

Learning topic labels: Spanish audio → gold topic label? Gold translation: “I like to listen to jazz”. Use gold translations to infer topic labels.

Learning topic labels: Spanish audio → silver topic label. Gold translation: “I like to listen to jazz”. Use gold translations to infer topic labels.

Learning topic labels: train a topic model on the gold human translations of the Spanish audio training set (e.g. “I listen to english music”, “I am catholic”, “hello how are you”).

Learning topic labels: topic model trained on the gold human translations of the training set
● small-talk: hello, fine, name
● music: dance, listen, music
● religion: god, bible, believe
● ...

Learning topic labels: topic model trained on the gold human translations of the training set (topics such as small-talk, music, religion, ...). Number of topics set to 10.

Learning topic labels: topic model trained on the gold human translations of the training set (topics such as small-talk, music, religion, ...). small-talk is the most frequent topic.
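The slides do not say which topic model is used. To illustrate how a trained model can turn one translation into a single topic label, here is a toy multinomial naive Bayes (mixture-of-unigrams) scorer in Python, a deliberately simple stand-in rather than the authors' model. The topic names and most terms come from the example above; the numeric weights, extra terms such as "jazz" and "church", the priors (small-talk largest, mirroring "small-talk is the most frequent topic") and the names `TOPIC_WEIGHTS`, `TOPIC_PRIORS` and `predict_topic` are all hypothetical. A real system would learn these from the training-set translations with 10 topics.

```python
import math

# Hypothetical per-topic term counts (hand-set, NOT learned; loosely based on the
# example terms on the slide). A trained topic model would estimate these.
TOPIC_WEIGHTS = {
    "small-talk": {"hello": 5, "fine": 3, "name": 3, "how": 2, "you": 2},
    "music": {"dance": 3, "listen": 4, "music": 5, "jazz": 2},
    "religion": {"god": 4, "bible": 3, "believe": 3, "catholic": 3, "church": 3},
}

# Hypothetical topic priors: small-talk is the most frequent topic.
TOPIC_PRIORS = {"small-talk": 0.5, "music": 0.25, "religion": 0.25}


def log_score(tokens, topic, vocab_size, alpha=0.1):
    """Log prior plus add-alpha smoothed unigram log-likelihood of tokens under one topic."""
    weights = TOPIC_WEIGHTS[topic]
    denom = sum(weights.values()) + alpha * vocab_size
    score = math.log(TOPIC_PRIORS[topic])
    for tok in tokens:
        score += math.log((weights.get(tok, 0) + alpha) / denom)
    return score


def predict_topic(text):
    """Return the single most likely topic label for a (translated) utterance."""
    tokens = text.lower().split()
    vocab = set(tokens)
    for weights in TOPIC_WEIGHTS.values():
        vocab.update(weights)
    return max(TOPIC_WEIGHTS, key=lambda t: log_score(tokens, t, len(vocab)))


if __name__ == "__main__":
    print(predict_topic("I like to listen to jazz music"))  # music
    print(predict_topic("i am catholic but actually i hardly go to church"))  # religion
    print(predict_topic("hello how are you"))  # small-talk
```

The prior matters when an utterance contains no topic terms at all: the label then falls back toward the most frequent topic, small-talk.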

Topic prediction and evaluation: topic model applied to the Spanish audio evaluation set

Topic prediction and evaluation: gold translation “I like to listen to jazz music” → topic model → silver label

Topic prediction and evaluation: gold translation “I like to listen to jazz music” → topic model → silver label; ST translation “I like jazz music” → topic model → predicted label. Compare the predicted and silver topic labels.

Topic prediction and evaluation: gold translation “I like to listen to jazz music” → silver label; ST translation “I like jazz music” → predicted label. Good prediction.

Topic prediction and evaluation: gold translation “I like to listen to jazz music” → silver label; ST translation “I like like” → predicted label: small-talk. Poor prediction.

Topic prediction and evaluation: silver labels from gold translations vs. predicted labels from ST translations. Evaluate over a 100-hour test set.
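The evaluation compares two labels per test utterance: the silver label obtained from the gold translation and the predicted label obtained from the ST output, both produced by the same topic model. A minimal Python sketch of the resulting accuracy is below. `toy_label` and `KEYWORDS` are hypothetical, a keyword lookup standing in for the real topic model, and the utterance pairs simply reuse the examples from the slides.

```python
from typing import Callable, Sequence

# Hypothetical keyword-to-topic map standing in for the trained topic model.
KEYWORDS = {"jazz": "music", "music": "music", "catholic": "religion", "church": "religion"}


def toy_label(text: str) -> str:
    """Label an utterance with the topic of its first known keyword, else small-talk."""
    for tok in text.lower().split():
        if tok in KEYWORDS:
            return KEYWORDS[tok]
    return "small-talk"


def topic_accuracy(gold: Sequence[str], st: Sequence[str],
                   label_fn: Callable[[str], str]) -> float:
    """Fraction of utterances whose ST-based label matches the silver (gold-based) label."""
    if len(gold) != len(st) or not gold:
        raise ValueError("need equal-length, non-empty lists of translations")
    silver = [label_fn(t) for t in gold]
    predicted = [label_fn(t) for t in st]
    return sum(s == p for s, p in zip(silver, predicted)) / len(silver)


gold_translations = [
    "I like to listen to jazz music",
    "i am catholic but actually i hardly go to church",
    "I like to listen to jazz music",
]
st_translations = [
    "I like jazz music",
    "i'm catholics but reality i don't go to the church",
    "I like like",
]
print(topic_accuracy(gold_translations, st_translations, toy_label))  # 2 of 3 labels match
```

Passing the labeling function as a parameter keeps the metric independent of the topic model, so the same code scores any model that maps text to a label.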

Topic prediction accuracy
● ST trained on ≤ 20 hours of Spanish-English data
● Pretrained on English ASR

Topic prediction accuracy: predicting the small-talk topic for every utterance is the majority-class baseline
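A majority-class baseline predicts the single most frequent label for every utterance, so its accuracy equals that label's share of the test set. The Python sketch below assumes the majority class is taken from the training set's silver labels (the slide does not say which split is used); `majority_baseline` and the label lists are hypothetical, for illustration only.

```python
from collections import Counter
from typing import Sequence, Tuple


def majority_baseline(train_labels: Sequence[str],
                      test_labels: Sequence[str]) -> Tuple[str, float]:
    """Predict the most frequent training label for every test item; return it and its accuracy."""
    if not train_labels or not test_labels:
        raise ValueError("label lists must be non-empty")
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(label == majority for label in test_labels)
    return majority, correct / len(test_labels)


# Hypothetical silver labels, for illustration only.
train = ["small-talk", "small-talk", "music", "religion", "small-talk"]
test = ["small-talk", "religion", "small-talk", "music"]
print(majority_baseline(train, test))  # ('small-talk', 0.5)
```

An ST-based topic predictor is only useful if it beats this number, which is why the accuracy results are read relative to it.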

Topic prediction accuracy: ST models trained on ≤ 5 hours perform poorly

Topic prediction accuracy: ST models trained on 10-20 hours outperform the majority baseline

Topic prediction accuracy (BLEU = 13): ST models trained on 10-20 hours outperform the majority baseline

Topic prediction accuracy

Takeaways
● Low-resource ST can still be useful for building downstream applications
● Silver evaluation for this preliminary study
  ○ Future: human evaluation
● Experiments on low-resource/unwritten languages
  ○ Datasets required
● Keyword spotting
Thanks!
● Check out: “Analyzing ASR pretraining for low-resource speech-to-text translation”, Stoian et al.

Backup

Topic prediction accuracy

Silver labels: speakers were provided discussion prompts

Topic labels

Spanish dataset: discussion prompts

Spanish speech to English text: Spanish audio → encoder → attention → decoder → English text
● Telephone speech (unscripted)
● Realistic noise conditions
● Multiple speakers and dialects
● Crowdsourced English text translations
Closer to real-world conditions

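Neural ST model (example: “yo vivo en bronx” → “i live in bronx EOS”)
● Input: 1.5 s of audio as MFCCs (150 × 13)
● Encoder: CNN, then biLSTM (37 × 512 states)
● Attention over the encoder states
● Decoder: LSTM fed the embedding of the previous time step's output, then FF-Softmax
Code available on GitHub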
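The diagram places an attention step between the decoder and the 37 × 512 encoder states but does not name the scoring function. The Python sketch below assumes plain dot-product attention with a decoder query of the same size (512) as the encoder states; the real model may use a learned projection or an MLP scorer instead, and `dot_product_attention` is a hypothetical name. On the shapes: 150 MFCC frames for 1.5 s of audio is one frame every 10 ms, and going from 150 to 37 time steps is roughly a 4× reduction in sequence length.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query: Sequence[float],
                          encoder_states: Sequence[Sequence[float]]) -> Tuple[Vector, Vector]:
    """Score each encoder state by its dot product with the decoder query,
    normalise the scores with softmax, and return (weights, weighted-average context)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    rng = random.Random(0)
    # Shapes from the slide: 37 encoder time steps, each a 512-dimensional vector.
    states = [[rng.gauss(0, 1) for _ in range(512)] for _ in range(37)]
    query = [rng.gauss(0, 1) for _ in range(512)]  # assumed decoder state size
    weights, context = dot_product_attention(query, states)
    print(len(weights), len(context), round(sum(weights), 6))  # 37 512 1.0
```

At each decoder step the context vector summarises the encoder states most relevant to the next output word, and it is combined with the decoder state before the FF-Softmax layer.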
