Leveraging supplemental representations for sequential transduction

Aditya Bhargava (Department of Computer Science, University of Toronto) and Grzegorz Kondrak (Department of Computing Science, University of Alberta). NAACL-HLT 2012.


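  4. Pronunciation-based tasks. Orthography: Dickens. Transliterations: डिकेंस, ディケンズ, Диккенс, Ντίκενς. Transcriptions: /dɪkɪnz/, dIkInz, D IH K AH N Z, dIk@nz, d I k x n z. Tasks linking these representations: MTL (machine transliteration), G2P (grapheme-to-phoneme conversion), BTL (back-transliteration), P2G (phoneme-to-grapheme conversion), SR (speech recognition), TTS (text-to-speech).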

  6. Overview. x supplemental data for y, where x ∈ {transcription, transliteration} and y ∈ {G2P, MTL}. Rerank outputs from an existing system. Features similar to the base system, but applied to supplemental data: n-grams, alignment/similarity scores. Same approach for system combination: use another G2P/MTL system's outputs as supplemental data.

  9. Overview. Excellent results (mostly). Up to 8.7% error reduction for system combination. MTL sees error reduction up to 14% from transliterations and 18% from transcriptions. G2P sees error reduction up to 43% from transcriptions. But transliterations help G2P for names only.
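
The percentages above read most naturally as relative reductions in word error rate, although the slide does not spell this out. Under that assumption, the arithmetic can be sketched as follows. The function name and the accuracies are made up for illustration and are not results from the talk.

```python
def relative_error_reduction(base_accuracy: float, new_accuracy: float) -> float:
    """Relative reduction in error rate, as a fraction of the base error.

    Both accuracies are fractions in [0, 1]; the base accuracy must be below 1.
    """
    base_error = 1.0 - base_accuracy
    new_error = 1.0 - new_accuracy
    if base_error <= 0.0:
        raise ValueError("base system has no errors to reduce")
    return (base_error - new_error) / base_error


# Hypothetical numbers: 80% base accuracy, 85% after reranking.
reduction = relative_error_reduction(0.80, 0.85)
print(f"{100 * reduction:.1f}% relative error reduction")
```

With these made-up numbers, a five-point accuracy gain removes about a quarter of the base system's errors, which is why relative reductions look much larger than the raw accuracy differences.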

  10. Reranking method. From ACL 2011. Looks specifically at transliterations as supplemental data for G2P of names. Names are hard(er). Transliteration is generally applied to named entities and encodes relevant pronunciation information. Using supplemental data, rerank the n-best output list of a G2P base system. Additional findings: simple similarity-based methods don't work; multiple languages are helpful.

  11. Reranking method. Here, we experiment with: (1) transcriptions as supplemental data for both G2P and MTL; (2) transcriptions and transliterations simultaneously; (3) G2P in general, rather than names only; (4) system combination as supplemental data.

  16. Related work. G2P systems: neural networks, instance-based learning, ..., joint n-gram models (Sequitur), online discriminative learning (DirecTL+). MTL systems: similarly many approaches. Lately Sequitur and DirecTL+ have performed quite well at NEWS.

  17. Related work. Using heterogeneous data. Pivot through a third language for transliteration. Mostly useful for low-resource environments. Hard to incorporate more languages. Linear combination of system scores.

  22. Method. Input word: Sudan. The base system produces n-best outputs: sud@n, sud{n, sud#n, ... The re-ranker uses supplemental representations (sudAn, S UW D AE N, スーダン, सूडान, Судан) to produce a re-ranked n-best list: sud#n, sUd#n, sud@n, ...
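
The pipeline on this slide can be sketched as follows. This is a toy illustration and not the authors' system: the base scores, the single feature weight, and the position-by-position pairing of symbols are all assumptions made for the example. A real re-ranker would learn weights over many n-gram and alignment features, and would not require equal-length strings.

```python
def pair_features(candidate: str, supplemental: str):
    """Toy one-to-one alignment: pair symbols by position.

    Assumes equal-length strings; a real system would use a learned alignment.
    """
    return list(zip(supplemental, candidate))


def rerank(nbest, supplemental, weights, base_weight=1.0):
    """Re-rank (candidate, base_score) pairs using supplemental strings."""
    def score(item):
        candidate, base_score = item
        total = base_weight * base_score
        for supp in supplemental:
            if len(supp) != len(candidate):
                continue  # the toy alignment cannot handle this pair; skip it
            total += sum(weights.get(f, 0.0) for f in pair_features(candidate, supp))
        return total

    return [cand for cand, _ in sorted(nbest, key=score, reverse=True)]


# Hypothetical base scores and one hypothetical learned weight saying that
# "A" in the supplemental transcription tends to correspond to "#" in the output.
nbest = [("sud@n", 0.40), ("sud{n", 0.35), ("sud#n", 0.25)]
weights = {("A", "#"): 0.5}
reranked = rerank(nbest, ["sudAn"], weights)
print(reranked)  # ['sud#n', 'sud@n', 'sud{n']
```

The point of the design is that the base system's score stays in the sum, so supplemental evidence can promote a lower-ranked candidate without discarding what the base system already knows.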

  24. Method. Input: Gershwin. n-best outputs: /d͡ʒɜːʃwɪn/, /ɡɜːʃwɪn/, /d͡ʒɛɹʃwɪn/. Transliterations: [Hindi form garbled in source] (/ɡʌrʃʋɪn/), ガーシュウィン (/ɡaːɕuwiɴ/), Гершвин (/ɡerʂvin/).
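
The n-gram features mentioned in the overview pair parts of a candidate output with parts of a supplemental representation. A rough sketch, assuming character n-grams and an unaligned cross-product of the two n-gram sets. Both choices are simplifications, the function names are illustrative, and the slides do not specify the exact feature templates.

```python
from collections import Counter


def char_ngrams(text: str, n: int):
    """Character n-grams of text, padded with ^ and $ boundary markers."""
    padded = "^" + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def cross_ngram_features(candidate: str, supplemental: str, n: int = 2) -> Counter:
    """Count every (supplemental n-gram, candidate n-gram) pairing.

    No alignment is used here, so most pairings are noise; a learner would
    be expected to down-weight the uninformative ones.
    """
    features = Counter()
    for cand_gram in char_ngrams(candidate, n):
        for supp_gram in char_ngrams(supplemental, n):
            features[(supp_gram, cand_gram)] += 1
    return features


# One candidate and one transliteration from the slide above.
features = cross_ngram_features("ɡɜːʃwɪn", "Гершвин")
print(len(features), "distinct bigram pair features")
```

A feature such as ("^Г", "^ɡ") is the kind of evidence that could favor the candidate beginning with /ɡ/ over those beginning with /d͡ʒ/, provided training data gives it a positive weight.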

  31. Data and base systems. Transcriptions from Combilex and CELEX. Transliterations from NEWS 2011. Experiment on English-to-Japanese transliteration. 80/10/10 train/dev/test split. Sequitur and DirecTL+ as base systems.
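
An 80/10/10 split like the one listed here could be produced as below. The slide gives only the proportions, so the random shuffle, the fixed seed, and the function name are assumptions made for this sketch.

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items reproducibly and split them into train/dev/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to the test set
    return train, dev, test


train, dev, test = split_80_10_10(range(100))
print(len(train), len(dev), len(test))  # 80 10 10
```

Splitting by word rather than by (word, pronunciation) pair keeps every variant of a word in the same partition, which matters when a lexicon lists several pronunciations per entry.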

  34. G2P experiments: supplemental transliterations. Input: McGee. Candidate outputs: m@kJi, m@gi, ..., m@CJi. Supplemental: मगी, マギー, Макги.

  35. G2P experiments: names. Supplemental transliterations. [Bar chart: word accuracy (%), base vs. reranked, for Sequitur and DirecTL+.]

  36. G2P experiments: full set. Supplemental transliterations. [Bar chart: word accuracy (%), base vs. reranked, for Sequitur and DirecTL+.]

  37. G2P experiments: core vocab. Supplemental transliterations. [Bar chart: word accuracy (%), base vs. reranked, for Sequitur and DirecTL+.]
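
The charts report word accuracy. A minimal sketch of that metric, assuming the usual definition: the share of words whose top-ranked output exactly matches the reference. The function name is illustrative, and the references below are placeholders rather than gold transcriptions.

```python
def word_accuracy(predictions, references) -> float:
    """Percentage of words whose predicted output equals the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return 100.0 * correct / len(references)


# Placeholder data: top-1 outputs for two words and made-up references.
predictions = ["sud#n", "m@gi"]
references = ["sud#n", "m@kJi"]
accuracy = word_accuracy(predictions, references)
print(accuracy)  # 50.0
```

Because a word counts as correct only when every symbol matches, a single wrong phoneme costs as much as a completely wrong output.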

  40. G2P experiments: supplemental transcriptions (word/name). Input: Sudan. Candidate outputs (CELEX): sud@n, sud{n, ..., sud#n. Supplemental (Combilex): sudAn.

  41. G2P experiments: baselines. Supplemental transcriptions. MERGE: (1) convert Combilex to CELEX; (2) merge with CELEX; (3) train on the combined set. P2P (phoneme-to-phoneme converter): (1) intersect Combilex and CELEX; (2) train a transduction system to convert Combilex to CELEX; (3) if a test word appears in Combilex, grab it from there and convert it to CELEX format.
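
The data handling behind the two baselines can be sketched with plain dictionaries mapping words to transcriptions. Everything named here is hypothetical: the converter passed in as `to_celex` or `p2p_convert` stands in for whatever trained system does the conversion, the conflict rule in MERGE (CELEX wins) and the fallback in P2P (use the base G2P system) are assumptions the slide does not state, and the lexicon entries are toy data.

```python
def merge_training_data(celex, combilex, to_celex):
    """MERGE steps 1-2: convert Combilex entries and add them to CELEX."""
    merged = {word: to_celex(pron) for word, pron in combilex.items()}
    merged.update(celex)  # assumption: the CELEX entry wins for shared words
    return merged


def p2p_training_pairs(celex, combilex):
    """P2P steps 1-2: words in both lexicons give (Combilex, CELEX) pairs."""
    shared = sorted(celex.keys() & combilex.keys())
    return [(combilex[word], celex[word]) for word in shared]


def p2p_predict(word, combilex, p2p_convert, fallback):
    """P2P step 3: look the word up in Combilex and convert it if present."""
    if word in combilex:
        return p2p_convert(combilex[word])
    return fallback(word)  # assumption: otherwise use the base G2P system


# Toy lexicons and a toy symbol mapping, all made up for illustration.
celex = {"sudan": "sud#n"}
combilex = {"sudan": "sudAn", "blick": "blIk"}


def toy_convert(pron):
    return pron.replace("A", "#")


merged = merge_training_data(celex, combilex, toy_convert)
pairs = p2p_training_pairs(celex, combilex)
prediction = p2p_predict("sudan", combilex, toy_convert, fallback=lambda w: "")
print(merged, pairs, prediction)
```

The two baselines use the second lexicon differently: MERGE enlarges the training set of the G2P system, while P2P bypasses G2P at test time whenever the word is covered.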
