An Italian to Catalan RBMT system reusing data from existing language pairs


  1. An Italian to Catalan RBMT system reusing data from existing language pairs. Antonio Toral, Mireia Ginestí-Rosell, Francis Tyers. 2nd International Workshop on Free/Open-Source Rule-Based Machine Translation, 2011/01/21.

  2. Contents: 1 Introduction; 2 Methodology (Crossdics, Inconsistencies, Coverage); 3 Evaluation (Setting, Results); 4 Conclusions.

  9. Two main approaches in Machine Translation: Rule-Based and Statistical
     What is needed to build a system for a new language pair?
     - RBMT: dictionaries and rules
     - SMT: a parallel corpus
     Drawbacks:
     - RBMT: requires linguistic expertise in both languages and manual construction
     - SMT: only applicable to language pairs with large amounts of parallel data

  15. This paper: build an RBMT system by exploiting data from existing pairs
      - We build an MT system for pair a–b given existing systems for pairs a–c and b–c
      - Italian → Catalan from Apertium's Italian–Spanish and Catalan–Spanish
      Motivation:
      - RBMT is competitive and useful for languages without parallel corpora
      - Reusing data from similar pairs significantly reduces the amount of work
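
The idea of building pair a–b from existing pairs a–c and b–c can be illustrated with a toy dictionary crossing through the pivot language. This is only a minimal sketch: the function name `cross_dictionaries`, the plain-dict data layout, and the three-word vocabularies are assumptions made for illustration, not the tool or data used by the authors.

```python
from collections import defaultdict


def cross_dictionaries(a_to_c, b_to_c):
    """Compose a-c and b-c word lists into candidate a-b pairs via pivot c.

    Hypothetical helper: both inputs map a word to the set of its
    pivot-language translations.
    """
    # Invert b->c so that each pivot word points to its b-language words.
    c_to_b = defaultdict(set)
    for b_word, c_words in b_to_c.items():
        for c_word in c_words:
            c_to_b[c_word].add(b_word)

    # Follow a -> c -> b; words whose pivot has no b entry are dropped.
    a_to_b = {}
    for a_word, c_words in a_to_c.items():
        targets = set()
        for c_word in c_words:
            targets |= c_to_b.get(c_word, set())
        if targets:
            a_to_b[a_word] = targets
    return a_to_b


# Toy data (illustrative only): Italian-Spanish and Catalan-Spanish.
it_es = {"casa": {"casa"}, "cane": {"perro"}, "finestra": {"ventana"}}
ca_es = {"casa": {"casa"}, "gos": {"perro"}}

it_ca = cross_dictionaries(it_es, ca_es)
print(it_ca)
```

In this toy run "finestra" is lost because its Spanish pivot "ventana" has no Catalan entry, which shows how a crossed dictionary can only cover words present in both source pairs.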

  24. Input dictionaries:
      - es–it: monolingual es 11k, monolingual it 10k, bilingual 12k
      - es–ca: monolingual es 44k, monolingual ca 40k, bilingual 51k
      Output dictionaries:
      - it–ca: monolingual it 7k, monolingual ca 8k, bilingual 9k
      Other linguistic data:
      - it tagger and disambiguation probabilities taken from it–es
      - transfer rules: 35 taken from oc–ca (mainly noun phrases) + 9 manually created (verbs and clitic pronouns)

  27. Reasons
      - Differences of gender and number (it–ca)
      - Different ways of categorising lemmas and morphological features (es–it and es–ca)
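
Both reasons can be made concrete with a small sketch that crosses entries on lemma plus tags. Everything here is an assumption for illustration: the `(lemma, tags)` tuple layout, the dotted tag strings, the helper names, and in particular the differing categorisation of "artista" in the two pairs, which is invented to show the effect and is not taken from the Apertium dictionaries.

```python
def cross_with_tags(it_es, es_ca):
    """Cross entries only when the pivot lemma AND its tags agree exactly.

    Hypothetical helper: it_es is a list of (it_entry, es_entry) pairs,
    es_ca maps an es_entry to a ca_entry; entries are (lemma, tags) tuples.
    """
    matched, unmatched = [], []
    for it_entry, es_entry in it_es:
        ca_entry = es_ca.get(es_entry)
        if ca_entry is None:
            # The pivot entry is categorised differently (or is missing).
            unmatched.append(it_entry)
        else:
            matched.append((it_entry, ca_entry))
    return matched, unmatched


def tags_differ(it_entry, ca_entry):
    """True when source and target carry different morphological tags."""
    return it_entry[1] != ca_entry[1]


# Illustrative data: Italian "fiore" is masculine, Catalan "flor" is feminine.
it_es = [
    (("fiore", "n.m"), ("flor", "n.f")),
    (("artista", "n.mf"), ("artista", "n.mf")),
]
es_ca = {
    ("flor", "n.f"): ("flor", "n.f"),
    # Assumed mismatch: this pair tags the same Spanish lemma differently.
    ("artista", "n.m"): ("artista", "n.m"),
}

matched, unmatched = cross_with_tags(it_es, es_ca)
needs_attention = [pair for pair in matched if tags_differ(*pair)]
print(matched, unmatched, needs_attention)
```

Under these assumptions "artista" fails to cross because the two pairs categorise the pivot lemma differently, while "fiore"/"flor" crosses but changes gender between Italian and Catalan, so the resulting entry has to record that change.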