Automatic acquisition of Named Entities for Rule-Based Machine Translation



  1. Automatic acquisition of Named Entities for Rule-Based Machine Translation. Antonio Toral, Andy Way (DCU). 2nd International Workshop on Free/Open-Source Rule-Based Machine Translation, 2011/01/20.

  2. Contents: 1 Introduction; 2 MINELex; 3 Methodology (Motivation, Procedure, Example); 4 Evaluation (Environment, Experiments); 5 Conclusions.

  8. Introduction: Named Entities (NEs)
     - NEs refer to proper nouns (e.g. person, location, organization). Information Extraction, MUC.
     - Distribution of NEs compared to other PoS (English Europarl, tagged with Freeling):
       - Mean: 1 NE, 3 common nouns, 7 verbs
       - Average number of occurrences: 24 NEs, 295 common nouns, 888 verbs
       - Number of different instances: 88k NEs, 26k common nouns, 7k verbs
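
The "different instances" and "average occurrences" figures above can be computed from any PoS-tagged corpus. The following is a minimal Python sketch, assuming the tokens arrive as (lemma, coarse tag) pairs. The tag names `NE`, `NOUN` and `VERB`, the function name `pos_distribution` and the toy data are hypothetical; they are not Freeling's tagset or the authors' script.

```python
from collections import Counter, defaultdict


def pos_distribution(tagged_tokens):
    """Return {tag: (distinct_instances, mean_occurrences_per_instance)}."""
    counts = defaultdict(Counter)
    for lemma, tag in tagged_tokens:
        counts[tag][lemma] += 1
    stats = {}
    for tag, counter in counts.items():
        distinct = len(counter)        # number of different instances
        total = sum(counter.values())  # total tokens carrying this tag
        stats[tag] = (distinct, total / distinct)
    return stats


# Hypothetical toy corpus, only to show the shape of the input.
toy = [
    ("Dublin", "NE"), ("say", "VERB"), ("house", "NOUN"),
    ("say", "VERB"), ("Madrid", "NE"), ("say", "VERB"),
    ("house", "NOUN"),
]
stats = pos_distribution(toy)
```

Even in this toy input the NEs have more distinct instances and fewer occurrences each than the verbs, which is the pattern the slide reports for Europarl.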

  16. Multilingual and Interoperable Named Entity Lexicon (MINELex)
      - NEs acquired from Wikipedia for 11 languages and connected to language resources (LRs):
        - Semantic units of dictionaries (en, es, it, ar). E.g. Tim Robbins instance-of actor, film director
        - Nodes of ontologies (SUMO, SIMPLE). E.g. Tim Robbins + Position, believes
      - Equivalent NEs in different languages connected by interlingual links
      - NEs associated with confidence scores (number of occurrences, % of occurrences capitalised)

                              English    Spanish
      NEs                     948,410     99,330
      Variants              1,541,993    128,796
      Instance relations    1,366,899    128,796
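
One way to picture a lexicon entry with the properties listed above is a small record type. The following is a minimal Python sketch under stated assumptions: the class `NamedEntity`, its field names, and the counts in the example are hypothetical and do not reflect MINELex's actual schema or data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NamedEntity:
    lemma: str
    lang: str
    instance_of: List[str] = field(default_factory=list)   # dictionary senses
    variants: List[str] = field(default_factory=list)      # alternative written forms
    translations: Dict[str, str] = field(default_factory=dict)  # interlingual links: lang -> lemma
    occurrences: int = 0    # confidence score: how often the NE occurs
    capitalised: int = 0    # how many of those occurrences are capitalised

    def capitalised_ratio(self) -> float:
        """Share of occurrences that are capitalised (0.0 if never seen)."""
        if self.occurrences == 0:
            return 0.0
        return self.capitalised / self.occurrences


# Example from the slide; the counts are made up for illustration.
robbins = NamedEntity(
    lemma="Tim Robbins",
    lang="en",
    instance_of=["actor", "film director"],
    translations={"es": "Tim Robbins"},
    occurrences=40,
    capitalised=40,
)
```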

  23. Methodology: Motivation
      - Aim: automatically add NEs to RBMT dictionaries
      - Reasons:
        - Distributional properties + dynamic nature of NEs → impractical to build dictionaries manually
        - 1/3 of the entries in the Apertium English–Spanish dictionary are NEs
        - Simpler morphology of NEs*

  26. Methodology: Procedure
      - Extract bilingual pairs of NEs from MINELex and insert them into the Apertium dictionaries
      - Subset of NEs that satisfy restrictions:
        - Minimum number of occurrences
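
The procedure above (filter the pairs by a minimum number of occurrences, then emit dictionary entries) can be sketched as follows. This is a minimal Python sketch, not the authors' tool. The functions `select_pairs` and `bidix_entry`, the threshold and the sample pairs are hypothetical. The `<e><p><l>…</l><r>…</r></p></e>` layout, with `<s n="np"/>` for proper nouns and `<b/>` for blanks inside multiword entries, follows Apertium's bilingual dictionary conventions as commonly documented; the exact symbols the authors emitted, and the matching monolingual entries, are not shown in the slides.

```python
import xml.etree.ElementTree as ET


def select_pairs(pairs, min_occurrences):
    """Keep (source, target) for triples whose count meets the threshold."""
    return [(src, tgt) for src, tgt, n in pairs if n >= min_occurrences]


def _side(parent, tag, lemma, symbols):
    """Build one side (<l> or <r>) of an entry, with <b/> between words."""
    side = ET.SubElement(parent, tag)
    words = lemma.split(" ")
    side.text = words[0]
    for word in words[1:]:
        blank = ET.SubElement(side, "b")
        blank.tail = word
    for sym in symbols:
        ET.SubElement(side, "s", n=sym)
    return side


def bidix_entry(source, target, symbols=("np",)):
    """Serialise one bilingual entry as an XML string."""
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    _side(pair, "l", source, symbols)
    _side(pair, "r", target, symbols)
    return ET.tostring(entry, encoding="unicode")


# Hypothetical English-Spanish pairs with occurrence counts.
candidates = [
    ("London", "Londres", 120),
    ("New York", "Nueva York", 95),
    ("Foo Bar", "Foo Bar", 2),
]
selected = select_pairs(candidates, min_occurrences=10)
entries = [bidix_entry(src, tgt) for src, tgt in selected]
```

Building the entries with an XML library rather than string concatenation keeps names containing `&` or `<` correctly escaped.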
