building multilingual domain wordnets in a wiki way




  1. building multilingual domain WordNets in a Wiki Way Andrea Marchetti, Francesco Ronzano, Maurizio Tesconi, Salvatore Minutoli Web Applications for the Future Internet Group Institute of Informatics and Telematics IIT-CNR, Pisa

  2. Overview • Multilingual Web • The Wiki paradigm: collaborative management of knowledge resources • Wikyoto Knowledge Editor • Wikyoto and the KYOTO System • Knowledge editing features of Wikyoto • External resources: references to model domain knowledge • Architectural overview • Wikyoto on-line • Ongoing work

  3. Source: Ethnologue

  4. Source: Netz-Tipp.De, 2002

  5. Languages used to access Google. Source: http://www.netz-tipp.de/languages.html

  6. Multilingual Web: some statistics. The languages spoken over the Web (June 30, 2010; source: Internet World Stats). Millions of Internet users by native language:
     - English: 537
     - Chinese: 445
     - Spanish: 153
     - Japanese: 99
     - Portuguese: 82
     - German: 75
     - Arabic: 65
     - French: 60
     - Russian: 59
     - Korean: 39
     - Rest: 350

  7. Multilingual Web: some statistics. The growth of language communities between 2000 and 2010 (June 30, 2010; source: Internet World Stats). Percentage growth of the Internet user language community from 2000 to 2010:
     - English: 281%
     - Chinese: 1277%
     - Spanish: 743%
     - Japanese: 110%
     - Portuguese: 989%
     - German: 173%
     - Arabic: 2501%
     - French: 398%
     - Russian: 1825%
     - Korean: 421%
     - Rest: 588%
     Arabic, Russian, Chinese, Portuguese and Spanish are the fastest-growing Web languages, so access to Web content across different languages is becoming fundamental.

  8. KYOTO Overview. The Multilingual Knowledge Base represents the background knowledge necessary for each step:
     - HTML: Web documents in 7 languages are uploaded.
     - Syntactic & Semantic Annotation: the documents are annotated by a pipeline of linguistic tools.
     - Multilingual Fact Extraction: language-independent facts are extracted.
     - Cross-lingual Semantic Search: users can perform queries in one of the 7 languages, for example "What is the impact of climate change on biodiversity?", "¿Cuál es el impacto del cambio climático sobre la biodiversidad?", "Qual è l'impatto del cambiamento climatico sulla biodiversità?"

  9. Multilingual Knowledge Base Architecture
     - Domain Model: language-independent, describing a specific domain with a set of concepts and relations (i.e. an ontology). In KYOTO: the Kyoto Central Ontology.
     - Linguistic Information: specific to each considered language. In KYOTO: WordNets.
     - Mapping the Linguistic Information onto the Domain Model yields a Multilingual Knowledge Base.
     Figure: the domain model holds the concepts "cat" and "animal"; the linguistic information attached to "cat" is:
     - ENGLISH: [cat, true cat] NOUN, "Feline mammal usually having soft fur ..."
     - ITALIAN: [gatto, micio] NOUN, "Mammifero carnivoro ..."
     - SPANISH: [gata, gato] NOUN, "Mamífero felino que normalmente tiene..."
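The mapping described on this slide can be illustrated with a small data-structure sketch. It is not the KYOTO implementation: the class names (`Concept`, `Synset`, `MultilingualKnowledgeBase`) and the `translate` helper are hypothetical and only show how per-language synsets attached to a shared, language-independent concept allow cross-lingual lookup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Concept:
    """Language-independent node of the domain model (hypothetical class)."""
    name: str
    parent: Optional["Concept"] = None


@dataclass
class Synset:
    """Set of synonymous words in one language, mapped to one concept."""
    language: str
    lemmas: List[str]
    pos: str
    concept: Concept


class MultilingualKnowledgeBase:
    """Linguistic information mapped onto a shared domain model."""

    def __init__(self) -> None:
        self.synsets: List[Synset] = []

    def add(self, synset: Synset) -> None:
        self.synsets.append(synset)

    def lemmas_for(self, concept_name: str, language: str) -> List[str]:
        """All words of one language attached to the given concept."""
        return [
            lemma
            for s in self.synsets
            if s.concept.name == concept_name and s.language == language
            for lemma in s.lemmas
        ]

    def translate(self, lemma: str, source: str, target: str) -> List[str]:
        """Cross-lingual lookup that goes through the shared concept."""
        concepts = {
            s.concept.name
            for s in self.synsets
            if s.language == source and lemma in s.lemmas
        }
        return sorted(
            {
                word
                for s in self.synsets
                if s.language == target and s.concept.name in concepts
                for word in s.lemmas
            }
        )


# The "cat" example from the slide.
animal = Concept("animal")
cat = Concept("cat", parent=animal)

kb = MultilingualKnowledgeBase()
kb.add(Synset("ENGLISH", ["cat", "true cat"], "NOUN", cat))
kb.add(Synset("ITALIAN", ["gatto", "micio"], "NOUN", cat))
kb.add(Synset("SPANISH", ["gata", "gato"], "NOUN", cat))

print(kb.translate("gatto", "ITALIAN", "SPANISH"))
```

Because every synset points at a concept rather than at synsets of other languages, adding a new language only requires adding synsets, not pairwise translation links.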

  10. Extend linguistic information. Figure: the Multilingual Knowledge Base contains amphibian > frog. Sample document text: "...as we can notice from the figure. In the southern part of the island tree frogs and gopher frogs are widely diffused; when in 1994 the great fire destroyed most of the wood that..."

  11. Extend linguistic information. To improve KYOTO's performance, the Multilingual Knowledge Base has to be extended with linguistic information belonging to the environment domain. Figure: the Multilingual Knowledge Base now contains amphibian > frog > (poison frog, tree frog, gopher frog), covering the "tree frogs" and "gopher frogs" mentioned in the sample document text.

  12. Generic & Domain WordNets
     - Generic WordNet: [frog, toad, toad frog, anuran], "Any of various tailless stout-bodied amphibians ..."; its hyponym [true frog, ranid], "Insectivorous usually semiaquatic web-footed ...".
     - Domain WordNet: [gopher frog], "The Gopher Frog (Rana capito) is a species of frog in the...".
     - Relations shown in the figure: hyperonym links between synsets, and equivalence links to the KYOTO Central Ontology.
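The relation between the two WordNets and the ontology can be sketched as follows. This is an illustration under assumptions, not KYOTO code: the `WordnetSynset` class, the `resolve_concept` function and the ontology concept name `"frog"` are hypothetical, and the sketch assumes that the equivalence link sits on the generic frog synset and that each synset has at most one hyperonym.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class WordnetSynset:
    """Synset with an optional hyperonym and an optional ontology link."""
    lemmas: Tuple[str, ...]
    gloss: str
    hyperonym: Optional["WordnetSynset"] = None
    ontology_concept: Optional[str] = None  # equivalence link, if any


def resolve_concept(synset: WordnetSynset) -> Optional[str]:
    """Walk up the hyperonym chain until an ontology link is found."""
    current: Optional[WordnetSynset] = synset
    while current is not None:
        if current.ontology_concept is not None:
            return current.ontology_concept
        current = current.hyperonym
    return None


# Generic WordNet synsets (glosses shortened as on the slide).
frog = WordnetSynset(
    ("frog", "toad", "toad frog", "anuran"),
    "Any of various tailless stout-bodied amphibians ...",
    ontology_concept="frog",  # assumed name of the ontology concept
)
true_frog = WordnetSynset(
    ("true frog", "ranid"),
    "Insectivorous usually semiaquatic web-footed ...",
    hyperonym=frog,
)

# Domain WordNet synset added by a domain expert.
gopher_frog = WordnetSynset(
    ("gopher frog",),
    "The Gopher Frog (Rana capito) is a species of frog in the...",
    hyperonym=true_frog,
)

print(resolve_concept(gopher_frog))
```

A domain term therefore does not need its own ontology mapping: attaching it to an existing generic synset is enough for it to inherit one.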

  13. Building the Domain WordNet
     - IT'S IMPORTANT: a richer Knowledge Base improves the semantic analysis.
     - IT'S ONEROUS: domain experts have to be involved to extend, customize and maintain the Multilingual Knowledge Base.
     - EXPERIENCE OF THE SOCIAL WEB: THE WIKI PARADIGM.
     Figure: the Multilingual Knowledge Base contains amphibian > frog > (poison frog, tree frog, robber frog).

  14. The Wiki paradigm in KYOTO. Survey of editing environments for knowledge resources:
     - Desktop applications: full editing features, used for complex resources, only for knowledge engineers.
     - Wikipedia-like applications: difficult editing of complex knowledge structures.
     - Rich Web applications: limited editing possibilities, mainly editors of taxonomies of concepts.

  15. The Wiki paradigm in KYOTO. Wikyoto is a balance between complexity of use and formalization of the edited knowledge. Figure: desktop applications, rich Web applications and Wikipedia-like applications plotted by complexity of use against knowledge structuring.

  16. Wikyoto: the knowledge editing flow. Figure: a term such as "Gopher Frog" is taken from the external resources, the SKOS Thesauri or the KYOTO Terminology, then goes through the steps Create, Edit and Link, the last one connecting it to the KYOTO Central Ontology.

  17. The Wikyoto Knowledge Editor: global architecture. Figure labels: Wikyoto User; KYOTO Terminology, SKOS Thesauri and external resources (example taxonomy: pollution > air pollution, water pollution, nutrient pollution); KYOTO Central Ontology; domain WordNet example: frog > (poison frog, tree frog, gopher frog).

  18. The Wikyoto Knowledge Editor: system architecture. Figure labels: User, Internet, Web, Concept, KYOTO Web API, SPARQL Queries; resources: Generic & Domain WordNets, KYOTO Terminology, SKOS Thesauri, DBpedia, Kyoto Ontology.

  19. The Wikyoto Knowledge Editor: main features • Versioning (like MediaWiki) • Concurrency management (synset lock) • Statistical data • Exploiting external resources • Simplified linking to the Ontology: TMEKO procedure
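Two of the features listed above, versioning and synset locking, can be sketched together. The slides do not describe how Wikyoto implements them, so everything below is an assumption: the `SynsetStore` class and its methods are hypothetical, locks are per synset and per user, and each save appends a full revision in the way a wiki keeps page history.

```python
from typing import Dict, List, Tuple


class SynsetStore:
    """In-memory sketch of per-synset locks and revision history."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Tuple[str, str]]] = {}
        self._locks: Dict[str, str] = {}

    def acquire(self, synset_id: str, user: str) -> bool:
        """Lock the synset for `user`; False if someone else holds it."""
        owner = self._locks.setdefault(synset_id, user)
        return owner == user

    def release(self, synset_id: str, user: str) -> None:
        """Drop the lock, but only if `user` is the current holder."""
        if self._locks.get(synset_id) == user:
            del self._locks[synset_id]

    def save(self, synset_id: str, user: str, content: str) -> int:
        """Append a new revision and return its 1-based number."""
        if self._locks.get(synset_id) != user:
            raise PermissionError(f"{user} does not hold the lock on {synset_id}")
        revisions = self._history.setdefault(synset_id, [])
        revisions.append((user, content))
        return len(revisions)

    def revision(self, synset_id: str, number: int) -> Tuple[str, str]:
        """Return (author, content) of a stored revision."""
        return self._history[synset_id][number - 1]


store = SynsetStore()
store.acquire("gopher_frog", "alice")
store.save("gopher_frog", "alice", "A species of frog ...")
store.release("gopher_frog", "alice")
```

Keeping every revision makes an unwanted edit reversible, while the lock prevents two editors from overwriting each other's changes to the same synset.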

  20. DEMO http://www.wikyoto.net/ More information at: http://www.kyoto-project.eu/ Section: System Architecture and Demo

  21. External Resources: KYOTO Terminology. The KYOTO Terminology is:
     - automatically extracted by mining KYOTO parsed documents
     - organized in taxonomies of terms
     - linked to the documents: each term has one or more document occurrences
     Figure: example taxonomy frog > (endemic frog, gopher frog, poison frog > golden poison frog), with snippets of the documents in which the terms occur (e.g. "...habitat of many frog species...").

  22. External Resources: SKOS Thesauri. Simple Knowledge Organization System (SKOS):
     - data model for thesauri, taxonomies, classification schemes
     - W3C standard based on RDF
     - widely exploited by the Semantic Web community
     - organized on the basis of concepts (with labels, descriptions) and relations (broader / narrower / related)
     Figure: example SKOS thesaurus linking the concepts poison dart frog, frog, amphibian and dew pond through skos:narrower, skos:broader and skos:related, each concept carrying a skos:definition:
     - dew pond: "A Dew pond is an artificial pond usually sited on the top of a hill"
     - poison dart frog: "Poison dart frog is the common name of a group of frogs in the family Dendrobatidae which are native to Central and South America."
     - frog: "Any insectivorous anuran amphibian of the family Ranidae, such as Rana temporaria of Europe, having..."
     - amphibian: "A class of vertebrate animals characterized by a moist, glandular skin, gills at some stage of development..."
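The SKOS relations on this slide can be illustrated with plain (subject, predicate, object) triples. This is a sketch that uses no RDF library; which concepts each relation connects is an assumption read from the figure, the helper names `broader_chain` and `narrower` are hypothetical, and the standard SKOS property name `skos:related` is used for the "related" relation.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Assumed reading of the slide's figure.
TRIPLES: List[Triple] = [
    ("poison dart frog", "skos:broader", "frog"),
    ("frog", "skos:broader", "amphibian"),
    ("frog", "skos:related", "dew pond"),
    ("dew pond", "skos:definition",
     "A Dew pond is an artificial pond usually sited on the top of a hill"),
]


def broader_chain(concept: str, triples: List[Triple]) -> List[str]:
    """Follow skos:broader upward; assumes one broader concept per node."""
    parents = {s: o for s, p, o in triples if p == "skos:broader"}
    chain: List[str] = []
    seen = {concept}
    while concept in parents and parents[concept] not in seen:
        concept = parents[concept]
        chain.append(concept)
        seen.add(concept)  # guards against cycles in malformed data
    return chain


def narrower(concept: str, triples: List[Triple]) -> List[str]:
    """skos:narrower is the inverse of skos:broader, so derive it."""
    return sorted(s for s, p, o in triples if p == "skos:broader" and o == concept)


print(broader_chain("poison dart frog", TRIPLES))
print(narrower("frog", TRIPLES))
```

Storing only skos:broader and deriving skos:narrower on demand avoids keeping two directions of the same link in sync.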

  23. External Resources: SKOS Thesauri. Thesauri converted to SKOS format:
     - General Multilingual Environmental Thesaurus (GEMET): 2K concepts
     - Species 2000: 2M concepts
     - Habitat types from the EUNIS Biodiversity Database: 1K concepts
     - WWF Ecoregions Database: 1K concepts
     Figure: the same example SKOS thesaurus as on the previous slide (poison dart frog, frog, amphibian, dew pond).

  24. External Resources: DBpedia. DBpedia, Wikipedia for the Semantic Web: a community effort to extract structured semantic information from Wikipedia and to make this information available on the Web. You can access and query:
     - 2.6 million things (213K persons, 328K places, …) in 30 different languages
     - 609,000 links to images
     - 3,150,000 links to external Web pages
     - 415,000 Wikipedia categories
     - ...
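As an illustration of querying DBpedia, the sketch below builds a SPARQL query and the corresponding request URL for the public endpoint. The slides do not show the queries Wikyoto actually sends, so the query shape, the function names `build_abstract_query` and `build_request_url`, and the choice of `rdfs:label` and `dbo:abstract` are assumptions; the label must match the DBpedia label exactly for any results to come back. The code only builds strings and performs no network access.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def build_abstract_query(label: str, lang: str = "en") -> str:
    """SPARQL query for resources with the given label and their abstract."""
    # Escape backslashes and double quotes so the label stays one literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?resource ?abstract WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@{lang} ;\n'
        "            dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "} LIMIT 5"
    )


def build_request_url(label: str, lang: str = "en") -> str:
    """GET URL for the endpoint, asking for JSON results."""
    params = {
        "query": build_abstract_query(label, lang),
        "format": "application/sparql-results+json",
    }
    return f"{DBPEDIA_ENDPOINT}?{urlencode(params)}"


print(build_abstract_query("Gopher frog"))
```

The resulting URL can be fetched with any HTTP client; changing `lang` retrieves the label and abstract in another of the languages DBpedia covers.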

  25. building multilingual domain WordNets in a Wiki Way Thank You! Web Applications for the Future Internet Group Institute of Informatics and Telematics IIT-CNR, Pisa
