DBpedia: Extraction of Knowledge from Wikipedia - Sebastian Hellmann


  1. Creating Knowledge out of Interlinked Data • DBpedia: Extraction of Knowledge from Wikipedia • Sebastian Hellmann, AKSW, Universität Leipzig • DBpedia is a community project; please see http://dbpedia.org for a full list of contributors • LOD2 Presentation, 02.09.2010, http://lod2.eu

  2. DBpedia • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. • DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. [Figure labels: Semi-Structured Wiki Syntax; Structured Information] http://lod2.eu KAIST – LOD2 16.8.2011

  3. DBpedia - Overview • Description of DBpedia • Data Set • DBpedia Software • LOD Cloud • Collaborative Ontology Engineering • DBpedia-Live • Internationalization

  4. Structure in Wikipedia • Title • Abstract • Infoboxes • Geo-coordinates • Categories • Images • Links (to other language versions, to other Wikipedia pages, to the Web) • Redirects • Disambiguations

  11. Infobox Templates • Wikitext syntax: {{Infobox Korean settlement | title = Busan Metropolitan City | img = Busan.jpg | imgcaption = A view of the [[Geumjeong]] district in Busan | hangul = 부산 광역시 ... | area_km2 = 763.46 | pop = 3635389 | popyear = 2006 | mayor = Hur Nam-sik | divs = 15 wards (Gu), 1 county (Gun) | region = [[Yeongnam]] | dialect = [[Gyeongsang]] }} • RDF representation: dbp:Busan dbp:title "Busan Metropolitan City" . dbp:Busan dbp:hangul "부산 광역시"@Hang . dbp:Busan dbp:area_km2 "763.46"^^xsd:float . dbp:Busan dbp:pop "3635389"^^xsd:int . dbp:Busan dbp:region dbp:Yeongnam . dbp:Busan dbp:dialect dbp:Gyeongsang . ...
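
The step from infobox wikitext to triples can be illustrated with a small Python sketch. This is not the DBpedia extraction code (the real framework is written in Scala and handles far more cases); the function names `parse_infobox`, `to_object`, and `infobox_to_triples` are hypothetical, and the sketch assumes a single template with no nested templates and no piped links.

```python
import re

# A value that consists of exactly one wiki link, e.g. [[Yeongnam]]
LINK = re.compile(r"^\[\[([^\]|]+)\]\]$")


def parse_infobox(wikitext):
    """Return (template_name, {key: raw_value}) for one flat {{...}} template."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template")
    # Assumption: no "|" occurs inside values (no nested templates, no piped links)
    parts = [p.strip() for p in text[2:-2].split("|")]
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return parts[0], fields


def to_object(value):
    """Guess an RDF object for a raw infobox value (very rough heuristics)."""
    link = LINK.match(value)
    if link:
        return "dbp:" + link.group(1).replace(" ", "_")
    if re.fullmatch(r"-?\d+", value):
        return f'"{value}"^^xsd:int'
    if re.fullmatch(r"-?\d+\.\d+", value):
        return f'"{value}"^^xsd:float'
    return f'"{value}"'


def infobox_to_triples(subject, wikitext):
    """Turn every key/value pair into a (subject, predicate, object) tuple."""
    _, fields = parse_infobox(wikitext)
    return [(subject, "dbp:" + key, to_object(value)) for key, value in fields.items()]


example = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""

triples = infobox_to_triples("dbp:Busan", example)
for triple in triples:
    print(" ".join(triple), ".")
```

The heuristics in `to_object` mirror the slide: a value that is a single link becomes a resource, numbers get a datatype, and everything else stays a plain literal.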

  12. DBpedia – Data Set • Simple questions that are hard to answer: • What do Innsbruck and Leipzig have in common? • Who are the mayors of central European towns at an elevation of more than 1000 m? • Which soccer players played as goalkeeper for a club that has a stadium with more than 40,000 seats and were born in a country with more than 10 million inhabitants? • DBpedia can answer these questions and provides a public SPARQL endpoint (hosted on a Virtuoso server) for development.
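
As a rough illustration of how such a question might be sent to a SPARQL endpoint, the Python sketch below builds a request URL for a simplified version of the mayor question. Several things here are assumptions and not taken from the slides: the endpoint URL `http://dbpedia.org/sparql`, the property names `dbo:elevation` and `dbo:mayor`, and the `format` parameter (a convention of some servers). The "central European" restriction is left out to keep the query short.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint URL; the slides only say that a public endpoint exists.
ENDPOINT = "http://dbpedia.org/sparql"

# Simplified query: towns above 1000 m together with their mayor.
# dbo:elevation and dbo:mayor are assumed property names for illustration.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?town ?mayor ?elevation WHERE {
  ?town dbo:elevation ?elevation ;
        dbo:mayor ?mayor .
  FILTER (?elevation > 1000)
}
LIMIT 20
"""


def build_request_url(endpoint, query):
    """Encode the query as a GET request in the style of the SPARQL protocol."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


url = build_request_url(ENDPOINT, QUERY)
print(url)
# The URL could then be fetched, for example with urllib.request.urlopen(url).
```

The sketch only constructs the request; whether the query returns useful rows depends on how the data set actually models elevation and mayors.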

  13. DBpedia – Data Set • “A little Semantics goes a long way” - Jim Hendler http://tinyurl.com/2uhuow9

  14. DBpedia - Overview

  15. DBpedia – Data Set

  16. DBpedia – Data Set • http://en.wikipedia.org/wiki/Daejeon → http://dbpedia.org/resource/Daejeon • Stable IDs • Useful data (population, pictures, ...)
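
The correspondence between the two URLs on this slide can be sketched in Python as follows. The function name `wikipedia_to_dbpedia` is hypothetical, and the sketch only covers the simple English-Wikipedia case shown above; encoding rules for titles with special characters are not handled.

```python
from urllib.parse import urlsplit

WIKI_PREFIX = "/wiki/"
RESOURCE_BASE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    parts = urlsplit(url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(WIKI_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + url)
    # The article title after /wiki/ is reused as the local name of the resource
    return RESOURCE_BASE + parts.path[len(WIKI_PREFIX):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Daejeon"))
```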

  18. DBpedia – Software • DIEF - DBpedia Information Extraction Framework • Hosted on SourceForge • More than 30 developers • Written in Scala • Can potentially be adapted to other MediaWikis (currently Wiktionary)

  19. DIEF

  25. DBpedia – Collaborative Ontology Engineering • “A little Semantics goes a long way” - Jim Hendler • More semantics go a longer way... • The schema and the mappings are created by a community to improve data quality • This improves the precision and recall of queries

  26. A closer look at infoboxes

  29. A closer look at infoboxes • Björk (Musician): Occupation = Musician, Actor; Born = 21.12.1965, Reykjavík • Brown (Prime Minister): office = Prime Minister of the UK; birth_date = 20.4.1951; birth_place = Govan • Romero (Actor): occupation = Actor, Editor; birthdate = 4.2.1940; birthplace = New York

  32. DBpedia – Collaborative Ontology Engineering • Correct semantics: combine what belongs together (birth_place, birthplace); separate what is different (bornIn, birthplace) • Mappings Wiki with mapping rules: http://mappings.dbpedia.org/ • Everybody can contribute • About 120 editors
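
The idea of combining infobox keys that belong together can be sketched as a lookup table in Python. This is only an illustration using the Björk, Brown, and Romero infoboxes from the earlier slide; the table `MAPPING`, the function `normalize`, and the target names `birthPlace`, `birthDate`, and `occupation` are hypothetical and do not reproduce the rule syntax of the Mappings Wiki.

```python
# Hypothetical mapping rules: several raw infobox keys point to one target property.
MAPPING = {
    "birth_place": "birthPlace",
    "birthplace": "birthPlace",
    "birth_date": "birthDate",
    "birthdate": "birthDate",
    "occupation": "occupation",
}


def normalize(raw_infobox):
    """Rewrite raw infobox keys to unified property names."""
    result = {}
    for key, value in raw_infobox.items():
        lowered = key.strip().lower()
        if lowered == "born":
            # "Born" mixes date and place in one value, so it is split here
            date, _, place = value.partition(",")
            result["birthDate"] = date.strip()
            if place.strip():
                result["birthPlace"] = place.strip()
        elif lowered in MAPPING:
            result[MAPPING[lowered]] = value
        # Keys without a rule (for example "office") are dropped in this sketch
    return result


bjork = {"Occupation": "Musician, Actor", "Born": "21.12.1965, Reykjavík"}
brown = {"office": "Prime Minister of the UK", "birth_date": "20.4.1951", "birth_place": "Govan"}
romero = {"occupation": "Actor, Editor", "birthdate": "4.2.1940", "birthplace": "New York"}

for person in (bjork, brown, romero):
    print(normalize(person))
```

After normalization, one query on `birthPlace` reaches all three people, which is the precision and recall gain the mapping approach aims for.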

  33. DBpedia – Live

  34. DBpedia – Live • DBpedia dumps were created from Wikipedia dumps • About 100,000 – 150,000 page edits on Wikipedia per day • Page edits are pulled, transformed into RDF and loaded into a triple store • A 5-minute delay increases performance by 15%
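
The slide does not say why a delay helps. One plausible reading, stated here purely as an assumption, is that several edits to the same page within the waiting period collapse into a single extraction. The Python sketch below illustrates that idea with a hypothetical `EditQueue` class; it is not the DBpedia Live implementation.

```python
DELAY_SECONDS = 5 * 60  # the 5-minute delay mentioned on the slide


class EditQueue:
    """Holds edited pages until the delay has passed, merging repeated edits."""

    def __init__(self, delay=DELAY_SECONDS):
        self.delay = delay
        self.pending = {}  # page title -> timestamp of the first pending edit

    def add_edit(self, page, timestamp):
        # A further edit to a page that is already waiting adds no extra work
        self.pending.setdefault(page, timestamp)

    def due(self, now):
        """Return (and remove) all pages whose delay has expired."""
        ready = sorted(p for p, t in self.pending.items() if now - t >= self.delay)
        for page in ready:
            del self.pending[page]
        return ready


queue = EditQueue()
queue.add_edit("Busan", 0)
queue.add_edit("Busan", 60)      # merged with the edit at t=0
queue.add_edit("Daejeon", 200)

first_batch = queue.due(300)     # only Busan has waited 300 seconds
second_batch = queue.due(500)    # now Daejeon is due as well
print(first_batch, second_batch)
```

Each page returned by `due` would then be re-extracted once and its triples reloaded into the triple store.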

  35. DBpedia – Live

  36. DBpedia – Live • SPARQL Endpoint: http://live.dbpedia.org/sparql • Documentation: http://wiki.dbpedia.org/DBpediaLive • Statistics: http://live.dbpedia.org/LiveStats/

  37. DBpedia – Internationalization

  38. DBpedia – Internationalization • DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization • Available DBpedias: Korean, Greek, German, Polish, Russian, Dutch • Mappings available for more than 12 languages

  39. DBpedia – Internationalization

  41. Thank you for your attention! DBpedia is a community project; please see http://dbpedia.org for a full list of contributors.
