Creating Knowledge out of Interlinked Data
DBpedia: Extraction of Knowledge from Wikipedia
Sebastian Hellmann, AKSW, Universität Leipzig
DBpedia is a community project; please see http://dbpedia.org for a full list of contributors.
LOD2 Presentation, 02.09.2010, http://lod2.eu
DBpedia
• DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web.
• DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data.
[Figure labels: Semi-Structured Wiki Syntax, Structured Information]
http://lod2.eu KAIST – LOD2 16.8.2011
DBpedia - Overview
• Description of DBpedia
• Data Set
• DBpedia Software
• LOD Cloud
• Collaborative Ontology Engineering
• DBpedia-Live
• Internationalization
Structure in Wikipedia
• Title
• Abstract
• Infoboxes
• Geo-coordinates
• Categories
• Images
• Links: to other language versions, to other Wikipedia pages, to the Web
• Redirects
• Disambiguations
Infobox Templates

Wikitext syntax:
{{Infobox Korean settlement
| title = Busan Metropolitan City
| img = Busan.jpg
| imgcaption = A view of the [[Geumjeong]] district in Busan
| hangul = 부산 광역시
...
| area_km2 = 763.46
| pop = 3635389
| popyear = 2006
| mayor = Hur Nam-sik
| divs = 15 wards (Gu), 1 county (Gun)
| region = [[Yeongnam]]
| dialect = [[Gyeongsang]]
}}

RDF representation:
dbp:Busan dbp:title "Busan Metropolitan City"
dbp:Busan dbp:hangul "부산 광역시"@Hang
dbp:Busan dbp:area_km2 "763.46"^^xsd:float
dbp:Busan dbp:pop "3635389"^^xsd:int
dbp:Busan dbp:region dbp:Yeongnam
dbp:Busan dbp:dialect dbp:Gyeongsang
...
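The step from infobox wikitext to RDF-style triples shown above can be sketched in a few lines of Python. This is only an illustration and not the code of the DBpedia extraction framework: `parse_infobox` and `to_triples` are hypothetical helper names, the datatype guessing is deliberately naive, and nested templates are not handled.

```python
import re

# A pipe separates fields unless it sits inside a [[target|label]] wikilink.
FIELD_SEPARATOR = re.compile(r"\|(?![^\[\]]*\]\])")
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

SAMPLE = """{{Infobox Korean settlement
| title = Busan Metropolitan City
| area_km2 = 763.46
| pop = 3635389
| region = [[Yeongnam]]
}}"""


def parse_infobox(wikitext):
    """Return (template_name, {key: raw_value}) for one flat template call."""
    body = wikitext.strip()
    if not (body.startswith("{{") and body.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    name, *raw_fields = FIELD_SEPARATOR.split(body[2:-2])
    fields = {}
    for raw in raw_fields:
        key, sep, value = raw.partition("=")
        if sep:  # skip positional parameters that have no "key ="
            fields[key.strip()] = value.strip()
    return name.strip(), fields


def to_triples(subject, fields):
    """Turn infobox fields into (subject, predicate, object) strings."""
    triples = []
    for key, value in fields.items():
        link = WIKILINK.fullmatch(value)
        if link:
            # A value that is exactly one wikilink becomes a resource.
            obj = "dbp:" + link.group(1).strip().replace(" ", "_")
        elif re.fullmatch(r"-?\d+", value):
            obj = f'"{value}"^^xsd:int'
        elif re.fullmatch(r"-?\d+\.\d+", value):
            obj = f'"{value}"^^xsd:float'
        else:
            obj = f'"{value}"'
        triples.append((subject, "dbp:" + key, obj))
    return triples


name, fields = parse_infobox(SAMPLE)
for s, p, o in to_triples("dbp:Busan", fields):
    print(s, p, o)
```

Values that mix text and links, such as the image caption in the Busan infobox, stay plain literals in this sketch because only a value consisting of exactly one wikilink is treated as a resource.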
DBpedia – Data Set
Simple questions that are hard to answer:
• What do Innsbruck and Leipzig have in common?
• Who are the mayors of central European towns at an elevation of more than 1000 m?
• Which soccer players played as goalkeeper for a club whose stadium has more than 40,000 seats and were born in a country with more than 10 million inhabitants?
DBpedia can answer these questions and provides a public SPARQL endpoint (hosted on a Virtuoso server) for development.
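As a rough sketch of how the second question could be put to a SPARQL endpoint, the Python snippet below prepares (but does not send) an HTTP request. Several things here are assumptions that do not come from the slides: the endpoint URL `http://dbpedia.org/sparql`, the class and property names `dbo:Town`, `dbo:elevation` and `dbo:leaderName`, and the helper name `build_request`. The "central European" restriction is left out for brevity.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed URL of the public endpoint; the slides do not spell it out.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: the class and property names are illustrative.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?town ?mayor WHERE {
  ?town a dbo:Town ;
        dbo:elevation ?elevation ;
        dbo:leaderName ?mayor .
  FILTER (?elevation > 1000)
}
LIMIT 20
"""


def build_request(endpoint, query):
    """Prepare a SPARQL protocol GET request without sending it."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for JSON results through standard content negotiation.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would actually run the query; that call is omitted here so the example works without network access.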
DBpedia – Data Set
• “A little Semantics goes a long way” - Jim Hendler
http://tinyurl.com/2uhuow9
DBpedia – Data Set
http://en.wikipedia.org/wiki/Daejeon
http://dbpedia.org/resource/Daejeon
- stable IDs
- useful data (population, pictures ...)
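The pairing of the two Daejeon URLs suggests a simple naming rule: the page title from the English Wikipedia URL is reused as the local name of the DBpedia resource. A minimal sketch of that rule follows; `wikipedia_to_dbpedia` is a hypothetical helper name, and the sketch only covers English Wikipedia article URLs.

```python
from urllib.parse import urlparse

DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to a DBpedia resource URI."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError("not an English Wikipedia article URL: " + url)
    # Reuse the page title unchanged as the resource name.
    title = parsed.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE_PREFIX + title


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Daejeon"))
```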
DBpedia – Software
DIEF - DBpedia Information Extraction Framework
• Hosted on Sourceforge
• More than 30 developers
• Written in Scala
• Can potentially be adapted to other MediaWikis (currently Wiktionary)
DBpedia – Collaborative Ontology Engineering
• “A little Semantics goes a long way” - Jim Hendler
• More semantics goes a longer way...
• The schema and the mappings are created by a community to improve data quality
• This improves the precision and recall of queries
A closer look at infoboxes
Björk (Musician)
  Occupation = Musician, Actor
  Born = 21.12.1965, Reykjavík
Brown (Prime Minister)
  office = Prime Minister of the UK
  birth_date = 20.4.1951
  birth_place = Govan
Romero (Actor)
  occupation = Actor, Editor
  birthdate = 4.2.1940
  birthplace = New York
DBpedia – Collaborative Ontology Engineering
• Correct Semantics:
  • Combine what belongs together (birth_place, birthplace)
  • Separate what is different (bornIn, birthplace)
• Mappings Wiki
  • Mapping Rules
  • http://mappings.dbpedia.org/
  • Everybody can contribute
  • About 120 editors
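The idea of "combining what belongs together" can be illustrated with a small lookup table that sends differently spelled infobox keys to one shared property. This is a sketch only: the table `PROPERTY_MAPPINGS`, the target names `birthPlace` and `birthDate`, and the function `normalise` are illustrative assumptions, not the syntax of the rules on the mappings wiki.

```python
# Hypothetical mapping table: raw infobox key -> shared ontology property.
PROPERTY_MAPPINGS = {
    "birth_place": "birthPlace",
    "birthplace": "birthPlace",
    "birth_date": "birthDate",
    "birthdate": "birthDate",
    "occupation": "occupation",
}


def normalise(infobox):
    """Rename mapped keys to their shared property; drop unmapped keys."""
    result = {}
    for raw_key, value in infobox.items():
        target = PROPERTY_MAPPINGS.get(raw_key.lower())
        if target is not None:
            result[target] = value
    return result


brown = {"office": "Prime Minister of the UK",
         "birth_date": "20.4.1951", "birth_place": "Govan"}
romero = {"occupation": "Actor, Editor",
          "birthdate": "4.2.1940", "birthplace": "New York"}

print(normalise(brown))
print(normalise(romero))
```

After normalisation, one query on `birthPlace` reaches both Brown and Romero. A combined field such as Björk's "Born = 21.12.1965, Reykjavík" would need a rule that splits date and place, which this sketch does not attempt.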
DBpedia – Live
• DBpedia dumps were created from Wikipedia dumps
• Wikipedia receives about 100,000 to 150,000 page edits per day
• Page edits are pulled, transformed into RDF, and loaded into a triple store
• A 5-minute delay increases performance by 15%
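The slides do not say why a delay improves performance. One plausible reason, stated here purely as an assumption, is that several edits to the same page within the delay window can be collapsed into a single extraction run. The sketch below shows that idea with a hypothetical `DelayedEditQueue`; it is not DBpedia Live code, and it assumes edits arrive in non-decreasing time order.

```python
from collections import OrderedDict


class DelayedEditQueue:
    """Hold edited page titles for a fixed delay, one entry per page."""

    def __init__(self, delay_seconds=300):
        self.delay = delay_seconds
        # Page title -> time of the first edit still waiting to be processed.
        self._pending = OrderedDict()

    def push(self, title, timestamp):
        # A further edit to a page that is already waiting is absorbed.
        self._pending.setdefault(title, timestamp)

    def pop_due(self, now):
        """Return the titles whose first pending edit is old enough."""
        due = []
        while self._pending:
            title, first_seen = next(iter(self._pending.items()))
            if now - first_seen < self.delay:
                break
            self._pending.popitem(last=False)
            due.append(title)
        return due


queue = DelayedEditQueue(delay_seconds=300)
queue.push("Busan", 0)
queue.push("Daejeon", 100)
queue.push("Busan", 200)  # second edit to Busan, absorbed
print(queue.pop_due(300))
```

Each title returned by `pop_due` would then be fetched once, transformed into RDF, and loaded into the triple store, so the two Busan edits cost a single extraction.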
DBpedia – Live
• SPARQL Endpoint: http://live.dbpedia.org/sparql
• Documentation: http://wiki.dbpedia.org/DBpediaLive
• Statistics: http://live.dbpedia.org/LiveStats/
DBpedia – Internationalization
• DBpedia Internationalization Committee founded: http://wiki.dbpedia.org/Internationalization
• Available DBpedias: Korean, Greek, German, Polish, Russian, Dutch
• Mappings available for over 12 languages
Thank you for your attention!
DBpedia is a community project; please see http://dbpedia.org for a full list of contributors.