YAGO: Yet Another Great Ontology
Fabian M. Suchanek (joint work with Gjergji Kasneci, Mauro Sozio and Gerhard Weikum)
(Max-Planck-Institute for Informatics, Saarbrücken/Germany)
Fabian M. Suchanek YAGO - A Core of Semantic Knowledge 1
Overview
- Motivation: Why would anybody need Ontologies?
- Building a Core Ontology: YAGO
- Extending the Core Ontology: SOFIE
The Search for Excellent Scientists
Max-Planck Institute, DFKI
The Search for Excellent Scientists
Search query: scientist musician prize
Result: "Invisible Gorilla steals the Nobel Prize ... The gorilla, plus dropped food and country music, were honored ..." (newscientist.org/article/invisibleGorilla.htm)
The Search for Excellent Scientists
Search query: scientists who are musicians and won a prize
Result: "Invisible Gorilla steals the Nobel Prize ... The gorilla, plus dropped food and country music, were honored ..." (newscientist.org/article/invisibleGorilla.htm)
The Search for Excellent Scientists
Search query: Please give me IMMEDIATELY the scientists who are...
Result: "Invisible Gorilla steals the Nobel Prize ... The gorilla, plus dropped food and country music, were honored ..." (newscientist.org/article/invisibleGorilla.htm)
Solution: An Ontology
[Figure: a graph with the classes person, musician and scientist, connected by "is a" edges, and a gotPrize relation]
Solution: An Ontology
[Figure: an ontology in four layers. Classes: entity, with person as a subclass. Relations: is a, born. Individuals: a person born 1980. Words: "Sam Smart" and "Dr. Smart", linked to entities by means edges.]
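To make the layered picture concrete, here is a minimal Python sketch (not YAGO's actual storage format) that keeps an ontology as a set of (subject, relation, object) triples and answers the query from the search slides. The relation names type, subClassOf and means, and all individuals such as Sam_Smart, Ann_Other and Some_Prize, are hypothetical.

```python
# Minimal sketch: an ontology as a set of (subject, relation, object) triples.
# Relation names and all individuals are hypothetical illustrations.
FACTS = {
    ("scientist", "subClassOf", "person"),
    ("musician", "subClassOf", "person"),
    ("Sam_Smart", "type", "scientist"),
    ("Sam_Smart", "type", "musician"),
    ("Sam_Smart", "gotPrize", "Some_Prize"),
    ("Ann_Other", "type", "scientist"),
    ("Sam Smart", "means", "Sam_Smart"),
    ("Dr. Smart", "means", "Sam_Smart"),
}


def instances_of(cls):
    """Individuals of cls, including individuals of its (transitive) subclasses."""
    classes = {cls}
    while True:
        sub = {s for s, r, o in FACTS if r == "subClassOf" and o in classes} - classes
        if not sub:
            break
        classes |= sub
    return {s for s, r, o in FACTS if r == "type" and o in classes}


def meanings(word):
    """Entities that a surface word can refer to, via the means relation."""
    return {o for s, r, o in FACTS if r == "means" and s == word}


# "Scientists who are musicians and won a prize" becomes a set intersection.
prize_winners = {s for s, r, _ in FACTS if r == "gotPrize"}
answer = instances_of("scientist") & instances_of("musician") & prize_winners
print(answer)  # {'Sam_Smart'}
```

Because words are separate from entities, both "Sam Smart" and "Dr. Smart" resolve to the same individual, which is what the means layer is for.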
Where do we get the ontology from?
Previous approaches:
- Assemble the ontology manually (WordNet, SUMO, Cyc, GeneOntology)
  Problem: usually low coverage (MPI is in none of these)
- Use community work (Semantic Wikipedia, Freebase)
  Problem: we don't know yet whether it will take off
- Extract the ontology from corpora, e.g. the Web (Text2Onto, KnowItAll, Espresso, Snowball, LEILA, TextRunner)
  Problems: 1. usually low accuracy (50%-92%); 2. non-canonicity
Examples of non-canonical extractions: recoverWithout(most_people, medication), areUnder(0%, the_age_of_18), support(these_findings, the_notion)
YAGO Construction: Infoboxes
[Figure: a Wikipedia-style article "Smart, S" with placeholder body text and an infobox (Name: Sam Smart; Born in: Berlin ...), from which the fact Smart, S bornIn Berlin is extracted]
Exploit infoboxes
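The infobox step can be sketched as a lookup from infobox attributes to relations. This is a minimal Python illustration under stated assumptions: the wikitext snippet, the attribute names birth_place and death_place, and the mapping table are hypothetical examples, not YAGO's actual mapping.

```python
import re

# Hypothetical attribute-to-relation map; a real mapping would be much larger.
ATTRIBUTE_TO_RELATION = {
    "birth_place": "bornIn",
    "death_place": "diedIn",
}


def extract_infobox_facts(entity, wikitext):
    """Return (entity, relation, value) triples for mapped infobox attributes."""
    facts = []
    # Match lines of the form "| attribute = value" inside the infobox.
    for attr, value in re.findall(r"^\|\s*(\w+)\s*=\s*(.+?)\s*$", wikitext, flags=re.MULTILINE):
        relation = ATTRIBUTE_TO_RELATION.get(attr)
        if relation is None:
            continue  # unmapped attributes (e.g. name) are skipped
        # Strip wiki-link brackets: "[[Berlin]]" -> "Berlin"
        value = re.sub(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]", r"\1", value)
        facts.append((entity, relation, value))
    return facts


ARTICLE = """{{Infobox scientist
| name        = Sam Smart
| birth_place = [[Berlin]]
}}"""

print(extract_infobox_facts("Smart, S", ARTICLE))  # [('Smart, S', 'bornIn', 'Berlin')]
```

Infoboxes are attractive because they are already semi-structured: the hard part is the attribute-to-relation table, not the parsing.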
YAGO Construction: Categories
[Figure: the same article with the category 1980_births, from which the fact Smart, S born 1980 is extracted, next to Smart, S bornIn Berlin from the infobox]
Exploit infoboxes
Exploit relational categories
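Relational categories such as 1980_births can be turned into facts with simple name patterns. A minimal sketch, assuming the slide's "born" fact corresponds to the relation bornOnDate from YAGO's relation list and that a _deaths pattern works the same way (both assumptions):

```python
import re

# Hypothetical patterns for relational categories: category name -> relation.
RELATIONAL_CATEGORY_PATTERNS = [
    (re.compile(r"^(\d{1,4})_births$"), "bornOnDate"),
    (re.compile(r"^(\d{1,4})_deaths$"), "diedOnDate"),
]


def facts_from_categories(entity, categories):
    """Emit a fact for every category that matches a relational pattern."""
    facts = []
    for category in categories:
        for pattern, relation in RELATIONAL_CATEGORY_PATTERNS:
            match = pattern.match(category)
            if match:
                facts.append((entity, relation, match.group(1)))
    return facts


print(facts_from_categories("Smart, S", ["1980_births", "German_scientists", "Physics"]))
# [('Smart, S', 'bornOnDate', '1980')]
```

Non-relational categories such as German_scientists and Physics are left untouched here; they need a different treatment.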
YAGO Construction: Categories
[Figure: the category German_scientists yields the fact Smart, S is a GermanScientist, alongside Smart, S born 1980 and Smart, S bornIn Berlin]
Exploit infoboxes
Exploit relational categories
Exploit conceptual categories
YAGO Construction: Categories
[Figure: the thematic category Physics would wrongly yield Smart, S is a Physics]
Exploit infoboxes
Exploit relational categories
Exploit conceptual categories
Avoid thematic categories
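One way to tell conceptual categories (German_scientists) from thematic ones (Physics) is to check whether the head noun of the category name is plural. The sketch below is a crude stand-in under stated assumptions: the preposition list, the irregular-plural list and the suffix test are hypothetical simplifications, not a real noun-group parser, and relational categories such as 1980_births are assumed to have been handled before this step.

```python
# Crude heuristic (an assumption): a category is conceptual if the head noun
# of its name is plural, e.g. "German_scientists", and thematic otherwise, e.g. "Physics".
PREPOSITIONS = {"of", "in", "from", "by", "at", "for", "with"}
IRREGULAR_PLURALS = {"people", "men", "women", "children"}


def head_noun(category):
    """Head of a category name: the word before the first preposition, else the last word."""
    words = category.split("_")
    for i, word in enumerate(words):
        if i > 0 and word.lower() in PREPOSITIONS:
            return words[i - 1]
    return words[-1]


def looks_plural(word):
    """Suffix test; endings such as -ss, -us and -ics are treated as singular."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        return True
    return w.endswith("s") and not w.endswith(("ss", "us", "ics"))


def type_facts(entity, categories):
    """Emit (entity, "type", category) only for conceptual categories."""
    return [(entity, "type", c) for c in categories if looks_plural(head_noun(c))]


print(type_facts("Smart, S", ["German_scientists", "Physics"]))
# [('Smart, S', 'type', 'German_scientists')]
```

A suffix test will misfire on many words; it is only meant to show why the plural head is a useful signal.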
YAGO Construction: Upper Model
[Figure: the class German Scientist, with an individual linked to it by "is a" and born 1980; a question mark asks how it connects to the upper classes person and entity]
YAGO Construction: Upper Model
[Figure: above German Scientist, Wikipedia's category hierarchy offers only classes such as People_by_occupation, Social_group and Business, marked with a question mark]
YAGO Construction: Upper Model
[Figure: the Wikipedia class German Scientist is a subclass of the WordNet class Scientist, which is a subclass of Person; the word "scientist" means Scientist, and "S. Smart" means the individual, who is a German Scientist and born 1980]
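The link from Wikipedia to WordNet goes through the head noun of the category: "German_scientists" has the head "scientists", whose singular "scientist" names a WordNet class. A minimal sketch, assuming a hypothetical miniature of WordNet, taking the last word as the head, and simply picking the first-listed sense (a simplification):

```python
# Hypothetical miniature of WordNet: noun -> senses (first listed is used),
# plus one superclass link per sense. Real WordNet is far larger.
WORDNET_SENSES = {"scientist": ["wordnet_scientist"], "person": ["wordnet_person"]}
WORDNET_SUPERCLASS = {
    "wordnet_scientist": "wordnet_person",
    "wordnet_person": "wordnet_entity",
}


def singular(word):
    """Crude singularization: drop one trailing 's' (an assumption)."""
    return word[:-1] if word.endswith("s") and not word.endswith("ss") else word


def link_to_wordnet(category):
    """Make a conceptual Wikipedia category a subclass of the WordNet sense of its head."""
    head = singular(category.split("_")[-1].lower())
    senses = WORDNET_SENSES.get(head)
    if not senses:
        return None
    return (category, "subClassOf", senses[0])


def superclasses(cls):
    """Walk the WordNet superclass links up to the root."""
    chain = []
    while cls in WORDNET_SUPERCLASS:
        cls = WORDNET_SUPERCLASS[cls]
        chain.append(cls)
    return chain


link = link_to_wordnet("German_scientists")
print(link, superclasses(link[2]))
# ('German_scientists', 'subClassOf', 'wordnet_scientist') ['wordnet_person', 'wordnet_entity']
```

The design choice is to let WordNet supply the upper model, so the noisy Wikipedia category hierarchy is never needed above the leaf classes.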
YAGO: Relations
90 relations, e.g. is a, bornIn, bornOnDate, diedIn, diedOnDate, isMarriedTo, familyName, givenName, actedIn, locatedIn, establishedOnDate, hasPopulation, hasHeight, hasWeight, hasInflation, ...
Manual evaluation: 95% correct
YAGO: Size*
[Figure: bar chart comparing YAGO (19,000,000) with KnowItAll, SUMO, WordNet, OpenCyc and Cyc; the other bars are labeled 3,000,000, 300,000, 200,000, 60,000 and 30,000]
* Publicly available ontologies with a quality guarantee. Size is not correlated with usefulness.
YAGO Model: Why binary is not enough
#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)
[Figure: the same facts as a graph: Sam is a scientist, and the "is a" edge itself carries since 1998 and source Wikipedia]
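Giving every fact an identifier is what lets a fact appear as the argument of another fact. A minimal Python sketch of such a store; the class name FactStore and its methods are hypothetical:

```python
from itertools import count


class FactStore:
    """Facts as id -> (arg1, relation, arg2); arguments may themselves be fact ids."""

    def __init__(self):
        self._facts = {}
        self._ids = count(1)

    def add(self, arg1, relation, arg2):
        """Store a fact and return its identifier, e.g. "#1"."""
        fact_id = f"#{next(self._ids)}"
        self._facts[fact_id] = (arg1, relation, arg2)
        return fact_id

    def get(self, fact_id):
        return self._facts[fact_id]

    def about(self, fact_id):
        """Meta-facts whose first argument is the given fact."""
        return {rel: arg2 for arg1, rel, arg2 in self._facts.values() if arg1 == fact_id}


store = FactStore()
f1 = store.add("Sam", "is_a", "scientist")
store.add(f1, "since", "1998")
store.add(f1, "source", "Wikipedia")
print(f1, store.about(f1))  # #1 {'since': '1998', 'source': 'Wikipedia'}
```

The binary triples stay binary; time and provenance are attached as further binary facts about #1.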
YAGO Model: Formal view
A YAGO ontology over
- a set of relations $R$
- a set of common entities $C$
- a set of fact identifiers $I$
is a function
$y: I \to (I \cup C \cup R) \times R \times (I \cup C \cup R)$
#1 (Sam, is_a, scientist)
#2 (#1, since, 1998)
#3 (#1, source, Wikipedia)
We can talk about
- facts: (#1, source, Wikipedia)
- additional arguments: (#1, since, 1998)
- relations: (time, hasRange, time_interval)
Still: consistency is decidable
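The signature of $y$ can be checked mechanically on a finite ontology stored as a dict from fact identifiers to triples. A minimal sketch; the function name is hypothetical, and the final injectivity check (no two identifiers denote the same triple) is an extra assumption of this sketch, not part of the definition above:

```python
def is_yago_function(y, relations, entities, fact_ids):
    """Check y: I -> (I ∪ C ∪ R) × R × (I ∪ C ∪ R) for a dict y keyed by fact id."""
    arguments = set(fact_ids) | set(entities) | set(relations)
    if set(y) != set(fact_ids):  # defined exactly on the fact identifiers
        return False
    for arg1, relation, arg2 in y.values():
        if arg1 not in arguments or relation not in relations or arg2 not in arguments:
            return False
    # Assumption for this sketch: no two identifiers denote the same triple.
    return len(set(y.values())) == len(y)


R = {"is_a", "since", "source"}
C = {"Sam", "scientist", "1998", "Wikipedia"}
y = {
    "#1": ("Sam", "is_a", "scientist"),
    "#2": ("#1", "since", "1998"),
    "#3": ("#1", "source", "Wikipedia"),
}
print(is_yago_function(y, R, C, set(y)))  # True
```

Because fact identifiers are allowed as arguments, #2 and #3 are well-formed even though their first argument is a fact rather than an entity.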
A Hitchhiker's Guide to Ontology
[Figure: YAGO's connections to related projects: DBpedia, SUMO, Linking Open Data, Freebase (community), Semantic Wikipedia (U Karlsruhe), Cyc (commercial) and UMBEL (commercial); affiliations also shown include HU Berlin, U Leipzig, OLS Inc. and a research project. Annotations: "YAGO forms the taxonomic backbone", "YAGO and SUMO have been merged", "YAGO is part of the project by its Web service", "YAGO will be included" (planned), "YAGO contributes the entities".]
[Elsevier 2008]
Extending the Ontology
Our first approach: LEILA, combining linguistic and statistical analysis [SIGKDD 2006]. It worked well, but was slow.
[Figure: in "Dr. Smart was born in 1980.", the pattern "was born in" connects Dr. Smart and 1980]
Extending the Ontology
Target relation: bornInYear(Person, Year)
[Figure: in "Dr. Smart was born in 1980.", the pattern "was born in" connects Dr. Smart and 1980]
Extending the Ontology
1. Mapping patterns to relations
[Figure: the pattern "was born in" in "Dr. Smart was born in 1980." is mapped to bornInYear, linking Dr. Smart and 1980]
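Step 1 in isolation amounts to a table from textual patterns to relations. A minimal sketch, assuming a hypothetical hand-written pattern table and sentences of the simple shape "X pattern Year.":

```python
import re

# Hypothetical pattern table: surface pattern -> relation.
PATTERNS = {
    "was born in": "bornInYear",
    "died in": "diedInYear",
}


def extract(sentence):
    """Return (relation, arg1, arg2) if the sentence matches a known pattern."""
    for pattern, relation in PATTERNS.items():
        match = re.match(rf"(.+?) {re.escape(pattern)} (\d{{1,4}})\.?$", sentence)
        if match:
            return (relation, match.group(1), match.group(2))
    return None


print(extract("Dr. Smart was born in 1980."))  # ('bornInYear', 'Dr. Smart', '1980')
```

Note that the result still contains the name "Dr. Smart", not an entity; resolving it is the next step.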
Extending the Ontology
1. Mapping patterns to relations
2. Disambiguating entity names
[Figure: "Dr. Smart" in "Dr. Smart was born in 1980." could refer to more than one entity, e.g. one with diedInYear 1776; the fact bornInYear 1980 must be attached to the right one]
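One simple way to disambiguate a name is to compare the words of the sentence against what is known about each candidate entity. This Python sketch assumes a hypothetical means table, a hypothetical keyword profile per entity (including an entity Old_Dr_Smart standing in for the slide's diedInYear 1776 candidate) and plain word overlap as the score; it is not SOFIE's actual disambiguation model.

```python
# Hypothetical "means" table: surface name -> candidate entities (first = default).
MEANS = {"Dr. Smart": ["Sam_Smart", "Old_Dr_Smart"]}
# Hypothetical keywords describing each candidate entity.
CONTEXT = {
    "Sam_Smart": {"physics", "berlin", "1980", "scientist"},
    "Old_Dr_Smart": {"1776", "physician", "london"},
}


def disambiguate(name, sentence):
    """Pick the candidate whose keywords overlap most with the sentence."""
    words = {w.strip(".,").lower() for w in sentence.split()}
    candidates = MEANS.get(name, [])
    if not candidates:
        return None
    # max() returns the first maximal element, so ties go to the default candidate.
    return max(candidates, key=lambda e: len(CONTEXT.get(e, set()) & words))


print(disambiguate("Dr. Smart", "Dr. Smart was born in 1980 in Berlin."))  # Sam_Smart
print(disambiguate("Dr. Smart", "Dr. Smart died in 1776."))  # Old_Dr_Smart
```

Word overlap is only a placeholder score; the point is that the same name maps to different entities depending on context.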
Extending the Ontology
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
[Figure: "Dr. Smart was born in 1980." yields bornInYear 1980]
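Logical reasoning can reject extractions that contradict what the ontology already knows. A minimal sketch, assuming two hypothetical rules: bornInYear and diedInYear are functional (one value per person), and nobody is born after dying. The known fact about Old_Dr_Smart is invented for illustration.

```python
# Hypothetical background knowledge already in the ontology.
KNOWN = {("diedInYear", "Old_Dr_Smart", 1776)}
FUNCTIONAL = {"bornInYear", "diedInYear"}


def consistent(fact, known):
    """Check a candidate (relation, person, year) fact against known facts."""
    relation, person, year = fact
    for r, p, y in known:
        if p != person:
            continue
        # Rule 1: a functional relation has at most one value per person.
        if r == relation and relation in FUNCTIONAL and y != year:
            return False
        # Rule 2: birth must not come after death.
        if relation == "bornInYear" and r == "diedInYear" and year > y:
            return False
        if relation == "diedInYear" and r == "bornInYear" and year < y:
            return False
    return True


print(consistent(("bornInYear", "Old_Dr_Smart", 1980), KNOWN))  # False
print(consistent(("bornInYear", "Sam_Smart", 1980), KNOWN))  # True
```

Here reasoning also helps disambiguation: attaching the 1980 birth to the entity who died in 1776 is ruled out.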
New! SOFIE: A Unifying Framework
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
[Figure: the known fact bornInYear(Elvis, 1937) and the sentence "Elvis was born in 1937." together suggest: "X was born in Y" is a good pattern for bornInYear]
New! SOFIE: A Unifying Framework
1. Mapping patterns to relations
2. Disambiguating entity names
3. Performing logical reasoning
[Figure: knowing that "X was born in Y" is a good pattern for bornInYear, the sentence "Dr. Smart was born in 1980." yields bornInYear 1980]
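The interplay on these two slides can be imitated with a few lines of bootstrapping: a known fact confirms a pattern, and the confirmed pattern yields new facts. This is a deliberately simplified counting sketch in Python, not SOFIE's method, which treats pattern mapping, disambiguation and reasoning jointly as a weighted MAX-SAT problem. The sentence regex (capitalized name, lowercase pattern, four-digit year) and the min_support threshold are assumptions.

```python
import re
from collections import Counter

# Assumed sentence shape: "<Capitalized name> <lowercase pattern> <year>."
SENTENCE_RE = re.compile(
    r"^(?P<x>[A-Z][\w.]*(?: [A-Z][\w.]*)*) (?P<pattern>[a-z][a-z ]*?) (?P<y>\d{4})\.?$"
)


def split_sentence(sentence):
    """Split a sentence into (x, pattern, y), or None if it has another shape."""
    m = SENTENCE_RE.match(sentence)
    return (m["x"], m["pattern"], m["y"]) if m else None


def learn_patterns(sentences, known_facts):
    """Count how often each pattern connects a pair already related in known_facts."""
    support = Counter()
    for parts in filter(None, map(split_sentence, sentences)):
        x, pattern, y = parts
        for relation, a, b in known_facts:
            if (a, b) == (x, y):
                support[(pattern, relation)] += 1
    return support


def extract_new_facts(sentences, known_facts, min_support=1):
    """Apply sufficiently supported patterns to all sentences."""
    support = learn_patterns(sentences, known_facts)
    new = set()
    for parts in filter(None, map(split_sentence, sentences)):
        x, pattern, y = parts
        for (p, relation), n in support.items():
            if p == pattern and n >= min_support:
                new.add((relation, x, y))
    return new - set(known_facts)


KNOWN = {("bornInYear", "Elvis", "1937")}
SENTENCES = ["Elvis was born in 1937.", "Dr. Smart was born in 1980."]
print(extract_new_facts(SENTENCES, KNOWN))  # {('bornInYear', 'Dr. Smart', '1980')}
```

Unlike this pipeline, a joint formulation can also let a new fact cast doubt on a pattern, for example when the pattern would produce facts that violate a reasoning rule.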