Semantic Search
Focus: IR on Structured Data
8th European Summer School on Information Retrieval
Duc Thanh Tran
Institute AIFB, KIT, Germany
Tran@aifb.uni-karlsruhe.de
http://sites.google.com/site/kimducthanh
Agenda
• Why Semantic Search?
• What is Semantic Search?
• A Semantic Search direction - IR on structured data
  • Matching
  • Ranking
• Conclusions
Why Semantic Search?
Why Semantic Search?
• Search technology solves the main classes of queries, e.g. navigational
• But long tail queries…
  • “teacher math class Goethe”
• Several problematic cases
  • Ambiguous / imprecise queries
    • “Paris Hilton”
    • “strong adventures people from Germany”
  • Specific, complex queries (factual, aggregated)
    • “32 year old computer scientist living in Karlsruhe”
    • “digital camera under 300 dollars produced by canon in 1992”
Many of these queries would not be asked by users, who learned over time what search technology can and cannot do.
These queries require precise understanding of the underlying information needs and data, and aggregating results.
Why Semantic Search?
• Towards a Semantic Web
  • Large number of Web data vocabularies published in RDFS and OWL
    • Schema.org
    • DBpedia ontology
  • Large amounts of data published in RDF / RDFa
    • Linked Data
    • Embedded metadata
Semantics captured by taxonomies, ontologies, structured metadata can help to obtain precise understanding, to aggregate information from different sources, and to retrieve relevant results!
Vocabularies
• DBpedia ontology
(Figure) from: http://wiki.dbpedia.org/Ontology
Vocabularies
DBpedia [Bizer et al, JWS02]
(Figure) from: http://wiki.dbpedia.org/Ontology
(Figure, continued) from: http://wiki.dbpedia.org/Ontology
Structured Data
Resource Description Framework (RDF)
• Each resource (thing, entity) is identified by a URI
• Entity descriptions as sets of facts
  • Triples of (subject, predicate, object)
  • A set of triples is published together in an RDF document (forming an RDF graph)
adopted from: http://www.w3.org/TR/xhtml-rdfa-primer/
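The triple model above can be sketched with plain Python tuples instead of an RDF library. This is only an illustration: the absolute `example.com` URI for Alice's post and the variable names are assumptions, loosely following the RDFa primer example used later in these slides.

```python
from collections import defaultdict

# Hypothetical URIs, chosen for illustration only.
ALICE_POST = "http://example.com/alice/posts/trouble_with_bob"
SUNSET = "http://example.com/bob/photos/sunset.jpg"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# An RDF graph is a set of (subject, predicate, object) triples.
graph = {
    (ALICE_POST, DC_TITLE, "The trouble with Bob"),
    (ALICE_POST, DC_CREATOR, "Alice"),
    (SUNSET, DC_TITLE, "Beautiful Sunset"),
    (SUNSET, DC_CREATOR, "Bob"),
}


def describe(triples):
    """Group triples by subject: each entity description is a set of facts."""
    entities = defaultdict(dict)
    for subject, predicate, obj in triples:
        entities[subject][predicate] = obj
    return dict(entities)


descriptions = describe(graph)
print(descriptions[SUNSET][DC_CREATOR])
```

Grouping by subject mirrors the slide's view of an entity description as the set of facts about one URI.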
Structured Data
Linked Data
(Figure) source: http://linkeddata.org/
Metadata
RDFa on the rise
• 510% increase between March 2009 and October 2010
(Figure: percentage of URLs with embedded metadata in various formats)
from: http://www.slideshare.net/pmika/semtech-2011-semantic-search-tutorial
Metadata
RDFa
…
<div about="/alice/posts/trouble_with_bob">
  <h2 property="dc:title">The trouble with Bob</h2>
  <h3 property="dc:creator">Alice</h3>
  Bob is a good friend of mine. We went to the same university, and also
  shared an apartment in Berlin in 2008. The trouble with Bob is that he
  takes much better photos than I do:
  <div about="http://example.com/bob/photos/sunset.jpg">
    <img src="http://example.com/bob/photos/sunset.jpg" />
    <span property="dc:title">Beautiful Sunset</span>
    by <span property="dc:creator">Bob</span>.
  </div>
</div>
…
adopted from: http://www.w3.org/TR/xhtml-rdfa-primer/
Metadata
RDFa
content: Bob is a good friend of mine. We went to the same university, and also shared an apartment in Berlin in 2008. The trouble with Bob is that he takes much better photos than I do:
adopted from: http://www.w3.org/TR/xhtml-rdfa-primer/
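To make the link between embedded markup and triples concrete, here is a toy extractor built on Python's standard `html.parser`. It is a sketch under strong assumptions, not a conformant RDFa processor: it only understands the `about` and `property` attributes, keeps prefixed names such as `dc:title` unexpanded, and the class name `SimpleRDFaParser` is made up for this example.

```python
from html.parser import HTMLParser

MARKUP = """
<div about="/alice/posts/trouble_with_bob">
  <h2 property="dc:title">The trouble with Bob</h2>
  <h3 property="dc:creator">Alice</h3>
  <div about="http://example.com/bob/photos/sunset.jpg">
    <img src="http://example.com/bob/photos/sunset.jpg" />
    <span property="dc:title">Beautiful Sunset</span>
    by <span property="dc:creator">Bob</span>.
  </div>
</div>
"""


class SimpleRDFaParser(HTMLParser):
    """Toy extractor: handles only `about` and `property` attributes."""

    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.stack = []      # one entry per open element: its `about` or None
        self.pending = None  # (subject, property) waiting for its text value
        self.triples = []

    def current_subject(self):
        # The nearest enclosing element with an `about` sets the subject.
        for subject in reversed(self.stack):
            if subject is not None:
                return subject
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in self.VOID:
            self.stack.append(attrs.get("about"))
        if "property" in attrs:
            self.pending = (self.current_subject(), attrs["property"])

    def handle_endtag(self, tag):
        if tag not in self.VOID and self.stack:
            self.stack.pop()
        self.pending = None

    def handle_data(self, data):
        if self.pending and data.strip():
            subject, prop = self.pending
            self.triples.append((subject, prop, data.strip()))
            self.pending = None


parser = SimpleRDFaParser()
parser.feed(MARKUP)
for triple in parser.triples:
    print(triple)
```

The free text of the post is ignored here; it is exactly the "content" that a combined data and document retrieval approach would index alongside the extracted triples.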
What is Semantic Search?
Structure
• Semantics
• Search tasks
  • Document, data, social media, multimedia
• Core search problems
• Semantic search exploits semantics
  • For search tasks
  • For search problems
• Many Semantic Search directions
Semantics
• Semantics is concerned with the meaning of query, data and background knowledge
• Distributional hypothesis / statistical semantics
  • “a word is characterized by the company it keeps”
  • Based on word patterns (co-occurrence frequency of the context words near a given target word)
• Explicit semantics
  • Various explicit representations of meaning
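A minimal sketch of the co-occurrence idea behind statistical semantics: for each target word, count the context words that appear within a small window around it. The toy corpus, the window size of 2, and the function name `cooccurrence` are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence(sentences, window=2):
    """Count context words within `window` positions of each target word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


# Toy corpus, invented for this example.
corpus = [
    "alice shared an apartment in berlin",
    "bob shared an apartment in berlin",
    "alice studied in karlsruhe",
]

vectors = cooccurrence(corpus)
print(vectors["apartment"].most_common(3))
```

Words that occur in similar contexts (here "alice" and "bob") end up with overlapping count vectors, which is what the distributional hypothesis exploits.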
Explicit Semantics
• Linguistic models: relationships among terms
  • Taxonomies, thesauri, dictionaries of entity names
  • Term relationships: synonymous, hyponymous, broader, narrower…
  • Examples: WordNet, Roget’s Thesaurus
• Conceptual models: relationships among classes of objects
  • Abstract and conceptual representation of data
  • Terminological part (T-Box) of ontologies, DB schema, e.g. relational model
  • Concepts, RDFS classes, associations, relationships, attributes…
  • Examples: SUMO, DBpedia
• Structured data: relationships among objects
  • Description of concrete objects
  • Assertional part of ontologies (A-Box), DB instance
  • Tuples, instances, entities, RDF resources, foreign keys, relationships, attributes,…
  • Examples: Linked Data, metadata
Search tasks – document retrieval
• Search on textual data (documents, Web pages)
• Mainly studied in the IR community
• Data and queries
  • Term-based representation
• Search algorithms
  • Retrieve documents relevant for query keywords
  • Match query terms against terms / content of documents
  • Leverage statistical semantics for dealing with ambiguity and for ranking
• Optimized, works well for navigational and topical search
  • Less so for complex information needs
• Web scale
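Term-based matching and ranking can be sketched with an inverted index and a simple TF-IDF sum. This is a deliberately small illustration, not a specific ranking model from the lecture; the three documents and the function name `search` are assumptions.

```python
import math
from collections import Counter, defaultdict

# Toy document collection, invented for this example.
docs = {
    "d1": "bob is a good friend of mine",
    "d2": "alice shared an apartment in berlin",
    "d3": "bob takes better photos in berlin",
}

# Inverted index: term -> {doc_id: term frequency}
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf


def search(query):
    """Rank documents by a simple TF-IDF sum over the query terms."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Rare terms get a higher weight than terms found in many documents.
        idf = math.log(len(docs) / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


print(search("bob berlin"))
```

The document that contains both query terms ranks first; documents matching only one term follow.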
Search tasks – data retrieval
• Focus on structured data and retrieve direct answers
• Data and queries
  • Structured models
• Search algorithms
  • Retrieve direct answers that match structured queries
  • Structure matching: term / content based relevance less the focus, but structure filtering based on joins
  • Use relational semantics in structured data
• Optimized for complex structured information needs / queries, less so for text-based relevance
• More complex processing → efficiency, scalability
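Structure matching based on joins can be sketched as the evaluation of a conjunctive query over triples: each pattern is matched against the data, and patterns are joined on shared variables. The facts, the `ex:` names, and the `?x` variable syntax are hypothetical and chosen for illustration only.

```python
# Hypothetical facts, for illustration only.
triples = [
    ("ex:p1", "ex:age", 32),
    ("ex:p1", "ex:profession", "ex:ComputerScientist"),
    ("ex:p1", "ex:livesIn", "ex:Karlsruhe"),
    ("ex:p2", "ex:age", 32),
    ("ex:p2", "ex:livesIn", "ex:Berlin"),
    ("ex:p3", "ex:profession", "ex:ComputerScientist"),
    ("ex:p3", "ex:livesIn", "ex:Karlsruhe"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # A variable must keep the same value across all patterns (join).
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def query(patterns, data):
    """Evaluate a conjunctive query by joining the patterns one by one."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in data:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# "32 year old computer scientist living in Karlsruhe"
answers = query(
    [
        ("?x", "ex:age", 32),
        ("?x", "ex:profession", "ex:ComputerScientist"),
        ("?x", "ex:livesIn", "ex:Karlsruhe"),
    ],
    triples,
)
print(answers)
```

Every answer either satisfies all patterns or is filtered out; there is no notion of partial relevance, which is why ranking plays a smaller role in this setting. The nested loops also hint at why efficiency and scalability become the main concerns.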
Search tasks
Addressing complex information needs
• Movies directed by Steven Spielberg where synopsis mentions dinosaurs.
• Publications authored by 32 year old computer scientist living in Karlsruhe, which mention Semantic Search
• Information about a friend of Alice, who shared an apartment with her in Berlin and knows someone in the field of Semantic Search working at KIT
Combination of data and document retrieval
• Structured data with textual attribute values (content, description)
• Documents with metadata
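The first example query can be sketched as a structured filter combined with keyword matching on a textual attribute. The records, titles, and the function name `hybrid_search` are placeholders invented for this example; a real system would use a proper index and ranking function.

```python
import re

# Placeholder records: structured attributes plus a textual attribute.
movies = [
    {"title": "Movie A", "director": "Steven Spielberg",
     "synopsis": "Cloned dinosaurs escape on a remote island."},
    {"title": "Movie B", "director": "Steven Spielberg",
     "synopsis": "A shark threatens a small beach town."},
    {"title": "Movie C", "director": "Another Director",
     "synopsis": "Explorers discover living dinosaurs."},
]


def hybrid_search(records, structured, keywords):
    """Filter on exact attribute values, then score by keyword matches."""
    results = []
    for record in records:
        # Data retrieval part: every structured constraint must hold.
        if any(record.get(key) != value for key, value in structured.items()):
            continue
        # Document retrieval part: count query keywords in the text.
        words = set(re.findall(r"\w+", record["synopsis"].lower()))
        score = sum(1 for keyword in keywords if keyword.lower() in words)
        if score:
            results.append((score, record["title"]))
    results.sort(key=lambda item: -item[0])
    return [title for _, title in results]


print(hybrid_search(movies, {"director": "Steven Spielberg"}, ["dinosaurs"]))
```

Neither part alone answers the query: the keyword match would also return Movie C, and the structured filter would also return Movie B.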
Search tasks
e.g. combination of data and document retrieval
• “Information about a friend of Alice, who shared an apartment with her in Berlin and knows someone in the field of Semantic Search working at KIT”.
  • <friend of Alice>
  • <shared apartment in Berlin with Alice>
  • <knows someone in the field of Semantic Search working at KIT>
(Figure: data graph with nodes Alice, Bob, Peter, Thanh, KIT, FluidOps, Semantic Search, Germany, 34, 2009, sunset.jpg “Beautiful Sunset”, and the post “trouble with bob” with its text content “Bob is a good friend of mine. We went to the same university, and also shared an apartment in Berlin in 2008. The trouble with Bob is that he takes much better photos than I do:”)
Core search problems
Term ambiguity
Query keywords: “knows someone works at KIT”, “apartment shared Berlin Alice”
(Figure: the same data graph as on the previous slide)
• Syntax: Is “BerllinNN” same as “Berlin”?
• Semantic: What is meant by “KIT”?
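Both kinds of term ambiguity can be illustrated in a few lines. Syntactic variation is handled here with approximate string matching from Python's standard `difflib`; semantic ambiguity is shown as a lookup of candidate senses. The vocabulary, the cutoff of 0.7, and the sense inventory are assumptions for illustration, not part of the lecture.

```python
import difflib

# Hypothetical vocabulary of terms found in the data.
vocabulary = ["berlin", "bern", "karlsruhe", "kit"]


def normalize(term, cutoff=0.7):
    """Syntax: map a possibly misspelled term to the closest known term."""
    matches = difflib.get_close_matches(
        term.lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


# Hypothetical sense inventory: one surface form, several meanings.
senses = {
    "kit": [
        "Karlsruhe Institute of Technology",
        "kit (set of tools or parts)",
    ],
}


def candidate_senses(term):
    """Semantics: list the candidate meanings of a term."""
    return senses.get(term.lower(), [])


print(normalize("BerllinNN"))
print(candidate_senses("KIT"))
```

String similarity resolves the first question, but choosing among the candidate senses of "KIT" needs context, for example the other query keywords or the connected data.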
Core search problems
Structure ambiguity
Query keywords: “knows someone works at KIT”, “apartment shared Berlin Alice”
(Figure: the same data graph as on the previous slide)
• What is the connection between “Berlin” and “Alice”?
• What is the relationship between “someone” and KIT?
Explicit semantics in structured data reduces structure ambiguity
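One way to see how explicit structure helps: when the data is a graph, the connection between two keyword matches can be looked up as a path. The sketch below uses breadth-first search over a handful of hypothetical edges; the `ex:` names and the `^` prefix for traversing an edge backwards are assumptions made for this example.

```python
from collections import deque

# Hypothetical edges, for illustration only.
edges = [
    ("ex:alice", "ex:sharedApartmentWith", "ex:bob"),
    ("ex:alice", "ex:livedIn", "ex:apartment1"),
    ("ex:apartment1", "ex:locatedIn", "ex:Berlin"),
    ("ex:bob", "ex:knows", "ex:person1"),
    ("ex:person1", "ex:worksAt", "ex:KIT"),
]


def find_path(graph_edges, start, goal):
    """Breadth-first search ignoring edge direction; returns labelled steps."""
    adjacency = {}
    for s, p, o in graph_edges:
        adjacency.setdefault(s, []).append((p, o))
        adjacency.setdefault(o, []).append(("^" + p, s))  # inverse direction
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [(node, predicate, neighbour)]))
    return None


for step in find_path(edges, "ex:Berlin", "ex:alice"):
    print(step)
```

In plain text the relationship between "Berlin" and "Alice" has to be guessed from proximity; in the graph it is spelled out by the predicates along the path.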