Question Answering Biographic Information and Social Networks Powered by the Semantic Web
Peter Adolphs, Xiwen Cheng, Tina Klüwer, Hans Uszkoreit & Feiyu Xu
German Research Center for Artificial Intelligence (DFKI), Language Technology Lab
Presenter: Peter Adolphs, peter.adolphs@dfki.de
LREC 2010 • Valletta, Malta • 20 May 2010

Motivation
Semantic Web:
– “The Semantic Web will bring structure to the meaningful content of Web pages” (Berners-Lee et al., 2001)
– Today: genuine Semantic Web resources + Semantic Web versions of large, sometimes community-driven databases and websites
Our questions:
– How can we use these data in knowledge-intensive AI applications?
– How can we acquire such data from the Web?
– How can we make Semantic Web data accessible to human users?
[Figure: Linked Data visualization from http://linkeddata.org/]

Gossip Galore
A user-friendly natural language interface to biographical information
[Figure: Embodied Conversational Agent Gossip Galore]
Q/A methods employed:
– Semantic Knowledge Encoding and Retrieval
– Natural Language Query Analysis
– Multimodal Answer Generation
– Finite-State Dialogue Models

Architecture
Two major parts:
– Knowledge Management Components (yellow)
– Dialogue-Enabled Question Answering Components (green)
Interface between the components: Knowledge Base

Part 1: Knowledge Acquisition

Knowledge Acquisition from the Web
Different kinds of knowledge sources
– Information offered in structured form (e.g. as SQL or RDF exports)
– Information provided in semi-structured form on web pages (e.g. price tables for products, info boxes in Wikipedia, etc.)
– Free natural-language text
Different approaches for these sources
– Structured data can be used more or less directly
– Information Wrapping for accessing semi-structured web pages
– Information Extraction for free natural-language text

Information Merging
Procedure:
– Instances with the same referent have to be identified
– Knowledge bases are then merged by graph union
Semantic Web:
– RDF provides a simple framework for such a scenario
– Ideal for fragmentary data as delivered by Information Extraction
– Missing data can sometimes be inferred from fragmentary data using domain models

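The two-step procedure on this slide (identify co-referent instances, then take the graph union) can be sketched with plain Python sets standing in for RDF graphs. This is a minimal illustration rather than the project's implementation: the identifiers (`kb1:p1`, `kb2:p42`, `kb2:city7`), the `same_as` mapping and the function names are all hypothetical.

```python
from typing import Dict, Set, Tuple

# A triple is (subject, predicate, object); a knowledge base is a set of triples.
Triple = Tuple[str, str, str]


def canonicalize(kb: Set[Triple], same_as: Dict[str, str]) -> Set[Triple]:
    """Rewrite every identifier with a known co-referent to its canonical form."""
    return {(same_as.get(s, s), p, same_as.get(o, o)) for (s, p, o) in kb}


def merge(kb_a: Set[Triple], kb_b: Set[Triple], same_as: Dict[str, str]) -> Set[Triple]:
    """Merge two knowledge bases by graph union after identity resolution."""
    return canonicalize(kb_a, same_as) | canonicalize(kb_b, same_as)


# Hypothetical fragments from two sources that describe the same person.
kb_a = {("kb1:p1", "hasSibling", "kb1:p2")}
kb_b = {("kb2:p42", "bornIn", "kb2:city7")}

# Output of the identity resolution step: kb2:p42 refers to the same person as kb1:p1.
same_as = {"kb2:p42": "kb1:p1"}

merged = merge(kb_a, kb_b, same_as)
```

After the merge, both facts hang off the single identifier `kb1:p1`, which is what makes fragmentary data from different sources usable together.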
RASCALLI Gossip Knowledge Base
Knowledge Base (KB) about people in the pop music domain
Populated using
– Information Wrapping from semi-structured web sites such as Wikipedia and NNDB
– Minimally supervised relation extraction with DARE from raw text
Entities:
– 38,758 people including 16,532 artists
– 1,407 music groups
Relations:
– 14,909 parent-child
– 16,886 partner
– 4,214 sibling
– 308 influence/influenced
– 9,657 group membership

Relation Extraction with DARE
DARE: Domain Adaptive Relation Extraction Based on Seeds
General framework for automatically learning mappings between linguistic analyses and target semantic relations with minimal human intervention (Xu et al., 2008; Xu, 2007)
[Figure: dependency structure with the node labels verb, subject, object, head and mod]

Relation Extraction with DARE
Rule learning with bootstrapping (sketch):
– Use confirmed relation instances as seed data
– Find mentions of the seed in the text
– Bottom-up extraction of all patterns for the i-ary projections of the target relation (1 ≤ i < n)
– Extract further relation instances with the new rules and use these as seeds in the next iteration
[Figure: Relation instances (e), mentions (m), rules (r)]

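The bootstrapping loop on this slide can be illustrated with a heavily simplified toy. It is not DARE itself: DARE learns patterns over linguistic analyses and handles projections of n-ary relations, whereas this sketch assumes a binary relation, uses the surface string between the two arguments as the "rule", and uses a capitalized-word regex as a stand-in for named-entity recognition. All names and sentences are invented for illustration.

```python
import re

# Toy stand-in for a named-entity recognizer: one or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"


def learn_rules(sentences, seeds):
    """Find mentions of seed pairs and keep the text between the two arguments."""
    rules = set()
    for sentence in sentences:
        for a, b in seeds:
            i = sentence.find(a)
            j = sentence.find(b, i + len(a)) if i >= 0 else -1
            if j >= 0:
                rules.add(sentence[i + len(a):j])
    return rules


def apply_rules(sentences, rules):
    """Extract every argument pair that is connected by one of the learned rules."""
    found = set()
    for rule in rules:
        regex = re.compile(NAME + re.escape(rule) + NAME)
        for sentence in sentences:
            for match in regex.finditer(sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(sentences, seeds, max_iterations=10):
    """Alternate rule learning and extraction until no new instances appear."""
    instances = set(seeds)
    for _ in range(max_iterations):
        rules = learn_rules(sentences, instances)
        new = apply_rules(sentences, rules) - instances
        if not new:
            break
        instances |= new  # new instances act as seeds in the next iteration
    return instances


# Invented corpus and seed for a hypothetical "partner" relation.
corpus = [
    "Anna Berg is married to Carl Dorn.",
    "Eva Falk is married to Gustav Holm.",
    "Eva Falk and her husband Gustav Holm attended the gala.",
    "Ida Jansen and her husband Karl Lund attended the gala.",
]
result = bootstrap(corpus, {("Anna Berg", "Carl Dorn")})
```

Starting from one seed, the first iteration learns " is married to " and finds the second couple; that couple then yields " and her husband ", which finds the third. A real system would also need to score rules and instances, since unfiltered bootstrapping tends to drift.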
Merging with YAGO
YAGO is a huge semantic knowledge base, being developed by the group of Gerhard Weikum at Max-Planck-Institute Saarbrücken
Automatically constructed from the semi-structured parts of Wikipedia (infoboxes) and the taxonomic structure of WordNet
Made available in RDF format (among others)
Currently YAGO knows
– more than 2 million entities (like persons, organizations, cities, etc.)
– 20 million relations
We mainly use facts about persons, such as
– full name, given name
– bornIn, bornOnDate, diedIn, diedOnDate
– actedIn, created, directed, discovered, graduatedFrom, interestedIn, isCitizenOf, participatedIn, produced, worksAt, wrote

Merging with YAGO: Identity Resolution
Merging rules operating on name and full name from Rascalli, full name and given name from YAGO (<Rascalli Name, Rascalli Full Name, Yago Full Name, Yago Given Name>)
– Rascalli Name == Yago Full Name, e.g. <"Clarence Brown"; "Clarence Leon Brown"; "Clarence Brown"; "Clarence">
– Rascalli Full Name == Yago Full Name, e.g. <"Lord Haw-Haw"; "William Joyce"; "William Joyce"; "William">
+ additional info if necessary, e.g.: Rascalli Name == Yago Given Name && Rascalli Birthday == Yago bornOnDate
Dealing with fragmentary name information (culture-dependent heuristics)
Siblings sharing the same surname could have the same parents, e.g.
• Julia Roberts hasParent Walter Roberts;
• Eric Roberts hasParent Walter;
• Julia Roberts hasSibling Eric Roberts;
Hence: Walter == Walter Roberts
A couple could have the same children, e.g.
• Madonna hasChild Rocco;
• Guy Ritchie hasChild Rocco Ritchie;
• Madonna hasHusband Guy Ritchie;
Hence: Rocco == Rocco Ritchie

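The three name-based merging rules on this slide can be written down as a small predicate. This is a sketch under assumptions: the record types and field names (`RascalliPerson`, `YagoPerson`, `birthday`, `born_on_date`) are hypothetical, the rules are applied in the order listed, and the sibling and couple heuristics are left out. The first two test records come from the slide; the third is invented.

```python
from typing import NamedTuple, Optional


class RascalliPerson(NamedTuple):
    name: str
    full_name: str
    birthday: Optional[str] = None  # hypothetical field, ISO date string


class YagoPerson(NamedTuple):
    full_name: str
    given_name: str
    born_on_date: Optional[str] = None  # value of YAGO's bornOnDate


def same_person(r: RascalliPerson, y: YagoPerson) -> bool:
    """Apply the name-based merging rules in the order listed on the slide."""
    # Rule 1: Rascalli Name == Yago Full Name
    if r.name == y.full_name:
        return True
    # Rule 2: Rascalli Full Name == Yago Full Name
    if r.full_name == y.full_name:
        return True
    # Rule 3: a given-name match alone is too weak, so also require equal birth dates
    return (
        r.name == y.given_name
        and r.birthday is not None
        and r.birthday == y.born_on_date
    )


# Examples from the slide.
brown = same_person(
    RascalliPerson("Clarence Brown", "Clarence Leon Brown"),
    YagoPerson("Clarence Brown", "Clarence"),
)
joyce = same_person(
    RascalliPerson("Lord Haw-Haw", "William Joyce"),
    YagoPerson("William Joyce", "William"),
)

# Invented record pair exercising rule 3.
alice = same_person(
    RascalliPerson("Alice", "Alice A.", birthday="1970-01-01"),
    YagoPerson("Alice Example", "Alice", born_on_date="1970-01-01"),
)
```

Requiring a second attribute in rule 3 reflects the "+ additional info if necessary" note: the weaker the name evidence, the more corroboration the match needs.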
Merged Knowledge Base
Entities:
– People: 618,445
– Published: 50,601
– Movies: 34,458
– Locations: 20,733
Relations:
hasChild = 6868; hasSon = 4067; hasDaughter = 2775
hasParent = 12594; hasMother = 3383; hasFather = 4219
hasSibling = 2076; hasBrother = 2076; hasSister = 1100
hasPartner = 18793; hasSpouse = 16323; hasHusband = 7034; hasWife = 6458
hasBoyFriend = 1962; hasGirlFriend = 2076
bornIn = 44339; bornOnDate = 442319; diedIn = 15886; diedOnDate = 205808
causeOfDeath = 1888; hasRemain = 803
originatedFrom = 11693; livesIn = 14707; isCitizenOf = 4865; hasNationality = 8256
hasGender = 30815; hasSexualOrientation = 8560; hasReligion = 1533
hasProfession = 8596; hasPartyAffliation = 268; worksAt = 1401
graduatedFrom = 4968; academicAdvisor = 1307
hasMember = 1407; isMemberOf = 8924
actedIn = 14088; created = 22473; directed = 5859; produced = 9706; wrote = 4152
discovered = 75; hasAlbum = 2663; madeCoverFor = 257; hasWonPrize = 16967
influences = 3043; interestedIn = 1806; participatedIn = 1158; hasWebsite = 118211

Part 2: Dialog Processing

Input Analysis
Q/A on RDF data is the task of mapping linguistic predicates and arguments to underspecified query graphs
We support wh-, yes/no and how-many questions involving exactly one query triple
Approach: a linguistic input analysis component, which...
– Gets the user input
– Processes the dependency structure belonging to the input
– Delivers a semantic representation belonging to the dependency structure
– Ensures robustness via an additional string-pattern-based component

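To make "exactly one query triple" concrete, the sketch below answers the three supported question types against a small set of triples. It is an illustration only: the knowledge base content, the identifiers (`g:Person.1` and so on), the use of `None` for an underspecified slot and the function names are assumptions, and a real system would query an RDF store instead.

```python
def match(kb, query):
    """Return all triples compatible with the query; None marks an open slot."""
    return [
        triple
        for triple in kb
        if all(q is None or q == t for q, t in zip(query, triple))
    ]


def answer(kb, question_type, query):
    """Answer a wh-, yes/no or how-many question given as a single query triple."""
    hits = match(kb, query)
    if question_type == "yes/no":
        return bool(hits)
    if question_type == "how many":
        return len(hits)
    if question_type == "wh":
        # Collect the values that fill the open slots, sorted for stable output.
        return sorted({t[i] for t in hits for i, q in enumerate(query) if q is None})
    raise ValueError("unsupported question type: " + question_type)


# Hypothetical knowledge base fragment.
kb = {
    ("g:Person.1", "hasParent", "g:Person.2"),
    ("g:Person.1", "hasParent", "g:Person.3"),
    ("g:Person.2", "hasSpouse", "g:Person.3"),
}

parents = answer(kb, "wh", ("g:Person.1", "hasParent", None))
parent_count = answer(kb, "how many", ("g:Person.1", "hasParent", None))
is_spouse = answer(kb, "yes/no", ("g:Person.2", "hasSpouse", "g:Person.3"))
```

The same underspecified triple serves both the wh- and the how-many question; only the way the matches are reported differs.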
Concept Identification
NER as a bridge from surface strings to semantic concepts
Gazetteers are derived from the Knowledge Base, associating names and words with ontology instance identifiers
Examples:
– “Richard Gere” → g:Person.8134
– “Deep Purple” → g:Group.1358
– “buddhist” → g:Religion.3367
[Figure: Knowledge Base, gazetteer, NER]

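A gazetteer lookup of this kind can be sketched as a dictionary from token sequences to instance identifiers with longest-match-first scanning. The three entries are the examples from the slide, with identifiers written without internal spaces as an assumption about the original notation. The function names, the tokenization and the one-identifier-per-name simplification are hypothetical; real names are often ambiguous.

```python
import re


def build_gazetteer(labels):
    """Map lowercased token tuples to instance identifiers.

    labels: iterable of (identifier, surface name) pairs exported from the KB.
    """
    return {tuple(name.lower().split()): identifier for identifier, name in labels}


def annotate(text, gazetteer):
    """Scan left to right, preferring the longest gazetteer entry at each position."""
    tokens = re.findall(r"\w+", text.lower())
    max_len = max((len(key) for key in gazetteer), default=0)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                found.append((" ".join(key), gazetteer[key]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return found


gazetteer = build_gazetteer([
    ("g:Person.8134", "Richard Gere"),
    ("g:Group.1358", "Deep Purple"),
    ("g:Religion.3367", "buddhist"),
])
concepts = annotate("Is Richard Gere a Buddhist?", gazetteer)
```

Here `concepts` holds the pairs `("richard gere", "g:Person.8134")` and `("buddhist", "g:Religion.3367")`, which later stages can use as arguments of a query triple.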
Robust Input Processing
Hybrid approach to robust input processing
Cascaded input processors, currently:
– Dependency parsing
– Fuzzy string matching baseline
Using dependency patterns for input analysis, the 1067 paraphrases for the string matching baseline could be reduced to 212 dependency tree patterns
E.g. “Who are the parents of Mick Jagger?”
[Figure: dependency tree pattern with the edges nsubj, attr, det and prep_of over the nodes personY, (parent | mother | father | …), the, personX]

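The cascade can be sketched as a list of processors tried in order, with fuzzy string matching as the fallback. Several things here are assumptions: the dependency processor is only a placeholder that never matches (a real one needs a parser and the pattern set), the paraphrase table has two invented entries instead of 1067, the entity name is assumed to have been replaced by the slot `personx` beforehand, and the similarity cutoff of 0.8 is arbitrary. The fuzzy step uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical paraphrase table: normalized question string -> query triple.
PARAPHRASES = {
    "who are the parents of personx": ("personX", "hasParent", None),
    "who is the mother of personx": ("personX", "hasMother", None),
}


def dependency_analysis(text):
    """Placeholder for the dependency-pattern processor; None means no pattern applied."""
    return None


def fuzzy_analysis(text, cutoff=0.8):
    """Fallback: pick the most similar known paraphrase, if it is similar enough."""
    candidates = difflib.get_close_matches(
        text.lower(), list(PARAPHRASES), n=1, cutoff=cutoff
    )
    return PARAPHRASES[candidates[0]] if candidates else None


def analyse(text, processors=(dependency_analysis, fuzzy_analysis)):
    """Run the processors in order and return the first successful analysis."""
    for processor in processors:
        result = processor(text)
        if result is not None:
            return result
    return None


# A misspelled question is still mapped to the parent query by the fallback.
query = analyse("who are the parnts of personx")
```

Ordering the processors from most to least precise means the string baseline only decides when the structural analysis has nothing to offer.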