Semantic Question Answering on Big Data
Tatiana Erekhinskaya
July 2016
The Goal
Challenge:
• Find answers to complex questions in large structured and unstructured data resources
• Sample question: List Chinese researchers who worked with Kuznetsov, have publications on the Zika virus, and studied in the US
Solution:
• Convert data into an RDF store
• Convert questions into SPARQL (see the sketch below)
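To make the target concrete, here is a minimal sketch of what the sample question might look like in SPARQL, written as a Python string for later execution against the store. The vocabulary (ex:nationality, ex:collaboratedWith, ex:hasPublicationOn, ex:studiedIn) is hypothetical, invented for illustration rather than taken from the system's actual schema.

```python
# Hypothetical target SPARQL for the sample question; the ex: properties
# are placeholders, not the system's real vocabulary.
SAMPLE_QUERY = """
PREFIX ex: <http://example.org/qa#>
SELECT DISTINCT ?researcher WHERE {
  ?researcher ex:nationality      ex:Chinese ;
              ex:collaboratedWith ex:Kuznetsov ;
              ex:hasPublicationOn ex:ZikaVirus ;
              ex:studiedIn        ex:US .
}
"""
```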
Outline
• System Architecture
• NLP & Semantic Parsing
• RDF Representation
• Plain English Query to SPARQL
• Experiments & Results
• Use Cases & Future Work
System Architecture
[Architecture diagram: documents pass through deep NLP and knowledge extraction into an RDF store; a user's input question passes through question processing and semantic parsing and is converted to a SPARQL query against the store.]
Natural Language Processing
Concept Extraction
• Hybrid approach combines machine learning classifiers, a cascade of finite-state automata, and lexicons (see the sketch below)
• Uses existing medical ontologies: MeSH, SNOMED, and the UMLS Metathesaurus
• 80+ types of named entities: demographics, disease, symptom, dosage, severity, time course, onset, alleviating and aggravating factors
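A minimal sketch of the lexicon and finite-state components of such a hybrid pipeline. The lexicon entries and the DOSAGE pattern are illustrative assumptions, not terms taken from MeSH, SNOMED, or UMLS, and a real system would add the machine learning classifiers on top.

```python
import re

# Illustrative lexicon; a real system would load MeSH/SNOMED/UMLS terms.
LEXICON = {
    "type 2 diabetes": "DISEASE",
    "zika virus": "DISEASE",
    "headache": "SYMPTOM",
}

# One pattern standing in for an automaton in the cascade:
# dosages like "500 mg" or "2 tablets".
DOSAGE = re.compile(r"\b\d+(\.\d+)?\s*(mg|ml|g|tablets?)\b", re.IGNORECASE)

def extract_concepts(text):
    """Tag spans via lexicon lookup, then finite-state pattern matching."""
    found = []
    lowered = text.lower()
    for term, label in LEXICON.items():
        start = lowered.find(term)
        if start != -1:
            found.append((text[start:start + len(term)], label))
    for match in DOSAGE.finditer(text):
        found.append((match.group(0), "DOSAGE"))
    return found

print(extract_concepts("100 subjects with type 2 diabetes took 500 mg daily"))
# [('type 2 diabetes', 'DISEASE'), ('500 mg', 'DOSAGE')]
```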
Semantic Parsing
• Extracts 26 predefined binary relation types: AGENT, THEME, LOCATION, TIME, etc.
• Maximum granularity, not limited to verb arguments: VALUE, PROPERTY, QUANTITY
• Robust basic representation, not intended for end users
[Example: "100 subjects with type 2 diabetes" annotated with QUANTITY, POSSESSION, VALUE, and PROPERTY relations.]
Semantic Calculus
• Defines how and under what conditions a chain of relations can be combined into a high-level custom relation
• Sample axiom (see the sketch below): POSSESSION(c1, c2) & ISA(c1, disease) & ISA(c2, organism) → HAS_DISEASE(c1, c2)
[Example: "100 subjects with type 2 diabetes" annotated with QUANTITY, SEVERITY, and the derived HAS_DISEASE relation.]
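A minimal sketch of how such an axiom could be applied over the relations produced by semantic parsing. The tuple-based relation encoding is an illustrative assumption, not the system's actual formalism.

```python
# Relations from semantic parsing, as (relation, arg1, arg2) triples.
# Hypothetical example for "100 subjects with type 2 diabetes".
relations = {
    ("POSSESSION", "type 2 diabetes", "subjects"),
    ("ISA", "type 2 diabetes", "disease"),
    ("ISA", "subjects", "organism"),
    ("QUANTITY", "100", "subjects"),
}

def apply_has_disease_axiom(rels):
    """POSSESSION(c1, c2) & ISA(c1, disease) & ISA(c2, organism) -> HAS_DISEASE(c1, c2)."""
    derived = set()
    for rel, c1, c2 in rels:
        if (rel == "POSSESSION"
                and ("ISA", c1, "disease") in rels
                and ("ISA", c2, "organism") in rels):
            derived.add(("HAS_DISEASE", c1, c2))
    return derived

print(apply_has_disease_axiom(relations))
# {('HAS_DISEASE', 'type 2 diabetes', 'subjects')}
```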
RDF & SPARQL
RDF Representation
• 6.3 MB of text → 13 M triples, 1 GB of RDF/XML
• Keep only relations of interest and tokens that participate in these relations
• For tokens: named entity type or is-event flag, lemma, synset, and reference sentence (see the sketch below)
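A minimal rdflib sketch of what per-token triples might look like under this scheme. The namespace and the property names (hasLemma, hasNEType, hasSynset, fromSentence) are hypothetical, chosen only to illustrate the shape of the data.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/qa#")    # hypothetical namespace
g = Graph()

token = EX["token_42"]                      # a token node kept because it
g.add((token, RDF.type, EX.Token))          # participates in a relation
g.add((token, EX.hasLemma, Literal("diabetes")))
g.add((token, EX.hasNEType, Literal("DISEASE")))
g.add((token, EX.hasSynset, Literal("diabetes%1:26:00::")))  # illustrative sense key
g.add((token, EX.fromSentence, EX["sentence_7"]))

# A semantic relation of interest between two tokens.
g.add((EX["token_41"], EX.POSSESSION, token))

print(g.serialize(format="turtle"))
```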
Reasoning on the RDF Store
• OWLPrime inference
• owl:sameAs links between co-referring mentions (see the sketch below)
• Lexical chains: WordNet-based relation sequences
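A minimal sketch of the kind of merging that owl:sameAs reasoning performs, here as a naive symmetric-transitive closure over mention pairs. A real deployment would delegate this to the store's OWLPrime reasoner rather than hand-rolling it; the mention identifiers are invented.

```python
# Hypothetical owl:sameAs links between mention nodes in the store.
same_as = {("m1", "m2"), ("m2", "m3"), ("m4", "m5")}

def same_as_closure(pairs):
    """Group mentions into equivalence classes (symmetric + transitive closure)."""
    groups = []
    for a, b in pairs:
        hits = [g for g in groups if a in g or b in g]
        merged = {a, b}.union(*hits) if hits else {a, b}
        groups = [g for g in groups if g not in hits] + [merged]
    return groups

for group in same_as_closure(same_as):
    print(sorted(group))
# ['m1', 'm2', 'm3'] and ['m4', 'm5'] (group order may vary)
```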
Question Processing
• Full NLP & semantic parsing
• Expected answer type recognition (_human or organization, _date or _time, etc.)
• Answer type terms, e.g. "which cartel"
• Maximum entropy model (see the sketch below)
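Maximum entropy classification is equivalent to (multinomial) logistic regression, so a minimal sketch of expected-answer-type recognition could look like the following. The toy training questions and the label strings are invented for illustration; the real model's features and label set are not described at this granularity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data (invented); a real model would use many annotated questions.
questions = [
    "who founded the cartel",
    "which organization distributes the drug",
    "when was heroin first synthesized",
    "what date did the seizure occur",
]
answer_types = ["_human_or_organization", "_human_or_organization",
                "_date_or_time", "_date_or_time"]

# Logistic regression == maximum entropy model.
model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(questions, answer_types)

print(model.predict(["which cartel smuggled cocaine"]))
# ['_human_or_organization'] (on this toy data)
```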
SPARQL Query Formulation
Query Relaxation
• Synset relaxation: include hyponyms, parts, and derivations (see the sketch below)
• On empty results: drop variable-description triples and semantic relations of little importance
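A minimal sketch of the synset-relaxation step using WordNet via NLTK; how the expanded terms are folded back into the SPARQL query is omitted. The drop-and-retry step on empty results would then remove low-importance triple patterns from the WHERE clause and re-issue the query.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def relax_term(word):
    """Expand a query term with hyponyms, parts, and derivational forms."""
    expanded = {word}
    for synset in wn.synsets(word):
        for hypo in synset.hyponyms():                      # more specific terms
            expanded.update(l.name() for l in hypo.lemmas())
        for part in synset.part_meronyms():                 # parts
            expanded.update(l.name() for l in part.lemmas())
        for lemma in synset.lemmas():                       # derivations
            expanded.update(d.name() for d in lemma.derivationally_related_forms())
    return expanded

print(sorted(relax_term("drug"))[:10])  # includes specific drug types
```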
Experiments & Results
Experimental Data
• Illicit Drugs domain
• 584 documents: Wikipedia articles plus other documents
• 6.3 MB of plain text
• 6,729,854 RDF triples
• 546 MB of RDF/XML
Results: Question Answering
344 questions
• Free-text search: 47% MRR
• Semantic approach: 66% MRR
• Factoid questions: 85% MRR
• Definition questions: 78% MRR
• List questions: 68% MRR
Results: NL to SPARQL
34 manually annotated questions
• SELECT clauses correct: 85%
• WHERE clauses correct at the triple level: 78%
• WHERE clauses correct at the question level: 65%
• Relaxation used for 68% of queries; synset relaxation alone sufficient for 31%
Error Analysis
• 73% of errors caused by faulty or missing semantic relations
• 16% caused by query conversion: yes/no questions and procedural questions
Conclusion
Use Cases:
• Processing PubMed for quality measures
• National security: terrorism, law enforcement
• Foreign languages
Future Work:
• Integration with Linked Data
• Rapid customization