Efficient Algorithm for Answering Fact-based Queries



  1. Efficient Algorithm for Answering Fact-based Queries Using Relational Data Enriched by Context-Based Embeddings Ph.D. Dissertation Abdulaziz (Aziz) Altowayan Computer Science Pace University 12/12/2019

  2. Outline • Motivation • Problem and Challenges • Solution, Our Approach, and Validation • Building Representation Models (Altowayan and Tao, 2015), (Altowayan and Tao, 2016), and (Altowayan and Elnagar, 2017) • Models Evaluation (Altowayan and Tao, 2019) • Application • Factoid KGE QA Algorithm

  3. Demo Web App available at: http://bit.ly/KGE_QA Source code available at: https://github.com/iamaziz/kge_qa These slides are available at: http://aziz.nyc/phd/slides.pdf

  4. Where does this come from?

  5. The Knowledge Graph (KG) image credit: https://bit.ly/2RvSC1A

  6. Knowledge Graphs A KG is a graph-structured knowledge base that describes the knowledge contained in a specific domain and the relationships between the domain’s components.
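The definition above can be made concrete with a minimal sketch: a knowledge graph held as a set of (head, relation, tail) triples. The facts below are illustrative examples, not the slides' actual dataset.

```python
# A knowledge graph as a set of (head, relation, tail) triples.
# These example facts are hypothetical stand-ins for a real KG.
kg = {
    ("titanic", "written_by", "james_cameron"),
    ("titanic", "genre", "drama"),
    ("avatar", "written_by", "james_cameron"),
}

def tails(head, relation):
    """Return all tails linked to `head` by `relation`."""
    return {t for (h, r, t) in kg if h == head and r == relation}

print(tails("titanic", "written_by"))  # {'james_cameron'}
```

A set of tuples is the simplest faithful encoding; real systems index the triples by head and relation for fast lookup.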

  7. Motivation Leveraging KGs helps to understand and solve problems in various domains: • In biology: interactions between proteins and genes • In medicine: drugs and their effects • In social networks: who knows whom and where they belong • In search engines: finding relevant results • and other domains.

  8. Where do KGs come from? image credit: https://bit.ly/2r9HJYV

  9. Problem: KG Representation How to use, represent, and leverage Knowledge Graphs? Issues with representing large-scale knowledge graphs: • KGs are hard to manipulate (Bordes et al., 2015) • Large dimensions: 100K–100M entities, 10K–1M relation types • Sparse: few valid links • Noisy/incomplete: missing or wrong relations/entities

  10. Relations Representation

  11. How it all started Supporting part-of relations in ontologies.

  12. OWL: Representation

  13. OWL: Representation Simplifying part-whole relations representation Mapped to (Altowayan and Tao, 2015)

  14. OWL: Representation Inferred Ontology (Altowayan and Tao, 2015)

  15. From Ontology To KG Ontology vs. KG

  16. KG representation is an NLP problem Which NLP-modeling approach is better suited for representing KGs?

  17. Distributed Representations Neural Language Models A.K.A Word Embeddings Why? • Solve the curse-of-dimensionality problem • Capture syntactic and semantic similarity of language

  18. Representing KGs as a Vector Space Model VSM ( Embeddings )

  19. Embedding Idea Learn dense distributed representations for tokens (words). How? 1. Use neural networks to learn the embeddings 2. Assume distributional semantics. The intuition behind the Distributional Hypothesis: words that occur in similar contexts tend to have similar meanings.
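The Distributional Hypothesis can be illustrated with a toy sketch: words sharing contexts end up with similar count vectors. This is a count-based stand-in on a made-up four-sentence corpus, not the neural-network training the slides describe.

```python
from collections import Counter
from math import sqrt

# Toy corpus: "cats" and "dogs" share the contexts "chase" and "eat".
corpus = [
    "cats chase mice", "dogs chase cats",
    "cats eat fish", "dogs eat meat",
]

def context_vector(word, window=1):
    """Count words co-occurring with `word` within a +/-1 token window."""
    counts = Counter()
    for sent in corpus:
        toks = sent.split()
        for i, t in enumerate(toks):
            if t == word:
                for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                    if j != i:
                        counts[toks[j]] += 1
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Shared contexts yield a high similarity between "cats" and "dogs".
print(cosine(context_vector("cats"), context_vector("dogs")))
```

Neural embeddings learn dense low-dimensional vectors instead of sparse counts, but the underlying assumption is the same.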

  20. Do Embeddings Really Work? Feature representation: using embeddings (only) vs. hand-crafted features. (Altowayan and Tao, 2016)

  21. Improving Embeddings Task-specific models perform better than generic ones: models trained on a specific corpus outperform those trained on a generic corpus. (Altowayan and Elnagar, 2017)

  22. Building KGs Embeddings for a real-world application: Question Answering

  23. Example: Knowledge Graph • Triplet form: (Titanic, written by, James Cameron)

  24. Answering Questions with KGs We can ask questions in natural language about the knowledge contained within the knowledge graph e.g. • Who is the writer of Titanic? • Who wrote Titanic movie? • Titanic movie is written by whom? All of which can be answered by finding the associated triplet: (Titanic, written by, James Cameron)

  25. A simple Q/A example • Starting from a simple fact: • “James Cameron is the writer of the Titanic movie” • Assume that fact is captured in triplet form: • (Titanic, written_by, James_Cameron) • Ask a question: • Who is the writer of Titanic? • To find the answer: • Detect the Head/Relation in the question, e.g. Head: Titanic, Relation: written_by • Complete the pair (Titanic, written_by, ?)
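The steps above can be sketched in a few lines, assuming exact string matching in place of the embedding-based detection the system actually uses; the entity/relation lexicons here are hypothetical.

```python
# Minimal Q/A sketch: detect Head and Relation, then complete the pair.
facts = {("titanic", "written_by"): "james_cameron"}
entities = {"titanic"}
relations = {"writer": "written_by", "wrote": "written_by"}

def answer(question):
    toks = question.lower().strip("?").split()
    head = next((t for t in toks if t in entities), None)             # detect Head
    rel = next((relations[t] for t in toks if t in relations), None)  # detect Relation
    if head and rel:
        return facts.get((head, rel))                                 # (Head, Relation, ?)
    return None

print(answer("Who is the writer of Titanic?"))  # james_cameron
```

All three phrasings of the question on the previous slide reduce to the same (Head, Relation) pair, which is why one triplet answers them all.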

  26. KGE QA System Factoid-Question Answering System Based on Knowledge Graph Embeddings. We present a new algorithm for answering factual questions from Knowledge Graphs. We build two embedding models from the knowledge graph: one for Named Entity Recognition and the other for Relation Extraction.

  27. Our approach: High-level

  28. Dataset: Domain Knowledge We use the benchmark FB15K dataset, filtered to keep four domains.

  29. Data Conversion (head, relation, tail) Raw FB15K triplet: /m/0dr_4 /film/film/written_by /m/03_gd Cleaned FB15K triplet: titanic written_by james_cameron Entity descriptions data source: https://github.com/xrb92/DKRL
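The conversion step can be sketched as follows: map Freebase machine IDs (/m/...) to readable names and keep only the last segment of the relation path. The mapping dict here is a hypothetical stand-in for the DKRL description data.

```python
# Hypothetical ID-to-name mapping (the slides use DKRL description data).
mid2name = {"/m/0dr_4": "titanic", "/m/03_gd": "james_cameron"}

def clean_triplet(raw):
    """Turn a raw tab-separated FB15K line into a readable triple."""
    head, relation, tail = raw.split("\t")
    rel_name = relation.rsplit("/", 1)[-1]  # /film/film/written_by -> written_by
    return (mid2name.get(head, head), rel_name, mid2name.get(tail, tail))

print(clean_triplet("/m/0dr_4\t/film/film/written_by\t/m/03_gd"))
# ('titanic', 'written_by', 'james_cameron')
```

Unmapped IDs fall through unchanged, which makes missing mapping entries easy to spot in the cleaned data.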

  30. Building ENT/REL models

  31. Answering Pipeline

  32. Evaluating Pre-trained Models (Altowayan and Tao, 2019)

  33. KGE QA Algorithm

  34. Determining Token Types Given an input token, we decide its type based on its closest token vector from ENT.vec and REL.vec as follows: 1) neighbors = closest_entities(token) + closest_relations(token) 2) closest_neighbor = max(similarity(neighbors)) if similarity ≥ MAX_CONFIDENCE; otherwise max((SM(neighbors) + similarity(neighbors)) / 2) 3) token_type = type(closest_neighbor) if similarity(closest_neighbor) ≥ MIN_CONFIDENCE; otherwise OTHER where similarity is CosineSimilarity and SM is SequenceMatcher.
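A sketch of this token-typing rule, using cosine similarity over toy 2-d vectors; the ENT/REL vectors and the MAX_CONFIDENCE/MIN_CONFIDENCE values are illustrative assumptions, not the system's trained models or tuned thresholds.

```python
from difflib import SequenceMatcher

MAX_CONFIDENCE, MIN_CONFIDENCE = 0.9, 0.5

# Toy embedding spaces standing in for ENT.vec and REL.vec.
ENT = {"titanic": (1.0, 0.1)}
REL = {"written_by": (0.1, 1.0)}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: sum(x * x for x in v) ** 0.5
    return dot / (norm(a) * norm(b))

def token_type(token, vec):
    # 1) candidate neighbors from both models
    neighbors = [(name, space) for space, model in (("ENT", ENT), ("REL", REL))
                 for name in model]
    def score(name, space):
        model = ENT if space == "ENT" else REL
        sim = cosine(vec, model[name])
        if sim >= MAX_CONFIDENCE:       # 2) embedding similarity alone...
            return sim
        sm = SequenceMatcher(None, token, name).ratio()
        return (sim + sm) / 2           #    ...else average with string match
    name, space = max(neighbors, key=lambda ns: score(*ns))
    # 3) fall back to OTHER below the minimum confidence
    return space if score(name, space) >= MIN_CONFIDENCE else "OTHER"

print(token_type("titanic", (0.95, 0.15)))  # ENT
```

Blending cosine similarity with SequenceMatcher covers the case where a token (e.g. a typo) has a weak embedding match but a strong surface-form match.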

  35. KGE QA: Main UI

  36. Demo: example answer INPUT : who is the writer of Troy movie?

  37. Demo: visualizing similarities INPUT : who is the writer of Troy movie?

  38. Demo: visualizing similarities INPUT : who is the writer of Troy movie?

  39. Demo: under the hood INPUT : who is the writer of Troy movie? Logging output of the system while answering a question

  40. KGE QA is Customizable Works with any customized KG “dataset” 1) Create your own dataset 2) Build KGE for new domain knowledge 3) Ask questions
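Step 1 above amounts to writing your own triples in the cleaned tab-separated format shown earlier. A minimal sketch; the file layout is an assumption, not the repo's exact spec.

```python
import csv
import io

# Hypothetical custom-domain facts in (head, relation, tail) form.
triples = [
    ("pace_university", "located_in", "new_york"),
    ("titanic", "written_by", "james_cameron"),
]

# Write them as tab-separated lines, one triple per line.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t")
writer.writerows(triples)
print(buf.getvalue())
```

From such a file, steps 2 and 3 proceed as before: build the ENT/REL embedding models for the new domain, then ask questions against them.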

  41. Assumption: How To Ask Questions

  42. KGE QA: responses 1) Answer found 2) No Entity/Relation detected in the question 3) Entity and Relation detected, but no corresponding fact in the KG

  43. Sample answers

  44. KGE QA: strengths • Captures variations of word senses in ENT/REL (e.g. influences -> influenced) • Supports one-to-many results (e.g. people born in nyc yields 209 results) • Handles typos and is letter-case agnostic • Supports semantic (meaning) similarity of words (e.g. belongs to -> located_in) Also... • Works with any customized KG dataset

  45. KGE QA: limitations • Ask one question (fact) at a time • Assumes Head/Relation to be present in the question • One-direction relations (inverses are not inferred, e.g. A has_part B does not imply B partOf A) • No nested answers (single relations only) • Sensitive to noisy data (e.g. when similar Head/Relation appear in the question)

  46. KGE QA: challenges • Acquiring clean data • Converting IDs to actual nouns in benchmark datasets (mapping data is hard to find) • Choosing threshold values (for determining token types) can be tricky • i.e. MAX_CONFIDENCE and MIN_CONFIDENCE

  47. Other Applications The KGE QA approach applies to other applications as well. For example, in search engines, it can find relevant results by linking to the most similar keywords in other webpages.

  48. Fun stats KGE QA source code:

  49. Fun stats Word count of the dissertation’s chapters:

  50. Fun stats The dissertation was written and maintained using Git version control with Markdown and LaTeX: Total commits: 230 First commit: Sat Feb 18 16:51:21 2017 Last commit: Tue Dec 10 14:55:27 2019

  51. Related Publications Altowayan, A. Aziz, and Tao, Lixin (2015). “Simplified approach for representing part-whole relations in OWL-DL ontologies.” 2015 IEEE 12th International Conference on Embedded Software and Systems. IEEE, 2015. Altowayan, A. Aziz, and Tao, Lixin (2016). “Word embeddings for Arabic sentiment analysis.” 2016 IEEE International Conference on Big Data (Big Data). IEEE, 2016. Altowayan, A. Aziz, and Elnagar, Ashraf (2017). “Improving Arabic sentiment analysis with sentiment-specific embeddings.” 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017. Altowayan, A. Aziz, and Tao, Lixin (2019). “Evaluating Word Similarity Measure of Embeddings Through Binary Classification.” Journal of Computer Science Research (JCSR), Nov. 2019.

  52. Thank you Q/A
