
Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension



  1. Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension. Minjoon Seo, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi

  2. Question Answering?

  3. Document (context): “Barack Obama (1961-present) was the 44th President of the United States.” Question: “When was Obama born?” Model output: 1961

  4. Extractive. Document (context): “Barack Obama (1961-present) was the 44th President of the United States.” Question: “When was Obama born?” Model output: 1961

  5. Extractive QA Datasets
     • SQuAD (Rajpurkar et al., 2016)
     • NewsQA (Trischler et al., 2016)
     • TriviaQA (Joshi et al., 2017)
     • QuAC (Choi et al., 2018)
     • CoQA (Reddy & Chen & Manning, 2018)
     • HotpotQA (Yang et al., 2018)
     • And more…

  6. Open-domain QA?

  7. Document (context): “Barack Obama (1961-present) was the 44th President of the United States.” Question: “When was Obama born?” Model output: 1961

  8. Question: “When was Obama born?” Model output: 1961

  9. 4 million documents, 3 billion tokens. 0.1 s/doc × 4M docs = 6 days!
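
The estimate on this slide is a single multiplication, and a short sketch makes it easy to rerun with other inputs. The two inputs below are the figures quoted on the slide; the variable names are illustrative only. With exactly these inputs the product is 400,000 s, a little under five days, so the slide's 6 days is best read as a rounded, order-of-magnitude figure.

```python
# Back-of-the-envelope cost of reading every document for a single question.
SECONDS_PER_DOC = 0.1          # per-document reading time quoted on the slide
NUM_DOCS = 4_000_000           # corpus size quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

total_seconds = SECONDS_PER_DOC * NUM_DOCS
total_days = total_seconds / SECONDS_PER_DAY

print(f"{total_seconds:,.0f} s is about {total_days:.1f} days per question")
```

Either way, the cost is days per question rather than seconds, which is what motivates the retrieval and indexing approaches on the following slides.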

  10. Pipelined: “When was Obama born?” → Information Retrieval (TF-IDF, BM25, LSA) → Model → 1961 (Choi et al., 2017; Chen et al., 2017; Clark & Gardner, 2017)

  11. “When was Obama born?” → Information Retrieval (TF-IDF, BM25, LSA): wrong document! → Model → 1961 (Choi et al., 2017; Chen et al., 2017; Clark & Gardner, 2017)

  12. Error propagation: “When was Obama born?” → Information Retrieval (TF-IDF, BM25, LSA): wrong document! → Model: wrong answer! → 1911 (Choi et al., 2017; Chen et al., 2017; Clark & Gardner, 2017)

  13. Ideally…

  14. “When was Obama born?” → Information Retrieval (TF-IDF, BM25, LSA) → Model → 1961

  15. “When was Obama born?” → ? → Model → 1961. End-to-end & elegant… But how?

  16. Solution: Index phrases!

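  17. The question “When was Obama born?” and the document phrases are mapped to vectors ([-3, 0.1, …], [0.5, 0.1, …], [0.3, -0.2, …], [0.5, 0.1, …], [0.7, -0.4, …], [0.5, 0.0, …], [3.3, -2.2, …]); the answer is found by nearest neighbor search over the document index.
      Document indexing:
      - Locality Sensitive Hashing
      - aLSH (Shrivastava & Li, 2014)
      - …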
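
To make the indexing idea on this slide concrete, here is a minimal sketch of one common form of Locality Sensitive Hashing: random-hyperplane hashing, which targets cosine similarity. It is not the aLSH method cited on the slide (that one addresses maximum inner product search) and it is not the authors' implementation. The toy vectors, the number of hash bits, and all function names are assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Hash a vector to a bit tuple: one sign bit per hyperplane."""
    return tuple(int(dot(h, vec) >= 0.0) for h in hyperplanes)


def build_index(vectors, hyperplanes):
    """Group vector ids by signature; this is done once, offline."""
    buckets = {}
    for i, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(i)
    return buckets


def nearest(query, vectors, buckets, hyperplanes):
    """Search only the query's bucket; fall back to a full scan if it is empty."""
    candidates = buckets.get(signature(query, hyperplanes))
    if not candidates:
        candidates = range(len(vectors))
    return max(candidates, key=lambda i: cosine(query, vectors[i]))


# Hypothetical 3-D phrase vectors; real phrase encodings are much larger.
phrase_vectors = [[-3.0, 0.1, 0.4], [0.5, 0.1, -0.2], [0.3, -0.2, 0.9], [0.7, -0.4, 0.1]]
planes = make_hyperplanes(num_bits=4, dim=3)
index = build_index(phrase_vectors, planes)
best = nearest([0.5, 0.1, -0.2], phrase_vectors, index, planes)
print(best)
```

Vectors that point in similar directions tend to share a bucket, so a query is compared against a small candidate set instead of the whole index. The trade-off is that a true nearest neighbor that lands in a different bucket is missed, which practical systems reduce by using several hash tables.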

  18. Document: “Barack Obama (1961-present) was the 44th President of the United States.”
      • Phrase encoding: “Barack Obama”, …, “(1961-present”, …, “44th President”, …, “United States.”
      • Question encoding: “Who is the 44th President of the U.S.?”, “When was Obama born?”
      • Nearest neighbor search between question encodings and phrase encodings

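  19. Model: $\hat{a} = \operatorname{argmax}_{a} F_{\theta}(a, q, d)$, where $a$ is the phrase, $q$ the question, and $d$ the document. Decompose: $\hat{a} = \operatorname{argmax}_{a} G_{\theta}(q) \cdot H_{\theta}(a, d)$, where $G_{\theta}$ is the question encoder and $H_{\theta}$ is the phrase encoder.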
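
The decomposition on this slide scores each candidate phrase by the inner product of a question vector and a phrase vector, so every phrase vector can be computed once, before any question arrives. The sketch below illustrates only that control flow. The hand-picked 3-D vectors stand in for learned encoders, and every name in it (`phrase_index`, `encode_question`, `answer`) is hypothetical.

```python
def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


# Offline step: one vector per candidate phrase, computed without seeing any question.
# These toy values are placeholders for the output of a learned phrase encoder.
phrase_index = {
    "Barack Obama": [0.9, 0.1, 0.0],
    "1961": [0.1, 0.9, 0.1],
    "44th President": [0.2, 0.1, 0.9],
}

# Online step: the question is encoded on its own, without seeing the document.
# A lookup table stands in for a learned question encoder.
TOY_QUESTION_VECTORS = {
    "When was Obama born?": [0.0, 1.0, 0.0],
    "Who is the 44th President of the U.S.?": [1.0, 0.0, 0.2],
}


def encode_question(question):
    return TOY_QUESTION_VECTORS[question]


def answer(question):
    """Return the phrase whose vector has the largest inner product with the question."""
    q = encode_question(question)
    return max(phrase_index, key=lambda phrase: inner(q, phrase_index[phrase]))


print(answer("When was Obama born?"))
print(answer("Who is the 44th President of the U.S.?"))
```

Because the two encoders never see each other's input, the exhaustive `max` here could be swapped for an approximate nearest neighbor index without changing the model.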

  20. Decomposability is a strong constraint

  21. Phrase-Indexed QA (PIQA) Challenge
      • Open-domain QA is hard to set up or evaluate
      • Instead, benchmark on existing datasets (e.g. SQuAD)
      • Create two models:
        • Phrase (document) encoder
        • Question encoder
      • Phrase encoder must be question-agnostic, and vice versa
      • Answer must be obtained via nearest neighbor search (NNS)

  22. PI-SQuAD Evaluation

  23. Is it too easy or too hard?

  24. SQuAD v1.1 (red color is phrase-indexed)
      • BERT (Devlin et al., 2018): 92% F1
      • SA+ELMo (Peters et al., 2018): 86% F1
      ↕ Decomposability gap
      • SA+ELMo (Seo et al., 2018): 64% F1
      • Feature-based (Rajpurkar et al., 2018): 50% F1

  25. SQuAD v1.1 (red color is phrase-indexed)
      • BERT (Devlin et al., 2018): 92% F1
      • SA+ELMo (Peters et al., 2018): 86% F1
      • Sparse+SA+ELMo: 70% F1
      • Match-LSTM (Wang & Jiang, 2017), first neural model: 68% F1
      • SA+ELMo (Seo et al., 2018): 64% F1
      • Feature-based (Rajpurkar et al., 2018): 50% F1

  26. Phrase Representation Learning
      • Not just about scalability, but also about comprehension
      • Standalone representations of phrases (document)
      PIQA can be viewed as:
      • A phrase embedding evaluation method
        • Sentence embedding in SNLI (Bowman et al., 2015)
      • Constructing a memory of knowledge
        • Memory Networks (Weston et al., 2014)

  27. Named Entities: “According to the American Library Association, this makes …” / “… tasked with drafting a European Charter of Human Rights, …”

  28. Lexical & Syntactic Similarity: “The LM engines were successfully test-fired and restarted, …” / “Steam turbines were extensively applied …”

  29. Syntactic Similarity: “… primarily accomplished through the ductile stretching and thinning.” / “… directly derived from the homogeneity or symmetry of space …”

  30. Demo on my MacBook. Corpus size: 300k tokens (SQuAD dev set). 16 CPUs: 100s+; GPU: 10s+

  31. A lot of things to do
      • Closing the gap due to decomposability constraint
        • BERT (Devlin et al., 2018)?
      • Reducing index storage (100TB+ for Wikipedia)
        • Reducing phrase embedding dimension (1024)
      • Extending to open-domain QA
      • Analyzing phrase representations
      • And more!
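
The storage item on this slide can be explored with simple arithmetic. The sketch below takes the slide's two figures (100 TB and dimension 1024) and adds assumptions that are not in the slides: 4 bytes per value (float32), 1 TB = 10^12 bytes, and a purely hypothetical smaller configuration of 128 dimensions at 1 byte per value. It shows how many vectors the quoted size would imply under those assumptions and how the size scales with dimension and precision.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def index_size_tb(num_vectors, dim, bytes_per_value):
    """Raw size of a dense index, ignoring any metadata or compression."""
    return num_vectors * dim * bytes_per_value / BYTES_PER_TB


# Number of 1024-d float32 vectors that would fill 100 TB (float32 is an assumption).
implied_vectors = 100 * BYTES_PER_TB // (1024 * 4)

baseline_tb = index_size_tb(implied_vectors, dim=1024, bytes_per_value=4)
smaller_tb = index_size_tb(implied_vectors, dim=128, bytes_per_value=1)  # hypothetical setting

print(f"{implied_vectors:,} vectors: {baseline_tb:.1f} TB vs {smaller_tb:.3f} TB")
```

Since the raw size is linear in both dimension and bytes per value, each of the two reductions named on the slide pays off proportionally; whether accuracy survives such a reduction is a separate question the slides leave open.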

  32. http://pi-qa.com Thank you!
