Real-Time Open-Domain QA with Dense-Sparse Phrase Index
Minjoon Seo*, Jinhyuk Lee*, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi
(* denotes equal contribution)
Open-domain QA?
[Figure: a QA model answers "When was Obama born?" with "1961" directly over all of Wikipedia: 5 million documents, 3 billion tokens.]
Retrieve & Read (Chen et al., 2017)
[Figure: Information Retrieval (TF-IDF, BM-25, LSA) retrieves documents, then a Reader model extracts the answer: "When was Obama born?" yields "1961".]
1. Error propagation: the reader sees only 5-10 retrieved docs
2. Query-dependent encoding: 30s+ per query
We want…
• To "read" the entire Wikipedia: 5-10 docs → 5 million docs
• Reach long-tail answers
• Fast inference on CPUs: 35s → 0.5s
• Maintain high accuracy
HOW?
Our approach: index phrases!
Phrase Indexing (Seo et al., 2018)
[Figure: every phrase in "Barack Obama (1961-present) was the 44th President of the United States." (e.g. "Barack Obama", "(1961-present", "44th President", "United States.") gets a phrase encoding offline; questions like "Who is the 44th President of the U.S.?" and "When was Obama born?" get a question encoding at query time, and answers are found by nearest neighbor search between the question encoding and the phrase encodings.]
Document Indexing
[Figure: "When was Obama born?" is encoded as a vector, e.g. [-3, 0.1, …], and matched by nearest neighbor search against the indexed phrase vectors, e.g. [0.5, 0.1, …], [0.3, -0.2, …], [0.7, -0.4, …].]
Approximate nearest neighbor search methods:
• Locality Sensitive Hashing (LSH)
• aLSH (Shrivastava & Li, 2014)
• HNSW (Malkov & Yashunin, 2018)
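As a toy illustration of this step, here is a minimal sketch of approximate nearest neighbor search with HNSW via the faiss library (one of the methods cited above); the vectors are random stand-ins for the phrase and question encodings.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 128                                    # encoding dimension (illustrative)
phrases = np.random.rand(100_000, dim).astype("float32")  # stand-in phrase vectors
question = np.random.rand(1, dim).astype("float32")       # stand-in question vector

index = faiss.IndexHNSWFlat(dim, 32)         # HNSW graph with 32 links per node
index.add(phrases)                           # build the index offline

distances, ids = index.search(question, 5)   # top-5 approximate nearest phrases
print(ids[0])                                # candidate answer phrase positions
```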
Model
Standard QA model (phrase $a$, question $q$, document $d$):
$\hat{a} = \arg\max_a F_\theta(a, q, d)$
Query-Agnostic Decomposition:
$\hat{a} = \arg\max_a G_\theta(q)^\top H_\theta(a, d)$
where $H_\theta$ is the phrase encoder and $G_\theta$ is the question encoder.
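The computational payoff of the decomposition: every $H_\theta(a, d)$ can be precomputed offline, so answering a question reduces to one matrix-vector product plus an argmax. A minimal numpy sketch with stand-in encoders (names are illustrative):

```python
import numpy as np

num_phrases, dim = 1_000_000, 128

# Precomputed offline, query-agnostic: H_theta(a, d) for every phrase a.
phrase_matrix = np.random.rand(num_phrases, dim).astype("float32")

def encode_question(question: str) -> np.ndarray:
    """Stand-in for G_theta(q); a real system uses a trained BERT encoder."""
    rng = np.random.default_rng(abs(hash(question)) % 2**32)
    return rng.random(dim, dtype=np.float32)

q = encode_question("When was Obama born?")
best_phrase = int(np.argmax(phrase_matrix @ q))  # argmax_a G(q)^T H(a, d)
```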
Phrase (and Question) Representation
• Dense representation
  • Can utilize deep neural networks: great for capturing semantic and syntactic information
  • Not great for disambiguating "Einstein" vs. "Tesla"
• Sparse representation (bag-of-words)
  • Great for capturing lexical information
• Represent each phrase with a concatenation of both (see the sketch below)
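To make the concatenation concrete: the inner product of two [dense; sparse] vectors splits into a dense term plus a sparse term, so the two halves can be stored and scored separately. A minimal sketch, with random stand-ins for the actual encodings:

```python
import numpy as np
from scipy import sparse

dense_dim, vocab_size = 128, 17_000_000     # slide: ~17M unigram+bigram vocab

phrase_dense = np.random.rand(dense_dim).astype("float32")
phrase_sparse = sparse.random(1, vocab_size, density=1e-6, format="csr")
question_dense = np.random.rand(dense_dim).astype("float32")
question_sparse = sparse.random(1, vocab_size, density=1e-6, format="csr")

# Inner product of the concatenated vectors = dense term (semantic/syntactic
# match) + sparse term (lexical match).
dense_score = float(phrase_dense @ question_dense)
sparse_score = phrase_sparse.multiply(question_sparse).sum()
score = dense_score + sparse_score
```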
Dense-Sparse Phrase Index (DenSPI)
[Figure, side by side:
• DenSPI (Ours): "When was Barack Obama born?" becomes a query vector (Dense + Sparse), searched against the Phrase Index (N = 60 billion phrases), returning "1961".
• Retrieve & Read (Chen et al., 2017): "When was Barack Obama born?" goes to the Document Index (N = 5 million documents), then a Reader Model, returning "1961".]
Dense Representation for Phrases
[Figure: a BERT text encoder reads the document, e.g. "According to the American Library Association …". For the phrase we want to encode, the phrase vector is the concatenation of a start vector, an end vector, and a coherency scalar (the dot product of two coherency vectors), all read off the encoder outputs.]
Dense Representation for Questions
[Figure: the same BERT text encoder reads "[CLS] When was Barack Obama born?". The question vector is the concatenation of a start vector, an end vector, and a coherency scalar fixed to 1, derived from the [CLS] output.]
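The two slides above can be summarized in a schematic sketch: phrases and questions land in the same space by concatenating a start piece, an end piece, and a coherency scalar. The four-way equal split of the encoder output below is an assumption for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

hidden, tokens = 1024, 12                       # BERT-large size; toy length
H_doc = np.random.rand(tokens, hidden).astype("float32")  # stand-in BERT outputs

# Assumed split of each token vector into four equal pieces: start part,
# end part, and two coherency parts (the paper's sizes may differ).
h1, h2, h3, h4 = np.split(H_doc, 4, axis=1)

def phrase_vector(i: int, j: int) -> np.ndarray:
    """Dense vector for span [i, j]: [start piece; end piece; coherency scalar]."""
    coherency = h3[i] @ h4[j]                   # dot of the coherency pieces
    return np.concatenate([h1[i], h2[j], [coherency]])

def question_vector(H_q: np.ndarray) -> np.ndarray:
    """Question side, built from the [CLS] output; coherency fixed to 1 (per slide)."""
    q1, q2, _, _ = np.split(H_q[0], 4)
    return np.concatenate([q1, q2, [1.0]])

span_vec = phrase_vector(2, 5)                  # e.g. tokens 2..5 as one phrase
q_vec = question_vector(np.random.rand(1, hidden).astype("float32"))
similarity = span_vec @ q_vec                   # comparable by inner product
```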
Sparse Representation
• TF-IDF document & paragraph vectors, computed over Wikipedia
• Unigrams & bigrams (vocab size = 17 million)
• Adopted DrQA's vocab/TF-IDF (Chen et al., 2017)
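A minimal sketch of the sparse side using scikit-learn's TfidfVectorizer over unigrams and bigrams; the actual system adopts DrQA's TF-IDF computed over all of Wikipedia, so this only shows the shape of the representation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

paragraphs = [
    "Barack Obama (1961-present) was the 44th President of the United States.",
    "Teachers face several occupational hazards in their line of work.",
]

# Unigrams + bigrams, as on the slide (real vocab: ~17M terms over Wikipedia).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
para_vectors = vectorizer.fit_transform(paragraphs)         # sparse: docs x vocab

query_vector = vectorizer.transform(["When was Obama born?"])
scores = (para_vectors @ query_vector.T).toarray().ravel()  # lexical match per doc
print(scores)
```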
Beware of the scale…
• 60 billion phrases in Wikipedia!
• Training: softmax over 60 billion phrases?
• Storage: 60 billion phrases x 4 KB per phrase = 240 TB?
• Search: exact search over 60 billion phrases?
We want to be open-research-friendly.
Training
• Closed-domain QA dataset: the model can easily overfit
  • e.g., a "who" question when there is only one named entity in the context
• Negative sampling and concatenation (see the sketch below)
  • Sampling strategy is crucial
  • Use the query encoder to find similar questions in the training set
  • Concatenate the context that the similar question belongs to
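A rough sketch of the sampling idea above: use the encoded training questions to find the most similar other question, then concatenate its context as a hard negative. The data and names here are illustrative stand-ins:

```python
import numpy as np

# Stand-ins: encoded training questions (one row each) and their contexts.
question_vecs = np.random.rand(1000, 128).astype("float32")
contexts = [f"training context {i}" for i in range(1000)]

def hard_negative_context(i: int) -> str:
    """Context of the most similar *other* training question (hard negative)."""
    sims = question_vecs @ question_vecs[i]
    sims[i] = -np.inf                        # never pick the question itself
    return contexts[int(np.argmax(sims))]

# Concatenating a confusable context removes the "only one named entity" shortcut.
augmented_context = contexts[0] + " " + hard_negative_context(0)
```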
Storage
• 60 billion phrases x 4 KB per phrase = 240 TB!
1. Pointer: share start and end vectors (240 TB → 12 TB)
2. Filter: 1-layer classifier on phrase vectors (12 TB → 4.5 TB)
3. Scalar quantization: 4 bytes → 1 byte per dim (4.5 TB → 1.5 TB)
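A minimal sketch of step 3, scalar quantization (4-byte float → 1-byte integer per dimension); the global clipping range below is an assumption:

```python
import numpy as np

vecs = np.random.randn(1000, 128).astype("float32")   # stand-in phrase vectors

lo, hi = vecs.min(), vecs.max()                       # assumed clipping range
scale = 255.0 / (hi - lo)

quantized = np.round((vecs - lo) * scale).astype(np.uint8)  # 1 byte per dim
restored = quantized.astype("float32") / scale + lo         # lossy reconstruction

print(vecs.nbytes // quantized.nbytes)                # 4x smaller on disk
```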
Search
• No open-source library exists for large-scale dense+sparse nearest neighbor search
• Dense-first search (DFS): search the dense phrase index first, then add sparse scores
• Sparse-first search (SFS): retrieve documents with the sparse index first, then dense-search their phrases (see the sketch below)
• Hybrid: combine both
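As one example strategy, here is a rough sketch of sparse-first search (SFS): retrieve candidate documents with the sparse index, then run dense search only over phrases inside those documents. The data layout and function names are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy import sparse

# Stand-in indexes: one sparse TF-IDF vector per document and a small dense
# phrase matrix per document (real scale: 5M docs, 60B phrases).
num_docs, vocab, dim = 1000, 10_000, 128
doc_sparse = sparse.random(num_docs, vocab, density=1e-3, format="csr")
doc_phrases = [np.random.rand(50, dim).astype("float32") for _ in range(num_docs)]

def sparse_first_search(q_sparse, q_dense, top_docs=10):
    # 1) Sparse retrieval: lexical scores for every document.
    doc_scores = (doc_sparse @ q_sparse.T).toarray().ravel()
    candidates = np.argsort(-doc_scores)[:top_docs]
    # 2) Dense search restricted to phrases inside the candidate documents.
    best_doc, best_phrase, best_score = -1, -1, -np.inf
    for d in candidates:
        phrase_scores = doc_phrases[d] @ q_dense
        p = int(np.argmax(phrase_scores))
        if phrase_scores[p] > best_score:
            best_doc, best_phrase, best_score = d, p, phrase_scores[p]
    return best_doc, best_phrase

q_s = sparse.random(1, vocab, density=1e-3, format="csr")
q_d = np.random.rand(dim).astype("float32")
print(sparse_first_search(q_s, q_d))
```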
Experiments
Open-Domain SQuAD (EM = exact match, s/Q = seconds per query; the query-agnostic model is shown in red on the original slide)
• Weaver (Raison et al., 2018): 42% EM
• BERTserini (Yang et al., 2019): 39% EM, 115 s/Q
• DenSPI (Ours, query-agnostic): 36% EM, 0.8 s/Q (144x faster than BERTserini, 44x faster than DrQA)
• MINIMAL (Min et al., 2018): 35% EM
• Multi-step reasoner (Das et al., 2019): 32% EM
• DrQA (Chen et al., 2017): 30% EM, 35 s/Q
Qualitative Comparisons
Q: What can hurt a teacher's mental and physical health?
• Retrieve & Read (Chen et al., 2017), from "Teacher": "Teachers face several occupational hazards in their line of work, including occupational stress …"
• DenSPI (Ours), from "Mental health": "… and poor mental health can lead to problems such as substance abuse."
Q: Who was Kennedy's science adviser that opposed manned spacecraft flights?
Retrieve & Read (Chen et al., 2017):
• Apollo program: "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …"
• Apollo program: "… and the sun by NASA manager Abe Silverstein, who later said that …"
• Apollo program: "Although Grumman wanted a second unmanned test, George Low decided … be manned."
DenSPI (Ours):
• Apollo program: "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …"
• Space Race: "Jerome Wiesner of MIT, who served as a … advisor to … Kennedy, … opponent of manned …"
• John F. Kennedy: "… science advisor Jerome Wiesner … strongly opposed to manned space exploration, …"
Q: What is the best thing to do when bored?
Retrieve & Read (Chen et al., 2017):
• Bored to Death (song): "I'm nearly bored to death"
• Waterview Connection: "The twin tunnels were bored by … tunnel boring machine (TBM)"
• Bored to Death (song): "It's easier to say you're bored, or to be angry, than it is to be sad."
DenSPI (Ours):
• Big Brother 2: "When bored, she enjoys drawing."
• Angry Kid: "he can think of a much more fun thing he can do while on his … back: painting."
• Pearls Before Swine: "She is a live music goer, and her hobby is watching movies."
Demo • http://nlp.cs.washington.edu/denspi
Conclusion
• "Read" the entire Wikipedia in 0.5s on CPUs
• Query-agnostic, indexable phrase representations
• Utilize both dense (BERT-based) and sparse (bag-of-words) representations to encode lexical, syntactic, and semantic information
• 6,000x lower computational cost with higher accuracy for exact search
• At least 44x faster open-domain QA with higher accuracy
• The (query-agnostic) decomposability gap still exists (6-10%); we hope future research can close it