Real-Time Open-Domain QA with Dense-Sparse Phrase Index
Minjoon Seo*, Jinhyuk Lee*, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi
(* denotes equal contribution)
Open-domain QA?
[Figure: a QA model answers "When was Obama born?" with "1961" directly over all of Wikipedia: 5 million documents, 3 billion tokens.]
Retrieve & Read (Chen et al., 2017)
[Figure: Information Retrieval (TF-IDF, BM-25, LSA) retrieves documents, then a Reader model extracts the answer: "When was Obama born?" yields "1961".]
1. Error propagation: the reader sees only 5-10 retrieved docs
2. Query-dependent encoding: 30s+ per query
We want…
• To "read" the entire Wikipedia: 5-10 docs → 5 million docs
• Reach long-tail answers
• Fast inference on CPUs: 35s → 0.5s
• Maintain high accuracy
HOW?
Our approach: index phrases!
Phrase Indexing (Seo et al., 2018)
[Figure: every phrase in "Barack Obama (1961-present) was the 44th President of the United States." (e.g. "Barack Obama", "(1961-present", "44th President", "United States.") gets a phrase encoding offline; questions like "Who is the 44th President of the U.S.?" and "When was Obama born?" get a question encoding at query time, and answers are found by nearest neighbor search between the question encoding and the phrase encodings.]
Document Indexing
[Figure: "When was Obama born?" is encoded as a vector, e.g. [-3, 0.1, …], and matched by nearest neighbor search against the indexed phrase vectors, e.g. [0.5, 0.1, …], [0.3, -0.2, …], [0.7, -0.4, …].]
Approximate nearest neighbor search methods:
• Locality Sensitive Hashing (LSH)
• aLSH (Shrivastava & Li, 2014)
• HNSW (Malkov & Yashunin, 2018)
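As a toy illustration of this step, here is a minimal sketch of approximate nearest neighbor search with HNSW via the faiss library (one of the methods cited above); the vectors are random stand-ins for the phrase and question encodings.

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 128                                    # encoding dimension (illustrative)
phrases = np.random.rand(100_000, dim).astype("float32")  # stand-in phrase vectors
question = np.random.rand(1, dim).astype("float32")       # stand-in question vector

index = faiss.IndexHNSWFlat(dim, 32)         # HNSW graph with 32 links per node
index.add(phrases)                           # build the index offline

distances, ids = index.search(question, 5)   # top-5 approximate nearest phrases
print(ids[0])                                # candidate answer phrase positions
```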
Model
Standard QA model (phrase $a$, question $q$, document $d$):
$\hat{a} = \arg\max_a F_\theta(a, q, d)$
Query-Agnostic Decomposition:
$\hat{a} = \arg\max_a G_\theta(q)^\top H_\theta(a, d)$
where $H_\theta$ is the phrase encoder and $G_\theta$ is the question encoder.
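The computational payoff of the decomposition: every $H_\theta(a, d)$ can be precomputed offline, so answering a question reduces to one matrix-vector product plus an argmax. A minimal numpy sketch with stand-in encoders (names are illustrative):

```python
import numpy as np

num_phrases, dim = 1_000_000, 128

# Precomputed offline, query-agnostic: H_theta(a, d) for every phrase a.
phrase_matrix = np.random.rand(num_phrases, dim).astype("float32")

def encode_question(question: str) -> np.ndarray:
    """Stand-in for G_theta(q); a real system uses a trained BERT encoder."""
    rng = np.random.default_rng(abs(hash(question)) % 2**32)
    return rng.random(dim, dtype=np.float32)

q = encode_question("When was Obama born?")
best_phrase = int(np.argmax(phrase_matrix @ q))  # argmax_a G(q)^T H(a, d)
```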
Phrase (and Question) Representation
• Dense representation
  • Can utilize deep neural networks: great for capturing semantic and syntactic information
  • Not great for disambiguating "Einstein" vs. "Tesla"
• Sparse representation (bag-of-words)
  • Great for capturing lexical information
• Represent each phrase with a concatenation of both (see the sketch below)
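To make the concatenation concrete: the inner product of two [dense; sparse] vectors splits into a dense term plus a sparse term, so the two halves can be stored and scored separately. A minimal sketch, with random stand-ins for the actual encodings:

```python
import numpy as np
from scipy import sparse

dense_dim, vocab_size = 128, 17_000_000     # slide: ~17M unigram+bigram vocab

phrase_dense = np.random.rand(dense_dim).astype("float32")
phrase_sparse = sparse.random(1, vocab_size, density=1e-6, format="csr")
question_dense = np.random.rand(dense_dim).astype("float32")
question_sparse = sparse.random(1, vocab_size, density=1e-6, format="csr")

# Inner product of the concatenated vectors = dense term (semantic/syntactic
# match) + sparse term (lexical match).
dense_score = float(phrase_dense @ question_dense)
sparse_score = phrase_sparse.multiply(question_sparse).sum()
score = dense_score + sparse_score
```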
Dense-Sparse Phrase Index (DenSPI)
[Figure, side by side:
• DenSPI (Ours): "When was Barack Obama born?" becomes a query vector (Dense + Sparse), searched against the Phrase Index (N = 60 billion phrases), returning "1961".
• Retrieve & Read (Chen et al., 2017): "When was Barack Obama born?" goes to the Document Index (N = 5 million documents), then a Reader Model, returning "1961".]
Dense Representation for Phrases
[Figure: a BERT text encoder reads the document, e.g. "According to the American Library Association …". For the phrase we want to encode, the phrase vector is the concatenation of a start vector, an end vector, and a coherency scalar (the dot product of two coherency vectors), all read off the encoder outputs.]
Dense Representation for Questions
[Figure: the same BERT text encoder reads "[CLS] When was Barack Obama born?". The question vector is the concatenation of a start vector, an end vector, and a coherency scalar fixed to 1, derived from the [CLS] output.]
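The two slides above can be summarized in a schematic sketch: phrases and questions land in the same space by concatenating a start piece, an end piece, and a coherency scalar. The four-way equal split of the encoder output below is an assumption for illustration; the paper's exact parameterization may differ.

```python
import numpy as np

hidden, tokens = 1024, 12                       # BERT-large size; toy length
H_doc = np.random.rand(tokens, hidden).astype("float32")  # stand-in BERT outputs

# Assumed split of each token vector into four equal pieces: start part,
# end part, and two coherency parts (the paper's sizes may differ).
h1, h2, h3, h4 = np.split(H_doc, 4, axis=1)

def phrase_vector(i: int, j: int) -> np.ndarray:
    """Dense vector for span [i, j]: [start piece; end piece; coherency scalar]."""
    coherency = h3[i] @ h4[j]                   # dot of the coherency pieces
    return np.concatenate([h1[i], h2[j], [coherency]])

def question_vector(H_q: np.ndarray) -> np.ndarray:
    """Question side, built from the [CLS] output; coherency fixed to 1 (per slide)."""
    q1, q2, _, _ = np.split(H_q[0], 4)
    return np.concatenate([q1, q2, [1.0]])

span_vec = phrase_vector(2, 5)                  # e.g. tokens 2..5 as one phrase
q_vec = question_vector(np.random.rand(1, hidden).astype("float32"))
similarity = span_vec @ q_vec                   # comparable by inner product
```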
Sparse Representation
• TF-IDF document & paragraph vectors, computed over Wikipedia
• Unigrams & bigrams (vocab size = 17 million)
• Adopted DrQA's vocab/TF-IDF (Chen et al., 2017)
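A minimal sketch of the sparse side using scikit-learn's TfidfVectorizer over unigrams and bigrams; the actual system adopts DrQA's TF-IDF computed over all of Wikipedia, so this only shows the shape of the representation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

paragraphs = [
    "Barack Obama (1961-present) was the 44th President of the United States.",
    "Teachers face several occupational hazards in their line of work.",
]

# Unigrams + bigrams, as on the slide (real vocab: ~17M terms over Wikipedia).
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
para_vectors = vectorizer.fit_transform(paragraphs)         # sparse: docs x vocab

query_vector = vectorizer.transform(["When was Obama born?"])
scores = (para_vectors @ query_vector.T).toarray().ravel()  # lexical match per doc
print(scores)
```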
Beware of the scale…
• 60 billion phrases in Wikipedia!
• Training: softmax over 60 billion phrases?
• Storage: 60 billion phrases x 4 KB per phrase = 240 TB?
• Search: exact search over 60 billion phrases?
We want to be open-research-friendly.
Training
• Closed-domain QA dataset: the model can easily overfit
  • e.g., a "who" question when there is only one named entity in the context
• Negative sampling and concatenation (see the sketch below)
  • Sampling strategy is crucial
  • Use the query encoder to find similar questions in the training set
  • Concatenate the context that the similar question belongs to
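A rough sketch of the sampling idea above: use the encoded training questions to find the most similar other question, then concatenate its context as a hard negative. The data and names here are illustrative stand-ins:

```python
import numpy as np

# Stand-ins: encoded training questions (one row each) and their contexts.
question_vecs = np.random.rand(1000, 128).astype("float32")
contexts = [f"training context {i}" for i in range(1000)]

def hard_negative_context(i: int) -> str:
    """Context of the most similar *other* training question (hard negative)."""
    sims = question_vecs @ question_vecs[i]
    sims[i] = -np.inf                        # never pick the question itself
    return contexts[int(np.argmax(sims))]

# Concatenating a confusable context removes the "only one named entity" shortcut.
augmented_context = contexts[0] + " " + hard_negative_context(0)
```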
Storage
• 60 billion phrases x 4 KB per phrase = 240 TB!
1. Pointer: share start and end vectors (240 TB → 12 TB)
2. Filter: 1-layer classifier on phrase vectors (12 TB → 4.5 TB)
3. Scalar quantization: 4 bytes → 1 byte per dim (4.5 TB → 1.5 TB)
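A minimal sketch of step 3, scalar quantization (4-byte float → 1-byte integer per dimension); the global clipping range below is an assumption:

```python
import numpy as np

vecs = np.random.randn(1000, 128).astype("float32")   # stand-in phrase vectors

lo, hi = vecs.min(), vecs.max()                       # assumed clipping range
scale = 255.0 / (hi - lo)

quantized = np.round((vecs - lo) * scale).astype(np.uint8)  # 1 byte per dim
restored = quantized.astype("float32") / scale + lo         # lossy reconstruction

print(vecs.nbytes // quantized.nbytes)                # 4x smaller on disk
```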
Search
• No open-source library exists for large-scale dense+sparse nearest neighbor search
• Dense-first search (DFS): search the dense phrase index first, then add sparse scores
• Sparse-first search (SFS): retrieve documents with the sparse index first, then dense-search their phrases (see the sketch below)
• Hybrid: combine both
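As one example strategy, here is a rough sketch of sparse-first search (SFS): retrieve candidate documents with the sparse index, then run dense search only over phrases inside those documents. The data layout and function names are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy import sparse

# Stand-in indexes: one sparse TF-IDF vector per document and a small dense
# phrase matrix per document (real scale: 5M docs, 60B phrases).
num_docs, vocab, dim = 1000, 10_000, 128
doc_sparse = sparse.random(num_docs, vocab, density=1e-3, format="csr")
doc_phrases = [np.random.rand(50, dim).astype("float32") for _ in range(num_docs)]

def sparse_first_search(q_sparse, q_dense, top_docs=10):
    # 1) Sparse retrieval: lexical scores for every document.
    doc_scores = (doc_sparse @ q_sparse.T).toarray().ravel()
    candidates = np.argsort(-doc_scores)[:top_docs]
    # 2) Dense search restricted to phrases inside the candidate documents.
    best_doc, best_phrase, best_score = -1, -1, -np.inf
    for d in candidates:
        phrase_scores = doc_phrases[d] @ q_dense
        p = int(np.argmax(phrase_scores))
        if phrase_scores[p] > best_score:
            best_doc, best_phrase, best_score = d, p, phrase_scores[p]
    return best_doc, best_phrase

q_s = sparse.random(1, vocab, density=1e-3, format="csr")
q_d = np.random.rand(dim).astype("float32")
print(sparse_first_search(q_s, q_d))
```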
Experiments
Open-Domain SQuAD (EM = exact match, s/Q = seconds per query; the query-agnostic model is shown in red on the original slide)
• Weaver (Raison et al., 2018): 42% EM
• BERTserini (Yang et al., 2019): 39% EM, 115 s/Q
• DenSPI (Ours, query-agnostic): 36% EM, 0.8 s/Q (144x faster than BERTserini, 44x faster than DrQA)
• MINIMAL (Min et al., 2018): 35% EM
• Multi-step reasoner (Das et al., 2019): 32% EM
• DrQA (Chen et al., 2017): 30% EM, 35 s/Q
Qualitative Comparisons
Q: What can hurt a teacher's mental and physical health?
• Retrieve & Read (Chen et al., 2017), from "Teacher": "Teachers face several occupational hazards in their line of work, including occupational stress …"
• DenSPI (Ours), from "Mental health": "… and poor mental health can lead to problems such as substance abuse."
Q: Who was Kennedy's science adviser that opposed manned spacecraft flights?
Retrieve & Read (Chen et al., 2017):
• Apollo program: "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …"
• Apollo program: "… and the sun by NASA manager Abe Silverstein, who later said that …"
• Apollo program: "Although Grumman wanted a second unmanned test, George Low decided … be manned."
DenSPI (Ours):
• Apollo program: "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …"
• Space Race: "Jerome Wiesner of MIT, who served as a … advisor to … Kennedy, … opponent of manned …"
• John F. Kennedy: "… science advisor Jerome Wiesner … strongly opposed to manned space exploration, …"
Q: What is the best thing to do when bored?
Retrieve & Read (Chen et al., 2017):
• Bored to Death (song): "I'm nearly bored to death"
• Waterview Connection: "The twin tunnels were bored by … tunnel boring machine (TBM)"
• Bored to Death (song): "It's easier to say you're bored, or to be angry, than it is to be sad."
DenSPI (Ours):
• Big Brother 2: "When bored, she enjoys drawing."
• Angry Kid: "he can think of a much more fun thing he can do while on his … back: painting."
• Pearls Before Swine: "She is a live music goer, and her hobby is watching movies."
Demo • http://nlp.cs.washington.edu/denspi
Conclusion
• "Read" the entire Wikipedia in 0.5s on CPUs
• Query-agnostic, indexable phrase representations
• Utilize both dense (BERT-based) and sparse (bag-of-words) representations to encode lexical, syntactic, and semantic information
• 6,000x lower computational cost with higher accuracy for exact search
• At least 44x faster open-domain QA with higher accuracy
• The (query-agnostic) decomposability gap still exists (6-10%); we hope future research can close it