Attending knowledge facts with BERT-like models in Question-Answering: disappointing results and some explanations
Guillaume Le Berre & Philippe Langlais
33rd Canadian Conference on Artificial Intelligence, May 13 to May 15

Question Answering: a challenging task

Why question answering?
- A core task of Natural Language Processing
- One of the most challenging tasks for deep learning models
- Necessary to achieve true AI interactions with humans
- Many benchmarks are available

Extractive question answering example: SQuAD

- Each question is paired with a reference text
- Models are required to select a span of text containing the answer
- Modern deep learning models have reached human performance

Model                                  EM     F1
Human Performance                      86.8   89.4
SA-Net on Albert (ensemble)            89.7   93.0
Retro-Reader (ensemble)                90.6   93.0
ALBERT + DAAF + Verifier (ensemble)    90.4   92.8

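To make the extractive setting concrete, here is a minimal sketch using the Hugging Face transformers question-answering pipeline. This is an off-the-shelf model, not one of the leaderboard systems above, and the question/context pair is purely illustrative.

```python
from transformers import pipeline

# Off-the-shelf extractive QA: the model selects a span from the reference text.
qa = pipeline("question-answering")

result = qa(
    question="What impacts an object's ability to reflect light?",
    context="The color of an object's surface impacts its ability to reflect light.",
)
print(result["answer"], result["score"])  # predicted span and its confidence
```
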
OpenBookQA

- Multiple-choice questions (4 answer choices) with no reference text
- Models need to rely on general world knowledge
- Human performance not yet reached despite recent improvements
- The dataset comes with science and common-knowledge facts

Model               Accuracy
Human Performance   0.92
UnifiedQA           0.87
TTTTT               0.83
KF+SIR              0.80

AI2 Reasoning Challenge (ARC)

- Similar to OpenBookQA
- Multiple-choice questions (4 answer choices) with no reference text
- Models need to rely on general world knowledge
- Divided into two parts: "Easy" and "Challenge"

Model                Accuracy
UnifiedQA            0.79
FreeLB-RoBERTa       0.68
arcRoberta, erenup   0.67

General knowledge: learned...

- Most current state-of-the-art models do not use the common-knowledge facts provided
- Major drawbacks:
  - Low generalization capacity
  - Requires a lot of data

...vs. extracted from a database

Teaching a model how to search for information in a database:
- possibly allows easier generalization, by adding domain-specific information to the database
- requires less annotated data

Pretrained models: BERT

- Transformer model
- Pretrained on BookCorpus and Wikipedia
- Provides a contextual embedding of the words in a sentence

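As an illustration (not from the slides), a minimal sketch of obtaining BERT's contextual word embeddings with the Hugging Face transformers library; the checkpoint name and example sentence are arbitrary choices.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentence = "Deep sea animals live in the deep ocean."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per (sub)word token in the sentence.
print(outputs.last_hidden_state.shape)  # torch.Size([1, num_tokens, 768])
```
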
Sentence-BERT (SBERT)

- Additional pretraining on the SNLI dataset
- Given sentences A and B, learn to predict entailment, neutral, or contradiction
- SBERT is intended to capture the semantics of sentences

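For comparison, a minimal sketch of computing sentence embeddings with the sentence-transformers package; the NLI-trained checkpoint named below is an assumption, not necessarily the one used by the authors.

```python
from sentence_transformers import SentenceTransformer

# Assumption: an SBERT checkpoint fine-tuned on NLI data; the paper's exact model may differ.
sbert = SentenceTransformer("bert-base-nli-mean-tokens")

facts = [
    "The color of an object affects how it reflects light.",
    "Deep sea animals live deep in the ocean.",
]
embeddings = sbert.encode(facts)  # one fixed-size vector per sentence
print(embeddings.shape)           # (2, 768)
```
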
Model CAT: vanilla BERT

- The question (plus any additional knowledge facts) is concatenated with each answer choice
- Each question/answer sequence is embedded using BERT Base
- The embeddings are sent through a few linear layers to get a scalar score for each answer choice (see the sketch below)
- The model is trained using a cross-entropy loss
- With this setup on OpenBookQA, BERT Base and Large are expected to reach around 55% and 60% accuracy respectively

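A minimal sketch of this CAT setup, assuming a simple two-layer scoring head on top of the [CLS] embedding; the authors' exact head architecture and hyperparameters are not given on the slide, and the example question is illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class CatScorer(nn.Module):
    """Embed each concatenated question/answer sequence with BERT and score it."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Assumption: a small two-layer head; the authors' exact head is not described.
        self.head = nn.Sequential(nn.Linear(hidden_size, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, input_ids, attention_mask):
        # Use the [CLS] vector of each sequence as its embedding.
        cls = self.bert(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.head(cls).squeeze(-1)  # one scalar score per answer choice

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = CatScorer()

question = "What impacts an object's ability to reflect light?"
choices = ["color pallete", "weights", "height", "smell"]
enc = tokenizer([question] * len(choices), choices, padding=True, return_tensors="pt")

scores = model(enc["input_ids"], enc["attention_mask"])  # shape: (4,)
loss = nn.functional.cross_entropy(scores.unsqueeze(0), torch.tensor([0]))  # gold answer = A
```
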
Experiment 1: Biases in OpenBookQA

To understand what part of the question BERT uses when answering, we removed parts of the questions (no knowledge facts given):

                           Accuracy
Full question (baseline)   55.8%
Last 4 tokens only         52.0%
Without the question       51.2%

- The model is thus able to differentiate between right and wrong answers using information inherent to the answers themselves
- Similar biases exist in ARC: accuracy of around 36% on both "Easy" and "Challenge" without the question

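A sketch of the "last 4 tokens only" ablation; the slide does not say whether words or WordPiece tokens are counted, so the WordPiece version below is an assumption.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def last_k_tokens(question, k=4):
    """Keep only the last k tokens of the question (Experiment 1 ablation)."""
    ids = tokenizer(question, add_special_tokens=False)["input_ids"]
    return tokenizer.decode(ids[-k:])

print(last_k_tokens("What impacts an object's ability to reflect light?"))
```
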
Potential biases

We have identified 2 biases in the dataset:
- Right answers are on average longer
- Right answers generally contain less frequent words

Dummy models that select the longest answer or the one with the least frequent word obtain 33% and 37% accuracy respectively (see the sketch below).

Question: What impacts an object's ability to reflect light?
Answer choices: A: color palette  B: weights  C: height  D: smell
Last 4 tokens: ...ability to reflect light?

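A sketch of the two dummy baselines, assuming word frequencies are taken from some reference corpus (the slide does not say which one is used).

```python
def longest_answer(choices):
    """Dummy baseline 1: pick the longest answer choice (~33% accuracy on OpenBookQA)."""
    return max(range(len(choices)), key=lambda i: len(choices[i]))

def rarest_word_answer(choices, word_freq):
    """Dummy baseline 2: pick the answer containing the least frequent word (~37% accuracy).

    word_freq: dict mapping word -> frequency in some reference corpus (assumption).
    """
    def min_freq(answer):
        return min(word_freq.get(w.lower(), 0) for w in answer.split())
    return min(range(len(choices)), key=lambda i: min_freq(choices[i]))

choices = ["color palette", "weights", "height", "smell"]
print(longest_answer(choices))  # index of the longest answer choice
```
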
Model ATT: attention over facts

- Concatenating the knowledge facts to the question often results in long sequences and thus high memory usage
- Using an attention mechanism over the BERT embeddings of the knowledge facts makes it possible to use more complex architectures and to pre-compute the embeddings in advance (see the sketch below)

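A minimal sketch of the ATT idea, using simple dot-product attention between a question embedding and pre-computed fact embeddings; the authors' exact attention architecture is not detailed on the slide, so this is only illustrative.

```python
import torch
import torch.nn.functional as F

def attend_over_facts(question_emb, fact_embs):
    """question_emb: (d,) sentence embedding of the question/answer sequence.
    fact_embs: (n_facts, d) embeddings of the knowledge facts, possibly pre-computed offline.
    Returns a (d,) weighted summary of the facts."""
    scores = fact_embs @ question_emb      # dot-product attention scores, shape (n_facts,)
    weights = F.softmax(scores, dim=0)     # attention distribution over the facts
    return weights @ fact_embs             # weighted combination of fact embeddings

# Toy usage with random vectors standing in for (S)BERT embeddings.
q = torch.randn(768)
facts = torch.randn(10, 768)
print(attend_over_facts(q, facts).shape)   # torch.Size([768])
```
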
Experiment 2: Semantic significance

In this second experiment, we compare the representations provided by BERT and SBERT when applying an attention mechanism.

For each question we have:
- Gold fact: a particular science fact that is relevant
- Other facts: a list of 9 facts automatically selected by word overlap (see the sketch below)

We compare 2 setups:
- CAT: vanilla setup in which the knowledge facts are concatenated to the questions
- ATT: attention setup in which the facts are embedded with BERT first and then used in an attention mechanism

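A sketch of how the "other facts" could be retrieved by word overlap; the exact scoring used by the authors (e.g. stop-word handling, tie-breaking) is not specified, so this naive version is an assumption.

```python
def word_overlap(question, fact):
    """Number of distinct words shared by the question and a fact (no stop-word removal)."""
    return len(set(question.lower().split()) & set(fact.lower().split()))

def top_k_facts(question, facts, k=9):
    """Return the k facts with the highest word overlap with the question."""
    return sorted(facts, key=lambda f: word_overlap(question, f), reverse=True)[:k]
```
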
Results

With model CAT, the results of BERT and SBERT are similar:
- accuracy of 55.8% and 53.2% respectively with no additional knowledge facts
- increasing to 64.5% and 63.6% respectively when adding the gold fact

With model ATT:
- with BERT, the model is unable to use the additional knowledge, even when the gold fact is provided (alone or among the other facts)
- with SBERT, we observe some improvement compared to an SBERT model with no knowledge facts (accuracy of 55% with only the gold fact and 54.8% with the gold fact among other facts)

For SBERT, keeping only the end of the question (last 4 tokens) improves the results when using additional knowledge facts:
- model ATT with SBERT then reaches more than 61% accuracy with only the gold fact and nearly 57% with the gold fact given among other facts

Conclusion

- The results of machine learning models on OpenBookQA must be put in perspective
- It becomes increasingly important to understand how deep learning models make their decisions
- This offers an opportunity to work on bias reduction for question answering

The End