Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers
Georgios Spithourakis, Steffen Petersen, Sebastian Riedel
Machine Reading Group
Numeracy
[Figure: word cloud mixing words (the, cat, sat, mat, dog, fox, brown, ...) with numerals (0, 2, 7, -1, 0.001, 3.14, 2018, 5/8, 2/3, 1+2i, π, ...) and number sets (ℕ, ℤ, ℚ, ℝ, ℂ)]
Literate Language Models
$P_{LM}(\text{text})$: Plausible (semantically, grammatically, etc.)
‘A apple eats I’
‘I eats an apple’
‘An apple eats me’
‘I eat an apple’
Numerate Language Models
$P_{LM}(\text{text})$ and $P(\text{height})$ [Figure: distribution over heights, axis from 1.4 to 2.2]
‘John is 0 m tall’
‘John is 1.7 m tall’
‘John is 2 m tall’
‘John is 999 m tall’
Numeracy Matters
‘Unemployment of the US is 5 %’
‘Patient’s temperature is 36.6 degrees’
‘Our model is 10 times better than the baseline’
Alternative values shown around the sentences: 0.5, 0.0, 50, 23.2, 500, 41.9, 98.6, 0, 100, 1000
Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?
A Neural Language Model
Input: embedding $e_t$ of the previous token $w_{t-1}$
RNN: hidden state $h_{t-1} \to h_t$
Output: $p(w_t \mid h_t)$, a softmax over the vocabulary $V$
Vocabulary $V$: words (the, cat, mat, sat, ...) and numerals (1, 2, 1.7, 2018, ...)
Out-of-vocabulary words (petrichor, unothrorgaphy, Spithourakis) map to UNK
Out-of-vocabulary numerals (2.1, 0.731, 9,846,321, 2018.3) map to UNKNUM
Evaluation: Adjusted Perplexity
Example: ‘John is 2.1 m tall’
Perplexity: $p(2.1) = p(\text{UNKNUM})$
BUT UNKNUM also stands for 0.731, 9,846,321, 2018.3, ...
Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016]:
$p(2.1) = \frac{p(\text{UNKNUM})}{|\{w \in \text{UNKNUM}\}|}$, with the UNKNUM set taken from test data
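The adjustment can be illustrated with a minimal sketch. It assumes a single unknown class and a hypothetical list of per-token probabilities; the function names are illustrative and are not taken from the talk. Each out-of-vocabulary token has its probability divided by the number of distinct out-of-vocabulary types seen in the test data.

```python
import math

def perplexity(probs):
    """Standard perplexity: exp of the mean negative log-probability."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def adjust_probs(tokens, probs, vocab):
    """Divide the probability of each OOV token by the number of distinct
    OOV types in the test tokens (assumption: one shared unknown class)."""
    oov_types = {t for t in tokens if t not in vocab}
    n_oov = len(oov_types)
    return [p / n_oov if t not in vocab else p
            for t, p in zip(tokens, probs)]

def adjusted_perplexity(tokens, probs, vocab):
    return perplexity(adjust_probs(tokens, probs, vocab))

# Toy test data: two distinct OOV numerals share the unknown class.
tokens = ["2.1", "m", "0.731", "m"]
probs = [0.5, 0.5, 0.5, 0.5]   # hypothetical model probabilities
vocab = {"m"}
print(perplexity(probs), adjusted_perplexity(tokens, probs, vocab))
```

With more distinct unknown numerals in the test data, the penalty grows, so a model cannot look good simply by routing probability mass to the unknown class.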
Datasets
Clinical Dataset: 16,015 clinical patient reports. Source: London Chest Hospital
Scientific Dataset: 20,962 paragraphs from scientific papers. Source: arXiv
[Figure: pie chart of token types: 96% words, 4% numerals]
Results: Adjusted Perplexity (softmax; lower is better)
Dataset      all tokens   words   numerals
Scientific   80.62        51.83   3,505,856.25
Clinical     8.91         5.99    58,443.72
Softmax: Assumptions vs. Reality (?) [diagram labels: large, small, UNKNUM, PMF, PDF]
Q1: Are existing LMs numerate? Q2: How to improve the numeracy of LMs?
Strategy: Softmax & Hierarchical Softmax
Softmax: one softmax over the whole vocabulary $V$ given $h_t$ (the, cat, mat, sat, UNK, 1, 2, 1.7, 2018, UNKNUM)
Hierarchical softmax: a softmax over types (word or numeral) given $h_t$, then a softmax over words (the, cat, mat, sat, UNK) or a softmax over numerals (1, 2, 1.7, 2018, UNKNUM)
Options for $p(\text{numeral} \mid h_t)$:
• h-softmax
• digit-by-digit
• from PDF
• etc.
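The two-level factorisation can be sketched as follows. This is a toy illustration with hypothetical scores standing in for the network outputs, not the authors' implementation: the probability of a token is the probability of its type times the probability of the token within that type.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a dict of label -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

def h_softmax(type_scores, word_scores, numeral_scores):
    """p(token) = p(type) * p(token | type), with types 'word' and 'numeral'."""
    p_type = softmax(type_scores)
    p_word = softmax(word_scores)
    p_numeral = softmax(numeral_scores)
    probs = {w: p_type["word"] * p for w, p in p_word.items()}
    probs.update({n: p_type["numeral"] * p for n, p in p_numeral.items()})
    return probs

# Hypothetical scores; a real model would compute them from h_t.
probs = h_softmax(
    {"word": 0.0, "numeral": 0.0},
    {"the": 0.0, "cat": 0.0, "mat": 0.0, "sat": 0.0, "UNK": 0.0},
    {"1": 0.0, "2": 0.0, "1.7": 0.0, "2018": 0.0, "UNKNUM": 0.0},
)
print(probs["2018"])
```

Because each level is normalised separately, the numeral distribution no longer competes directly with the much more frequent words.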
Strategy: Digit-by-Digit Composition
$p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\text{EOS} \mid 2.1)$
d-RNN, starting from $h_t$: input SOS 2 . 1, output 2 . 1 EOS
[Figure: UNKNUM alongside individual numerals 1.94, 1.95, ..., 2.02]
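The chain-rule factorisation above can be sketched in a few lines. The conditional model here is a hypothetical uniform distribution over the twelve symbols (ten digits, the decimal point, EOS); it only stands in for the d-RNN, which would condition on the prefix and on $h_t$.

```python
SYMBOLS = list("0123456789.") + ["EOS"]

def uniform_model(prefix):
    """Placeholder for the d-RNN: ignores the prefix and returns a
    uniform distribution over the next symbol (an assumption)."""
    return {s: 1.0 / len(SYMBOLS) for s in SYMBOLS}

def numeral_prob(numeral, next_symbol_probs):
    """p(numeral) as a product of per-symbol conditionals, ending with EOS."""
    prob = 1.0
    prefix = ""
    for symbol in list(numeral) + ["EOS"]:
        prob *= next_symbol_probs(prefix)[symbol]
        if symbol != "EOS":
            prefix += symbol
    return prob

# p(2.1) = p(2) p(.|2) p(1|2.) p(EOS|2.1)
print(numeral_prob("2.1", uniform_model))
```

Any string of digits receives non-zero probability under such a model, so no numeral has to be mapped to UNKNUM.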
Strategy: from continuous PDF
$p(\text{numeral} = 2.1) = p_{\text{PMF}}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$
MoG (mixture of Gaussians) with frozen $\mu$s and $\sigma$s; a softmax over components given $h_t$
Precision: $p(\text{precision}) = p_{\text{RNN}}(\text{dddd EOS})$
[Figure: PDF over numbers from 1 to 3.5, density axis from 0 to 2.5]
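The interval probability in the first factor can be sketched with Gaussian CDFs. This is a minimal illustration under stated assumptions: the mixture weights, means, and standard deviations are hypothetical values, and `p_precision` is passed in as a plain number rather than produced by an RNN.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian, written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mog_interval_prob(value, precision, weights, mus, sigmas):
    """Mass a mixture of Gaussians assigns to the interval of numbers that
    round to `value` at `precision` decimal places (half a unit each side)."""
    half = 0.5 * 10 ** (-precision)
    lo, hi = value - half, value + half
    return sum(w * (normal_cdf(hi, mu, s) - normal_cdf(lo, mu, s))
               for w, mu, s in zip(weights, mus, sigmas))

def numeral_prob(value, precision, p_precision, weights, mus, sigmas):
    """p(numeral) = p(interval | precision) * p(precision)."""
    return mog_interval_prob(value, precision, weights, mus, sigmas) * p_precision

# Hypothetical two-component mixture and precision probability.
print(numeral_prob(2.1, 1, 0.5, [0.7, 0.3], [1.7, 2.1], [0.1, 0.05]))
```

Turning the density into a mass over an interval is what lets a continuous model be compared with discrete models on perplexity.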
Overview of Strategies
softmax: one output layer over words and numerals (2, 1.7, 2018, UNKNUM)
h-softmax: a separate softmax over numerals
d-RNN: input <SOS> 2 . 1, output 2 . 1 <EOS>
MoG: continuous PDF
combination: a softmax over strategies given $h_t$
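One plausible reading of the combination is a mixture: the softmax over strategies supplies weights, and the numeral probability is the weighted sum of the per-strategy probabilities. The slide shows only the softmax over strategies, so the mixture form below is an assumption, and all numbers are hypothetical.

```python
def combine(strategy_weights, strategy_probs):
    """Mixture over strategies: sum_s p(s | h_t) * p(numeral | s, h_t).
    Assumes the weights come from a softmax and therefore sum to 1."""
    return sum(strategy_weights[s] * strategy_probs[s]
               for s in strategy_weights)

# Hypothetical strategy weights and per-strategy probabilities of one numeral.
weights = {"h-softmax": 0.5, "d-RNN": 0.3, "MoG": 0.2}
probs = {"h-softmax": 0.1, "d-RNN": 0.2, "MoG": 0.4}
print(combine(weights, probs))
```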
Results: Language Modelling (1)
Clinical dataset, adjusted perplexity (lower is better)
Model         All tokens   Words   Numerals
softmax       8.91         5.99    58,443.72
h-softmax     6.05         4.96    495.95
d-RNN         5.88         4.95    263.22
MoG           5.88         4.99    226.46
combination   5.82         4.96    197.59
Results: Language Modelling (2)
Scientific dataset, adjusted perplexity (lower is better)
Model         All tokens   Words   Numerals
softmax       80.62        51.83   3,505,856.25
h-softmax     54.8         49.81   683.16
d-RNN         54.37        48.89   550.98
MoG           53.7         48.97   519.8
combination   53.03        48.25   520.95
Results: Number Prediction
A numeral (‘2.1’) is read as a number (2.1).
$\text{MAPE} = \frac{|\text{prediction} - \text{target}|}{\text{target}} \times 100\%$ (lower is better)
Bars, in slide order: mean, median, softmax, h-softmax, d-RNN, MoG, combination
Clinical, values in slide order: 2353.11, 747, 622, 552, 514, 426, 348
Scientific, values in slide order: 1e23, 8039, 2333, 1947, 1652, 1287, 590
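A minimal sketch of the metric, assuming predictions and targets arrive as numeral strings that are first converted to numbers with `float`. The function names and the example values are illustrative only.

```python
def ape(prediction, target):
    """Absolute percentage error for one prediction (target must be non-zero)."""
    return abs(prediction - target) / abs(target) * 100.0

def mape(predicted_numerals, target_numerals):
    """Mean absolute percentage error over paired numeral strings."""
    errors = [ape(float(p), float(t))
              for p, t in zip(predicted_numerals, target_numerals)]
    return sum(errors) / len(errors)

# Errors of 20% and 50% average to 35%.
print(mape(["2", "1.5"], ["2.5", "1.0"]))
```

Because the error is relative to the target, a few predictions that are orders of magnitude off can dominate the mean, which is consistent with the very large values on the slide.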
Softmax versus Hierarchical Softmax
[Figure: cosine similarities between numeral embeddings (1, 2, 3, 4, ..., 100, 101, ..., 2012, 2013), one panel for softmax and one for h-softmax]
Analysis: d-RNN and Benford’s Law
[Figure: cosine similarities between the d-RNN symbol embeddings (0 to 9, ‘.’, EOS)]
[Figure: distributions over the digits 0 to 9 for the 1st digit and the 4th digit, on the Clinical and Scientific datasets, comparing d-RNN with Benford; vertical axis from 0 to 30]
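For reference, Benford's law gives the first significant digit $d$ a probability of $\log_{10}(1 + 1/d)$. The sketch below computes that distribution and an empirical first-digit distribution from a list of numeral strings; the sample numerals are illustrative and are not drawn from either dataset. Only the first digit is covered here, not the 4th-digit comparison shown on the slide.

```python
import math
from collections import Counter

def benford_first_digit():
    """Benford's law for the first significant digit d in 1..9."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}

def first_digit_distribution(numerals):
    """Empirical distribution of the first non-zero digit of each numeral."""
    counts = Counter()
    for numeral in numerals:
        for ch in numeral:
            if ch in "123456789":
                counts[int(ch)] += 1
                break
    total = sum(counts.values())
    if total == 0:
        return {d: 0.0 for d in range(1, 10)}
    return {d: counts[d] / total for d in range(1, 10)}

print(benford_first_digit()[1])
print(first_digit_distribution(["2.1", "0.731", "2018", "17"]))
```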
Analysis: Model Predictions
‘… ejective fraction : ____ % ...’
[Figure: predictions for the blank from h-softmax, d-RNN and MoG]
Analysis: Strategy Selection
A softmax over strategies given $h_t$ selects among:
• h-softmax: small integers, percentiles, years. Examples: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’
• d-RNN: 2-digit integers, some ids. Examples: ‘measured 32 x 31 mm’, ‘NGC 6334 stars’
• MoG: reals, some ids. Examples: ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’
Conclusion (1)
Are existing LMs numerate?
[Figure: a softmax given $h_t$ over one vocabulary of words (the, cat, mat, UNK) and numerals (2, 3.14, 2018, UNKNUM); candidates shown for ‘John’s height is ___ ’: 0, 1, 2, 3, 25, 50, 999, 3.14, 2018, UNKNUM]
Conclusion (2)
How to improve the numeracy of LMs?
[Figure: combination of h-softmax, d-RNN and MoG through a softmax over strategies given $h_t$; candidates shown for ‘John’s height is ___ ’: 2.1, 1.8, 1.73, 2]
Thank you!