Machine Learning on Knowledge Bases
Marcus Pfeiffer
19th June 2018
• Introduction
• Data Preprocessing
• Machine Learning Algorithms for Premise Selection
• Premise Selectors in Detail
• Comparison and Outlook
Introduction
Knowledge Bases and Provers

• Knowledge Bases
  • Mizar Mathematical Library (MML)
  • Library of Isabelle/HOL
  • SUMO
  • Cyc
• Automatic Theorem Provers
  • E
  • Vampire
  • CVC4
  • SPASS
  • Z3
• Integration of Automatic Provers in Interactive Provers
Premise Selection Problem

Definition (Premise Selection Problem)
Given
• an ATP A,
• a large set of premises P, and
• a conjecture c,
predict those premises from P that are likely to be useful to A for constructing a proof of c.
Example

map f xs = ys ⟹ zip (rev xs) (rev ys) = rev (zip xs ys)

A proof can be found using the following two lemmas:

length xs = length ys ⟹ zip (rev xs) (rev ys) = rev (zip xs ys)   (1)

length (map f xs) = length xs   (2)
Example

used [] ⊆ used evs

A straightforward proof uses the following lemmas:

used [] = (⋃B. parts (initState B))   (3)

X ∈ parts (initState B) ⟹ X ∈ used evs   (4)

(⋀x. x ∈ A ⟹ x ∈ B) ⟹ A ⊆ B   (5)

b ∈ (⋃x ∈ A. B x) ⟷ (∃x ∈ A. b ∈ B x)   (6)
Data Preprocessing
Dependency

Definition (Dependency)
A definition or theorem T depends on some definition, lemma, or theorem T′ if T′ is needed for the proof of T.
Theory dependency graph of Free Groups

[Figure: dependency graph of the Free Groups theories; nodes include Pure, HOL, HOL-Library, HOL-Cardinals, HOL-Computational_Algebra, HOL-Algebra, HOL-Analysis, HOL-Probability, Generators, Cancelation, UnitGroup, FreeGroups, Isomorphisms, and PingPongLemma]
Example of feature extraction

For a given depth of 2, the term g (h x a) with x ∶∶ τ yields the following structural features:

g, g(h), h, h(x), h(a), h(x, a), x, a

which are simplified (variables are replaced by their types) to

g, g(h), h, h(τ), h(a), h(τ, a), τ, a
Another example of feature extraction

transpose (map (map f) xss) = map (map f) (transpose xss)

has the features:

map, map(list list), fun, map(fun), map(map, list list), list, map(map), transpose, list list, map(transpose), transpose(map), List, map(map, transpose), transpose(list list)
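A minimal sketch of this kind of depth-bounded feature extraction, reproducing the g (h x a) example above. The Term datatype and the exact feature syntax are illustrative assumptions, not Sledgehammer's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    head: str                      # constant name (or variable name)
    args: list = field(default_factory=list)
    is_var: bool = False           # variables contribute their type, not their name
    typ: str = ""

def features(t: Term) -> set:
    """Depth-2 features: each symbol, the head applied to each argument's
    pattern, and the head applied to all argument patterns at once."""
    feats = set()

    def atom(u: Term) -> str:
        # Simplification step from the slide: replace a variable by its type.
        return u.typ if u.is_var else u.head

    def walk(u: Term):
        feats.add(atom(u))
        if u.args:
            pats = [atom(a) for a in u.args]
            for p in pats:
                feats.add(f"{u.head}({p})")
            if len(pats) > 1:
                feats.add(f"{u.head}({', '.join(pats)})")
            for a in u.args:
                walk(a)

    walk(t)
    return feats

# g (h x a) with x :: tau
x = Term("x", is_var=True, typ="tau")
term = Term("g", [Term("h", [x, Term("a")])])
print(features(term))
# -> the eight simplified features above: g, g(h), h, h(tau), h(a), h(tau, a), tau, a
```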
Stored Information

For every proved conjecture c, the learner stores the triple

(c, usedPremises(c), F(c))
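One possible in-memory representation of such a training record, a sketch under the assumption that facts and features are identified by strings:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingRecord:
    # One entry of the learner's database: a proved conjecture c,
    # the premises its proof used, and its extracted features F(c).
    conjecture: str
    used_premises: frozenset
    features: frozenset
```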
Machine Learning Algorithms for Premise Selection
k-Nearest-Neighbors

"Nearness" of two formulas a and b in terms of shared features:

n(a, b) = ∑_{t ∈ F(a) ∩ F(b)} w(t)^τ₁   (7)

Let N := the k nearest neighbors of c. Then

Relevance_c(p) = ( ∑_{q ∈ N, p ∈ usedPremises(q)} τ₂ · n(q, c) / ∣usedPremises(q)∣ )
                 + ( n(p, c) if p ∈ N; 0 otherwise )   (8)
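A minimal Python sketch of the selector defined by (7) and (8). The data layout, the default feature weight, and the values of k, τ₁, τ₂ are illustrative assumptions, not MaSh's actual implementation.

```python
def nearness(feats_a, feats_b, w, tau1):
    """Equation (7): weighted count of shared features."""
    return sum(w.get(t, 1.0) ** tau1 for t in feats_a & feats_b)

def knn_relevance(conj_feats, facts, w, k=40, tau1=6.0, tau2=2.7):
    """Equation (8): score candidate premises for a conjecture.

    facts maps each proved fact q to the pair (F(q), usedPremises(q)).
    """
    # N := the k facts nearest to the conjecture c.
    neighbours = sorted(
        facts,
        key=lambda q: nearness(conj_feats, facts[q][0], w, tau1),
        reverse=True,
    )[:k]
    scores = {}
    for q in neighbours:
        n_qc = nearness(conj_feats, facts[q][0], w, tau1)
        used = facts[q][1]
        # Premises of a neighbour's proof inherit a share of its nearness.
        if used:
            for p in used:
                scores[p] = scores.get(p, 0.0) + tau2 * n_qc / len(used)
        # The neighbour itself is a candidate premise: the n(p, c) case of (8).
        scores[q] = scores.get(q, 0.0) + n_qc
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```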
Naive Bayes

P(p is used in the proof of c)   (9)

≈ P(p is used to prove c′ ∣ c′ has the features F(c))   (10)

∝ P(p is used in the proof of c′)
  · ∏_{t ∈ F(c) ∩ F̄(p)} P(c′ has feature t ∣ p is used in the proof of c′)
  · ∏_{t ∈ F(c) − F̄(p)} P(c′ has feature t ∣ p is not used in the proof of c′)
  · ∏_{t ∈ F̄(p) − F(c)} P(c′ does not have feature t ∣ p is used in the proof of c′)   (11)

Here F̄(p) denotes the features of the facts whose proofs use p.
Naive Bayes

• r(q) = number of times a fact q occurs as a dependency
• s(q, t) = number of times a fact q occurs as a dependency of a fact described by feature t
• K = total number of known proofs

P(p is used in the proof of c′) = r(p) / K   (12)

P(c′ has feature t ∣ p is used in the proof of c′) = s(p, t) / r(p)   (13)

Taking logarithms of (11) and adding tuning weights σ₁, …, σ₄ gives

Relevance_c(p) = σ₁ ln r(p)
               + ∑_{t ∈ F(c) ∩ F̄(p)} w(t) ln(σ₂ · s(p, t) / r(p))
               + σ₃ ∑_{t ∈ F(c) − F̄(p)} w(t)
               + σ₄ ∑_{t ∈ F̄(p) − F(c)} w(t) ln(1 − s(p, t) / r(p))
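A sketch of this score as code. The dictionaries r, s, ext_feats (playing the role of F̄) and the σ defaults are illustrative assumptions; the real implementation tunes these constants empirically.

```python
import math

def nb_relevance(conj_feats, p, r, s, ext_feats, w,
                 sigma1=30.0, sigma2=5.0, sigma3=0.2, sigma4=0.05):
    """Naive Bayes relevance of premise p for a conjecture with features
    conj_feats, following the log-linear formula above.

    r[p]: how often p occurs as a dependency (at least 1 for known facts);
    s[(p, t)]: how often p is a dependency of a fact with feature t;
    ext_feats[p]: the extended feature set F̄(p); w: feature weights.
    """
    fp = ext_feats[p]
    score = sigma1 * math.log(r[p])
    for t in conj_feats & fp:
        score += w.get(t, 1.0) * math.log(sigma2 * s[(p, t)] / r[p])
    score += sigma3 * sum(w.get(t, 1.0) for t in conj_feats - fp)
    for t in fp - conj_feats:
        # Guard against log(0) when s(p, t) = r(p).
        score += sigma4 * w.get(t, 1.0) * math.log(
            max(1.0 - s[(p, t)] / r[p], 1e-10))
    return score
```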
Premise Selectors in Detail
MePo

For a set K of known symbols, let

k = number of known symbols occurring in the fact
u = number of unknown symbols occurring in the fact

1. For each fact, compute its score k / (k + u).
2. Select the facts with the highest score, add their symbols to the set of known symbols K, and repeat.
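A compact sketch of this iteration, assuming each fact is given as the set of its symbols. Selecting "the facts with the highest score" is modelled here by a fixed threshold that relaxes each round, a simplification of the real MePo heuristic.

```python
def mepo_select(goal_symbols, facts, threshold=0.6, decay=0.9):
    """Iteratively select facts whose symbols overlap the known set K.

    facts maps fact names to their symbol sets; the threshold schedule
    and its constants are illustrative.
    """
    known = set(goal_symbols)      # K starts as the goal's symbols
    selected = []
    remaining = dict(facts)
    while remaining:
        picked = []
        for name, syms in remaining.items():
            k = len(syms & known)  # known symbols in the fact
            u = len(syms - known)  # unknown symbols in the fact
            if k + u and k / (k + u) >= threshold:
                picked.append(name)
        if not picked:
            break
        for name in picked:
            selected.append(name)
            known |= remaining.pop(name)   # step 2: grow the known set K
        threshold *= decay                 # relax the threshold and repeat
    return selected
```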
MaSh, MeSh

• MaSh: Machine Learning for Sledgehammer, with k-NN and Naive Bayes
• MeSh: a combination of MaSh and a MePo-like selector
• In practice: combined with a proximity selector
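One simple way to combine two premise rankings, sketched below by averaging weighted normalised ranks. The real MeSh combination scheme differs in its details; the names and the 0.5 weights are illustrative.

```python
def combine(rankings, weights):
    """Merge several ranked premise lists into one combined ranking."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for rank, p in enumerate(ranking):
            # Earlier positions contribute larger scores.
            scores[p] = scores.get(p, 0.0) + weight * (n - rank) / n
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from a MePo-like and a MaSh-like selector:
mepo_ranking = ["lemma_a", "lemma_b", "lemma_c"]
mash_ranking = ["lemma_b", "lemma_a", "lemma_d"]
print(combine([mepo_ranking, mash_ranking], [0.5, 0.5]))
```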
Comparison and Outlook
Metrics

For premises p₁, …, pₙ selected by a selection algorithm:

Full recall: min { k ∈ ℕ ∣ usedPremises(c) ⊆ {p₁, …, p_k} }

k-Coverage: ∣{p₁, …, p_k} ∩ usedPremises(c)∣ / min{k, ∣usedPremises(c)∣}
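Both metrics are straightforward to compute from a ranking; a minimal sketch:

```python
def full_recall(ranking, used):
    """Smallest k such that the first k selected premises contain all
    premises actually used in the proof; None if never reached."""
    needed = set(used)
    for k, p in enumerate(ranking, start=1):
        needed.discard(p)
        if not needed:
            return k
    return None

def coverage(ranking, used, k):
    """Fraction of actually used premises among the k top-ranked ones,
    normalised by min(k, |usedPremises(c)|)."""
    hits = len(set(ranking[:k]) & set(used))
    return hits / min(k, len(used))
```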
Comparison in Metric

Formalization    MePo   MaSh NB   MaSh kNN   MeSh NB   MeSh kNN
Auth              647       104        143        96        112
IsaFoR           1332       513        604       517        570
Jinja             839       244        306       229        256
List             1083       234        263       259        271
Nominal2         1045       220        276       229        264
Probability      1324       424        422       393        395

Figure 1: Average full recall
Comparison in Metric

[Figure 2: k-Coverage for IsaFoR]
Comparison in Performance

[Figure 3: Success rates]