Learning Sets of Rules
• Sequential covering algorithms
• FOIL
• Induction as inverse of deduction
• Inductive Logic Programming
Web resources:
• http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/ilp.html
• http://www-ai.ijs.si/~ilpnet2
Learning Disjunctive Sets of Rules
Method 1: Learn a decision tree, then convert it to rules
Method 2: Sequential covering algorithm:
1. Learn one rule with high accuracy, any coverage
2. Remove the positive examples covered by this rule
3. Repeat on the remaining examples
Sequential Covering Algorithm
Sequential-covering(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← {}
• Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
• while Performance(Rule, Examples) > Threshold, do
  – Learned_rules ← Learned_rules + Rule
  – Examples ← Examples − {examples correctly classified by Rule}
  – Rule ← Learn-one-rule(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to Performance over Examples
• return Learned_rules
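The loop above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a reference implementation: rules are attribute=value conjunctions, Performance is taken to be sample accuracy over the covered examples, and `best_single_test_rule` is a hypothetical stand-in for Learn-one-rule that only tries single-test rules. The toy data is invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Example = Dict[str, str]
Rule = Tuple[Dict[str, str], str]  # (preconditions, predicted target value)


def covers(rule: Rule, example: Example) -> bool:
    """True if the example satisfies every precondition of the rule."""
    return all(example.get(attr) == val for attr, val in rule[0].items())


def performance(rule: Rule, examples: List[Example], target: str) -> float:
    """Assumed measure: sample accuracy over covered examples (0.0 if none)."""
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0
    return sum(e[target] == rule[1] for e in covered) / len(covered)


def best_single_test_rule(target: str, examples: List[Example]) -> Optional[Rule]:
    """Hypothetical stand-in for Learn-one-rule: best single attribute=value test."""
    best, best_score = None, -1.0
    labels = sorted({e[target] for e in examples})
    for e in examples:
        for attr, val in e.items():
            if attr == target:
                continue
            for label in labels:
                rule = ({attr: val}, label)
                score = performance(rule, examples, target)
                if score > best_score:
                    best, best_score = rule, score
    return best


def sequential_covering(
    target: str,
    examples: List[Example],
    learn_one_rule: Callable[[str, List[Example]], Optional[Rule]],
    threshold: float,
) -> List[Rule]:
    remaining = list(examples)
    learned: List[Rule] = []
    rule = learn_one_rule(target, remaining)
    while rule is not None and performance(rule, remaining, target) > threshold:
        # Drop the examples the rule classifies correctly.
        kept = [e for e in remaining
                if not (covers(rule, e) and e[target] == rule[1])]
        if len(kept) == len(remaining):  # safety guard (an assumption): no progress
            break
        learned.append(rule)
        remaining = kept
        rule = learn_one_rule(target, remaining)
    # Sort by performance over the full example set, best first.
    learned.sort(key=lambda r: performance(r, list(examples), target), reverse=True)
    return learned


# Invented toy data, for illustration only.
toy = [
    {"Humidity": "normal", "Wind": "weak", "PlayTennis": "yes"},
    {"Humidity": "normal", "Wind": "strong", "PlayTennis": "yes"},
    {"Humidity": "high", "Wind": "weak", "PlayTennis": "no"},
    {"Humidity": "high", "Wind": "strong", "PlayTennis": "no"},
]
rules = sequential_covering("PlayTennis", toy, best_single_test_rule, 0.9)
```

Passing the rule learner in as a parameter keeps the covering loop separate from the search strategy, so a beam-search learner could be swapped in without touching the loop.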
Learn-One-Rule
[Figure: general-to-specific search tree for one rule. The root is the most general rule, IF (no preconditions) THEN PlayTennis=yes. Its specializations add one attribute test each: IF Wind=weak THEN PlayTennis=yes, IF Wind=strong THEN PlayTennis=no, IF Humidity=high THEN PlayTennis=no, IF Humidity=normal THEN PlayTennis=yes, ... The rule IF Humidity=normal THEN PlayTennis=yes is specialized further with a second test: Wind=weak, Wind=strong, Outlook=rain, or Outlook=sunny, each with THEN PlayTennis=yes.]
Learn-One-Rule
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos is not empty, do (learn a NewRule)
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg is not empty, do (add a new literal to specialize NewRule)
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax over L ∈ Candidate_literals of Performance(SpecializeRule(NewRule, L))
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules
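A rough Python rendering of the two nested loops, as a sketch only. It assumes attribute=value literals, uses sample accuracy as the Performance measure, and breaks ties by sorted order; none of these choices is fixed by the slide. The function names and the toy examples are hypothetical.

```python
from typing import Dict, List

Example = Dict[str, str]
Preconditions = Dict[str, str]


def covers(pre: Preconditions, e: Example) -> bool:
    """True if example e satisfies every attribute=value test in pre."""
    return all(e.get(a) == v for a, v in pre.items())


def accuracy(pre: Preconditions, pos: List[Example], neg: List[Example]) -> float:
    """Fraction of covered examples that are positive (0.0 if none covered)."""
    p = sum(covers(pre, e) for e in pos)
    n = sum(covers(pre, e) for e in neg)
    return p / (p + n) if p + n else 0.0


def learn_one_rule(pos: List[Example], neg: List[Example],
                   attributes: List[str]) -> Preconditions:
    """Greedy general-to-specific search: add literals until no negatives remain."""
    pre: Preconditions = {}          # most general rule: no preconditions
    covered_neg = list(neg)
    while covered_neg:
        candidates = sorted({(a, e[a]) for e in pos + neg
                             for a in attributes if a not in pre})
        if not candidates:           # nothing left to specialize on
            break
        best = max(candidates,
                   key=lambda lit: accuracy({**pre, lit[0]: lit[1]}, pos, neg))
        pre[best[0]] = best[1]
        covered_neg = [e for e in covered_neg if covers(pre, e)]
    return pre


def learn_rule_set(pos: List[Example], neg: List[Example],
                   attributes: List[str]) -> List[Preconditions]:
    """Outer loop: learn rules until every positive example is covered."""
    rules: List[Preconditions] = []
    remaining = list(pos)
    while remaining:
        rule = learn_one_rule(remaining, neg, attributes)
        if not any(covers(rule, e) for e in remaining):  # guard: no progress
            break
        rules.append(rule)
        remaining = [e for e in remaining if not covers(rule, e)]
    return rules


# Invented toy examples, for illustration only.
pos = [{"Humidity": "normal", "Wind": "weak"},
       {"Humidity": "normal", "Wind": "strong"}]
neg = [{"Humidity": "high", "Wind": "weak"},
       {"Humidity": "high", "Wind": "strong"}]
learned = learn_rule_set(pos, neg, ["Humidity", "Wind"])
```

Here a single literal, Humidity=normal, already excludes every negative example, so the inner loop stops after one specialization step.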
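Subtleties: Learn One Rule
1. May use beam search
2. Easily generalizes to multi-valued target functions
3. Choose evaluation function to guide search:
   • Entropy (i.e., information gain)
   • Sample accuracy: $\frac{n_c}{n}$, where $n_c$ = number of correct rule predictions and $n$ = number of all predictions
   • m-estimate: $\frac{n_c + m p}{n + m}$, where $p$ is the prior probability of the class the rule predicts and $m$ is the weight given to that prior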
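The three evaluation functions are short enough to write out directly. The sketch below assumes entropy is computed over the class counts of the examples a rule covers (lower is better), and that p and m in the m-estimate are a prior class probability and its weight; the function names are illustrative.

```python
import math
from typing import Sequence


def sample_accuracy(n_c: int, n: int) -> float:
    """n_c correct predictions out of n predictions made by the rule."""
    return n_c / n


def m_estimate(n_c: int, n: int, p: float, m: float) -> float:
    """Accuracy shrunk toward the prior p, as if m extra examples had been seen."""
    return (n_c + m * p) / (n + m)


def entropy(class_counts: Sequence[int]) -> float:
    """Entropy in bits of the class distribution among covered examples."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)
```

With m = 0 the m-estimate reduces to sample accuracy; larger m pulls a rule that covers few examples back toward the prior, which discourages very specific rules.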
Variants of Rule Learning Programs
• Sequential or simultaneous covering of data?
• General → specific, or specific → general?
• Generate-and-test, or example-driven?
• Whether and how to post-prune?
• What statistical evaluation function?
Learning First Order Rules
Why do that?
• Can learn sets of rules such as
  Ancestor(x, y) ← Parent(x, y)
  Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y)
• The general-purpose programming language Prolog: programs are sets of such rules
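To see what the two Ancestor rules compute, here is a small Python sketch that applies them repeatedly until no new facts appear (naive forward chaining). It assumes Parent(x, y) is read as "x is a parent of y"; the names in the example are invented.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def ancestors(parent: Set[Pair]) -> Set[Pair]:
    """Least fixed point of the two Ancestor rules over the given Parent facts."""
    # Rule 1: Ancestor(x, y) <- Parent(x, y)
    ancestor = set(parent)
    changed = True
    while changed:
        changed = False
        # Rule 2: Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
        for (x, z) in parent:
            for (z2, y) in list(ancestor):
                if z == z2 and (x, y) not in ancestor:
                    ancestor.add((x, y))
                    changed = True
    return ancestor


# Invented facts: Ann is a parent of Bob, Bob of Cal, Cal of Dee.
facts = {("Ann", "Bob"), ("Bob", "Cal"), ("Cal", "Dee")}
result = ancestors(facts)
```

The recursive second rule is what lets a fixed, two-clause rule set describe chains of any length, which a propositional rule over a fixed attribute set cannot do.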
First Order Rule for Classifying Web Pages
[Slattery, 1997]
course(A) ←
  has-word(A, instructor),
  Not has-word(A, good),
  link-from(A, B),
  has-word(B, assign),
  Not link-from(B, C)
Train: 31/31, Test: 31/34
FOIL
• First-Order Induction of Logic (FOIL)
• Learns Horn clauses without functions
• Allows negated literals in rule body
• Sequential covering algorithm
  – Greedy, hill-climbing approach
  – Seeks only rules for predicting True
• Each new rule generalizes the overall concept (S → G)
• Each added conjunct specializes the rule (G → S)
FOIL(Target_predicate, Predicates, Examples)
• Pos ← positive Examples
• Neg ← negative Examples
• while Pos is not empty, do (learn a NewRule)
  – NewRule ← most general rule possible
  – NewRuleNeg ← Neg
  – while NewRuleNeg is not empty, do (add a new literal to specialize NewRule)
    1. Candidate_literals ← generate candidates
    2. Best_literal ← argmax over L ∈ Candidate_literals of Foil_Gain(L, NewRule)
    3. add Best_literal to NewRule preconditions
    4. NewRuleNeg ← subset of NewRuleNeg that satisfies NewRule preconditions
  – Learned_rules ← Learned_rules + NewRule
  – Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned_rules
Specializing Rules in FOIL
Learning rule: $P(x_1, x_2, \ldots, x_k) \leftarrow L_1 \ldots L_n$
Candidate specializations add a new literal of the form:
• $Q(v_1, \ldots, v_r)$, where at least one of the $v_i$ in the created literal must already exist as a variable in the rule
• $Equal(x_j, x_k)$, where $x_j$ and $x_k$ are variables already present in the rule
• The negation of either of the above forms of literals
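The three candidate forms can be enumerated mechanically. The sketch below is one possible reading: it allows a fixed number of fresh variables (named `new0`, `new1`, ... here, a naming assumption), represents a literal as a (negated, predicate, arguments) tuple, and does not filter out repeated arguments such as Q(x, x). The predicate `Father` in the usage lines is only an example.

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]  # (negated, predicate, arguments)


def candidate_literals(predicates: Dict[str, int],
                       rule_vars: List[str],
                       n_new_vars: int = 1) -> List[Literal]:
    """Enumerate candidate literals for specializing a rule with variables rule_vars."""
    new_vars = [f"new{i}" for i in range(n_new_vars)]  # hypothetical fresh names
    all_vars = list(rule_vars) + new_vars
    positive: List[Literal] = []
    # Form 1: Q(v1, ..., vr) with at least one variable already in the rule.
    for name, arity in predicates.items():
        for args in product(all_vars, repeat=arity):
            if any(a in rule_vars for a in args):
                positive.append((False, name, args))
    # Form 2: Equal(xj, xk) over pairs of variables already in the rule.
    for i, a in enumerate(rule_vars):
        for b in rule_vars[i + 1:]:
            positive.append((False, "Equal", (a, b)))
    # Form 3: the negation of every literal generated above.
    negated = [(True, name, args) for _, name, args in positive]
    return positive + negated


lits = candidate_literals({"Father": 2}, ["x", "y"])
```

Even this tiny case yields 18 candidates, which hints at how quickly the search space grows with more predicates and variables.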
Information Gain in FOIL
$$\mathit{Foil\_Gain}(L, R) \equiv t \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$$
Where
• $L$ is the candidate literal to add to rule $R$
• $p_0$ = number of positive bindings of $R$
• $n_0$ = number of negative bindings of $R$
• $p_1$ = number of positive bindings of $R + L$
• $n_1$ = number of negative bindings of $R + L$
• $t$ is the number of positive bindings of $R$ also covered by $R + L$
Note
• $-\log_2 \frac{p_0}{p_0 + n_0}$ is the optimal number of bits to indicate the class of a positive binding covered by $R$
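As a quick numeric check of the gain formula, a direct Python transcription follows. The binding counts in the usage line are invented, and returning 0.0 when the specialized rule covers no positive bindings is an assumed convention to avoid log of zero, not something the slide specifies.

```python
import math


def foil_gain(p0: int, n0: int, p1: int, n1: int, t: int) -> float:
    """t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))."""
    if p1 == 0 or p0 == 0:
        return 0.0  # assumed convention: no positive bindings, no gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


# Invented counts: R covers 4 positive and 4 negative bindings;
# R + L covers 3 positive and 1 negative, and 3 positives of R survive.
gain = foil_gain(p0=4, n0=4, p1=3, n1=1, t=3)
```

For these counts the bracketed term is log2(0.75) − log2(0.5) = log2(1.5), so the gain is 3 · log2(1.5), about 1.75 bits.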
FOIL Example
[Figure: directed graph on nodes 0 through 8; an arrow from x to y represents LinkedTo(x,y).]
Instances:
• pairs of nodes, e.g. ⟨1, 5⟩, with the graph described by literals LinkedTo(0,1), ¬LinkedTo(0,8), etc.
Target function:
• CanReach(x,y) is true iff there is a directed path from x to y
Hypothesis space:
• Each h ∈ H is a set of Horn clauses using the predicates LinkedTo (and CanReach)
Induction as Inverted Deduction
Induction is finding $h$ such that
$$(\forall \langle x_i, f(x_i) \rangle \in D) \quad (B \land h \land x_i) \vdash f(x_i)$$
where
• $x_i$ is the $i$th training instance
• $f(x_i)$ is the target function value for $x_i$
• $B$ is other background knowledge
So let's design inductive algorithms by inverting operators for automated deduction!
Induction as Inverted Deduction
"pairs of people ⟨u, v⟩ such that the child of u is v"
$f(x_i)$: Child(Bob, Sharon)
$x_i$: Male(Bob), Female(Sharon), Father(Sharon, Bob)
$B$: Parent(u, v) ← Father(u, v)
What satisfies $(\forall \langle x_i, f(x_i) \rangle \in D) \; (B \land h \land x_i) \vdash f(x_i)$?
$h_1$: Child(u, v) ← Father(v, u)
$h_2$: Child(u, v) ← Parent(v, u)
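The entailment condition can be checked for this example with a tiny forward chainer. The sketch below is deliberately limited: it handles only rules with a single body literal whose variables are distinct and all appear in the head's source literal, which is enough for B, h1, and h2 here. The tuple encoding of facts and rules is an assumption made for illustration.

```python
from typing import List, Set, Tuple

Atom = Tuple[str, ...]            # ("Father", "Sharon", "Bob")
HornRule = Tuple[Atom, Atom]      # (head, single body literal), args are variable names


def forward_chain(facts: Set[Atom], rules: List[HornRule]) -> Set[Atom]:
    """Apply the rules to the facts until nothing new can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for fact in list(derived):
                if fact[0] != body[0] or len(fact) != len(body):
                    continue
                binding = dict(zip(body[1:], fact[1:]))   # variable -> constant
                new = (head[0],) + tuple(binding[a] for a in head[1:])
                if new not in derived:
                    derived.add(new)
                    changed = True
    return derived


def entails(facts: Set[Atom], rules: List[HornRule], goal: Atom) -> bool:
    return goal in forward_chain(facts, rules)


x_i = {("Male", "Bob"), ("Female", "Sharon"), ("Father", "Sharon", "Bob")}
goal = ("Child", "Bob", "Sharon")
B = (("Parent", "u", "v"), ("Father", "u", "v"))
h1 = (("Child", "u", "v"), ("Father", "v", "u"))
h2 = (("Child", "u", "v"), ("Parent", "v", "u"))

h1_alone = entails(x_i, [h1], goal)        # h1 needs no background knowledge
h2_with_B = entails(x_i, [B, h2], goal)    # h2 works once B supplies Parent facts
h2_alone = entails(x_i, [h2], goal)        # without B, h2 derives nothing
```

The contrast between the last two lines shows the role of B: h2 satisfies the constraint only because the background rule turns Father facts into Parent facts.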
Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction .... (Jevons 1874)
Induction as Inverted Deduction
We have mechanical deductive operators $F(A, B) = C$, where $A \land B \vdash C$
We need inductive operators $O(B, D) = h$, where
$$(\forall \langle x_i, f(x_i) \rangle \in D) \quad (B \land h \land x_i) \vdash f(x_i)$$
Induction as Inverted Deduction
Positives:
• Subsumes the earlier idea of finding h that "fits" the training data
• Domain theory B helps define the meaning of "fit" the data: $(B \land h \land x_i) \vdash f(x_i)$
• Suggests algorithms that search H guided by B
Induction as Inverted Deduction
Negatives:
• Doesn't allow for noisy data. Consider $(\forall \langle x_i, f(x_i) \rangle \in D) \; (B \land h \land x_i) \vdash f(x_i)$
• First order logic gives a huge hypothesis space H
  → overfitting...
  → intractability of calculating all acceptable h's
Deduction: Resolution Rule
$$\frac{P \lor L \qquad \lnot L \lor R}{P \lor R}$$
1. Given initial clauses $C_1$ and $C_2$, find a literal $L$ from clause $C_1$ such that $\lnot L$ occurs in clause $C_2$
2. Form the resolvent $C$ by including all literals from $C_1$ and $C_2$, except for $L$ and $\lnot L$. More precisely, the set of literals occurring in the conclusion $C$ is
$$C = (C_1 - \{L\}) \cup (C_2 - \{\lnot L\})$$
where $\cup$ denotes set union, and "$-$" denotes set difference.
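The set formula for the resolvent maps almost directly onto Python sets. In this sketch a clause is a frozenset of string literals and a leading "~" marks negation; that encoding is an assumption made for illustration.

```python
from typing import FrozenSet, List

Clause = FrozenSet[str]


def negate(lit: str) -> str:
    """Flip the sign of a literal: L <-> ~L."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1: Clause, c2: Clause) -> List[Clause]:
    """All resolvents C = (C1 - {L}) U (C2 - {~L}) for literals L in C1."""
    resolvents = []
    for lit in sorted(c1):
        if negate(lit) in c2:
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents


c1 = frozenset({"P", "L"})       # P v L
c2 = frozenset({"~L", "R"})      # ~L v R
resolvents = resolve(c1, c2)     # one resolvent: P v R
```

Because clauses are sets, a literal that appears in both parents shows up only once in the resolvent, which matches the set-union reading of the rule.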
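Inverting Resolution
[Figure: the same resolution step drawn twice, once in the deductive direction and once inverted.
C_1: PassExam ∨ ¬KnowMaterial
C_2: KnowMaterial ∨ ¬Study
C: PassExam ∨ ¬Study
Deduction derives the resolvent C from C_1 and C_2; inverse resolution derives C_2 from C and C_1.]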
Inverted Resolution (Propositional)
1. Given initial clauses $C_1$ and $C$, find a literal $L$ that occurs in clause $C_1$, but not in clause $C$.
2. Form the second clause $C_2$ by including the following literals:
$$C_2 = (C - (C_1 - \{L\})) \cup \{\lnot L\}$$
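The inverse operator can be sketched the same way and checked by resolving forward again. As before, clauses are frozensets of string literals with "~" for negation (an encoding assumption). The formula yields one C_2 per choice of L; other clauses that also resolve with C_1 to give C exist, so this is not the only possible answer.

```python
from typing import FrozenSet, List

Clause = FrozenSet[str]


def negate(lit: str) -> str:
    """Flip the sign of a literal: L <-> ~L."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def resolve(c1: Clause, c2: Clause) -> List[Clause]:
    """Forward resolution, used here only to verify the inverse step."""
    return [(c1 - {lit}) | (c2 - {negate(lit)})
            for lit in sorted(c1) if negate(lit) in c2]


def inverse_resolve(c: Clause, c1: Clause) -> List[Clause]:
    """One C2 = (C - (C1 - {L})) U {~L} for each literal L in C1 but not in C."""
    return [(c - (c1 - {lit})) | {negate(lit)} for lit in sorted(c1 - c)]


c = frozenset({"PassExam", "~Study"})            # PassExam v ~Study
c1 = frozenset({"PassExam", "~KnowMaterial"})    # PassExam v ~KnowMaterial
candidates = inverse_resolve(c, c1)              # KnowMaterial v ~Study
round_trip = resolve(c1, candidates[0])          # should give back C
```

The round trip is a useful sanity check: any clause proposed as C_2 should resolve with C_1 to reproduce C.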
First Order Resolution
1. Find a literal $L_1$ from clause $C_1$, a literal $L_2$ from clause $C_2$, and a substitution $\theta$ such that $L_1\theta = \lnot L_2\theta$
2. Form the resolvent $C$ by including all literals from $C_1\theta$ and $C_2\theta$, except for $L_1\theta$ and $L_2\theta$. More precisely, the set of literals occurring in the conclusion $C$ is
$$C = (C_1 - \{L_1\})\theta \cup (C_2 - \{L_2\})\theta$$
Inverting First Order Resolution
$$C_2 = (C - (C_1 - \{L_1\})\theta_1)\theta_2^{-1} \cup \{\lnot L_1 \theta_1 \theta_2^{-1}\}$$
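Cigol
[Figure: inverse resolution tree, read from the bottom up.
• From GrandChild(Bob, Shannon) and Father(Shannon, Tom), with inverse substitution {Shannon/x}: GrandChild(Bob, x) ∨ ¬Father(x, Tom)
• From GrandChild(Bob, x) ∨ ¬Father(x, Tom) and Father(Tom, Bob), with inverse substitution {Bob/y, Tom/z}: GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y)]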
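The first inverse step of this example can be worked through the inverse first-order resolution formula $C_2 = (C - (C_1 - \{L_1\})\theta_1)\theta_2^{-1} \cup \{\lnot L_1 \theta_1 \theta_2^{-1}\}$. The derivation below takes $\theta_1 = \{\}$, which is an assumption: it is the simplest choice consistent with the clauses above, since $C_1$ contains no variables.

```latex
\begin{align*}
C &= \mathit{GrandChild}(\mathit{Bob}, \mathit{Shannon}), \quad
C_1 = \mathit{Father}(\mathit{Shannon}, \mathit{Tom}), \quad
L_1 = \mathit{Father}(\mathit{Shannon}, \mathit{Tom}) \\
\theta_1 &= \{\}, \quad \theta_2^{-1} = \{\mathit{Shannon}/x\} \\
C_2 &= (C - (C_1 - \{L_1\})\theta_1)\theta_2^{-1} \cup \{\lnot L_1 \theta_1 \theta_2^{-1}\} \\
    &= (C - \emptyset)\theta_2^{-1} \cup \{\lnot \mathit{Father}(\mathit{Shannon}, \mathit{Tom})\theta_2^{-1}\} \\
    &= \mathit{GrandChild}(\mathit{Bob}, x) \lor \lnot \mathit{Father}(x, \mathit{Tom})
\end{align*}
```

Since $C_1$ is the single literal $L_1$, the set $C_1 - \{L_1\}$ is empty, so all of $C$ survives and the inverse substitution only replaces the constant Shannon with the variable $x$.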