10b Machine Learning: Symbol-based

10.0 Introduction
10.1 A Framework for Symbol-based Learning
10.2 Version Space Search
10.3 The ID3 Decision Tree Induction Algorithm
10.4 Inductive Bias and Learnability
10.5 Knowledge and Learning
10.6 Unsupervised Learning
10.7 Reinforcement Learning
10.8 Epilogue and References
10.9 Exercises

Additional references for the slides: Jean-Claude Latombe’s CS121 slides: robotics.stanford.edu/~latombe/cs121
Decision Trees
• A decision tree classifies an object by testing its values for certain properties.
• Check out the example at: www.aiinc.ca/demos/whale.html
• The learning problem is similar to concept learning with version spaces in the sense that we are trying to identify a class using the observable properties.
• It differs in that we are trying to learn a structure that determines class membership after a sequence of questions. This structure is a decision tree.
Reverse-engineered decision tree of the whale watcher expert system
[Figure: decision tree. The root asks “see flukes?”; later tests include “see dorsal fin?”, size (very large / medium / large / very small), “blow forward?”, and number of blows (1 or 2), leading to leaves for blue whale, sperm whale, humpback whale, gray whale, bowhead whale, right whale, and narwhal. One branch of “see dorsal fin?” continues on the next slide.]
Reverse-engineered decision tree of the whale watcher expert system (cont’d)
[Figure: continuation of the branch deferred from the previous slide. Further tests include “blow?”, size (large / small), “dorsal fin tall and pointed?”, and “dorsal fin and blow visible at the same time?”, leading to leaves for killer whale, northern bottlenose whale, sei whale, and fin whale.]
What might the original data look like?

Place      | Time  | Group | Fluke | Dorsal fin | Dorsal shape  | Size       | Blow | … | Blow fwd | Type
Kaikora    | 17:00 | Yes   | Yes   | Yes        | small triang. | Very large | Yes  |   | No       | Blue whale
Kaikora    |  7:00 | No    | Yes   | Yes        | small triang. | Very large | Yes  |   | No       | Blue whale
Kaikora    |  8:00 | Yes   | Yes   | Yes        | small triang. | Very large | Yes  |   | No       | Blue whale
Kaikora    |  9:00 | Yes   | Yes   | Yes        | squat triang. | Medium     | Yes  |   | Yes      | Sperm whale
Cape Cod   | 18:00 | Yes   | Yes   | Yes        | Irregular     | Medium     | Yes  |   | No       | Humpback whale
Cape Cod   | 20:00 | No    | Yes   | Yes        | Irregular     | Medium     | Yes  |   | No       | Humpback whale
Newb. Port | 18:00 | No    | No    | No         | Curved        | Large      | Yes  |   | No       | Fin whale
Cape Cod   |  6:00 | Yes   | Yes   | No         | None          | Medium     | Yes  |   | No       | Right whale
…
The search problem Given a table of observable properties, search for a decision tree that • correctly represents the data (assuming that the data is noise-free), and • is as small as possible. What does the search tree look like? 6
Comparing VSL and learning DTs

A hypothesis learned in VSL can be represented as a decision tree.
Consider the predicate that we used as a VSL example:
NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])

The decision tree below represents it:

NUM?
  True  → BLACK?
            True  → True
            False → False
  False → False
Predicate as a Decision Tree

The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree:

A?
  True  → B?
            True  → C?
                      True  → True
                      False → False
            False → True
  False → False

Example: a mushroom is poisonous iff it is yellow and small, or yellow, big and spotted.
• x is a mushroom
• CONCEPT = POISONOUS
• A = YELLOW
• B = BIG
• C = SPOTTED
• D = FUNNEL-CAP
• E = BULKY
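The correspondence can be checked mechanically. Below is a minimal Python sketch (not from the slides; the function names are illustrative) that evaluates the predicate directly and as the nested tests of the tree above, and verifies that the two agree on every input:

from itertools import product

def concept(a, b, c):
    # Direct logical definition: A(x) and (not B(x) or C(x))
    return a and ((not b) or c)

def concept_tree(a, b, c):
    # The same predicate read off the decision tree: test A, then B, then C
    if not a:
        return False          # A? False -> False
    if not b:
        return True           # A true, B? False -> True
    return c                  # A and B both true: the answer is C

# The two formulations agree on all eight combinations of A, B, C
assert all(concept(a, b, c) == concept_tree(a, b, c)
           for a, b, c in product((True, False), repeat=3))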
Training Set

Ex. # | A     | B     | C     | D     | E     | CONCEPT
1     | False | False | True  | False | True  | False
2     | False | True  | False | False | False | False
3     | False | True  | True  | True  | True  | False
4     | False | False | True  | False | False | False
5     | False | False | False | True  | True  | False
6     | True  | False | True  | False | False | True
7     | True  | False | False | True  | False | True
8     | True  | False | True  | False | True  | True
9     | True  | True  | True  | False | True  | True
10    | True  | True  | True  | True  | True  | True
11    | True  | True  | False | False | False | False
12    | True  | True  | False | False | True  | False
13    | True  | False | True  | True  | True  | True
Possible Decision Tree

(The training set from the previous slide is shown again alongside the tree.)

D?
  True  → E?
            True  → A?
                      True  → True
                      False → False
            False → True
  False → C?
            True  → B?
                      True  → True
                      False → E?
                                True  → A?
                                          True  → True
                                          False → False
                                False → A?
                                          True  → True
                                          False → False
            False → False
Possible Decision Tree

The tree on the previous slide computes:

CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨ (¬D ∧ C ∧ (B ∨ ((E ∧ A) ∨ (¬E ∧ A))))

The target concept, however, has a much smaller representation:

CONCEPT ⇔ A ∧ (¬B ∨ C)

A?
  True  → B?
            True  → C?
                      True  → True
                      False → False
            False → True
  False → False

KIS bias → build the smallest decision tree.
Computationally intractable problem → greedy algorithm.
Getting Started The distribution of the training set is: True: 6, 7, 8, 9, 10,13 False: 1, 2, 3, 4, 5, 11, 12 12
Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate, we could report that CONCEPT is False (majority rule) with an estimated probability of error P(E) = 6/13.
Getting Started
The distribution of the training set is:
True: 6, 7, 8, 9, 10, 13
False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate, we could report that CONCEPT is False (majority rule) with an estimated probability of error P(E) = 6/13.
Assuming that we will only include one observable predicate in the decision tree, which predicate should we test to minimize the probability of error?
How to compute the probability of error

A?
  True  → True: 6, 7, 8, 9, 10, 13    False: 11, 12
  False → True: (none)                False: 1, 2, 3, 4, 5

If we test only A, we will report that CONCEPT is True if A is True (majority rule) and False otherwise.
The estimated probability of error is:
Pr(E) = (8/13) × (2/8) + (5/13) × 0 = 2/13
8/13 is the probability of getting True for A, and 2/8 is the probability that the report is incorrect in that branch (where we always report True for the concept).
How to compute the probability of error

A?
  True  → True: 6, 7, 8, 9, 10, 13    False: 11, 12
  False → True: (none)                False: 1, 2, 3, 4, 5

If we test only A, we will report that CONCEPT is True if A is True (majority rule) and False otherwise.
The estimated probability of error is:
Pr(E) = (8/13) × (2/8) + (5/13) × 0 = 2/13
5/13 is the probability of getting False for A, and 0 is the probability that the report is incorrect in that branch (where we always report False for the concept).
Assume It’s A

A?
  True  → True: 6, 7, 8, 9, 10, 13    False: 11, 12
  False → True: (none)                False: 1, 2, 3, 4, 5

If we test only A, we will report that CONCEPT is True if A is True (majority rule) and False otherwise.
The estimated probability of error is:
Pr(E) = (8/13) × (2/8) + (5/13) × 0 = 2/13
Assume It’s B

B?
  True  → True: 9, 10               False: 2, 3, 11, 12
  False → True: 6, 7, 8, 13         False: 1, 4, 5

If we test only B, we will report that CONCEPT is False if B is True and True otherwise.
The estimated probability of error is:
Pr(E) = (6/13) × (2/6) + (7/13) × (3/7) = 5/13
Assume It’s C

C?
  True  → True: 6, 8, 9, 10, 13     False: 1, 3, 4
  False → True: 7                   False: 2, 5, 11, 12

If we test only C, we will report that CONCEPT is True if C is True and False otherwise.
The estimated probability of error is:
Pr(E) = (8/13) × (3/8) + (5/13) × (1/5) = 4/13
Assume It’s D

D?
  True  → True: 7, 10, 13           False: 3, 5
  False → True: 6, 8, 9             False: 1, 2, 4, 11, 12

If we test only D, we will report that CONCEPT is True if D is True and False otherwise.
The estimated probability of error is:
Pr(E) = (5/13) × (2/5) + (8/13) × (3/8) = 5/13
Assume It’s E

E?
  True  → True: 8, 9, 10, 13        False: 1, 3, 5, 12
  False → True: 6, 7                False: 2, 4, 11

If we test only E, we will report that CONCEPT is False, independent of the outcome.
The estimated probability of error is:
Pr(E) = (8/13) × (4/8) + (5/13) × (2/5) = 6/13
Pr(error) for each • If A: 2/13 • If B: 5/13 • If C: 4/13 • If D: 5/13 • If E: 6/13 So, the best predicate to test is A 22
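These estimates can be reproduced mechanically. The following is a minimal Python sketch (not part of the slides; the names examples and error_if_testing are illustrative) that encodes the 13 training examples and computes the majority-rule error for each single-attribute test:

from fractions import Fraction

T, F = True, False
# The training set: (A, B, C, D, E, CONCEPT)
examples = [
    (F, F, T, F, T, F),  # 1
    (F, T, F, F, F, F),  # 2
    (F, T, T, T, T, F),  # 3
    (F, F, T, F, F, F),  # 4
    (F, F, F, T, T, F),  # 5
    (T, F, T, F, F, T),  # 6
    (T, F, F, T, F, T),  # 7
    (T, F, T, F, T, T),  # 8
    (T, T, T, F, T, T),  # 9
    (T, T, T, T, T, T),  # 10
    (T, T, F, F, F, F),  # 11
    (T, T, F, F, T, F),  # 12
    (T, F, T, T, T, T),  # 13
]

def error_if_testing(i):
    # Estimated error when testing only attribute i and using the majority rule in each branch
    err = Fraction(0)
    for value in (True, False):
        branch = [ex for ex in examples if ex[i] == value]
        if not branch:
            continue
        positives = sum(1 for ex in branch if ex[-1])
        err += Fraction(min(positives, len(branch) - positives), len(examples))
    return err

for i, name in enumerate("ABCDE"):
    print(name, error_if_testing(i))  # A 2/13, B 5/13, C 4/13, D 5/13, E 6/13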
Choice of Second Predicate

A?
  False → False
  True  → C?
            True  → True: 6, 8, 9, 10, 13    False: (none)
            False → True: 7                  False: 11, 12

The majority rule gives the probability of error Pr(E | A=True) = 1/8 within this branch, and Pr(E) = 1/13 overall.
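The same computation can be repeated inside the A = True branch. A small Python sketch (illustrative only, repeating the training-set encoding so the snippet is self-contained) shows that testing C there gives the smallest error, 1/8:

T, F = True, False
examples = [  # (A, B, C, D, E, CONCEPT)
    (F, F, T, F, T, F), (F, T, F, F, F, F), (F, T, T, T, T, F), (F, F, T, F, F, F),
    (F, F, F, T, T, F), (T, F, T, F, F, T), (T, F, F, T, F, T), (T, F, T, F, T, T),
    (T, T, T, F, T, T), (T, T, T, T, T, T), (T, T, F, F, F, F), (T, T, F, F, T, F),
    (T, F, T, T, T, T),
]
a_true = [ex for ex in examples if ex[0]]   # the 8 examples that reach the A = True branch

def branch_error(i, subset):
    # Majority-rule errors (counted over the subset) if the subset is split on attribute i
    wrong = 0
    for v in (True, False):
        b = [ex for ex in subset if ex[i] == v]
        if b:
            pos = sum(1 for ex in b if ex[-1])
            wrong += min(pos, len(b) - pos)
    return wrong, len(subset)

for i, name in zip(range(1, 5), "BCDE"):
    print(name, branch_error(i, a_true))    # C gives (1, 8): Pr(E | A=True) = 1/8, the minimum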
Choice of Third Predicate

A?
  False → False
  True  → C?
            True  → True
            False → B?
                      True  → True: (none)    False: 11, 12
                      False → True: 7         False: (none)
Final Tree

A?
  True  → C?
            True  → True
            False → B?
                      True  → False
                      False → True
  False → False

CONCEPT ⇔ A ∧ (C ∨ ¬B), which is equivalent to the target concept A ∧ (¬B ∨ C).
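A quick check (an illustrative sketch, not from the slides) that the learned tree computes the same Boolean function as the original target concept A ∧ (¬B ∨ C):

T, F = True, False

def learned_tree(a, b, c):
    if not a:
        return False    # A? False -> False
    if c:
        return True     # A true, C? True -> True
    return not b        # A true, C false: B? True -> False, B? False -> True

# A and (C or not B) is the same Boolean function as the target A and (not B or C)
assert all(learned_tree(a, b, c) == (a and ((not b) or c))
           for a in (T, F) for b in (T, F) for c in (T, F))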
Learning a decision tree

Function induce_tree (example_set, properties)
begin
  if all entries in example_set are in the same class
    then return a leaf node labeled with that class
  else if properties is empty
    then return a leaf node labeled with the disjunction of all classes in example_set
  else begin
    select a property, P, and make it the root of the current tree;
    delete P from properties;
    for each value, V, of P
    begin
      create a branch of the tree labeled with V;
      let partition_V be the elements of example_set with value V for property P;
      call induce_tree(partition_V, properties), attach result to branch V
    end
  end
end

If property P is Boolean, the partition will contain two sets: one with P true and one with P false.
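A compact Python rendering may make the control flow clearer. This is only a sketch under assumed conventions, not the textbook’s code: each example is taken to be a dict mapping property names to values plus a "class" entry, and the property P is chosen greedily to minimize the majority-rule error, matching the hand computation on the earlier slides.

from collections import Counter

def split_error(prop, example_set):
    # Majority-rule error rate if example_set is split on prop
    wrong = 0
    for v in {ex[prop] for ex in example_set}:
        branch = [ex for ex in example_set if ex[prop] == v]
        wrong += len(branch) - Counter(ex["class"] for ex in branch).most_common(1)[0][1]
    return wrong / len(example_set)

def induce_tree(example_set, properties):
    classes = {ex["class"] for ex in example_set}
    if len(classes) == 1:                  # all entries in example_set are in the same class
        return classes.pop()
    if not properties:                     # no properties left: disjunction of remaining classes
        return sorted(classes, key=str)
    p = min(properties, key=lambda q: split_error(q, example_set))   # greedy choice of P
    remaining = [q for q in properties if q != p]                    # delete P from properties
    branches = {}
    for v in {ex[p] for ex in example_set}:    # one branch per value V of P seen in example_set
        partition = [ex for ex in example_set if ex[p] == v]
        branches[v] = induce_tree(partition, remaining)
    return (p, branches)

Run on the 13 training examples with the properties listed in the order A–E, this greedy choice reproduces the final tree above (A at the root, then C, then B), with the tie between B and D at the third level broken by listing order.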