

  1. Machine Learning 2007: Lecture 4
Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
Website: www.cwi.nl/~erven/teaching/0708/ml/
September 27, 2007
1 / 29

  2. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions

  3. Organisational Matters
Course Organisation:
● Biweekly exercises: you get a full week instead of 5 days.
● Exercise 2 available this evening.
● Grades for Exercise 1 available this week.
Study Guide:
● You don't have to know the details of the CANDIDATE-ELIMINATION algorithm, just that it does the same thing as the LIST-THEN-ELIMINATE algorithm.
● But sections 2.6 and 2.7 of Mitchell are very important! Just replace each occurrence of CANDIDATE-ELIMINATION by LIST-THEN-ELIMINATE when reading them.
This Lecture versus Mitchell:
● Decision trees are in Mitchell, but I will discuss the underlying mathematics in much more detail.

  4. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions

  5. LIST-THEN-ELIMINATE
Algorithm Description:
● LIST-THEN-ELIMINATE finds the set, VersionSpace, of all hypotheses that are consistent with all the training data.
● It can only classify a new feature vector x if all the hypotheses in VersionSpace agree.
Hypothesis Space:
H = {⟨?, ?, ?, ?, ?, ?⟩, ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨Warm, ?, ?, ?, ?, ?⟩, ..., ⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
● Has a very strong representation bias: only 973 out of 2^96 ≈ 10^29 possible hypotheses can be represented.
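The filtering step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's actual code: the two toy attributes and their values are assumed for the example, and the all-∅ hypothesis (which matches nothing) is omitted for simplicity.

```python
from itertools import product

# A hypothesis is a tuple of constraints, one per attribute:
# '?' matches anything; a specific value matches only itself.

def matches(hypothesis, x):
    """Does the hypothesis classify feature vector x as positive?"""
    return all(c == '?' or c == v for c, v in zip(hypothesis, x))

def list_then_eliminate(hypotheses, data):
    """Keep every hypothesis that is consistent with all training examples."""
    return [h for h in hypotheses
            if all(matches(h, x) == y for x, y in data)]

def classify(version_space, x):
    """Classify x only if all remaining hypotheses agree; otherwise None."""
    labels = {matches(h, x) for h in version_space}
    return labels.pop() if len(labels) == 1 else None

# Toy attribute domains (assumed; the lecture's task has six attributes).
domains = [('Sunny', 'Rainy'), ('Warm', 'Cold')]
hypotheses = list(product(*[vals + ('?',) for vals in domains]))

data = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
vs = list_then_eliminate(hypotheses, data)
```

On this data the version space keeps the three hypotheses ⟨Sunny, Warm⟩, ⟨Sunny, ?⟩ and ⟨?, Warm⟩; they agree on ⟨Sunny, Warm⟩ but disagree on ⟨Rainy, Warm⟩, so the latter cannot be classified.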

  6. An Unbiased Hypothesis Space
All Possible Hypotheses:
Why not take all possible hypotheses as a hypothesis space for LIST-THEN-ELIMINATE?
Hypothesis Space:
H = {h | h is a function from X to Y},
where
● X = set of possible feature vectors,
● Y = set of possible labels,
● |H| = |Y|^|X| = 2^96.
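The count |H| = |Y|^|X| = 2^96 can be checked directly. The attribute sizes below are an assumption taken from Mitchell's EnjoySport task (six attributes with 3, 2, 2, 2, 2 and 2 possible values, and binary labels), which matches the 96 in the slide.

```python
# |X|: number of distinct feature vectors (assumed attribute sizes).
num_instances = 3 * 2 * 2 * 2 * 2 * 2        # |X| = 96
num_labels = 2                               # |Y|
num_hypotheses = num_labels ** num_instances  # |H| = |Y|^|X| = 2^96, about 7.9 * 10**28
```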

  7. An Unbiased Hypothesis Space
(Continues the previous slide.)
Classifying a New Feature Vector:
● Given: data D = (y1, x1), ..., (yn, xn).
● What happens if we try to classify a new feature vector x_{n+1}?

  8. Classifying New Instances
For any hypothesis h ∈ H, there exists an h′ ∈ H such that
  h(x) ≠ h′(x) if x = x_{n+1},
  h(x) = h′(x) for any other x.

  9. Classifying New Instances
(Continues the previous slide.)
Consequence:
● Suppose x_{n+1} does not occur in D.
● Then for every h ∈ VersionSpace, there exists an alternative h′ ∈ VersionSpace that disagrees on the label of x_{n+1}: h(x_{n+1}) ≠ h′(x_{n+1}).
Conclusion: In an unbiased hypothesis space, the LIST-THEN-ELIMINATE algorithm cannot generalise at all. Bias is unavoidable!
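The conclusion can be made concrete on a tiny instance space. The four feature vectors below are assumed purely for illustration; with every function from X to Y admitted as a hypothesis, the version space always splits evenly on any unseen instance.

```python
from itertools import product

# Tiny instance space (assumed) and binary labels.
X = ['x1', 'x2', 'x3', 'x4']
Y = [False, True]

# An unbiased hypothesis space: every function from X to Y,
# represented as a dict.  |H| = |Y|^|X| = 2^4 = 16.
H = [dict(zip(X, labels)) for labels in product(Y, repeat=len(X))]

# Training data covering x1..x3 but not x4.
data = [('x1', True), ('x2', False), ('x3', True)]

version_space = [h for h in H if all(h[x] == y for x, y in data)]

# Both labels for the unseen x4 survive: the version space contains
# one hypothesis voting each way, so no generalisation is possible.
votes = {h['x4'] for h in version_space}
```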

  10. Overview Organisational Organisational Matters ● Matters An Unbiased Hypothesis Space for ● L IST -T HEN -E LIMINATE L IST -T HEN -E LIMINATE ? Directed Graphs and Trees Math: Directed Graphs and Trees ● Hypothesis Space: Decision Trees for Classification ● Decision Trees ID3 ✦ Hypothesis Space: Decision Trees Probability ✦ Method: ID3 Distributions Math: Probability Distributions ● 8 / 29

  11. Directed Graphs
A directed graph G is an ordered pair G = (V, E), where
● V = {v1, ..., vm} is a set of vertices / nodes;
● E = {e1, ..., en} is a set of directed edges between the vertices in V.
● Each directed edge e from vertex u to vertex v is an ordered pair e = (u, v).
● I can draw the same directed graph in different ways.
[Figure: a directed graph on vertices v1, ..., v7.]

  12. Directed Graphs
(Same definitions as the previous slide.)
[Figure: two different drawings of the same directed graph on vertices v1, ..., v7.]

  13. Directed Graphs with Edge Labels
● We can also label edges with labels from some set of possible labels L. Now G = (V, E, L).
● Each directed edge e with label l ∈ L from vertex u to vertex v is an ordered triple e = (u, l, v).
Example: Let L = {a, b, c}.
[Figure: a directed graph on vertices v1, ..., v7 with edges labelled a, b and c.]
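The triple representation of labelled edges translates directly into code. The particular vertices and edges below are illustrative, not the slide's exact figure.

```python
# A directed graph with edge labels, following the slide's definition:
# G = (V, E, L), with each edge an ordered triple (u, label, v).
V = {'v1', 'v2', 'v3', 'v4', 'v5'}
L = {'a', 'b', 'c'}
E = {('v1', 'c', 'v2'), ('v1', 'a', 'v3'), ('v3', 'b', 'v4'), ('v3', 'a', 'v5')}

def outgoing(v, edges):
    """All (label, target) pairs for edges leaving vertex v."""
    return sorted((l, w) for (u, l, w) in edges if u == v)
```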

  14. Tree Examples
[Figure: five example trees on nodes v1, ..., v8; some edges carry labels a and b.]
● In all examples the root of the tree is v1.
● The nodes without outgoing edges (shown in red) are called leaves.
● The other nodes are called internal nodes.

  15. Directed Trees
A directed graph is a (directed) tree T = (V, E) with root v ∈ V if and only if either:
1. v is the only node: T = ({v}, ∅), or
2. ● T1, ..., Tk are trees with roots t1, ..., tk,
   ● v, T1, ..., Tk have no nodes in common, and
   ● T consists of the node v with an edge from v to each of the roots t1, ..., tk.
[Figure: the root v with edges down to the roots t1, ..., tk of the subtrees T1, ..., Tk.]
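The recursive definition suggests a recursive representation: a tree as a pair of a root and a list of subtrees, plus a check for the no-nodes-in-common condition. The representation and node names here are assumptions for this sketch, not the lecture's notation.

```python
# A directed tree as a (root, subtrees) tuple, mirroring the slide's
# recursive definition.

def nodes(tree):
    """All node names occurring in the tree."""
    root, subtrees = tree
    result = {root}
    for t in subtrees:
        result |= nodes(t)
    return result

def is_tree(tree):
    """Check the disjointness condition of the recursive definition."""
    root, subtrees = tree
    seen = {root}
    for t in subtrees:
        sub = nodes(t)
        if seen & sub or not is_tree(t):
            return False  # a node appears twice: not a tree
        seen |= sub
    return True

# A small example: v1 with subtrees rooted at v2 and v3.
t = ('v1', [('v2', [('v4', []), ('v5', [])]), ('v3', [])])
```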

  16. Properties of Trees
Let T be a (directed) tree.
● If T contains an edge e = (u, v) from node u to node v, then
  ✦ u is called the parent of v,
  ✦ v is called the child of u.

  17. Properties of Trees
(Continues the previous slide.)
Number of Parents:
● Each node has exactly one parent, except for the root, which has no parents.

  18. Properties of Trees
(Continues the previous slide.)
Number of Children:
● Each node may have any (finite) number of children.
● The leaves are the nodes without children.
● The internal nodes have at least one child.
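These parent/child properties can be read off an edge set directly: for each edge (u, v), u is the parent of v and v is a child of u. The small example tree below is assumed for illustration.

```python
# Edge set of a small example tree (assumed).
E = {('v1', 'v2'), ('v1', 'v3'), ('v2', 'v4'), ('v2', 'v5')}
V = {u for e in E for u in e}

parent = {v: u for (u, v) in E}           # each non-root node has one parent
children = {u: {v for (p, v) in E if p == u} for u in V}
root = (V - set(parent)).pop()            # the unique node with no parent
leaves = {v for v in V if not children[v]}  # nodes without children
```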

  19. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions
