

  1. Machine Learning 2007: Lecture 4
Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
Website: www.cwi.nl/~erven/teaching/0708/ml/
September 27, 2007
1 / 29

  2. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions

  3. Organisational Matters
Course Organisation:
● Biweekly exercises: you get a full week instead of 5 days.
● Exercise 2 available this evening.
● Grades for Exercise 1 available this week.
Study Guide:
● You don't have to know the details of the CANDIDATE-ELIMINATION algorithm, just that it does the same thing as the LIST-THEN-ELIMINATE algorithm.
● But sections 2.6 and 2.7 of Mitchell are very important! Just replace each occurrence of CANDIDATE-ELIMINATION by LIST-THEN-ELIMINATE when reading them.
This Lecture versus Mitchell:
● Decision trees are in Mitchell, but I will discuss the underlying mathematics in much more detail.

  4. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions

  5. LIST-THEN-ELIMINATE
Algorithm Description:
● LIST-THEN-ELIMINATE finds the set, VersionSpace, of all hypotheses that are consistent with all the training data.
● It can only classify a new feature vector x if all the hypotheses in VersionSpace agree.
Hypothesis Space:
H = {⟨?, ?, ?, ?, ?, ?⟩, ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨Warm, ?, ?, ?, ?, ?⟩, ..., ⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
● Has a very strong representation bias: only 973 out of 2^96 ≈ 10^29 possible hypotheses can be represented.
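The filtering step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's actual code: the two toy attributes and their values are assumed for the example, and the all-∅ hypothesis (which matches nothing) is omitted for simplicity.

```python
from itertools import product

# A hypothesis is a tuple of constraints, one per attribute:
# '?' matches anything; a specific value matches only itself.

def matches(hypothesis, x):
    """Does the hypothesis classify feature vector x as positive?"""
    return all(c == '?' or c == v for c, v in zip(hypothesis, x))

def list_then_eliminate(hypotheses, data):
    """Keep every hypothesis that is consistent with all training examples."""
    return [h for h in hypotheses
            if all(matches(h, x) == y for x, y in data)]

def classify(version_space, x):
    """Classify x only if all remaining hypotheses agree; otherwise None."""
    labels = {matches(h, x) for h in version_space}
    return labels.pop() if len(labels) == 1 else None

# Toy attribute domains (assumed; the lecture's task has six attributes).
domains = [('Sunny', 'Rainy'), ('Warm', 'Cold')]
hypotheses = list(product(*[vals + ('?',) for vals in domains]))

data = [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False)]
vs = list_then_eliminate(hypotheses, data)
```

On this data the version space keeps the three hypotheses ⟨Sunny, Warm⟩, ⟨Sunny, ?⟩ and ⟨?, Warm⟩; they agree on ⟨Sunny, Warm⟩ but disagree on ⟨Rainy, Warm⟩, so the latter cannot be classified.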

  6. An Unbiased Hypothesis Space
All Possible Hypotheses:
Why not take all possible hypotheses as a hypothesis space for LIST-THEN-ELIMINATE?
Hypothesis Space:
H = {h | h is a function from X to Y},
where
● X = set of possible feature vectors,
● Y = set of possible labels,
● |H| = |Y|^|X| = 2^96.
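The count |H| = |Y|^|X| = 2^96 can be checked directly. The attribute sizes below are an assumption taken from Mitchell's EnjoySport task (six attributes with 3, 2, 2, 2, 2 and 2 possible values, and binary labels), which matches the 96 in the slide.

```python
# |X|: number of distinct feature vectors (assumed attribute sizes).
num_instances = 3 * 2 * 2 * 2 * 2 * 2        # |X| = 96
num_labels = 2                               # |Y|
num_hypotheses = num_labels ** num_instances  # |H| = |Y|^|X| = 2^96, about 7.9 * 10**28
```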

  7. An Unbiased Hypothesis Space
(Continues the previous slide.)
Classifying a New Feature Vector:
● Given: data D = (y1, x1), ..., (yn, xn).
● What happens if we try to classify a new feature vector x_{n+1}?

  8. Classifying New Instances
For any hypothesis h ∈ H, there exists an h′ ∈ H such that
  h(x) ≠ h′(x) if x = x_{n+1},
  h(x) = h′(x) for any other x.

  9. Classifying New Instances
(Continues the previous slide.)
Consequence:
● Suppose x_{n+1} does not occur in D.
● Then for every h ∈ VersionSpace, there exists an alternative h′ ∈ VersionSpace that disagrees on the label of x_{n+1}: h(x_{n+1}) ≠ h′(x_{n+1}).
Conclusion: In an unbiased hypothesis space, the LIST-THEN-ELIMINATE algorithm cannot generalise at all. Bias is unavoidable!
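The conclusion can be made concrete on a tiny instance space. The four feature vectors below are assumed purely for illustration; with every function from X to Y admitted as a hypothesis, the version space always splits evenly on any unseen instance.

```python
from itertools import product

# Tiny instance space (assumed) and binary labels.
X = ['x1', 'x2', 'x3', 'x4']
Y = [False, True]

# An unbiased hypothesis space: every function from X to Y,
# represented as a dict.  |H| = |Y|^|X| = 2^4 = 16.
H = [dict(zip(X, labels)) for labels in product(Y, repeat=len(X))]

# Training data covering x1..x3 but not x4.
data = [('x1', True), ('x2', False), ('x3', True)]

version_space = [h for h in H if all(h[x] == y for x, y in data)]

# Both labels for the unseen x4 survive: the version space contains
# one hypothesis voting each way, so no generalisation is possible.
votes = {h['x4'] for h in version_space}
```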

  10. Overview Organisational Organisational Matters ● Matters An Unbiased Hypothesis Space for ● L IST -T HEN -E LIMINATE L IST -T HEN -E LIMINATE ? Directed Graphs and Trees Math: Directed Graphs and Trees ● Hypothesis Space: Decision Trees for Classification ● Decision Trees ID3 ✦ Hypothesis Space: Decision Trees Probability ✦ Method: ID3 Distributions Math: Probability Distributions ● 8 / 29

  11. Directed Graphs
A directed graph G is an ordered pair G = (V, E), where
● V = {v1, ..., vm} is a set of vertices / nodes;
● E = {e1, ..., en} is a set of directed edges between the vertices in V.
● Each directed edge e from vertex u to vertex v is an ordered pair e = (u, v).
● I can draw the same directed graph in different ways.
[Figure: a directed graph on vertices v1, ..., v7.]

  12. Directed Graphs
(Same definitions as the previous slide.)
[Figure: two different drawings of the same directed graph on vertices v1, ..., v7.]

  13. Directed Graphs with Edge Labels
● We can also label edges with labels from some set of possible labels L. Now G = (V, E, L).
● Each directed edge e with label l ∈ L from vertex u to vertex v is an ordered triple e = (u, l, v).
Example: Let L = {a, b, c}.
[Figure: a directed graph on vertices v1, ..., v7 with edges labelled a, b and c.]
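The triple representation of labelled edges translates directly into code. The particular vertices and edges below are illustrative, not the slide's exact figure.

```python
# A directed graph with edge labels, following the slide's definition:
# G = (V, E, L), with each edge an ordered triple (u, label, v).
V = {'v1', 'v2', 'v3', 'v4', 'v5'}
L = {'a', 'b', 'c'}
E = {('v1', 'c', 'v2'), ('v1', 'a', 'v3'), ('v3', 'b', 'v4'), ('v3', 'a', 'v5')}

def outgoing(v, edges):
    """All (label, target) pairs for edges leaving vertex v."""
    return sorted((l, w) for (u, l, w) in edges if u == v)
```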

  14. Tree Examples
[Figure: five example trees on nodes v1, ..., v8; some edges carry labels a and b.]
● In all examples the root of the tree is v1.
● The nodes without outgoing edges (shown in red) are called leaves.
● The other nodes are called internal nodes.

  15. Directed Trees
A directed graph is a (directed) tree T = (V, E) with root v ∈ V if and only if either:
1. v is the only node: T = ({v}, ∅), or
2. ● T1, ..., Tk are trees with roots t1, ..., tk,
   ● v, T1, ..., Tk have no nodes in common, and
   ● T consists of the node v with an edge from v to each of the roots t1, ..., tk.
[Figure: the root v with edges down to the roots t1, ..., tk of the subtrees T1, ..., Tk.]
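The recursive definition suggests a recursive representation: a tree as a pair of a root and a list of subtrees, plus a check for the no-nodes-in-common condition. The representation and node names here are assumptions for this sketch, not the lecture's notation.

```python
# A directed tree as a (root, subtrees) tuple, mirroring the slide's
# recursive definition.

def nodes(tree):
    """All node names occurring in the tree."""
    root, subtrees = tree
    result = {root}
    for t in subtrees:
        result |= nodes(t)
    return result

def is_tree(tree):
    """Check the disjointness condition of the recursive definition."""
    root, subtrees = tree
    seen = {root}
    for t in subtrees:
        sub = nodes(t)
        if seen & sub or not is_tree(t):
            return False  # a node appears twice: not a tree
        seen |= sub
    return True

# A small example: v1 with subtrees rooted at v2 and v3.
t = ('v1', [('v2', [('v4', []), ('v5', [])]), ('v3', [])])
```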

  16. Properties of Trees
Let T be a (directed) tree.
● If T contains an edge e = (u, v) from node u to node v, then
  ✦ u is called the parent of v,
  ✦ v is called the child of u.

  17. Properties of Trees
(Continues the previous slide.)
Number of Parents:
● Each node has exactly one parent, except for the root, which has no parents.

  18. Properties of Trees
(Continues the previous slide.)
Number of Children:
● Each node may have any (finite) number of children.
● The leaves are the nodes without children.
● The internal nodes have at least one child.
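These parent/child properties can be read off an edge set directly: for each edge (u, v), u is the parent of v and v is a child of u. The small example tree below is assumed for illustration.

```python
# Edge set of a small example tree (assumed).
E = {('v1', 'v2'), ('v1', 'v3'), ('v2', 'v4'), ('v2', 'v5')}
V = {u for e in E for u in e}

parent = {v: u for (u, v) in E}           # each non-root node has one parent
children = {u: {v for (p, v) in E if p == u} for u in V}
root = (V - set(parent)).pop()            # the unique node with no parent
leaves = {v for v in V if not children[v]}  # nodes without children
```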

  19. Overview
● Organisational Matters
● An Unbiased Hypothesis Space for LIST-THEN-ELIMINATE?
● Math: Directed Graphs and Trees
● Decision Trees for Classification
  ✦ Hypothesis Space: Decision Trees
  ✦ Method: ID3
● Math: Probability Distributions
