

  1. Machine Learning 2007: Lecture 3
     Instructor: Tim van Erven (Tim.van.Erven@cwi.nl)
     Website: www.cwi.nl/~erven/teaching/0708/ml/
     September 20, 2007

  2. Overview
     ● Organisational Matters
     ● Hypothesis Spaces
     ● Method: Least Squares Linear Regression
     ● Being Informal about Feature Vectors
     ● Method: LIST-THEN-ELIMINATE for Concept Learning
       ✦ A Biased Hypothesis Space
       ✦ An Unbiased Hypothesis Space?

  3. Organisational Matters
     Course Organisation:
     ● Intermediate exam: October 25, 11.00 – 13.00 in 04A05.
     ● Biweekly exercises
     This Lecture versus Mitchell:
     ● All of it is in the book (Chapters 1 and 2), except for “Being Informal About Feature Vectors”.
     ● The presentation is different, though: we recognise methods from Mitchell as methods to deal with regression and classification.

  4. Overview
     ● Organisational Matters
     ● Hypothesis Spaces
     ● Method: Least Squares Linear Regression
     ● Being Informal about Feature Vectors
     ● Method: LIST-THEN-ELIMINATE for Concept Learning
       ✦ A Biased Hypothesis Space
       ✦ An Unbiased Hypothesis Space?

  5. Reminder of Machine Learning Categories
     Prediction: Given data D = y_1, ..., y_n, predict how the sequence continues with y_{n+1}.
     Regression: Given data D = (x_1, y_1), ..., (x_n, y_n), learn to predict the value of the label y for any new feature vector x. Typically y can take infinitely many values. Acceptable if your prediction is close to the correct y.
     Classification: Given data D = (x_1, y_1), ..., (x_n, y_n), learn to predict the class label y for any new feature vector x. Only finitely many categories. Your prediction is either correct or wrong.

  6. Hypotheses and Hypothesis Spaces
     Definition of a Hypothesis: A hypothesis h is a candidate description of the regularity or patterns in your data.
     ● Prediction example: y_{n+1} = h(y_1, ..., y_n) = y_{n−1} + y_n
     ● Regression example: y = h(x) = 5 x_1
     ● Classification example: y = h(x) = +1 if 3 x_1 − 20 > 0; −1 otherwise.
     Definition of a Hypothesis Space: A hypothesis space H is the set {h} of hypotheses that are being considered.
     ● Regression example: { h_a(x) = a · x_1 | a ∈ R }
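The regression hypothesis space on this slide, H = { h_a(x) = a · x_1 | a ∈ R }, can be sketched in Python (an illustration, not part of the original slides): each value of the parameter a picks out one hypothesis from the space.

```python
def make_hypothesis(a):
    """Return the hypothesis h_a from H, with h_a(x) = a * x_1."""
    def h(x):
        # x is a feature vector; x[0] plays the role of x_1.
        return a * x[0]
    return h

# The slide's example hypothesis h(x) = 5 * x_1:
h5 = make_hypothesis(5)
prediction = h5([2.0])   # 5 * 2.0 = 10.0
```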

  7. Overview
     ● Organisational Matters
     ● Hypothesis Spaces
     ● Method: Least Squares Linear Regression
     ● Being Informal about Feature Vectors
     ● Method: LIST-THEN-ELIMINATE for Concept Learning
       ✦ A Biased Hypothesis Space
       ✦ An Unbiased Hypothesis Space?

  8. Linear Regression
     In linear regression the goal is to select a linear hypothesis that best captures the regularity in the data.
     [Scatter plot of the data; x ranges from −10 to 15, y from −20 to 100.]

  9. Hypothesis Space of Linear Hypotheses
     Linear Function: y = h_w(x) = w_0 + w_1 x_1 + ... + w_d x_d
     ● x = (x_1, ..., x_d)^T is a d-dimensional feature vector.
     ● w = (w_0, w_1, ..., w_d)^T are called the weights.
     Examples:
     h_w(x) = 2 + 9 x_1  (w_0 = 2, w_1 = 9)
     h_w(x) = 3 + 16 x_1 − 2 x_3  (w_0 = 3, w_1 = 16, w_2 = 0, w_3 = −2)
     Hypothesis Space of All Linear Hypotheses: H = { h_w | w ∈ R^{d+1} }
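Evaluating a linear hypothesis h_w(x) = w_0 + w_1 x_1 + ... + w_d x_d can be sketched as follows (an illustration in Python, not from the slides; note that w has d+1 entries because of the intercept w_0):

```python
def h_w(w, x):
    """Linear hypothesis: w = (w_0, w_1, ..., w_d), x = (x_1, ..., x_d)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# The slide's first example, h_w(x) = 2 + 9 x_1, at x_1 = 3:
y1 = h_w([2, 9], [3.0])                    # 2 + 9*3 = 29.0
# The second example, h_w(x) = 3 + 16 x_1 - 2 x_3 (w_2 = 0):
y2 = h_w([3, 16, 0, -2], [1.0, 5.0, 2.0])  # 3 + 16 - 4 = 15.0
```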

  10. Example: A Linear Function with Noise
      [Scatter plot of noisy data; x ranges from −10 to 15, y from −20 to 100.]
      Data generated by a linear function y = 6x + 20 + ε, where ε is noise with distribution N(0, 10). Can we recover this function from the data alone?
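Data like that on this slide can be generated with a short sketch (assumed code, not the lecturer's; the slide's N(0, 10) is read here as standard deviation 10, which is one possible reading):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
# 50 feature values on roughly the slide's x-range, then noisy labels
# from the true function y = 6x + 20 + eps.
xs = [random.uniform(-10, 15) for _ in range(50)]
ys = [6 * x + 20 + random.gauss(0, 10) for x in xs]
```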

  11. Determining Weights from the Data
      Squared Error: For given w, we may evaluate the squared error of h_w on a single data item (x_i, y_i):
      Squared Error = (y_i − h_w(x_i))^2
      Least Squares Linear Regression: Given data D = (x_1, y_1), ..., (x_n, y_n), select w to minimize the sum of squared errors SSE(D) on all data:
      min_w SSE(D) = min_w Σ_{i=1}^n (y_i − h_w(x_i))^2
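For a single feature the minimization above has a well-known closed-form solution, which can be sketched as follows (an assumed implementation, not from the slides):

```python
def least_squares_1d(xs, ys):
    """Minimize SSE = sum_i (y_i - (w0 + w1*x_i))^2 for one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Closed-form least squares for the 1-d case:
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
         sum((x - mx) ** 2 for x in xs)
    w0 = my - w1 * mx
    return w0, w1

# On noise-free data from y = 6x + 20 the true weights are recovered exactly:
w0, w1 = least_squares_1d([0.0, 1.0, 2.0, 3.0], [20.0, 26.0, 32.0, 38.0])
```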

  12. Linear Regression Example
      The previous example again:
      [Scatter plot of the noisy data; x ranges from −10 to 15, y from −20 to 100.]
      Original function: y = 6x + 20 + ε

  13. Linear Regression Example
      The previous example again, now with the fitted line:
      [Scatter plot of the noisy data with the least squares line; x ranges from −10 to 15, y from −20 to 100.]
      Original function: y = 6x + 20 + ε.  Least squares fit: y = 6.38x + 17.37.

  14. Inductive Bias
      Least Squares Linear Regression:
      ● Only looks for linear patterns in the data.
        ✦ For example, it cannot discover y = x_1^2 even if it gets an infinite amount of data.
      ● Minimizes the sum of squared errors.
        ✦ Why not something else, like for example the sum of absolute errors? min_w Σ_{i=1}^n |y_i − h_w(x_i)|
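The difference between the two error measures on this slide can be made concrete with a small sketch (illustrative Python, not from the slides): the squared error penalizes a single large residual quadratically, while the absolute error penalizes it linearly.

```python
def sse(w0, w1, data):
    """Sum of squared errors of the line y = w0 + w1*x on (x, y) pairs."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in data)

def sae(w0, w1, data):
    """Sum of absolute errors of the same line."""
    return sum(abs(y - (w0 + w1 * x)) for x, y in data)

# Two points on the line y = 6x + 20, plus one outlier with residual 10:
data = [(0.0, 20.0), (1.0, 26.0), (2.0, 42.0)]
squared = sse(20, 6, data)   # residuals 0, 0, 10 -> 100.0
absolute = sae(20, 6, data)  # residuals 0, 0, 10 -> 10.0
```

So minimizing squared errors is itself a bias: it makes the fitted line much more sensitive to outliers than minimizing absolute errors would.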

  15. Overview
     ● Organisational Matters
     ● Hypothesis Spaces
     ● Method: Least Squares Linear Regression
     ● Being Informal about Feature Vectors
     ● Method: LIST-THEN-ELIMINATE for Concept Learning
       ✦ A Biased Hypothesis Space
       ✦ An Unbiased Hypothesis Space?

  16. EnjoySport Representation 1
      Numbering Attribute Values:
      Attribute    Sky                    AirTemp       EnjoySport
      Value        Sunny  Cloudy  Rainy   Warm  Cold    No  Yes
      Encoding     1      2       3       1     2       1   2

  17. EnjoySport Representation 1
      Numbering Attribute Values:
      Attribute    Sky                    AirTemp       EnjoySport
      Value        Sunny  Cloudy  Rainy   Warm  Cold    No  Yes
      Encoding     1      2       3       1     2       1   2
      Example:
      Sky, AirTemp    EnjoySport    Representation
      Sunny, Warm     Yes           x = (1, 1)^T, y = 2
      Rainy, Cold     No            x = (3, 2)^T, y = 1
      Sunny, Cold     Yes           x = (1, 2)^T, y = 2
      ● The difference between feature vectors has no clear meaning. For example (3, 2)^T − (1, 1)^T = (2, 1)^T.

  18. EnjoySport Representation 2
      Another Way to Do It:
      Attribute    Sky                                    AirTemp               EnjoySport
      Value        Sunny      Cloudy     Rainy            Warm     Cold         No  Yes
      Encoding     (1,0,0)^T  (0,1,0)^T  (0,0,1)^T        (1,0)^T  (0,1)^T      1   2
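This second representation is what is now usually called one-hot encoding; a sketch in Python (an illustration following the EnjoySport example, not code from the slides):

```python
# Attribute value lists, in the order used on the slide.
SKY = ["Sunny", "Cloudy", "Rainy"]
AIRTEMP = ["Warm", "Cold"]

def one_hot(value, values):
    """Encode one attribute value as a 0/1 indicator vector."""
    return [1 if v == value else 0 for v in values]

def encode(sky, airtemp):
    """Concatenate the one-hot encodings of both attributes."""
    return one_hot(sky, SKY) + one_hot(airtemp, AIRTEMP)

x = encode("Sunny", "Warm")   # -> [1, 0, 0, 1, 0]
```

Unlike the integer encoding of Representation 1, differences between such vectors are meaningful: two examples differ in exactly the positions where their attribute values differ.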
