Human-Oriented Robotics
Supervised Learning, Part 3/3
Kai Arras, Social Robotics Lab, University of Freiburg
Contents
• Introduction and basics
• Bayes Classifier
• Logistic Regression
• Support Vector Machines
• AdaBoost
• k-Nearest Neighbor
• Cross-validation
• Performance measures
Ensemble Learning
• So far, we have looked at learning methods in which a single hypothesis h is used to make predictions
• The underlying idea of ensemble learning is to select a collection, or ensemble, of hypotheses and combine their predictions
• Consider, for instance, an ensemble of K = 5 hypotheses and suppose that we combine their predictions using simple majority voting. For the ensemble to misclassify a new sample, at least 3 of the 5 hypotheses have to be wrong. This is much less likely than a mistake by a single hypothesis
• Boosting is the most widely used ensemble learning method. In boosting, simple "rules" or base classifiers are trained in sequence in such a way that the performance of the ensemble members is improved, i.e. "boosted"
• Other ensemble methods include bagging, mixture of experts, and voting
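To make the "much less likely" claim concrete, here is a minimal sketch that computes the majority-vote error of the ensemble, assuming the K = 5 hypotheses err independently with the same individual error rate (an idealization; real ensemble members are usually correlated):

    from math import comb

    def ensemble_error(p, K=5):
        """Probability that a majority (more than K/2) of K independent
        hypotheses, each with individual error rate p, are wrong."""
        return sum(comb(K, i) * p**i * (1 - p)**(K - i)
                   for i in range(K // 2 + 1, K + 1))

    # Individual error rate 0.1 -> ensemble error of about 0.0086
    print(ensemble_error(0.1))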
Ensemble Learning
• AdaBoost is the most popular boosting algorithm
• It learns an accurate strong classifier by combining an ensemble of inaccurate "rules of thumb"
• Inaccurate rule: weak classifier (a.k.a. weak learner, base classifier, feature)
• Accurate rule: strong classifier
• Given an ensemble of weak classifiers $h_1(\mathbf{x}), \ldots, h_K(\mathbf{x})$, the combined strong classifier is obtained by a weighted majority voting scheme, $H(\mathbf{x}) = \mathrm{sign}\big(\sum_{k=1}^{K} \alpha_k\, h_k(\mathbf{x})\big)$, where the weighted sum is the confidence and its sign gives the strong classifier
Boosting
• Boosting methods define a weight distribution over the training samples
• Each weak classifier is trained on weighted training data (blue arrows), in which the weights depend on the performance of the previous weak classifier (green)
• Once all classifiers have been learned, they are combined to give a strong classifier (red)
[Figure, from source [4]: weak classifiers $y_1(\mathbf{x}), y_2(\mathbf{x}), \ldots, y_M(\mathbf{x})$ are trained on successively reweighted data sets $\{w_n^{(1)}\}, \{w_n^{(2)}\}, \ldots, \{w_n^{(M)}\}$ and combined into the strong classifier $Y_M(\mathbf{x}) = \mathrm{sign}\big(\sum_{m=1}^{M} \alpha_m y_m(\mathbf{x})\big)$]
Boosting
• Weak classifier examples:
  • Decision stump: single axis-parallel partition of the space
  • Decision tree: hierarchical partition of the space
  • Multi-layer perceptron: general non-linear function approximator
  • Support vector machine: maximum-margin classifier
• There is a trade-off between diversity among weak learners and their accuracy
• Decision stumps are a popular choice
Decision Stump
• Simplest type of decision tree
• A linear classifier defined by an axis-parallel hyperplane with parameters θ and d
• The hyperplane is orthogonal to axis/dimension d, which it intersects at the threshold value θ
• Rarely useful on its own due to its simplicity
• Formally,
  $h(\mathbf{x};\, \theta, d, p) = \begin{cases} +p & \text{if } x_d \ge \theta \\ -p & \text{otherwise} \end{cases}$
  where $\mathbf{x}$ is an m-dimensional training sample, d is the dimension, θ the threshold, and $p \in \{-1, +1\}$ the direction of the inequality
[Figure: 2D example of a decision stump, axis-parallel split at threshold θ on dimension $x_1$]
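A minimal sketch of such a stump as a predictor (illustrative NumPy code; the function name and the polarity argument p are choices made here, not notation from the slides):

    import numpy as np

    def stump_predict(X, d, theta, p=1):
        """Decision stump: predict +p if x_d >= theta, -p otherwise.
        X is an (N, m) array of samples."""
        return np.where(X[:, d] >= theta, p, -p)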
Decision Stump
• Learning objective of decision stumps on weighted data:
  $(\theta^*, d^*) = \underset{\theta,\, d}{\arg\min} \sum_{n=1}^{N} w_n\, \mathrm{I}\big(h(\mathbf{x}_n;\, \theta, d) \ne y_n\big)$
  where I(·) is the indicator function
• The goal is to find the parameters θ*, d* that minimize the weighted error
Decision Stump
Learning algorithm for decision stumps on weighted data:
• For each dimension d = 1, ..., m:
  1. Sort the samples in ascending order along dimension d
  2. For n = 1, ..., N, compute the N cumulative sums $S_d(n) = \sum_{i=1}^{n} w_i\, y_i$ over the sorted samples
  3. The threshold θ is at the extremum of $S_d(n)$
  4. The sign of the extremum gives the direction $p_d$ of the inequality
• The global extremum over all m cumulative sums gives the optimal threshold θ* and dimension d* (see the sketch below)
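A sketch of this procedure, assuming the stump_predict convention above. Instead of reading the direction off the sign of the extremum it evaluates both directions explicitly, which is equivalent; repeated values along a dimension are not handled specially:

    import numpy as np

    def fit_stump(X, y, w):
        """Weighted decision stump via the cumulative-sum trick.
        X: (N, m) samples, y: (N,) labels in {-1, +1}, w: (N,) weights.
        Returns (d, theta, p) minimizing the weighted 0/1 error."""
        N, m = X.shape
        w_pos = w[y == +1].sum()              # total weight of positive samples
        w_neg = w[y == -1].sum()              # total weight of negative samples
        best = (np.inf, 0, 0.0, 1)            # (error, d, theta, p)
        for d in range(m):
            order = np.argsort(X[:, d])       # 1. sort along dimension d
            xs, ys, ws = X[order, d], y[order], w[order]
            S = np.concatenate(([0.0], np.cumsum(ws * ys)))   # 2. cumulative sums S(0..N)
            # candidate thresholds: below the smallest value, midpoints, above the largest
            thr = np.concatenate(([xs[0] - 1.0],
                                  (xs[:-1] + xs[1:]) / 2.0,
                                  [xs[-1] + 1.0]))
            err_pos = w_neg + S               # weighted error of "predict +1 if x_d >= theta"
            err_neg = w_pos - S               # weighted error of the flipped stump
            for errs, p in ((err_pos, +1), (err_neg, -1)):
                i = int(np.argmin(errs))      # 3./4. best split, i.e. extremum of S
                if errs[i] < best[0]:
                    best = (errs[i], d, float(thr[i]), p)
        return best[1], best[2], best[3]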
Decision Stump
Learning algorithm for decision stumps on weighted data, example:
[Figure: 2D toy data set with labels y (red: +1, blue: −1) and all weights equal to 1; the learned stump splits dimension $x_1$ at threshold θ*, with d* = 1 and direction p = 1]
Learning
Given a training set $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ with labels $y_n \in \{-1, +1\}$, learn a strong classifier $H(\mathbf{x})$
• Initialize the weights $w_n^{(1)} = 1/N$
• For k = 1, ..., K:
  1. Learn a weak classifier $h_k$ on the weighted training data, minimizing the weighted error $\epsilon_k = \sum_{n=1}^{N} w_n^{(k)}\, \mathrm{I}\big(h_k(\mathbf{x}_n) \ne y_n\big)$
  2. Compute the voting weight of $h_k$ as $\alpha_k = \frac{1}{2} \ln\frac{1 - \epsilon_k}{\epsilon_k}$
  3. Recompute the weights $w_n^{(k+1)} = \dfrac{w_n^{(k)} \exp\big({-\alpha_k\, y_n\, h_k(\mathbf{x}_n)}\big)}{Z_k}$, where $Z_k$ normalizes the weights to sum to one
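A compact sketch of this training loop with decision stumps, assuming the hypothetical fit_stump and stump_predict helpers sketched above; the small eps guards the logarithm when a weak learner happens to be perfect on the weighted data, which is not part of the slide's formulation:

    import numpy as np

    def adaboost_train(X, y, K=50, eps=1e-10):
        """AdaBoost with decision stumps; y has labels in {-1, +1}.
        Returns a list of (alpha_k, (d, theta, p)) pairs."""
        N = len(y)
        w = np.full(N, 1.0 / N)                    # initialize weights w_n = 1/N
        ensemble = []
        for k in range(K):
            d, theta, p = fit_stump(X, y, w)       # 1. weak classifier on weighted data
            h = stump_predict(X, d, theta, p)
            err = np.sum(w[h != y])                # weighted error epsilon_k
            alpha = 0.5 * np.log((1.0 - err + eps) / (err + eps))   # 2. voting weight
            w = w * np.exp(-alpha * y * h)         # 3. up-weight mistakes, down-weight hits
            w /= w.sum()                           # normalizer Z_k
            ensemble.append((alpha, (d, theta, p)))
        return ensemble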
Learning
• The voting weight $\alpha_k = \frac{1}{2}\ln\frac{1-\epsilon_k}{\epsilon_k}$ of a weak classifier as a function of its error $\epsilon_k$:
[Figure: plot of the voting weight versus the error; the weight is large for small errors, drops to zero at error = 0.5, and would become negative beyond 0.5]
• $\alpha_k$ measures the importance of classifier $h_k$ and corresponds to the strength of its vote in the strong classifier
• The expression yields the optimal voting weight (proven later)
• Notice that training samples are weighted by the weights $w_n^{(k)}$, while weak classifiers are weighted by the voting weights $\alpha_k$
Learning
• Let us take a closer look at the weight update step $w_n^{(k+1)} = \dfrac{w_n^{(k)} \exp\big({-\alpha_k\, y_n\, h_k(\mathbf{x}_n)}\big)}{Z_k}$
• From $y_n h_k(\mathbf{x}_n) = -1$ for misclassified samples and $y_n h_k(\mathbf{x}_n) = +1$ for correctly classified samples, we see that the weights of misclassified training samples are increased and the weights of correctly classified samples are decreased
• The normalizer $Z_k$ makes the weight distribution a probability distribution
• Thus, the learning algorithm generates weak classifiers by training the next classifier on the mistakes of the previous one
• Hence the name: AdaBoost is derived from adaptive Boosting
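A small worked example of the update, assuming a round with weighted error $\epsilon_k = 0.2$:
\[
\alpha_k = \tfrac{1}{2}\ln\tfrac{1-0.2}{0.2} = \tfrac{1}{2}\ln 4 \approx 0.69,
\]
so, before renormalization by $Z_k$, the weight of a misclassified sample is multiplied by $e^{0.69} \approx 2$, while the weight of a correctly classified sample is multiplied by $e^{-0.69} \approx 0.5$.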
Inference and Decision
• After the learning phase, predictions for new data $\mathbf{x}'$ are made by the weighted majority voting scheme of the strong classifier, $H(\mathbf{x}') = \mathrm{sign}\big(\sum_{k=1}^{K} \alpha_k\, h_k(\mathbf{x}')\big)$
• The learned model consists of the K weak learners $h_k$ with their associated voting weights $\alpha_k$
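Prediction is then a single weighted vote; a minimal sketch, again assuming the hypothetical helpers from the earlier code sketches:

    import numpy as np

    def adaboost_predict(X, ensemble):
        """Strong classifier H(x) = sign(sum_k alpha_k * h_k(x))."""
        confidence = np.zeros(len(X))
        for alpha, (d, theta, p) in ensemble:
            confidence += alpha * stump_predict(X, d, theta, p)
        return np.sign(confidence)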
Learning: why does it work?
• The goal for the strong classifier is to minimize the training error, defined as the number of misclassified training pairs
• Using the indicator function I(·), we can rewrite this error as $E = \sum_{n=1}^{N} \mathrm{I}\big(H(\mathbf{x}_n) \ne y_n\big)$
• Remember our definitions of the confidence $C(\mathbf{x}) = \sum_{k=1}^{K} \alpha_k h_k(\mathbf{x})$, the strong classifier $H(\mathbf{x}) = \mathrm{sign}\big(C(\mathbf{x})\big)$, and the labels $y_n \in \{-1, +1\}$
Learning: why does it work?
• Then, we see that $H(\mathbf{x}_n) \ne y_n$ implies $y_n\, C(\mathbf{x}_n) \le 0$, and the error becomes $E = \sum_{n=1}^{N} \mathrm{I}\big(y_n C(\mathbf{x}_n) \le 0\big)$, often called the 0/1-loss function
• Plotting this error for the case of a single sample shows that the function is non-differentiable and difficult to handle mathematically
• Idea: because minimizing the training error directly is difficult, we define an upper bound and minimize this bound instead
• Using the exponential loss function, we have for a single sample $\mathrm{I}\big(y_n C(\mathbf{x}_n) \le 0\big) \le \exp\big({-y_n C(\mathbf{x}_n)}\big)$
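The bound can be checked case by case:
\[
y_n C(\mathbf{x}_n) \le 0 \;\Rightarrow\; e^{-y_n C(\mathbf{x}_n)} \ge 1 = \mathrm{I}\big(y_n C(\mathbf{x}_n) \le 0\big),
\qquad
y_n C(\mathbf{x}_n) > 0 \;\Rightarrow\; e^{-y_n C(\mathbf{x}_n)} > 0 = \mathrm{I}\big(y_n C(\mathbf{x}_n) \le 0\big).
\]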
Learning: why does it work?
• The upper bound holds for all training samples: $E = \sum_{n=1}^{N} \mathrm{I}\big(y_n C(\mathbf{x}_n) \le 0\big) \le \sum_{n=1}^{N} \exp\big({-y_n C(\mathbf{x}_n)}\big)$
• To proceed from here, we consider the weight update equation and unravel it recursively from the back, starting at k = K and going down to the initialization $w_n^{(1)} = 1/N$:
  $w_n^{(K+1)} = \dfrac{w_n^{(K)} \exp\big({-\alpha_K\, y_n\, h_K(\mathbf{x}_n)}\big)}{Z_K} = \cdots = \dfrac{1}{N}\, \dfrac{\exp\big({-y_n \sum_{k=1}^{K} \alpha_k h_k(\mathbf{x}_n)}\big)}{\prod_{k=1}^{K} Z_k} = \dfrac{1}{N}\, \dfrac{\exp\big({-y_n C(\mathbf{x}_n)}\big)}{\prod_{k=1}^{K} Z_k}$
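Since the weights are normalized to sum to one in every round, summing the unraveled expression over all samples makes the substitution on the next slide explicit:
\[
1 = \sum_{n=1}^{N} w_n^{(K+1)} = \frac{1}{N}\, \frac{\sum_{n=1}^{N} \exp\big({-y_n C(\mathbf{x}_n)}\big)}{\prod_{k=1}^{K} Z_k}
\;\;\Longrightarrow\;\;
\sum_{n=1}^{N} \exp\big({-y_n C(\mathbf{x}_n)}\big) = N \prod_{k=1}^{K} Z_k .
\]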
Learning: why does it work?
• Substitution into the error bound yields $E \le \sum_{n=1}^{N} \exp\big({-y_n C(\mathbf{x}_n)}\big) = N \prod_{k=1}^{K} Z_k$
• Minimizing the upper bound is equivalent to minimizing the product of the K normalizers, or each $Z_k$ in its training round, respectively
• This in turn is achieved by choosing the optimal weak classifier and finding the optimal voting weight
Learning: why does it work?
• First, let us go for the optimal voting weight
• To minimize $Z = \sum_{n=1}^{N} w_n \exp\big({-\alpha\, y_n h(\mathbf{x}_n)}\big)$, we partially differentiate it w.r.t. α and set the derivative to zero (skipping the round index k): $\dfrac{\partial Z}{\partial \alpha} = -\sum_{n=1}^{N} w_n\, y_n h(\mathbf{x}_n) \exp\big({-\alpha\, y_n h(\mathbf{x}_n)}\big) = 0$
• Next, we subdivide the sum into a sum over the correctly predicted samples (for which $y_n h(\mathbf{x}_n) = +1$) and a sum over the misclassified samples (for which $y_n h(\mathbf{x}_n) = -1$): $-e^{-\alpha} \sum_{n:\, h(\mathbf{x}_n) = y_n} w_n + e^{\alpha} \sum_{n:\, h(\mathbf{x}_n) \ne y_n} w_n = 0$
Learning: why does it work?
• The last step uses the definition of the error for weak learners as the weighted sum over all misclassified training samples, $\epsilon = \sum_{n:\, h(\mathbf{x}_n) \ne y_n} w_n$. We finally find $\alpha = \dfrac{1}{2} \ln\dfrac{1 - \epsilon}{\epsilon}$
• Second, we want to find the optimal weak classifier that minimizes $Z_k$, using this result
• We subdivide $Z_k$ into the same two sums as before, use the definition of the error for weak learners, and substitute the optimal voting weight
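Carrying out these two steps explicitly (the closed form of $Z_k$ is a standard completion of the substitution described in the last bullet; it does not appear on the slides included here):
\[
-e^{-\alpha}(1-\epsilon) + e^{\alpha}\epsilon = 0
\;\Rightarrow\;
e^{2\alpha} = \frac{1-\epsilon}{\epsilon}
\;\Rightarrow\;
\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon},
\]
\[
Z_k = e^{-\alpha_k} \sum_{n:\, h_k(\mathbf{x}_n) = y_n} w_n^{(k)} + e^{\alpha_k} \sum_{n:\, h_k(\mathbf{x}_n) \ne y_n} w_n^{(k)}
    = e^{-\alpha_k}(1-\epsilon_k) + e^{\alpha_k}\epsilon_k
    = 2\sqrt{\epsilon_k (1-\epsilon_k)} .
\]
Since $2\sqrt{\epsilon_k(1-\epsilon_k)}$ is monotonically increasing in $\epsilon_k$ for $\epsilon_k < 0.5$, minimizing $Z_k$ means choosing the weak classifier with the smallest weighted error, which is exactly the decision stump learning objective.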