The bits the whirlwind tour left out ... BMVA Summer School 2016 – extra background slides (from teaching material at Durham University) BMVA Summer School 2016 Machine Learning Extra : 1
Machine Learning Definition: – “ A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P , if its performance at tasks in T , as measured by P , improves with experience E .” [Mitchell, 1997] BMVA Summer School 2016 Machine Learning Extra : 2
Algorithm to construct decision trees …. BMVA Summer School 2016 Machine Learning Extra : 3
Building Decision Trees – ID3
node = root of tree
Main loop: A = “best” decision attribute for next node .....
But which attribute is best to split on? BMVA Summer School 2016 Machine Learning Extra : 4
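The main loop above can be written out as a short recursive sketch. This is an illustrative reconstruction, not the slides' original code: each example is assumed to be a dict of attribute → value plus a 'label' key, and best_attribute() is a hypothetical helper that returns the attribute with the highest information gain (defined on the slides, and in the sketch, that follow).

```python
# Illustrative sketch of the ID3 main loop (not the slides' original code).
# Assumes each example is a dict of attribute -> value plus a 'label' key, and a
# best_attribute() helper (hypothetical here) that returns the attribute with the
# highest information gain - see the entropy / information gain slides below.

def id3(examples, attributes):
    labels = [e['label'] for e in examples]
    if len(set(labels)) == 1:                  # all examples agree -> leaf node
        return labels[0]
    if not attributes:                         # no attributes left -> majority vote
        return max(set(labels), key=labels.count)

    A = best_attribute(examples, attributes)   # "best" decision attribute for next node
    node = {'attribute': A, 'children': {}}
    for v in set(e[A] for e in examples):      # one branch per observed value of A
        subset = [e for e in examples if e[A] == v]
        remaining = [a for a in attributes if a != A]
        node['children'][v] = id3(subset, remaining)
    return node
```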
Entropy in machine learning
Entropy : a measure of impurity
– S is a sample of training examples
– p⊕ is the proportion of positive examples in S
– p⊖ is the proportion of negative examples in S
Entropy measures the impurity of S: Entropy(S) = - p⊕ log₂(p⊕) - p⊖ log₂(p⊖) BMVA Summer School 2016 Machine Learning Extra : 5
Information Gain – reduction in Entropy Gain(S,A) = expected reduction in entropy due to splitting on attribute A – i.e. expected reduction in impurity in the data – (improvement in consistent data sorting) BMVA Summer School 2016 Machine Learning Extra : 6
Information Gain – reduction in Entropy
– reduction in entropy in the set of examples S if split on attribute A
– S_v = subset of S for which attribute A has value v
– Gain(S,A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)
– i.e. original entropy minus the weighted sum of the entropies of the sub-nodes created by splitting on A BMVA Summer School 2016 Machine Learning Extra : 7
Information Gain – reduction in Entropy
Information Gain :
– “information provided about the target function given the value of some attribute A”
– How well does A sort the data into the required classes?
Generalise to c classes (not just ⊕ or ⊖):
Entropy(S) = - Σ_{i=1}^{c} p_i log₂ p_i BMVA Summer School 2016 Machine Learning Extra : 8
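A minimal Python sketch of the two formulas above, assuming the same dict-of-attributes representation as the earlier ID3 sketch; it also supplies the best_attribute() helper that sketch relied on.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_i p_i log2 p_i over the c classes present in S."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [e['label'] for e in examples]
    n = len(examples)
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e['label'] for e in examples if e[attribute] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(examples, attributes):
    """Pick the attribute with the highest information gain (used by id3 above)."""
    return max(attributes, key=lambda a: information_gain(examples, a))
```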
Building Decision Trees Selecting the Next Attribute – which attribute should we split on next? BMVA Summer School 2016 Machine Learning Extra : 9
Boosting and Bagging …. + Forests BMVA Summer School 2016 Machine Learning Extra : 11
Learning using Boosting
Learning a Boosted Classifier (AdaBoost algorithm):
  Assign equal weight to each training instance
  For t iterations:
    Apply the learning algorithm to the weighted training set; store the resulting (weak) classifier
    Compute the classifier's error e on the weighted training set
    If e = 0 or e > 0.5: terminate classifier generation
    For each instance in the training set:
      If classified correctly by the classifier: multiply the instance's weight by e / (1 - e)
    Normalize the weights of all instances
  (e = error of the classifier on the weighted training set)
Classification using the Boosted Classifier:
  Assign weight = 0 to all classes
  For each of the t (or fewer) classifiers:
    For the class this classifier predicts, add -log( e / (1 - e) ) to that class's weight
  Return the class with the highest weight
Toby Breckon Lecture 4 : 12
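The two loops above can be sketched in a few lines. This is a hedged illustration rather than the slides' implementation: it assumes scikit-learn decision stumps as the weak learner and NumPy arrays for the data; a library class such as sklearn.ensemble.AdaBoostClassifier packages the same idea.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # weak learner: a decision stump

def adaboost_train(X, y, t=10):
    """Follows the slide's loop; X is (n_samples, n_features), y holds the class labels."""
    n = len(X)
    w = np.full(n, 1.0 / n)              # assign equal weight to each training instance
    classifiers, alphas = [], []
    for _ in range(t):
        clf = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        e = np.sum(w[pred != y])         # classifier's error on the weighted training set
        if e == 0 or e > 0.5:            # terminate classifier generation
            break
        classifiers.append(clf)
        alphas.append(-np.log(e / (1 - e)))   # this classifier's vote weight
        w[pred == y] *= e / (1 - e)      # down-weight correctly classified instances
        w /= w.sum()                     # normalize the weights of all instances
    return classifiers, alphas

def adaboost_predict(classifiers, alphas, x):
    votes = {}                           # weight = 0 for all classes
    for clf, a in zip(classifiers, alphas):
        c = clf.predict([x])[0]
        votes[c] = votes.get(c, 0.0) + a # add -log(e/(1-e)) to the predicted class's weight
    return max(votes, key=votes.get)     # return the class with the highest weight
```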
Learning using Boosting
Some things to note:
– Weight adjustment means the (t+1)-th classifier concentrates on the examples the t-th classifier got wrong
– Each classifier must be able to achieve greater than 50% success
  • (i.e. error below 0.5 in the normalised error range {0..1})
– Results in an ensemble of t classifiers
  • i.e. a boosted classifier made up of t weak classifiers
  • boosting/bagging classifiers are often called ensemble classifiers
– Training error decreases exponentially (theoretically)
  • prone to over-fitting (need diversity in test set) – several additions/modifications to handle this
– Works best with weak classifiers .....
Boosted Trees – a set of t decision trees of limited complexity (e.g. depth)
Toby Breckon Lecture 4 : 13
Decision Forests (a.k.a. Random Forests/Trees)
Bagging using multiple decision trees, where each tree in the ensemble classifier ...
– is trained on a random subset of the training data
– computes a node split on a random subset of the available attributes [Breiman 2001]
Each tree is grown as follows:
– Select a training set T' (size N) by randomly selecting (with replacement) N instances from the training set T
– Select a number m < M, where a subset of m attributes out of the available M attributes is used to compute the best split at a given node (m is constant across all trees in the forest)
– Grow each tree using T' to the largest extent possible, without any pruning.
Toby Breckon Lecture 4 : 14
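A rough sketch of the recipe above (not the slides' code): bag N-sized bootstrap samples and let each tree consider only m of the M attributes at every split, here via scikit-learn's max_features parameter. In practice a library class such as sklearn.ensemble.RandomForestClassifier does all of this directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=100, m=None):
    """Bagging of unpruned trees; each node split considers only m of the M attributes."""
    N, M = X.shape
    m = m or max(1, int(np.sqrt(M)))          # common default m ~ sqrt(M); constant across the forest
    rng = np.random.default_rng()
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, N, size=N)      # T': N instances sampled with replacement from T
        tree = DecisionTreeClassifier(max_features=m)   # random m-attribute subset per node split
        tree.fit(X[idx], y[idx])              # grown to the largest extent possible, no pruning
        forest.append(tree)
    return forest

def forest_predict(forest, x):
    votes = [tree.predict([x])[0] for tree in forest]   # majority vote over the ensemble
    return max(set(votes), key=votes.count)
```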
Backpropagation Algorithm …. BMVA Summer School 2016 Machine Learning Extra : 15
Backpropagation Algorithm
Assume we have:
– input examples d = {1 … D}; each is a pair {x_d, t_d} = {input vector, target vector}
– node index n = {1 … N}
– weight w_ji connects node j → i
– input x_ji is the input on the connection node j → i; the corresponding weight = w_ji
– output error for node n is δ_n (similar to (o - t))
[Network diagram: input x → input layer → hidden layer → output layer → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 16
Backpropagation Algorithm
(1) Input example d
(2) Output layer error, based on the difference between output and target (t - o) and the derivative of the sigmoid function: δ_k = o_k (1 - o_k)(t_k - o_k)
(3) Hidden layer error, proportional to the node's contribution to the output error: δ_h = o_h (1 - o_h) Σ_k w_hk δ_k
(4) Update weights: w_ji ← w_ji + η δ_i x_ji
BMVA Summer School 2016 Machine Learning Extra : 17
Backpropagation
Termination criteria:
– number of iterations reached
– or error below a suitable bound
[Equations: output layer error, hidden layer error, and weights updated using the relevant error, as on the previous slide]
BMVA Summer School 2016 Machine Learning Extra : 18
Backpropagation
[Network diagram: input x → input layer → hidden layer (unit h) → output layer (unit k) → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 19
Backpropagation
δ_h is expressed as a weighted sum of the output layer errors δ_k to which it contributes (i.e. w_hk > 0)
[Network diagram: input x → input layer → hidden layer (unit h) → output layer (unit k) → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 20
Backpropagation
Error is propagated backwards from the network output ....
to the weights of the output layer ....
to the weights of the hidden layer …
Hence the name: backpropagation
[Network diagram: input x → input layer → hidden layer (unit h) → output layer (unit k) → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 21
Backpropagation
Repeat these stages for every hidden layer in a multi-layer network (using error δ_i where x_ji > 0)
[Network diagram: input x → input layer → hidden layer(s) (unit h) → output layer (unit k) → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 22
Backpropagation
Error is propagated backwards from the network output ....
to the weights of the output layer ....
over the weights of all N hidden layers …
Hence the name: backpropagation
[Network diagram: input x → input layer → hidden layer(s) (unit h) → output layer (unit k) → output vector O_k]
BMVA Summer School 2016 Machine Learning Extra : 23
Backpropagation
Will perform gradient descent over the weight space of {w_ji} for all connections i → j in the network
Stochastic gradient descent – as updates are based on training one sample at a time
BMVA Summer School 2016 Machine Learning Extra : 24
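A minimal NumPy sketch of stochastic-gradient backpropagation for a single hidden layer of sigmoid units, following the error and weight-update equations above. The layer sizes, learning rate and the absence of bias terms are simplifying assumptions, not part of the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_backprop(X, T, n_hidden=4, eta=0.1, iterations=1000):
    """Stochastic gradient descent: weights are updated after each training sample."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    W_h = rng.normal(0, 0.1, (n_in, n_hidden))    # input -> hidden weights
    W_o = rng.normal(0, 0.1, (n_hidden, n_out))   # hidden -> output weights
    for _ in range(iterations):                   # terminate after a fixed number of iterations
        for x, t in zip(X, T):                    # one sample at a time
            h = sigmoid(x @ W_h)                  # forward pass: hidden activations
            o = sigmoid(h @ W_o)                  # forward pass: outputs o_k
            delta_o = o * (1 - o) * (t - o)       # output layer error: sigmoid derivative * (t - o)
            delta_h = h * (1 - h) * (W_o @ delta_o)   # hidden error: weighted sum of output errors
            W_o += eta * np.outer(h, delta_o)     # update weights using the relevant error
            W_h += eta * np.outer(x, delta_h)
    return W_h, W_o
```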
Future and current concepts This is beyond the scope of this introductory tutorial but the following are recommended as good places to start: Convolutional Neural Networks – http://deeplearning.net/tutorial/lenet.html Deep Learning – http://www.deeplearning.net/tutorial/ BMVA Summer School 2016 Machine Learning Extra : 25
Understanding (and believing) the SVM stuff …. BMVA Summer School 2016 Machine Learning Extra : 26
2D LINES REMINDER
Remedial Note: equations of 2D lines
Line: w · x + b = 0
– w is the normal to the line
– b gives the offset from the origin
– where w and x are 2D vectors.
BMVA Summer School 2016 Machine Learning Extra : 27
2D LINES REMINDER Remedial Note: equations of 2D lines http://www.mathopenref.com/coordpointdisttrig.html BMVA Summer School 2016 Machine Learning Extra : 28
2D LINES REMINDER
Remedial Note: equations of 2D lines
For a defined line equation (w and b fixed): insert a point into the equation …...
The result is the distance (+ve or -ve) of the point from the line, given by (w · x + b) / ||w||:
– the result is +ve if the point is on the side of the line the normal points towards (i.e. > 0)
– the result is -ve if the point is on the other side of the line (< 0)
BMVA Summer School 2016 Machine Learning Extra : 29
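A tiny numerical illustration of the point-to-line test above; the particular w and b are made-up values.

```python
import numpy as np

w = np.array([1.0, 2.0])   # normal to the line (illustrative values)
b = -4.0                   # offset term, so the line is w . x + b = 0

def signed_distance(x):
    """Positive on the side the normal points towards, negative on the other side."""
    return (np.dot(w, x) + b) / np.linalg.norm(w)

print(signed_distance(np.array([3.0, 3.0])))   # > 0: this side of the line
print(signed_distance(np.array([0.0, 0.0])))   # < 0: the other side of the line
```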
Linear Separator
Instances (i.e. examples) {x_i, y_i}
– x_i = a point in instance space (R^n), made up of n attributes
– y_i = class value for the classification of x_i, y = {+1, -1}, i.e. 2 classes
Classification of an example: f(x) = sign(w · x + b)
Want a linear separator. Can view this as a constraint satisfaction problem:
  w · x_i + b ≥ +1 if y_i = +1
  w · x_i + b ≤ -1 if y_i = -1
Equivalently: y_i (w · x_i + b) ≥ 1
N.B. we have a vector of weight coefficients w
[Diagram: separating line with regions y = +1 and y = -1]
BMVA Summer School 2016 Machine Learning Extra : 30
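One way to see the constraint-satisfaction view in code (an illustration, not the slides' method): fit a linear separator with scikit-learn's SVC on toy separable data and check that every example satisfies y_i (w · x_i + b) ≥ 1.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2-class data (illustrative values)
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],         # class +1
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, -2.0]])  # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C approximates a hard-margin separator
w, b = clf.coef_[0], clf.intercept_[0]        # weight vector w and offset b

# Constraint-satisfaction view: every example should satisfy y_i (w . x_i + b) >= 1
margins = y * (X @ w + b)
print(w, b)
print(margins)                                # all >= ~1 for separable data
print(np.all(margins >= 1 - 1e-6))
```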