The bits the whirlwind tour left out ... BMVA Summer School 2016

1. The bits the whirlwind tour left out ... BMVA Summer School 2016 – extra background slides (from teaching material at Durham University)

2. Machine Learning  Definition: – “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” [Mitchell, 1997]

3. Algorithm to construct decision trees ….

4. Building Decision Trees – ID3  node = root of tree  Main loop: A = “best” decision attribute for next node ..... But which attribute is best to split on?

5. Entropy in machine learning  Entropy: a measure of impurity – S is a sample of training examples – p⊕ is the proportion of positive examples in S – p⊖ is the proportion of negative examples in S  Entropy measures the impurity of S: Entropy(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖
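A minimal sketch of this two-class definition in Python (the function name binary_entropy is illustrative, not from the slides):

import math

def binary_entropy(p_pos):
    """Entropy(S) for a sample with proportion p_pos positive and 1 - p_pos negative."""
    p_neg = 1.0 - p_pos
    # by convention 0 * log2(0) is treated as 0
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

# a pure sample has entropy 0; an even (50/50) split has the maximum entropy of 1
print(binary_entropy(0.5), binary_entropy(0.9))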

6. Information Gain – reduction in Entropy  Gain(S,A) = expected reduction in entropy due to splitting on attribute A – i.e. expected reduction in impurity in the data – (improvement in consistent data sorting)

7. Information Gain – reduction in Entropy – reduction in entropy in the set of examples S if split on attribute A – Sv = subset of S for which attribute A has value v – Gain(S,A) = Entropy(S) − Σ v∈Values(A) (|Sv| / |S|) Entropy(Sv) – i.e. original entropy minus the (size-weighted) sum of the entropies of the sub-nodes created by splitting on A

8. Information Gain – reduction in Entropy  Information Gain: – “information provided about the target function given the value of some attribute A” – How well does A sort the data into the required classes?  Generalise to c classes: – (not just ⊕ or ⊖) – Entropy(S) = − Σ i=1..c pi log2 pi
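A sketch of these two quantities over a list of labelled examples (multi-class entropy as above; the data layout and function names are assumptions made for illustration):

from collections import Counter
import math

def entropy(labels):
    """Entropy(S) = -sum_i pi log2 pi over the class proportions found in labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A): entropy of S minus the size-weighted entropies of the subsets S_v."""
    # examples are dicts of attribute values, e.g. {'outlook': 'sunny', 'windy': False}
    n = len(labels)
    gain = entropy(labels)
    for v in set(x[attribute] for x in examples):
        subset = [y for x, y in zip(examples, labels) if x[attribute] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain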

9. Building Decision Trees  Selecting the Next Attribute – which attribute should we split on next?

10. Building Decision Trees  Selecting the Next Attribute – which attribute should we split on next? (the one giving the highest information gain – see the sketch below)
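Given the information_gain sketch above, ID3-style attribute selection is just an argmax (again an illustrative sketch, not the slides' own code):

def best_attribute(examples, labels, attributes):
    """Choose the attribute with the highest information gain, as ID3 does."""
    return max(attributes, key=lambda a: information_gain(examples, labels, a))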

11. Boosting and Bagging …. + Forests

12. Learning using Boosting  Learning a Boosted Classifier (AdaBoost algorithm):
– assign equal weight to each training instance
– for t iterations:
  • apply the learning algorithm to the weighted training set, store the resulting (weak) classifier
  • compute the classifier’s error e on the weighted training set
  • if e = 0 or e > 0.5: terminate classifier generation
  • for each instance in the training set: if classified correctly by the classifier, multiply the instance’s weight by e / (1 − e)
  • normalise the weights of all instances
 Classification using the Boosted Classifier:
– assign weight = 0 to all classes
– for each of the t (or fewer) classifiers: add −log(e / (1 − e)) to the weight of the class this classifier predicts
– return the class with the highest weight
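A compact Python sketch of this procedure; the choice of a scikit-learn decision stump as the weak learner is an illustrative assumption, not part of the slide:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, t=10):
    """Learn up to t weak classifiers with the instance re-weighting scheme above."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                        # equal weight to each training instance
    ensemble = []                                  # list of (classifier, class-vote weight)
    for _ in range(t):
        clf = DecisionTreeClassifier(max_depth=1)  # the weak learner: a decision stump
        clf.fit(X, y, sample_weight=w)
        correct = clf.predict(X) == y
        e = w[~correct].sum()                      # classifier error on the weighted training set
        if e == 0 or e > 0.5:                      # terminate classifier generation
            break
        w[correct] *= e / (1 - e)                  # down-weight correctly classified instances
        w /= w.sum()                               # normalise the weights of all instances
        ensemble.append((clf, -np.log(e / (1 - e))))
    return ensemble

def adaboost_classify(ensemble, x):
    """Each classifier adds -log(e / (1 - e)) to the weight of the class it predicts."""
    class_weights = {}
    for clf, vote in ensemble:
        c = clf.predict([x])[0]
        class_weights[c] = class_weights.get(c, 0.0) + vote
    return max(class_weights, key=class_weights.get)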

13. Learning using Boosting  Some things to note: – the weight adjustment means the (t+1)-th classifier concentrates on the examples the t-th classifier got wrong – each classifier must be able to achieve greater than 50% success • (i.e. 0.5 in the normalised error range {0..1}) – results in an ensemble of t classifiers • i.e. a boosted classifier made up of t weak classifiers • boosting/bagging classifiers are often called ensemble classifiers – training error decreases exponentially (theoretically) • prone to over-fitting (need diversity in the test set) – several additions/modifications exist to handle this – works best with weak classifiers .....  Boosted Trees – a set of t decision trees of limited complexity (e.g. depth); see the usage sketch below
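For reference, an off-the-shelf boosted-trees ensemble (scikit-learn's AdaBoostClassifier, whose default weak learner is a depth-1 decision tree; the dataset and parameters here are arbitrary choices):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
boosted = AdaBoostClassifier(n_estimators=50)   # ensemble of 50 weak (depth-1) trees
boosted.fit(X, y)
print(boosted.score(X, y))                      # accuracy of the boosted classifier on its training set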

14. Decision Forests (a.k.a. Random Forests/Trees)  Bagging using multiple decision trees, where each tree in the ensemble classifier ... – is trained on a random subset of the training data – computes each node split on a random subset of the available attributes [Breiman 2001]  Each tree is grown as follows: – select a training set T' (size N) by randomly selecting (with replacement) N instances from the training set T – select a number m < M, where a subset of m attributes out of the available M attributes is used to compute the best split at a given node (m is constant across all trees in the forest) – grow each tree using T' to the largest extent possible, without any pruning.
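A sketch of that recipe: a bootstrap sample per tree plus a random attribute subset per split, here via scikit-learn's max_features (the function names and the default m = sqrt(M) are illustrative choices):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def grow_forest(X, y, n_trees=25, m=None):
    """Grow n_trees unpruned trees, each on a bootstrap sample of (X, y),
    considering only m randomly chosen attributes at each node split."""
    X, y = np.asarray(X), np.asarray(y)
    n, M = X.shape
    m = m or max(1, int(np.sqrt(M)))                   # a common default choice: m ~ sqrt(M)
    rng = np.random.default_rng(0)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)               # select N instances with replacement (T')
        tree = DecisionTreeClassifier(max_features=m)  # random subset of m attributes per split
        tree.fit(X[idx], y[idx])                       # grown to the largest extent, no pruning
        forest.append(tree)
    return forest

def forest_classify(forest, x):
    """Majority vote over the trees in the forest."""
    votes = [tree.predict([x])[0] for tree in forest]
    return max(set(votes), key=votes.count)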

15. Backpropagation Algorithm ….

16. Backpropagation Algorithm  Assume we have: – input examples d = {1 ... D} • each is a pair {xd, td} = {input vector, target vector} – node index n = {1 … N} – weight wji connects node j → i – input xji is the input on the connection node j → i • corresponding weight = wji – output error for node n is δn • similar to (o – t)  [Figure: layered network – input x, input layer, hidden layer, output layer producing output vector Ok]

17. Backpropagation Algorithm – (1) input example d – (2) output layer error, based on the difference between output and target (t − o) and the derivative of the sigmoid function: δk = ok (1 − ok) (tk − ok) – (3) hidden layer error, proportional to the node’s contribution to the output error: δh = oh (1 − oh) Σk whk δk – (4) update the weights: wji ← wji + η δi xji

18. Backpropagation  Termination criteria: – number of iterations reached – or error below a suitable bound  Output layer error: δk = ok (1 − ok) (tk − ok)  Hidden layer error: δh = oh (1 − oh) Σk whk δk  All weights updated using the relevant error: wji ← wji + η δi xji

19. Backpropagation  [Figure: network layers – input x, input layer, hidden layer (unit h), output layer (unit k), output vector Ok]

20. Backpropagation  δh is expressed as a weighted sum of the output layer errors δk to which unit h contributes (i.e. whk > 0)  [Figure: as before – input x, hidden layer (unit h), output layer (unit k), output vector Ok]

21. Backpropagation  Error is propagated backwards from the network output .... – to the weights of the output layer .... – to the weights of the hidden layer …  Hence the name: backpropagation

22. Backpropagation  Repeat these stages for every hidden layer in a multi-layer network (using error δi where xji > 0)  [Figure: as before, now with multiple hidden layers (units h)]

23. Backpropagation  Error is propagated backwards from the network output .... – to the weights of the output layer .... – over the weights of all N hidden layers …  Hence the name: backpropagation

24. Backpropagation  Performs gradient descent over the weight space of {wji} for all connections i → j in the network  Stochastic gradient descent – as the updates are based on training one sample at a time
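A minimal NumPy sketch of these updates for a single hidden layer, trained one example at a time (the network sizes, learning rate, toy data and variable names are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 3, 1
W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))    # weights: input -> hidden
W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))   # weights: hidden -> output
eta = 0.5                                            # learning rate

# toy training data (XOR): each example d is a pair {x_d, t_d}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

for epoch in range(5000):                     # or stop once the error is below a bound
    for x, t in zip(X, T):                    # stochastic: update after each single example
        h = sigmoid(W1 @ x)                   # forward pass: hidden layer outputs
        o = sigmoid(W2 @ h)                   # forward pass: network outputs
        delta_o = o * (1 - o) * (t - o)               # output layer error
        delta_h = h * (1 - h) * (W2.T @ delta_o)      # hidden layer error (propagated back)
        W2 += eta * np.outer(delta_o, h)              # weight update: eta * delta * input
        W1 += eta * np.outer(delta_h, x)

print(sigmoid(W2 @ sigmoid(W1 @ X.T)))        # outputs for the four XOR inputs (ideally near 0, 1, 1, 0)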

25. Future and current concepts  This is beyond the scope of this introductory tutorial, but the following are recommended as good places to start:  Convolutional Neural Networks – http://deeplearning.net/tutorial/lenet.html  Deep Learning – http://www.deeplearning.net/tutorial/

26. Understanding (and believing) the SVM stuff ….

27. 2D LINES REMINDER – Remedial note: equations of 2D lines  Line: w ⋅ x + b = 0 – w is the normal to the line – b gives the offset from the origin – where w and x are 2D vectors.

28. 2D LINES REMINDER – Remedial note: equations of 2D lines – http://www.mathopenref.com/coordpointdisttrig.html

29. 2D LINES REMINDER – Remedial note: equations of 2D lines  For a defined line equation (w and b fixed), insert a point x into the equation …  The result (w ⋅ x + b) / |w| is the distance (+ve or -ve) of the point from the line: – the result is +ve if the point lies on the side of the line the normal w points to (> 0) – the result is -ve if the point lies on the other side (< 0)
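A small numeric illustration of this side test (the particular line and points are arbitrary choices):

import numpy as np

w = np.array([1.0, 1.0])        # normal to the line
b = -1.0                        # line: x + y - 1 = 0

def signed_distance(x):
    """(w . x + b) / |w|: +ve on the side the normal points to, -ve on the other side."""
    return (w @ x + b) / np.linalg.norm(w)

print(signed_distance(np.array([1.0, 1.0])))   # +ve: point lies on the normal's side
print(signed_distance(np.array([0.0, 0.0])))   # -ve: point lies on the other side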

30. Linear Separator  Classification of example function: – instances (i.e. examples) {xi, yi} – f(x) = y = {+1, -1}, i.e. 2 classes – xi = a point in the instance space (Rn), made up of n attributes – yi = the class value for the classification of xi  Want a linear separator. Can view this as a constraint satisfaction problem: – w ⋅ xi + b ≥ +1 if yi = +1 – w ⋅ xi + b ≤ −1 if yi = −1  Equivalently: yi (w ⋅ xi + b) ≥ 1 – N.B. we have a vector of weight coefficients w  [Figure: points of the two classes (y = +1, y = −1) either side of the separating line f(x) = 0]
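A brief check of these constraints on toy data (the data and the candidate separator below are hand-picked for illustration):

import numpy as np

# toy 2-class data in R^2 (instances x_i with class values y_i in {+1, -1})
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([+1, +1, -1, -1])

w = np.array([1.0, 1.0])        # candidate weight vector (normal to the separator)
b = -1.0                        # candidate offset

# constraint satisfaction: y_i (w . x_i + b) >= 1 for every training instance
margins = y * (X @ w + b)
print(margins)                  # [3. 5. 4. 6.] here, so every constraint holds
print(np.all(margins >= 1))     # True: (w, b) linearly separates the data with margin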
