  1. Supervised Learning Part 1 — Theory Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017

  2. Content • Theory • Applications: Mathematica • Discussion

  3. Def: Supervised Learning • Supervised learning is the machine learning task of inferring a function from labelled training data. • Workflow: 1. Determine the training examples 2. Prepare the training set 3. Decide how to represent the input object 4. Decide how to represent the output object 5. Choose your algorithm 6. Run the algorithm, adjusting/determining its parameters 7. Evaluate the accuracy
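 A minimal end-to-end sketch of this workflow in the Wolfram Language (the environment used in Part 2); the toy task, labelling real numbers by their sign, is invented for illustration:

    (* 1.-4. training set: inputs paired with labelled outputs *)
    training = # -> If[# > 0, "positive", "negative"] & /@ RandomReal[{-1, 1}, 100];

    (* 5.-6. choose an algorithm and run it *)
    c = Classify[training, Method -> "LogisticRegression"];

    (* 7. evaluate the accuracy on an independent test set *)
    test = # -> If[# > 0, "positive", "negative"] & /@ RandomReal[{-1, 1}, 50];
    ClassifierMeasurements[c, test, "Accuracy"]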

  4. Learning algorithms • Support Vector Machines • Naive Bayes • Linear discriminant analysis • Decision trees • k-nearest neighbour algorithm • Neural networks
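 In Mathematica most of these are available directly as Method settings of Classify; a quick sketch (training data as above; I believe these are the documented method names, but treat them as assumptions):

    (* swap the learning algorithm by changing the Method option *)
    Classify[training, Method -> "NaiveBayes"]
    Classify[training, Method -> "DecisionTree"]
    Classify[training, Method -> "NearestNeighbors"]
    Classify[training, Method -> "NeuralNetwork"]
    Classify[training, Method -> "SupportVectorMachine"]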

  5. Known issues • Bias-variance tradeoff • Function complexity and amount of training data • Dimensionality of the input space • Noise in the output values • Heterogeneous data

  6. Examples • Geometric classification • Handwritten number recognition - the harmonic oscillator of ML • Voice recognition (spectral features)

  7. A 1st problem • Classify data into two classes: Class 1: above the line, Class 2: below the line • Input: data points [Plot: labelled points in the (x, y) plane, separated by a line]
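 A sketch of this first problem in the Wolfram Language; the separating line is taken to be y = x for concreteness (the actual line on the slide is not recoverable from the transcript):

    (* sample points in the plane and label them by the line y = x *)
    pts = RandomReal[{-10, 10}, {200, 2}];
    data = # -> If[#[[2]] > #[[1]], "Class 1", "Class 2"] & /@ pts;

    svm = Classify[data, Method -> "SupportVectorMachine"];
    svm[{5., -3.}]  (* -> "Class 2" *)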

  8. SVM • Which line separates the classes best? [Plot: the data with candidate separating lines]

  9. SVM • SVMs (support vector machines) identify the line maximally separating the data sets: the two classes satisfy $w \cdot x_i - b \ge 1$ and $w \cdot x_i - b \le -1$, and the width of the margin between the boundary lines is $\frac{2}{|w|}$. • Useful to be stable against “perturbations” [Plot: the two classes with the maximum-margin line and its margin boundaries]

  10. SVM • How are these lines determined? Minimisation with constraints; the dual problem is obtained using Lagrange multipliers (spelled out below). This problem can then be dealt with using quadratic programming algorithms. • These are readily implemented in standard environments (Mathematica, Matlab, Python, etc.) • In higher dimensions: plane, hyperplane
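 Explicitly, the hard-margin problem and its dual read (standard material, filled in here for completeness; $y_i = \pm 1$ are the class labels):

    \min_{w,b} \tfrac{1}{2}|w|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i - b) \ge 1 \;\; \forall i,

 and, introducing Lagrange multipliers $\alpha_i \ge 0$,

    \max_{\alpha} \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0,

 a quadratic program whose solution gives $w = \sum_i \alpha_i y_i x_i$.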

  11. SVM: hard margin vs soft margin • A soft margin penalises outliers instead of forbidding them; this might be a better fit to the data (a sketch of the corresponding Classify option follows below). [Plot: data with outliers and the resulting soft-margin separating line]
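 In Classify the outlier penalty is exposed as a suboption of the SVM method; a sketch (I recall the suboption name as "SoftMarginParameter", but treat it as an assumption):

    (* a smaller soft-margin parameter tolerates more outliers *)
    Classify[data, Method -> {"SupportVectorMachine", "SoftMarginParameter" -> 0.1}]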

  12. 2nd problem [Plots: two data sets in the (x, y) plane that are not separable by a straight line]

  13. SVM: Kernel trick • Different representation of data via a kernel map: $\{x, y\} \to \{x^2, y\}$ and $\{x, y\} \to \{x^2, y^2\}$; in the transformed coordinates the classes become linearly separable. [Plots: the two data sets before and after the kernel maps]
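 One can mimic the kernel trick by applying the feature map by hand and training a linear classifier in the new coordinates; a sketch (data as in the first problem):

    (* map {x, y} -> {x^2, y^2}, then separate linearly *)
    kernelMap[{x_, y_}] := {x^2, y^2};
    mapped = kernelMap[First[#]] -> Last[#] & /@ data;
    Classify[mapped, Method -> "SupportVectorMachine"]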

  14. Linear discriminant analysis • Use information about the mean/variance of the data set to distinguish classes: $(x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0) + \log|\Sigma_0| - (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) - \log|\Sigma_1| < \text{threshold}$ • Set the threshold to identify the class. [Plots: the two classes in the plane; histogram of the discriminant values against the threshold T]
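 The discriminant transcribes directly into the Wolfram Language; a sketch, where class0 and class1 are assumed lists of data vectors:

    (* per-class means and covariance matrices *)
    {m0, m1} = Mean /@ {class0, class1};
    {s0, s1} = Covariance /@ {class0, class1};

    (* the discriminant from the slide; compare against a threshold t *)
    disc[x_] := (x - m0).Inverse[s0].(x - m0) + Log[Det[s0]] -
                (x - m1).Inverse[s1].(x - m1) - Log[Det[s1]];
    classify[x_, t_] := If[disc[x] < t, 0, 1]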

  15. k-nearest neighbour • Classify a data point according to its k nearest neighbours. • With less noise the class boundaries are clear; with more noise they are less clear. [Plots: the data and the resulting decision boundaries at different noise levels and values of k]
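 A minimal k-nearest-neighbour vote with the built-in Nearest (points and labels are hypothetical lists):

    (* look up the k nearest labelled points and take a majority vote *)
    knn[query_, k_] := First[Commonest[Nearest[points -> labels, query, k]]]

    knn[{1.5, -2.}, 5]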

  16. Neural Network [Diagram: input layer → hidden layer → output layer, each carrying (d, w, b)] • d (data), w (weights), b (biases). Linear layer: $w_{ij} d_j + b_i$ • Softmax layer: $\text{softmax}(d_i) = \frac{e^{d_i}}{\sum_j e^{d_j}}$ • Loss function, capturing how far the network's output is from the desired output.
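 These pieces map one-to-one onto Mathematica's neural-network framework; a sketch of a small chain (layer sizes and class names are invented):

    (* LinearLayer implements w.d + b; SoftmaxLayer turns scores into probabilities *)
    net = NetChain[
      {LinearLayer[16], ElementwiseLayer[Ramp], LinearLayer[2], SoftmaxLayer[]},
      "Input" -> 2, "Output" -> NetDecoder[{"Class", {"Class 1", "Class 2"}}]
    ];
    trained = NetTrain[net, data]  (* NetTrain minimises the loss over the training set *)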

  17. Hand-written number recognition [Image: a handwritten digit rendered as a 28×28 binary pixel matrix] • Handwritten number recognition: • Input: 28×28 matrix with entries {0, 1} • With a simple network, taking every entry as an input and using one layer (w.x + b), a success rate of 89% is achieved. • More sophisticated networks achieve incredible accuracy.
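 A sketch of that simple one-layer network; I am assuming MNIST is pulled from the Wolfram Data Repository (the resource and element names are assumptions):

    (* one linear layer w.x + b over all 28x28 pixel values, then softmax *)
    net = NetChain[
      {FlattenLayer[], LinearLayer[10], SoftmaxLayer[]},
      "Input" -> NetEncoder[{"Image", {28, 28}, "ColorSpace" -> "Grayscale"}],
      "Output" -> NetDecoder[{"Class", Range[0, 9]}]
    ];
    trained = NetTrain[net, ResourceData["MNIST", "TrainingData"]]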

  18. Voice recognition • A voice signal [Plot: signal amplitude over roughly 7 seconds] • Representing it via a wavelet transform (spectrogram) [Plot: spectrogram of the signal]
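 Both representations are one-liners in Mathematica; a sketch (the audio file name is a placeholder):

    audio = Import["voice.wav"];   (* hypothetical recording *)
    Spectrogram[audio]             (* time-frequency picture of the signal *)

    (* or the wavelet analogue, via the continuous wavelet transform *)
    WaveletScalogram[ContinuousWaveletTransform[First[AudioData[audio]]]]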

  19. String theory example: Dimers [Diagram: the dP1 dimer/quiver with labelled nodes and fields X, Y, Z, Φ]
 $W_{dP_1} = X_{23} Y_{31} Z_{12} - X_{12} Y_{31} Z_{23} + X_{36} Y_{62} Z_{23} - X_{23} Y_{62} Z_{36} - X_{36} Y_{23} Z_{12} \Phi_{61} + X_{12} Y_{23} Z_{36} \Phi_{61}$
 • bounds on the number of families (1002.1790)

  20. end of part 1

  21. Supervised Learning Part 2 — Applications Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017

  22. Disclaimer • There are many tools you can use… • I just talk about one at a very basic level: Mathematica 
 … simply because it’s quick for me and I assume people are familiar with it.

  23. Mathematica

  24. Mathematica • You need version 11.1.1 or later…

  25. Example 1: basic SVM • Let's switch to notebook01.nb

  26. Example 2: kernel trick • Let's look at notebook02.nb

  27. Example 3: kernel trick • Let's look at notebook03.nb

  28. Thank you. Let's discuss applications…

  29. Supervised Learning Part 3 — Discussion Sven Krippendorf 
 Workshop on Big Data in String Theory Boston, 01.12.2017
