Supervised Learning Part 1 — Theory
Sven Krippendorf
Workshop on Big Data in String Theory, Boston, 01.12.2017
Content • Theory • Applications: Mathematica • Discussion
Def: Supervised Learning
Supervised learning is the machine learning task of inferring a function from labelled training data.
Workflow:
1. Determine the training examples
2. Prepare the training set
3. Decide how to represent the input objects
4. Decide how to represent the output objects
5. Choose your algorithm
6. Run the algorithm; adjust/determine its parameters
7. Evaluate the accuracy
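A minimal sketch of this workflow with Mathematica's Classify; the training and test points below are made up for illustration:

  (* steps 1-2: hypothetical labelled training points *)
  train = {{1., 2.} -> "above", {3., 8.} -> "above", {-4., 6.} -> "above",
           {2., -1.} -> "below", {5., 1.} -> "below", {-3., -7.} -> "below"};
  (* steps 3-6: inputs are 2d points, outputs are class labels; pick an SVM *)
  c = Classify[train, Method -> "SupportVectorMachine"];
  (* step 7: accuracy on held-out test data *)
  test = {{0., 4.} -> "above", {4., -2.} -> "below"};
  ClassifierMeasurements[c, test, "Accuracy"]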
Learning algorithms • Support Vector Machines • Naive Bayes • Linear discriminant analysis • Decision trees • k-nearest neighbour algorithm • Neural networks
Known issues
• Bias-variance tradeoff
• Function complexity and amount of training data
• Dimensionality of input space
• Noise in output values
• Heterogeneous data
Examples • Geometric classification • Handwritten number recognition - the harmonic oscillator of ML • Voice recognition (spectral features)
A 1st problem
Classify data into two classes:
• Class 1: above the line
• Class 2: below the line
• Input: data points
[Figure: the data points in the (x, y) plane]
SVM
Which line? [Figure: the same data with several candidate separating lines]
SVM
• SVMs (support vector machines) identify the line maximally separating the data sets:
  w · x_i − b ≥ 1 (class 1), w · x_i − b ≤ −1 (class 2), with margin width 2/|w|
[Figure: the two classes with the maximum-margin line and its margin boundaries]
• Useful to be stable against “perturbations”
SVM
• How are these lines determined? Minimisation with constraints; the dual problem, obtained via Lagrange multipliers, can be solved with quadratic programming algorithms.
• These are readily implemented in standard environments (Mathematica, Matlab, Python, etc.)
• In higher dimensions: plane, hyperplane
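A sketch of the hard-margin optimisation solved directly with FindMinimum, reusing the toy points from the first sketch with labels t_i = ±1 (minimise |w|^2 subject to t_i (w · x_i − b) ≥ 1):

  pts = {{1., 2.}, {3., 8.}, {-4., 6.}, {2., -1.}, {5., 1.}, {-3., -7.}};
  t = {1, 1, 1, -1, -1, -1};
  sol = FindMinimum[{w1^2 + w2^2,
      And @@ Table[t[[i]] ({w1, w2}.pts[[i]] - b) >= 1, {i, Length[pts]}]},
     {w1, w2, b}];
  2/Norm[{w1, w2} /. sol[[2]]]  (* the margin width 2/|w| *)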
SVM: hard margin vs soft margin
• A soft margin adds a penalty for outliers; it might be a better fit to the data
[Figure: soft-margin separation of the data in the (x, y) plane]
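In Classify this corresponds to a method suboption; a sketch, assuming the suboption name "SoftMarginParameter" and that it plays the role of the usual penalty strength C (check the documentation of your version):

  (* smaller values would tolerate more margin violations *)
  cSoft = Classify[train,
     Method -> {"SupportVectorMachine", "SoftMarginParameter" -> 0.1}];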
2nd problem
[Two figures: data sets in the (x, y) plane that no straight line separates]
SVM: Kernel trick
Different representation of the data via a kernel map:
  {x, y} → {x^2, y}  and  {x, y} → {x^2, y^2}
[Figures: the 2nd-problem data before and after the kernel maps, where the classes become linearly separable]
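A sketch of the second map in code, with made-up "disk vs ring" data standing in for the 2nd problem; after {x, y} → {x^2, y^2} a plain linear classifier suffices:

  phi[{x_, y_}] := {x^2, y^2};
  inner = RandomPoint[Disk[{0, 0}, 2], 100];           (* class 1: inner disk *)
  outer = RandomPoint[Annulus[{0, 0}, {4, 6}], 100];   (* class 2: outer ring *)
  train2 = Join[Thread[(phi /@ inner) -> "inner"],
                Thread[(phi /@ outer) -> "outer"]];
  cKernel = Classify[train2, Method -> "LogisticRegression"];  (* linear in the mapped variables *)
  cKernel[phi[{1., 0.5}]]  (* -> "inner" *)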
Linear discriminant analysis
• Use information about the mean/variance of the data set to distinguish classes:
  (x − μ_0)^T Σ_0^{-1} (x − μ_0) + log|Σ_0| − (x − μ_1)^T Σ_1^{-1} (x − μ_1) − log|Σ_1| < threshold
• Set the threshold T to identify the class
[Figures: scatter plot of the two classes; histogram (#) of discriminant values with the threshold T marked]
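A sketch of this discriminant for two made-up Gaussian classes, with means and covariances estimated from samples:

  c0 = RandomVariate[MultinormalDistribution[{-2, 0}, IdentityMatrix[2]], 200];
  c1 = RandomVariate[MultinormalDistribution[{2, 0}, {{2., 0.5}, {0.5, 1.}}], 200];
  disc[x_, pts_] := With[{mu = Mean[pts], sig = Covariance[pts]},
     (x - mu).Inverse[sig].(x - mu) + Log[Det[sig]]];
  score[x_] := disc[x, c0] - disc[x, c1];
  score[{-2., 0.}]  (* well below threshold 0: assign class 0 *)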
k-nearest neighbour
• Classify a data point according to the classes of its k nearest neighbours.
[Figures: decision boundaries for different noise levels and values of k — with less noise the boundaries are clear, with more noise they are less clear]
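k-NN via Classify, again on the toy data from the first sketch; the suboption name "NeighborsNumber" for k is taken from the documentation but should be double-checked:

  cKNN = Classify[train, Method -> {"NearestNeighbors", "NeighborsNumber" -> 3}];
  cKNN[{0., 5.}]  (* majority vote among the 3 nearest training points *)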
Neural Network
• Layers of data d, weights w and biases b: input layer → hidden layer(s) → output layer
• Linear layer: d_i → w_ij d_j + b_i
• Softmax layer: softmax(d)_i = e^{d_i} / Σ_j e^{d_j}
• Loss function, capturing how far the network's output is from the desired (true) output.
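A sketch of these layers as a NetChain (Mathematica 11.1+); the layer sizes are illustrative:

  net = NetInitialize[                        (* random weights w and biases b *)
     NetChain[{LinearLayer[2], SoftmaxLayer[]}, "Input" -> 4]];
  net[{1., 0., -1., 2.}]                      (* length-2 vector of class probabilities *)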
Hand-written number recognition
[Figure: a handwritten digit rendered as a 28×28 binary pixel matrix]
• Input: 28×28 matrix with entries {0, 1}
• With a simple network, taking every entry as an input and using one layer (w·x + b), a success rate of 89% is achieved.
• More sophisticated networks achieve incredible accuracy.
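A sketch of this simple one-layer digit network; ResourceData["MNIST"] (Wolfram Data Repository) is assumed to return image → digit pairs:

  mnist = ResourceData["MNIST"];  (* assumption: rules of 28x28 images -> digits *)
  net = NetChain[{FlattenLayer[], LinearLayer[10], SoftmaxLayer[]},
     "Input" -> NetEncoder[{"Image", {28, 28}, "ColorSpace" -> "Grayscale"}],
     "Output" -> NetDecoder[{"Class", Range[0, 9]}]];
  trained = NetTrain[net, mnist];
  trained[mnist[[1, 1]]]  (* predicted digit for the first image *)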
Voice recognition
• A voice signal [Figure: audio waveform, amplitude vs time in seconds]
• Representing it via wavelet transform (spectrogram) [Figure: spectrogram of the signal]
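A sketch of the second representation in Mathematica; "voice.wav" is a placeholder file name:

  audio = Import["voice.wav"];  (* an Audio object in recent versions *)
  Spectrogram[audio]            (* time-frequency picture of the signal *)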
String theory example: Dimers
[Figure: dimer model/quiver diagram with nodes 1, 2, 3, 6]
W_{dP_1} = X_{23} Y_{31} Z_{12} − X_{12} Y_{31} Z_{23} + X_{36} Y_{62} Z_{23} − X_{23} Y_{62} Z_{36} − X_{36} Y_{23} Z_{12} Φ_{61} + X_{12} Y_{23} Z_{36} Φ_{61}
Bounds on the number of families (arXiv:1002.1790)
End of part 1
Supervised Learning Part 2 — Applications
Sven Krippendorf
Workshop on Big Data in String Theory, Boston, 01.12.2017
Disclaimer
• There are many tools you can use…
• I'll just talk about one, at a very basic level: Mathematica… simply because it's quick for me and I assume people are familiar with it.
Mathematica
Mathematica • You need version 11.1.1 or later…
Example 1: basic SVM
• Let's switch to notebook01.nb
Example 2: kernel trick
• Let's look at notebook02.nb
Example 3: kernel trick
• Let's look at notebook03.nb
Thank you. Let's discuss applications…
Supervised Learning Part 3 — Discussion
Sven Krippendorf
Workshop on Big Data in String Theory, Boston, 01.12.2017