Department of Computer Science
CSCI 5622: Machine Learning
Chenhao Tan
Lecture 17: Midterm review
• Theory
  • PAC learning
  • Bias-variance tradeoff
  • Model selection
• Methods
  • K-nearest neighbor
  • Naïve Bayes
  • Linear regression
  • Regularization
  • Logistic regression
  • Neural networks
  • SVM
  • Multi-class classification
  • Feature engineering
  • Boosting
Supervised learning
PAC Learnability
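As a reminder, the definition reviewed here is the standard one (the exact notation on the original slides is not preserved): a hypothesis class H is PAC-learnable if there is an algorithm A and a polynomial sample-complexity function m_H such that for every epsilon, delta in (0, 1) and every data distribution D, running A on m >= m_H(1/epsilon, 1/delta) i.i.d. examples returns a hypothesis h satisfying

    \Pr\big[\mathrm{err}_D(h) \le \epsilon\big] \ge 1 - \delta.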
Generalization error bounds
• Finite consistent hypothesis class
• Finite inconsistent hypothesis class
Finite Consistent Hypothesis Class
Finite Inconsistent Hypothesis Class
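The standard bounds for these two cases (stated in the usual form; the notation is assumed rather than copied from the slides): for a finite hypothesis class H and m training examples, with probability at least 1 - delta,

    \text{consistent } h \text{ (zero training error):}\quad \mathrm{err}_D(h) \le \frac{1}{m}\left(\ln|H| + \ln\frac{1}{\delta}\right)

    \text{any } h \in H \text{ (inconsistent case):}\quad \mathrm{err}_D(h) \le \widehat{\mathrm{err}}(h) + \sqrt{\frac{\ln|H| + \ln\frac{2}{\delta}}{2m}}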
Bias-variance tradeoff
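The identity behind this topic is the standard decomposition of expected squared error (notation assumed): for y = f(x) + noise with noise variance sigma^2 and a learned predictor \hat{f},

    \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2 + \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big] + \sigma^2
                                         = \text{bias}^2 + \text{variance} + \text{irreducible noise}

More flexible models typically reduce bias but increase variance; model selection balances the two.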
Model selection
Methods
• Model
• Algorithm
K-nearest neighbors
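A minimal NumPy sketch of k-NN classification, for review purposes only (the Euclidean distance and variable names are illustrative assumptions, not taken from the slides):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=5):
        # distance from the query point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # indices of the k nearest neighbors
        nearest = np.argsort(dists)[:k]
        # majority vote over the neighbors' labels
        return Counter(y_train[nearest].tolist()).most_common(1)[0][0]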
Naïve Bayes
For text classification with Laplace smoothing:
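The add-one (Laplace) smoothed estimate takes the standard form (count(w, c) is the number of occurrences of word w in documents of class c, and V is the vocabulary):

    \hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\sum_{w' \in V} \mathrm{count}(w', c) + |V|}

so no word ever receives zero probability under any class.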
Linear regression
• Inputs and outputs are both continuous
Objective function (model)
• The objective function is the residual sum of squares (RSS):
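In the usual notation (assumed here), with examples (x_i, y_i) and weight vector w:

    \mathrm{RSS}(\mathbf{w}) = \sum_{i=1}^{n} \big(y_i - \mathbf{w}^\top \mathbf{x}_i\big)^2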
Probabilistic interpretation
• A discriminative model that assumes the response is Gaussian, with mean given by the linear function of the inputs:
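Completing the statement in standard form: the model assumes

    y \mid \mathbf{x} \sim \mathcal{N}\big(\mathbf{w}^\top \mathbf{x}, \; \sigma^2\big),

and maximizing this likelihood is equivalent to minimizing the residual sum of squares above.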
Regularization
Prior Distribution
• Lasso's prior is peaked at 0: we expect many parameters to be exactly zero
• Ridge's prior is flatter and fatter around 0: we expect many coefficients to be small but nonzero
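Concretely, in the standard formulation (lambda is the regularization strength; notation assumed): a zero-mean Gaussian prior on the weights corresponds to the ridge (L2) penalty, and a zero-mean Laplace prior corresponds to the lasso (L1) penalty:

    \text{Ridge:}\quad \min_{\mathbf{w}} \; \mathrm{RSS}(\mathbf{w}) + \lambda \|\mathbf{w}\|_2^2
    \qquad
    \text{Lasso:}\quad \min_{\mathbf{w}} \; \mathrm{RSS}(\mathbf{w}) + \lambda \|\mathbf{w}\|_1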
Logistic regression
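As a one-line reminder (standard notation, assumed here), logistic regression models

    P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^\top \mathbf{x}}},

and is trained by minimizing the negative log-likelihood (the cross-entropy loss).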
Neural networks
Gradient descent
Stochastic gradient descent
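The two updates side by side (learning rate eta, per-example loss l_i; standard form):

    \text{GD:}\quad \mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} \frac{1}{n}\sum_{i=1}^{n} \ell_i(\mathbf{w})
    \qquad
    \text{SGD:}\quad \mathbf{w} \leftarrow \mathbf{w} - \eta \, \nabla_{\mathbf{w}} \ell_i(\mathbf{w}) \;\; \text{for a randomly drawn example } i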
Forward algorithm
Backpropagation
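The core identity is the chain rule applied layer by layer (standard notation, assumed): with pre-activations z^{(l)} = W^{(l)} a^{(l-1)} and activations a^{(l)} = g(z^{(l)}),

    \delta^{(l)} = \big((W^{(l+1)})^\top \delta^{(l+1)}\big) \odot g'(z^{(l)}), \qquad
    \frac{\partial L}{\partial W^{(l)}} = \delta^{(l)} \, (a^{(l-1)})^\top

so gradients are computed backward from the output layer, reusing the forward-pass activations.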
Neural network techniques
• Momentum
• Dropout
• Batch normalization
• Weight initialization
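For example, the classical momentum update (one common form; beta is the momentum coefficient, eta the learning rate):

    \mathbf{v} \leftarrow \beta \mathbf{v} - \eta \, \nabla_{\mathbf{w}} L(\mathbf{w}), \qquad \mathbf{w} \leftarrow \mathbf{w} + \mathbf{v}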
Hard-margin SVM
Soft-margin SVM
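The two primal problems in standard form (notation assumed, not copied from the slides):

    \text{Hard-margin:}\quad \min_{\mathbf{w}, b} \; \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 \;\; \forall i

    \text{Soft-margin:}\quad \min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \tfrac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i \quad \text{s.t.} \quad y_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0 \;\; \forall i

The parameter C trades off margin size against the slack (training violations).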
KKT conditions
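For the SVM, the condition most worth remembering is complementary slackness (hard-margin case shown; standard form):

    \alpha_i \big[y_i(\mathbf{w}^\top \mathbf{x}_i + b) - 1\big] = 0 \quad \forall i,

so alpha_i > 0 only for points exactly on the margin: these are the support vectors, and \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i.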
SMO algorithm
Kernels
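Two commonly used examples (whether both appeared on the original slides is an assumption):

    \text{Polynomial:}\quad K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z} + 1)^d
    \qquad
    \text{RBF (Gaussian):}\quad K(\mathbf{x}, \mathbf{z}) = \exp\big(-\gamma \|\mathbf{x} - \mathbf{z}\|^2\big)

Each computes an inner product in an implicit feature space, so the dual SVM can be kernelized without ever forming that space explicitly.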
Feature engineering
Multi-class Classification
• Reduction
  • One-against-all
  • All-pairs
One-against-all
• Break the k-class problem into k binary problems and solve each separately
• Combine predictions: evaluate all h's and take the one with the highest confidence
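A minimal sketch of one-against-all, using scikit-learn's LogisticRegression as the binary learner purely for illustration (the base classifier and variable names are assumptions, not from the slides):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_one_vs_all(X, y, classes):
        # one binary classifier per class: class c vs. everything else
        return {c: LogisticRegression().fit(X, (y == c).astype(int)) for c in classes}

    def predict_one_vs_all(models, X):
        # evaluate every classifier and pick the class with the highest confidence
        classes = list(models)
        scores = np.column_stack([models[c].decision_function(X) for c in classes])
        return np.array([classes[i] for i in np.argmax(scores, axis=1)])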
All-pairs
• Break the k-class problem into k(k-1)/2 binary problems (one per pair of classes) and solve each separately
• Combine predictions: evaluate all h's and take the class with the highest summed confidence
Ensemble methods
• Bagging
  • Train classifiers on subsets of the data
  • Predict based on majority vote
• Stacking
  • Take multiple classifiers' outputs as inputs and train another classifier to make the final prediction
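A minimal bagging sketch with bootstrap samples and a majority vote (decision trees as the base learner, and non-negative integer class labels, are assumptions for illustration):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_estimators=25, seed=0):
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_estimators):
            # bootstrap sample: draw n examples with replacement
            idx = rng.integers(0, len(X), size=len(X))
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        # majority vote over the individual classifiers' predictions
        preds = np.stack([m.predict(X) for m in models])   # shape: (n_estimators, n_samples)
        return np.array([np.bincount(col).argmax() for col in preds.T])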
AdaBoost
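The standard AdaBoost updates, for labels y_i in {-1, +1} (notation assumed): at round t, the weak learner h_t has weighted error epsilon_t under the example-weight distribution D_t, and

    \alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}, \qquad
    D_{t+1}(i) = \frac{D_t(i)\,\exp\big(-\alpha_t \, y_i \, h_t(x_i)\big)}{Z_t}

where Z_t normalizes D_{t+1}. The final classifier is H(x) = sign(\sum_t \alpha_t h_t(x)); misclassified examples receive larger weight, so later weak learners focus on them.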
Good luck!