Applied Machine Learning (CIML Chaps 4-5: A Geometric Approach)
Week 3: Extensions and Variations of Perceptron; Practical Issues and HW1
Professor Liang Huang (some slides from A. Zisserman, Oxford)
"A ship in port is safe, but that is not what ships are for." – Grace Hopper (1906-1992)
Trivia: Grace Hopper and the first bug
• Edison coined the term "bug" around 1878, and it had been widely used in engineering ever since
• Hopper was associated with the discovery of the first computer "bug" in 1947: a moth stuck in a relay, now at the Smithsonian National Museum of American History
Week 3: Perceptron in Practice
• Problems with the perceptron
  • doesn't converge with inseparable data
  • updates are often too "bold"
  • doesn't optimize the margin
  • result is sensitive to the order of examples
• Ways to alleviate these problems (without SVM/kernels)
  • Part II: voted perceptron and averaged perceptron
  • Part III: MIRA (margin-infused relaxed algorithm)
  • Part IV: practical issues and HW1
  • Part V: "soft" perceptron: logistic regression
Recap of Week 2
Perceptron algorithm:
  input: training data D
  output: weights w
  initialize w ← 0
  while not converged
    for (x, y) ∈ D
      if y(w · x) ≤ 0
        w ← w + y x
[figure: linearly separable data with margin δ, radius R, and unit oracle vector u with ‖u‖ = 1 and u · x ≥ δ]
"idealized" ML: Input x → Training → Model w → Output y
"actual" ML: Input x → feature map φ → Training → Model w → Output y
deep learning ≈ representation learning: the feature map φ is learned together with the model w
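A minimal runnable sketch of the recap's vanilla perceptron, assuming numpy feature vectors and labels in {+1, -1}; the function name and the fixed epoch cap are illustrative choices, not part of the slides.

import numpy as np

def perceptron_train(data, epochs=10):
    """Vanilla perceptron: update w <- w + y*x on every example the
    current w misclassifies, i.e., whenever y*(w . x) <= 0."""
    dim = len(data[0][0])
    w = np.zeros(dim)
    for _ in range(epochs):
        converged = True
        for x, y in data:                 # x: numpy feature vector, y in {+1, -1}
            if y * np.dot(w, x) <= 0:     # mistake (or exactly on the boundary)
                w = w + y * x
                converged = False
        if converged:                     # a full pass with no mistakes: done
            break
    return w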
Python Demo (requires numpy and matplotlib)
$ python perc_demo.py
Part II: Voted and Averaged Perceptron
[plot: dev set error of the vanilla perceptron vs. the voted and averaged perceptrons]
Voted/Avg. Perceptron Revives Perceptron
• 1959 Rosenblatt: perceptron invented
• 1962 Novikoff: convergence proof
• 1969* Minsky/Papert: their book "kills" the perceptron
• 1997 Cortes/Vapnik: SVM (batch; soft margin, kernels, max margin)
• 1999 Freund/Schapire: voted/averaged perceptron revives it (handles the inseparable case)
• 2002 Collins: structured perceptron
• 2003 Crammer/Singer: MIRA (online; conservative and aggressive update variants, approximates max margin)
• 2005* McDonald/Crammer/Pereira: structured MIRA
• 2006 Singer group: Pegasos (online; subgradient descent, max margin)
• 2007-2010* Singer group: minibatch variants
*mentioned in lectures but optional (the other papers are all covered in detail)
Voted/Averaged Perceptron
• problem: later examples dominate earlier examples
• solution: voted perceptron (Freund and Schapire, 1999)
  • record the weight vector after each example in D (not just after each update!)
  • vote on a new example using all |D| models
  • shown to have better generalization power
• averaged perceptron (from the same paper)
  • an approximation of the voted perceptron
  • just use the average of all the weight vectors
  • can be implemented efficiently
Voted Perceptron
our notation: (x(1), y(1)), ...; v is a weight vector, c is its number of votes
• if the current model classifies the example correctly, increase its number of votes
• otherwise (on a mistake), create a new model with 1 vote
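A sketch of the voted perceptron described above (Freund and Schapire, 1999): each weight vector v is stored with its vote count c, and prediction lets every stored model cast c votes. The function names and the data format (a list of (numpy vector, ±1 label) pairs) are assumptions for illustration.

import numpy as np

def voted_perceptron_train(data, epochs=1):
    """If the current model is correct, increase its number of votes;
    otherwise store it and create a new model with 1 vote."""
    dim = len(data[0][0])
    models = []                           # list of (v, c) pairs
    v, c = np.zeros(dim), 0
    for _ in range(epochs):
        for x, y in data:
            if y * np.dot(v, x) <= 0:     # mistake: retire the current model
                models.append((v, c))
                v, c = v + y * x, 1       # new model starts with 1 vote
            else:                         # correct: one more vote
                c += 1
    models.append((v, c))
    return models

def voted_predict(models, x):
    """Each stored model casts c votes for sign(v . x)."""
    total = sum(c * np.sign(np.dot(v, x)) for v, c in models)
    return 1 if total >= 0 else -1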
Experiments
[plot: dev set error of the vanilla perceptron vs. the voted and averaged perceptrons]
Averaged Perceptron
• voted perceptron is not scalable, and does not output a single model
• averaged perceptron is an approximation of the voted perceptron
• actually, summing all the weight vectors is enough; no need to divide

  initialize w ← 0; ws ← 0
  while not converged
    for (x, y) ∈ D
      if y(w · x) ≤ 0
        w ← w + y x
      ws ← ws + w        (after each example, not after each update!)
  output: summed weights ws

[figure: each w(t) is the sum of all updates so far, w(t) = Δw(1) + … + Δw(t)]
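A direct Python translation of the pseudocode above, keeping the explicit running sum ws; this is fine for low-dimensional data. The function name and data format are illustrative.

import numpy as np

def averaged_perceptron_naive(data, epochs=1):
    """Accumulate w into ws after every example (not just after updates);
    the sum itself is enough, since dividing by the number of examples
    does not change the sign of ws . x."""
    dim = len(data[0][0])
    w = np.zeros(dim)
    ws = np.zeros(dim)
    for _ in range(epochs):
        for x, y in data:
            if y * np.dot(w, x) <= 0:
                w = w + y * x
            ws = ws + w                   # after each example
    return ws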
Efficient Implementation of Averaging
• the naive implementation (running sum ws) doesn't scale
  • OK for low dimensions (HW1); too slow for high dimensions (HW3)
• very clever trick from Hal Daumé (2006, PhD thesis)

  initialize w ← 0; wa ← 0; c ← 0
  while not converged
    for (x, y) ∈ D
      if y(w · x) ≤ 0
        w ← w + y x
        wa ← wa + c y x    (after each update, not after each example!)
      c ← c + 1
  output: c w − wa
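The same averaged perceptron with the lazy-averaging trick above, so the auxiliary vector is only touched on updates; the returned c*w - wa equals the summed weights ws of the naive version. Names are illustrative.

import numpy as np

def averaged_perceptron(data, epochs=1):
    """Daume's trick: accumulate c*y*x only on updates, then recover the
    summed weights as c*w - wa at the end."""
    dim = len(data[0][0])
    w = np.zeros(dim)                     # current weights
    wa = np.zeros(dim)                    # position-weighted corrections
    c = 0                                 # counts examples, not updates
    for _ in range(epochs):
        for x, y in data:
            if y * np.dot(w, x) <= 0:
                w = w + y * x
                wa = wa + c * y * x       # after each update
            c += 1                        # after each example
    return c * w - wa                     # equals the running sum ws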
Part III: MIRA
• perceptron often makes bold updates (over-correction)
• and sometimes makes updates that are too small (under-correction)
• but the learning rate is hard to tune
• idea: make a "just enough" update to correct the mistake:
  w′ ← w + ((y − w · x) / ‖x‖²) x
• easy to show: w′ · x = (w + ((y − w · x) / ‖x‖²) x) · x = w · x + (y − w · x) = y
• this is the margin-infused relaxed algorithm (MIRA)
[figure: perceptron over- and under-correction vs. the MIRA update]
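A small numeric check of the contrast above; the vectors here are made up purely for illustration.

import numpy as np

# a misclassified positive example: y * (w . x) = 0 <= 0, so both rules update
w = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])
y = +1

w_perc = w + y * x                                   # perceptron: fixed-size step
w_mira = w + (y - np.dot(w, x)) / np.dot(x, x) * x   # MIRA: "just enough" step

print(np.dot(w_perc, x))   # 5.0 -- far more than needed (over-correction)
print(np.dot(w_mira, x))   # 1.0 -- exactly y, i.e., functional margin 1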
Example: Perceptron Under-Correction
[figure: a perceptron update from w to w′ that still misclassifies x]
MIRA: Just Enough
• minimal change to ensure a functional margin of 1 (dot product w′ · x = 1):
  min ‖w′ − w‖²  s.t.  w′ · x ≥ 1
• MIRA ≈ 1-step SVM
• functional margin: y(w · x);  geometric margin: y(w · x) / ‖w‖
[figure: perceptron update vs. MIRA update; MIRA moves w just far enough that w′ · x = 1]
MIRA: Functional vs. Geometric Margin
• same update as before:  min ‖w′ − w‖²  s.t.  w′ · x ≥ 1
• the decision boundary moves from w · x = 0 to w′ · x = 0; the functional margin of x becomes 1, so its geometric margin becomes 1/‖w′‖
• MIRA ≈ 1-step SVM
• functional margin: y(w · x);  geometric margin: y(w · x) / ‖w‖
[figure: old boundary w · x = 0, new boundary w′ · x = 0, and the geometric margin 1/‖w′‖]
Optional: Aggressive MIRA
• aggressive version of MIRA: also update if the example is classified correctly but not confidently enough
  • i.e., the functional margin y(w · x) is not big enough
• p-aggressive MIRA: update if y(w · x) < p (0 ≤ p < 1)
  • MIRA is the special case p = 0: only update on a misclassification
• the update equation is the same as MIRA's
  • i.e., after the update, the functional margin becomes 1
• a larger p leads to a larger geometric margin but slower convergence
[figure: boundaries w · x = 0, w′ · x = 0, and w′ · x = 1 after a p-aggressive update]
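A sketch of a single p-aggressive MIRA step as described above; p = 0 recovers plain MIRA. The function name is an assumption; the step is the closed-form minimal-change solution from the earlier slides.

import numpy as np

def mira_update(w, x, y, p=0.0):
    """Update only if the functional margin y*(w . x) is below p; the
    minimal-change step then makes the functional margin exactly 1."""
    margin = y * np.dot(w, x)
    if margin < p:                        # misclassified, or correct but not confident enough
        # closed-form solution of  min ||w' - w||^2  s.t.  y*(w' . x) >= 1
        # (same as w + (y - w.x)/||x||^2 * x when y is +1 or -1)
        w = w + (1.0 - margin) / np.dot(x, x) * y * x
    return w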
Demo
Part IV: Practical Issues and HW1
• you will build your own linear classifiers for the HW1 data
HW1: Adult Income >50K?
training/dev sets (fields: Age, Sector, Education, Marital_Status, Occupation, Race, Sex, Hours, Country, Target):
  40, Private, Doctorate, Married-civ-spouse, Prof-specialty, White, Female, 60, United-States, >50K
  44, Local-gov, Some-college, Married-civ-spouse, Exec-managerial, Black, Male, 38, United-States, >50K
  55, Private, HS-grad, Divorced, Sales, White, Male, 40, England, <=50K
test data (semi-blind):
  30, Private, Assoc-voc, Married-civ-spouse, Tech-support, White, Female, 40, Canada, ???
• 2 numerical features: age and hours-per-week
  • option 1: keep them as numerical features; but is older and more hours always better?
  • option 2 (better): treat them as binary features, e.g., age=22, hours=38, ...
• 7 categorical features: convert to binary features (see the sketch below)
  • country, race, occupation, etc.; e.g., country=United-States, education=Doctorate, ...
• perceptron: ~19% dev error; averaged perceptron: ~15% dev error
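A minimal sketch of the binarization described above: every field of a row becomes a binary feature name such as education=Doctorate, with age and hours treated the same way (option 2). The field names follow the header on this slide; the function name and the sparse-set representation are assumptions.

fields = ["age", "sector", "education", "marital", "occupation",
          "race", "sex", "hours", "country"]

def binarize(row, field_names=fields):
    """Map one data row (target column removed) to a set of binary
    feature names; a real solution would also add a bias feature and
    index these names into vector positions."""
    return {f"{name}={str(value).strip()}" for name, value in zip(field_names, row)}

example = ["40", "Private", "Doctorate", "Married-civ-spouse",
           "Prof-specialty", "White", "Female", "60", "United-States"]
print(sorted(binarize(example)))   # ['age=40', 'country=United-States', ...]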
Interesting Facts in the HW1 Data
• only ~25% of examples are positive (>50K); the data is from 1994 (~$27K per capita)
• education is probably the single most important factor
  • education=Doctorate is extremely positive (80%)
  • education=Prof-school is also very positive (75%)
  • education=Masters is also positive (55%)
  • education=9th (high-school dropout) is extremely negative (6%)
• "married" is good (45%); "never married" is extremely bad (5%)
• "self-emp-inc" is the best sector (59%), but "self-emp-not-inc" is only 30%
• hours-per-week=1 is 100% positive; country=Iran is 70% positive
• exec-managerial and prof-specialty are the best occupations (48% / 46%)
• interesting combinations exist (e.g., "edu=Doc and sector=self-emp-inc": 100%)