Linear Classifiers and Perceptron



  1. Text Classification: Linear Classifiers and Perceptron

CS678 Advanced Topics in Machine Learning
Thorsten Joachims
Spring 2003

Example document (Reuters): E.D. AND F. MAN TO BUY INTO HONG KONG FIRM
"The U.K. based commodity house E.D. and F. Man Ltd and Singapore's Yeo Hiap Seng Ltd jointly announced that Man will buy a substantial stake in Yeo's 71.1 pct held unit, Yeo Hiap Seng Enterprises Ltd. Man will develop the locally listed soft drinks manufacturer into a securities and commodities brokerage arm and will rename the firm Man Pacific (Holdings) Ltd."
About a corporate acquisition? YES / NO

Outline:
• Linear classifiers
• Example: text classification
• Perceptron learning algorithm
• Mistake bound for Perceptron
• Separation margin
• Dual representation

Learning Text Classifiers: Process
• Real-world process:
  • Generator: generates descriptions x according to a distribution P(x).
  • Teacher: assigns a value y to each description x based on P(y|x).
• => Training examples (x_1, y_1), ..., (x_n, y_n) ~ P(x, y), with x_i ∈ ℝ^N and y_i ∈ {+1, -1}.
• Goal: the learner uses the training set to find a classifier with low prediction error on new documents.

Generative vs. Discriminative Training
• Generative training:
  • make assumptions about the parametric form of P(x, y)
  • estimate the parameters of P(x, y) from the training data
  • derive the optimal classifier using Bayes' rule
  • example: naive Bayes
• Discriminative training:
  • make assumptions about the set H of classifiers
  • estimate the error of the classifiers in H from the training data
  • select the classifier with the lowest error rate
  • example: SVM, decision tree
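As a concrete contrast between the two training styles, here is a minimal Python sketch on toy word-count vectors: the generative learner estimates class priors and per-class word probabilities (a naive-Bayes-style model of P(x, y)) and classifies via Bayes' rule, while the discriminative learner picks, from a small hypothesis set H of single-word rules, the rule with the lowest training error. The toy data, function names, and the choice of H are illustrative, not from the slides.

```python
import numpy as np

# Toy bag-of-words data: rows are documents (word counts), labels are +1/-1.
X = np.array([[1, 2, 0, 0],
              [0, 0, 3, 1],
              [2, 1, 0, 0],
              [0, 1, 2, 2]])
y = np.array([+1, -1, +1, -1])

# Generative training (naive-Bayes flavour): estimate the parameters of P(x, y).
def train_naive_bayes(X, y, alpha=1.0):
    params = {}
    for c in (+1, -1):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),                                # P(y = c)
            "word_prob": (Xc.sum(axis=0) + alpha)                     # P(word | y = c)
                         / (Xc.sum() + alpha * X.shape[1]),           # with Laplace smoothing
        }
    return params

def predict_naive_bayes(params, x):
    # Bayes' rule (up to a constant): argmax_c log P(c) + sum_w x_w log P(w | c)
    scores = {c: np.log(p["prior"]) + np.sum(x * np.log(p["word_prob"]))
              for c, p in params.items()}
    return max(scores, key=scores.get)

# Discriminative training: H = {"predict +1 iff word j occurs"}; pick the lowest-error rule.
def train_discriminative(X, y):
    best_j, best_err = 0, float("inf")
    for j in range(X.shape[1]):
        pred = np.where(X[:, j] > 0, +1, -1)
        err = np.mean(pred != y)
        if err < best_err:
            best_j, best_err = j, err
    return best_j

nb = train_naive_bayes(X, y)
print(predict_naive_bayes(nb, np.array([1, 1, 0, 0])))   # generative prediction
print(train_discriminative(X, y))                         # index of the best rule in H
```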

  2. Representing Text as Attribute Vectors

Attributes: words (word-stems)
Values: occurrence frequencies
==> The ordering of words is ignored!

Example newsgroup posting:
  From: xxx@sciences.sdsu.edu
  Newsgroups: comp.graphics
  Subject: Need specs on Apple QT
  I need to get the specs, or at least a very verbose interpretation of the specs, for QuickTime. Technical articles from magazines and references to books would be nice, too.
  I also need the specs in a fromat usable on a Unix or MS-Dos system. I can't do much with the QuickTime stuff they have on ...

Attribute vector (excerpt): baseball 0, specs 3, graphics 0, references 1, hockey 0, car 0, clinton 0, unix 1, space 0, quicktime 2, computer 0

Linear Classifiers (Example)
Text classification: Physics (+1) versus Recipes (-1)

ID    nuclear  atom  salt  pepper  water  heat  and    y
      (x1)     (x2)  (x3)  (x4)    (x5)   (x6)  (x7)
D1    1        2     0     0       2      0     2      +1
D2    0        0     0     3       0      1     1      -1
D3    0        2     1     0       0      0     3      +1
D4    0        0     1     1       1      1     1      -1
w, b  2        3     -1    -3      -1     -1    0      b=1

D1: Σ_{i=1..7} w_i x_i + b = [2·1 + 3·2 + (-1)·0 + (-3)·0 + (-1)·2 + (-1)·0 + 0·2] + 1 = 7 > 0  =>  +1
D2: Σ_{i=1..7} w_i x_i + b = [2·0 + 3·0 + (-1)·0 + (-3)·3 + (-1)·0 + (-1)·1 + 0·1] + 1 = -9 < 0  =>  -1
(A Python check of these two computations appears after this slide.)

Linear Classifiers
Rules of the form: weight vector w, threshold b
h(x) = sign(Σ_{i=1..N} w_i x_i + b) = +1 if Σ_{i=1..N} w_i x_i + b > 0, -1 else
Geometric interpretation: the hyperplane (w, b) in ℝ^N separating the two classes.

Perceptron (Rosenblatt)
Input: S = (x_1, y_1), ..., (x_n, y_n), x_i ∈ ℝ^N, y_i ∈ {+1, -1} (linearly separable)
• w_0 ← 0; b_0 ← 0; k ← 0
• R = max_i ||x_i||
• repeat
  • for i = 1 to n
    • if y_i (w_k · x_i + b_k) ≤ 0
      • w_{k+1} ← w_k + η y_i x_i
      • b_{k+1} ← b_k + η y_i R²
      • k ← k + 1
    • endif
  • endfor
• until no mistakes were made in the for loop
• return (w_k, b_k)
(A Python sketch of this algorithm follows below.)
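The D1/D2 computation above can be checked with a few lines of Python. The weight vector, threshold, and document vectors come from the table on this slide; the function name h is mine.

```python
import numpy as np

# Attribute order: nuclear, atom, salt, pepper, water, heat, and
w = np.array([2, 3, -1, -3, -1, -1, 0])   # weight vector from the slide
b = 1                                      # threshold from the slide

def h(x):
    """Linear classifier h(x) = sign(w·x + b), returning +1 or -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

D1 = np.array([1, 2, 0, 0, 2, 0, 2])   # physics document, label +1
D2 = np.array([0, 0, 0, 3, 0, 1, 1])   # recipe document, label -1

print(np.dot(w, D1) + b, h(D1))   # 7, +1
print(np.dot(w, D2) + b, h(D2))   # -9, -1
```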

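Below is a minimal Python sketch of the primal Perceptron pseudocode from the slide above, assuming a linearly separable training set. The function name, the max_epochs safeguard, and the toy usage on D1-D4 are my additions.

```python
import numpy as np

def perceptron(X, y, eta=0.1, max_epochs=100):
    """Rosenblatt perceptron as on the slide: returns (w, b) after a pass with no mistakes."""
    n, N = X.shape
    w = np.zeros(N)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))        # R = max_i ||x_i||
    for _ in range(max_epochs):                  # "repeat ... until no mistakes"
        mistakes = 0
        for i in range(n):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:   # misclassified (or on the boundary)
                w = w + eta * y[i] * X[i]
                b = b + eta * y[i] * R**2
                mistakes += 1
        if mistakes == 0:
            return w, b
    return w, b   # safeguard if the data is not separated within max_epochs

# Toy usage on the Physics-vs-Recipes documents D1-D4 from the table above
X = np.array([[1, 2, 0, 0, 2, 0, 2],
              [0, 0, 0, 3, 0, 1, 1],
              [0, 2, 1, 0, 0, 0, 3],
              [0, 0, 1, 1, 1, 1, 1]])
y = np.array([+1, -1, +1, -1])
w, b = perceptron(X, y)
print(w, b)
```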
  3. Analysis of Perceptron

Definition (Margin of an Example): The margin of an example (x_i, y_i) with respect to the hyperplane (w, b) is
  δ_i = y_i (w · x_i + b)

Definition (Margin of a Training Set): The margin of a training set S = (x_1, y_1), ..., (x_n, y_n) with respect to the hyperplane (w, b) is
  δ = min_i y_i (w · x_i + b)

Theorem (Novikoff): If for a training set S there exists a weight vector with margin δ, then the perceptron makes at most
  4 R² / δ²
mistakes before returning a separating hyperplane (with R = max_i ||x_i||).

Dual Perceptron
• For each example (x_i, y_i), count with α_i the number of times the perceptron algorithm makes a mistake on it. Then
  w = Σ_{i=1..n} α_i y_i x_i
• α ← 0; b ← 0; R = max_i ||x_i||
• repeat
  • for i = 1 to n
    • if y_i (Σ_{j=1..n} α_j y_j (x_j · x_i) + b) ≤ 0
      • α_i ← α_i + 1
      • b ← b + y_i R²
    • endif
  • endfor
• until no mistakes were made in the for loop
• return (α, b)

Experiment: Perceptron for Text Classification
[Figure: percent training and testing error versus the number of iterations for the Perceptron with η = 0.1, compared against the hard-margin SVM test error.]
Train on 1000 pos / 1000 neg examples for "acq" (Reuters-21578).
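A matching Python sketch of the dual Perceptron above: the update only ever touches inner products x_j · x_i, so a precomputed Gram matrix suffices. The function name, the max_epochs guard, and the final reconstruction of w = Σ_i α_i y_i x_i are my additions.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=100):
    """Dual perceptron: learn mistake counts alpha and threshold b; w = sum_i alpha_i y_i x_i."""
    n = X.shape[0]
    alpha = np.zeros(n)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))
    K = X @ X.T                                  # Gram matrix of inner products x_j · x_i
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (np.sum(alpha * y * K[:, i]) + b) <= 0:
                alpha[i] += 1                    # one more mistake on example i
                b += y[i] * R**2
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

# Toy usage on D1-D4; afterwards recover the primal weight vector w = sum_i alpha_i y_i x_i.
X = np.array([[1, 2, 0, 0, 2, 0, 2],
              [0, 0, 0, 3, 0, 1, 1],
              [0, 2, 1, 0, 0, 0, 3],
              [0, 0, 1, 1, 1, 1, 1]])
y = np.array([+1, -1, +1, -1])
alpha, b = dual_perceptron(X, y)
w = (alpha * y) @ X
print(alpha, b, w)
```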
