
SVM-flexible discriminant analysis. Huimin Peng, November 20, 2014.



  1. SVM-flexible discriminant analysis. Huimin Peng, November 20, 2014.

  2. Outline: SVM; nonlinear SVM; SVM as a penalization method; discriminant analysis; FDA: flexible discriminant analysis; penalized discriminant analysis; mixture discriminant analysis.

  3. SVM (Reference: Elements of Statistical Learning, chapter 12). Define a hyperplane that separates the observations: $\{x : f(x) = x^T\beta + \beta_0 = 0\}$. The optimization problem is
  $$\min_{\beta,\beta_0}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } \xi_i \ge 0,\ y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i \ \forall i,$$
  where $\xi_i$ measures the amount by which observation $i$ falls on the wrong side of its margin (in the equivalent constrained form, $\sum_i \xi_i$ is bounded by a constant). $C$ is the tuning parameter: a large $C$ discourages positive $\xi_i$ and leads to a wiggly boundary, while a small $C$ leads to a smoother boundary.
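
  As a concrete illustration (not part of the slides), the minimal sketch below fits linear SVMs with a large and a small cost value using the e1071 package, whose cost argument plays the role of $C$; the package choice, the two-class iris subset, and the cost values are assumptions for illustration only.

  # Minimal sketch, assuming the e1071 package is installed; 'cost' plays the role of C above.
  # Larger cost penalizes slack more (wigglier boundary); smaller cost gives a smoother boundary.
  library(e1071)
  data(iris)
  iris2 <- droplevels(iris[iris$Species != "setosa", ])   # two-class subset for a binary SVM
  fit_large_C <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 100)
  fit_small_C <- svm(Species ~ ., data = iris2, kernel = "linear", cost = 0.01)
  table(predict(fit_large_C, iris2), iris2$Species)        # training-set confusion tables
  table(predict(fit_small_C, iris2), iris2$Species)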

  4. The solution is $\hat\beta = \sum_{i=1}^N \hat\alpha_i y_i x_i$. The decision function is $\hat G(x) = \mathrm{sign}[\hat f(x)] = \mathrm{sign}[x^T\hat\beta + \hat\beta_0]$.

  5. Nonlinear SVM. Input features (transformed feature vectors): $h(x_i) = (h_1(x_i), h_2(x_i), \cdots, h_M(x_i))$. Similarly, the classifier is $\hat G(x) = \mathrm{sign}(\hat f(x)) = \mathrm{sign}[h(x)^T\hat\beta + \hat\beta_0]$. The solution function is $\hat f(x) = \sum_{i=1}^N \hat\alpha_i y_i K(x, x_i) + \hat\beta_0$.
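
  For the kernelized form, a hedged sketch using e1071 again, this time with a radial-basis kernel so that predictions take the form $\sum_i \hat\alpha_i y_i K(x, x_i) + \hat\beta_0$; the kernel choice and the gamma and cost values are illustrative assumptions, not from the slides.

  # Minimal sketch: nonlinear SVM with an RBF kernel via e1071 (assumed installed).
  library(e1071)
  data(iris)
  fit_rbf <- svm(Species ~ ., data = iris, kernel = "radial", gamma = 0.5, cost = 1)
  table(predict(fit_rbf, iris), iris$Species)   # training-set confusion table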

  6. SVM = penalization method. If $f(x) = h(x)^T\beta + \beta_0$, then
  $$\min_{\beta_0,\beta}\ \sum_{i=1}^N [1 - y_i f(x_i)]_+ + \tfrac{\lambda}{2}\|\beta\|^2. \qquad (1)$$
  This is a loss + penalty criterion. It gives the same solution as
  $$\min_{\beta,\beta_0}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^N \xi_i \quad \text{subject to } \xi_i \ge 0,\ y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i \ \forall i.$$
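
  To make the loss + penalty form concrete, here is a small sketch that evaluates criterion (1) directly for a given coefficient vector; hinge() and svm_objective() are illustrative helpers written for this note, not functions from any package, and the data are made up.

  # Minimal sketch of criterion (1): hinge loss plus ridge penalty on beta.
  hinge <- function(u) pmax(0, u)
  svm_objective <- function(beta0, beta, X, y, lambda) {
    f <- drop(X %*% beta + beta0)                    # f(x_i) = x_i' beta + beta_0
    sum(hinge(1 - y * f)) + (lambda / 2) * sum(beta^2)
  }
  # Example call on simulated data; y is coded as +/- 1.
  set.seed(1)
  X <- matrix(rnorm(40), 20, 2)
  y <- ifelse(X[, 1] + X[, 2] > 0, 1, -1)
  svm_objective(beta0 = 0, beta = c(1, 1), X = X, y = y, lambda = 0.1)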

  7. Discriminant analysis. LDA: linear discriminant analysis. QDA: quadratic discriminant analysis. FDA: flexible discriminant analysis. PDA: penalized discriminant analysis. MDA: mixture discriminant analysis. R package: mda. Classes: $\mathcal{G} = \{1, 2, \cdots, K\}$, with $K$ classes. Score function: $\theta : \mathcal{G} \mapsto \mathbb{R}^1$.

  8. Assign scores to the classes. Training data: $(g_i, x_i)$, $i = 1, 2, \cdots, N$. Optimization:
  $$\min_{\beta,\theta}\ \sum_{i=1}^N \big(\theta(g_i) - x_i^T\beta\big)^2.$$
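
  For fixed scores, the minimization over $\beta$ is ordinary least squares of the scored responses on $x$. The sketch below illustrates only that inner step on the iris data; the score values assigned to the three classes are arbitrary choices for illustration (in practice $\theta$ is estimated as well).

  # Minimal sketch of the inner least-squares step, with illustrative (not estimated) scores.
  data(iris)
  theta <- c(setosa = -1, versicolor = 0, virginica = 1)   # arbitrary class scores
  scored <- theta[as.character(iris$Species)]              # theta(g_i) for each observation
  beta_fit <- lm(scored ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, data = iris)
  coef(beta_fit)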

  9. FDA: flexible discriminant analysis. More generally, build $L \le K - 1$ sets of independent scorings for the class labels, $\theta_1, \theta_2, \cdots, \theta_L$, with $L$ corresponding linear maps $\eta_l(X) = X^T\beta_l$, $l = 1, 2, \cdots, L$. We can generalize $\eta_l(x) = x^T\beta_l$ to more flexible, nonparametric fits and add a regularizer $J$ appropriate for some forms of nonparametric regression:
  $$\mathrm{ASR}\big(\{\theta_l, \eta_l\}_{l=1}^L\big) = \frac{1}{N}\sum_{l=1}^L\left[\sum_{i=1}^N \big(\theta_l(g_i) - \eta_l(x_i)\big)^2 + \lambda J(\eta_l)\right].$$
  In R:
  fda(formula, data, weights, theta, dimension, eps, method, keep.fitted, ...)


  11. FDA on the iris data (R code):
  data(iris)
  irisfit <- fda(Species ~ ., data = iris)
  confusion(irisfit, iris)
  confusion(predict(irisfit, iris), iris$Species)
  plot(irisfit)
  coef(irisfit)
  posteriors <- predict(irisfit, type = "post")
  confusion(softmax(posteriors), iris[, "Species"])
  marsfit <- fda(Species ~ ., data = iris, method = mars)
  marsfit2 <- update(marsfit, degree = 2)                    # include interactions up to 2nd degree
  marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2])  # start from the fitted coefficients in marsfit

  12. Coefficients:
  > coef(irisfit)
                      [,1]        [,2]
  Intercept     -2.1264786 -6.72910343
  Sepal.Length  -0.8377979  0.02434685
  Sepal.Width   -1.5500519  2.18649663
  Petal.Length   2.2235596 -0.94138258
  Petal.Width    2.8389936  2.86801283

  13. Figure 1: plot(irisfit)

  14. Figure 2: plot(marsfit)

  15. Figure 3: plot(marsfit1)

  16. Figure 4: plot(marsfit2)

  17. Penalized discriminant analysis. Quadratic penalty on the coefficients:
  $$\mathrm{ASR}\big(\{\theta_l, \beta_l\}_{l=1}^L\big) = \frac{1}{N}\sum_{l=1}^L\left[\sum_{i=1}^N \big(\theta_l(g_i) - h^T(x_i)\beta_l\big)^2 + \lambda\beta_l^T\Omega\beta_l\right].$$
  The choice of $\Omega$ depends on the problem setting; here $\eta_l(x) = h(x)^T\beta_l$. In R:
  gen.ridge(x, y, weights, lambda=1, omega, df, ...)
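
  A hedged usage sketch: in the mda package, a penalized fit can be obtained by passing gen.ridge as the regression method to fda(), with extra arguments forwarded to it; the lambda value below is an arbitrary choice for illustration, not from the slides.

  # Minimal sketch, assuming the mda package; lambda is the ridge penalty in the ASR criterion above.
  library(mda)
  data(iris)
  pdafit <- fda(Species ~ ., data = iris, method = gen.ridge, lambda = 10)
  confusion(pdafit, iris)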

  18. Mixture discriminant analysis. A Gaussian mixture model for the kth class has density
  $$P(X \mid G = k) = \sum_{r=1}^{R_k} \pi_{kr}\,\phi(X; \mu_{kr}, \Sigma), \qquad (2)$$
  where $\sum_{r=1}^{R_k}\pi_{kr} = 1$ and $R_k$ is the number of Gaussian subclasses (prototypes) used for class $k$. Incorporating the class prior probabilities $\Pi_k$:
  $$P(G = k \mid X = x) = \frac{\sum_{r=1}^{R_k}\pi_{kr}\,\phi(X; \mu_{kr}, \Sigma)\,\Pi_k}{\sum_{l=1}^{K}\sum_{r=1}^{R_l}\pi_{lr}\,\phi(X; \mu_{lr}, \Sigma)\,\Pi_l}.$$
  In R:
  mda(formula, data, subclasses, sub.df, tot.df, dimension, eps, iter, weights, method, keep.fitted, trace, ...)
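
  A hedged sketch of mda() with a different number of subclasses $R_k$ for each class and extraction of the class posteriors $P(G = k \mid X = x)$; the per-class counts and the use of the "posterior" prediction type are illustrative assumptions, not taken from the slides.

  # Minimal sketch, assuming the mda package; 'subclasses' gives R_k for each class.
  library(mda)
  data(iris)
  mixfit <- mda(Species ~ ., data = iris, subclasses = c(2, 3, 3))
  post <- predict(mixfit, iris, type = "posterior")   # estimates of P(G = k | X = x)
  head(round(post, 3))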


  20. MDA on the iris data (R code):
  data(iris)
  irisfit <- mda(Species ~ ., data = iris)
  mfit <- mda(Species ~ ., data = iris, subclasses = 2)
  coef(mfit)

  > coef(mfit)
                      [,1]        [,2]      [,3]      [,4]
  Intercept      6.8563935 -15.1565801 -1.454555 -2.535648
  Sepal.Length   0.5545477   1.3506122  1.016966  2.945456
  Sepal.Width    1.5867703   2.4658435 -1.345301 -2.562105
  Petal.Length  -3.2435199   0.3621319  1.341652 -2.921295
  Petal.Width   -2.3003933  -1.3635028 -4.516518  3.448416

  21. Figure 5: plot(mfit)

  22. Questions?
