SUPPORT VECTOR MACHINES FOR BANKRUPTCY ANALYSIS
Wolfgang HÄRDLE (2), Rouslan MORO (1, 2), Dorothea SCHÄFER (1)
(1) Deutsches Institut für Wirtschaftsforschung (DIW)
(2) Center for Applied Statistics and Economics (CASE), Humboldt-Universität zu Berlin
Corporate Bankruptcy Prediction with SVMs
Motivation

Linear Discriminant Analysis
Fisher (1936); company scoring: Beaver (1966), Altman (1968)
Z-score: $Z_i = a_1 x_{i1} + a_2 x_{i2} + \dots + a_d x_{id} = a^\top x_i$, where $x_i = (x_{i1}, \dots, x_{id})^\top \in \mathbb{R}^d$ are the financial ratios for the $i$-th company.
The classification rule:
successful company: $Z_i \ge z$
failure: $Z_i < z$
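The Z-score rule can be sketched in a few lines of Python. This is only an illustration: the weights `a`, the ratios `x` and the threshold `z` below are made-up numbers, not estimates from any of the cited studies.

```python
def z_score(a, x):
    """Linear discriminant score Z = a^T x for one company."""
    if len(a) != len(x):
        raise ValueError("a and x must have the same length")
    return sum(a_j * x_j for a_j, x_j in zip(a, x))


def classify(a, x, z):
    """Apply the rule: successful if Z >= z, failure otherwise."""
    return "successful" if z_score(a, x) >= z else "failure"


# Hypothetical weights, financial ratios and threshold (illustration only).
a = [1.0, 2.0]
x = [0.5, 0.25]
z = 0.8
print(z_score(a, x), classify(a, x, z))
```

In practice the weights and the threshold would be estimated from data; here they are fixed by hand to show the mechanics of the rule.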
Linear Discriminant Analysis
[Figure: scatter plot in the ($X_1$, $X_2$) plane of surviving companies (o) and failing companies (x); one point is marked "?".]
Linear Discriminant Analysis
[Figure: distribution densities of the Z-score for failing and surviving companies; axis labels: "Score Z" and "Distribution density".]
Company Data: Probability of Default
Source: Falkenstein et al. (2000)
RiskCalc Private Model
Moody's default model for private firms
A semi-parametric model based on probit regression:
$$E[y_i \mid x_i] = \Phi\Big\{ a_0 + \sum_{j=1}^{d} a_j f_j(x_{ij}) \Big\}$$
The $f_j$ are estimated non-parametrically on univariate models.
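To make the structure of the probit formula concrete, here is a minimal Python sketch. It is not Moody's implementation: the coefficients and the transforms `f` are hypothetical placeholders standing in for the non-parametrically estimated $f_j$.

```python
import math


def std_normal_cdf(t):
    """Standard normal distribution function Phi(t)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def default_probability(a0, a, f, x):
    """Phi{a0 + sum_j a_j * f_j(x_j)} for one company.

    a: coefficients a_1..a_d, f: univariate transforms f_1..f_d,
    x: financial ratios x_1..x_d.
    """
    index = a0 + sum(a_j * f_j(x_j) for a_j, f_j, x_j in zip(a, f, x))
    return std_normal_cdf(index)


# Hypothetical coefficients and transforms (assumptions for illustration).
a0 = -1.5
a = [0.8, -0.4]
f = [math.tanh, lambda v: min(v, 2.0)]
x = [0.3, 1.2]
print(default_probability(a0, a, f, x))
```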
Linearly Non-separable Classification Problem
[Figure: scatter plot in the ($X_1$, $X_2$) plane of surviving companies (o) and failing companies (x).]
Outline of the Talk
1. Motivation
2. Support Vector Machines and their Properties
3. Expected Risk vs. Empirical Risk Minimization
4. Realization of an SVM
5. Non-linear Case
6. Company Classification and Rating with SVMs
Support Vector Machines and Their Properties

Support Vector Machines (SVMs)
SVMs are a group of methods for classification (and regression) that use classifiers providing a "high margin".
⊡ SVMs possess a flexible structure which is not chosen a priori
⊡ The properties of SVMs can be derived from statistical learning theory
⊡ SVMs do not rely on asymptotic properties; they are especially useful when $d/n$ is large, i.e. in most practically significant cases
⊡ SVMs give a unique solution and outperform Neural Networks
Classification Problem
Training set: $\{(x_i, y_i)\}_{i=1}^{n}$ with the distribution $P(x_i, y_i)$.
Find the class $y$ of a new object $x$ using the classifier $f: \mathbb{R}^d \to \{+1; -1\}$, such that the expected risk $R(f)$ is minimal.
$x_i \in \mathbb{R}^d$ is the vector of characteristics of the $i$-th object; $y_i \in \{-1; +1\}$ or $\{0; 1\}$ is the class of the $i$-th object.
Regression Problem
Setup as for the classification problem, but $y \in \mathbb{R}$.
Expected Risk vs. Empirical Risk Minimization

Expected Risk Minimization
The expected risk
$$R(f) = \int \frac{1}{2} |f(x) - y| \, dP(x, y) = E_{P(x,y)}[L(x, y)]$$
is minimized with respect to $f$:
$$f_{opt} = \arg\min_{f \in \mathcal{F}} R(f)$$
The loss is
$$L(x, y) = \frac{1}{2} |f(x) - y| = \begin{cases} 0, & \text{if classification is correct,} \\ 1, & \text{if classification is wrong.} \end{cases}$$
$\mathcal{F}$ is an a priori defined set of (non)linear classifier functions.
Empirical Risk Minimization
In practice $P(x, y)$ is usually unknown: use Empirical Risk Minimization (ERM) over the training set $\{(x_i, y_i)\}_{i=1}^{n}$:
$$\hat{R}(f) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} |f(x_i) - y_i|$$
$$\hat{f}_n = \arg\min_{f \in \mathcal{F}} \hat{R}(f)$$
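The empirical risk is simply the share of misclassified training objects when labels are coded as $\pm 1$. A small sketch, with a toy classifier and toy data that are assumptions chosen only for illustration:

```python
def empirical_risk(f, xs, ys):
    """Empirical risk (1/n) * sum_i 0.5 * |f(x_i) - y_i| for labels in {-1, +1}."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(0.5 * abs(f(x) - y) for x, y in zip(xs, ys)) / len(xs)


def toy_classifier(x):
    """Hypothetical linear rule: +1 if the first ratio is at least the second."""
    return 1 if x[0] >= x[1] else -1


# Toy training set (made-up numbers).
xs = [(2, 1), (0, 1), (3, 0), (1, 2)]
ys = [1, -1, -1, -1]
print(empirical_risk(toy_classifier, xs, ys))  # one of four points is misclassified: 0.25
```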
Empirical Risk vs. Expected Risk
[Figure: the expected risk $R(f)$ and the empirical risk $\hat{R}(f)$ plotted over the function class, with $f_{opt}$ and $\hat{f}_n$ marked; axis labels: "Risk" and "Function class".]
Convergence
From the law of large numbers
$$\lim_{n \to \infty} \hat{R}(f) = R(f)$$
In addition, ERM satisfies
$$\lim_{n \to \infty} \min_{f \in \mathcal{F}} \hat{R}(f) = \min_{f \in \mathcal{F}} R(f)$$
if "$\mathcal{F}$ is not too big".
Vapnik-Chervonenkis (VC) Bound
Basic result of Statistical Learning Theory (for linear classifiers):
$$R(f) \le \hat{R}(f) + \phi\left( \frac{h}{n}, \frac{\ln(\eta)}{n} \right)$$
where the bound holds with probability $1 - \eta$ and
$$\phi\left( \frac{h}{n}, \frac{\ln(\eta)}{n} \right) = \sqrt{ \frac{ h \left( \ln \frac{2n}{h} + 1 \right) - \ln\left( \frac{\eta}{4} \right) }{ n } }$$
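A short numerical sketch shows how the confidence term behaves. It assumes the standard form of the term, $\sqrt{\{h(\ln(2n/h) + 1) - \ln(\eta/4)\}/n}$; the values of `h`, `n`, `eta` and the empirical risk are arbitrary illustrations.

```python
import math


def vc_confidence(h, n, eta):
    """Confidence term phi of the VC bound for VC dimension h and sample size n."""
    if not (0 < eta < 1) or h <= 0 or n <= 0:
        raise ValueError("need h > 0, n > 0 and 0 < eta < 1")
    return math.sqrt((h * (math.log(2.0 * n / h) + 1.0) - math.log(eta / 4.0)) / n)


def vc_bound(empirical_risk, h, n, eta):
    """Upper bound on the expected risk that holds with probability 1 - eta."""
    return empirical_risk + vc_confidence(h, n, eta)


# The term shrinks as n grows and increases with the VC dimension h.
for n in (100, 1000, 10000):
    print(n, round(vc_confidence(3, n, 0.05), 4))
print(vc_bound(0.10, 3, 1000, 0.05))
```

The bound makes the trade-off explicit: a richer function class (larger $h$) may lower the empirical risk but raises the confidence term.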
Structural Risk Minimization
Structural Risk Minimization: search for the model structure $S_h$,
$$S_{h_1} \subseteq S_{h_2} \subseteq \dots \subseteq S_h \subseteq \dots \subseteq S_{h_k} \subseteq \mathcal{F},$$
such that $f \in S_h$ minimizes the upper bound on the expected risk.
$h$ is the VC dimension.
$S_h$ is a set of classifier functions with the same complexity described by $h$, e.g. $P(1) \subseteq P(2) \subseteq P(3) \subseteq \dots \subseteq \mathcal{F}$, where $P(i)$ are polynomials of degree $i$.
The functional class $\mathcal{F}$ is given a priori.
Vapnik-Chervonenkis (VC) Dimension
Definition. $h$ is the VC dimension of a set of functions if there exists a set of points $\{x_i\}_{i=1}^{h}$ that can be separated in all $2^h$ possible configurations, and no set $\{x_i\}_{i=1}^{q}$ with $q > h$ satisfies this property.
Example 1. The functions $f = A \sin \theta x$ have an infinite VC dimension.
Example 2. Three points on a plane can be shattered by a set of linear indicator functions in $2^h = 2^3 = 8$ ways (whereas 4 points cannot be shattered in $2^q = 2^4 = 16$ ways). The VC dimension equals $h = 3$.
Example 3. The VC dimension of $f = \{\text{Hyperplane} \in \mathbb{R}^d\}$ is $h = d + 1$.
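Example 2 can be checked numerically for specific point sets. The sketch below uses a perceptron as a heuristic separability test: it is guaranteed to stop on a linearly separable labeling, while hitting the epoch cap is only taken as evidence of non-separability. The point sets and the cap `max_epochs` are assumptions, and the check covers one four-point configuration rather than all of them.

```python
from itertools import product


def linearly_separable(points, labels, max_epochs=1000):
    """Perceptron test: True if some w, b gives y * (w.x + b) > 0 for all points."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(w_k * x_k for w_k, x_k in zip(w, x)) + b) <= 0:
                # Standard perceptron update on a misclassified point.
                w = [w_k + y * x_k for w_k, x_k in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no separating line found within the epoch cap


def shattered(points):
    """True if every one of the 2^m labelings of the m points is separable."""
    return all(linearly_separable(points, labels)
               for labels in product((-1, 1), repeat=len(points)))


three_points = [(0, 0), (1, 0), (0, 1)]
four_points = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(shattered(three_points))  # all 8 labelings are separable
print(shattered(four_points))   # the XOR-type labelings are not
```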
VC Dimension ($d = 2$, $h = 3$)
Realization of the SVM

Linearly Separable Case
The training set: $\{(x_i, y_i)\}_{i=1}^{n}$, $y_i \in \{+1; -1\}$, $x_i \in \mathbb{R}^d$.
Find the classifier with the highest "margin": the gap between parallel hyperplanes separating the two classes, in which no vectors of either class can lie.
Margin maximization minimizes the VC dimension.
[Figure: scatter plot of two classes of points (o and x).]
Linear SVMs. Separable Case
The margin is $d_+ + d_- = 2 / \|w\|$. To maximize it, minimize the Euclidean norm $\|w\|$ subject to the constraint (1).
[Figure: two classes of points (o and x) in the ($x_1$, $x_2$) plane with the separating hyperplane $x^\top w + b = 0$, the hyperplanes $x^\top w + b = 1$ and $x^\top w + b = -1$, the vector $w$, the offset $-b / \|w\|$, the distances $d_+$ and $d_-$, and the margin.]
Let $x^\top w + b = 0$ be a separating hyperplane. Then $d_+$ ($d_-$) is the shortest distance to the closest objects from the class $+1$ ($-1$).
$$x_i^\top w + b \ge +1 \quad \text{for } y_i = +1$$
$$x_i^\top w + b \le -1 \quad \text{for } y_i = -1$$
Combine them into one constraint:
$$y_i (x_i^\top w + b) - 1 \ge 0, \quad i = 1, 2, \dots, n \qquad (1)$$
The canonical hyperplanes $x_i^\top w + b = \pm 1$ are parallel, and the distance between each of them and the separating hyperplane is $d_+ = d_- = 1 / \|w\|$.
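Constraint (1) and the margin $2 / \|w\|$ are easy to check for a given hyperplane. The following sketch uses a hand-picked `w`, `b` and a toy data set, all of which are assumptions for illustration.

```python
import math


def satisfies_constraint(w, b, xs, ys):
    """Check y_i * (x_i^T w + b) - 1 >= 0 for every training point."""
    return all(
        y * (sum(x_k * w_k for x_k, w_k in zip(x, w)) + b) - 1 >= 0
        for x, y in zip(xs, ys)
    )


def margin(w):
    """Width of the gap between the canonical hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(w_k * w_k for w_k in w))


# Toy separable data (made-up numbers).
xs = [(2, 2), (3, 1), (0, 0), (-1, 0)]
ys = [1, 1, -1, -1]

print(satisfies_constraint((0.5, 0.5), -1.0, xs, ys), margin((0.5, 0.5)))
print(satisfies_constraint((1.0, 1.0), -1.0, xs, ys), margin((1.0, 1.0)))
```

Both hyperplanes are feasible, but the one with the smaller norm $\|w\|$ has the wider margin, which is why the norm is minimized.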
The Lagrangian Formulation
The Lagrangian for the primal problem:
$$L_P = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{n} \alpha_i \{ y_i (x_i^\top w + b) - 1 \}$$
The Karush-Kuhn-Tucker (KKT) Conditions:
$$\frac{\partial L_P}{\partial w_k} = 0 \Leftrightarrow w_k - \sum_{i=1}^{n} \alpha_i y_i x_{ik} = 0, \quad k = 1, \dots, d$$
$$\frac{\partial L_P}{\partial b} = 0 \Leftrightarrow \sum_{i=1}^{n} \alpha_i y_i = 0$$
$$y_i (x_i^\top w + b) - 1 \ge 0, \quad i = 1, \dots, n$$
$$\alpha_i \ge 0$$
$$\alpha_i \{ y_i (x_i^\top w + b) - 1 \} = 0$$
Substitute the KKT conditions into $L_P$ and obtain the Lagrangian for the dual problem:
$$L_D = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^\top x_j$$
The primal and dual problems are
$$\min_{w_k, b} \max_{\alpha_i} L_P \qquad \text{and} \qquad \max_{\alpha_i} L_D \quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0$$
Since the optimization problem is convex, the dual and primal formulations give the same solution.
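For a training set with just two points, one per class, the dual can be solved by hand, which gives a useful sanity check. The constraint $\sum_i \alpha_i y_i = 0$ forces $\alpha_1 = \alpha_2 = \alpha$, so $L_D = 2\alpha - \frac{1}{2} \alpha^2 \|x_+ - x_-\|^2$, which is maximal at $\alpha = 2 / \|x_+ - x_-\|^2$. The sketch below then recovers $w = \sum_i \alpha_i y_i x_i$ from the stationarity condition and $b$ from the complementary slackness condition; the function name and the sample points are assumptions.

```python
import math


def two_point_svm(x_pos, x_neg):
    """Closed-form dual solution for one point of class +1 and one of class -1."""
    diff = [p - q for p, q in zip(x_pos, x_neg)]
    sq_dist = sum(d * d for d in diff)
    if sq_dist == 0:
        raise ValueError("the two points must differ")
    alpha = 2.0 / sq_dist                  # maximizer of L_D
    w = [alpha * d for d in diff]          # w = alpha * (x_pos - x_neg)
    b = 1.0 - sum(p * w_k for p, w_k in zip(x_pos, w))  # from y_+ (x_+^T w + b) = 1
    return alpha, w, b


alpha, w, b = two_point_svm((2.0, 2.0), (0.0, 0.0))
margin = 2.0 / math.sqrt(sum(w_k * w_k for w_k in w))
print(alpha, w, b, margin)
```

In this special case the margin equals the distance between the two points, so both points are support vectors and lie on the canonical hyperplanes.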