Support Vector Machines for Bankruptcy Analysis


1. SUPPORT VECTOR MACHINES FOR BANKRUPTCY ANALYSIS. Wolfgang HÄRDLE², Rouslan MORO¹﹐², Dorothea SCHÄFER¹. ¹Deutsches Institut für Wirtschaftsforschung (DIW), ²Center for Applied Statistics and Economics (CASE), Humboldt-Universität zu Berlin. Corporate Bankruptcy Prediction with SVMs.

2. Motivation: Linear Discriminant Analysis. Fisher (1936); company scoring: Beaver (1966), Altman (1968). Z-score: Z_i = a_1 x_{i1} + a_2 x_{i2} + ... + a_d x_{id} = a⊤x_i, where x_i = (x_{i1}, ..., x_{id})⊤ ∈ R^d are the financial ratios of the i-th company. The classification rule: successful company if Z_i ≥ z, failure if Z_i < z.
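As a concrete illustration of the Z-score rule above, here is a minimal sketch in Python; the ratios, weights a, and threshold z are hypothetical placeholders, not the estimates from the literature cited on the slide.

```python
# A minimal sketch of Z-score classification; the coefficients a and
# the cut-off z below are illustrative, not estimated values.
import numpy as np

a = np.array([0.012, 0.014, 0.033, 0.006, 0.999])  # hypothetical weights
z = 1.8                                            # hypothetical cut-off

def classify(X, a, z):
    """Return +1 (successful) where Z_i = a'x_i >= z, else -1 (failure)."""
    Z = X @ a
    return np.where(Z >= z, 1, -1)

# two companies described by d = 5 financial ratios
X = np.array([[30.0, 25.0, 10.0, 120.0, 1.5],
              [ 5.0, -3.0,  1.0,  40.0, 0.7]])
print(classify(X, a, z))   # [ 1 -1 ]
```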

3. Motivation: Linear Discriminant Analysis. [Figure: scatter plot of surviving companies (o) and failing companies (x) in the (X_1, X_2) plane, with a new observation "?" to be classified.]

4. Motivation: Linear Discriminant Analysis. [Figure: distribution densities of the Z-score for failing and surviving companies.]

5. Motivation: Company Data. [Figure: probability of default. Source: Falkenstein et al. (2000).]

6. Motivation: RiskCalc Private Model. Moody's default model for private firms. A semi-parametric model based on the probit regression E[y_i | x_i] = Φ{a_0 + Σ_{j=1}^d a_j f_j(x_{ij})}, where the f_j are estimated non-parametrically from univariate models.
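A minimal sketch of this two-step idea on simulated data, assuming a crude quantile-binned estimator for the univariate transforms f_j (the slide does not specify Moody's actual estimation procedure):

```python
# Sketch of a RiskCalc-style semi-parametric probit: estimate each
# univariate transform f_j non-parametrically (here: binned default
# frequency mapped through the probit link), then fit a probit model on
# the transformed ratios. Data and binning choices are illustrative.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 2000, 3
X = rng.normal(size=(n, d))                      # simulated financial ratios
p_true = norm.cdf(-1.0 + 0.8 * X[:, 0] - 0.5 * X[:, 1] ** 2)
y = rng.binomial(1, p_true)                      # simulated defaults

def f_hat(xj, y, n_bins=10):
    """Univariate transform: probit of the default rate per quantile bin."""
    edges = np.quantile(xj, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, xj, side="right") - 1, 0, n_bins - 1)
    rates = np.array([y[idx == b].mean() for b in range(n_bins)])
    rates = np.clip(rates, 1e-3, 1 - 1e-3)       # keep the probit finite
    return norm.ppf(rates)[idx]

F = np.column_stack([f_hat(X[:, j], y) for j in range(d)])
model = sm.Probit(y, sm.add_constant(F)).fit(disp=0)
print(model.params)                              # a_0, a_1, ..., a_d
```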

7. Motivation: Linearly Non-separable Classification Problem. [Figure: scatter plot of surviving (o) and failing (x) companies in the (X_1, X_2) plane; the two groups overlap, so no single hyperplane separates them.]

8. Outline of the Talk. 1. Motivation (✓) 2. Support Vector Machines and their Properties 3. Expected Risk vs. Empirical Risk Minimization 4. Realization of an SVM 5. Non-linear Case 6. Company Classification and Rating with SVMs.

9. Support Vector Machines and Their Properties. SVMs are a group of methods for classification (and regression) that make use of high-margin classifiers. ⊡ SVMs possess a flexible structure which is not chosen a priori. ⊡ The properties of SVMs can be derived from statistical learning theory. ⊡ SVMs do not rely on asymptotic properties; they are especially useful when the ratio d/n of dimension to sample size is big, i.e. in most practically significant cases. ⊡ SVMs give a unique solution and outperform neural networks.

10. Support Vector Machines and Their Properties: Classification Problem. Training set: {(x_i, y_i)}_{i=1}^n with distribution P(x_i, y_i). Find the class y of a new object x using the classifier f: R^d → {+1; −1} such that the expected risk R(f) is minimal. Here x_i ∈ R^d is the vector of characteristics of the i-th object and y_i ∈ {−1; +1} (or {0; 1}) is its class. Regression problem: the same setup as the classification problem, but with y ∈ R.

11. Expected Risk vs. Empirical Risk Minimization: Expected Risk Minimization. The expected risk R(f) = ∫ (1/2)|f(x) − y| dP(x, y) = E_{P(x,y)}[L(x, y)] is minimized wrt f: f_opt = arg min_{f∈F} R(f), where the loss L(x, y) = (1/2)|f(x) − y| equals 0 if the classification is correct and 1 if it is wrong. F is an a priori defined set of (non)linear classifier functions.

12. Expected Risk vs. Empirical Risk Minimization: Empirical Risk Minimization. In practice P(x, y) is usually unknown: use Empirical Risk Minimization (ERM) over the training set {(x_i, y_i)}_{i=1}^n: R̂(f) = (1/n) Σ_{i=1}^n (1/2)|f(x_i) − y_i|, f̂_n = arg min_{f∈F} R̂(f).
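A minimal sketch of the empirical risk for labels in {−1, +1}; the linear classifier f and the toy data are arbitrary illustrative choices:

```python
# Empirical 0/1 risk R_hat(f) = (1/n) * sum of (1/2)|f(x_i) - y_i|
# over the training set, for labels in {-1, +1}.
import numpy as np

def empirical_risk(f, X, y):
    """Fraction of misclassified points: (1/2)|f(x)-y| is 0 or 1 here."""
    return np.mean(0.5 * np.abs(f(X) - y))

def f(X):
    return np.sign(X @ np.array([1.0, -1.0]))  # hypothetical classifier

X = np.array([[2.0, 1.0], [0.0, 1.0], [3.0, 0.5], [-1.0, 2.0]])
y = np.array([1, -1, 1, 1])                    # last point is misclassified
print(empirical_risk(f, X, y))                 # 0.25
```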

13. Expected Risk vs. Empirical Risk Minimization: Empirical Risk vs. Expected Risk. [Figure: empirical risk R̂(f) and expected risk R(f) plotted over the function class; the empirical minimizer f̂_n need not coincide with f_opt.]

14. Expected Risk vs. Empirical Risk Minimization: Convergence. From the law of large numbers, lim_{n→∞} R̂(f) = R(f). In addition, ERM satisfies lim_{n→∞} min_{f∈F} R̂(f) = min_{f∈F} R(f) if "F is not too big".
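A small simulation of the first statement, assuming a hypothetical data-generating process with a known expected risk of 0.1 for a fixed classifier f:

```python
# For a fixed classifier f, the empirical risk on n samples approaches
# the expected risk as n grows; the process below (labels flipped with
# probability 0.1, so R(f) = 0.1) is an illustrative choice.
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return np.sign(x)                  # fixed classifier on R

def sample(n):
    x = rng.normal(size=n)
    flip = rng.random(n) < 0.1
    y = np.where(flip, -np.sign(x), np.sign(x))
    return x, y

for n in [100, 10_000, 1_000_000]:
    x, y = sample(n)
    print(n, np.mean(0.5 * np.abs(f(x) - y)))  # tends to 0.1
```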

15. Expected Risk vs. Empirical Risk Minimization: Vapnik-Chervonenkis (VC) Bound. A basic result of statistical learning theory (for linear classifiers): R(f) ≤ R̂(f) + φ(h/n, ln(η)/n), where the bound holds with probability 1 − η and φ(h/n, ln(η)/n) = √{[h(ln(2n/h) + 1) − ln(η/4)] / n}.
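A minimal sketch that evaluates the confidence term φ for a few sample sizes; the values of h and η are example choices:

```python
# Evaluate the VC confidence term from the slide,
#   phi = sqrt( (h * (ln(2n/h) + 1) - ln(eta/4)) / n ),
# for example values of h and eta.
import math

def vc_phi(h, n, eta=0.05):
    """Width of the bound R(f) <= R_hat(f) + phi, valid w.p. 1 - eta."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

h = 11                      # e.g. a hyperplane in R^10: h = d + 1
for n in [100, 1_000, 10_000]:
    print(n, round(vc_phi(h, n), 3))  # the bound tightens as n grows
```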

16. Expected Risk vs. Empirical Risk Minimization: Structural Risk Minimization. Structural Risk Minimization searches over the nested model structures S_{h1} ⊆ S_{h2} ⊆ ... ⊆ S_h ⊆ ... ⊆ S_{hk} ⊆ F for the f ∈ S_h that minimizes the upper bound on the expected risk. h is the VC dimension. S_h is a set of classifier functions of the same complexity, described by h, e.g. P(1) ⊆ P(2) ⊆ P(3) ⊆ ... ⊆ F, where P(i) are polynomials of degree i. The functional class F is given a priori.
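A toy sketch of this selection principle on simulated 1-D data, assuming least-squares polynomial fitting followed by a sign as a crude stand-in for exact ERM, and taking h = p + 1 as the VC dimension of degree-p polynomial classifiers on R:

```python
# Structural risk minimization over nested polynomial classes
# P(1) ⊆ P(2) ⊆ ...: minimize empirical risk in each class, then pick
# the degree minimizing R_hat + phi. All modelling choices are toy ones.
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-2, 2, n)
y = np.sign(x * (x - 1) * (x + 1.5))          # true boundary: cubic roots
y[rng.random(n) < 0.05] *= -1                 # 5% label noise

def phi(h, n, eta=0.05):
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

for p in range(1, 7):
    coeffs = np.polyfit(x, y, p)              # crude ERM surrogate
    r_emp = np.mean(np.sign(np.polyval(coeffs, x)) != y)
    print(p, round(r_emp, 3), round(r_emp + phi(p + 1, n), 3))
```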

17. Expected Risk vs. Empirical Risk Minimization: Vapnik-Chervonenkis (VC) Dimension. Definition: h is the VC dimension of a set of functions if there exists a set of points {x_i}_{i=1}^h such that these points can be separated in all 2^h possible configurations, and no set {x_i}_{i=1}^q with q > h satisfies this property. Example 1: the functions f = A sin(θx) have an infinite VC dimension. Example 2: three points on a plane can be shattered by a set of linear indicator functions in 2^h = 2^3 = 8 ways, whereas 4 points cannot be shattered in all 2^q = 2^4 = 16 ways; the VC dimension equals h = 3. Example 3: the VC dimension of f = {hyperplanes in R^d} is h = d + 1.
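A small numerical check of Example 2, using a (nearly) hard-margin linear SVM as the linear separator; the two point configurations are illustrative (the XOR set is one 4-point configuration that cannot be shattered, and in fact no 4 points in the plane can be):

```python
# Three non-collinear points in the plane can be realized in all
# 2^3 = 8 labelings by a linear classifier, while the 4-point XOR
# configuration cannot be realized in all 2^4 labelings.
from itertools import product
import numpy as np
from sklearn.svm import SVC

def shattered(X):
    """True if every +/-1 labeling of X is linearly separable."""
    for labels in product([-1, 1], repeat=len(X)):
        y = np.asarray(labels)
        if len(set(labels)) == 1:
            continue  # constant labelings are trivially realizable
        clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
        if clf.score(X, y) < 1.0:                    # labeling not separable
            return False
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])
xor4 = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
print(shattered(three))  # True  -> h >= 3
print(shattered(xor4))   # False -> these 4 points are not shattered
```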

18. Expected Risk vs. Empirical Risk Minimization: VC Dimension (d = 2, h = 3). [Figure: shattering three points in the plane with linear classifiers.]

19. Realization of the SVM: Linearly Separable Case. The training set: {(x_i, y_i)}_{i=1}^n, y_i ∈ {+1; −1}, x_i ∈ R^d. Find the classifier with the highest "margin", the gap between parallel hyperplanes separating the two classes in which vectors of neither class can lie. Margin maximization minimizes the VC dimension. [Figure: two classes (o and x) separated by a hyperplane with maximal margin.]

20. Realization of the SVM: Linear SVMs, Separable Case. The margin is d_+ + d_− = 2/‖w‖. To maximize it, minimize the Euclidean norm ‖w‖ subject to constraint (1). [Figure: separating hyperplane x⊤w + b = 0 flanked by the canonical hyperplanes x⊤w + b = ±1 that bound the margin; the signed distance from the origin to the separating hyperplane is −b/‖w‖.]

21. Realization of the SVM. Let x⊤w + b = 0 be a separating hyperplane. Then d_+ (d_−) will be the shortest distance to the closest objects of the class +1 (−1). The training points satisfy x_i⊤w + b ≥ +1 for y_i = +1 and x_i⊤w + b ≤ −1 for y_i = −1; combine them into one constraint: y_i(x_i⊤w + b) − 1 ≥ 0, i = 1, 2, ..., n. (1) The canonical hyperplanes x⊤w + b = ±1 are parallel, and the distance between each of them and the separating hyperplane is d_+ = d_− = 1/‖w‖.
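A minimal sketch that fits a (nearly) hard-margin linear SVM on separable toy data and checks the geometry stated above, margin = 2/‖w‖ and y_i(x_i⊤w + b) ≥ 1:

```python
# Fit a linear SVM with a very large C (approximating the hard-margin
# problem) and verify the slide's geometry. The toy data are illustrative.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],    # class +1
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)          # large C ~ hard margin
w, b = clf.coef_.ravel(), clf.intercept_[0]

print("margin  =", 2 / np.linalg.norm(w))            # d_+ + d_- = 2/||w||
print("y(xw+b) =", y * (X @ w + b))                  # all >= 1 (approx.)
print("support vectors:", clf.support_)              # points on the margin
```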

22. Realization of the SVM: The Lagrangian Formulation. The Lagrangian for the primal problem: L_P = (1/2)‖w‖² − Σ_{i=1}^n α_i{y_i(x_i⊤w + b) − 1}. The Karush-Kuhn-Tucker (KKT) conditions: ∂L_P/∂w_k = 0 ⇔ w_k = Σ_{i=1}^n α_i y_i x_{ik}, k = 1, ..., d; ∂L_P/∂b = 0 ⇔ Σ_{i=1}^n α_i y_i = 0; y_i(x_i⊤w + b) − 1 ≥ 0, i = 1, ..., n; α_i ≥ 0; α_i{y_i(x_i⊤w + b) − 1} = 0.

23. Realization of the SVM. Substitute the KKT conditions into L_P to obtain the Lagrangian for the dual problem: L_D = Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j x_i⊤x_j. The primal and dual problems are min_{w_k, b} max_{α_i} L_P and max_{α_i} L_D s.t. α_i ≥ 0, Σ_{i=1}^n α_i y_i = 0. Since the optimization problem is convex, the dual and primal formulations give the same solution.
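A sketch that solves this hard-margin dual directly with a general-purpose solver (SLSQP) and recovers w and b via the KKT conditions; real SVM implementations use specialized QP solvers, and the toy data are illustrative:

```python
# Maximize L_D = sum(alpha) - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i'x_j
# subject to alpha_i >= 0 and sum_i alpha_i y_i = 0, then recover
# w = sum_i alpha_i y_i x_i and b from a support vector.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n = len(y)
Q = (y[:, None] * y[None, :]) * (X @ X.T)        # Q_ij = y_i y_j x_i'x_j

def neg_LD(a):
    return 0.5 * a @ Q @ a - a.sum()             # minimize -L_D

res = minimize(neg_LD, x0=np.zeros(n), method="SLSQP",
               bounds=[(0, None)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y})

alpha = res.x
w = (alpha * y) @ X                              # KKT: w = sum alpha_i y_i x_i
sv = np.argmax(alpha)                            # any support vector
b = y[sv] - X[sv] @ w                            # from y_sv(x_sv'w + b) = 1
print("w =", w, " b =", round(b, 3), " margin =", 2 / np.linalg.norm(w))
```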
