Logistic regression for probability of default
Credit Risk Modeling in Python
Michael Crabtree, Data Scientist, Ford Motor Company

Probability of default
The likelihood that someone will default on a loan is the probability of default
A probability value between 0 and 1, like 0.86
loan_status of 1 is a default, or 0 for non-default

Probability of Default   Interpretation             Predicted loan status
0.4                      Unlikely to default        0
0.90                     Very likely to default     1
0.1                      Very unlikely to default   0

Predicting probabilities
Probabilities of default as an outcome from machine learning
Learn from data in columns (features)
Classification models (default, non-default)
Two most common models: logistic regression and decision tree

Logistic regression
Similar to linear regression, but only produces values between 0 and 1
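
As a rough illustration of why the output stays between 0 and 1, the sketch below applies the logistic (sigmoid) function to a linear score. The function name `sigmoid` and the sample scores are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    """Map any real-valued linear score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear regression could output any of these raw scores...
scores = [-5.0, 0.0, 5.0]
# ...while the logistic function squeezes each one into the (0, 1) range.
probabilities = [sigmoid(z) for z in scores]
```

A score of 0 maps to exactly 0.5, and larger scores move the probability toward 1 without ever reaching it.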

Training a logistic regression
Logistic regression is available within the scikit-learn package:
    import numpy as np
    from sklearn.linear_model import LogisticRegression
The model is created by calling the class, with or without parameters:
    clf_logistic = LogisticRegression(solver='lbfgs')
Use the .fit() method to train:
    clf_logistic.fit(training_columns, np.ravel(training_labels))
Training columns: all of the columns in our data except loan_status
Labels: loan_status (0, 1)

Training and testing
The entire data set is usually split into two parts

Data Subset   Usage                                         Portion
Train         Learn from the data to generate predictions   60%
Test          Test learning on new unseen data              40%

Creating the training and test sets
Separate the data into training columns and labels:
    X = cr_loan.drop('loan_status', axis=1)
    y = cr_loan[['loan_status']]
Use the train_test_split() function from scikit-learn:
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)
test_size: proportion of the data held out for the test set (.4 = 40%)
random_state: a random seed value for reproducibility
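
The split and training steps can be tried end to end with a small sketch like the one below. The real cr_loan data is not included in the slides, so this uses a synthetic stand-in: the 200 generated rows, the two feature columns, and the rule that produces loan_status are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for cr_loan: 200 synthetic loans, not course data.
rng = np.random.default_rng(123)
cr_loan = pd.DataFrame({
    "loan_int_rate": rng.uniform(5, 20, size=200),
    "person_emp_length": rng.integers(0, 30, size=200),
})
# Assumed labeling rule: noisier, higher interest rates default more often.
noisy_rate = cr_loan["loan_int_rate"] + rng.normal(0, 3, size=200)
cr_loan["loan_status"] = (noisy_rate > 13).astype(int)

# Separate the training columns from the label column.
X = cr_loan.drop("loan_status", axis=1)
y = cr_loan[["loan_status"]]

# Hold out 40% of the rows for testing; the seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=123
)

# Train on the 60% training portion only.
clf_logistic = LogisticRegression(solver="lbfgs")
clf_logistic.fit(X_train, np.ravel(y_train))
```

With 200 rows, the split leaves 120 rows for training and 80 for testing, and the fitted model holds one coefficient per feature column.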

Predicting the probability of default

Logistic regression coefficients
    # Model intercept
    array([-3.30582292e-10])
    # Coefficients for ['loan_int_rate', 'person_emp_length', 'person_income']
    array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])
    # Calculating probability of default
    int_coef_sum = -3.3e-10 + (1.29e-09 * loan_int_rate) + (-2.28e-09 * person_emp_length) + (-2.17e-05 * person_income)
    prob_default = 1 / (1 + np.exp(-int_coef_sum))
    prob_nondefault = 1 - (1 / (1 + np.exp(-int_coef_sum)))

Interpreting coefficients
    # Intercept
    intercept = -1.02
    # Coefficient for employment length
    person_emp_length_coef = -0.056
For every 1 year increase in person_emp_length, the person is less likely to default

intercept   person_emp_length   value * coef   probability of default
-1.02       10                  (10 * -0.06)   .17
-1.02       11                  (11 * -0.06)   .16
-1.02       12                  (12 * -0.06)   .15
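
The table values can be reproduced with a short calculation. This sketch assumes the table uses the coefficient rounded to -0.06 and applies the same logistic formula shown on the coefficients slide; the helper name `prob_default_for` is hypothetical.

```python
import math

intercept = -1.02
coef = -0.06  # employment-length coefficient, rounded as in the table

def prob_default_for(emp_length):
    """Probability of default for a given employment length in years."""
    linear = intercept + coef * emp_length
    return 1.0 / (1.0 + math.exp(-linear))

# Employment length in years -> probability of default, rounded to 2 places.
table = {years: round(prob_default_for(years), 2) for years in (10, 11, 12)}
# table == {10: 0.17, 11: 0.16, 12: 0.15}
```

Each extra year lowers the linear score by 0.06, which is why the probability falls by roughly one percentage point per row.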

Using non-numeric columns
Numeric: loan_int_rate, person_emp_length, person_income
Non-numeric: cr_loan_clean['loan_intent'], with values
    EDUCATION
    MEDICAL
    VENTURE
    PERSONAL
    DEBTCONSOLIDATION
    HOMEIMPROVEMENT
Non-numeric columns will cause errors with machine learning models in Python unless processed

One-hot encoding
Represent a string with a number
0 or 1 in a new column named column_VALUE

Get dummies
Utilize the get_dummies() function within pandas
    # Separate the numeric columns
    cred_num = cr_loan.select_dtypes(exclude=['object'])
    # Separate non-numeric columns
    cred_cat = cr_loan.select_dtypes(include=['object'])
    # One-hot encode the non-numeric columns only
    cred_cat_onehot = pd.get_dummies(cred_cat)
    # Union the numeric columns with the one-hot encoded columns
    cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)
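
To see what the encoded columns look like, the same steps can be run on a tiny sample. The three rows below are made up for illustration; only the column names and the loan_intent values come from the slides.

```python
import pandas as pd

# Hypothetical three-row sample; the text column is stored with object dtype.
cr_loan = pd.DataFrame({
    "person_income": [50000, 72000, 31000],
    "loan_intent": pd.Series(["EDUCATION", "MEDICAL", "EDUCATION"], dtype="object"),
})

# Split numeric and non-numeric columns, encode only the non-numeric ones.
cred_num = cr_loan.select_dtypes(exclude=["object"])
cred_cat = cr_loan.select_dtypes(include=["object"])
cred_cat_onehot = pd.get_dummies(cred_cat)

# Each distinct value becomes its own column named column_VALUE.
encoded = pd.concat([cred_num, cred_cat_onehot], axis=1)
```

The result keeps person_income and replaces loan_intent with loan_intent_EDUCATION and loan_intent_MEDICAL, each holding a 0/1 (or False/True) flag per row.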

Predicting the future, probably
Use the .predict_proba() method within scikit-learn
    # Train the model
    clf_logistic.fit(X_train, np.ravel(y_train))
    # Predict using the model
    clf_logistic.predict_proba(X_test)
Creates an array of probabilities of default
    # Probabilities: [[non-default, default]]
    array([[0.55, 0.45]])

Credit model performance

Model accuracy scoring
Calculate accuracy
Use the .score() method from scikit-learn
    # Check the accuracy against the test data
    clf_logistic.score(X_test, y_test)
    0.81
81% of values for loan_status predicted correctly

ROC curve charts
Receiver Operating Characteristic curve
Plots true positive rate (sensitivity) against false positive rate (fall-out)
    from sklearn.metrics import roc_curve
    import matplotlib.pyplot as plt
    fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
    plt.plot(fallout, sensitivity, color='darkorange')

Analyzing ROC charts
Area Under Curve (AUC): the area under the ROC curve. A random prediction scores 0.5, so the model's lift is the area between its curve and the random-prediction diagonal.
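
A small numeric example may help make AUC concrete. The six labels and probabilities below are invented for illustration; roc_curve and roc_auc_score are the scikit-learn functions for the curve points and the area.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Hypothetical true labels and predicted default probabilities for six loans.
y_true = [0, 0, 1, 0, 1, 1]
prob_default = [0.10, 0.40, 0.35, 0.20, 0.80, 0.90]

# Points on the ROC curve: false positive rate vs. true positive rate.
fallout, sensitivity, thresholds = roc_curve(y_true, prob_default)

# Area under that curve; 0.5 would match random guessing, 1.0 is perfect.
auc = roc_auc_score(y_true, prob_default)
```

Here the model ranks a default above a non-default in 8 of the 9 possible pairs, so the AUC is 8/9, about 0.89.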

Default thresholds
Threshold: the probability value above which a loan is classified as a default

Setting the threshold
Relabel loans based on our threshold of 0.5
    preds = clf_logistic.predict_proba(X_test)
    preds_df = pd.DataFrame(preds[:,1], columns=['prob_default'])
    preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
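
The effect of moving the threshold can be seen on a handful of values. This is a minimal sketch with made-up probabilities; the helper `label_loans` is hypothetical and uses a vectorized comparison in place of the `.apply` call, which gives the same labels.

```python
import numpy as np

# Hypothetical predicted probabilities of default for five loans.
prob_default = np.array([0.05, 0.30, 0.45, 0.55, 0.86])

def label_loans(probs, threshold):
    """Return 1 (default) where the probability exceeds the threshold, else 0."""
    return (probs > threshold).astype(int)

at_half = label_loans(prob_default, 0.5)   # two loans flagged as defaults
at_lower = label_loans(prob_default, 0.4)  # a lower threshold flags three
```

Lowering the threshold flags more loans as defaults, which tends to raise default recall at the cost of more false alarms.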

Credit classification reports
classification_report() within scikit-learn
    from sklearn.metrics import classification_report
    classification_report(y_test, preds_df['loan_status'], target_names=target_names)

Selecting classification metrics
Select and store specific components from the classification_report()
Use the precision_recall_fscore_support() function from scikit-learn
    from sklearn.metrics import precision_recall_fscore_support
    # The result is (precision, recall, fscore, support), each with one value per class;
    # [1][1] selects recall for class 1 (default)
    precision_recall_fscore_support(y_test, preds_df['loan_status'])[1][1]

Model discrimination and impact

Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status

Default recall for loan status
Default recall (or sensitivity) is the proportion of true defaults that the model correctly predicts
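
The link between the confusion matrix and default recall can be sketched as follows. The ten labels and predictions are invented for illustration; confusion_matrix and recall_score are the scikit-learn functions used.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Hypothetical actual and predicted loan_status for ten loans (1 = default).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Rows are actual classes and columns are predicted classes, ordered [0, 1],
# so ravel() yields true negatives, false positives, false negatives, true positives.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Default recall: correctly predicted defaults out of all actual defaults.
default_recall = tp / (tp + fn)

# recall_score computes the same quantity for the positive class.
same_recall = recall_score(y_true, y_pred)
```

In this sample the model catches 2 of the 4 actual defaults, so default recall is 0.5 even though 7 of the 10 loans are labeled correctly.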

Recall portfolio impact
Classification report: underperforming logistic regression model
Number of true defaults: 50,000

Loan Amount   Defaults Predicted / Not Predicted   Estimated Loss on Defaults
$50           .04 / .96                            (50000 x .96) x 50 = $2,400,000
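
The loss estimate in the table is simple arithmetic, sketched below. The function name `estimated_loss` is hypothetical, and the second scenario with a default recall of 0.5 is an assumed comparison point, not a figure from the slides.

```python
def estimated_loss(n_defaults, default_recall, loan_amount):
    """Loss from defaults the model failed to predict, assuming equal loan amounts."""
    missed_defaults = n_defaults * (1 - default_recall)
    return missed_defaults * loan_amount

# Figures from the table: 50,000 true defaults, recall of .04, $50 per loan.
loss_underperforming = estimated_loss(50_000, 0.04, 50)

# Assumed comparison: the same portfolio if default recall were 0.5.
loss_if_recall_half = estimated_loss(50_000, 0.5, 50)
```

The first call gives the $2,400,000 from the table; under the assumed recall of 0.5 the loss would be $1,250,000.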

Recall, precision, and accuracy
Difficult to maximize all of them because there is a trade-off
