Model evaluation and implementation
CREDIT RISK MODELING IN PYTHON
Michael Crabtree, Data Scientist, Ford Motor Company
Comparing classification reports
- Create the reports with classification_report() and compare
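As a rough illustration of the comparison, the sketch below builds one report per model from hand-made labels and pulls out the recall for defaults. The lists `y_test`, `preds_lr`, and `preds_gbt` are hypothetical stand-ins for the test labels and the predicted loan_status of two models; only classification_report() comes from the slide.

```python
from sklearn.metrics import classification_report

# Hypothetical data: 1 = default, 0 = non-default
y_test    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
preds_lr  = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
preds_gbt = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

target_names = ['Non-Default', 'Default']

# Text reports, meant to be read side by side
print(classification_report(y_test, preds_lr, target_names=target_names))
print(classification_report(y_test, preds_gbt, target_names=target_names))

# The dictionary form makes a single metric easy to compare
report_lr = classification_report(
    y_test, preds_lr, target_names=target_names, output_dict=True)
report_gbt = classification_report(
    y_test, preds_gbt, target_names=target_names, output_dict=True)

recall_lr = report_lr['Default']['recall']
recall_gbt = report_gbt['Default']['recall']
print("Default recall:", recall_lr, "vs", recall_gbt)
```

Recall for the default class is often the metric of interest here, because a missed default is an accepted loan that goes bad.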
ROC and AUC analysis
- Models with better performance will have more lift
- More lift means the AUC score is higher
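A minimal sketch of the AUC comparison, assuming two sets of predicted default probabilities for the same test labels. The names `prob_lr` and `prob_gbt` and all values are made up for illustration.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical test labels (1 = default) and predicted probabilities of default
y_test   = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
prob_lr  = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.35, 0.55, 0.70, 0.80]
prob_gbt = [0.10, 0.15, 0.20, 0.25, 0.30, 0.50, 0.40, 0.60, 0.70, 0.90]

# AUC is the area under the ROC curve; a curve with more lift has a larger area
auc_lr = roc_auc_score(y_test, prob_lr)
auc_gbt = roc_auc_score(y_test, prob_gbt)

print("AUC, first model: ", round(auc_lr, 3))
print("AUC, second model:", round(auc_gbt, 3))
```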
Model calibration
- We want our probabilities of default to accurately represent the model's confidence level
- The probability of default has a degree of uncertainty in its predictions
- For a sample of loans, the average predicted probability of default should be close to the percentage of actual defaults in that sample

Sample of loans   Average predicted PD   Sample percentage of actual defaults   Calibrated?
10                0.12                   0.12                                   Yes
10                0.25                   0.65                                   No

1 http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20le
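The check in the table can be sketched in a few lines. The two samples are the ones from the table; the tolerance of 0.05 is an assumption chosen for illustration, since the slides do not say how close is close enough.

```python
# Each tuple: (number of loans, average predicted PD, actual default rate)
samples = [(10, 0.12, 0.12), (10, 0.25, 0.65)]

# Assumed tolerance for calling a sample "calibrated"; not from the slides
TOLERANCE = 0.05

def is_calibrated(avg_predicted_pd, actual_default_rate, tolerance=TOLERANCE):
    """Return True when predicted and observed default rates are close."""
    return abs(avg_predicted_pd - actual_default_rate) <= tolerance

results = [is_calibrated(avg_pd, actual) for _, avg_pd, actual in samples]

for (n_loans, avg_pd, actual), ok in zip(samples, results):
    print(n_loans, avg_pd, actual, "Yes" if ok else "No")
```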
Calculating calibration
- Shows percentage of true defaults for each predicted probability
- Essentially a line plot of the results of calibration_curve()

    from sklearn.calibration import calibration_curve
    calibration_curve(y_test, probabilities_of_default, n_bins = 5)

    # Fraction of positives
    (array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
    # Average probability
     array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
Plotting calibration curves

    plt.plot(mean_predicted_value, fraction_of_positives, label="%s" % "Example Model")
Checking calibration curves
- As an example, two events selected (above and below perfect line)
Calibration curve interpretation
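One way to read a calibration curve numerically is to compare each bin against the perfect-calibration line, where the fraction of positives equals the mean predicted value. The sketch below reuses the two arrays printed by calibration_curve() earlier in the deck; the 0.05 tolerance and the label wording are assumptions made for illustration.

```python
import numpy as np

# Values printed by calibration_curve() earlier in the deck
fraction_of_positives = np.array([0.09602649, 0.19521012, 0.62035996, 0.67361111])
mean_predicted_value = np.array([0.09543535, 0.29196742, 0.46898465, 0.65512207])

# Assumed tolerance around the perfect line; not from the slides
TOLERANCE = 0.05

def describe_bin(observed, predicted, tolerance=TOLERANCE):
    """Classify one bin relative to the perfect-calibration line."""
    gap = observed - predicted
    if gap > tolerance:
        # More defaults occurred than predicted
        return "above line: model underestimates defaults"
    if gap < -tolerance:
        # Fewer defaults occurred than predicted
        return "below line: model overestimates defaults"
    return "close to line"

labels = [describe_bin(obs, pred)
          for obs, pred in zip(fraction_of_positives, mean_predicted_value)]

for pred, obs, label in zip(mean_predicted_value, fraction_of_positives, labels):
    print(f"predicted {pred:.3f}, observed {obs:.3f} -> {label}")
```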
Let's practice!
Credit acceptance rates
Thresholds and loan status
- Previously we set a threshold for a range of prob_default values
- This was used to change the predicted loan_status of the loan

    preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.4 else 0)

Loan   prob_default   threshold   loan_status
1      0.25           0.4         0
2      0.42           0.4         1
3      0.75           0.4         1
Thresholds and acceptance rate
- Use model predictions to set better thresholds
- Can also be used to approve or deny new loans
- For all new loans, we want to deny probable defaults
- Use the test data as an example of new loans
- Acceptance rate: what percentage of new loans are accepted to keep the number of defaults in a portfolio low
- Accepted loans which are defaults have an impact similar to false negatives
Understanding acceptance rate
- Example: Accept 85% of loans with the lowest prob_default
Calculating the threshold
- Calculate the threshold value for an 85% acceptance rate

    import numpy as np
    # Compute the threshold for 85% acceptance rate
    threshold = np.quantile(prob_default, 0.85)

    0.804

Loan   prob_default   Threshold   Predicted loan_status   Accept or Reject
1      0.65           0.804       0                       Accept
2      0.85           0.804       1                       Reject
Implementing the calculated threshold
- Reassign loan_status values using the new threshold

    # Apply the threshold computed from the quantile of the probabilities of default
    preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.804 else 0)
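The two steps above (compute the quantile, then apply it) can be checked end to end on synthetic data. This is only a sketch: the beta-distributed `prob_default` values are invented, so the resulting threshold will not match the 0.804 shown on the slide.

```python
import numpy as np

# Hypothetical predicted probabilities of default for 1,000 new loans
rng = np.random.default_rng(42)
prob_default = rng.beta(2, 5, size=1000)

# Threshold below which 85% of the predicted probabilities fall
accept_rate = 0.85
threshold = np.quantile(prob_default, accept_rate)

# Loans above the threshold are predicted defaults (1) and get rejected
loan_status = np.where(prob_default > threshold, 1, 0)

# Share of loans that end up accepted (predicted non-default)
realized_accept_rate = np.mean(loan_status == 0)

print("threshold:", round(float(threshold), 3))
print("accepted share:", realized_accept_rate)
```

Because the threshold is the 0.85 quantile of the same scores it is applied to, the accepted share lands at roughly 85% by construction.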
Bad rate
- Even with a calculated threshold, some of the accepted loans will be defaults
- These are loans with prob_default values around where our model is not well calibrated
Bad rate calculation

    # Calculate the bad rate
    np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()

- If non-default is 0 and default is 1, then the sum() is the count of defaults
- The .count() of a single column is the same as the row count for the data frame, as long as the column has no missing values
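A small worked case may make the calculation concrete. The data frame contents and the 0.5 threshold below are hypothetical; the column names follow the ones used in the deck.

```python
import numpy as np
import pandas as pd

# Hypothetical predictions and true outcomes for eight loans
test_pred_df = pd.DataFrame({
    'prob_default':     [0.05, 0.10, 0.20, 0.30, 0.45, 0.60, 0.80, 0.90],
    'true_loan_status': [0,    0,    1,    0,    0,    1,    1,    1],
})

# Assumed threshold for illustration only
threshold = 0.5
test_pred_df['pred_loan_status'] = (test_pred_df['prob_default'] > threshold).astype(int)

# Accepted loans are the predicted non-defaults
accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]

# Bad rate: share of accepted loans that actually defaulted
bad_rate = (np.sum(accepted_loans['true_loan_status'])
            / accepted_loans['true_loan_status'].count())

print("accepted:", len(accepted_loans), "bad rate:", bad_rate)
```

Five loans are accepted and one of them is a true default, so the bad rate is 1 / 5. With a 0/1 column and no missing values, `accepted_loans['true_loan_status'].mean()` gives the same number.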
Let's practice!
Credit strategy and minimum expected loss
Selecting acceptance rates
- First acceptance rate was set to 85%, but other rates might be selected as well
- Two options to test different rates:
  - Calculate the threshold, bad rate, and losses manually
  - Automatically create a table of these values and select an acceptance rate
- The table of all the possible values is called a strategy table
Setting up the strategy table
- Set up arrays or lists to store each value

    # Set all the acceptance rates to test
    accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55,
                    0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
    # Create lists to store thresholds and bad rates
    thresholds = []
    bad_rates = []
Calculating the table values
- Calculate the threshold and bad rate for all acceptance rates

    for rate in accept_rates:
        # Calculate the threshold for this acceptance rate
        thresh = np.quantile(test_pred_df['prob_default'], rate).round(3)
        # Store the threshold value in a list
        thresholds.append(thresh)
        # Apply the threshold to reassign loan_status
        test_pred_df['pred_loan_status'] = \
            test_pred_df['prob_default'].apply(lambda x: 1 if x > thresh else 0)
        # Create the accepted loans set of predicted non-defaults
        accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]
        # Calculate and store the bad rate
        bad_rates.append((np.sum(accepted_loans['true_loan_status'])
                          / accepted_loans['true_loan_status'].count()).round(3))
Strategy table interpretation

    strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
                            columns = ['Acceptance Rate','Threshold','Bad Rate'])
Adding accepted loans
- The number of loans accepted for each acceptance rate
- Can use len() or .count()
Adding average loan amount
- Average loan_amnt from the test set data
Estimating portfolio value
- Average value of accepted loan non-defaults minus average value of accepted defaults
- Assumes each default is a loss of the loan_amnt
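One possible reading of this estimate, sketched on a three-row strategy table: value the accepted non-defaults at the average loan amount and subtract the accepted defaults at the same amount. All numbers are invented, and the column names 'Num Accepted Loans', 'Avg Loan Amnt', and 'Estimated Value' are assumptions rather than names confirmed by the slides.

```python
import pandas as pd

# Hypothetical strategy table rows
strat_df = pd.DataFrame({
    'Acceptance Rate':    [1.00, 0.85, 0.70],
    'Bad Rate':           [0.22, 0.10, 0.05],
    'Num Accepted Loans': [1000, 850, 700],
})

# Assumed average loan_amnt from the test set
strat_df['Avg Loan Amnt'] = 10000.0

# Value of accepted non-defaults at the average loan amount
good_value = (strat_df['Num Accepted Loans'] * (1 - strat_df['Bad Rate'])
              * strat_df['Avg Loan Amnt'])
# Each accepted default is treated as a loss of the full loan amount
bad_value = (strat_df['Num Accepted Loans'] * strat_df['Bad Rate']
             * strat_df['Avg Loan Amnt'])

strat_df['Estimated Value'] = good_value - bad_value

# Row with the highest estimated portfolio value
best_row = strat_df.loc[strat_df['Estimated Value'].idxmax()]
print(strat_df)
print("Best acceptance rate:", best_row['Acceptance Rate'])
```

In this made-up table, accepting every loan is not the best choice: the extra volume at a 100% acceptance rate is outweighed by the higher bad rate.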
Total expected loss
- How much we expect to lose on the defaults in our portfolio

    # Probability of default (PD)
    test_pred_df['prob_default']
    # Exposure at default = loan amount (EAD)
    test_pred_df['loan_amnt']
    # Loss given default = 1.0 for total loss (LGD)
    test_pred_df['loss_given_default']
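The three columns combine through the standard expected loss formula, PD x EAD x LGD per loan, summed over the portfolio. The sketch below uses three invented loans and sets loss given default to 1.0, matching the total-loss assumption above.

```python
import pandas as pd

# Hypothetical portfolio of three loans
test_pred_df = pd.DataFrame({
    'prob_default': [0.10, 0.50, 0.90],       # PD
    'loan_amnt':    [10000.0, 5000.0, 2000.0] # EAD
})

# LGD of 1.0: a default loses the full loan amount
test_pred_df['loss_given_default'] = 1.0

# Expected loss per loan = PD * EAD * LGD
test_pred_df['expected_loss'] = (test_pred_df['prob_default']
                                 * test_pred_df['loan_amnt']
                                 * test_pred_df['loss_given_default'])

# Total expected loss across the portfolio
total_expected_loss = test_pred_df['expected_loss'].sum()
print("Total expected loss:", round(total_expected_loss, 2))
```

The per-loan terms here are 1,000, 2,500, and 1,800, so the portfolio total is 5,300.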
Let's practice!
Course wrap up
Your journey... so far
- Prepare credit data for machine learning models
- Important to understand the data
- Improving the data allows for high performing simple models
- Develop, score, and understand logistic regressions and gradient boosted trees
- Analyze the performance of models by changing the data
- Understand the financial impact of results
- Implement the model with an understanding of strategy
Risk modeling techniques
The models and framework in this course:
- Discrete-time hazard model (point in time): the probability of default is a point-in-time event
- Structural model framework: the model explains the default event based on other factors
Other techniques:
- Through-the-cycle model (continuous time): macro-economic conditions and other effects are used, but the risk is seen as an independent event
- Reduced-form model framework: a statistical approach estimating probability of default as an independent Poisson-based event
Choosing models
- Many machine learning models available, but logistic regression and tree models were used
- These models are simple and explainable
- Their performance on probabilities is acceptable
- Many financial sectors prefer model interpretability
- Complex or "black-box" models are a risk because the business cannot explain their decisions fully
- Deep neural networks are often too complex
Tips from me to you
- Focus on the data
  - Gather as much data as possible
  - Use many different techniques to prepare and enhance the data
  - Learn about the business
  - Increase value through data
- Model complexity can be a two-edged sword
  - Really complex models may perform well, but are seen as a "black-box"
  - In many cases, business users will not accept a model they cannot understand
  - Complex models can be very large and difficult to put into production
Thank you!