Understanding credit risk
CREDIT RISK MODELING IN PYTHON
Michael Crabtree, Data Scientist, Ford Motor Company

What is credit risk?
- The possibility that someone who has borrowed money will not repay it all
- Calculated risk: the difference between lending someone money and buying a government bond
- When someone fails to repay a loan, the loan is said to be in default
- The likelihood that someone will default on a loan is the probability of default (PD)

Payment   Payment Date   Loan Status
$100      Jun 15         Non-Default
$100      Jul 15         Non-Default
$0        Aug 15         Default

Expected loss
- The dollar amount the firm loses as a result of loan default
- Three primary components:
  - Probability of Default (PD)
  - Exposure at Default (EAD)
  - Loss Given Default (LGD)
- Formula for expected loss:

    expected_loss = PD * EAD * LGD
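
As a worked illustration of the formula, the sketch below computes expected loss for a single hypothetical loan. The input values (a 3% PD, a $10,000 exposure, and a 20% LGD) are invented for the example and do not come from the course data.

```python
# Hypothetical inputs for one loan (illustrative values only)
prob_default = 0.03    # probability of default (PD)
ead = 10_000.0         # exposure at default (EAD), in dollars
lgd = 0.20             # loss given default (LGD), as a fraction of EAD

# Expected loss is the product of the three components
expected_loss = prob_default * ead * lgd
print(f"Expected loss: ${expected_loss:,.2f}")
```

Because the three terms multiply, halving any one of them halves the expected loss, which is why lenders can reduce it either by screening borrowers (lower PD) or by limiting exposure and improving recovery (lower EAD or LGD).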

Types of data used
Two primary types of data are used:
- Application data
- Behavioral data

Application      Behavioral
Interest Rate    Employment Length
Grade            Historical Default
Amount           Income

Data columns
- Mix of behavioral and application data
- Contains columns simulating credit bureau data

Column               Column
Income               Loan grade
Age                  Loan amount
Home ownership       Interest rate
Employment length    Loan status
Loan intent          Historical default
Percent income       Credit history length

Exploring with cross tables

    import pandas as pd

    # Mean interest rate for each home ownership / loan status combination
    pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
                values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
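
To see what this cross table produces, here is a minimal sketch on a small made-up DataFrame. The column names follow the slide, but the six rows are invented for illustration, and cr_loan here is only a stand-in for the course data set.

```python
import pandas as pd

# Tiny stand-in for cr_loan (values invented for illustration)
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "OWN", "MORTGAGE", "MORTGAGE"],
    "loan_status": [0, 1, 0, 1, 0, 1],
    "loan_int_rate": [10.0, 14.0, 8.0, 12.0, 9.0, 13.0],
})

# Rows: home ownership; columns: loan status; cells: mean interest rate
mean_rates = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["loan_int_rate"],
    aggfunc="mean",
).round(2)
print(mean_rates)
```

Each cell holds the average interest rate for one ownership and status pair, so a quick scan down a column shows whether defaulted loans tend to carry higher rates.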

Exploring with visuals

    import matplotlib.pyplot as plt

    # Scatter plot of income against interest rate
    plt.scatter(cr_loan['person_income'], cr_loan['loan_int_rate'], c='blue', alpha=0.5)
    plt.xlabel("Personal Income")
    plt.ylabel("Loan Interest Rate")
    plt.show()

Outliers in Credit Data

Data processing
- Prepared data allows models to train faster
- Often positively impacts model performance

Outliers and performance
Possible causes of outliers:
- Problems with data entry systems (human error)
- Issues with data ingestion tools

Feature              Coefficient With Outliers    Coefficient Without Outliers
Interest Rate        0.2                          0.01
Employment Length    0.5                          0.6
Income               0.6                          0.75

Detecting outliers with cross tables
- Use cross tables with aggregate functions

    pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
                values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)
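
One way to use a cross table for outlier detection, sketched here under the assumption that an implausibly large maximum signals a bad record, is to swap the aggregate function for 'max'. The data below is invented, and the 123-year employment length is a deliberately planted outlier.

```python
import pandas as pd

# Tiny stand-in for cr_loan with one planted outlier (123 years employed)
cr_loan = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "OWN"],
    "loan_status": [0, 1, 0, 1],
    "person_emp_length": [5.0, 123.0, 2.0, 7.0],
})

# The maximum per group exposes extreme values that a mean would dilute
max_emp = pd.crosstab(
    cr_loan["person_home_ownership"],
    cr_loan["loan_status"],
    values=cr_loan["person_emp_length"],
    aggfunc="max",
)
print(max_emp)
```

A maximum that is far outside any plausible range for the feature is a cue to inspect the underlying rows before modeling.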

Detecting outliers visually
- Histograms
- Scatter plots

Removing outliers
- Use the .drop() method within Pandas

    indices = cr_loan[cr_loan['person_emp_length'] >= 60].index
    cr_loan.drop(indices, inplace=True)
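
The same index-based removal can be tried end to end on a small invented DataFrame. This is a sketch, not the course data: the values are made up, and the result is assigned to a new name (cr_loan_clean, a hypothetical variable) instead of using inplace=True so that the original rows stay available for comparison.

```python
import pandas as pd

# Tiny stand-in for cr_loan (values invented for illustration)
cr_loan = pd.DataFrame({
    "person_emp_length": [5.0, 123.0, 2.0, 61.0, 12.0],
    "loan_int_rate": [10.5, 11.0, 9.2, 13.1, 7.9],
})

# Index labels of rows whose employment length is 60 years or more
indices = cr_loan[cr_loan["person_emp_length"] >= 60].index

# drop() returns a new DataFrame when inplace is not set
cr_loan_clean = cr_loan.drop(indices)
print(len(cr_loan), "->", len(cr_loan_clean))
```

Rows where person_emp_length is missing are not selected by the >= 60 comparison, so this step leaves them in place for the missing-data handling that follows.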

Risk with missing data in loan data

What is missing data?
- NULLs in a row instead of an actual value
- An empty string ''
- Not an entirely empty row
- Can occur in any column in the data

Similarities with outliers
- Negatively affect machine learning model performance
- May bias models in unanticipated ways
- May cause errors for some machine learning models

Missing Data Type         Possible Result
NULL in numeric column    Error
NULL in string column     Error

How to handle missing data
Generally three ways to handle missing data:
- Replace values where the data is missing
- Remove the rows containing missing data
- Leave the rows with missing data unchanged
Understanding the data determines the course of action

Missing Data           Interpretation                   Action
NULL in loan_status    Loan recently approved           Remove from prediction data
NULL in person_age     Age not recorded or disclosed    Replace with median

Finding missing data
- Null values are easily found by using the isnull() function
- Null records can easily be counted with the sum() function
- The .any() method checks all columns

    null_columns = cr_loan.columns[cr_loan.isnull().any()]
    cr_loan[null_columns].isnull().sum()

    # Total number of null values per column
    person_home_ownership          25
    person_emp_length             895
    loan_intent                    25
    loan_int_rate                3140
    cb_person_default_on_file      15
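
A minimal, self-contained sketch of the same isnull() / any() / sum() chain follows. The four-row DataFrame is invented for illustration and stands in for cr_loan; only the column names are borrowed from the slides.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for cr_loan with a few planted nulls
cr_loan = pd.DataFrame({
    "person_age": [22, 35, 41, 29],
    "person_emp_length": [1.0, np.nan, 7.0, np.nan],
    "loan_int_rate": [11.2, 9.9, np.nan, 13.5],
})

# Keep only the columns that contain at least one null
null_columns = cr_loan.columns[cr_loan.isnull().any()]

# Count the nulls in each of those columns
null_counts = cr_loan[null_columns].isnull().sum()
print(null_counts)
```

Note that isnull() flags NaN and None but not empty strings, so string columns that use '' for missing values need those converted to NaN first for this count to include them.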
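
Replacing missing data
- Replace the missing data using methods like .fillna() with aggregate functions and methods

    # Assign the result back: recent pandas versions warn when an inplace
    # method is called on a selected column, and the fill may not reach cr_loan
    cr_loan['loan_int_rate'] = cr_loan['loan_int_rate'].fillna(cr_loan['loan_int_rate'].mean())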
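
Here is a small sketch of mean imputation on invented data. It assigns the result of fillna() back to the column instead of relying on inplace=True, a choice made here because recent pandas versions warn about calling inplace methods on a selected column.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for cr_loan with two missing interest rates
cr_loan = pd.DataFrame({"loan_int_rate": [10.0, np.nan, 14.0, np.nan, 12.0]})

# mean() skips NaN by default, so this averages 10, 14 and 12
mean_rate = cr_loan["loan_int_rate"].mean()

# Fill the gaps and write the column back to the DataFrame
cr_loan["loan_int_rate"] = cr_loan["loan_int_rate"].fillna(mean_rate)
print(cr_loan["loan_int_rate"].tolist())
```

Swapping mean() for median() gives median imputation, which is less sensitive to any outliers still present in the column.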

Dropping missing data
- Uses indices to identify records, the same as with outliers
- Remove the records entirely using the .drop() method

    indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
    cr_loan.drop(indices, inplace=True)
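
The sketch below runs the index-based removal on invented data and compares it with pandas' dropna(subset=...), which is a shortcut not shown on the slide. The loan_amnt column name and all values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for cr_loan (values and loan_amnt column are illustrative)
cr_loan = pd.DataFrame({
    "person_emp_length": [3.0, np.nan, 8.0, np.nan],
    "loan_amnt": [5000, 12000, 7500, 3000],
})

# Approach from the slide: collect the index labels, then drop them
indices = cr_loan[cr_loan["person_emp_length"].isnull()].index
dropped_by_index = cr_loan.drop(indices)

# Shortcut: drop rows that are null in the named column
dropped_by_dropna = cr_loan.dropna(subset=["person_emp_length"])

print(dropped_by_index.equals(dropped_by_dropna))
```

Both approaches keep the original index labels of the surviving rows; call reset_index(drop=True) afterwards if a contiguous index is needed.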