Explaining the predictions of a boosted tree algorithm : application to credit scoring

Gonçalves, Rui Alexandre Henriques
Salvaire, Pierre Antony Jean Marie

2019-10-31
2019-10-31
2019-09-03

http://hdl.handle.net/10362/85991

Dissertation report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management

The main goal of this report is to contribute to the adoption of complex “black box” machine learning models in the field of credit scoring for retail credit. Although numerous investigations have shown the potential benefits of using complex models, we identified the lack of interpretability as one of the main obstacles to a full and trustworthy adoption of these new modeling techniques. Because interpretability is intrinsically linked with recent data concerns such as individual rights to explanation, fairness (introduced in the GDPR) and model reliability, we believe that this kind of research is crucial for easing the adoption of such models among credit risk practitioners. We build a standard linear Scorecard model along with a more advanced algorithm called Extreme Gradient Boosting (XGBoost) on an open source retail credit dataset. The modeling scenario is a binary classification task that consists of identifying clients who will experience a delinquency state of 90 days past due or worse. The Scorecard model is interpreted using the raw output of the algorithm, while more complex data perturbation techniques, namely Partial Dependence Plots and Shapley Additive Explanations, are computed for the XGBoost algorithm. As a result, we observe that the XGBoost algorithm performs statistically better at distinguishing “bad” from “good” clients. Additionally, we show that the global interpretation of XGBoost is not as accurate as that of the Scorecard algorithm. At the individual level, however (for each instance of the dataset), we show that the level of interpretability is very similar, as both models are able to quantify the contribution of each variable to the predicted risk of a specific application.

eng

Credit Scoring
XGBoost
Model Interpretation
Black Box

master thesis

202295230