| Name | Description | Size | Format |
|---|---|---|---|
| | | 5.28 MB | Adobe PDF |
Advisor(s)
Abstract(s)
The main goal of this report is to contribute to the adoption of complex "black box"
machine learning models in the field of credit scoring for retail credit.
Although numerous studies have shown the potential benefits of using
complex models, we identified the lack of interpretability as one of the main obstacles
to a full and trustworthy adoption of these new modeling techniques. Because interpretability
is intrinsically linked to recent data concerns such as the individual right to explanation,
fairness (introduced in the GDPR) and model reliability, we believe that this kind of research
is crucial for easing the adoption of these models among credit risk practitioners.
We build a standard Linear Scorecard model along with a more advanced algorithm
called Extreme Gradient Boosting (XGBoost) on an open-source retail credit dataset. The
modeling scenario is a binary classification task that consists of identifying clients who will
experience a delinquency of 90 days past due or worse.
The interpretation of the Scorecard model is performed using the raw output of the
algorithm, while more complex data perturbation techniques, namely Partial Dependence Plots
and SHapley Additive exPlanations (SHAP), are computed for the XGBoost algorithm.
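As a rough illustration of the data perturbation idea behind a Partial Dependence Plot, the sketch below overwrites one feature with each value of a grid for every row and averages the model's predictions. The `toy_risk` function, the feature names, and the data are hypothetical stand-ins for illustration only; they are not the dissertation's XGBoost model or dataset.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        # Perturb the data: overwrite one feature in every row, keep the rest.
        perturbed = [{**row, feature: value} for row in rows]
        curve.append(mean(predict(row) for row in perturbed))
    return curve


# Hypothetical toy risk model (assumption, not the model from the report).
def toy_risk(row):
    penalty = 0.2 if row["late_payments"] > 1 else 0.0
    return 0.1 + 0.5 * row["utilization"] + penalty


# Hypothetical applicants (assumption, not the report's dataset).
rows = [
    {"utilization": 0.2, "late_payments": 0},
    {"utilization": 0.9, "late_payments": 3},
    {"utilization": 0.5, "late_payments": 2},
    {"utilization": 0.4, "late_payments": 0},
]

pd_curve = partial_dependence(toy_risk, rows, "utilization", [0.0, 0.5, 1.0])
```

Plotting `pd_curve` against the grid would give the partial dependence curve for `utilization`: a global view of how the average predicted risk moves with that one variable.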
As a result, we observe that the XGBoost algorithm performs statistically better
at distinguishing "bad" from "good" clients. Additionally, we show that the global interpretation
of the XGBoost model is not as accurate as that of the Scorecard algorithm. At an individual level, however
(for each instance of the dataset), we show that the level of interpretability is very similar, as
both are able to quantify the contribution of each variable to the predicted risk of a specific
application.
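To make the notion of a per-variable contribution concrete, the following sketch computes exact Shapley values by enumerating feature subsets, with absent features taking their values from a background sample. It is a minimal illustration under stated assumptions: the linear `linear_score` model, its `weights`, and the data are hypothetical, and the brute-force enumeration is not the tree-specific algorithm that SHAP libraries use for XGBoost. For a linear model, each Shapley value reduces to the weight times the feature's deviation from its background mean, which is the same kind of additive breakdown a scorecard provides.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def shapley_values(predict, x, background):
    """Exact Shapley values of `predict` at `x` (exponential cost in len(x))."""
    n = len(x)

    def value(subset):
        # Expected prediction when features in `subset` are fixed to x
        # and the remaining ones come from each background row.
        return mean(
            predict([x[i] if i in subset else b[i] for i in range(n)])
            for b in background
        )

    phi = []
    for j in range(n):
        others = [i for i in range(n) if i != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {j}) - value(set(s)))
        phi.append(total)
    return phi


# Hypothetical linear score (assumption, not the report's scorecard).
weights = [0.3, -0.2, 0.5]


def linear_score(row):
    return sum(w * v for w, v in zip(weights, row))


background = [[0.0, 1.0, 2.0], [2.0, 3.0, 0.0], [1.0, 2.0, 1.0]]
applicant = [2.0, 0.0, 1.0]

phi = shapley_values(linear_score, applicant, background)
baseline = mean(linear_score(b) for b in background)
```

The contributions in `phi` sum to `linear_score(applicant) - baseline`, so the prediction for one application is fully decomposed into per-variable terms.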
Description
Dissertation report presented as a partial requirement for obtaining the Master's degree in Information Management, with a specialization in Business Intelligence and Knowledge Management
Keywords
Credit Scoring; XGBoost; Model Interpretation; Black Box
