Explaining the predictions of a boosted tree algorithm : application to credit scoring

Salvaire, Pierre Antony Jean Marie

http://hdl.handle.net/10362/85991

Use this identifier to cite or link to this record.

Name:	Description:	Size:	Format:
TGI0254.pdf		5.28 MB	Adobe PDF

Authors

Salvaire, Pierre Antony Jean Marie

Advisor(s)

Gonçalves, Rui Alexandre Henriques

Abstract(s)

The main goal of this report is to contribute to the adoption of complex "Black Box" machine learning models in the field of credit scoring for retail credit. Although numerous investigations have shown the potential benefits of using complex models, we identified the lack of interpretability as one of the main factors preventing a full and trustworthy adoption of these new modeling techniques. Because interpretability is intrinsically linked with recent data concerns such as the individual right to explanation, fairness (introduced in the GDPR) and model reliability, we believe that this kind of research is crucial for easing the adoption of such models among credit risk practitioners. We build a standard Linear Scorecard model along with a more advanced algorithm called Extreme Gradient Boosting (XGBoost) on an open source retail credit dataset. The modeling scenario is a binary classification task that consists of identifying clients who will experience a delinquency state of 90 days past due or worse. The Scorecard model is interpreted using the raw output of the algorithm, while more complex data perturbation techniques, namely Partial Dependence Plots and Shapley Additive Explanations, are computed for the XGBoost algorithm. As a result, we observe that the XGBoost algorithm performs statistically better at distinguishing “bad” from “good” clients. Additionally, we show that the global interpretation of the XGBoost model is not as accurate as that of the Scorecard model. At an individual level, however (for each instance of the dataset), we show that the level of interpretability is very similar, as both are able to quantify the contribution of each variable to the predicted risk of a specific application.
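To make the idea of a per-variable contribution concrete, the following is a minimal sketch of exact Shapley values computed by brute force for a single application. It is not the method or data used in the dissertation: the scoring function `toy_risk_score`, the three feature names, and all numeric values are hypothetical, and features left out of a coalition are simply replaced by values from a small background sample. Practical tools for tree ensembles rely on far more efficient algorithms than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x (brute force over coalitions).

    Features outside a coalition take their values from each background row,
    and the predictions are averaged over the background sample.
    Returns (contributions, baseline), where baseline is the mean prediction
    over the background rows.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[i] if i in coalition else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi_i = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi_i += weight * (value(s | {i}) - value(s))
        contributions.append(phi_i)
    return contributions, value(set())


def toy_risk_score(row):
    """Hypothetical non-linear risk score (illustration only)."""
    utilization, late_payments, age = row
    score = 0.8 * utilization + 0.5 * late_payments - 0.01 * age
    if utilization > 0.9 and late_payments > 0:
        score += 1.0  # interaction term that a linear scorecard would not capture
    return score


# Hypothetical background sample and one hypothetical application
background = [
    [0.20, 0, 45],
    [0.50, 1, 38],
    [0.95, 2, 29],
    [0.10, 0, 61],
]
applicant = [0.97, 1, 33]

phi, baseline = shapley_values(toy_risk_score, applicant, background)
for name, contribution in zip(["utilization", "late_payments", "age"], phi):
    print(f"{name}: {contribution:+.4f}")
print(f"baseline: {baseline:.4f}  prediction: {toy_risk_score(applicant):.4f}")
```

The contributions add up to the difference between the applicant's score and the baseline, which is the property that lets an individual prediction be read variable by variable. For a purely linear score, each contribution reduces to the coefficient times the feature's deviation from its background mean, which is how a scorecard is read directly from its raw output.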

Description

Dissertation report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management

Keywords

Credit Scoring; XGBoost; Model Interpretation; Black Box

Collections

NIMS - Master's Dissertations in Information Management