| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.85 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The decision to grant a loan depends on the lender’s evaluation of the borrower’s ability to repay. Loan default has become a relevant research topic in recent years as lending has expanded to online platforms and mobile applications, where it takes place on a Peer-to-Peer (P2P) basis. The present study merges two versions of the “Lending club loan data”: one contains loans issued from 2007 to 2015 and the other contains loans issued from 2012 to 2020. After removing duplicates, the merged dataset consists of approximately 2,925,493 borrower records and 142 features, covering the period from 2007 to the 3rd quarter of 2020. In addition, to ensure the effectiveness of the modelling, a “Prosper” dataset was analysed, consisting of approximately 1,113,937 borrower records and 81 features and covering the period from 2006 to the 1st quarter of 2014. For both periods, a set of macroeconomic variables was modelled to identify whether these variables affect loan repayment. Given its high underlying risk, this form of lending is a relevant area in which to study how the characteristics of the obligor may influence future repayment behaviour. The core of this dissertation is to identify, through machine learning techniques, the variables that may warn the lender of a potential default and thus make the transaction less risky. The study began with a systematic literature review that summarizes the most common algorithms used in other studies and their characteristics. Our analysis concludes that the borrower assessment variables are significant predictors, which reflects the effectiveness of the credit risk assessment performed by the platforms. In addition, the short-term interest rate and GDP are significant for both datasets and are most relevant in the smaller one, the Prosper dataset.
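The merge-and-deduplicate step mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's code: the key name `id`, the field `loan_status`, the helper `merge_versions`, and the rule that the newer snapshot wins on overlap are all assumptions made for illustration, and the records are toy data.

```python
from typing import Iterable


def merge_versions(older: Iterable[dict], newer: Iterable[dict], key: str = "id") -> list[dict]:
    """Combine two dataset versions, keeping one record per key value."""
    merged: dict = {}
    for record in older:
        merged[record[key]] = record
    for record in newer:
        # Assumption: when both versions contain the same loan,
        # the record from the newer version replaces the older one.
        merged[record[key]] = record
    return list(merged.values())


# Toy records (hypothetical), standing in for the two overlapping versions.
v_2007_2015 = [
    {"id": 1, "loan_status": "Current"},
    {"id": 2, "loan_status": "Fully Paid"},
]
v_2012_2020 = [
    {"id": 2, "loan_status": "Fully Paid"},
    {"id": 3, "loan_status": "Charged Off"},
]

combined = merge_versions(v_2007_2015, v_2012_2020)
print(len(combined))
```

Keying on a loan identifier is only one possible duplicate criterion; exact-row comparison would be an alternative if no stable identifier is shared between the two versions.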
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Information Analysis and Management
Keywords
Risk; Big Data; P2P Lending; Default Prediction; Machine Learning; Logistic Regression; Decision Tree; Random Forest
