Credit Card Fraud Detection: Using Multi-state Markov Models

Roiçado, Diogo Miguel Cardoso

http://hdl.handle.net/10362/190719

Use this identifier to cite or link to this record.

Name:	Description:	Size:	Format:
TCDMAA4576.pdf		2.46 MB	Adobe PDF

Authors

Roiçado, Diogo Miguel Cardoso

Advisor(s)

Bravo, Jorge Miguel Ventura

Abstract(s)

In this study, we investigate whether a compact set of behavioural covariates, embedded in a continuous-time multi-state Markov model (MSM), can reliably forecast credit-card fraud risk, and whether combining MSM-derived transition probabilities with a simple machine-learning classifier enhances early fraud detection. We apply stepwise, Akaike Information Criterion (AIC)-guided selection to MSMs fitted to the original, undersampled, Synthetic Minority Over-sampling Technique (SMOTE)-augmented, and Generative Adversarial Network (GAN)-augmented datasets to isolate core predictors such as time since last transaction, daily transaction count and amount, transaction amount, and age. In all datasets, these lean MSMs achieved near-optimal AIC and log-likelihood values with consistently stable risk ratios, yet their standalone predictive accuracy remained modest (~0.50) and they were prone to high false-positive rates. By feeding MSM transition probabilities into a basic Random Forest (RF), without hyperparameter tuning, we increased accuracy to 0.81 on the original data (versus 0.72 with SMOTE and 0.57 with GAN), underscoring the importance of hybridisation in highly imbalanced settings. The MSM-RF hybrid approach also yielded strong class-specific performance: for the fraud-network class, precision reached 0.95 and recall 0.92; for normal behaviour, both precision and recall hovered around 0.80. Crucially, early-detection recall jumped from 0.42 with pure MSM to 0.66 under SMOTE (and 0.16 with GAN), illustrating the trade-off between sensitivity and precision in highly imbalanced settings. We further show that synthetic augmentation preserves core temporal signals but can introduce parameter instability. Overall, our findings demonstrate that parsimony and temporal interpretability outweigh model complexity in sequential fraud-risk modelling and provide a practical blueprint for real-time deployment.
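As a rough illustration of what "MSM-derived transition probabilities" can look like in practice, the sketch below computes the transition probability matrix P(t) = exp(Qt) of a continuous-time Markov chain from an intensity matrix Q. Everything in it is an assumption made for illustration: the three states, the intensity values, and the function names are hypothetical and are not taken from the dissertation, which may define its states, covariates, and estimation procedure differently. The matrix exponential is approximated in pure Python with a scaled Taylor series followed by repeated squaring.

```python
# Hypothetical 3-state chain: 0 = normal, 1 = suspicious, 2 = fraud (absorbing).
# The intensities are made-up illustration values, not estimates from the thesis.

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_probabilities(q, t, squarings=10, terms=12):
    """Approximate P(t) = exp(Q t) by scaling, a Taylor series, and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    # Taylor series of exp(A) for the scaled-down matrix A = Q * t / 2**squarings.
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling: exp(Q t) = exp(A) raised to the power 2**squarings.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


# Each row of an intensity matrix sums to zero; the fraud state never exits.
Q = [
    [-0.30, 0.25, 0.05],
    [0.40, -0.60, 0.20],
    [0.00, 0.00, 0.00],
]

P = transition_probabilities(Q, t=2.0)

# Row i holds the probabilities of being in each state after time t, given
# state i now. Such a row could serve as a feature vector for a classifier.
features_for_suspicious_card = P[1]
print([round(p, 4) for p in features_for_suspicious_card])
```

In a hybrid setup of the kind the abstract describes, rows of P(t) evaluated per card and time horizon would be passed to a downstream classifier such as a Random Forest; that step is omitted here because it depends on the fitted model and data.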

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

Keywords

Multi-State Markov Models; Fraud Detection; Class imbalance; Behavioural modelling

URI

http://hdl.handle.net/10362/190719

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics

CC Licence

CC BY
