Credit Card Fraud Detection: Using Multi-state Markov Models

Bravo, Jorge Miguel Ventura
Roiçado, Diogo Miguel Cardoso
2025-11-14
2025-11-14
2025-10-30
http://hdl.handle.net/10362/190719

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

In this study, we investigate whether a compact set of behavioural covariates, embedded in a continuous-time multi-state Markov model (MSM), can reliably forecast credit-card fraud risk, and whether combining MSM-derived transition probabilities with a simple machine-learning classifier enhances early fraud detection. We apply stepwise, Akaike Information Criterion (AIC)-guided selection to MSMs fitted to the original, undersampled, Synthetic Minority Over-sampling Technique (SMOTE)-augmented, and Generative Adversarial Network (GAN)-augmented datasets to isolate core predictors such as time since last transaction, daily transaction count and amount, transaction amount, and age. In all datasets, these lean MSMs achieved near-optimal AIC and log-likelihood values with consistently stable risk ratios, yet their standalone predictive accuracy remained modest (approximately 0.50) and prone to high false-positive rates. By feeding MSM transition probabilities into a basic Random Forest (RF), without hyperparameter tuning, we increase accuracy to 0.81 on the original data (versus 0.72 with SMOTE and 0.57 with GAN), underscoring the importance of hybridisation in highly imbalanced settings. The MSM-RF hybrid approach also yielded strong class-specific performance: for the fraud-network class, precision reached 0.95 and recall 0.92; for normal behaviour, both precision and recall hovered around 0.80. Crucially, early-detection recall rose from 0.42 with the pure MSM to 0.66 under SMOTE (and was 0.16 with GAN), illustrating the trade-off between sensitivity and precision in highly imbalanced settings. We further show that synthetic augmentation preserves core temporal signals but can introduce parameter instability.

Overall, our findings demonstrate that parsimony and temporal interpretability outweigh model complexity in sequential fraud-risk modelling and provide a practical blueprint for real-time deployment.

eng
Multi-State Markov Models; Fraud Detection; Class imbalance; Behavioural modelling
master thesis
204072263
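
The abstract refers to MSM-derived transition probabilities without showing how such quantities arise. As a rough illustration only, the sketch below uses the textbook continuous-time formulation, in which each transition intensity is a baseline rate scaled by exp(beta . z) and the transition matrix over a horizon t is P(t) = exp(Qt). The three states (0 = normal, 1 = suspicious, 2 = fraud, treated as absorbing), the rates, the coefficients, and the single covariate are all hypothetical and are not taken from the thesis; the matrix exponential is approximated in pure Python by a Taylor series with scaling and squaring.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=20, squarings=10):
    """Approximate exp(m) with a Taylor series plus scaling and squaring."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def generator(baseline, betas, covariates):
    """Build Q with q_rs = baseline_rs * exp(beta_rs . z); rows sum to zero."""
    n = len(baseline)
    q = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            if r != s and baseline[r][s] > 0:
                lin = sum(b * z for b, z in zip(betas[r][s], covariates))
                q[r][s] = baseline[r][s] * math.exp(lin)
        q[r][r] = -sum(q[r])  # diagonal is still 0.0 here, so this is minus the off-diagonal sum
    return q


def transition_probabilities(q, t):
    """Return P(t) = exp(Q t) for generator matrix q and horizon t."""
    return mat_exp([[x * t for x in row] for row in q])


# Hypothetical baseline intensities (per unit time); the zero row makes state 2 absorbing.
baseline = [[0.0, 0.10, 0.01],
            [0.20, 0.0, 0.05],
            [0.0, 0.0, 0.0]]
# Hypothetical log-hazard coefficients for one standardised covariate.
betas = [[[0.0], [0.5], [0.8]],
         [[-0.3], [0.0], [0.6]],
         [[0.0], [0.0], [0.0]]]

Q = generator(baseline, betas, covariates=[1.0])
P = transition_probabilities(Q, t=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

Each row of P is a probability distribution over the next state. In a hybrid setup of the kind the abstract describes, the row for an account's current state could be passed to a downstream classifier as features; that step is not shown here.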