| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.97 MB | Adobe PDF |
Advisor(s)
Abstract(s)
Class imbalance is a pervasive problem in machine learning, where one class, often the class of interest, is underrepresented relative to the others. This imbalance can severely compromise the performance of standard classifiers, which tend to favor the majority class. A well-known example is a classifier that always predicts the majority class on a dataset with a 99.9% majority and 0.1% minority split: it achieves 99.9% accuracy while being practically useless. This thesis introduces Anchor-Based Density Undersampling with Swarm Stabilization (ABDUSS), a novel resampling method that uses a swarm-inspired undersampling heuristic to reduce majority-class instances based on data density. The method (i) estimates minority-class density via feature-wise smoothing, (ii) selects a high-density minority anchor, (iii) applies a lightweight continuous swarm update toward this anchor to stabilize the search space, and (iv) removes majority samples within a density-scaled radius using a KD-tree range query. Unlike optimization-based PSO mask methods, ABDUSS avoids inner-loop validation optimization and operates instead as a fast density-guided heuristic with a single scaling parameter. Experimental evaluation on three imbalanced datasets (credit card fraud, telco churn, and customer satisfaction) using XGBoost shows that ABDUSS reduces overlap near minority clusters and achieves competitive F1-score and AUC, with improved minority recall in several scenarios. These results indicate that ABDUSS provides a simple, computationally efficient, and reproducible baseline for density-aware undersampling in imbalanced classification tasks.
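The four steps listed in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not specify the density estimator, the swarm update rule, or the radius formula, so everything below is an assumption made for illustration. In particular, the function names `featurewise_density` and `undersample_sketch`, the `bandwidth`, `scale`, and `n_iter` parameters, the per-feature Gaussian kernel, the inertia and pull coefficients, and the median-distance radius are all hypothetical choices. Only `scipy.spatial.cKDTree.query_ball_point` is taken as a given for the KD-tree range query.

```python
import numpy as np
from scipy.spatial import cKDTree


def featurewise_density(X_min, bandwidth=0.5):
    """Assumed step (i): sum of per-feature 1-D Gaussian kernel log-densities."""
    n, d = X_min.shape
    log_density = np.zeros(n)
    for j in range(d):
        # Pairwise scaled differences along feature j only.
        diff = (X_min[:, j][:, None] - X_min[:, j][None, :]) / bandwidth
        kde_j = np.exp(-0.5 * diff ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
        log_density += np.log(kde_j)
    return log_density


def undersample_sketch(X, y, scale=1.0, n_iter=10, seed=0):
    """Hypothetical density-guided undersampling; minority label is 1, majority is 0."""
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    maj_idx = np.flatnonzero(y == 0)

    # (i) + (ii): the anchor is the minority point with the highest density score.
    anchor = X_min[np.argmax(featurewise_density(X_min))]

    # (iii): assumed swarm update. Particles start at the minority points and
    # their velocities are pulled toward the anchor with random step sizes.
    pos = X_min.copy()
    vel = np.zeros_like(pos)
    for _ in range(n_iter):
        pull = rng.uniform(0.0, 1.0, size=pos.shape) * (anchor - pos)
        vel = 0.5 * vel + 0.5 * pull
        pos = pos + vel
    center = pos.mean(axis=0)

    # (iv): assumed radius rule. A tighter minority cluster gives a smaller radius.
    radius = scale * np.median(np.linalg.norm(X_min - center, axis=1))
    tree = cKDTree(X[maj_idx])
    hits = np.asarray(tree.query_ball_point(center, r=radius), dtype=int)
    drop = maj_idx[hits]
    keep = np.setdiff1d(np.arange(len(y)), drop)
    return X[keep], y[keep], drop


# Synthetic demo: a wide majority cloud overlapping a tight minority cluster.
data_rng = np.random.default_rng(42)
X_maj = data_rng.normal(0.0, 3.0, size=(500, 2))
X_mino = data_rng.normal(0.0, 0.3, size=(25, 2))
X = np.vstack([X_maj, X_mino])
y = np.r_[np.zeros(500, dtype=int), np.ones(25, dtype=int)]

X_res, y_res, dropped = undersample_sketch(X, y, scale=3.0)
print("majority removed:", len(dropped), "minority kept:", int((y_res == 1).sum()))
```

In this sketch `scale` plays the role of the single scaling parameter mentioned in the abstract: larger values widen the removal radius around the stabilized center, and no validation loop is run at any point.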
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Class imbalance; Density-based undersampling; Instance selection; Data-level preprocessing; Imbalanced classification; Swarm-inspired methods
