| Name | Description | Size | Format |
|---|---|---|---|
|  |  | 3.73 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
Imbalanced class distributions are a recurrent and challenging problem for classification models,
as most algorithms are designed under the assumption that classes are roughly balanced. This
imbalance often leads to poor predictive performance on the minority class, even when overall
accuracy appears acceptable. A common and easily implemented remedy is resampling, which falls
into three categories: oversampling, undersampling, and hybrid methods that combine the two.
However, the effectiveness of these techniques depends on dataset characteristics such as the
imbalance ratio, class overlap, and dimensionality. This study evaluates 10 resampling techniques
on 35 benchmark datasets from various domains. To reduce bias toward any single classifier, the
evaluation uses four different classifiers. Unlike many studies that focus on a single type of
resampling, this research examines all three categories together. The study also provides a
detailed analysis of average scores and rankings, which clarifies the relative performance of each
technique, and offers specific guidelines for selecting a resampling method based on the
characteristics of each dataset. These findings aim to improve how resampling methods are applied,
helping practitioners make informed decisions that enhance classification performance on
imbalanced data.
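The three resampling categories can be illustrated with their simplest random variants. The following is a minimal, standard-library-only sketch under stated assumptions: the function names (`imbalance_ratio`, `random_oversample`, `random_undersample`, `hybrid_resample`) are hypothetical, the hybrid rule of resizing every class to the mean class size is just one illustrative choice, and none of these functions is claimed to be among the 10 techniques the study evaluates. The imbalance ratio is taken as the largest class size divided by the smallest, a common convention.

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def _members(X, y, cls):
    """All samples of X whose label is cls."""
    return [x for x, label in zip(X, y) if label == cls]

def random_oversample(X, y, seed=0):
    """Oversampling: duplicate random samples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        members = _members(X, y, cls)
        extra = [rng.choice(members) for _ in range(target - n)]
        X_out.extend(extra)
        y_out.extend([cls] * len(extra))
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Undersampling: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for cls in counts:
        X_out.extend(rng.sample(_members(X, y, cls), target))
        y_out.extend([cls] * target)
    return X_out, y_out

def hybrid_resample(X, y, seed=0):
    """Hybrid (illustrative rule, an assumption): undersample classes above the mean
    class size and oversample classes below it, so every class ends at the mean size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = round(len(y) / len(counts))
    X_out, y_out = [], []
    for cls, n in counts.items():
        members = _members(X, y, cls)
        if n >= target:
            chosen = rng.sample(members, target)
        else:
            chosen = members + [rng.choice(members) for _ in range(target - n)]
        X_out.extend(chosen)
        y_out.extend([cls] * target)
    return X_out, y_out

# Toy binary dataset: 90 majority samples (label 0), 10 minority samples (label 1).
X = list(range(100))
y = [0] * 90 + [1] * 10

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * len(y)
accuracy = sum(p == t for p, t in zip(y_pred, y)) / len(y)
minority_recall = sum(p == t == 1 for p, t in zip(y_pred, y)) / y.count(1)

print("imbalance ratio:", imbalance_ratio(y))
print("accuracy:", accuracy, "minority recall:", minority_recall)
for name, fn in [("over", random_oversample), ("under", random_undersample), ("hybrid", hybrid_resample)]:
    _, y_res = fn(X, y)
    print(name, dict(Counter(y_res)))
```

On this toy data the imbalance ratio is 9.0, and the majority-only predictor reaches 0.9 accuracy with a minority recall of 0.0, the failure mode the abstract describes. Oversampling yields 90 samples per class, undersampling 10, and the hybrid rule 50. Random oversampling only duplicates existing points, which can encourage overfitting, while random undersampling discards majority-class information; more elaborate techniques in each category try to soften these trade-offs.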
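The abstract reports both average scores and rankings. As a hedged illustration of how such a summary is commonly computed (the study's exact procedure is not stated here), this sketch ranks methods within each dataset, gives tied methods the average of their positions, and then averages both scores and ranks across datasets. The helpers `rank_scores` and `summarize`, the method names A, B, C, the dataset names, and all scores are invented toy values, not results from the study.

```python
from statistics import mean

def rank_scores(scores):
    """Rank methods on one dataset: rank 1 = highest score; tied methods share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of methods tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks

def summarize(results):
    """Map {dataset: {method: score}} to {method: (mean score, mean rank)} across datasets."""
    per_dataset = [rank_scores(scores) for scores in results.values()]
    methods = list(next(iter(results.values())))
    return {
        m: (mean(s[m] for s in results.values()), mean(r[m] for r in per_dataset))
        for m in methods
    }

# Toy scores for hypothetical methods A, B, C on two hypothetical datasets (not study results).
results = {
    "d1": {"A": 0.80, "B": 0.70, "C": 0.70},
    "d2": {"A": 0.60, "B": 0.90, "C": 0.50},
}
for method, (avg_score, avg_rank) in summarize(results).items():
    print(f"{method}: mean score {avg_score:.3f}, mean rank {avg_rank:.2f}")
```

In this toy example B has the highest mean score (0.800), while A has the best mean rank (1.50): a single large win on d2 lifts B's average, whereas ranks ignore the size of the margins. Looking at both views can therefore give a fuller picture than either one alone.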
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Imbalanced Learning; Class Imbalance; Oversampling; Undersampling; Resampling
