Imbalanced Learning: A comparative study of oversampling and undersampling techniques

Marques, Henrique Miguel Pires

http://hdl.handle.net/10362/175263

Use this identifier to reference this record.

Name:	Description:	Size:	Format:
TCDMAA3129.pdf		3.73 MB	Adobe PDF

Authors

Marques, Henrique Miguel Pires

Advisor(s)

Bação, Fernando José Ferreira Lucas

Abstract(s)

Imbalanced data distribution is a recurrent and challenging problem for classification models, as most algorithms are designed on the assumption of balanced data. This imbalance often results in poor predictive performance for the minority class, despite acceptable overall accuracy. A common and easily implemented approach to this issue is resampling, which can be categorized into oversampling, undersampling, and hybrid methods that combine both. However, the effectiveness of these techniques varies with dataset characteristics such as imbalance ratio, class overlap, and dimensionality. This study evaluates 10 resampling techniques across 35 benchmark datasets from various domains. To mitigate classifier bias, the evaluation employs 4 different classifiers. Unlike many studies that focus on a single resampling type, this research examines all three categories of resampling methods concurrently. Furthermore, the study offers a detailed analysis of average scores and rankings, facilitating a deeper understanding of each technique's relative performance. It also provides specific guidelines for selecting appropriate resampling methods based on the characteristics of each dataset. These findings aim to improve the application of resampling methods, helping practitioners make informed decisions to enhance classification performance in the presence of imbalanced data.
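The two basic resampling categories named in the abstract can be illustrated with a minimal sketch. It uses plain random oversampling and random undersampling as the simplest instance of each category; the abstract does not name the 10 techniques evaluated, so these are not necessarily among them. The toy data, the function names, and the majority-class baseline are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


# Toy data (made up for illustration): 90 majority vs 10 minority samples.
X = [[float(i)] for i in range(100)]
y = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class looks accurate
# overall while never detecting the minority class.
majority_pred = [0] * len(y)
accuracy = sum(p == t for p, t in zip(majority_pred, y)) / len(y)
minority_recall = sum(p == 1 and t == 1 for p, t in zip(majority_pred, y)) / y.count(1)

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print("imbalance ratio:", imbalance_ratio(y))
print("baseline accuracy:", accuracy, "minority recall:", minority_recall)
print("after oversampling:", Counter(y_over))
print("after undersampling:", Counter(y_under))
```

In this sketch, oversampling grows the dataset by repeating minority samples, while undersampling shrinks it by discarding majority samples. A hybrid method in the abstract's sense would apply some of each. In practice, resampling is normally applied to the training split only, so that evaluation data keeps its original class distribution.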

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

Keywords

Imbalanced Learning; Class Imbalance; Oversampling; Undersampling; Resampling

URI

http://hdl.handle.net/10362/175263

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics

CC License

CC BY (Attribution)