Publication

G-SOMO : an oversampling approach based on self-organized map oversampling and geometric SMOTE

File: TAA0033.pdf (2.06 MB, Adobe PDF)

Abstract

Traditional supervised machine learning classifiers struggle to learn from highly skewed data distributions because they are designed to expect all classes to contribute equally to the minimization of the classifier's cost function. Moreover, the classifier design assumes equal misclassification costs, which biases the classifier against underrepresented classes. Researchers have therefore proposed different strategies to handle the issue. Modifying the data set has become an established approach because the procedure generalizes to all classifiers. Various algorithms that rebalance the data distribution by creating synthetic instances have been proposed in the past. In this paper, we propose a new oversampling algorithm named G-SOMO, a method inspired by our previous research. The algorithm identifies optimal areas in which to create artificial data instances in an informed manner and uses a geometric region during data generation to increase variability and avoid correlation. Our experimental setup compares the performance of G-SOMO with a benchmark of effective oversampling methods. The oversampling methods are repeatedly validated with multiple classifiers on 69 datasets. Different metrics are used to compare the results. To aggregate the performances over all datasets, a mean ranking is introduced. G-SOMO consistently outperforms competing oversampling methods. The results are shown to be statistically significant.
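
The mean ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the method names, dataset names, and scores below are hypothetical placeholders, and the tie-handling rule (tied methods share the average of their ranks) is an assumption. Each method is ranked per dataset, with rank 1 for the highest score, and the ranks are then averaged over all datasets.

```python
from collections import defaultdict


def rank_scores(scores):
    """Rank methods on one dataset: rank 1 = highest score.

    Tied methods share the average of the ranks they span
    (an assumed convention, not taken from the thesis).
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every method tied with position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def mean_ranking(results):
    """Average each method's per-dataset rank over all datasets.

    Assumes every method has a score on every dataset.
    """
    totals = defaultdict(float)
    for scores in results.values():
        for method, rank in rank_scores(scores).items():
            totals[method] += rank
    return {method: total / len(results) for method, total in totals.items()}


# Hypothetical scores (higher is better); not results from the thesis.
results = {
    "dataset_1": {"method_a": 0.91, "method_b": 0.88, "method_c": 0.80},
    "dataset_2": {"method_a": 0.75, "method_b": 0.75, "method_c": 0.70},
    "dataset_3": {"method_a": 0.60, "method_b": 0.66, "method_c": 0.58},
}
mean_ranks = mean_ranking(results)
```

A lower mean rank indicates a method that tends to score better across datasets, which makes methods comparable even when the raw metric values differ widely from one dataset to the next.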

Description

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics

Keywords

Oversampling; Imbalanced Learning; Clustering; Synthetic Data Generation
