
Using Self-Organizing Maps to Triage Software Bug Reports: Studying the Effect of Using Different Text Vectorization Methods

Name: TCDMAA2126.pdf
Size: 7.21 MB
Format: Adobe PDF

Abstract

Bug Report Triage is the task of assigning each bug to a software developer whose experience makes them a good match to fix it. Given the number of bugs reported to software projects, several methods have been proposed to automate this task. Approaches based on Information Retrieval make use of the textual components of the Bug Reports. However, most previous studies have been concerned only with evaluating the performance of the models being used, and very few have also considered how best to represent the textual features. Self-Organizing Maps (SOMs) have long been used in Information Retrieval, but very few studies have considered them for Bug Report Triage. This thesis aims to bridge these gaps by using Self-Organizing Maps to triage Bug Reports and by evaluating the effect of using different text vectorization methods to represent the text. Using ten Bug Report corpora and applying six different preprocessing treatments to each, we tested five vectorization methods: TF-IDF, Word2Vec, Doc2Vec, BERT, and SBERT. We found a statistically significant difference in the performance of the five vectorization methods, with SBERT obtaining better Accuracy@1 scores than the rest. We also tested other initialization parameters for training the SOM and found that the results were largely consistent across the different conditions. In addition, we developed SOMBR, an interactive dashboard based on the SOM, which allows users to query the Bug Report corpus to find the query's Best Matching Unit and then inspect the developers who have fixed semantically similar bugs. Based on these experiments, we believe that the SOM is a promising tool for semi-automated Bug Report triage, although further studies are needed to determine how the evaluation scores can be improved.
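
As a rough illustration of the pipeline the abstract describes (vectorize the reports, train a SOM, find a query's Best Matching Unit, and list the developers mapped to that unit), the following is a minimal pure-Python sketch. It is not the thesis code or the SOMBR implementation. It uses a smoothed TF-IDF weighting only because it needs no external model, whereas the abstract reports SBERT as the best performer. The toy reports, the developer names, all function names, the 3x3 grid, and the learning-rate and neighborhood schedules are assumptions made for the example, and accuracy_at_1 reflects one plausible reading of Accuracy@1 (the top-ranked suggestion equals the actual fixer).

```python
import math
import random
from collections import Counter, defaultdict

def embed(tokens, vocab, idf):
    """L2-normalised TF-IDF vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    vec = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def tfidf_vectors(docs):
    """Build vocabulary and smoothed IDF weights, then embed every document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    return vocab, idf, [embed(doc, vocab, idf) for doc in docs]

def bmu(weights, vec):
    """Grid coordinate of the unit whose weight vector is closest to vec."""
    return min(weights, key=lambda rc: sum((w - x) ** 2
                                           for w, x in zip(weights[rc], vec)))

def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Online SOM training with linearly decaying rate and Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    steps, t = epochs * len(vectors), 0
    for _ in range(epochs):
        for vec in rng.sample(vectors, len(vectors)):  # shuffled pass
            frac = t / steps
            lr = lr0 * (1 - frac)
            sigma = sigma0 * (1 - frac) + 0.01
            br, bc = bmu(weights, vec)
            for (r, c), w in weights.items():
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])  # pull unit toward the input
            t += 1
    return weights

def developers_by_unit(weights, vectors, developers):
    """Count, per SOM unit, the developers whose fixed reports map to it."""
    units = defaultdict(Counter)
    for vec, dev in zip(vectors, developers):
        units[bmu(weights, vec)][dev] += 1
    return units

def suggest(weights, units, query_vec):
    """Developers mapped to the query's BMU, most frequent first."""
    return [dev for dev, _ in units[bmu(weights, query_vec)].most_common()]

def accuracy_at_1(weights, units, vectors, developers):
    """Share of reports whose top suggestion is the actual fixer."""
    hits = sum(1 for v, d in zip(vectors, developers)
               if suggest(weights, units, v)[:1] == [d])
    return hits / len(vectors)

# Hypothetical toy corpus: (report text, developer who fixed it).
reports = [
    ("crash when opening settings dialog", "alice"),
    ("settings dialog crash on startup", "alice"),
    ("memory leak in network layer", "bob"),
    ("network timeout causes memory leak", "bob"),
]
docs = [text.split() for text, _ in reports]
devs = [dev for _, dev in reports]
vocab, idf, vectors = tfidf_vectors(docs)
som = train_som(vectors)
units = developers_by_unit(som, vectors, devs)
query = embed("crash in settings dialog".split(), vocab, idf)
suggestions = suggest(som, units, query)
print(suggestions, accuracy_at_1(som, units, vectors, devs))
```

A query whose BMU has no training reports mapped to it yields an empty list here. A fuller version might fall back to neighboring units, but how the thesis handles that case is not stated in the abstract.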

Description

Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, with a specialization in Business Analytics

Keywords

Self-Organizing Maps; Information Retrieval; Bug Report Triage; NLP; Text Mining; Visualization
