| Name | Description | Size | Format |
|---|---|---|---|
| | | 7.21 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
Bug Report Triage involves the assignment of software developers to fix bugs for which their experience
is a good match. Given the number of bugs reported to software projects, several methods have
been proposed to address this task in an automated way. Approaches based on Information Retrieval
make use of the textual components of Bug Reports; however, most previous studies have been
concerned only with evaluating the performance of the models being used, with very few also considering
the question of how best to represent the textual features. Self-Organizing Maps have long been used in
Information Retrieval; however, very few studies have considered them for use in the field of Bug Report
Triage. This thesis aims to bridge these gaps by using Self-Organizing Maps for triaging Bug Reports,
and evaluating the effect of using different text vectorization methods to represent the text. Using
ten Bug Report corpora and applying six different preprocessing treatments to each, we tested five
different vectorization methods: TF-IDF, Word2Vec, Doc2Vec, BERT, and SBERT. We found that there was
a statistically significant difference in the performance of the five vectorization methods, with SBERT
obtaining better Accuracy@1 scores than the rest. We also tested other initialization parameters for
training the SOM and found that the results were largely consistent across the different conditions. In
addition, we developed SOMBR, an interactive dashboard based on the SOM, which allows users to
query the Bug Report corpus to find the query’s Best Matching Unit, enabling the user to inspect the
developers that have fixed bugs that are semantically similar. Based on these experiments, we believe
that the SOM is a promising tool for semi-automated Bug Report triage; however, further studies need to
be conducted to determine how the evaluation scores can be improved further.
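
The Best Matching Unit lookup and the Accuracy@1 score mentioned above can be illustrated with a minimal sketch. It is not the SOMBR implementation: it assumes an already trained SOM stored as a plain dictionary of unit weight vectors, uses hand-made toy vectors in place of real text embeddings, and ranks developers by how many past reports they fixed on the query's unit, which is only one plausible ranking rule. All function names, developer names, and data are hypothetical.

```python
from collections import Counter
from math import dist


def best_matching_unit(weights, vector):
    """Return the grid coordinate of the unit whose weights are closest to `vector`."""
    return min(weights, key=lambda unit: dist(weights[unit], vector))


def index_developers(weights, report_vectors, fixers):
    """Map each SOM unit to a Counter of developers who fixed reports landing on it."""
    by_unit = {}
    for vector, developer in zip(report_vectors, fixers):
        unit = best_matching_unit(weights, vector)
        by_unit.setdefault(unit, Counter())[developer] += 1
    return by_unit


def recommend(weights, by_unit, query_vector, k=1):
    """Return up to k developers most frequently seen on the query's BMU."""
    unit = best_matching_unit(weights, query_vector)
    return [dev for dev, _ in by_unit.get(unit, Counter()).most_common(k)]


def accuracy_at_k(weights, by_unit, query_vectors, true_fixers, k=1):
    """Fraction of queries whose true fixer is among the top-k recommendations."""
    hits = sum(
        truth in recommend(weights, by_unit, vector, k)
        for vector, truth in zip(query_vectors, true_fixers)
    )
    return hits / len(true_fixers)


# Toy 1x2 map with 2-dimensional weights (assumed to be already trained).
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}

# Hypothetical vectorized bug reports and the developers who fixed them.
train_vectors = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]]
train_fixers = ["alice", "alice", "bob"]

by_unit = index_developers(weights, train_vectors, train_fixers)
```

Calling `recommend(weights, by_unit, query_vector)` with a new report's vector returns the developers associated with its Best Matching Unit, and `accuracy_at_k` with `k=1` scores a held-out set under the assumed ranking rule.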
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
Keywords
Self-Organizing Maps; Information Retrieval; Bug Report Triage; NLP; Text Mining; Visualization
