Publication

Network TD-SOM, using self-organizing maps and network analysis to make sense of large collections of documents: the case of NOVA IMS Master's theses

dc.contributor.advisor: Bação, Fernando José Ferreira Lucas
dc.contributor.author: Munhangane, Venâncio Tobias Antonio
dc.date.accessioned: 2023-03-13T13:55:59Z
dc.date.available: 2024-01-24T01:31:48Z
dc.date.issued: 2023-01-24
dc.description: Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science [pt_PT]
dc.description.abstract: Digital libraries are a central technology for the dissemination and sharing of knowledge, and vast quantities of documents are stored and accessed through them. However, the efficiency of the associated search systems and their ability to identify relevant documents continue to be a bottleneck and are not keeping pace with the ever-increasing volume of stored data. In this thesis, we present Network TD-SOM, a systematic process that offers a practical method for organizing, searching, visualising, discovering, and extracting knowledge from a vast corpus. Network TD-SOM combines topic modelling with Self-Organizing Maps and Network Analysis algorithms to provide a visually rich environment where the user can explore and interact with a corpus and find relevant documents. We test two different topic modelling algorithms separately and use their topic vectors to produce a Self-Organizing Map, which in turn is simplified through the use of a hierarchical clustering algorithm. We apply Network Analysis to the documents using the three best topics of each document and visualise the relations between the different documents. Finally, the Network TD-SOM methodology is evaluated on the master's thesis dataset from NOVA IMS. LDA and BERTopic successfully uncovered the thematic structure and extracted helpful knowledge from the dataset. In this context, BERTopic achieves better results and provides a more meaningful clustering solution. In the network analysis, by contrast, although the two thesis networks were arranged similarly, the one modelled using features/topics from LDA presents better results. [pt_PT]
dc.identifier.tid: 203218990 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10362/150427
dc.language.iso: eng [pt_PT]
dc.subject: Corpus [pt_PT]
dc.subject: Visualisation [pt_PT]
dc.subject: Topic modelling [pt_PT]
dc.subject: Clustering [pt_PT]
dc.subject: Network analysis [pt_PT]
dc.subject: SDG 4 - Quality education [pt_PT]
dc.title: Network TD-SOM, using self-organizing maps and network analysis to make sense of large collections of documents: the case of NOVA IMS Master's theses [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.embargofct: "(…) to have the possibility of preparing and publishing an article in a scientific journal based on the master's dissertation." [pt_PT]
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]
thesis.degree.name: Master's in Data Science and Advanced Analytics, specialization in Data Science [pt_PT]
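
Illustrative note on the abstract: the network step it describes (relating documents through the three best topics of each) could be sketched roughly as follows. This is a minimal sketch under assumptions, not the thesis's implementation: the abstract does not say how edges are defined, so the rule used here (link two documents when their top-three topic sets overlap, weighted by the size of the overlap) is an assumption, as are the function names and the made-up topic weights.

```python
from itertools import combinations


def top_topics(weights, k=3):
    """Return the indices of the k largest topic weights, best first."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


def build_network(doc_topics, k=3):
    """Link documents that share at least one of their k best topics.

    doc_topics maps a document id to its topic-weight vector.
    Returns {(doc_a, doc_b): number_of_shared_top_topics}.
    The overlap-based edge rule is an assumption made for illustration.
    """
    best = {doc: set(top_topics(w, k)) for doc, w in doc_topics.items()}
    edges = {}
    for a, b in combinations(sorted(best), 2):
        shared = len(best[a] & best[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Made-up topic weights for three hypothetical theses over five topics.
doc_topics = {
    "thesis_a": [0.50, 0.30, 0.10, 0.05, 0.05],
    "thesis_b": [0.40, 0.05, 0.35, 0.15, 0.05],
    "thesis_c": [0.02, 0.03, 0.05, 0.40, 0.50],
}
edges = build_network(doc_topics)
```

In practice the topic-weight vectors would come from a topic model such as LDA or BERTopic, and the resulting edge list could be handed to a graph library for layout and visualisation.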

Files

Main
Name: TCDMAA1363.pdf
Size: 5.66 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 348 B
Format: Item-specific license agreed upon to submission