Publication
Network TD-SOM, using self-organizing maps and network analysis to make sense of large collections of documents: the case of NOVA IMS Master's theses
| dc.contributor.advisor | Bação, Fernando José Ferreira Lucas | |
| dc.contributor.author | Munhangane, Venâncio Tobias Antonio | |
| dc.date.accessioned | 2023-03-13T13:55:59Z | |
| dc.date.available | 2024-01-24T01:31:48Z | |
| dc.date.issued | 2023-01-24 | |
| dc.description | Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science | pt_PT |
| dc.description.abstract | Digital libraries are a central technology for the dissemination and sharing of knowledge; large quantities of documents are stored and accessed through them. However, the efficiency of the associated search systems and their ability to identify relevant documents continue to be a bottleneck and are not keeping pace with the ever-increasing volume of stored data. In this thesis, we present Network TD-SOM, a systematic process that offers a practical method for organizing, searching, visualising, discovering, and extracting knowledge from a vast corpus. Network TD-SOM combines topic modelling with Self-Organizing Maps and Network Analysis algorithms to provide a visually rich environment where the user can explore and interact with a corpus and find relevant documents. We test two different topic modelling algorithms separately and use their topic vectors to produce a Self-Organizing Map, which in turn is simplified through the use of a hierarchical clustering algorithm. We apply Network Analysis to the documents using the three best topics of each document and visualise the relations between the different documents. Finally, the Network TD-SOM methodology is evaluated on the master's thesis dataset from NOVA IMS. LDA and BERTopic successfully uncovered the thematic structure and extracted helpful knowledge from the dataset. In this context, BERTopic achieves better results and provides a more meaningful clustering solution. In contrast, when it comes to the network analysis, and although the arrangements of the two thesis networks had similarities, the one modelled using features/topics from LDA presents better results. | pt_PT |
| dc.identifier.tid | 203218990 | pt_PT |
| dc.identifier.uri | http://hdl.handle.net/10362/150427 | |
| dc.language.iso | eng | pt_PT |
| dc.subject | Corpus | pt_PT |
| dc.subject | Visualisation | pt_PT |
| dc.subject | Topic modelling | pt_PT |
| dc.subject | Clustering | pt_PT |
| dc.subject | Network analysis | pt_PT |
| dc.subject | SDG 4 - Quality education | pt_PT |
| dc.title | Network TD-SOM, using self-organizing maps and network analysis to make sense of large collections of documents: the case of NOVA IMS Master's theses | pt_PT |
| dc.type | master thesis | |
| dspace.entity.type | Publication | |
| rcaap.embargofct | "(…) to have the possibility of preparing and publishing an article in a scientific journal based on the master's dissertation." | pt_PT |
| rcaap.rights | openAccess | pt_PT |
| rcaap.type | masterThesis | pt_PT |
| thesis.degree.name | Master's degree in Data Science and Advanced Analytics, specialization in Data Science | pt_PT |
