| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.72 MB | Adobe PDF |
Abstract
The automatic identification of characters and their interactions in literary fiction is arguably a complex task that requires pipelines combining multiple Natural Language Processing (NLP) methods, such as Named Entity Recognition (NER) and Part-of-Speech (POS) tagging. However, these methods are not optimized for retrieving social networks of characters. Indeed, currently available methods tend to underperform, especially in less-represented languages, owing to a lack of manually annotated training data. Here, we propose a pipeline, which we call Taggus, that extracts social networks from works of literary fiction in Portuguese without requiring a training phase. Our results show that, compared with readily available state-of-the-art tools (off-the-shelf NER tools and Large Language Models (ChatGPT)), the resulting pipeline, which combines POS tagging with a set of heuristics, achieves satisfactory results, with an average F1-score of 92.1% in identifying characters and resolving co-references and 74.0% in detecting interactions. These represent increases of 115.7% and 38.1%, respectively, over the results achieved by the readily available state-of-the-art tools. Further steps to improve the results are outlined, including methods for detecting relationships among characters. We acknowledge limitations in the size and scope of our test samples and in the exclusive focus on interactions as the relationship type. The Taggus pipeline is publicly available to encourage development in this field for the Portuguese language.
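The abstract names the building blocks of the pipeline (POS tagging, heuristics for character identification and co-reference, interaction detection, and F1-based evaluation) without giving their details. The following is a minimal Python sketch of how such a training-free pipeline could fit together; it is not the Taggus implementation. It assumes sentences arrive already POS-tagged with Universal Dependencies tags, and the alias-merging rule, the sentence-level co-occurrence rule for interactions, and all function names are hypothetical illustrations.

```python
from collections import Counter
from itertools import combinations

# A sentence is a list of (token, POS tag) pairs. Tags are assumed to follow the
# Universal Dependencies tagset (PROPN = proper noun); the Portuguese tagger that
# produces them is not shown here.
Sentence = list[tuple[str, str]]


def extract_mentions(sentence: Sentence) -> list[str]:
    """Join runs of consecutive PROPN tokens into name mentions."""
    mentions: list[str] = []
    current: list[str] = []
    for token, tag in sentence:
        if tag == "PROPN":
            current.append(token)
        elif current:
            mentions.append(" ".join(current))
            current = []
    if current:  # sentence ended inside a name
        mentions.append(" ".join(current))
    return mentions


def resolve_aliases(counts: Counter) -> dict[str, str]:
    """Hypothetical co-reference heuristic: map each name to the longest other
    name whose tokens strictly contain its own (e.g. 'Carlos' -> 'Carlos Eduardo').
    Ties go to the more frequent name; ambiguous short names are not handled."""
    canonical: dict[str, str] = {}
    for name in counts:
        tokens = set(name.split())
        longer = [o for o in counts if tokens < set(o.split())]
        canonical[name] = (
            max(longer, key=lambda o: (len(o.split()), counts[o], o)) if longer else name
        )
    return canonical


def build_network(sentences: list[Sentence]) -> Counter:
    """Assumed interaction rule: count one interaction for each pair of distinct
    (alias-resolved) characters mentioned in the same sentence."""
    per_sentence = [extract_mentions(s) for s in sentences]
    canonical = resolve_aliases(Counter(m for ms in per_sentence for m in ms))
    edges: Counter = Counter()
    for mentions in per_sentence:
        characters = sorted({canonical[m] for m in mentions})
        for pair in combinations(characters, 2):
            edges[pair] += 1
    return edges


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall over extracted items."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical, hand-tagged toy input (not data from the paper).
    toy = [
        [("Carlos", "PROPN"), ("Eduardo", "PROPN"), ("visitou", "VERB"), ("Ega", "PROPN"), (".", "PUNCT")],
        [("Carlos", "PROPN"), ("falou", "VERB"), ("com", "ADP"), ("Maria", "PROPN"), (".", "PUNCT")],
        [("Ega", "PROPN"), ("riu", "VERB"), (".", "PUNCT")],
    ]
    for (a, b), weight in build_network(toy).items():
        print(f"{a} -- {b}: {weight}")
```

Run as a script, the toy example prints `Carlos Eduardo -- Ega: 1` and `Carlos Eduardo -- Maria: 1`: the bare "Carlos" in the second sentence is merged into "Carlos Eduardo" before edges are counted. Sentence-level co-occurrence is the simplest possible interaction proxy; it also links characters who are merely mentioned together, so a practical pipeline would need stricter heuristics.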
Description
Canário, T. G. G., Duarte, C. R., Pinheiro, F. L., & Pereira, J. L. M. (2026). Taggus: An automated pipeline for the extraction of characters' social networks from Portuguese fiction literature. Social Network Analysis and Mining, 16, Article 40. https://doi.org/10.1007/s13278-026-01584-6

FLP acknowledges the financial support provided by FCT Portugal (Fundação para a Ciência e a Tecnologia) under the projects UID/04152/2025, Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS (https://doi.org/10.54499/UID/04152/2025, 2025-01-01/2028-12-31), UID/PRR/04152/2025 (https://doi.org/10.54499/UID/PRR/04152/2025, 2025-01-01/2026-06-30), and Know-Net-Compet (https://doi.org/10.54499/2024.07378.IACD). JLMP acknowledges the financial support from FCT Portugal within the R&D Unit Project Scope UID/00319/2025, Centro ALGORITMI (ALGORITMI/UM).
Keywords
Social Network Analysis; Entity Recognition; Entity Extraction; Co-reference Resolution; Relationship Extraction; Information Systems; Communication; Media Technology; Human-Computer Interaction; Computer Science Applications
