Research project
COALA - Cloud-based AI-driven and Language-agnostic Customer Support Assistant
Funder
Authors
Publications
Document Clustering as an approach to template extraction
Publication. Rodrigues, André Miguel Fernandes; Almeida, Mariana Sá Correia Leite de; Rei, Ricardo Costa Dias
A large part of customer support is done through the exchange of emails. As the number of emails exchanged daily keeps increasing, companies need approaches that keep this process efficient. One common strategy is to answer with template emails. These answer templates are usually identified by a human agent who notices that the same answer is being used repeatedly. In this work, we use a clustering approach to find these answer templates. We study several clustering algorithms, with a focus on the k-means methodology, as well as other clustering components such as similarity measures and pre-processing steps. Because we are dealing with text data, we also compare several text representation methods. Given the peculiarities of the provided data, we design methodologies to ensure the feasibility of this task and develop strategies to extract the answer templates from the clustering results.
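The idea in the abstract above, clustering past replies and reading a template off each cluster, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the publication: the bag-of-words representation, cosine similarity, the deterministic farthest-first initialisation, and the choice of the reply closest to each centroid as the template. The function names (`vectorize`, `kmeans`, `extract_templates`) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words counts, L2-normalised so a dot product is a cosine similarity."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse, already normalised vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b.get(word, 0.0) for word, value in a.items())


def mean_vector(vectors):
    """Re-normalised sum of the member vectors (the cluster centroid)."""
    total = Counter()
    for vec in vectors:
        for word, value in vec.items():
            total[word] += value
    norm = math.sqrt(sum(x * x for x in total.values()))
    if norm == 0:
        return {}
    return {word: x / norm for word, x in total.items()}


def kmeans(vectors, k, iterations=20):
    """k-means with cosine similarity and a deterministic initialisation."""
    # Assumption: farthest-first seeding, so the sketch needs no random state.
    centroids = [vectors[0]]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(farthest)
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector goes to its most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels, centroids


def extract_templates(replies, k):
    """Cluster the replies and return the most central reply of each cluster."""
    vectors = [vectorize(r) for r in replies]
    labels, centroids = kmeans(vectors, k)
    templates = []
    for j in range(k):
        members = [i for i, label in enumerate(labels) if label == j]
        if members:
            best = max(members, key=lambda i: cosine(vectors[i], centroids[j]))
            templates.append(replies[best])
    return labels, templates
```

Calling `extract_templates(replies, k)` on a list of reply strings returns one cluster label per reply and one representative reply per non-empty cluster. A real system would likely swap in a stronger text representation and choose `k` from the data; both are left open here.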
Multilingual email zoning
Publication. Jardim, Bruno; Rei, Ricardo; Almeida, Mariana S.C.; NOVA Information Management School (NOVA IMS)
The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous work on email zoning corpora and systems was developed essentially for English. In this paper, we analyse the existing email zoning corpora and propose a new multilingual benchmark composed of 625 emails in Portuguese, Spanish and French. Moreover, we introduce OKAPI, the first multilingual email segmentation model based on a language-agnostic sentence encoder. Besides generalizing well to unseen languages, our model is competitive with current English benchmarks and reaches new state-of-the-art performance on domain adaptation tasks in English.
Organisational units
Description
Keywords
Contributors
Funders
Funding entity
European Commission
Funding programme
H2020
Grant number
873904
