| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.71 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
As an insurance company, Ageas Portugal holds a large amount of data about its customers. Most of the
data that companies use (apart from the few that already apply advanced machine learning and
artificial intelligence techniques) is structured data, that is, formatted datasets and tables of
customer information. With advances in technology, however, more companies are starting to use
their unstructured data, which can help them find insights and achieve their goals.
Among the company's sources of natural-language data, such as emails, customer surveys and
medical transcriptions, we agreed that an email database would be the best option for the
project. This type of data requires very thorough preparation, because emails contain
irrelevant parts, such as signatures and disclaimers, that should be excluded.
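As an illustration of the data preparation described above, the following is a minimal Python sketch of removing signatures and disclaimers from an email body. The marker patterns and the `clean_email_body` function are hypothetical, chosen for this example only; they are not the rules used in the project and would have to be tuned to the company's own mailboxes.

```python
import re

# Hypothetical markers that often open a signature block (assumption, not from the report).
SIGNATURE_MARKERS = re.compile(
    r"^(--\s*$|best regards\b|kind regards\b|regards\b|cumprimentos\b|sent from my\b)",
    re.IGNORECASE,
)

# Hypothetical phrases that often open a legal disclaimer (assumption, not from the report).
DISCLAIMER_MARKERS = re.compile(
    r"(this (e-?mail|message) (and any attachments )?(is|are) confidential"
    r"|confidentiality notice)",
    re.IGNORECASE,
)


def clean_email_body(body: str) -> str:
    """Keep only the lines that precede the first signature or disclaimer marker."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if SIGNATURE_MARKERS.match(stripped) or DISCLAIMER_MARKERS.search(stripped):
            break  # everything from here on is treated as irrelevant
        kept.append(line)
    return "\n".join(kept).strip()
```

A heuristic of this kind cuts the message at the first marker, so a sentence that happens to start with a word such as "Regards" would also truncate the text; a real pipeline would need more careful rules.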
By analyzing customers' interactions with the company, we could find insights into how to increase
sales and reduce the churn rate. We applied two Text Mining techniques (Sentiment Analysis and Topic
Classification) and conducted a proof of concept. It showed that clients who send or are
mentioned in emails tend to cancel their policies at a higher rate than those without emails, even
when the email's topic is not related to cancellation. It also showed that the effect of sentiment
on cancellation behavior appears to be mixed and requires further analysis.
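The comparison of cancellation rates between clients with and without emails can be sketched as follows. The `Client` record and its fields are assumptions made for illustration; the report's actual data model and figures are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Client:
    # Hypothetical record layout (assumption, not the project's schema).
    client_id: str
    has_email: bool   # client sent or was mentioned in at least one email
    cancelled: bool   # client cancelled a policy in the observation period


def cancellation_rate(clients: Iterable[Client]) -> Optional[float]:
    """Share of clients who cancelled, or None for an empty group."""
    group = list(clients)
    if not group:
        return None
    return sum(c.cancelled for c in group) / len(group)


def rates_by_email_presence(clients: Iterable[Client]) -> Dict[str, Optional[float]]:
    """Cancellation rate for clients with emails versus clients without."""
    group = list(clients)
    return {
        "with_email": cancellation_rate(c for c in group if c.has_email),
        "without_email": cancellation_rate(c for c in group if not c.has_email),
    }
```

A difference between the two rates is only descriptive; whether it is meaningful would still need a statistical test and a check for confounding factors.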
The full project was developed in Python, but it was also compared with other market solutions,
such as Amazon Web Services, SAS, Google Cloud and Microsoft Azure, in order to find the Text Mining
tool that best fits the company. As expected, Python was chosen as the best option.
Description
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Keywords
Text mining; Text analytics; Natural language processing; Sentiment analysis; Topic classification
