Predictive model for detecting fake reviews: Exploring the possible enhancements of using word embeddings

Macean, Doris

http://hdl.handle.net/10362/152177

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
TCDMAA2936.pdf		1.48 MB	Adobe PDF


Authors

Macean, Doris

Advisor(s)

Bação, Fernando José Ferreira Lucas

Abstract(s)

Fake data contaminates the insights that can be obtained about a product or service and ultimately hurts both businesses and consumers. Correctly identifying truthful reviews helps consumers find products that suit their needs more effectively. This dissertation aims to develop a predictive model for detecting fake hotel reviews using Natural Language Processing techniques and a range of Machine Learning models. Existing research in this area has focused primarily on sentiment analysis and on detecting fake reviews with text mining methods such as bag of words, tokenization, POS tagging and TF-IDF. That research mostly considers some combination of quantitative and qualitative information: the text component is analyzed only with regard to which words appear in the review, while the semantic relationships between words are ignored. This research attempts to achieve higher performance by applying pretrained word embeddings during the preprocessing of the text data. The goal is to introduce some context to the text data and observe how each model’s performance changes. Traditional text mining models were first applied to the dataset to provide a benchmark. Subsequently, GloVe, Word2Vec and BERT word embeddings were implemented and the performance of eight models was reviewed. The analysis shows that the word embeddings yield somewhat lower performance than the benchmark. It seems that, in short texts, the appearance of particular words is more indicative of a fake review than the semantic meaning of those words.
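
The contrast the abstract draws between word-appearance features and embedding features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three reviews are invented, the two-dimensional `toy_embeddings` stand in for real pretrained vectors such as GloVe or Word2Vec, and averaging word vectors is only one common way to turn embeddings into a review-level feature. The record does not state which pooling method, libraries or models the dissertation used.

```python
import math
from collections import Counter

# Invented example reviews (NOT taken from the dissertation's dataset).
reviews = [
    "the room was clean and the staff was friendly",
    "amazing amazing hotel best stay ever",
    "the staff was rude and the room was dirty",
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors): one weight per vocabulary word."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many reviews each word appears.
    df = Counter(w for doc in tokenized for w in set(doc))
    # A smoothed IDF variant, so that no weight is ever zero or undefined.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[w] / len(doc) * idf[w] for w in vocab])
    return vocab, vectors

# Hypothetical 2-dimensional "pretrained" vectors. Real embeddings have
# tens to hundreds of dimensions and are loaded from a trained model.
toy_embeddings = {
    "clean": [0.9, 0.1], "friendly": [0.8, 0.2],
    "amazing": [1.0, 0.0], "best": [0.9, 0.0],
    "rude": [-0.8, 0.1], "dirty": [-0.9, 0.2],
}

def mean_embedding(text, embeddings, dim=2):
    """Average the vectors of known words; zeros if no word is known."""
    vecs = [embeddings[w] for w in tokenize(text) if w in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Sparse view: which words appear, and how distinctive they are.
vocab, tfidf = tfidf_vectors(reviews)
# Dense view: where the review sits in the (toy) semantic space.
dense = [mean_embedding(r, toy_embeddings) for r in reviews]
```

In the TF-IDF view each review is described by exactly which words it contains, so a repeated word such as "amazing" gets its own strongly weighted feature. In the averaged-embedding view that word is blended with its neighbours, which may be one reason word-appearance features can remain competitive on short texts, as the abstract reports.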

Description

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

Keywords

Natural Language Processing; Machine Learning; Text Mining; Sentiment Analysis; Word Embeddings

URI

http://hdl.handle.net/10362/152177

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics (Dissertações de Mestrado em Ciência de Dados e Métodos Analíticos Avançados)

CC License

CC BY
