Predictive model for detecting fake reviews: Exploring the possible enhancements of using word embeddings

dc.contributor.advisor: Bação, Fernando José Ferreira Lucas
dc.contributor.author: Macean, Doris
dc.date.accessioned: 2023-04-27T14:38:30Z
dc.date.available: 2023-04-27T14:38:30Z
dc.date.issued: 2023-04-11
dc.description: Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science [pt_PT]
dc.description.abstract: Fake data contaminates the insights that can be obtained about a product or service and ultimately hurts both businesses and consumers. Being able to correctly identify truthful reviews helps consumers find products that suit their needs more effectively. This paper aims to develop a predictive model for detecting fake hotel reviews using Natural Language Processing techniques and various Machine Learning models. Current research in this area has primarily focused on sentiment analysis and the detection of fake reviews using text mining methods including bag of words, tokenization, POS tagging and TF-IDF. That research mostly looks at some combination of quantitative and qualitative information. The text component is analyzed only with regard to which words appear in the review, while the semantic relationships between them are ignored. This research attempts to achieve a higher level of performance by implementing pretrained word embeddings during the preprocessing of the text data. The goal is to introduce some context to the text data and see how each model’s performance changes. Traditional text mining models were applied to the dataset to provide a benchmark. Subsequently, GloVe, Word2Vec and BERT word embeddings were implemented and the performance of 8 models was reviewed. The analysis shows somewhat lower performance with the word embeddings. It seems that in short texts, which words appear is more indicative of a fake review than the semantic meaning of those words. [pt_PT]
dc.identifier.tid: 203268580 [pt_PT]
dc.identifier.uri: http://hdl.handle.net/10362/152177
dc.language.iso: eng [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Natural Language Processing [pt_PT]
dc.subject: Machine Learning [pt_PT]
dc.subject: Text Mining [pt_PT]
dc.subject: Sentiment Analysis [pt_PT]
dc.subject: Word Embeddings [pt_PT]
dc.title: Predictive model for detecting fake reviews: Exploring the possible enhancements of using word embeddings [pt_PT]
dc.type: master thesis
dspace.entity.type: Publication
rcaap.rights: openAccess [pt_PT]
rcaap.type: masterThesis [pt_PT]
thesis.degree.name: Mestrado em Ciência de Dados e Métodos Analíticos Avançados, especialização em Ciência de Dados [pt_PT]
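
The abstract contrasts word-occurrence features (bag of words, TF-IDF) with pretrained word embeddings. The following is a minimal sketch of that contrast, not the dissertation's code: the two reviews, the two-dimensional `toy_embeddings` table, and the helper names `tfidf_vectors` and `mean_embedding` are all hypothetical, standing in for a real dataset and for pretrained GloVe, Word2Vec or BERT vectors. Averaging token vectors is only one assumed way of turning embeddings into a review-level feature.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF * IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def mean_embedding(doc, embeddings, dim):
    """Average the vectors of the tokens that have an embedding."""
    known = [embeddings[tok] for tok in doc if tok in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy data: two reviews with similar meaning but different words.
reviews = [
    "the room was clean and quiet".split(),
    "the room was spotless and silent".split(),
]

# Hypothetical 2-d vectors; real pretrained embeddings have hundreds of dimensions.
toy_embeddings = {
    "clean": [1.0, 0.0], "spotless": [0.9, 0.1],
    "quiet": [0.0, 1.0], "silent": [0.1, 0.9],
}

tfidf = tfidf_vectors(reviews)
dense = [mean_embedding(r, toy_embeddings, dim=2) for r in reviews]
```

In this toy setup the TF-IDF vectors of the two reviews share no non-zero term, because the only distinguishing words differ, while their averaged embeddings nearly coincide. That is the trade-off the abstract describes: occurrence features keep the exact wording, embeddings keep the meaning.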

Files

Main
Name: TCDMAA2936.pdf
Size: 1.48 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 348 B
Format: Item-specific license agreed upon to submission