Development of a Smartphone Application and Chrome Extension to Detect Fake News in English and European Portuguese

Afonso, Ricardo Oliveira

http://hdl.handle.net/10362/176315

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
Afonso_2023.pdf		27.3 MB	Adobe PDF

Authors

Afonso, Ricardo Oliveira

Advisor(s)

Rosas, João

Abstract(s)

This project addresses the growing threat of fake news, which has increased over the past few years. Many factors play a role in distinguishing fake news from real news, which makes it challenging for humans to tell them apart. Fortunately, Artificial Intelligence methods can help with this difficult process. Although many projects have addressed fake news detection in English, the same cannot be said for European Portuguese. Based on the search conducted for this work, only two projects have explored this task in the language so far, leaving considerable room for creativity and improvement. Consequently, this project also explored approaches that had not previously been considered for fake news detection in European Portuguese. Many natural language processing and feature extraction techniques were used to gather relevant insights from the data, along with different Machine Learning classifiers and Deep Learning models. The data for the English models was obtained from public datasets designed for fake news detection, while the European Portuguese models relied on a dataset created from scratch by scraping fact-checking websites and real and fake news websites, given the lack of public datasets in this language. The creation of the first public dataset for fake news detection in European Portuguese is a significant step for this field, as it can pave the way for more research and innovation in this specific language. Furthermore, the new techniques combined with the scraped data made it possible to achieve better results than previously developed work on this topic.
Therefore, this project will prove valuable not only to those who use the developed Chrome extension and smartphone application to analyse the content of websites in English and European Portuguese, but also to any future researcher interested in this field who wishes to contribute to this cause.

This project focused on combating the danger of fake news, which has been increasing in recent years. Several aspects must be considered when distinguishing real news from fake news, which makes this process challenging for humans. Fortunately, Artificial Intelligence methods can help with this difficult process. Although several projects have been developed for fake news identification in English, the same is not true for European Portuguese. To date, and according to the research carried out, only two projects have explored this important task in this language, leaving more room for improvement and creativity. Several natural language processing and feature extraction techniques were used, together with different Machine Learning classifiers and Deep Learning models. The data for the English models was obtained from public datasets created specifically for this type of task, while the European Portuguese models had to rely on a dataset created from scratch, given the absence of public datasets, using web scrapers that extracted data from fact-checking websites and from real and fake news websites. The creation of the first public dataset for fake news identification in European Portuguese represents an important step in this challenging topic, since it can open new paths for further research and innovation in this specific language. In addition, the new techniques combined with the collected data made it possible to obtain better results than those of previously developed work.
As such, this project will prove extremely useful not only to those who use the developed Chrome extension and smartphone application to analyse the content of websites in English and European Portuguese, but also to any researcher interested in this area who wishes to contribute to this cause.

Keywords

Machine Learning; Deep Learning; Web Scraping; Natural Language Processing; Term Frequency-Inverse Document Frequency; Extra Gradient Boosting

Collections

FCT: DEE - Master's Dissertations