Use this identifier to cite or link to this record: http://hdl.handle.net/10362/190563
Title: Automating news classification with large language models: Exploring fine-tuning, dataset size, and architecture
Author: Yesilyurt, Burcu
Advisor: Bação, Fernando José Ferreira Lucas
Keywords: Large Language Models
Text Classification
News Classification
Fine-Tuning
Hyperparameter Optimization
BERT
SDG 4 - Quality education
SDG 9 - Industry, innovation and infrastructure
SDG 16 - Peace, justice and strong institutions
Defense date: 29-Oct-2025
Abstract: This work presents a comprehensive evaluation that benchmarks Large Language Models against traditional machine learning algorithms for automatic news classification on three standard datasets: BBC News, 20 Newsgroups, and AG News. We implement traditional models, including Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest, to provide clear and interpretable baselines using manual term-frequency and syntactic features. We then fine-tune transformer architectures, including BERT, RoBERTa, T5, GPT, and their distilled variants, to quantify improvements in predictive accuracy, resource efficiency, and explainability. Performance is measured via 5-fold cross-validation using F1 and accuracy metrics, and statistical significance is assessed with a Friedman test followed by Holm's correction. Results show that transformer models consistently outperform classical approaches, with BERT achieving the highest scores under both balanced and imbalanced conditions. Distilled models rival or surpass full-size transformers on larger datasets while reducing memory requirements and maintaining comparable inference latency. Attention-based attribution methods provide semantic explanations on par with feature-importance metrics, confirming that LLMs deliver superior accuracy, adaptability, and transparency in news classification. Future work should investigate multilingual pretraining, multilabel classification, and ensemble techniques to further strengthen real-time, explainable news-analysis pipelines.
Description: Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
URI: http://hdl.handle.net/10362/190563
Degree: Master's in Data Science and Advanced Analytics, specialization in Data Science
Appears in collections: NIMS - Dissertações de Mestrado em Ciência de Dados e Métodos Analíticos Avançados (Data Science and Advanced Analytics)
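
Illustrative note on the abstract: the abstract mentions a Friedman test followed by Holm's correction. As a rough sketch of the second step only, the standard Holm step-down adjustment can be written in plain Python as below. This is not taken from the dissertation; the function name `holm_adjust` and the example p-values are hypothetical, and the record does not say which pairwise comparisons produced the p-values that were corrected.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order.

    Assumption: p_values are raw p-values from some set of pairwise
    post-hoc comparisons (hypothetical here, not from the dissertation).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three model comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
print(adjusted)
```

A comparison would then be called significant at level alpha when its adjusted p-value falls below alpha.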

Files in this record:
File: TCDMAA4245.pdf
Size: 7.33 MB
Format: Adobe PDF
Access: Restricted. A copy may be requested from the author.



All records in the repository are protected by copyright law, with all rights reserved.