Bação, Fernando José Ferreira Lucas
Gayer, Felix Sebastian
2024-11-14
2024-10-21
http://hdl.handle.net/10362/175217
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
This study aims to empower content creators to adopt a data-driven approach by enabling them to independently understand the dynamics and improve the performance metrics of article-based subscriptions/conversions and content views. Focusing on online articles published by Stuttgarter Zeitung from 2021 to 2023, the study classifies articles into different performance categories to identify key similarities and differences. Using advanced Machine Learning techniques for feature extraction such as Named Entity Recognition (NER), Part-of-Speech tagging (POS) and Transformer-based Topic Modelling, the study extracts pre-publication metadata and content information, emphasizing human-interpretable results. The results provide valuable insights into customers' content interests and metadata preferences. Despite the overall similarity in the profiles of high- and low-performing articles for both target variables, numerous nuanced factors influencing conversions and content views were identified. These factors are often newspaper section or topic specific and can differ significantly from global (all articles combined) trends. Consequently, the result notebooks provide detailed information that is particularly useful for content creators.
Based on these insights, an interactive tool has been developed to help journalists align their efforts with the company's goals to independently increase conversions and content views, without prescribing specific stories or formats.
eng
Data-Driven Journalism; Article-Performance; Content-Views; Conversions; Metadata Analysis; Topic Modelling; Clustering; Named Entity Recognition; Part-of-Speech tagging; Feature Extraction; BERTopic; KeyBERT; Large Language Model; CRISP-DM; SpaCy; FLAIR
SDG 4 - Quality education; SDG 8 - Decent work and economic growth
Empowering Journalists with Data: Improving online article performance
master thesis
203776585