NIMS - Master's Dissertations in Data Science and Advanced Analytics
Formerly: Master's Dissertations in Advanced Analytics
Recent entries
- Data-Driven Analysis and Prediction of Food Donation Dynamics: The Case of Intermittent Campaign Events for a Portuguese Food Bank
  Elias, Maria Benedita Alves da Cruz Sêrro; Caldeira, João Carlos Palmela Pinheiro
  Food banks face substantial uncertainty in the supply of in-kind donations. A major donation supply stream for food banks is their campaign events, which collect non-perishable goods donated by the general public. This thesis aims to extend the literature on mitigating food donation uncertainty in food bank contexts by taking a data-driven approach embedded in the CRISP-DM framework. It has two objectives: first, to assess whether per-capita donation intensity differs across municipalities and retail chains, using nonparametric Kruskal–Wallis tests and post-hoc Dunn tests under Holm and Bonferroni corrections; second, to evaluate how accurately next-campaign donations can be forecast to support operational decision-making, comparing a seasonal benchmark, a regression model (Ridge), and tree-based machine learning models (XGBoost and CatBoost). The Kruskal–Wallis results indicate statistically significant differences in per-capita donation distributions between chain groups (p < 0.001) and between municipalities (p < 0.001). Post-hoc Dunn comparisons identify materially different group pairs. Moreover, this thesis compares direct forecasting at each aggregation level (store, municipality, chain, total) against bottom-up hierarchical forecasting (store-level predictions aggregated upward). Results suggest that the best model choice depends on the aggregation level, and that bottom-up aggregation improves accuracy relative to direct forecasting for municipalities and chains.
- Interactive Gamified Application: Improving Students' Motivation and Engagement in Database Courses: SQL Adventure in Ancient Egypt
  Vasconcelos, Beatriz Leonardo; Rio, José Américo Alves Sustelo
  Traditional teaching methods in database courses often struggle to sustain student motivation and engagement or to develop students' practical skills. This thesis addresses these challenges by designing and developing a narrative-driven gamified application aimed at improving SQL learning in higher education. Adopting a Design Science Research approach, the project integrates an escape room narrative, progressive SQL challenges, and behavioural analytics to provide an immersive learning experience aligned with key pedagogical objectives. Students progress by submitting correct SQL queries, which promotes iterative problem solving, experimentation, and self-correction. The design is based on well-established frameworks, including MDA/MDE, Self-Determination Theory, the Theory of Gamified Learning, and Digital Game-Based Learning principles, to ensure that the mechanics, feedback loops, and aesthetics effectively support intrinsic motivation and skill acquisition. The analysis reveals strong learner engagement, positive perceptions of usability and narrative immersion, and measurable improvements in SQL proficiency. The thesis also identifies several limitations and concludes with recommendations for future work.
- Emerging Social Trends among Portuguese-speaking Communities: Analysing TikTok Interactions and Digital Discourse
  Taveira, Guilherme Rendeiro; Caldeira, João Carlos Palmela Pinheiro
  While TikTok’s algorithmic architecture shapes global social dynamics, computational sociology remains heavily biased toward Anglophone datasets. Consequently, the discourse of the Portuguese-speaking community remains critically under-researched, obscuring whether digital polarization trends are universal or culturally specific. To address this gap, this study analyzed a large corpus of TikTok interactions (N = 215,107) to uncover emerging social trends and sentiment dynamics within this ecosystem. The methodology integrated Neural Topic Modeling (BERTopic) with LLM-assisted representation for thematic clustering, alongside a multilingual Transformer (XLM-RoBERTa) to calculate continuous emotional polarity. Results revealed distinct "Cultural Fingerprints": the Portuguese-speaking community exhibited a significant "Positivity Bias", using the platform primarily for entertainment and social connection, in contrast to the more critical baseline of Germanic users. Crucially, polarization was largely endogenous; rather than macro-political events, the most fiercely polarized topics were performative platform mechanics, such as livestream virtual gifting, which algorithmically manufacture friction between user factions. These findings challenge the assumption of a universal emotional baseline on global social networks. Ultimately, failing to account for distinct cultural baselines risks severe cross-cultural algorithmic bias in content moderation, highlighting the need for culturally specific frameworks in future digital sociology.
- Understanding Which Factors Influence Wellbeing in Rural Tourism: An Expectation Confirmation Model Approach
  Jacinto, Sofia Cordeiro Henriques; Tam Chuem Vai, Carlos
  This study aims to identify and understand the factors that influence well-being in the context of rural tourism by applying the expectation confirmation model (ECM). Using a survey-based approach with data from 252 respondents, the results show that confirmation is shaped by novelty, while tourist experience has no direct effect. Perceived usefulness positively influences tourists' well-being and satisfaction, serving as an intermediary between the two. Overall, perceived usefulness stands out as the strongest driver of both well-being and satisfaction. These findings suggest that rural tourism destinations should align expectations with actual experiences, offer activities that feel meaningful and beneficial, and create environments that encourage relaxation and positive evaluations.
- Assessing the Success of ChatGPT as a Tool to Access Health Information: A Quantitative Study Employing PLS-SEM
  Ferreira, Yuri Marques; Tam Chuem Vai, Carlos
  Despite global commitments to universal health coverage (SDG 3.8), over half the world’s population lacks access to essential health services. This study explores the potential of large language models, specifically ChatGPT, as a tool to bridge this gap by improving health information accessibility. Grounded in the DeLone & McLean Information Systems Success Model, augmented with constructs from the Health Belief Model and with trust as a moderator, the research analyzes survey data from 225 Brazilian users. Results indicate that perceived usefulness, driven by system, information, and service quality, significantly predicts ChatGPT’s use for health information, while user satisfaction, influenced by use, information quality, and service quality, strongly correlates with net benefits. Trust moderates the relationship between use and satisfaction, with low-trust users experiencing greater satisfaction gains. The study highlights ChatGPT’s promise as a scalable tool to increase access to health information but underscores the role of user satisfaction in realizing perceived benefits. Practical insights include prioritizing information accuracy, user-friendly design, and targeted trust-building campaigns. This research contributes to the DeLone & McLean Model literature by integrating health-specific variables and offers a foundation for future applications of large language models in healthcare.
- An LLM-Based Conversational Agent for Multi-Operator Public Transport Information
  Berthele, Louis Jacob; Jardim, João Bruno Morais de Sousa; Neto, Miguel de Castro Simões Ferreira
  Regional public transport information landscapes play a key role in shaping passengers’ ability to plan and undertake journeys, yet they remain difficult to navigate. In many regions, passenger information is fragmented across multiple websites and applications, creating barriers to adoption and limiting the modal shift required to achieve CO2 emission targets. Although a growing body of evidence suggests increasing interest in conversational systems for public transport, existing solutions have largely focused on narrow segments of the passenger journey. This study proposes a framework for developing modular passenger-facing agentic conversational systems, combining a retrieval-augmented generation (RAG) component for textual information with a ReAct-inspired agentic routing component for transit network information. The framework includes a reproducible methodology for tuning and evaluating both components and is validated through a real-world pilot deployment in the Oeste region of Portugal, where three public transport operators serve 388,000 residents across 203 bus routes. Results showed reliable component-level performance on unseen evaluation datasets. User-based evaluation further indicated fitness for use, with participants reporting high overall satisfaction. The findings thus establish a baseline for agentic conversational systems in multi-operator public transport settings, and the framework’s reliance on standardized data formats supports transferability to other regions.
- The Dark Side of Technology: The Role of Social Media Features on Social Media Infidelity-Related Behaviors
  Walihullah, Mohammad; Naranjo-Zolotov, Mijail Juanovich
  The rapid expansion of social media has significantly transformed interpersonal communication, while also introducing new challenges for romantic relationships. As platforms promote constant connectivity, private messaging, and emotional accessibility, concerns have grown regarding their potential to facilitate infidelity-related behaviors. This study explores the dark side of digital technology by examining how specific psychological and behavioral factors contribute to Social Media Infidelity-Related Behaviors (SMIRB). Drawing on contemporary relationship and digital behavior literature, the research investigates the effects of Social Networking Site Addiction (SNSA), Attachment Anxiety (AA), Relationship Ambivalence (RAMB), Sexual Satisfaction (SS), and Problematic Internet Usage (PIU) on SMIRB. A quantitative approach was adopted, and data were collected through an online survey of 202 active social media users. The proposed model was tested using Partial Least Squares Structural Equation Modeling (PLS-SEM). The findings indicate that Attachment Anxiety, Relationship Ambivalence, and Problematic Internet Usage significantly and positively predict SMIRB. These results suggest that emotional insecurity, relational uncertainty, and maladaptive internet use increase vulnerability to digital boundary-crossing behaviors. In contrast, Social Networking Site Addiction and Sexual Satisfaction show no significant influence. Overall, the study highlights that psychological and relational vulnerabilities, rather than usage intensity alone, play a central role in technology-mediated infidelity.
- Anchor-Based Density Undersampling with Swarm Stabilization
  Páris, Pedro Maria Queiroz Pereira Rocha; Damásio, Bruno Miguel Pinto
  Class imbalance is a pervasive problem in machine learning, where one class, often the class of interest, is underrepresented relative to others. This imbalance can severely compromise the performance of standard classifiers, which tend to favor the majority class. A well-known example is a classifier that always predicts the majority class on data split 99.9% majority and 0.1% minority: it achieves 99.9% accuracy while being practically useless. This thesis introduces Anchor-Based Density Undersampling with Swarm Stabilization (ABDUSS), a novel resampling method that employs a swarm-inspired undersampling heuristic to reduce majority-class instances based on data density. The method (i) estimates minority-class density via feature-wise smoothing, (ii) selects a high-density minority anchor, (iii) applies a lightweight continuous swarm update toward this anchor to stabilize the search space, and (iv) removes majority samples within a density-scaled radius using a KD-tree range query. Unlike optimization-based PSO mask methods, ABDUSS avoids inner-loop validation optimization, operating instead as a fast density-guided heuristic with a single scaling parameter. Experimental evaluation on three imbalanced datasets (credit card fraud, telco churn, and customer satisfaction) using XGBoost shows that ABDUSS reduces overlap near minority clusters and achieves competitive F1-score and AUC, with improved minority recall in several scenarios. These results indicate that ABDUSS provides a simple, computationally efficient, and reproducible baseline for density-aware undersampling in imbalanced classification tasks.
- Comparing Econometric and Machine Learning Approaches to Volatility Forecasting: Evidence from Cryptocurrency and Traditional Markets
  Montali, Davide; Bação, Fernando José Ferreira Lucas
  Can machine learning methods outperform traditional econometric models for volatility forecasting? The answer, this thesis finds, depends on the specific asset under consideration. This thesis compares GARCH, EGARCH, Random Forest, XGBoost, and LSTM models using daily data from 2019 to 2025 for Bitcoin, Ethereum, the S&P 500, and the VIX. Performance is measured out-of-sample via RMSE, with Diebold-Mariano tests employed to assess whether observed differences are statistically significant. The results resist a simple “ML wins” or “stick with GARCH” conclusion. For the S&P 500, no machine learning method significantly improves on GARCH, and LSTM performs worse than GARCH outright. The only statistically significant improvement for equities comes from EGARCH (p < 0.01), which captures the leverage effect that symmetric GARCH misses. Model structure, not model complexity, is what matters for equity volatility. For Bitcoin, all machine learning methods significantly outperform GARCH, with XGBoost achieving the lowest RMSE, an improvement of approximately 11% relative to the GARCH baseline. But for Ethereum, machine learning offers no significant improvement; EGARCH performs best. Same asset class. Different outcomes. From a practical standpoint, the 11% RMSE reduction for Bitcoin could meaningfully affect risk management and position sizing decisions. For equity markets, EGARCH represents the appropriate upgrade from basic GARCH. For Ethereum, the additional model complexity appears to offer no benefit. The findings suggest that model selection needs to be asset-specific and validated through statistical testing rather than RMSE rankings alone. Deep learning neither “always works” nor “never works” for volatility. It depends on the asset.
- Automation and productivity in Air Traffic Control: The role of automation in increasing air traffic controller productivity
  Sousa, Nuno Miguel Guerra de; Castelli, Mauro
  As European air traffic recovers to near pre-pandemic levels, the pressure to increase controller productivity without compromising safety or service quality has intensified. This thesis examines whether higher levels of automation and digitalisation among European Air Navigation Service Providers (ANSPs) are associated with greater ATCO-hour productivity, defined in EUROCONTROL’s ACE framework as composite flight-hours per ATCO hour on duty. Using 2023 ANSP-level benchmarking data and a custom-verified automation deployment index, we estimate cross-sectional OLS regressions with heteroskedasticity-robust standard errors across 33 ANSPs with complete covariates, controlling for scale and cost structure. In pooled specifications, the automation index is positive but not statistically significant once structural factors are accounted for, while traffic scale and support cost intensity are the most consistent correlates of productivity. Influence diagnostics identify MUAC as a highly influential, structurally distinct upper-airspace-only provider; when service scope is controlled for or MUAC is excluded, the estimated automation coefficient becomes positive and statistically significant, indicating sensitivity to provider scope and to how automation is measured. These findings are consistent with the literature showing that cross-sectional ANSP performance is dominated by structural heterogeneity and that automation benefits may manifest through workload, predictability, and resilience rather than through immediate changes in annual productivity ratios.
  From a policy perspective, the results caution against interpreting binary adoption of ATM technologies as a direct productivity lever in mature deployment environments and support greater emphasis on integration, usage-intensity metrics, and support-cost efficiency. Future research should combine longitudinal data analysis with usage-based operational indicators to estimate automation’s contribution more precisely.
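The food-bank dissertation above relies on Kruskal–Wallis tests, which rank all observations jointly and ask whether the groups' mean ranks differ. The following is a minimal pure-Python sketch of the H statistic with the standard tie correction, not the author's code; the example groups are made up. In practice `scipy.stats.kruskal` returns the same statistic together with its chi-square p-value (k - 1 degrees of freedom for k groups).

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction (assumes not all values equal)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


# Hypothetical per-capita donation values for three retail chains
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

Ranking makes the test robust to the skewed, outlier-prone distributions typical of per-capita donation data, which is presumably why a nonparametric test was preferred over one-way ANOVA.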
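The post-hoc Dunn tests in the food-bank dissertation produce one p-value per pair of groups, which is why Holm and Bonferroni corrections are applied. A small sketch of both adjustments follows; the p-values are hypothetical, and `statsmodels.stats.multitest.multipletests` offers the same corrections via `method="holm"` and `method="bonferroni"`.

```python
def bonferroni(pvalues):
    """Multiply every p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the p-value at 0-based position `rank` is multiplied by (m - rank)
        candidate = min(1.0, (m - rank) * pvalues[i])
        # enforce monotonicity: adjusted values never decrease along the sorted order
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical Dunn p-values for three pairwise comparisons
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

With these inputs Bonferroni gives roughly [0.03, 0.12, 0.09] and Holm gives roughly [0.03, 0.06, 0.06]: Holm never rejects fewer hypotheses than Bonferroni, which is why reporting both shows how sensitive the pairwise conclusions are to the correction.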
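Bottom-up hierarchical forecasting, as compared in the food-bank dissertation, forecasts only the lowest level (stores) and obtains every higher level by summation. A minimal sketch follows; the store identifiers, municipality and chain labels, and forecast values are all hypothetical, and the store-level model producing the forecasts is left out.

```python
from collections import defaultdict

# Hypothetical store-level forecasts: (store, municipality, chain, forecast)
store_forecasts = [
    ("S1", "M1", "ChainA", 120.0),
    ("S2", "M1", "ChainB", 80.0),
    ("S3", "M2", "ChainA", 50.0),
]


def bottom_up(forecasts):
    """Aggregate store-level forecasts to municipality, chain, and total levels."""
    by_municipality = defaultdict(float)
    by_chain = defaultdict(float)
    total = 0.0
    for _store, municipality, chain, value in forecasts:
        by_municipality[municipality] += value
        by_chain[chain] += value
        total += value
    return dict(by_municipality), dict(by_chain), total


municipalities, chains, total = bottom_up(store_forecasts)
print(municipalities, chains, total)
```

A useful by-product of this design is coherence: municipality and chain forecasts always sum to the same total, whereas separate direct models at each level generally do not.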
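The ABDUSS abstract above lists four steps. The sketch below is a loose illustration of steps (i), (ii), and (iv) only, under several assumptions that do not come from the thesis: "feature-wise smoothing" is read as a product of one-dimensional Gaussian kernel density estimates with Silverman bandwidths, the "density-scaled radius" is a hypothetical rule (`scale` times the mean bandwidth), the swarm stabilization step (iii) is omitted entirely, and a brute-force distance check stands in for the KD-tree range query. All function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def silverman_bandwidth(values):
    # Silverman's rule of thumb for a 1-D Gaussian kernel (assumed choice)
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / max(n - 1, 1)
    sd = math.sqrt(var) if var > 0 else 1.0
    return 1.06 * sd * n ** (-0.2)


def featurewise_density(point, minority, bandwidths):
    # (i) product of per-feature 1-D kernel density estimates; this ignores
    # correlations between features (assumed reading of "feature-wise smoothing")
    density = 1.0
    n = len(minority)
    for j, h in enumerate(bandwidths):
        s = sum(gaussian_kernel((point[j] - m[j]) / h) for m in minority)
        density *= s / (n * h)
    return density


def density_undersample(majority, minority, scale=5.0):
    """Return the majority points kept after removing those near the anchor."""
    d = len(minority[0])
    bandwidths = [silverman_bandwidth([m[j] for m in minority]) for j in range(d)]
    # (ii) anchor: the minority point with the highest estimated density
    anchor = max(minority, key=lambda p: featurewise_density(p, minority, bandwidths))
    # hypothetical radius rule: scale times the mean per-feature bandwidth
    radius = scale * sum(bandwidths) / d
    # (iv) brute-force range query standing in for a KD-tree query
    return [p for p in majority if math.dist(p, anchor) > radius]


minority = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2)]
majority = [(0.1, 0.1), (10, 10), (0.3, 0.1), (-8, 3)]
print(density_undersample(majority, minority))
```

On this toy data the two majority points sitting inside the minority cluster are removed and the distant ones are kept. For realistic sizes, `scipy.spatial.cKDTree.query_ball_point` would replace the linear scan in step (iv).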
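The volatility dissertation above uses Diebold-Mariano tests to decide whether an RMSE gap between two models is statistically significant. The sketch below implements the basic one-step-ahead version under squared-error loss with a normal approximation; it is an illustration, not the thesis's implementation, which may use a different loss, forecast horizon, or the Harvey-Leybourne-Newbold small-sample correction. The forecast errors in the usage line are made up.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """One-step-ahead DM test under squared-error loss.

    Returns (statistic, two-sided p-value). A positive statistic means
    model A has the larger average squared error.
    """
    n = len(errors_a)
    # loss differential d_t = e_A,t^2 - e_B,t^2
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    d_bar = sum(d) / n
    # gamma_0 with 1/n scaling; multi-step horizons would add autocovariance terms
    gamma0 = sum((x - d_bar) ** 2 for x in d) / n
    stat = d_bar / math.sqrt(gamma0 / n)
    # two-sided p-value from the standard normal: 2 * (1 - Phi(|stat|))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


# Hypothetical out-of-sample errors for a GARCH baseline (A) and an ML model (B)
stat, p = diebold_mariano([2, 1, 2, 1], [1, 1, 1, 0])
print(stat, p)
```

This is the reasoning behind the thesis's warning against relying on RMSE rankings alone: a lower RMSE only counts as an improvement if the mean loss differential is large relative to its sampling variability.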
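The air traffic control dissertation above estimates OLS regressions with heteroskedasticity-robust standard errors. The sketch below shows the idea in its simplest form: one regressor, an intercept, and the White (HC0) sandwich estimator for the slope. It is not the thesis's specification, which includes controls for scale and cost structure and may use another robust variant. Here `x` could stand for an automation index and `y` for ATCO-hour productivity, but the numbers are made up. With `statsmodels`, `sm.OLS(y, X).fit(cov_type="HC1")` gives the multivariate equivalent.

```python
import math


def ols_slope_hc0(x, y):
    """Fit y = a + b*x by OLS and return (a, b, HC0 robust standard error of b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # sandwich estimator for the slope: sum((x - x_bar)^2 * e^2) / sxx^2
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_b = math.sqrt(meat / sxx ** 2)
    return a, b, se_b


# Hypothetical automation-index and productivity values for four providers
a, b, se = ols_slope_hc0([1, 2, 3, 4], [2, 4, 5, 8])
print(a, b, se, b / se)
```

The robust standard error weights each squared residual by its leverage in `x`, so a single structurally distinct, influential observation (as the abstract reports for MUAC) can move both the slope and its significance noticeably.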
