Empowering LLMs for Process Mining: The Role of Retrieval-Augmented Generation

Caldeira, João Carlos Palmela Pinheiro
Jesus, Marta de Oliveira
2025-11-11
2025-10-29
http://hdl.handle.net/10362/190487

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics

In recent years, Artificial Intelligence has made significant contributions to a diverse range of industries. Process mining is an emerging research field that leverages event data to extract insights from the business processes of organizations. Analyzing and extracting these insights often requires expert knowledge and technical skills. Managers without technical skills are already using Large Language Models (LLMs) to extract insights from processes, yet these models have not been recognized as accurate tools for such process analytics tasks. This research investigates the application of LLMs to make process mining tasks more accessible, aiming to bridge the gap between technical complexity and usability for business users. The primary objective is to empower non-technical stakeholders to ask questions in natural language and receive accurate answers derived from process data, without the need for programming expertise or in-depth familiarity with the underlying data. The proposed framework evaluates how well various LLMs, including both proprietary and open-source models, can interpret process mining queries and return reliable, data-driven responses. Although the Process Mining for Python library (PM4Py) is available, it can present a steep learning curve for analysts unfamiliar with its structure and functionalities, so its capabilities are often not used to full advantage. To address this, the framework integrates Retrieval-Augmented Generation (RAG) to provide models with structured documentation as external context. Evaluation was performed by comparing model responses to expert-validated ground truths across two datasets, Sepsis and Insurance Claims, using 25 standardized questions.
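The evaluation described above amounts to scoring each model answer against its ground truth and reporting the share of correct answers; with 25 questions, each question is worth 4 percentage points. The following is a minimal sketch of that calculation. The abstract does not state how answers were matched to ground truths, so the exact-match-after-normalization rule is an assumption, and the names QuestionResult, normalize, and accuracy, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """One evaluated question (hypothetical structure, not from the thesis)."""
    question_id: int
    model_answer: str
    ground_truth: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy(results: list[QuestionResult]) -> float:
    """Fraction of questions whose normalized answer equals the ground truth.

    Assumption: exact match after normalization; the thesis may use a
    different comparison rule.
    """
    if not results:
        raise ValueError("no results to score")
    correct = sum(
        normalize(r.model_answer) == normalize(r.ground_truth) for r in results
    )
    return correct / len(results)


# Hypothetical data: 25 questions, of which the first 22 are answered correctly.
results = [
    QuestionResult(i, "42" if i < 22 else "wrong", "42") for i in range(25)
]
print(f"{accuracy(results):.0%}")  # prints 88%
```

Under this assumed scoring rule, 22 correct answers out of 25 give 88% and 21 correct answers give 84%.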
GPT-4.1-mini achieved the best overall results, reaching 88% accuracy on Sepsis data with semantic chunking and 84% on Insurance Claims data with recursive chunking. GPT-4.1 and GPT-4o also showed strong performance, particularly when supported by RAG. In contrast, open-source models failed to match the performance of proprietary alternatives. Overall, the results demonstrate the significant potential of certain Large Language Models as tools for interpreting event logs and making process mining more accessible. However, performance varies considerably across models and configurations, and not all models benefit from added context. These findings suggest that while LLMs hold strong potential in this domain, careful model selection and retrieval design remain essential for real-world applications.

eng
Process Mining
Large Language Models
Retrieval-Augmented Generation
SDG 8 - Decent work and economic growth
SDG 9 - Industry, innovation and infrastructure
master thesis
204072050