| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.55 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
In recent years, Artificial Intelligence has made significant contributions to a diverse range of
industries. Process mining is an emerging research field that leverages event data to extract
insights from the business processes of organizations. Analyzing and extracting these insights
often requires expert knowledge and technical skills. Large Language
Models are already being used by managers without technical skills to extract insights from processes,
but they have yet to be recognized as accurate tools for such process analytics tasks. This research
investigates the application of Large Language Models to enhance the accessibility of process
mining tasks, aiming to bridge the gap between technical complexity and business user
usability. The primary objective is to empower non-technical stakeholders to ask questions in
natural language and receive accurate answers derived from process data, without the need
for programming expertise or in-depth familiarity with the underlying data. The proposed
framework evaluates how well various LLMs, including both proprietary and open-source
models, can interpret process mining queries and return reliable, data-driven responses.
Although the Process Mining for Python library (PM4Py) is available, it can present a steep
learning curve for analysts unfamiliar with its structure and functionality, who may therefore
not use it to full advantage. To address this, the framework integrates Retrieval-Augmented Generation
(RAG) to provide models with structured documentation as external context. Evaluation was
performed by comparing model responses to expert-validated ground truths across two
datasets, Sepsis and Insurance Claims, using 25 standardized questions. GPT-4.1-mini achieved
the best overall results, reaching 88% accuracy on Sepsis data with semantic chunking and
84% on Insurance Claims data with recursive chunking. GPT-4.1 and GPT-4o also showed
strong performance, particularly when supported by RAG. In contrast, open-source models
failed to match the performance of proprietary alternatives. Overall, results demonstrated the
significant potential of certain Large Language Models as valuable tools for interpreting event
logs and making process mining more accessible. However, performance varies considerably
across models and configurations, and not all benefit from added context. These findings
suggest that while LLMs hold strong potential in this domain, careful model selection and
retrieval design remain essential for real-world applications.
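
The accuracy figures above can be read as the share of questions answered correctly. The sketch below is a minimal illustration of such a score, not the dissertation's actual evaluation code. It assumes that accuracy is computed per dataset over the 25 questions and that an answer counts as correct when it matches the ground truth after simple normalization. The function names and the sample data are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def accuracy(model_answers: list[str], ground_truths: list[str]) -> float:
    """Fraction of questions whose normalized answer equals the ground truth.

    Assumption: exact match after normalization. The abstract only states that
    responses were compared to expert-validated ground truths.
    """
    if len(model_answers) != len(ground_truths):
        raise ValueError("answers and ground truths must have the same length")
    if not ground_truths:
        raise ValueError("at least one question is required")
    correct = sum(
        normalize(a) == normalize(t) for a, t in zip(model_answers, ground_truths)
    )
    return correct / len(ground_truths)


# Hypothetical data: 25 questions, of which 22 are answered correctly.
truths = [f"answer {i}" for i in range(25)]
answers = [f"Answer  {i}" for i in range(22)] + ["wrong"] * 3

score = accuracy(answers, truths)
print(f"{score:.0%}")
```

Under these assumptions, 22 correct answers out of 25 correspond to 88% and 21 out of 25 to 84%.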
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics
Keywords
Process Mining; Large Language Models; Retrieval-Augmented Generation; SDG 8 - Decent work and economic growth; SDG 9 - Industry, innovation and infrastructure
