Fine-tuning a Multimodal Machine Learning Model for Key Information Extraction from Invoices and Receipts

Silva, Rodrigo Miguel Vidal da

Publication

2025-10-29, Master's dissertation

datacite.subject.fos	Natural Sciences::Computer and Information Sciences	pt_PT
dc.contributor.advisor	Damásio, Bruno Miguel Pinto
dc.contributor.author	Silva, Rodrigo Miguel Vidal da
dc.date.accessioned	2025-11-11T14:56:49Z
dc.date.available	2025-11-11T14:56:49Z
dc.date.issued	2025-10-29
dc.description	Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science	pt_PT
dc.description.abstract	Automated extraction of key information from business documents, especially invoices, is essential for streamlining business operations and improving efficiency in finance. Historically, this extraction was a time-consuming and error-prone manual task. Recent progress in deep learning and transformer-based models has renewed interest in automating it and offers promising solutions for intelligent document processing. This thesis addresses the problem by fine-tuning LayoutLMv3, a transformer-based model, to extract key fields from Portuguese invoices and receipts. The main goal is to adapt LayoutLMv3 to a custom dataset of 813 invoice and receipt images in Portuguese. The model is trained to identify and extract fields such as company name, address, date, and total amount, which are essential for keeping financial records and streamlining workflows. To prepare the training data, we first run an OCR step with Tesseract, which extracts the raw text and its corresponding bounding-box coordinates from the images. We then use a custom algorithm to label the OCR text segments that either match or closely resemble the predefined annotations. This process ensures the dataset is properly formatted for LayoutLMv3's multimodal input. After preprocessing and labeling, the LayoutLMv3 model is fine-tuned and evaluated. Its effectiveness is measured by comparing its performance with that of a well-known commercial solution, Google Document AI. This comparison aims to show the practical usefulness and limitations of a custom-trained open-source model in a real-world scenario. The results show that Google Document AI outperforms the fine-tuned LayoutLMv3 model by a large margin.
However, the findings offer valuable insights into the strengths and weaknesses of fine-tuned transformer models for extracting information from semi-structured documents in a low-resource language context. Additionally, this research can help improve automation in financial processes, reduce manual work, and provide a framework for similar document understanding tasks in other industries.	pt_PT
dc.identifier.tid	204070830
dc.identifier.uri	http://hdl.handle.net/10362/190490
dc.language.iso	eng	pt_PT
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	pt_PT
dc.subject	Optical Character Recognition	pt_PT
dc.subject	Key Information Extraction	pt_PT
dc.subject	Invoices	pt_PT
dc.subject	Multimodal Machine Learning Models	pt_PT
dc.subject	SDG 8 - Decent work and economic growth	pt_PT
dc.subject	SDG 12 - Responsible production and consumption	pt_PT
dc.subject	SDG 13 - Climate action	pt_PT
dc.subject	SDG 15 - Life on land	pt_PT
dc.title	Fine-tuning a Multimodal Machine Learning Model for Key Information Extraction from Invoices and Receipts	pt_PT
dc.type	master thesis
dspace.entity.type	Publication
rcaap.rights	openAccess	pt_PT
rcaap.type	masterThesis	pt_PT
thesis.degree.name	Master's in Data Science and Advanced Analytics, specialization in Data Science	pt_PT
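
The abstract describes a labeling step in which OCR output is tagged wherever it matches or closely resembles the predefined annotations. This record does not specify the thesis's actual algorithm, so the following is only a minimal sketch of the general idea, assuming a simple per-word similarity test. The function names, the 0.8 threshold, the field names, and the sample words and boxes are all hypothetical; the hard-coded word list stands in for real Tesseract output.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def label_words(ocr_words, annotations, threshold=0.8):
    """Give each OCR word the field whose annotated value contains the most
    similar token, or "O" (outside) when nothing reaches the threshold.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    labels = []
    for word in ocr_words:
        best_label, best_score = "O", 0.0
        for field, value in annotations.items():
            for token in value.split():
                score = similarity(word["text"], token)
                if score > best_score:
                    best_label, best_score = field, score
        labels.append(best_label if best_score >= threshold else "O")
    return labels


# Hypothetical OCR output: one dict per word, with text and a pixel bounding box.
ocr_words = [
    {"text": "Padaria", "bbox": (40, 20, 180, 48)},
    {"text": "Centra1", "bbox": (190, 20, 320, 48)},  # OCR misread "l" as "1"
    {"text": "Data:", "bbox": (40, 80, 100, 104)},
    {"text": "2024-03-15", "bbox": (110, 80, 260, 104)},
    {"text": "TOTAL", "bbox": (40, 400, 120, 428)},
    {"text": "12,50", "bbox": (300, 400, 370, 428)},
]

# Hypothetical ground-truth annotations for the same document.
annotations = {
    "company": "Padaria Central",
    "date": "2024-03-15",
    "total": "12,50",
}

labels = label_words(ocr_words, annotations)
for word, label in zip(ocr_words, labels):
    print(f'{word["text"]:<12} {label}')
```

The similarity test lets a slightly misread word such as "Centra1" still receive the company label, while unrelated words such as "Data:" stay unlabeled. In a real pipeline the word list would come from the OCR engine (for example `pytesseract.image_to_data`), and the bounding boxes would be passed to the model alongside the labels.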

Files

Main

Name: TCDMAA4263.pdf
Size: 1.93 MB
Format: Adobe Portable Document Format

License

Name: license.txt
Size: 348 B
Format: Item-specific license agreed to upon submission

Collections

NIMS - Master's Dissertations in Data Science and Advanced Analytics