| Name | Description | Size | Format |
|---|---|---|---|
| | | 1.93 MB | Adobe PDF |
Authors
Advisor(s)
Abstract(s)
The automated extraction of key information from documents, especially invoices, is essential for streamlining business operations and improving efficiency in finance. Traditionally, this has been a time-consuming and error-prone manual task. Recent progress in deep learning, particularly in transformer-based models, has renewed interest in automating it and offers promising approaches to intelligent document processing. This thesis addresses the problem by fine-tuning LayoutLMv3, a transformer-based multimodal model, to extract key fields from Portuguese invoices and receipts. The main goal is to adapt LayoutLMv3 to a custom dataset of 813 Portuguese invoice and receipt images, training it to identify and extract fields such as company name, address, date and total amount. This information is essential for maintaining financial records and streamlining workflows. To prepare the training data, we first run Tesseract OCR, which extracts the raw text and the corresponding bounding-box coordinates from each image. A custom algorithm then labels the text that exactly or closely matches the predefined annotations, so that the dataset fits LayoutLMv3's multimodal input format (text, layout and image). After preprocessing and labeling, the LayoutLMv3 model is fine-tuned and evaluated, and its performance is compared with that of a well-known commercial solution, Google Document AI. This comparison aims to show the practical value and the limitations of a custom-trained open-source model in a real-world scenario. The results show that Google Document AI outperforms the fine-tuned LayoutLMv3 model by a large margin. Nevertheless, the findings offer valuable insights into the strengths and weaknesses of fine-tuned transformer models for extracting information from semi-structured documents in a low-resource language context.
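The labeling step summarized above, where OCR words are matched to annotated field values and prepared for LayoutLMv3, can be illustrated with a minimal Python sketch. The thesis's own matching algorithm is not described here. This sketch therefore substitutes a character-level similarity from the standard library's `difflib.SequenceMatcher` with a hypothetical threshold of 0.8. It also assumes a BIO tagging scheme, hypothetical field names (`COMPANY`, `DATE`, `TOTAL`) and made-up example words. The sketch also rescales pixel boxes to the 0-1000 range that LayoutLMv3 expects.

```python
from difflib import SequenceMatcher

# Hypothetical similarity threshold (an assumption, not the thesis's criterion).
THRESHOLD = 0.8


def normalize_box(box, width, height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range used by LayoutLMv3."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def label_words(words, annotations, threshold=THRESHOLD):
    """Assign BIO tags to OCR words whose text matches an annotated field value.

    words: OCR word strings in reading order.
    annotations: dict mapping a (hypothetical) field name to its ground-truth string.
    """
    tags = ["O"] * len(words)
    for field, value in annotations.items():
        n = len(value.split())
        best_score, best_start = 0.0, None
        # Slide a window containing as many words as the annotated value.
        for start in range(len(words) - n + 1):
            if any(t != "O" for t in tags[start:start + n]):
                continue  # do not overwrite words already assigned to another field
            score = similarity(" ".join(words[start:start + n]), value)
            if score > best_score:
                best_score, best_start = score, start
        # Keep the best window only if it is similar enough to the annotation.
        if best_start is not None and best_score >= threshold:
            tags[best_start] = f"B-{field}"
            for i in range(best_start + 1, best_start + n):
                tags[i] = f"I-{field}"
    return tags


if __name__ == "__main__":
    # Made-up OCR words and annotations, for illustration only.
    words = ["Fatura", "Empresa", "Exemplo", "Lda", "Data:", "12/03/2023", "Total", "45,90", "EUR"]
    annotations = {"COMPANY": "Empresa Exemplo, Lda", "DATE": "12/03/2023", "TOTAL": "45,90"}
    print(label_words(words, annotations))
    # -> ['O', 'B-COMPANY', 'I-COMPANY', 'I-COMPANY', 'O', 'B-DATE', 'O', 'B-TOTAL', 'O']
```

In practice, the words and pixel boxes could come from `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`. Its `text`, `left`, `top`, `width` and `height` entries give each word and its box, with the right and bottom edges at `left + width` and `top + height`. The tags would then be mapped to integer ids. They could then be passed, together with the normalized boxes and the image, to a LayoutLMv3 processor configured with `apply_ocr=False`. Matching whole windows rather than single words lets multi-word fields such as company names be tagged even when OCR adds or drops punctuation.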
Additionally, this research can help increase automation in financial processes, reduce manual work, and provide a solid framework for similar document-understanding tasks in other industries.
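The abstract does not name the metric used to compare the fine-tuned LayoutLMv3 model with Google Document AI. As one hedged illustration, each system could be scored with a per-field exact-match accuracy over a test set. The sketch below assumes that predictions and ground truth are lists of dictionaries mapping field names to strings, one dictionary per document. It ignores case and whitespace differences when comparing and counts a missing field as wrong. The metric choice is an assumption, and any values used with it here are toy data, not results from the thesis.

```python
def normalize(value):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split()) if value else ""


def field_accuracy(predictions, ground_truth):
    """Per-field exact-match accuracy over a list of documents.

    predictions, ground_truth: lists of dicts (one per document) mapping field
    names to extracted strings; a field missing from a prediction counts as wrong.
    """
    fields = sorted({field for doc in ground_truth for field in doc})
    scores = {}
    for field in fields:
        total = correct = 0
        for pred, gold in zip(predictions, ground_truth):
            if field not in gold:
                continue  # only score fields that are annotated for this document
            total += 1
            correct += int(normalize(pred.get(field)) == normalize(gold[field]))
        scores[field] = correct / total if total else 0.0
    return scores
```

Running the same function on both systems' outputs for the same documents gives directly comparable per-field scores. Exact match is strict: a date extracted as `1/4/2023` instead of `01/04/2023` counts as an error. A real evaluation might add field-specific normalization.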
Description
Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Optical Character Recognition; Key Information Extraction; Invoices; Multimodal Machine Learning Models; SDG 8 - Decent work and economic growth; SDG 12 - Responsible production and consumption; SDG 13 - Climate action; SDG 15 - Life on land
