GraphRAG for the Portuguese legal domain - a comparative study of graph-based document relationships and traditional RAG pipelines

Esteves, Patrícia Nunes Domingos

http://hdl.handle.net/10362/181580

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
FALL25_57836.pdf		2.05 MB	Adobe PDF

Authors

Esteves, Patrícia Nunes Domingos

Advisor(s)

Han, Qiwei

Abstract

This study explores RAG systems tailored to the Portuguese legal domain, highlighting the challenges posed by underrepresented languages. Fixed-size chunking strategies, particularly TokenTextSplitter, were found to be the most effective, while more advanced techniques such as Recursive and Semantic splitting showed little benefit. Larger chunk sizes improved retrieval accuracy and answer quality, though the impact of chunk overlap remains inconclusive. One limitation of vector databases is their lack of explainability and their limited grasp of complex relationships. This work analyses GraphRAG, an advanced RAG technique that leverages the strengths of knowledge graphs. It shows promise, returning results faster than traditional RAG approaches and performing better on questions that require an understanding of relations.

Keywords

Retrieval-Augmented Generation; RAG; Large Language Models; LLM; Artificial Intelligence; AI; Hallucination; Question answering; RAG evaluation; Vector store; Chunking; Legal AI; Knowledge graph; GraphRAG; RDF; Legal information retrieval; Portuguese legal retrieval; Natural language processing; Chain-of-Thought; CoT

Collections

NSBE - MA Dissertations