Enhancing product categorization with retrieval augmented generation: a comparative study of architectures, techniques and strategies

Marrero, Rafael Alejandro Moles

http://hdl.handle.net/10362/186190

Use this identifier to cite this record.

Name:	Description:	Size:	Format:
RafaelMoles_THESIS.pdf		3.63 MB	Adobe PDF

Authors

Marrero, Rafael Alejandro Moles

Advisor(s)

Han, Qiwei

Abstract

This research explored techniques to improve the performance of Large Language Models (LLMs) on Hierarchical Product Classification (HPC), including optimized fine-tuning, optimal prompting techniques, taxonomy-specific Knowledge Graphs, Retrieval-Augmented Generation (RAG), and LLM-based Entity Matching. Tested on the benchmark datasets Icecat and WDC-222, these methods significantly enhanced the ability of LLMs to solve HPC tasks across various scenarios. The study investigates how product categorization can be improved through several RAG configurations (NaiveRAG, AdvancedRAG, GraphRAG, and HybridRAG), applied in both flat and hierarchical classification systems. The results reached a hierarchical F1-score (hF) of 0.921, surpassing traditional deep learning (DL) benchmarks (0.85 hF). While they do not outperform proprietary models such as GPT, the proposed approaches offer businesses a cost-efficient and effective alternative, demonstrating strong performance without reliance on expensive LLM solutions.
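
The abstract reports results as a hierarchical F1-score (hF) but does not define it. As a rough illustration only, the sketch below follows one common formulation, in which each true and predicted category is expanded to include all of its ancestors before precision and recall are computed on the overlap. The thesis may use a different definition; the function names and the example category paths are hypothetical and are not taken from Icecat or WDC-222.

```python
def ancestors_inclusive(path):
    """Expand a category path into the set of all its prefixes.

    ("A", "B", "C") -> {("A",), ("A", "B"), ("A", "B", "C")}
    """
    return {tuple(path[:i]) for i in range(1, len(path) + 1)}


def hierarchical_f1(true_paths, pred_paths):
    """Micro-averaged hierarchical F1 over paired true/predicted paths.

    Assumption: each label is a root-to-node path in a tree taxonomy,
    and partial credit is given for every shared ancestor.
    """
    overlap = pred_total = true_total = 0
    for true_path, pred_path in zip(true_paths, pred_paths):
        true_set = ancestors_inclusive(true_path)
        pred_set = ancestors_inclusive(pred_path)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    h_precision = overlap / pred_total
    h_recall = overlap / true_total
    return 2 * h_precision * h_recall / (h_precision + h_recall)


# Hypothetical example: the first prediction is wrong only at the leaf level.
true_paths = [
    ("Electronics", "Computers", "Laptops"),
    ("Home", "Kitchen", "Blenders"),
]
pred_paths = [
    ("Electronics", "Computers", "Tablets"),
    ("Home", "Kitchen", "Blenders"),
]
score = hierarchical_f1(true_paths, pred_paths)
```

Under this formulation a prediction that is correct down to the second level still earns partial credit, which is why hierarchical scores tend to be higher than exact-match accuracy on the same predictions.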

Keywords

Large Language Models; Hierarchical classification; E-commerce; In-Context Learning; Fine-tuning; Prompt engineering; Knowledge graphs; Retrieval Augmented Generation; Entity matching; Multi-tiered retrieval; NaiveRAG; AdvancedRAG; GraphRAG; HybridRAG

URI

http://hdl.handle.net/10362/186190

Collections

NSBE - MA Dissertations