Research project
Neural Information Retrieval for Long Semi-Structured Documents
Funder
Authors
Publications
Aligning Web Query Generation with Ranking Objectives via Direct Preference Optimization
Publication. Coelho, João; Martins, Bruno; Magalhães, João; Xiong, Chenyan; Faculdade de Ciências e Tecnologia (FCT)
Neural retrieval models excel in Web search, but their training requires substantial amounts of labeled query-document pairs, which are costly to obtain. With the widespread availability of Web document collections like ClueWeb22, synthetic queries generated by large language models offer a scalable alternative. Still, synthetic training queries often vary in quality, which leads to suboptimal downstream retrieval performance. Existing methods typically filter out noisy query-document pairs based on signals from an external re-ranker. In contrast, we propose a framework that leverages Direct Preference Optimization (DPO) to integrate ranking signals into the query generation process, aiming to directly optimize the model towards generating high-quality queries that maximize downstream retrieval effectiveness. Experiments show higher ranker-assessed relevance between query-document pairs after DPO, leading to stronger downstream performance on the MS MARCO benchmark when compared to baseline models trained with synthetic data.
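The idea in the abstract of feeding ranking signals into DPO can be illustrated with a minimal sketch. It assumes that several candidate queries are generated per document, that an external re-ranker has already scored each one, and that the best- and worst-scored candidates form the preference pair; this pairing rule, the function names, and the value of `beta` are illustrative assumptions, not details taken from the publication. The loss itself is the standard per-example DPO objective.

```python
import math


def build_preference_pair(scored_queries):
    """Return (chosen, rejected) queries for one document.

    `scored_queries` is a list of (query, ranker_score) tuples.
    Assumption: the highest-scored query is preferred and the
    lowest-scored query is rejected.
    """
    ranked = sorted(scored_queries, key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked[-1][0]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    queries under the trained policy and the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) computed as a numerically stable softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical candidates for one document, with made-up ranker scores.
candidates = [
    ("what is dpo", 0.2),
    ("direct preference optimization for query generation", 0.9),
    ("dpo ranking", 0.5),
]
chosen, rejected = build_preference_pair(candidates)
```

When the policy assigns relatively more probability to the chosen query than the reference model does, the margin is positive and the loss falls below log 2, which is the value obtained when policy and reference agree.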
Organizational units
Description
Keywords
Contributors
Funders
Funding entity
Fundação para a Ciência e a Tecnologia
Funding programme
OE
Award number
PRT/BD/153683/2021
