Please use this identifier to cite or link to this item:
http://hdl.handle.net/10362/184836
Title: | Redefining text-to-SQL metrics by incorporating semantic and structural similarity |
Author: | Pinna, Giovanni; Perezhohin, Yuriy; Manzoni, Luca; Castelli, Mauro; De Lorenzo, Andrea |
Keywords: | SQL metric; Evaluation metric; Text-to-SQL; Benchmark SQL; SQL similarity; General; SDG 9 - Industry, Innovation, and Infrastructure |
Date: | Jul-2025 |
Abstract: | The rapid advancements in text-to-SQL systems have driven the scientific community to create increasingly complex benchmarks for this task. However, evaluation metrics often rely on simplistic or binary approaches that fail to capture the similarities and differences between equivalent SQL queries. Current metrics overlook critical aspects such as partial correctness, structural differences, and semantic equivalence. To address these limitations, we propose a novel metric for SQL query comparison, designed to offer a more precise assessment of the similarity between SQL queries at both the semantic (string) and execution result (resultant table) levels. This new metric allows for a granular evaluation of SQL query similarity, supporting a more accurate assessment and ranking of text-to-SQL tools and models. The proposed approach could have a meaningful impact on text-to-SQL research and development. It might improve evaluation by distinguishing between models that handle simple queries and those capable of tackling more complex ones. The metric could also help to identify where the differences between two queries lie. Additionally, it may support the development of more accurate language models by offering precise training signals to help the model recognize query similarities. The experimental results highlight the metric’s effectiveness over existing evaluation methodologies, allowing us to identify the current best text-to-SQL models through distribution analysis. In some cases, the metric allows the detection of missing aggregation operators or variations in query ordering operators. |
Description: | Pinna, G., Perezhohin, Y., Manzoni, L., Castelli, M., & De Lorenzo, A. (2025). Redefining text-to-SQL metrics by incorporating semantic and structural similarity. Scientific Reports, 15, Article 22357. https://doi.org/10.1038/s41598-025-04890-9 --- This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS, and the project 2024.07277.IACDC (DOI: https://doi.org/10.54499/2024.07277.IACDC) supported by measure RE-C05-i08.M04 of the Recovery and Resilience Plan – RRP, within the scope of the funding agreement signed between the Mission Structure ‘Recover Portugal’ (EMRP) and the FCT (Fundação para a Ciência e a Tecnologia), as an intermediate beneficiary. This research is funded by the Italian Ministry of University and Research under the National Recovery and Resilience Plan - PNRR (Ministerial Decree 351/2022) and by the Plus s.r.l. company in Area Science Park, Basovizza, Italy. |
Peer review: | yes |
URI: | http://hdl.handle.net/10362/184836 |
DOI: | https://doi.org/10.1038/s41598-025-04890-9 |
ISSN: | 2045-2322 |
Appears in collections: | NIMS: MagIC - Artigos em revista internacional com arbitragem científica (Peer-Review articles in international journals) |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
Redefining_text_to_SQL_metrics.pdf | | 3.61 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved.