Publication

Redefining text-to-SQL metrics by incorporating semantic and structural similarity

dc.contributor.author: Pinna, Giovanni
dc.contributor.author: Perezhohin, Yuriy
dc.contributor.author: Manzoni, Luca
dc.contributor.author: Castelli, Mauro
dc.contributor.author: De Lorenzo, Andrea
dc.contributor.institution: Information Management Research Center (MagIC) - NOVA Information Management School
dc.contributor.institution: NOVA Information Management School (NOVA IMS)
dc.contributor.pbl: Nature Publishing Group
dc.date.accessioned: 2025-07-04T21:17:24Z
dc.date.available: 2025-07-04T21:17:24Z
dc.date.issued: 2025-07
dc.description: Pinna, G., Perezhohin, Y., Manzoni, L., Castelli, M., & De Lorenzo, A. (2025). Redefining text-to-SQL metrics by incorporating semantic and structural similarity. Scientific Reports, 15, Article 22357. https://doi.org/10.1038/s41598-025-04890-9 --- This work was supported by national funds through FCT (Fundação para a Ciência e a Tecnologia), under the project UIDB/04152 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS, and the project 2024.07277.IACDC (DOI: https://doi.org/10.54499/2024.07277.IACDC) supported by measure RE-C05-i08.M04 of the Recovery and Resilience Plan – RRP, within the scope of the funding agreement signed between the Mission Structure ‘Recover Portugal’ (EMRP) and the FCT (Fundação para a Ciência e a Tecnologia), as an intermediate beneficiary. This research is funded by the Italian Ministry of University and Research under the National Recovery and Resilience Plan - PNRR (Ministerial Decree 351/2022) and by the Plus s.r.l. company in Area Science Park, Basovizza, Italy.
dc.description.abstract: The rapid advancements in text-to-SQL systems have driven the scientific community to create increasingly complex benchmarks for this task. However, evaluation metrics often rely on simplistic or binary approaches that fail to capture the similarities and differences between equivalent SQL queries. Current metrics overlook critical aspects such as partial correctness, structural differences, and semantic equivalence. To address these limitations, we propose a novel metric for SQL query comparison, designed to offer a more precise assessment of the similarity between SQL queries at both the semantic (string) and execution result (resultant table) levels. This new metric allows for a granular evaluation of SQL query similarity, supporting a more accurate assessment and ranking of text-to-SQL tools and models. The proposed approach could have a meaningful impact on text-to-SQL research and development. It might improve evaluation by distinguishing between models that handle simple queries and those capable of tackling more complex ones. The metric could also help to identify where the differences between two queries lie. Additionally, it may support the development of more accurate language models by offering precise training signals to help the model recognize query similarities. The experimental results highlight the metric’s effectiveness over existing evaluation methodologies, allowing us to identify the current best text-to-SQL models through distribution analysis. In some cases, the metric allows the detection of missing aggregation operators or variations in query ordering operators. (en)
dc.description.version: publishersversion
dc.description.version: published
dc.format.extent: 17
dc.format.extent: 3697586
dc.identifier.doi: 10.1038/s41598-025-04890-9
dc.identifier.issn: 2045-2322
dc.identifier.other: PURE: 117080225
dc.identifier.other: PURE UUID: 01d51752-e8ae-40d7-9d6e-6c49f62fef7d
dc.identifier.other: Scopus: 105009902701
dc.identifier.other: WOS: 001523033000029
dc.identifier.other: PubMed: 40594429
dc.identifier.other: ORCID: /0000-0002-8793-1451/work/187323449
dc.identifier.uri: http://hdl.handle.net/10362/184836
dc.identifier.url: https://www.scopus.com/pages/publications/105009902701
dc.identifier.url: https://www.webofscience.com/wos/woscc/full-record/WOS:001523033000029
dc.identifier.url: https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird
dc.identifier.url: https://github.com/giovannipinna96/NL2SQL360/tree/master/data/predict/bird_dev
dc.language.iso: eng
dc.peerreviewed: yes
dc.relation: https://doi.org/10.54499/UID/04152/2025
dc.relation: https://doi.org/10.54499/UID/PRR/04152/2025
dc.relation: https://doi.org/10.54499/2024.07277.IACDC
dc.subject: SQL metric
dc.subject: Evaluation metric
dc.subject: Text-to-SQL
dc.subject: Benchmark SQL
dc.subject: SQL similarity
dc.subject: General
dc.subject: SDG 9 - Industry, Innovation, and Infrastructure
dc.title: Redefining text-to-SQL metrics by incorporating semantic and structural similarity (en)
dc.type: journal article
degois.publication.title: Scientific Reports
degois.publication.volume: 15
dspace.entity.type: Publication
rcaap.rights: openAccess

Files

Main
Showing 1 - 1 of 1
Name:
Redefining_text_to_SQL_metrics.pdf
Size:
3.53 MB
Format:
Adobe Portable Document Format