| Name | Description | Size | Format |
|---|---|---|---|
| | | 4.17 MB | Adobe PDF |
Authors
Advisor(s)
Abstract
This research study applies sequence-to-sequence models to conversational
text-to-SQL and compares different methodologies. It proposes pre-training the
T5-base model on WikiSQL data and then fine-tuning it on SParC data, with taxonomy
(also known as schema linking) and tree dependency parsing integrated. Through an
ablation study, this model was then compared with a pre-trained T5-base model
fine-tuned on SParC data, where the whole procedure was kept the same except for
the differences in the model development itself. The impact of taxonomy and
dependency parsing was assessed through the model results. These methodologies were
tested on four samples defined in advance, using different database domains, so
that the whole benchmark was trained and tested. The metrics used were execution
with values and exact set match without values, which evaluate the capacity of the
queries to access the database and return a value, or to build the query structure,
respectively. Computational runtime and the machines used were also described in
order to evaluate their impact on the final result. The computational power
challenges encountered suggest that future work using this alternative approach
still needs to be developed.
Description
Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science
Keywords
Natural Language Processing; Conversational text-to-SQL; Schema linking; Dependency parsing
