
LLM-based SQL Generation with Reinforcement Learning

Mariia Berdnyk¹, Marine Collery¹

¹IBM France Lab
mariia.berdnyk@ibm.com, marine.collery@ibm.com

Abstract

The text-to-SQL problem remains a challenging task, even with the advancements of Large Language Models (LLMs). Current state-of-the-art models require extensive preprocessing steps and powerful LLMs to achieve accurate SQL query generation, which leads to significant resource utilization. We introduce two models deriving from one another, SQL-RL-GEN and SQL-RL-GEN*, that improve text-to-SQL generation while minimizing the resources needed for training and maximizing flexibility. SQL-RL-GEN generates a reward function to guide the agent's training process, while SQL-RL-GEN* uses this reward function to tune a base LLM to solve the specified task. Our models achieve an accuracy improvement of 2-7% compared to state-of-the-art methods on a limited training dataset composed of only 1000 samples and with a small LLM of 248M parameters.

Code — https://github.com/IBM/sql-rl-gen
Datasets — https://ibm.box.com/v/sql-rl-gen-data

Introduction

Large Language Models (LLMs) have exhibited remarkable capabilities in various tasks, including text and code generation problems (Jiang et al. 2024). This success is largely attributed to the vast amount of data available for training and tuning processes.

The text-to-SQL generation problem is a critical area of research within the fields of natural language processing (NLP) and database systems. Since SQL remains one of the most widely used programming languages for database management (51.52%), text-to-SQL translation enables non-expert users to access structured databases as engineers do, using everyday language (Hong et al. 2024).

The current best text-to-SQL models, which achieve the top scores on the most comprehensive SQL datasets, modify the model structure by adding several preprocessing steps between the input and SQL generation. For instance, ExSL + granite-34b-code by IBM Research combines two steps before passing the question to the model: schema linking and content linking (Martineau 2024). SQLNet uses a sketch-based approach, incorporating a dependency graph to guide token predictions based on their dependencies (Xu, Liu, and Song 2017). However, the question remains open whether generation without a solid data background can be further improved and generalized easily, regardless of the model used. Another approach based on Reinforcement Learning (RL), Seq2SQL, uses basic rewards (1 for a correct query generation and -1 otherwise) obtained from in-the-loop query execution over the database to learn a policy for generating better queries (Zhong, Xiong, and Socher 2017). Despite the impressive results that Seq2SQL demonstrated at the time of its publication, subsequent work suggests that this basic reward is not enough to solve the text-to-SQL problem (Xu, Liu, and Song 2017).

Reward function design for a generation task demands significant human effort and is known to be notoriously difficult in practice (Sutton and Barto 1995). For this purpose, a generic novel reward design algorithm, EUREKA (Ma et al. 2024), powered by coding LLMs, was recently proposed. Unlike prior works using LLMs to aid reward design, EUREKA is completely free of task-specific prompts, reward templates, and few-shot examples (Ma et al. 2024). Instead, it uses evolutionary search and feedback to generate the best reward function with an LLM.

In this paper, we introduce two models deriving from one another, SQL-RL-GEN and SQL-RL-GEN*.

The SQL-RL-GEN algorithm finds the best reward function (the reference reward function) to be used for training an RL agent to generate SQL queries from text, with techniques similar to those proposed by EUREKA, i.e. implementing the reward design for SQL generation, feedback formulation, and an evolutionary search for the best reward function.

SQL-RL-GEN* uses the reference reward function generated by SQL-RL-GEN on a reference dataset to tune a base LLM (flan-t5-base) for SQL generation with limited resources.

The approach makes the following key contributions compared to existing work:

1. Versatility and efficiency of the reference reward function for SQL generation: SQL-RL-GEN* outperforms state-of-the-art SQL generation models on a different dataset than the one used to generate the reference reward function, with only 1000 samples used for training and a relatively small base LLM of 248M parameters. This makes SQL-RL-GEN* efficient in terms of resource utilization.
2. Domain adaptability: the SQL-RL-GEN algorithm is easily adaptable for generating reward functions in various text-to-code domains, enabling its application in diverse settings.
Figure 1: SQL-RL-GEN takes as inputs a system prompt, an SQL environment code, and a task description prompt. The coding LLM iteratively generates N reward function candidates, each used to train an SQL generation model from scratch with the RL Proximal Policy Optimization (PPO) algorithm. The resulting models are evaluated by comparing the rows obtained from the execution of the generated SQL queries with those from the ground-truth queries. The evaluation results (feedback) and the reward function selected as best by accuracy are fed back to the coding LLM for the next iteration. SQL-RL-GEN* is a special case where the best reward function from a previous SQL-RL-GEN training is used directly to train the RL agent.
Problem Statement

Given a textual prompt input p, which is part of the set of all possible textual prompts P = {p1, p2, ..., pn}, and an LLM L : P → O that maps prompts to code outputs in the space of all possible code outputs O = {o1, o2, ..., om}, our goal is to train L to generate an SQL query s ∈ S from the input prompt p, where S ⊆ O is the set of all possible SQL queries.

The prompt is represented as p = (I, T, Q), where:
• I is a set of possible instructions, e.g., "convert", "summarize", "answer", etc. It can be represented as a binary vector i ∈ {0, 1}^|I|, where each element corresponds to one of the instructions in I.
• T is a set of possible table schemas: T = (t1, t2, ..., tj). Each t is a single table, represented as a tuple of columns t = (c1, c2, ..., ck), where k is the number of columns in the table t.
• Q is a set of possible questions, e.g., "How many...", "What is...", etc. Each question can be represented as a string q.

As the instruction (I) for the problem remains unchanged, the training and testing datasets consist of pairs of input data (t, q) and the corresponding (ground truth) query s, such that a dataset D is defined as D = (((t1, q1), s1), ..., ((tN, qN), sN)), where N is the number of samples.

Once trained, the model Ltrained should return, for a specific prompt p, a generated SQL query sgen to be compared with the corresponding (ground truth) query s.
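To make this notation concrete, the sketch below shows one way a dataset sample and its prompt could be represented in Python. The field names and the textual prompt layout are illustrative assumptions, not the exact format used by SQL-RL-GEN.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Table:
    """A table schema t = (c1, ..., ck): a name plus its column names."""
    name: str
    columns: List[str]

@dataclass
class Sample:
    """One dataset entry ((t, q), s): tables and question paired with the gold query."""
    tables: List[Table]   # T
    question: str         # q in Q
    gold_query: str       # s, the ground-truth SQL query

def build_prompt(sample: Sample, instruction: str = "convert") -> str:
    """Assemble the textual prompt p = (I, T, Q) fed to the LLM L.

    The exact textual layout is an assumption made for illustration only.
    """
    schema = "; ".join(f"{t.name}({', '.join(t.columns)})" for t in sample.tables)
    return f"Instruction: {instruction}\nTables: {schema}\nQuestion: {sample.question}"

# Example usage with a hypothetical sample
sample = Sample(
    tables=[Table("employees", ["id", "name", "salary"])],
    question="How many employees earn more than 50000?",
    gold_query="SELECT COUNT(*) FROM employees WHERE salary > 50000",
)
print(build_prompt(sample))
```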
Method

An overview of the approach of SQL-RL-GEN is illustrated in Figure 1. An initialization step is followed by a loop composed of:
• the generation of a reward function,
• the training of the RL agent,
• the evaluation of the tuned SQL generation model and the supply of textual feedback.

Initialization. In the initialization stage, similarly to the original EUREKA approach, we provide the LLM with a prompt that outlines the task and the SQL environment. It is composed of the following parts.
1. The system prompt explicitly defines the role of the LLM as a reward engineer and provides an example of the reward function signature.
2. The task description specifies the goal of the model during training and generation. For SQL generation, it is set to "Converting question and database tables into SQL query".
3. The SQL environment component is crucial and provides the LLM with the context in which the trained agent will operate and execute generated reward functions during training. In the same manner as in EUREKA, SQL-RL-GEN feeds the raw environment source code (excluding reward code, if present) as context, with minimal explanations of external functions (Ma et al. 2024).

The entire initialization stage sets the generation goal, allowing adaptation to different tasks by modifying the initial prompts to solve similar problems in a comparable manner. All initialization prompts are available in the Appendix.
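As a rough illustration of this initialization step, the snippet below assembles the three components into a single prompt for the coding LLM, reusing the placeholder names that appear in the appendix prompts ({task_reward_signature_string}, {task_environment_code_string}, {task_description}). The constant names, the file-based loading of the environment source, and the concatenation scheme are assumptions for illustration, not the released implementation.

```python
from pathlib import Path

# Prompt templates with the placeholders shown in the appendix.
SYSTEM_PROMPT = (
    "You are a reward engineer trying to write reward functions to solve "
    "reinforcement learning tasks as effective as possible. ... An example of the "
    "reward function signature can be:\n```python\n{task_reward_signature_string}\n```"
)
TASK_PROMPT = (
    "The Python environment is {task_environment_code_string}. "
    "Write a reward function for the following task: {task_description}."
)

def build_initialization_prompt(env_source_file: str) -> str:
    """Fill in the reward signature, the raw environment source, and the task description."""
    signature = "def compute_reward(self, input_item, predicted_text) -> Tuple[float, Dict]"
    env_code = Path(env_source_file).read_text()  # raw environment source, reward code excluded
    task = "Converting question and database tables into SQL query"
    system = SYSTEM_PROMPT.format(task_reward_signature_string=signature)
    user = TASK_PROMPT.format(task_environment_code_string=env_code,
                              task_description=task)
    return system + "\n\n" + user
```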
Reward Function Generation and Training. Thanks to the provided prompts, the coding LLM generates multiple reward functions, which are used to train RL agents with the PPO algorithm (Schulman et al. 2017) in a similar manner to EUREKA, and to obtain a tuned SQL generation LLM.

Evaluation and Feedback. In order to improve the next iteration of reward function generation, textual feedback on the performance of the best tuned SQL generation LLM is provided to the coding LLM, along with the reward function with which this model was trained. The SQL generation LLM is considered the best (out of the multiple generated) if, after training, it yields a higher average accuracy during the evaluation step than the other models from both previous and current iterations.

To evaluate the performance of the tuned SQL generation LLM, similarly to the Seq2SQL approach, the evaluation step of both SQL-RL-GEN and SQL-RL-GEN* consists in comparing the SQL rows resulting from the execution of the generated SQL query with those obtained with the ground truth query. The generated queries are only executed when they do not modify the execution environment.

The evaluation results are saved, converted into text and provided back to the LLM as feedback, with quantitative information on the performance (accuracy, precision, recall, F1-score and intersection over union (IoU)). In addition, if errors are encountered during the execution of generated queries, the error types along with the error descriptions are returned in the feedback. The error descriptions do not provide specific information about the database context and are data independent.
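A minimal sketch of this execution-based comparison is shown below, using Python's standard sqlite3 module to run both queries and compare the returned rows. The metric definitions here (exact row-set match for accuracy, row-set IoU) and the SELECT-only guard are our reading of the description above, not the exact released implementation.

```python
import sqlite3
from typing import Optional, Set, Tuple

def run_query(db_path: str, query: str) -> Optional[Set[Tuple]]:
    """Execute a query and return its rows as a set, or None if execution fails."""
    try:
        with sqlite3.connect(db_path) as conn:
            return set(map(tuple, conn.execute(query).fetchall()))
    except sqlite3.Error:
        return None  # the error type/description would go into the textual feedback

def evaluate_pair(db_path: str, generated_sql: str, gold_sql: str) -> dict:
    """Compare the rows returned by the generated query with the ground-truth rows."""
    # Only execute queries that cannot modify the execution environment.
    if not generated_sql.lstrip().lower().startswith("select"):
        return {"executable": False, "accuracy": 0.0, "iou": 0.0}
    gen_rows = run_query(db_path, generated_sql)
    gold_rows = run_query(db_path, gold_sql)
    if gen_rows is None or gold_rows is None:
        return {"executable": gen_rows is not None, "accuracy": 0.0, "iou": 0.0}
    union = gen_rows | gold_rows
    return {
        "executable": True,
        "accuracy": float(gen_rows == gold_rows),  # exact row-set match
        "iou": len(gen_rows & gold_rows) / len(union) if union else 1.0,
    }
```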
As shown in Figure 1, SQL-RL-GEN* is derived from SQL-RL-GEN and consists in retrieving the best reward function generated by a former training of SQL-RL-GEN and using it to directly train an RL agent.
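Putting the pieces together, the loop below sketches the overall SQL-RL-GEN iteration from Figure 1: sample several candidate reward functions from the coding LLM, train an agent with each, keep the best by accuracy, and feed the evaluation summary back for the next round. The callables (sample_reward_candidates, train_with_ppo, evaluate_agent, format_feedback) are stand-ins for components described in this paper, not actual APIs from the released code.

```python
from typing import Callable, Dict, List

def sql_rl_gen(
    sample_reward_candidates: Callable[[str, int], List[str]],  # coding LLM wrapper: (feedback, n) -> reward function sources
    train_with_ppo: Callable[[str], object],                    # trains a fresh agent with a given reward function
    evaluate_agent: Callable[[object], Dict[str, float]],       # executes generated vs. gold queries, returns metrics
    format_feedback: Callable[[str, Dict[str, float]], str],    # turns metrics/errors into textual feedback
    iterations: int = 5,
    candidates_per_iter: int = 4,
) -> str:
    """Evolutionary search for the reference reward function (sketch of Figure 1)."""
    feedback = ""                 # textual feedback from the previous iteration
    best_fn, best_acc = "", -1.0
    for _ in range(iterations):
        # 1. The coding LLM proposes N candidate reward functions,
        #    conditioned on the initialization prompt and the latest feedback.
        for reward_fn in sample_reward_candidates(feedback, candidates_per_iter):
            # 2. Train an SQL generation agent from scratch with PPO
            #    using this candidate as the environment reward.
            agent = train_with_ppo(reward_fn)
            # 3. Evaluate by comparing rows from generated and ground-truth queries.
            metrics = evaluate_agent(agent)
            if metrics["accuracy"] > best_acc:
                best_fn, best_acc = reward_fn, metrics["accuracy"]
                feedback = format_feedback(reward_fn, metrics)
    # SQL-RL-GEN* reuses best_fn to train the final agent directly.
    return best_fn
```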
Experiments

In order to evaluate the validity and usefulness of SQL-RL-GEN, we apply it on the Spider dataset (Yu et al. 2019) to obtain our reference reward function. The WikiSQL dataset (Zhong, Xiong, and Socher 2017) is then used to evaluate the validity and robustness of this reference reward function with SQL-RL-GEN*.

Spider Dataset. Spider consists of 10181 questions and 5693 unique complex SQL queries on 200 databases with multiple tables, covering 138 different domains. In Spider 1.0, different complex SQL queries and databases appear in the train (8659 examples) and test (1034 examples) sets.

WikiSQL Dataset. WikiSQL consists of a corpus of 87726 hand-annotated SQL query and natural language question pairs. These SQL queries are further split into training (61297 examples), development (9145 examples) and test (17284 examples) sets.

Experimental Setting. For each dataset, a subset of 1000 randomly selected samples is used for training and another subset of 1000 randomly selected samples is used for testing. The experiments are carried out with a k-fold cross-validation strategy with k = 5.

The reward function generation and reflection are implemented using llama-3-405b-instruct (Touvron et al. 2023). This model is free, open-source and is known for its good instruction-following generation capabilities (Touvron et al. 2023), which makes it a better choice than the proprietary model described in the EUREKA reference paper. Characteristics of the model are available in Appendix, Table 4. The initial LLMs (agents) used for generating SQL queries are flan-t5-base (Chung et al. 2024) and a version of flan-t5-base pretrained on SQL syntax (noa 2023). The flan-t5-base transformer-based model consists of only 248 million parameters, which makes its training process computationally efficient and light. To evaluate the efficiency of SQL-RL-GEN*, the trained flan-t5-base was compared with Seq2SQL and SQLNet reference models trained on the same samples and configured according to their original papers. All agent characteristics can be found in Appendix, Table 5.

The PPO algorithm is configured in the exact same manner as in (Schulman et al. 2017) and as described in the EUREKA reference paper. The parameters are listed in Table 6 in the Appendix. However, unlike the original PPO approach, which only allows a single trial per sample before switching to another, for the training of SQL-RL-GEN and SQL-RL-GEN* we introduce an improvement that enables the model to experiment 10 times on the same sample before moving on. This allows the agent to learn from its mistakes and refine its policy for generating better SQL queries. By allowing multiple trials on the same sample, we can more effectively capture the nuances of text generation problems, which often demand a more refined approach than the original single-trial method.
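The multi-trial modification could look roughly like the rollout loop below, which keeps resampling a query for the same training sample (up to 10 attempts) and only moves on once the reward signals success or the budget is spent. generate_query and reward_for are stand-ins for the agent's sampling step and the environment reward, not functions from the released codebase.

```python
from typing import Callable, List, Tuple

def collect_rollouts(
    samples: List[dict],
    generate_query: Callable[[dict], str],      # agent sampling step (stand-in)
    reward_for: Callable[[dict, str], float],   # environment reward (stand-in)
    max_trials: int = 10,
) -> List[Tuple[dict, str, float]]:
    """Gather (sample, query, reward) tuples, allowing several attempts per sample."""
    rollouts = []
    for sample in samples:
        for _ in range(max_trials):
            query = generate_query(sample)
            reward = reward_for(sample, query)
            rollouts.append((sample, query, reward))  # every attempt is kept for the PPO update
            if reward > 0:  # stop early once a satisfactory query is produced
                break
    return rollouts
```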
All experiments are GPU-based and were conducted on a Lenovo ThinkPad P15 Gen 1 with an Intel Core i7-10750H CPU (12 cores), Quadro T1000/PCIe/SSE2 graphics with 4 GB of memory, running Red Hat Enterprise Linux 8.10.

Preliminary Results

SQL-RL-GEN and Reference Reward Function Generation. Training SQL-RL-GEN on the Spider dataset, with the flan-t5-base model as the initial SQL generation LLM, does not lead to any improvements in terms of accuracy (0%). This is due to the fact that the flan-t5-base model has not been trained on any code or SQL queries, that the training on the Spider dataset is severely limited by the constrained size of 1000 training samples, and that Spider features highly intricate and complex queries. However, as shown in Table 1, when training SQL-RL-GEN on the Spider dataset with a flan-t5-base model pretrained on SQL syntax as the initial SQL generation LLM, the performance in terms of accuracy improves by more than 3% and, on average, there are almost 3% more executable generated queries.

                 pretrained flan-t5-base    SQL-RL-GEN
accuracy (%)     44.7 ± 1.6                 48.0 ± 0.78
exec sgen (%)    61.5 ± 1.5                 64.3 ± 1.3

Table 1: Average accuracies and percentages of generated executable queries sgen, along with standard errors for 5-fold cross validation, for the initial LLM (flan-t5-base pretrained on SQL syntax) and after SQL-RL-GEN training on the Spider dataset. Metrics shown are obtained on the Spider testing dataset.

Versatility of the Reference Reward Function. As shown in Table 2, SQL-RL-GEN*, which uses the reference reward function to fine-tune flan-t5-base, outperforms the state-of-the-art models Seq2SQL (Zhong, Xiong, and Socher 2017) and SQLNet (Xu, Liu, and Song 2017) on the WikiSQL dataset, both in terms of accuracy and number of executable generated SQL queries. This points out the versatility of the reference reward function and how efficient SQL-RL-GEN* is in terms of resource utilization, as only 1000 samples were used for training compared to the entire dataset for the other models.

             Seq2SQL    SQLNet    SQL-RL-GEN*
accuracy     7.1%       11.3%     13.8%
exec sgen    12.8%      12.1%     30.6%

Table 2: Accuracies and percentages of executable generated queries sgen for Seq2SQL, SQLNet and SQL-RL-GEN* obtained on the WikiSQL test dataset.

Reusability of the Reference Reward Function. Finally, in order to validate that the reference reward function can also be used in other RL-based algorithms, we compared the Seq2SQL model to a version of Seq2SQL trained with our reference reward function, as shown in Table 3. The metrics employed for model evaluation align with those utilized in the original Seq2SQL paper (and are described in the Appendix). Again, usage of the reference reward function improved all of the different accuracies defined in (Zhong, Xiong, and Socher 2017) to evaluate SQL generation. This reward function can therefore be reused in other RL-based contexts in the text-to-SQL generation field.

               Seq2SQL    Seq2SQL with SQL-RL-GEN* reference reward function
Dev Accqm      53.1%      55.0%
Dev Accexec    60.4%      62.5%
Test Accqm     52.7%      55.3%
Test Accexec   60.0%      63.2%

Table 3: Accuracy comparison on the WikiSQL dataset between Seq2SQL and Seq2SQL with the SQL-RL-GEN* reference reward function. Accqm and Accexec indicate the query-match (string match) and the execution accuracy (correct result) (Zhong, Xiong, and Socher 2017), respectively, on the development and testing datasets.

Limitations and Future Directions

While SQL-RL-GEN and SQL-RL-GEN* show strong improvements with limited data, further analysis is needed:
1. Error Mitigation: The reward function penalizes syntax errors, logical inconsistencies, and schema mismatches. A detailed breakdown of its impact on correction rates would clarify its role in improving performance.
2. Generalization: The model improves when transferring from Spider to WikiSQL, but its adaptability to unseen schemas requires further evaluation across diverse benchmarks.
3. PPO Trials: Additional trials refine the reward function but increase computational cost. Analyzing diminishing returns could optimize efficiency.
4. Scalability: Testing on varied datasets and resource constraints would help assess robustness and adaptability.

Conclusion

We have presented SQL-RL-GEN and SQL-RL-GEN*, which derive from one another. The first proposes a reference reward function calibrated for SQL generation thanks to evolutionary search and feedback formulation (Ma et al. 2024); the second uses this reward function to tune an LLM with limited resources. The experiments demonstrated that SQL-RL-GEN* outperforms state-of-the-art methods and that the reference reward function can boost the generation capability of RL-based methods on the WikiSQL and Spider datasets.

References

2023. Hugging Face. https://huggingface.co/juierror/flan-t5-text2sql-with-schema-v2. Accessed: 2024-11-15.
Chung, H. W.; Hou, L.; Longpre, S.; Zoph, B.; Tay, Y.; Fedus, W.; Li, Y.; Wang, X.; Dehghani, M.; Brahma, S.; Webson, A.; Gu, S. S.; Dai, Z.; Suzgun, M.; Chen, X.; Chowdhery, A.; Castro-Ros, A.; Pellat, M.; Robinson, K.; Valter, D.; Narang, S.; Mishra, G.; Yu, A.; Zhao, V.; Huang, Y.; Dai, A.; Yu, H.; Petrov, S.; Chi, E. H.; Dean, J.; Devlin, J.; Roberts, A.; Zhou, D.; Le, Q. V.; and Wei, J. 2024. Scaling Instruction-Finetuned Language Models. Journal of Machine Learning Research, 25(70): 1–53.
Hong, Z.; Yuan, Z.; Zhang, Q.; Chen, H.; Dong, J.; Huang, F.; and Huang, X. 2024. Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL. arXiv:2406.08426.
Jiang, J.; Wang, F.; Shen, J.; Kim, S.; and Kim, S. 2024. A Survey on Large Language Models for Code Generation. arXiv:2406.00515.
Ma, Y. J.; Liang, W.; Wang, G.; Huang, D.-A.; Bastani, O.; Jayaraman, D.; Zhu, Y.; Fan, L.; and Anandkumar, A. 2024. Eureka: Human-Level Reward Design via Coding Large Language Models. arXiv:2310.12931.
Martineau, K. 2024. IBM text-to-SQL generator tops leaderboard. https://research.ibm.com/blog/granite-LLM-text-to-SQL. Accessed: 2024-10-31.
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal Policy Optimization Algorithms. arXiv:1707.06347.
Sutton, R. S.; and Barto, A. G. 1995. Reinforcement Learning: An Introduction. MIT Press.
Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; Rodriguez, A.; Joulin, A.; Grave, E.; and Lample, G. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv:2302.13971.
Xu, X.; Liu, C.; and Song, D. 2017. SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning. arXiv:1711.04436.
Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; Zhang, Z.; and Radev, D. 2019. Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task. arXiv:1809.08887.
Zhong, V.; Xiong, C.; and Socher, R. 2017. Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning. arXiv:1709.00103.

Appendix - Initialization Prompts

System Prompt

You are a reward engineer trying to write reward functions to solve reinforcement learning tasks as effective as possible. Your goal is to write a reward function for the environment that will help the agent learn the task described in text. Your reward function should use useful variables from the environment as inputs. An example of the reward function signature can be:
```python
{task_reward_signature_string}
```
You need to generate the reward functions of EXACTLY this syntax. Everything else is not accepted. Please make sure that the code is compatible with Gym env. **PROVIDE ONLY PYTHON CODE.**

Task Description

The Python environment is {task_environment_code_string}. Write a reward function for the following task: {task_description}.

SQL environment

```python
class SQLRLEnv(TextRLEnv):
    def __init__(self, model, tokenizer, dataset, ...):
        super().__init__(model, tokenizer, observation_input,
                         max_length, compare_sample,
                         unfreeze_layer_from_past)
        ...

    def sql_query_execution_feedback(self, input_item,
                                     predicted_text) -> Dict:
        ...

    # Base method
    def get_reward(self, input_item, predicted_list, finish):
        if finish:
            predicted_text = self.tokenizer.convert_tokens_to_string(
                predicted_list[0])
            reward, metrics = self.compute_reward(input_item,
                                                  predicted_text)
            metrics["reward"] = reward
            ...
            return reward
        return 0.0

    # Skeleton of generation
    def compute_reward(self, input_item,
                       predicted_text) -> Tuple[float, Dict]
```
Appendix - Experimental Settings

                        llama-3-405b-instruct
Number of parameters    405B
Temperature             0.95
Context size            15 000
Decoding method         sample

Table 4: llama-3-405b-instruct and flan-t5-base characteristics.
                        flan-t5-base                       SQLNet                          Seq2SQL
Architecture            Encoder-Decoder Transformer (T5)   BiLSTM + attention + seq2set    Encoder-Decoder + RL
Number of parameters    248M                               38.5M                           37M
Pretrained              Yes                                No                              No
Fine-tuning required    Yes                                Yes                             Yes
Temperature             0.8                                0.8                             0.8

Table 5: Experimental flan-t5-base, Seq2SQL and SQLNet agent model characteristics.

Parameters Values
Tensors type F32
Temperature 0.8
Top k 100
Top p 0.85
Update interval 50
Minibatch size 512
Number of Epochs 5000
Number of steps 1000
Number of evaluation episodes 5
Maximum training episodes length 1000
Evaluation interval 10
Maximum new tokens 250
Minimum new tokens 10

Table 6: PPO algorithm settings.

Appendix - Reference Reward Function generated with SQL-RL-GEN

Figure 2: Reference Reward Function generated with SQL-RL-GEN and used for training of SQL-RL-GEN*.
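The actual reference reward function is only available as an image (Figure 2) and in the released code. Purely as an illustration of the expected shape, a hypothetical compute_reward implementation matching the skeleton of the SQL environment above could look like the following; the weighting scheme and the keys assumed to be returned by sql_query_execution_feedback are our assumptions, not the generated function itself.

```python
from typing import Dict, Tuple

# Illustrative only: a candidate of the kind the coding LLM is asked to produce,
# meant to be attached to SQLRLEnv (hence the `self` parameter).
def compute_reward(self, input_item, predicted_text) -> Tuple[float, Dict]:
    """Hypothetical reward: execution feedback blended with row overlap."""
    feedback = self.sql_query_execution_feedback(input_item, predicted_text)
    if not feedback.get("executable", False):
        # Penalize queries that fail to parse or execute.
        return -1.0, {"executable": 0.0, "iou": 0.0}
    iou = feedback.get("iou", 0.0)          # overlap between returned and ground-truth rows (assumed key)
    exact = 1.0 if feedback.get("rows_match", False) else 0.0
    reward = 0.2 + 0.5 * iou + 0.3 * exact  # small bonus for executing at all
    return reward, {"executable": 1.0, "iou": iou, "exact_match": exact}
```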
