HCteam_IT_Proposal
Research title (English): Securing the Future: Enhancing Text-to-SQL Systems for Secure and Efficient Database Querying
Abstract
This research proposal focuses on advancing Text-to-SQL systems, which translate natural
language queries into SQL commands, by addressing critical security vulnerabilities and
performance trade-offs. The primary objectives are to develop a novel methodology that
integrates robust security measures, specifically targeting prompt injection attacks, while
ensuring high semantic accuracy and efficient performance for practical deployment. The study
poses two key research questions: (1) How can access control mechanisms enhance security in
Text-to-SQL systems? (2) What strategies can balance security and performance effectively?
Current systems face problems such as mismatches with user intent, ambiguous vocabulary
handling, complex query generation, and security risks like data leakage due to inadequate
access controls. To tackle these, the research employs an experimental methodology across
three phases: designing a prototype with role-based and attribute-based access controls,
optimizing performance using techniques like query caching, and evaluating the system with a
custom dataset of 500 domain-specific questions. Simulated user scenarios and performance
metrics will validate security and efficiency. This work aims to deliver a secure, accurate, and
usable Text-to-SQL framework, enhancing its real-world applicability.
Keywords: Text-to-SQL, security, access control, large language models, natural language processing
1. Introduction
Structured Query Language (SQL) is a cornerstone of data management within
organizations, serving as the standard language for interacting with relational databases to store,
retrieve, and manipulate data efficiently [1]. SQL enables organizations to extract actionable
insights from vast datasets, supporting critical functions such as business intelligence, financial
reporting, and operational analytics [1]. However, its traditional usage poses significant
challenges. Non-technical users often depend on developers to create predefined data extraction
forms, which restricts flexibility [1].
To address these limitations, Text-to-SQL has emerged as an innovative solution, allowing
users to query databases using natural language rather than structured SQL commands [2]. This
technology leverages advancements in natural language processing (NLP) to translate human
language into executable SQL queries, empowering non-technical individuals, such as sales
staff generating reports or managers making data-driven decisions, to interact with data
independently [1]. The evolution of Text-to-SQL has progressed from early rule-based systems
[1], to deep learning techniques [1], and most recently to the integration of large language
models (LLMs), significantly improving natural language understanding and query generation
[1]. Despite these advancements, Text-to-SQL systems face persistent challenges that limit their
practical adoption.
In terms of accuracy, Text-to-SQL encounters issues such as mismatches with user intent
(mismatch problem), difficulties in handling ambiguous vocabulary (lexical problem), and
limited capability when dealing with complex queries (complex query) [1]. Regarding security,
current Text-to-SQL systems lack effective security policies, posing risks of data leakage or
violations of security regulations, leading to severe consequences such as loss of sensitive
information [1]. Consequently, these limitations prevent Text-to-SQL from being widely
applied in real-world enterprise environments, where high reliability and security are required
[1].
Therefore, we propose to develop a Text-to-SQL framework that aims to address some of
the aforementioned challenges while maintaining a balance between performance and security.

1.1.1 Accuracy in LLM-Based Text-to-SQL Systems

A prominent line of work adopts multi-model strategies, where multiple models or agents collaborate to generate robust queries. Techniques
such as ensemble modeling [5] and voting mechanisms [6] aggregate outputs from several
LLMs to enhance accuracy, while agent-based systems like the C3 framework [6] assign
specialized roles to different components, enabling them to tackle complex queries more
effectively. External verification is also crucial, as it refines initial LLM outputs using tools like
SQL execution engines [7] and verification models [8]. These methods, along with interactive
user feedback and rule-based systems [9], help ensure that generated SQL queries are accurate
and executable. Task decomposition, using techniques such as question decomposition [10] and
multi-step workflows (e.g., RESDSQL [57]), breaks complex Text-to-SQL (T2SQL) queries into
manageable sub-tasks, improving the handling of intricate queries. Finally, LLMs with
specialized training, such as models pre-trained on SQL datasets like GRAPPA [58] or adapted
with architectural changes [59], improve SQL and schema comprehension, boosting T2SQL
performance.

Figure 1: Categorization of these methods and techniques, taking inspiration from [3]. The
figure groups representative systems along four axes: benchmark datasets (Spider [2], BIRD [4],
Spider-DK [61], Spider-SYN [62], Spider-Realistic [15], Spider 2.0 [16]); evaluation metrics
(syntax-based component matching and exact matching, and execution-based execution
accuracy [2]); models (GPT series, LLaMA series, and open-source models for customization);
and techniques (enhanced single-model prompting, multi-LLM/multi-agent/ensemble systems,
external refinement, task decomposition, and specialized training).
1.1.2 Performance in LLM-Based Text-to-SQL Systems
Despite advancements, T2SQL systems face challenges in interpreting ambiguous user
intent, as natural language nuances are hard to map to SQL. Complex database schemas,
especially in cross-domain settings, and generating rare SQL constructs like nested subqueries
remain difficult due to limited training data [4], [9]. These issues highlight the need for research
in few-shot learning, domain adaptation, and external knowledge integration to enhance T2SQL
performance.
Benchmark datasets like Spider [60] (10,000+ cross-domain examples) and BIRD [4], along
with variants like Spider-DK [61], Spider-SYN [62], Spider-Realistic [15], and Spider 2.0 [16],
support T2SQL development. Metrics such as component matching, exact matching, execution
accuracy, and valid efficiency score [60], [63] ensure robust evaluation. However, challenges
like vague intent, complex schemas, and rare constructs persist, necessitating ongoing
innovation to meet real-world demands.
1.1.3 Security in LLM-Based Text-to-SQL Systems
LLM-based T2SQL systems, while advancing natural language database interaction,
introduce security risks due to direct database access [64], [65]. Key vulnerabilities include
access control bypass [66], SQL injection via backdoors/poisoning [67], schema inference
attacks [68], prompt injection [64], sensitive data disclosure [68], and DoS risks [65]. These
threats emphasize the need for robust security measures. Defense mechanisms targeting various
system lifecycle stages are critical, with common attack types, descriptions, and defenses
summarized in the accompanying table.
Table 1: Summary of attack types and their corresponding defense mechanisms

Prompt Injection [69]
Description: Manipulating the LLM via crafted input prompts to override instructions or bypass filters.
Defense mechanisms: Input/output filtering and validation [69]; constrained model behavior [69]; human approval; sandboxing; instruction defense [64].
Limitations: May reduce flexibility or limit capabilities.

Backdoor / Data Poisoning [69]
Description: Compromising the model during training by injecting malicious data or triggers, causing harmful SQL generation upon activation.
Defense mechanisms: Training data monitoring/validation [69]; input filtering (trigger detection); model pruning [64]; adversarial training [67]; static analysis.
Limitations: Trigger detection is challenging; defenses such as ONION show limitations [70].

Schema Inference [68]
Description: Exploiting model responses to deduce the underlying database schema without prior knowledge.
Defense mechanisms: Limiting schema information in the prompt [68]; defensive prompting (limited) [68]; access control on the schema.
Limitations: May decrease SQL accuracy; defensive prompts have limited effectiveness [68].

SQL Injection (Generated SQL) [69]
Description: The model generates SQL containing executable malicious code, often influenced by other attacks.
Defense mechanisms: Output validation/sanitization [69]; least privilege [69]; static SQL analysis (limited) [67].
Limitations: Output sanitization is essential; static analysis may miss sophisticated payloads [67].

Sensitive Information Disclosure [69]
Description: The model inadvertently outputs confidential information from training data or the interaction context.
Defense mechanisms: Data sanitization [69]; differential privacy (DP) [64]; output filtering [69]; access control.
Limitations: DP might reduce model utility.

Excessive Agency [69]
Description: Granting the LLM or associated tools excessive permissions or autonomy beyond the safe operational scope.
Defense mechanisms: Least privilege for the model and its extensions [69]; human approval [69]; limited extension functionality [69]; complete mediation.
Limitations: Requires careful system design.

DoS / Unbounded Consumption [69]
Description: Exploiting the system to cause excessive resource usage (computation, API calls), leading to service degradation or high costs.
Defense mechanisms: Rate limiting [69]; input validation on size and complexity [69]; resource allocation management [69]; timeouts and throttling [69].
Limitations: Must balance abuse prevention against legitimate use.
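Of the defenses in Table 1, output validation and least privilege apply most directly to the generated SQL itself. As a rough illustration only (the function name and keyword blocklist are our own, not from any cited system), a heuristic validator might accept only single, read-only SELECT statements:

```python
import re

# Keywords that indicate a data-modifying or administrative statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|attach|pragma)\b",
    re.IGNORECASE,
)

def validate_generated_sql(sql: str) -> bool:
    """Heuristic output validation: accept a single read-only SELECT statement.

    Rejects empty input, stacked statements (an interior ';'),
    non-SELECT statements, and any forbidden keyword in the query.
    """
    stmt = sql.strip().rstrip(";").strip()
    if not stmt:
        return False
    if ";" in stmt:  # stacked statements, e.g. "SELECT ...; DROP TABLE ..."
        return False
    if not stmt.lower().startswith(("select", "with")):  # allow CTEs
        return False
    if FORBIDDEN.search(stmt):
        return False
    return True
```

A blocklist like this is easy to bypass in general; in practice it would complement, not replace, database-level least-privilege permissions.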
However, implementing security measures often involves a security-performance trade-off.
Additional checks can introduce latency [65], while certain defenses might degrade model
accuracy or flexibility [64]. Achieving an optimal balance between security and operational
efficiency remains a key challenge for the practical deployment of these powerful systems.
1.2. The necessity of the research
This research is developed to solve two main problems that exist in current Text-to-SQL
approaches:
• Addressing Security Deficiencies: Systematically identifying the under-explored security
vulnerabilities inherent in current Text-to-SQL systems and proposing novel defense
mechanisms to enhance system stability, security, and availability.
• Optimizing Practical Viability: Investigating and balancing the crucial trade-offs between
implementing robust security measures, maintaining high translation accuracy, and achieving
efficient performance to meet rigorous industrial standards for real-world deployment.
1.3. Feasibility of Research
Research question and objectives
This study addresses two primary questions:
1. How can access control mechanisms enhance security in Text-to-SQL systems?
2. How can performance and security be balanced in Text-to-SQL systems?
The objective is to develop and evaluate a secure Text-to-SQL system that integrates access
control while maintaining computational efficiency.
Research design
An experimental approach is employed, structured in three phases:
1. System design and implementation incorporating role-based access control (RBAC) and attribute-based access control (ABAC).
2. Performance optimization via query caching and schema filtering.
3. System evaluation using a custom dataset and simulated user scenarios.
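As a sketch of how the query caching planned for phase 2 might look, the class below memoizes the expensive question-to-SQL translation behind a small LRU cache; the `translate` callable and the normalization rule are illustrative assumptions rather than a committed design.

```python
from collections import OrderedDict
from typing import Callable

class QueryCache:
    """LRU cache mapping normalized questions to previously generated SQL."""

    def __init__(self, translate: Callable[[str], str], capacity: int = 256):
        self._translate = translate  # e.g. the LLM-backed Text-to-SQL call
        self._cache: OrderedDict[str, str] = OrderedDict()
        self._capacity = capacity
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(question: str) -> str:
        # Cheap normalization; a real system might also canonicalize entities.
        return " ".join(question.lower().split())

    def get_sql(self, question: str) -> str:
        key = self._normalize(question)
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)      # mark as recently used
            return self._cache[key]
        self.misses += 1
        sql = self._translate(question)       # slow path: call the model
        self._cache[key] = sql
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return sql
```

Because the cache is keyed on the question rather than the SQL, repeated questions avoid the model call entirely, which is where security-induced latency would otherwise accumulate.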
Data collection methods:
Data collection will involve both quantitative and qualitative methods to evaluate the system's
security and performance. Specifically, three approaches will be used: (1) a custom benchmark
dataset of approximately 500 domain-specific questions, designed to test access control
mechanisms and complex query handling, addressing the limitations of existing datasets like
Spider and BIRD; this dataset will be publicly released. (2) Simulated user scenarios with
synthetic profiles (e.g., administrator, analyst, guest) to assess RBAC and ABAC in realistic
access control contexts. (3) Collection of quantitative performance metrics, such as query
execution time and security-induced latency, during system testing.
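As an illustration only, a record in the planned benchmark could pair each question with its gold SQL, the roles allowed to run it, and a flag marking injection attempts; every field name below is a working assumption, since the dataset design is itself part of the research.

```python
import json

# Hypothetical benchmark record: all field names are illustrative assumptions.
record = {
    "id": 17,
    "question": "Show the average salary per department.",
    "gold_sql": "SELECT department, AVG(salary) FROM employees GROUP BY department",
    "required_tables": ["employees"],
    "allowed_roles": ["administrator", "analyst"],  # a guest should be blocked
    "is_malicious": False,  # True for prompt-injection test cases
}

# One JSON object per line (JSONL) keeps the 500-question set easy to stream.
line = json.dumps(record)
```

Storing `allowed_roles` and `required_tables` alongside each question lets the same record test both translation accuracy and the RBAC/ABAC checks.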
Data analysis
Security evaluation through unauthorized query blocking rates and penetration testing.
Performance analysis by comparing accuracy and latency against a baseline system.
Data sources: Primary data sources include the custom dataset and performance logs; user
scenarios serve as supplementary data.
Timeframe: Research is divided into three phases: initial survey, experimental development,
and final evaluation.
Risks and mitigation
Technical integration challenges will be managed through iterative development.
Dataset suitability will be ensured via rigorous testing.
Realistic user scenarios will be designed by referencing real-world standards.
2. Research objectives
Our main research objectives include:
• Develop and propose a novel methodology for Text-to-SQL translation that intrinsically
integrates security considerations with high semantic accuracy.
• Design and implement specific defense mechanisms tailored to mitigate the identified
vulnerabilities in Text-to-SQL systems.
• Rigorously evaluate the effectiveness and precision of the proposed methodology and
defense mechanisms.
3. Research scope
The scope of this research is defined by the following boundaries:
• Attack Vector Focus: The investigation of security vulnerabilities and the development of
defense mechanisms will concentrate specifically on prompt injection attacks executed through
the natural language interface of the Text-to-SQL system. Other potential attack vectors that do
not directly leverage the manipulation of the underlying Large Language Model (LLM) input
are considered outside the scope of this work.
• SQL Statement Limitation: The Text-to-SQL system developed and evaluated within this
research will be constrained to generating SELECT statements only. The generation of SQL
statements that modify data (e.g., INSERT, UPDATE, DELETE) or perform other database
actions beyond data retrieval is explicitly excluded.
4. Approach and Method
Let Q_raw be the raw natural language question from the user U, let S be the database
schema, and let H be the history of successfully answered (Q_hist, SQL_hist) pairs. The process
can be formalized as follows:
4.1 Security Check Phase:
Input validation: Apply a sanitization function Sanitize, and let IsMalicious(Q_raw) → {True, False} detect threats:

Q_safe = Sanitize(Q_raw) if IsMalicious(Q_raw), else Q_safe = Q_raw
Access control: Let Permissions(U) ⊆ S be the schema subset that user U is allowed to access,
and let RequiredSchema(Q_safe) ⊆ S be the estimated schema needed to answer the query.
Define CheckAccess(U, Q_safe) = (RequiredSchema(Q_safe) ⊆ Permissions(U)); the query
proceeds only when CheckAccess holds.
Summarization: Let R_proc = R_sum if summarization is applied to the raw result R_raw, else R_proc = R_raw.
Output validation: Check the processed result: IsPass = ValidateOutput(R_proc).
Final answer: If IsPass, Answer = R_proc; otherwise, handle the error.
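The security-check phase can be sketched in code as follows. The pattern list is a deliberately minimal stand-in for IsMalicious (real prompt-injection detection is an open problem), and the required schema is passed in rather than estimated, so everything beyond the formal names (Sanitize, IsMalicious, CheckAccess) is an assumption.

```python
import re

# Minimal stand-in patterns for IsMalicious; a real detector would be richer.
MALICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",  # classic prompt-injection phrasing
    r"\bdrop\s+table\b",
    r"\bunion\s+select\b",
]

def is_malicious(q_raw: str) -> bool:
    return any(re.search(p, q_raw, re.IGNORECASE) for p in MALICIOUS_PATTERNS)

def sanitize(q_raw: str) -> str:
    q = q_raw
    for p in MALICIOUS_PATTERNS:
        q = re.sub(p, "", q, flags=re.IGNORECASE)
    return " ".join(q.split())

def check_access(permissions: set, required_schema: set) -> bool:
    # CheckAccess(U, Q_safe) = RequiredSchema(Q_safe) ⊆ Permissions(U)
    return required_schema <= permissions

def security_check(q_raw: str, permissions: set, required_schema: set) -> str:
    """Return Q_safe if the query passes both checks, else raise."""
    q_safe = sanitize(q_raw) if is_malicious(q_raw) else q_raw
    if not check_access(permissions, required_schema):
        raise PermissionError("query touches tables outside the user's permissions")
    return q_safe
```

Raising on an access failure, rather than silently trimming the query, keeps the deny decision auditable, which matters for the unauthorized-query-blocking metric used later.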
[Pipeline diagram: the proposed system comprises four phases. The security check phase
performs access control (using the user's basic information) and output validation. The
pre-retrieval phase retrieves similar questions from history and identifies the relevant tables
and columns. The SQL generation phase breaks down the task, generates the query, performs
value matching and error validation, and executes the query against the database. The summary
phase summarizes the result before the final answer is returned.]
Here I denotes the indicator function, returning 1 if its condition holds (e.g., the generated SQL
matches the ground truth exactly) and 0 otherwise.

Exact Accuracy (EA):

EA = (1/N) · Σ_{i=1}^{N} I(R_raw^i = R_true^i ∧ IsValid(SQL_gen^i))

where R_raw^i is the execution result of the generated query, R_true^i is the ground-truth
result, and IsValid(SQL_gen^i) ensures the query is syntactically correct and executable.

Security Compliance (SC):

SC = (1/N) · Σ_{i=1}^{N} I(CheckAccess(U, Q_safe^i) = True)

where CheckAccess(U, Q_safe^i) = (RequiredSchema(Q_safe^i) ⊆ Permissions(U)).
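Given per-example evaluation logs, both metrics reduce to simple averages; the record fields below (`result`, `true_result`, `sql_is_valid`, `access_ok`) are our own naming assumptions about how the logs would be stored.

```python
def exact_accuracy(examples) -> float:
    """EA: fraction of examples whose executed result matches the ground truth
    and whose generated SQL is valid (syntactically correct and executable)."""
    n = len(examples)
    hits = sum(
        1 for ex in examples
        if ex["result"] == ex["true_result"] and ex["sql_is_valid"]
    )
    return hits / n if n else 0.0

def security_compliance(examples) -> float:
    """SC: fraction of examples for which the access check passed,
    i.e. RequiredSchema(Q_safe) is a subset of Permissions(U)."""
    n = len(examples)
    return sum(1 for ex in examples if ex["access_ok"]) / n if n else 0.0
```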
5. Research plan

No. | Time | Task | Expected outcome | Personnel
1 | – | … approaches and current problems | – | –
2 | 22-29/4 | Create a test dataset containing both normal and malicious questions | A well-defined dataset that can be used to evaluate the proposed method carefully | Full team
3 | 27/4-17/5 | Experiment with multiple defense strategies against attacks | A defense mechanism that works well under various attacks | Full team
4 | 27/4-17/5 | Build a pre-retrieval method to process the question | An effective pre-retrieval method that supports the precision of the whole system | Full team
5 | 15/5-2/7 | Construct a methodology to generate an accurate SQL query and summarize it for the user | A method that meets the requirements for generating a usable and safe SQL query | Full team
6 | 1-9/7 | Evaluate the results and draw a conclusion | A detailed result on the performance and safety of the proposed method | Full team
7 | 10-14/7 | Write the final report | A full report ready to submit | Full team
6. Expected results
This research is anticipated to yield the following key outcomes:
• A Novel Methodology: A validated Text-to-SQL approach integrating robust
security (anti-prompt injection) and high semantic accuracy for reliable systems.
• Enhanced Security: Practical defenses against prompt injection, reducing data
breach risks and boosting trust in natural language database interfaces.
• Improved Reliability and Usability: More secure and accurate Text-to-SQL systems,
enhancing usability for non-experts and supporting confident real-world
deployment.
References
[1] Z. Hong et al., “Next-Generation Database Interfaces: A Survey of LLM-based
Text-to-SQL,” Mar. 13, 2025, arXiv: arXiv:2406.08426. doi:
10.48550/arXiv.2406.08426.
[2] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and
Cross-Domain Semantic Parsing and Text-to-SQL Task,” Feb. 02, 2019, arXiv:
arXiv:1809.08887. doi: 10.48550/arXiv.1809.08887.
[3] M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of
Text-to-SQL with Self-Correction,” Nov. 02, 2023, arXiv: arXiv:2304.11015. doi:
10.48550/arXiv.2304.11015.
[4] J. Li et al., “Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs,” Adv. Neural Inf. Process. Syst., vol.
36, pp. 42330–42357, Dec. 2023.
[5] L. Wang et al., “Proton: Probing Schema Linking Information from Pre-trained
Language Models for Text-to-SQL Parsing,” Aug. 06, 2022, arXiv: arXiv:2206.14017.
doi: 10.48550/arXiv.2206.14017.
[6] X. Dong et al., “C3: Zero-shot Text-to-SQL with ChatGPT,” Jul. 14, 2023,
arXiv: arXiv:2307.07306. doi: 10.48550/arXiv.2307.07306.
[7] F. Shi, D. Fried, M. Ghazvininejad, L. Zettlemoyer, and S. I. Wang, “Natural
Language to Code Translation with Execution,” Nov. 01, 2022, arXiv:
arXiv:2204.11454. doi: 10.48550/arXiv.2204.11454.
[8] A. Ni et al., “LEVER: Learning to Verify Language-to-Code Generation with
Execution,” in Proceedings of the 40th International Conference on Machine
Learning, PMLR, Jul. 2023, pp. 26106–26128. Accessed: Apr. 24, 2025. [Online].
Available: https://proceedings.mlr.press/v202/ni23b.html
[9] D. Gao et al., “Text-to-SQL Empowered by Large Language Models: A
Benchmark Evaluation,” Nov. 20, 2023, arXiv: arXiv:2308.15363. doi:
10.48550/arXiv.2308.15363.
[10] C.-Y. Tai, Z. Chen, T. Zhang, X. Deng, and H. Sun, “Exploring Chain-of-
Thought Style Prompting for Text-to-SQL,” Oct. 27, 2023, arXiv: arXiv:2305.14215.
doi: 10.48550/arXiv.2305.14215.
[11] H. Li et al., “CodeS: Towards Building Open-source Language Models for
Text-to-SQL,” Feb. 26, 2024, arXiv: arXiv:2402.16347. doi:
10.48550/arXiv.2402.16347.
[12] Z. Yuan, H. Chen, Z. Hong, Q. Zhang, F. Huang, and X. Huang, “Knapsack
Optimization-based Schema Linking for LLM-based Text-to-SQL Generation,” Feb.
18, 2025, arXiv: arXiv:2502.12911. doi: 10.48550/arXiv.2502.12911.
[13] Q. Zhang, J. Dong, H. Chen, W. Li, F. Huang, and X. Huang, “Structure
Guided Large Language Model for SQL Generation,” Mar. 27, 2024, arXiv:
arXiv:2402.13284. doi: 10.48550/arXiv.2402.13284.
[14] L. Wang et al., “DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL
Dataset,” in Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu, Eds.,
Online: Association for Computational Linguistics, Oct. 2020, pp. 6923–6935. doi:
10.18653/v1/2020.emnlp-main.562.
[15] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson,
“Structure-Grounded Pretraining for Text-to-SQL,” in Proceedings of the 2021
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, 2021, pp. 1337–1350. doi:
10.18653/v1/2021.naacl-main.105.
[16] F. Lei et al., “Spider 2.0: Evaluating Language Models on Real-World
Enterprise Text-to-SQL Workflows,” Mar. 17, 2025, arXiv: arXiv:2411.07763. doi:
10.48550/arXiv.2411.07763.
[17] T. Zhang et al., “Coder Reviewer Reranking for Code Generation,” in
Proceedings of the 40th International Conference on Machine Learning, PMLR, Jul.
2023, pp. 41832–41846. Accessed: Apr. 24, 2025. [Online]. Available:
https://proceedings.mlr.press/v202/zhang23av.html
[18] B. Wang et al., “MAC-SQL: A Multi-Agent Collaborative Framework for
Text-to-SQL,” Mar. 18, 2025, arXiv: arXiv:2312.11242. doi:
10.48550/arXiv.2312.11242.
[19] Y. Xie et al., “Decomposition for Enhancing Attention: Improving LLM-based
Text-to-SQL through Workflow Paradigm,” Jul. 03, 2024, arXiv: arXiv:2402.10671.
doi: 10.48550/arXiv.2402.10671.
[20] Y. Fan et al., “Metasql: A Generate-Then-Rank Framework for Natural
Language to SQL Translation,” in 2024 IEEE 40th International Conference on Data
Engineering (ICDE), May 2024, pp. 1765–1778. doi:
10.1109/ICDE60146.2024.00143.
[21] H. Xia et al., “$R^3$: ‘This is My SQL, Are You With Me?’ A Consensus-
Based Multi-Agent System for Text-to-SQL Tasks,” Feb. 01, 2024, arXiv. doi:
10.48550/arXiv.2402.14851.
[22] Z. Li et al., “PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-
to-SQL with Cross-consistency,” Mar. 01, 2024, arXiv. doi:
10.48550/arXiv.2403.09732.
[23] T. Ren et al., “PURPLE: Making a Large Language Model a Better SQL
Writer,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE),
May 2024, pp. 15–28. doi: 10.1109/ICDE60146.2024.00009.
[24] G. Qu et al., “Before Generation, Align it! A Novel and Effective Strategy for
Mitigating Hallucinations in Text-to-SQL Generation,” May 24, 2024, arXiv:
arXiv:2405.15307. doi: 10.48550/arXiv.2405.15307.
[25] D. Lee, C. Park, J. Kim, and H. Park, “MCS-SQL: Leveraging Multiple
Prompts and Multiple-Choice Selection For Text-to-SQL Generation,” May 13, 2024,
arXiv: arXiv:2405.07467. doi: 10.48550/arXiv.2405.07467.
[26] S. Talaei, M. Pourreza, Y.-C. Chang, A. Mirhoseini, and A. Saberi, “CHESS:
Contextual Harnessing for Efficient SQL Synthesis,” Nov. 25, 2024, arXiv:
arXiv:2405.16755. doi: 10.48550/arXiv.2405.16755.
[27] B. Li, Y. Luo, C. Chai, G. Li, and N. Tang, “The Dawn of Natural Language to
SQL: Are We Fully Ready?,” Proc. VLDB Endow., vol. 17, no. 11, pp. 3318–3331,
Jul. 2024, doi: 10.14778/3681954.3682003.
[28] K. Maamari, F. Abubaker, D. Jaroslawicz, and A. Mhedhbi, “The Death of
Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models,” Aug.
18, 2024, arXiv: arXiv:2408.07702. doi: 10.48550/arXiv.2408.07702.
[29] Z. Cao, Y. Zheng, Z. Fan, X. Zhang, W. Chen, and X. Bai, “RSL-SQL: Robust
Schema Linking in Text-to-SQL Generation,” Nov. 26, 2024, arXiv:
arXiv:2411.00073. doi: 10.48550/arXiv.2411.00073.
[30] J. Shi et al., “Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language
Question And Database Schema With Pseudo-Schema,” in Proceedings of the 31st
International Conference on Computational Linguistics, O. Rambow, L. Wanner, M.
Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds., Abu Dhabi, UAE:
Association for Computational Linguistics, Jan. 2025, pp. 3794–3807. Accessed: Apr.
24, 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.256/
[31] M. Deng et al., “ReFoRCE: A Text-to-SQL Agent with Self-Refinement,
Format Restriction, and Column Exploration,” Apr. 11, 2025, arXiv:
arXiv:2502.00675. doi: 10.48550/arXiv.2502.00675.
[32] C. Guo et al., “Prompting GPT-3.5 for Text-to-SQL with De-semanticization
and Skeleton Retrieval,” in PRICAI 2023: Trends in Artificial Intelligence, F. Liu, A.
A. Sadanandan, D. N. Pham, P. Mursanto, and D. Lukose, Eds., Singapore: Springer
Nature, 2024, pp. 262–274. doi: 10.1007/978-981-99-7022-3_23.
[33] J. Jiang, K. Zhou, Z. Dong, K. Ye, W. X. Zhao, and J.-R. Wen, “StructGPT: A
General Framework for Large Language Model to Reason over Structured Data,” Oct.
23, 2023, arXiv: arXiv:2305.09645. doi: 10.48550/arXiv.2305.09645.
[34] L. Nan et al., “Enhancing Text-to-SQL Capabilities of Large Language Models:
A Study on Prompt Design Strategies,” in Findings of the Association for
Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, and K. Bali, Eds.,
Singapore: Association for Computational Linguistics, Oct. 2023, pp. 14935–14956.
doi: 10.18653/v1/2023.findings-emnlp.996.
[35] C. Guo et al., “Retrieval-Augmented GPT-3.5-Based Text-to-SQL Framework
with Sample-Aware Prompting and Dynamic Revision Chain,” in Neural Information
Processing, B. Luo, L. Cheng, Z.-G. Wu, H. Li, and C. Li, Eds., Singapore: Springer
Nature, 2024, pp. 341–356. doi: 10.1007/978-981-99-8076-5_25.
[36] S. Chang and E. Fosler-Lussier, “Selective Demonstrations for Cross-domain
Text-to-SQL,” Oct. 10, 2023, arXiv: arXiv:2310.06302. doi:
10.48550/arXiv.2310.06302.
[37] H. Zhang, R. Cao, L. Chen, H. Xu, and K. Yu, “ACT-SQL: In-Context
Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought,” Oct. 26,
2023, arXiv: arXiv:2310.17342. doi: 10.48550/arXiv.2310.17342.
[38] D. Wang, L. Dou, X. Zhang, Q. Zhu, and W. Che, “Improving Demonstration
Diversity by Human-Free Fusing for Text-to-SQL,” Jun. 26, 2024, arXiv:
arXiv:2402.10663. doi: 10.48550/arXiv.2402.10663.
[39] Z. Hong, Z. Yuan, H. Chen, Q. Zhang, F. Huang, and X. Huang, “Knowledge-
to-SQL: Enhancing SQL Generation with Data Expert LLM,” Jun. 06, 2024, arXiv:
arXiv:2402.11517. doi: 10.48550/arXiv.2402.11517.
[40] D. G. Thorpe, A. J. Duberstein, and I. A. Kinsey, “Dubo-SQL: Diverse
Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL,” Apr. 19, 2024,
arXiv: arXiv:2404.12560. doi: 10.48550/arXiv.2404.12560.
[41] R. Toteja, A. Sarkar, and P. M. Comar, “In-Context Reinforcement Learning
with Retrieval-Augmented Generation for Text-to-SQL,” in Proceedings of the 31st
International Conference on Computational Linguistics, O. Rambow, L. Wanner, M.
Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds., Abu Dhabi, UAE:
Association for Computational Linguistics, Jan. 2025, pp. 10390–10397. Accessed:
Apr. 24, 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.692/
[42] J. Lee, I. Baek, B. Kim, and H. Lee, “SAFE-SQL: Self-Augmented In-Context
Learning with Fine-grained Example Selection for Text-to-SQL,” Feb. 17, 2025,
arXiv: arXiv:2502.11438. doi: 10.48550/arXiv.2502.11438.
[43] R. Sun et al., “SQL-PaLM: Improved Large Language Model Adaptation for
Text-to-SQL (extended),” Mar. 30, 2024, arXiv: arXiv:2306.00739. doi:
10.48550/arXiv.2306.00739.
[44] Y. Gu et al., “Middleware for LLMs: Tools Are Instrumental for Language
Agents in Complex Environments,” Oct. 04, 2024, arXiv: arXiv:2402.14672. doi:
10.48550/arXiv.2402.14672.
[45] M. Pourreza et al., “CHASE-SQL: Multi-Path Reasoning and Preference
Optimized Candidate Selection in Text-to-SQL,” Oct. 02, 2024, arXiv:
arXiv:2410.01943. doi: 10.48550/arXiv.2410.01943.
[46] X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching Large Language Models
to Self-Debug,” Oct. 05, 2023, arXiv: arXiv:2304.05128. doi:
10.48550/arXiv.2304.05128.
[47] H. A. Caferoğlu and Ö. Ulusoy, “E-SQL: Direct Schema Linking via Question
Enrichment in Text-to-SQL,” Jan. 28, 2025, arXiv: arXiv:2409.16751. doi:
10.48550/arXiv.2409.16751.
[48] S. Kou, L. Hu, Z. He, Z. Deng, and H. Zhang, “CLLMs: Consistency Large
Language Models,” presented at the Forty-first International Conference on Machine
Learning, Jun. 2024. Accessed: Apr. 24, 2025. [Online]. Available:
https://openreview.net/forum?id=8uzBOVmh8H
[49] F. Xu et al., “Symbol-LLM: Towards Foundational Symbol-centric Interface
For Large Language Models,” Feb. 18, 2024, arXiv: arXiv:2311.09278. doi:
10.48550/arXiv.2311.09278.
[50] A. Zhuang et al., “StructLM: Towards Building Generalist Models for
Structured Knowledge Grounding,” Oct. 07, 2024, arXiv: arXiv:2402.16671. doi:
10.48550/arXiv.2402.16671.
[51] Y. Gao et al., “A Preview of XiYan-SQL: A Multi-Generator Ensemble
Framework for Text-to-SQL,” 2024, arXiv. doi: 10.48550/ARXIV.2411.08599.
[52] B. Rozière et al., “Code Llama: Open Foundation Models for Code,” Jan. 31,
2024, arXiv: arXiv:2308.12950. doi: 10.48550/arXiv.2308.12950.
[53] M. Pourreza and D. Rafiei, “DTS-SQL: Decomposed Text-to-SQL with Small
Large Language Models,” Feb. 02, 2024, arXiv: arXiv:2402.01117. doi:
10.48550/arXiv.2402.01117.
[54] S. K. Gorti et al., “MSc-SQL: Multi-Sample Critiquing Small Language
Models For Text-To-SQL Translation,” Feb. 16, 2025, arXiv: arXiv:2410.12916. doi:
10.48550/arXiv.2410.12916.
[55] Y. Qin et al., “ROUTE: Robust Multitask Tuning and Collaboration for Text-
to-SQL,” Dec. 13, 2024, arXiv: arXiv:2412.10138. doi: 10.48550/arXiv.2412.10138.
[56] G. Katsogiannis-Meimarakis and G. Koutrika, “A survey on deep learning
approaches for text-to-SQL,” VLDB J., vol. 32, no. 4, pp. 905–936, Jul. 2023, doi:
10.1007/s00778-022-00776-8.
[57] H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking
and Skeleton Parsing for Text-to-SQL,” Proc. AAAI Conf. Artif. Intell., vol. 37, no. 11,
Art. no. 11, Jun. 2023, doi: 10.1609/aaai.v37i11.26535.
[58] T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic
Parsing,” May 29, 2021, arXiv: arXiv:2009.13845. doi: 10.48550/arXiv.2009.13845.
[59] B. Hui et al., “Improving Text-to-SQL with Schema Dependency Learning,”
Dec. 10, 2021, arXiv: arXiv:2103.04399. doi: 10.48550/arXiv.2103.04399.
[60] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and
Cross-Domain Semantic Parsing and Text-to-SQL Task,” Feb. 02, 2019, arXiv:
arXiv:1809.08887. doi: 10.48550/arXiv.1809.08887.
[61] Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of
Cross-Domain Text-to-SQL Generalization,” Sep. 11, 2021, arXiv: arXiv:2109.05157.
doi: 10.48550/arXiv.2109.05157.
[62] Y. Gan et al., “Towards Robustness of Text-to-SQL Models against Synonym
Substitution,” Jun. 19, 2021, arXiv: arXiv:2106.01065. doi:
10.48550/arXiv.2106.01065.
[63] V. Zhong, C. Xiong, and R. Socher, “Seq2SQL: Generating Structured Queries
from Natural Language using Reinforcement Learning,” Nov. 09, 2017, arXiv:
arXiv:1709.00103. doi: 10.48550/arXiv.1709.00103.
[64] B. C. Das, M. H. Amini, and Y. Wu, “Security and Privacy Challenges of Large
Language Models: A Survey,” Nov. 14, 2024, arXiv: arXiv:2402.00888. doi:
10.48550/arXiv.2402.00888.
[65] X. Peng, Y. Zhang, J. Yang, and M. Stevenson, “On the Security
Vulnerabilities of Text-to-SQL Models,” May 11, 2024, arXiv: arXiv:2211.15363. doi:
10.48550/arXiv.2211.15363.
[66] P. Subramaniam and S. Krishnan, “Intent-Based Access Control: Using LLMs
to Intelligently Manage Access Control,” Aug. 06, 2024, arXiv: arXiv:2402.07332.
doi: 10.48550/arXiv.2402.07332.
[67] M. Lin et al., “ToxicSQL: Migrating SQL Injection Threats into Text-to-SQL
Models via Backdoor Attack,” Apr. 03, 2025, arXiv: arXiv:2503.05445. doi:
10.48550/arXiv.2503.05445.
[68] Đ. Klisura and A. Rios, “Unmasking Database Vulnerabilities: Zero-
Knowledge Schema Inference Attacks in Text-to-SQL Systems,” Oct. 17, 2024, arXiv:
arXiv:2406.14545. doi: 10.48550/arXiv.2406.14545.
[69] “OWASP Top 10 for Large Language Model Applications | OWASP
Foundation.” Accessed: Apr. 23, 2025. [Online]. Available: https://owasp.org/www-
project-top-10-for-large-language-model-applications/
[70] J. Zhang, Y. Zhou, B. Hui, Y. Liu, Z. Li, and S. Hu, “TrojanSQL: SQL
Injection against Natural Language Interface to Database,” in Proceedings of the 2023
Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J.
Pino, and K. Bali, Eds., Singapore: Association for Computational Linguistics, Oct.
2023, pp. 4344–4359. doi: 10.18653/v1/2023.emnlp-main.264.