HCteam_IT_Proposal
Research title (English): Securing the Future: Enhancing Text-to-SQL Systems for Secure and Efficient Database Querying
Abstract
This research proposal focuses on advancing Text-to-SQL systems, which translate natural
language queries into SQL commands, by addressing critical security vulnerabilities and
performance trade-offs. The primary objectives are to develop a novel methodology that
integrates robust security measures, specifically targeting prompt injection attacks, while
ensuring high semantic accuracy and efficient performance for practical deployment. The study
poses two key research questions: (1) How can access control mechanisms enhance security in
Text-to-SQL systems? (2) What strategies can balance security and performance effectively?
Current systems face problems such as mismatches with user intent, ambiguous vocabulary
handling, complex query generation, and security risks like data leakage due to inadequate
access controls. To tackle these, the research employs an experimental methodology across
three phases: designing a prototype with role-based and attribute-based access controls,
optimizing performance using techniques like query caching, and evaluating the system with a
custom dataset of 500 domain-specific questions. Simulated user scenarios and performance
metrics will validate security and efficiency. This work aims to deliver a secure, accurate, and
usable Text-to-SQL framework, enhancing its real-world applicability.
Keywords: Text-to-SQL, security, access control, large language models, natural language processing
1. Introduction
Structured Query Language (SQL) is a cornerstone of data management within
organizations, serving as the standard language for interacting with relational databases to store,
retrieve, and manipulate data efficiently [1]. SQL enables organizations to extract actionable
insights from vast datasets, supporting critical functions such as business intelligence, financial
reporting, and operational analytics [1]. However, its traditional usage poses significant
challenges. Non-technical users often depend on developers to create predefined data extraction
forms, which restricts flexibility [1].
To address these limitations, Text-to-SQL has emerged as an innovative solution, allowing
users to query databases using natural language rather than structured SQL commands [2]. This
technology leverages advancements in natural language processing (NLP) to translate human
language into executable SQL queries, empowering non-technical individuals, such as sales
staff generating reports or managers making data-driven decisions, to interact with data
independently [1]. The evolution of Text-to-SQL has progressed from early rule-based systems
[1], to deep learning techniques [1], and most recently to the integration of large language
models (LLMs), significantly improving natural language understanding and query generation
[1]. Despite these advancements, Text-to-SQL systems face persistent challenges that limit their
practical adoption.
In terms of accuracy, Text-to-SQL encounters issues such as mismatches with user intent
(mismatch problem), difficulties in handling ambiguous vocabulary (lexical problem), and
limited capability when dealing with complex queries (complex query) [1]. Regarding security,
current Text-to-SQL systems lack effective security policies, posing risks of data leakage or
violations of security regulations, leading to severe consequences such as loss of sensitive
information [1]. Consequently, these limitations prevent Text-to-SQL from being widely
applied in real-world enterprise environments, where high reliability and security are required
[1].
Therefore, we propose to develop a Text-to-SQL framework that aims to address some of
the aforementioned challenges while maintaining a balance between performance and security.

1.1.1 Accuracy in LLM-Based Text-to-SQL Systems

A prominent line of work adopts multi-model strategies, where multiple models or agents collaborate to generate robust queries. Techniques
such as ensemble modeling [5] and voting mechanisms [6] aggregate outputs from several
LLMs to enhance accuracy, while agent-based systems like the C3 framework [6] assign
specialized roles to different components, enabling them to tackle complex queries more
effectively. External verification is also crucial, as it refines initial LLM outputs using tools like
SQL execution engines [7] and verification models [8]. These methods, along with interactive
user feedback and rule-based systems [9], help ensure that generated SQL queries are accurate
and executable. Task decomposition, using techniques such as question decomposition [10] and
multi-step workflows (e.g., RESDSQL [57]), breaks complex Text-to-SQL (T2SQL) queries into
manageable sub-tasks, improving the handling of intricate queries. Finally, LLMs with
specialized training, such as models pre-trained on SQL datasets like GRAPPA [58] or adapted
with architectural changes [59], improve SQL and schema comprehension, boosting T2SQL
performance.

Figure 1: Categorization of these methods and techniques, taking inspiration from [3]. The
figure groups representative systems along four axes: benchmark datasets (Spider [2], BIRD [4],
Spider-DK [61], Spider-SYN [62], Spider-Realistic [15], Spider 2.0 [16]); evaluation metrics
(syntax-based component matching and exact matching, and execution-based execution
accuracy [2]); models (GPT series, LLaMA series, and open-source models for customization);
and techniques (enhanced single-model prompting, multi-LLM/multi-agent/ensemble systems,
external refinement, task decomposition, and specialized training).
1.1.2 Performance in LLM-Based Text-to-SQL Systems
Despite advancements, T2SQL systems face challenges in interpreting ambiguous user
intent, as natural language nuances are hard to map to SQL. Complex database schemas,
especially in cross-domain settings, and generating rare SQL constructs like nested subqueries
remain difficult due to limited training data [4], [9]. These issues highlight the need for research
in few-shot learning, domain adaptation, and external knowledge integration to enhance T2SQL
performance.
Benchmark datasets like Spider [60] (10,000+ cross-domain examples) and BIRD [4], along
with variants like Spider-DK [61], Spider-SYN [62], Spider-Realistic [15], and Spider 2.0 [16],
support T2SQL development. Metrics such as component matching, exact matching, execution
accuracy, and valid efficiency score [60], [63] ensure robust evaluation. However, challenges
like vague intent, complex schemas, and rare constructs persist, necessitating ongoing
innovation to meet real-world demands.
1.1.3 Security in LLM-Based Text-to-SQL Systems
LLM-based T2SQL systems, while advancing natural language database interaction,
introduce security risks due to direct database access [64], [65]. Key vulnerabilities include
access control bypass [66], SQL injection via backdoors/poisoning [67], schema inference
attacks [68], prompt injection [64], sensitive data disclosure [68], and DoS risks [65]. These
threats emphasize the need for robust security measures. Defense mechanisms targeting various
system lifecycle stages are critical, with common attack types, descriptions, and defenses
summarized in the accompanying table.
Table 1: Summary of attack types and their corresponding defense mechanisms

Prompt Injection [69]
Description: Manipulating the LLM via crafted input prompts to override instructions or bypass filters.
Defense mechanisms: Input/output filtering and validation [69]; constrained model behavior [69]; human approval; sandboxing; instruction defense [64].
Limitations: May reduce flexibility or limit capabilities.

Backdoor / Data Poisoning [69]
Description: Compromising the model during training by injecting malicious data or triggers, causing harmful SQL generation upon activation.
Defense mechanisms: Training data monitoring/validation [69]; input filtering (trigger detection); model pruning [64]; adversarial training [67]; static analysis.
Limitations: Trigger detection is challenging; defenses such as ONION show limitations [70].

Schema Inference [68]
Description: Exploiting model responses to deduce the underlying database schema without prior knowledge.
Defense mechanisms: Limiting schema information in the prompt [68]; defensive prompting (limited) [68]; access control on the schema.
Limitations: May decrease SQL accuracy; defensive prompts have limited effectiveness [68].

SQL Injection (Generated SQL) [69]
Description: The model generates SQL containing executable malicious code, often influenced by other attacks.
Defense mechanisms: Output validation/sanitization [69]; least privilege [69]; static SQL analysis (limited) [67].
Limitations: Output sanitization is essential; static analysis may miss sophisticated payloads [67].

Sensitive Information Disclosure [69]
Description: The model inadvertently outputs confidential information from training data or the interaction context.
Defense mechanisms: Data sanitization [69]; differential privacy (DP) [64]; output filtering [69]; access control.
Limitations: DP might reduce model utility.

Excessive Agency [69]
Description: Granting the LLM or associated tools excessive permissions or autonomy beyond the safe operational scope.
Defense mechanisms: Least privilege for the model and its extensions [69]; human approval [69]; limited extension functionality [69]; complete mediation.
Limitations: Requires careful system design.

DoS / Unbounded Consumption [69]
Description: Exploiting the system to cause excessive resource usage (computation, API calls), leading to service degradation or high costs.
Defense mechanisms: Rate limiting [69]; input validation on size and complexity [69]; resource allocation management [69]; timeouts and throttling [69].
Limitations: Must balance abuse prevention against legitimate use.
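Of the defenses in Table 1, output validation and least privilege apply most directly to the generated SQL itself. As a rough illustration only (the function name and keyword blocklist are our own, not from any cited system), a heuristic validator might accept only single, read-only SELECT statements:

```python
import re

# Keywords that indicate a data-modifying or administrative statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|attach|pragma)\b",
    re.IGNORECASE,
)

def validate_generated_sql(sql: str) -> bool:
    """Heuristic output validation: accept a single read-only SELECT statement.

    Rejects empty input, stacked statements (an interior ';'),
    non-SELECT statements, and any forbidden keyword in the query.
    """
    stmt = sql.strip().rstrip(";").strip()
    if not stmt:
        return False
    if ";" in stmt:  # stacked statements, e.g. "SELECT ...; DROP TABLE ..."
        return False
    if not stmt.lower().startswith(("select", "with")):  # allow CTEs
        return False
    if FORBIDDEN.search(stmt):
        return False
    return True
```

A blocklist like this is easy to bypass in general; in practice it would complement, not replace, database-level least-privilege permissions.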
However, implementing security measures often involves a security-performance trade-off.
Additional checks can introduce latency [65], while certain defenses might degrade model
accuracy or flexibility [64]. Achieving an optimal balance between security and operational
efficiency remains a key challenge for the practical deployment of these powerful systems.
1.2. The necessity of the research
This research is developed to solve two main problems that exist in current Text-to-SQL
approaches:
• Addressing Security Deficiencies: Systematically identifying the under-explored security
vulnerabilities inherent in current Text-to-SQL systems and proposing novel defense
mechanisms to enhance system stability, security, and availability.
• Optimizing Practical Viability: Investigating and balancing the crucial trade-offs between
implementing robust security measures, maintaining high translation accuracy, and achieving
efficient performance to meet rigorous industrial standards for real-world deployment.
1.3. Feasibility of Research
Research question and objectives
This study addresses two primary questions:
1. How can access control mechanisms enhance security in Text-to-SQL systems?
2. How can performance and security be balanced in Text-to-SQL systems?
The objective is to develop and evaluate a secure Text-to-SQL system that integrates access
control while maintaining computational efficiency.
Research design
An experimental approach is employed, structured in three phases:
1. System design and implementation incorporating role-based access control (RBAC) and attribute-based access control (ABAC).
2. Performance optimization via query caching and schema filtering.
3. System evaluation using a custom dataset and simulated user scenarios.
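As a sketch of how the query caching planned for phase 2 might look, the class below memoizes the expensive question-to-SQL translation behind a small LRU cache; the `translate` callable and the normalization rule are illustrative assumptions rather than a committed design.

```python
from collections import OrderedDict
from typing import Callable

class QueryCache:
    """LRU cache mapping normalized questions to previously generated SQL."""

    def __init__(self, translate: Callable[[str], str], capacity: int = 256):
        self._translate = translate  # e.g. the LLM-backed Text-to-SQL call
        self._cache: OrderedDict[str, str] = OrderedDict()
        self._capacity = capacity
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(question: str) -> str:
        # Cheap normalization; a real system might also canonicalize entities.
        return " ".join(question.lower().split())

    def get_sql(self, question: str) -> str:
        key = self._normalize(question)
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)      # mark as recently used
            return self._cache[key]
        self.misses += 1
        sql = self._translate(question)       # slow path: call the model
        self._cache[key] = sql
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return sql
```

Because the cache is keyed on the question rather than the SQL, repeated questions avoid the model call entirely, which is where security-induced latency would otherwise accumulate.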
Data collection methods:
Data collection will involve both quantitative and qualitative methods to evaluate the system's
security and performance. Specifically, three approaches will be used: (1) a custom benchmark
dataset of approximately 500 domain-specific questions, designed to test access control
mechanisms and complex query handling, addressing the limitations of existing datasets like
Spider and BIRD; this dataset will be publicly released. (2) Simulated user scenarios with
synthetic profiles (e.g., administrator, analyst, guest) to assess RBAC and ABAC in realistic
access control contexts. (3) Collection of quantitative performance metrics, such as query
execution time and security-induced latency, during system testing.
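As an illustration only, a record in the planned benchmark could pair each question with its gold SQL, the roles allowed to run it, and a flag marking injection attempts; every field name below is a working assumption, since the dataset design is itself part of the research.

```python
import json

# Hypothetical benchmark record: all field names are illustrative assumptions.
record = {
    "id": 17,
    "question": "Show the average salary per department.",
    "gold_sql": "SELECT department, AVG(salary) FROM employees GROUP BY department",
    "required_tables": ["employees"],
    "allowed_roles": ["administrator", "analyst"],  # a guest should be blocked
    "is_malicious": False,  # True for prompt-injection test cases
}

# One JSON object per line (JSONL) keeps the 500-question set easy to stream.
line = json.dumps(record)
```

Storing `allowed_roles` and `required_tables` alongside each question lets the same record test both translation accuracy and the RBAC/ABAC checks.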
Data analysis
Security evaluation through unauthorized query blocking rates and penetration testing.
Performance analysis by comparing accuracy and latency against a baseline system.
Data sources: Primary data sources include the custom dataset and performance logs; user
scenarios serve as supplementary data.
Timeframe: Research is divided into three phases: initial survey, experimental development,
and final evaluation.
Risks and mitigation
Technical integration challenges will be managed through iterative development.
Dataset suitability will be ensured via rigorous testing.
Realistic user scenarios will be designed by referencing real-world standards.
2. Research objectives
Our main research objectives include:
• Develop and propose a novel methodology for Text-to-SQL translation that intrinsically
integrates security considerations with high semantic accuracy.
• Design and implement specific defense mechanisms tailored to mitigate the identified
vulnerabilities in Text-to-SQL systems.
• Rigorously evaluate the effectiveness and precision of the proposed methodology and
defense mechanisms.
3. Research scope
The scope of this research is defined by the following boundaries:
• Attack Vector Focus: The investigation of security vulnerabilities and the development of
defense mechanisms will concentrate specifically on prompt injection attacks executed through
the natural language interface of the Text-to-SQL system. Other potential attack vectors that do
not directly leverage the manipulation of the underlying Large Language Model (LLM) input
are considered outside the scope of this work.
• SQL Statement Limitation: The Text-to-SQL system developed and evaluated within this
research will be constrained to generating SELECT statements only. The generation of SQL
statements that modify data (e.g., INSERT, UPDATE, DELETE) or perform other database
actions beyond data retrieval is explicitly excluded.
4. Approach and Method
Let Q_raw be the raw natural language question from the user U, let S be the database
schema, and let H be the history of successfully answered (Q_hist, SQL_hist) pairs. The process
can be formalized as follows:
4.1 Security Check Phase:
Input validation: Apply a sanitization function Sanitize, and let IsMalicious(Q_raw) → {True, False} detect threats:

Q_safe = Sanitize(Q_raw) if IsMalicious(Q_raw), else Q_safe = Q_raw
Access control: Let Permissions(U) ⊆ S be the schema subset that user U is allowed to access,
and let RequiredSchema(Q_safe) ⊆ S be the estimated schema needed to answer the query.
Define CheckAccess(U, Q_safe) = (RequiredSchema(Q_safe) ⊆ Permissions(U)); the query
proceeds only when CheckAccess holds.
Summarization: Let R_proc = R_sum if summarization is applied to the raw result R_raw, else R_proc = R_raw.
Output validation: Check the processed result: IsPass = ValidateOutput(R_proc).
Final answer: If IsPass, Answer = R_proc; otherwise, handle the error.
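The security-check phase can be sketched in code as follows. The pattern list is a deliberately minimal stand-in for IsMalicious (real prompt-injection detection is an open problem), and the required schema is passed in rather than estimated, so everything beyond the formal names (Sanitize, IsMalicious, CheckAccess) is an assumption.

```python
import re

# Minimal stand-in patterns for IsMalicious; a real detector would be richer.
MALICIOUS_PATTERNS = [
    r"ignore (all|previous) instructions",  # classic prompt-injection phrasing
    r"\bdrop\s+table\b",
    r"\bunion\s+select\b",
]

def is_malicious(q_raw: str) -> bool:
    return any(re.search(p, q_raw, re.IGNORECASE) for p in MALICIOUS_PATTERNS)

def sanitize(q_raw: str) -> str:
    q = q_raw
    for p in MALICIOUS_PATTERNS:
        q = re.sub(p, "", q, flags=re.IGNORECASE)
    return " ".join(q.split())

def check_access(permissions: set, required_schema: set) -> bool:
    # CheckAccess(U, Q_safe) = RequiredSchema(Q_safe) ⊆ Permissions(U)
    return required_schema <= permissions

def security_check(q_raw: str, permissions: set, required_schema: set) -> str:
    """Return Q_safe if the query passes both checks, else raise."""
    q_safe = sanitize(q_raw) if is_malicious(q_raw) else q_raw
    if not check_access(permissions, required_schema):
        raise PermissionError("query touches tables outside the user's permissions")
    return q_safe
```

Raising on an access failure, rather than silently trimming the query, keeps the deny decision auditable, which matters for the unauthorized-query-blocking metric used later.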
[Pipeline diagram: the proposed system comprises four phases. The security check phase
performs access control (using the user's basic information) and output validation. The
pre-retrieval phase retrieves similar questions from history and identifies the relevant tables
and columns. The SQL generation phase breaks down the task, generates the query, performs
value matching and error validation, and executes the query against the database. The summary
phase summarizes the result before the final answer is returned.]
Here I denotes the indicator function, returning 1 if its condition holds (e.g., the generated SQL
matches the ground truth exactly) and 0 otherwise.

Exact Accuracy (EA):

EA = (1/N) · Σ_{i=1}^{N} I(R_raw^i = R_true^i ∧ IsValid(SQL_gen^i))

where R_raw^i is the execution result of the generated query, R_true^i is the ground-truth
result, and IsValid(SQL_gen^i) ensures the query is syntactically correct and executable.

Security Compliance (SC):

SC = (1/N) · Σ_{i=1}^{N} I(CheckAccess(U, Q_safe^i) = True)

where CheckAccess(U, Q_safe^i) = (RequiredSchema(Q_safe^i) ⊆ Permissions(U)).
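Given per-example evaluation logs, both metrics reduce to simple averages; the record fields below (`result`, `true_result`, `sql_is_valid`, `access_ok`) are our own naming assumptions about how the logs would be stored.

```python
def exact_accuracy(examples) -> float:
    """EA: fraction of examples whose executed result matches the ground truth
    and whose generated SQL is valid (syntactically correct and executable)."""
    n = len(examples)
    hits = sum(
        1 for ex in examples
        if ex["result"] == ex["true_result"] and ex["sql_is_valid"]
    )
    return hits / n if n else 0.0

def security_compliance(examples) -> float:
    """SC: fraction of examples for which the access check passed,
    i.e. RequiredSchema(Q_safe) is a subset of Permissions(U)."""
    n = len(examples)
    return sum(1 for ex in examples if ex["access_ok"]) / n if n else 0.0
```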
5. Research plan

No. | Time | Task | Expected outcome | Personnel
1 | – | … approaches and current problems | – | –
2 | 22-29/4 | Create a test dataset containing both normal and malicious questions | A well-defined dataset that can be used to evaluate the proposed method carefully | Full team
3 | 27/4-17/5 | Experiment with multiple defense strategies against attacks | A defense mechanism that works well under various attacks | Full team
4 | 27/4-17/5 | Build a pre-retrieval method to process the question | An effective pre-retrieval method that supports the precision of the whole system | Full team
5 | 15/5-2/7 | Construct a methodology to generate an accurate SQL query and summarize it for the user | A method that meets the requirements for generating a usable and safe SQL query | Full team
6 | 1-9/7 | Evaluate the results and draw a conclusion | A detailed result on the performance and safety of the proposed method | Full team
7 | 10-14/7 | Write the final report | A full report ready to submit | Full team
6. Expected results
This research is anticipated to yield the following key outcomes:
• A Novel Methodology: A validated Text-to-SQL approach integrating robust
security (anti-prompt injection) and high semantic accuracy for reliable systems.
• Enhanced Security: Practical defenses against prompt injection, reducing data
breach risks and boosting trust in natural language database interfaces.
• Improved Reliability and Usability: More secure and accurate Text-to-SQL systems,
enhancing usability for non-experts and supporting confident real-world
deployment.
References
[1] Z. Hong et al., “Next-Generation Database Interfaces: A Survey of LLM-based
Text-to-SQL,” Mar. 13, 2025, arXiv: arXiv:2406.08426. doi:
10.48550/arXiv.2406.08426.
[2] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and
Cross-Domain Semantic Parsing and Text-to-SQL Task,” Feb. 02, 2019, arXiv:
arXiv:1809.08887. doi: 10.48550/arXiv.1809.08887.
[3] M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of
Text-to-SQL with Self-Correction,” Nov. 02, 2023, arXiv: arXiv:2304.11015. doi:
10.48550/arXiv.2304.11015.
[4] J. Li et al., “Can LLM Already Serve as A Database Interface? A BIg Bench for
Large-Scale Database Grounded Text-to-SQLs,” Adv. Neural Inf. Process. Syst., vol.
36, pp. 42330–42357, Dec. 2023.
[5] L. Wang et al., “Proton: Probing Schema Linking Information from Pre-trained
Language Models for Text-to-SQL Parsing,” Aug. 06, 2022, arXiv: arXiv:2206.14017.
doi: 10.48550/arXiv.2206.14017.
[6] X. Dong et al., “C3: Zero-shot Text-to-SQL with ChatGPT,” Jul. 14, 2023,
arXiv: arXiv:2307.07306. doi: 10.48550/arXiv.2307.07306.
[7] F. Shi, D. Fried, M. Ghazvininejad, L. Zettlemoyer, and S. I. Wang, “Natural
Language to Code Translation with Execution,” Nov. 01, 2022, arXiv:
arXiv:2204.11454. doi: 10.48550/arXiv.2204.11454.
[8] A. Ni et al., “LEVER: Learning to Verify Language-to-Code Generation with
Execution,” in Proceedings of the 40th International Conference on Machine
Learning, PMLR, Jul. 2023, pp. 26106–26128. Accessed: Apr. 24, 2025. [Online].
Available: https://proceedings.mlr.press/v202/ni23b.html
[9] D. Gao et al., “Text-to-SQL Empowered by Large Language Models: A
Benchmark Evaluation,” Nov. 20, 2023, arXiv: arXiv:2308.15363. doi:
10.48550/arXiv.2308.15363.
[10] C.-Y. Tai, Z. Chen, T. Zhang, X. Deng, and H. Sun, “Exploring Chain-of-
Thought Style Prompting for Text-to-SQL,” Oct. 27, 2023, arXiv: arXiv:2305.14215.
doi: 10.48550/arXiv.2305.14215.
[11] H. Li et al., “CodeS: Towards Building Open-source Language Models for
Text-to-SQL,” Feb. 26, 2024, arXiv: arXiv:2402.16347. doi:
10.48550/arXiv.2402.16347.
[12] Z. Yuan, H. Chen, Z. Hong, Q. Zhang, F. Huang, and X. Huang, “Knapsack
Optimization-based Schema Linking for LLM-based Text-to-SQL Generation,” Feb.
18, 2025, arXiv: arXiv:2502.12911. doi: 10.48550/arXiv.2502.12911.
[13] Q. Zhang, J. Dong, H. Chen, W. Li, F. Huang, and X. Huang, “Structure
Guided Large Language Model for SQL Generation,” Mar. 27, 2024, arXiv:
arXiv:2402.13284. doi: 10.48550/arXiv.2402.13284.
[14] L. Wang et al., “DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL
Dataset,” in Proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP), B. Webber, T. Cohn, Y. He, and Y. Liu, Eds.,
Online: Association for Computational Linguistics, Oct. 2020, pp. 6923–6935. doi:
10.18653/v1/2020.emnlp-main.562.
[15] X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson,
“Structure-Grounded Pretraining for Text-to-SQL,” in Proceedings of the 2021
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, 2021, pp. 1337–1350. doi:
10.18653/v1/2021.naacl-main.105.
[16] F. Lei et al., “Spider 2.0: Evaluating Language Models on Real-World
Enterprise Text-to-SQL Workflows,” Mar. 17, 2025, arXiv: arXiv:2411.07763. doi:
10.48550/arXiv.2411.07763.
[17] T. Zhang et al., “Coder Reviewer Reranking for Code Generation,” in
Proceedings of the 40th International Conference on Machine Learning, PMLR, Jul.
2023, pp. 41832–41846. Accessed: Apr. 24, 2025. [Online]. Available:
https://proceedings.mlr.press/v202/zhang23av.html
[18] B. Wang et al., “MAC-SQL: A Multi-Agent Collaborative Framework for
Text-to-SQL,” Mar. 18, 2025, arXiv: arXiv:2312.11242. doi:
10.48550/arXiv.2312.11242.
[19] Y. Xie et al., “Decomposition for Enhancing Attention: Improving LLM-based
Text-to-SQL through Workflow Paradigm,” Jul. 03, 2024, arXiv: arXiv:2402.10671.
doi: 10.48550/arXiv.2402.10671.
[20] Y. Fan et al., “Metasql: A Generate-Then-Rank Framework for Natural
Language to SQL Translation,” in 2024 IEEE 40th International Conference on Data
Engineering (ICDE), May 2024, pp. 1765–1778. doi:
10.1109/ICDE60146.2024.00143.
[21] H. Xia et al., “$R^3$: ‘This is My SQL, Are You With Me?’ A Consensus-
Based Multi-Agent System for Text-to-SQL Tasks,” Feb. 01, 2024, arXiv. doi:
10.48550/arXiv.2402.14851.
[22] Z. Li et al., “PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-
to-SQL with Cross-consistency,” Mar. 01, 2024, arXiv. doi:
10.48550/arXiv.2403.09732.
[23] T. Ren et al., “PURPLE: Making a Large Language Model a Better SQL
Writer,” in 2024 IEEE 40th International Conference on Data Engineering (ICDE),
May 2024, pp. 15–28. doi: 10.1109/ICDE60146.2024.00009.
[24] G. Qu et al., “Before Generation, Align it! A Novel and Effective Strategy for
Mitigating Hallucinations in Text-to-SQL Generation,” May 24, 2024, arXiv:
arXiv:2405.15307. doi: 10.48550/arXiv.2405.15307.
[25] D. Lee, C. Park, J. Kim, and H. Park, “MCS-SQL: Leveraging Multiple
Prompts and Multiple-Choice Selection For Text-to-SQL Generation,” May 13, 2024,
arXiv: arXiv:2405.07467. doi: 10.48550/arXiv.2405.07467.
[26] S. Talaei, M. Pourreza, Y.-C. Chang, A. Mirhoseini, and A. Saberi, “CHESS:
Contextual Harnessing for Efficient SQL Synthesis,” Nov. 25, 2024, arXiv:
arXiv:2405.16755. doi: 10.48550/arXiv.2405.16755.
[27] B. Li, Y. Luo, C. Chai, G. Li, and N. Tang, “The Dawn of Natural Language to
SQL: Are We Fully Ready?,” Proc. VLDB Endow., vol. 17, no. 11, pp. 3318–3331,
Jul. 2024, doi: 10.14778/3681954.3682003.
[28] K. Maamari, F. Abubaker, D. Jaroslawicz, and A. Mhedhbi, “The Death of
Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models,” Aug.
18, 2024, arXiv: arXiv:2408.07702. doi: 10.48550/arXiv.2408.07702.
[29] Z. Cao, Y. Zheng, Z. Fan, X. Zhang, W. Chen, and X. Bai, “RSL-SQL: Robust
Schema Linking in Text-to-SQL Generation,” Nov. 26, 2024, arXiv:
arXiv:2411.00073. doi: 10.48550/arXiv.2411.00073.
[30] J. Shi et al., “Gen-SQL: Efficient Text-to-SQL By Bridging Natural Language
Question And Database Schema With Pseudo-Schema,” in Proceedings of the 31st
International Conference on Computational Linguistics, O. Rambow, L. Wanner, M.
Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds., Abu Dhabi, UAE:
Association for Computational Linguistics, Jan. 2025, pp. 3794–3807. Accessed: Apr.
24, 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.256/
[31] M. Deng et al., “ReFoRCE: A Text-to-SQL Agent with Self-Refinement,
Format Restriction, and Column Exploration,” Apr. 11, 2025, arXiv:
arXiv:2502.00675. doi: 10.48550/arXiv.2502.00675.
[32] C. Guo et al., “Prompting GPT-3.5 for Text-to-SQL with De-semanticization
and Skeleton Retrieval,” in PRICAI 2023: Trends in Artificial Intelligence, F. Liu, A.
A. Sadanandan, D. N. Pham, P. Mursanto, and D. Lukose, Eds., Singapore: Springer
Nature, 2024, pp. 262–274. doi: 10.1007/978-981-99-7022-3_23.
[33] J. Jiang, K. Zhou, Z. Dong, K. Ye, W. X. Zhao, and J.-R. Wen, “StructGPT: A
General Framework for Large Language Model to Reason over Structured Data,” Oct.
23, 2023, arXiv: arXiv:2305.09645. doi: 10.48550/arXiv.2305.09645.
[34] L. Nan et al., “Enhancing Text-to-SQL Capabilities of Large Language Models:
A Study on Prompt Design Strategies,” in Findings of the Association for
Computational Linguistics: EMNLP 2023, H. Bouamor, J. Pino, and K. Bali, Eds.,
Singapore: Association for Computational Linguistics, Oct. 2023, pp. 14935–14956.
doi: 10.18653/v1/2023.findings-emnlp.996.
[35] C. Guo et al., “Retrieval-Augmented GPT-3.5-Based Text-to-SQL Framework
with Sample-Aware Prompting and Dynamic Revision Chain,” in Neural Information
Processing, B. Luo, L. Cheng, Z.-G. Wu, H. Li, and C. Li, Eds., Singapore: Springer
Nature, 2024, pp. 341–356. doi: 10.1007/978-981-99-8076-5_25.
[36] S. Chang and E. Fosler-Lussier, “Selective Demonstrations for Cross-domain
Text-to-SQL,” Oct. 10, 2023, arXiv: arXiv:2310.06302. doi:
10.48550/arXiv.2310.06302.
[37] H. Zhang, R. Cao, L. Chen, H. Xu, and K. Yu, “ACT-SQL: In-Context
Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought,” Oct. 26,
2023, arXiv: arXiv:2310.17342. doi: 10.48550/arXiv.2310.17342.
[38] D. Wang, L. Dou, X. Zhang, Q. Zhu, and W. Che, “Improving Demonstration
Diversity by Human-Free Fusing for Text-to-SQL,” Jun. 26, 2024, arXiv:
arXiv:2402.10663. doi: 10.48550/arXiv.2402.10663.
[39] Z. Hong, Z. Yuan, H. Chen, Q. Zhang, F. Huang, and X. Huang, “Knowledge-
to-SQL: Enhancing SQL Generation with Data Expert LLM,” Jun. 06, 2024, arXiv:
arXiv:2402.11517. doi: 10.48550/arXiv.2402.11517.
[40] D. G. Thorpe, A. J. Duberstein, and I. A. Kinsey, “Dubo-SQL: Diverse
Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL,” Apr. 19, 2024,
arXiv: arXiv:2404.12560. doi: 10.48550/arXiv.2404.12560.
[41] R. Toteja, A. Sarkar, and P. M. Comar, “In-Context Reinforcement Learning
with Retrieval-Augmented Generation for Text-to-SQL,” in Proceedings of the 31st
International Conference on Computational Linguistics, O. Rambow, L. Wanner, M.
Apidianaki, H. Al-Khalifa, B. D. Eugenio, and S. Schockaert, Eds., Abu Dhabi, UAE:
Association for Computational Linguistics, Jan. 2025, pp. 10390–10397. Accessed:
Apr. 24, 2025. [Online]. Available: https://aclanthology.org/2025.coling-main.692/
[42] J. Lee, I. Baek, B. Kim, and H. Lee, “SAFE-SQL: Self-Augmented In-Context
Learning with Fine-grained Example Selection for Text-to-SQL,” Feb. 17, 2025,
arXiv: arXiv:2502.11438. doi: 10.48550/arXiv.2502.11438.
[43] R. Sun et al., “SQL-PaLM: Improved Large Language Model Adaptation for
Text-to-SQL (extended),” Mar. 30, 2024, arXiv: arXiv:2306.00739. doi:
10.48550/arXiv.2306.00739.
[44] Y. Gu et al., “Middleware for LLMs: Tools Are Instrumental for Language
Agents in Complex Environments,” Oct. 04, 2024, arXiv: arXiv:2402.14672. doi:
10.48550/arXiv.2402.14672.
[45] M. Pourreza et al., “CHASE-SQL: Multi-Path Reasoning and Preference
Optimized Candidate Selection in Text-to-SQL,” Oct. 02, 2024, arXiv:
arXiv:2410.01943. doi: 10.48550/arXiv.2410.01943.
[46] X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching Large Language Models
to Self-Debug,” Oct. 05, 2023, arXiv: arXiv:2304.05128. doi:
10.48550/arXiv.2304.05128.
[47] H. A. Caferoğlu and Ö. Ulusoy, “E-SQL: Direct Schema Linking via Question
Enrichment in Text-to-SQL,” Jan. 28, 2025, arXiv: arXiv:2409.16751. doi:
10.48550/arXiv.2409.16751.
[48] S. Kou, L. Hu, Z. He, Z. Deng, and H. Zhang, “CLLMs: Consistency Large
Language Models,” presented at the Forty-first International Conference on Machine
Learning, Jun. 2024. Accessed: Apr. 24, 2025. [Online]. Available:
https://openreview.net/forum?id=8uzBOVmh8H
[49] F. Xu et al., “Symbol-LLM: Towards Foundational Symbol-centric Interface
For Large Language Models,” Feb. 18, 2024, arXiv: arXiv:2311.09278. doi:
10.48550/arXiv.2311.09278.
[50] A. Zhuang et al., “StructLM: Towards Building Generalist Models for
Structured Knowledge Grounding,” Oct. 07, 2024, arXiv: arXiv:2402.16671. doi:
10.48550/arXiv.2402.16671.
[51] Y. Gao et al., “A Preview of XiYan-SQL: A Multi-Generator Ensemble
Framework for Text-to-SQL,” 2024, arXiv. doi: 10.48550/ARXIV.2411.08599.
[52] B. Rozière et al., “Code Llama: Open Foundation Models for Code,” Jan. 31,
2024, arXiv: arXiv:2308.12950. doi: 10.48550/arXiv.2308.12950.
[53] M. Pourreza and D. Rafiei, “DTS-SQL: Decomposed Text-to-SQL with Small
Large Language Models,” Feb. 02, 2024, arXiv: arXiv:2402.01117. doi:
10.48550/arXiv.2402.01117.
[54] S. K. Gorti et al., “MSc-SQL: Multi-Sample Critiquing Small Language
Models For Text-To-SQL Translation,” Feb. 16, 2025, arXiv: arXiv:2410.12916. doi:
10.48550/arXiv.2410.12916.
[55] Y. Qin et al., “ROUTE: Robust Multitask Tuning and Collaboration for Text-
to-SQL,” Dec. 13, 2024, arXiv: arXiv:2412.10138. doi: 10.48550/arXiv.2412.10138.
[56] G. Katsogiannis-Meimarakis and G. Koutrika, “A survey on deep learning
approaches for text-to-SQL,” VLDB J., vol. 32, no. 4, pp. 905–936, Jul. 2023, doi:
10.1007/s00778-022-00776-8.
[57] H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking
and Skeleton Parsing for Text-to-SQL,” Proc. AAAI Conf. Artif. Intell., vol. 37, no. 11,
Art. no. 11, Jun. 2023, doi: 10.1609/aaai.v37i11.26535.
[58] T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic
Parsing,” May 29, 2021, arXiv: arXiv:2009.13845. doi: 10.48550/arXiv.2009.13845.
[59] B. Hui et al., “Improving Text-to-SQL with Schema Dependency Learning,”
Dec. 10, 2021, arXiv: arXiv:2103.04399. doi: 10.48550/arXiv.2103.04399.
[60] T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and
Cross-Domain Semantic Parsing and Text-to-SQL Task,” Feb. 02, 2019, arXiv:
arXiv:1809.08887. doi: 10.48550/arXiv.1809.08887.
[61] Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of
Cross-Domain Text-to-SQL Generalization,” Sep. 11, 2021, arXiv: arXiv:2109.05157.
doi: 10.48550/arXiv.2109.05157.
[62] Y. Gan et al., “Towards Robustness of Text-to-SQL Models against Synonym
Substitution,” Jun. 19, 2021, arXiv: arXiv:2106.01065. doi:
10.48550/arXiv.2106.01065.
[63] V. Zhong, C. Xiong, and R. Socher, “Seq2SQL: Generating Structured Queries
from Natural Language using Reinforcement Learning,” Nov. 09, 2017, arXiv:
arXiv:1709.00103. doi: 10.48550/arXiv.1709.00103.
[64] B. C. Das, M. H. Amini, and Y. Wu, “Security and Privacy Challenges of Large
Language Models: A Survey,” Nov. 14, 2024, arXiv: arXiv:2402.00888. doi:
10.48550/arXiv.2402.00888.
[65] X. Peng, Y. Zhang, J. Yang, and M. Stevenson, “On the Security
Vulnerabilities of Text-to-SQL Models,” May 11, 2024, arXiv: arXiv:2211.15363. doi:
10.48550/arXiv.2211.15363.
[66] P. Subramaniam and S. Krishnan, “Intent-Based Access Control: Using LLMs
to Intelligently Manage Access Control,” Aug. 06, 2024, arXiv: arXiv:2402.07332.
doi: 10.48550/arXiv.2402.07332.
[67] M. Lin et al., “ToxicSQL: Migrating SQL Injection Threats into Text-to-SQL
Models via Backdoor Attack,” Apr. 03, 2025, arXiv: arXiv:2503.05445. doi:
10.48550/arXiv.2503.05445.
[68] Đ. Klisura and A. Rios, “Unmasking Database Vulnerabilities: Zero-
Knowledge Schema Inference Attacks in Text-to-SQL Systems,” Oct. 17, 2024, arXiv:
arXiv:2406.14545. doi: 10.48550/arXiv.2406.14545.
[69] “OWASP Top 10 for Large Language Model Applications | OWASP
Foundation.” Accessed: Apr. 23, 2025. [Online]. Available: https://owasp.org/www-
project-top-10-for-large-language-model-applications/
[70] J. Zhang, Y. Zhou, B. Hui, Y. Liu, Z. Li, and S. Hu, “TrojanSQL: SQL
Injection against Natural Language Interface to Database,” in Proceedings of the 2023
Conference on Empirical Methods in Natural Language Processing, H. Bouamor, J.
Pino, and K. Bali, Eds., Singapore: Association for Computational Linguistics, Oct.
2023, pp. 4344–4359. doi: 10.18653/v1/2023.emnlp-main.264.