Commit 466a4ea

updated readme, moved haskell files to own directory
1 parent 42cdd3f commit 466a4ea

4 files changed, with 53 additions and 15 deletions

README.md

Lines changed: 50 additions & 13 deletions
@@ -1,30 +1,61 @@
1-
## chiDB SQL Parser
1+
# chiDB SQL Compiler Front-End
22

33
This is a SQL parser and compiler front end which can parse a reasonably large subset of SQL. The parser is generated by Lex and Yacc, and all other code is in C. The parser generates an abstract syntax tree (AST) representation of SQL, which is based on relational algebra, extended to fit more closely with SQL. This extended relational algebra is termed SRA (sugared relational algebra), as it can itself be compiled into more or less pure relational algebra.
44

5-
### Example
6-
Consider the following SQL query:
5+
Since relational algebra primarily deals with only queries, there are separate data structures to deal with `Create Table`, `Create Index`, `Insert Into`, and `Delete From` commands (and possibly other things in the future). These don't use RA, but they do, for example, use the Expression part of the abstract syntax tree.
76

8-
```sql
7+
## Methodologies
98

10-
SQL:
11-
select f.a as Col1, g.a as Col2 from Foo f, Foo g where Col1 != Col2;
9+
The parser is generated by Lex (a lexer generator) and Yacc (a parser generator). All other code is written in C, along with prototypes in Haskell.
10+
11+
### Lex
12+
13+
The lexer contains a series of regular expressions defining the tokens of the language, each associated with instructions to be performed when that regular expression is matched. The Lex utility converts a Lex file into C code which scans the input for the next token and performs the instructions attached to that token. These can be as simple as returning an integral value indicating the type of token (drawn from an enum generated by Yacc), or more complicated, such as dealing with comments, converting escape sequences, or storing the value of a constant expression (like an integer or string) or the name of a variable.
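To make the scanner's job concrete, here is a minimal hand-written sketch in C of the kind of "return the next token" function that Lex generates. It is not the project's lexer and not Lex output: the token names, the `Token` struct, and `next_token` are all hypothetical, and it covers only a few tokens.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds. In the real project these values would come
   from the enum that Yacc generates. */
typedef enum {
    TOK_EOF, TOK_SELECT, TOK_FROM, TOK_IDENT, TOK_INT,
    TOK_STAR, TOK_COMMA, TOK_PLUS, TOK_SEMI, TOK_ERROR
} TokenKind;

typedef struct {
    TokenKind kind;
    char text[64];  /* lexeme of a keyword or identifier (truncated if longer) */
    long value;     /* value of an integer constant */
} Token;

/* Case-insensitive comparison of a lexeme against a keyword. */
static int keyword_eq(const char *lexeme, const char *keyword) {
    while (*lexeme && *keyword) {
        if (tolower((unsigned char)*lexeme) != tolower((unsigned char)*keyword))
            return 0;
        lexeme++;
        keyword++;
    }
    return *lexeme == '\0' && *keyword == '\0';
}

/* Scan one token starting at *cursor and advance *cursor past it. */
Token next_token(const char **cursor) {
    Token tok = { TOK_EOF, "", 0 };
    const char *p;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (isspace((unsigned char)*p)) p++;            /* skip whitespace */

    if (*p == '\0') {
        /* end of input: tok.kind is already TOK_EOF */
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        size_t n = 0;
        while (isalnum((unsigned char)*p) || *p == '_') {
            if (n < sizeof tok.text - 1) tok.text[n++] = *p;
            p++;
        }
        tok.text[n] = '\0';
        if (keyword_eq(tok.text, "select"))    tok.kind = TOK_SELECT;
        else if (keyword_eq(tok.text, "from")) tok.kind = TOK_FROM;
        else                                   tok.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*p)) {
        /* store the value of the constant (overflow is not checked here) */
        while (isdigit((unsigned char)*p))
            tok.value = tok.value * 10 + (*p++ - '0');
        tok.kind = TOK_INT;
    } else {
        switch (*p++) {
        case '*': tok.kind = TOK_STAR;  break;
        case ',': tok.kind = TOK_COMMA; break;
        case '+': tok.kind = TOK_PLUS;  break;
        case ';': tok.kind = TOK_SEMI;  break;
        default:  tok.kind = TOK_ERROR; break;
        }
    }
    *cursor = p;
    return tok;
}
```

Calling `next_token` repeatedly on `"SELECT *, x+42 from t;"` yields the keyword, the star, the comma, and so on, until it returns `TOK_EOF`. A Lex specification expresses the same branches as regular expressions with attached actions instead of hand-written character tests.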
14+
15+
### Yacc
16+
17+
Yacc is used to generate the parser. A yacc file is similar to a lex file in that it has a series of definitions of structures, and instructions for when those structures are encountered. However, in Yacc the structures are grammatical rules, and the instructions that accompany them are usually to build a parse tree (abstract syntax tree), which is a representation of the language in data structures. Yacc allows us to rapidly construct a correct and efficient parser, which gives us detailed error messages (either in parsing or in writing the parser itself), and saves us lots of time compared to a hand-written parser.
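As an illustration of what such grammar actions typically call into, the following C sketch defines a tiny expression node with a constructor per node kind, a destructor, and a formatter. The names (`Expr`, `make_add`, `expr_format`, and so on) are assumptions made for this example and are not taken from the project's sources.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical expression node; the project's real AST types are not shown
   in this README. */
typedef enum { EXPR_COLUMN, EXPR_INT, EXPR_ADD } ExprKind;

typedef struct Expr {
    ExprKind kind;
    char *name;                 /* used by EXPR_COLUMN */
    long value;                 /* used by EXPR_INT */
    struct Expr *left, *right;  /* used by EXPR_ADD */
} Expr;

static Expr *expr_new(ExprKind kind) {
    Expr *e = calloc(1, sizeof *e);
    assert(e != NULL);
    e->kind = kind;
    return e;
}

/* Constructors. A Yacc action such as
 *     expr : expr '+' expr   { $$ = make_add($1, $3); }
 * would call these to build the tree bottom-up as rules are reduced. */
Expr *make_column(const char *name) {
    Expr *e = expr_new(EXPR_COLUMN);
    e->name = malloc(strlen(name) + 1);
    assert(e->name != NULL);
    strcpy(e->name, name);
    return e;
}

Expr *make_int(long value) {
    Expr *e = expr_new(EXPR_INT);
    e->value = value;
    return e;
}

Expr *make_add(Expr *left, Expr *right) {
    Expr *e = expr_new(EXPR_ADD);
    e->left = left;
    e->right = right;
    return e;
}

/* Destructor: frees a node and everything below it. */
void expr_free(Expr *e) {
    if (e == NULL) return;
    expr_free(e->left);
    expr_free(e->right);
    free(e->name);
    free(e);
}

/* Formatter: writes e.g. "Add(x, 1)" into buf. Sub-expressions longer than
   127 characters are truncated, which is acceptable for a sketch. */
int expr_format(const Expr *e, char *buf, size_t size) {
    switch (e->kind) {
    case EXPR_COLUMN: return snprintf(buf, size, "%s", e->name);
    case EXPR_INT:    return snprintf(buf, size, "%ld", e->value);
    case EXPR_ADD: {
        char l[128], r[128];
        expr_format(e->left, l, sizeof l);
        expr_format(e->right, r, sizeof r);
        return snprintf(buf, size, "Add(%s, %s)", l, r);
    }
    }
    return -1;
}
```

With these helpers, `make_add(make_column("x"), make_int(1))` builds the tree that `expr_format` renders as `Add(x, 1)`, and a single `expr_free` on the root releases the whole tree.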
18+
19+
## Sugared Relational Algebra
20+
21+
In the chiDB SQL parser, the instructions in the Yacc file produce a representation which we call Sugared Relational Algebra (SRA). This is an extended form of relational algebra, with several differences. For example:
22+
23+
* it contains as primitives multiple join types (Inner Join, Full/Left/Right outer join, Natural Join) and Intersection
24+
25+
* it has no rho operator, because all renaming and aliasing is contained alongside the expressions. For example, in SRA a `Project` structure has a list of expressions which can optionally have aliases, whereas in RA a `Pi` structure has only expressions, and any aliasing must be done with a `Rho` operator.
26+
27+
* SRA is also allowed to use `*` as in SQL to stand for "all columns in the table". The step which translates from SRA to RA will expand all `*`s into the actual list of columns.
1228

29+
The translation from SRA to RA is called desugaring. Here's an example. Say we have a table `t` which has columns `w`, `x`, and `y`:
30+
31+
```
32+
SQL:
33+
SELECT *, x+y as z from t;
34+
35+
SRA:
36+
Project([*, (Add(x, y), z)],
37+
Table(t))
38+
39+
RA:
40+
Pi([w, x, y, z],
41+
Rho(Add(x,y), z,
42+
Pi([w, x, y, Add(x,y)],
43+
Table(t))))
1344
```
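One part of this translation, the expansion of `*`, can be sketched in C as follows. This is only an illustration under simplifying assumptions: columns and projection entries are plain strings, `expand_star` is a hypothetical name, and the real desugarer (currently prototyped in Haskell) works on full expression trees rather than names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand every "*" entry in `items` into the table's columns and copy all
 * other entries through unchanged. The result is written to `out`.
 * Returns the number of entries written, or -1 if `out` is too small. */
int expand_star(const char *const *table_cols, size_t n_cols,
                const char *const *items, size_t n_items,
                const char **out, size_t out_cap)
{
    size_t n_out = 0;

    assert(table_cols != NULL && items != NULL && out != NULL);
    for (size_t i = 0; i < n_items; i++) {
        if (strcmp(items[i], "*") == 0) {
            /* "*" stands for all columns of the table, in table order */
            for (size_t c = 0; c < n_cols; c++) {
                if (n_out == out_cap) return -1;
                out[n_out++] = table_cols[c];
            }
        } else {
            if (n_out == out_cap) return -1;
            out[n_out++] = items[i];
        }
    }
    return (int)n_out;
}
```

For the table `t` above, with columns `w`, `x`, `y` and the projection list `*, z`, the function produces `w, x, y, z`, matching the outer `Pi` in the RA form.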
1445

15-
The parser will create an AST which looks like this:
46+
A more complicated example:
1647

1748
```
49+
SQL:
50+
select f.a as Col1, g.a as Col2 from Foo f, Foo g where Col1 != Col2;
51+
52+
SRA:
1853
Project([(f,a,Col1), (g,a,Col2)],
1954
Select(Col1 != Col2,
2055
Join([(Foo,f), (Foo,g)])
2156
)
2257
)
23-
```
2458
25-
The desugaring step will produce a Relational Algebra tree:
26-
27-
```
2859
Pi([Col1, Col2],
2960
Sigma(Col1 != Col2,
3061
Cross(
@@ -35,9 +66,15 @@ Pi([Col1, Col2],
3566
)
3667
```
3768

38-
Since relational algebra primarily deals with only queries, there are separate data structures to deal with `Create Table`, `Create Index`, `Insert Into`, and `Delete From` commands (and possibly other things in the future). These don't use RA, but they do, for example, use the Expression part of the abstract syntax tree.
69+
## Current Status
70+
71+
Currently, we have a good deal of machinery in place. The parser and lexer are finished, and all of the data structures for both SRA and RA are written, along with constructors, destructors, pretty printers, etc. There is a doubly-linked list library which is quite robust (it is not fully thread-safe, though this could be added fairly easily). There is also a vector library which might be useful, for example, for string-building, or if a vector-based rather than list-based representation of columns, expressions, etc. is desired.
72+
73+
## Future Directions
74+
75+
The largest thing missing at this point is the desugarer. However, we have one written in Haskell (`Desugar.hs` and `SRA.hs`, with examples in `Tests.hs`), and the translation of this code into C should be fairly straightforward. Haskell allows us to represent and manipulate structures of RA and SRA with ease, conciseness and correctness, and I humbly recommend that any modification of the code be prototyped in Haskell prior to coding it in C.
3976

40-
## Installation and Usage (*nix operating systems)
77+
# Installation and Usage (*nix operating systems)
4178

4279
Clone into the git repository:
4380

tests/Desugar.hs renamed to haskell/Desugar.hs

Lines changed: 3 additions & 2 deletions
@@ -2,7 +2,6 @@ module Desugar where
22

33
import SRA
44
import Data.List
5-
import Debug.Trace
65

76
desugar :: TableMap -> SRA -> RA
87
-- desugaring a table name means looking up the table, which
@@ -155,7 +154,9 @@ getCols (Table name cols) = cols
155154

156155
-- when we have a project, we are specifying the names of columns to be
157156
-- carried through. We simply use the show function to get a string
158-
-- representation of the expression. Table name qualifiers will be kept.
157+
-- representation of the expression, and we use the getType function to
158+
-- find the type; from this we can construct a (String, Type) tuple, which
159+
-- is a column. Table name qualifiers will be kept.
159160
getCols (Pi exprs ra) = map toCols exprs where
160161
toCols expr = (show expr, getType expr ra)
161162

File renamed without changes.
File renamed without changes.
