Document Oriented Query Partitioning Technique Implementation #319

Merged
merged 24 commits into from
Mar 30, 2021

Conversation

Contributor

@pfu3tz pfu3tz commented Mar 22, 2021

Includes extended support for MongoDB, Cosmos and ArangoDB.

pfu3tz added 23 commits March 22, 2021 23:03
Adds support for MongoDB 4.4. This commit creates random collections with optional validation options that can enforce the schema on inserts. The execution is also logged as mongo shell code.
Random document insertion is now supported, either following the schema validation or using random types. An option has also been added to randomly set an inserted value to null. Validation has to be turned off for these options, though.

Additionally, random indexes are created over random columns, either ascending or descending. When multiple indexes are added for a single collection, a composite index is created out of the other ones.
The project stage is similar to SELECT in SQL: it specifies which columns are returned. The lookup stage is similar to SQL's LEFT OUTER JOIN; for it, we have to create new random columns that specify the join column.
The query AST has multiple key differences from the SQL version. The core is the MongoDBSelect class, which holds the projection and lookup lists. The filter is then a tree similar to the SQL version.

Executing and logging differ, because one issues API calls and the other emits MongoDB shell commands, so there are two visitors: the ToQueryVisitor and the ToLogVisitor.
Computed functions allow arithmetic operations to be performed in queries. They are part of the projection pipeline stage, where a new field is added that holds the result. This update adds support for random computed fields with functions such as add, multiply, pow, sqrt and more.
…on in MongoDB

Similar to the binary comparison, the regular expression is a new leaf node that can be added to the tree. The existing random string generator is used to generate the pattern.
… regular expression

Changes to the MongoDBComparatorHelper allow expected errors to be ignored if they are thrown during query execution. This feature is mainly necessary for the ill-formed regular expressions that the random string generator produces.
Because the randomly generated computed-field tree has random types as leaf nodes, we ignore exceptions such as a wrong type, a sqrt argument that is not positive, and similar issues.
The options are flags that can be set in a string such as "im". The expression generator now generates random valid options for the regular expression operator in MongoDB.
… documents in MongoDB

This new variant executes the pipeline query once with a count as the last pipeline stage and compares the output to the size of the result set without the count stage.
Until now, not was simulated by nor(id exists, bool_expression), which led us to believe that the underlying structure works fine. After the rework, the not is evaluated and every logical operation is inverted by the new NegateVisitor. At the lowest level, type problems remain.
This change reveals some issues with how we form queries for MongoDB. The core of it seems to be that neither greater-or-equal nor less-than includes null values, and it is rather hard to define a query that finds the ones that are null. The query that projects includes all of them.
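The negation rework described above can be illustrated with a minimal Java sketch. It is not the NegateVisitor from this PR: the `Node`, `Cmp` and `Logic` types are hypothetical, and negation is written as a method on the nodes rather than as a separate visitor. It pushes a not through and/or using De Morgan's laws and flips the comparison at the leaves, under the assumption that every field is a non-null integer.

```java
import java.util.Map;

// Hypothetical mini AST; the PR's real node and visitor classes are not shown in this thread.
interface Node {
    boolean eval(Map<String, Integer> doc);

    // Returns a tree equivalent to NOT(this), assuming every field is a non-null integer.
    Node negate();
}

final class Cmp implements Node {
    enum Op { LT, GE, EQ, NE }

    final String field;
    final Op op;
    final int value;

    Cmp(String field, Op op, int value) {
        this.field = field;
        this.op = op;
        this.value = value;
    }

    @Override
    public boolean eval(Map<String, Integer> doc) {
        int v = doc.get(field); // assumption: the field exists and is not null
        switch (op) {
            case LT: return v < value;
            case GE: return v >= value;
            case EQ: return v == value;
            default: return v != value;
        }
    }

    @Override
    public Node negate() {
        // Flip the comparison at the leaf instead of wrapping it in a NOT node.
        switch (op) {
            case LT: return new Cmp(field, Op.GE, value);
            case GE: return new Cmp(field, Op.LT, value);
            case EQ: return new Cmp(field, Op.NE, value);
            default: return new Cmp(field, Op.EQ, value);
        }
    }
}

final class Logic implements Node {
    final boolean isAnd;
    final Node left;
    final Node right;

    Logic(boolean isAnd, Node left, Node right) {
        this.isAnd = isAnd;
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean eval(Map<String, Integer> doc) {
        return isAnd ? left.eval(doc) && right.eval(doc)
                     : left.eval(doc) || right.eval(doc);
    }

    @Override
    public Node negate() {
        // De Morgan: NOT(a AND b) == NOT a OR NOT b, and vice versa.
        return new Logic(!isAnd, left.negate(), right.negate());
    }
}
```

Once null values are allowed, flipping less-than to greater-or-equal no longer yields the complement, because a null field matches neither side. That matches the observation in the commit message above.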
…serting data

For the four ArangoDB data types integer, double, string and boolean, this commit supports creating collections and tables, keeping track of the schema, and inserting randomized data.
The ArangoDB support now randomly generates queries with binary comparisons, binary logical operations such as or/and, and the unary prefix not. Everything is also logged, and the results are checked with the new ComparatorHelper.
A new generator and query have been added to support the new functionality.
Similar to MongoDB, variables for computed values are created in ArangoDB and calculated with the special keyword LET.
This oracle generates a random query, executes it and if the result set is not empty, chooses a document at random and removes it from the collection. The query is executed again to check if the document is really removed and at the end a new document is generated and inserted to make sure that the dataset is not decreasing in size.
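The steps of this removal oracle can be sketched against an in-memory stand-in for a collection. This is an illustration only and makes several assumptions: `RemovalOracleSketch` and its methods are hypothetical names, documents are reduced to an integer value stored under a unique key, and no ArangoDB driver is involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory stand-in for a collection; the real oracle runs its queries against ArangoDB.
final class RemovalOracleSketch {
    private final Map<Integer, Integer> collection = new HashMap<>(); // key -> value
    private int nextKey = 0;

    void insert(int value) {
        collection.put(nextKey++, value);
    }

    int size() {
        return collection.size();
    }

    // Returns the keys of all documents whose value satisfies the filter.
    List<Integer> query(Predicate<Integer> filter) {
        return collection.entrySet().stream()
                .filter(e -> filter.test(e.getValue()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Returns true if the check passed or was skipped because the result set was empty.
    boolean check(Predicate<Integer> filter, Random rnd) {
        List<Integer> before = query(filter);
        if (before.isEmpty()) {
            return true;
        }
        Integer victim = before.get(rnd.nextInt(before.size()));
        collection.remove(victim);
        List<Integer> after = query(filter);
        boolean removed = !after.contains(victim) && after.size() == before.size() - 1;
        insert(rnd.nextInt(100)); // keep the data set from shrinking
        return removed;
    }
}
```

Calling `check` repeatedly with different random filters keeps the collection at a constant size, so later queries still have documents to match.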
When using Cosmos, make sure to set the configuration string in CosmosProvider.
Contributor

@mrigger mrigger left a comment

Thanks a lot for the PR and the clean code! I only have some minor nitpicks. Perhaps you can go over them and see which ones are worth addressing (also considering the risk of introducing an untested change).

Would you also be able to add a check for our CI (see https://github.com/sqlancer/sqlancer/blob/master/.github/workflows/main.yml)?

return result;
} catch (Exception e) {
Main.nrUnsuccessfulActions.addAndGet(1);
if (e instanceof IgnoreMeException) {
Contributor

I think it would be cleaner to put this in a separate catch block above (i.e., catch (IgnoreMeException e) { throw e; } catch (Exception e) ...).

Contributor Author

Yes, I agree. Note that I took the general structure from ComparatorHelper, but there the number of exception types is much greater and counting is not done directly in this helper.

Contributor Author

After running PMD, I now remember why I put it there: the AvoidRethrowingException PMD rule does not allow this. So I just put it in front of the counter for now.
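The arrangement described in this reply, with the instanceof check placed before the counter, might look roughly like the following sketch. The `IgnoreMeException` class here is a local stand-in, `CatchOrderSketch` and `run` are hypothetical names, and wrapping other failures in an AssertionError is an assumption; the thread does not show what the real helper does after counting.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Local stand-in for SQLancer's IgnoreMeException, declared only to keep the sketch self-contained.
class IgnoreMeException extends RuntimeException {
}

final class CatchOrderSketch {
    static final AtomicLong nrUnsuccessfulActions = new AtomicLong();

    static <T> T run(Supplier<T> action) {
        try {
            return action.get();
        } catch (Exception e) {
            // Checked inside the general handler, because a catch block that only
            // rethrows would be flagged by PMD's AvoidRethrowingException rule.
            if (e instanceof IgnoreMeException) {
                throw (IgnoreMeException) e; // propagated before the counter is touched
            }
            nrUnsuccessfulActions.addAndGet(1);
            throw new AssertionError(e); // assumption: other failures are reported as errors
        }
    }
}
```

With this ordering, ignorable cases leave the counter untouched, which is the practical difference from incrementing first and checking afterwards.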


private final List<ArangoDBSchema.ArangoDBTable> schemaTables = new ArrayList<>();

public synchronized void addTable(ArangoDBSchema.ArangoDBTable table) {
Contributor

@mrigger mrigger Mar 24, 2021

Any reason why this is synchronized? I might have forgotten.

Contributor Author

This is a relic from a different version of keeping track of the schema. I took it out and also removed it from the MongoDB provider.

try {
database.drop();
} catch (Exception ignored) {

Contributor

It would be great to add a comment here to explain why the exception is ignored.

Contributor Author

I added a comment. It is just simpler to ignore the exception than to check whether the database already exists. For some reason, ArangoDB throws an exception if the database does not exist.

In MongoDB, for instance, I can drop databases without worrying about their existence...

public abstract class ArangoDBQueryAdapter extends Query<ArangoDBConnection> {
@Override
public String getQueryString() {
throw new UnsupportedOperationException();
Contributor

Perhaps you could add a short description string as an argument to explain why the operation is not supported.

Contributor Author

This is actually just to make sure that this function is not called, because it is a SQL function. With ArangoDB, it would make sense again to support it, but logging is done differently and not through this function, so this way I avoided logging twice.

I added a comment that this function should not be used, but I can switch to implementing it if you want.

ArangoDBSchema.ArangoDBColumn column = null;
while (column == null) {
ArangoDBSchema.ArangoDBTable randomTable = globalState.getSchema().getRandomTable();
column = randomTable.getRandomColumn();
Contributor

Can it happen that the column is null?

Contributor Author

Actually, it cannot, thanks.


@Override
public String getDatabaseVersion() throws Exception {
return "4.4, Java API 4.1";
Contributor

Should we really be that specific here? I worry that this might be outdated quickly.

Contributor Author

It turned out that it was not so easy to get the version over the API; I could not get it to run, so I just hardcoded it. I looked into it again, and now we run a command that gives us the correct version that is used on the server. I added it here.

}

@Override
public void check() throws Exception {
Contributor

Would it make sense to add an assertion here to check that no null values are generated?

Contributor Author

I am not sure what you mean by that. I am gonna just bring this up in the next meeting.

public class MongoDBSchema extends AbstractSchema<MongoDBGlobalState, MongoDBSchema.MongoDBTable> {

public enum MongoDBDataType implements HasBsonType {
INTEGER {
Contributor

A minor nitpick: I would add a constructor here, so that you can pass the BSON type directly as an argument (i.e., as INTEGER(BsonType.INT32)).

Contributor Author

Of course, makes sense.
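The suggestion above, passing the BSON type through an enum constructor, could look roughly like this. It is a sketch under assumptions: `BsonType` is redeclared locally as a stand-in for `org.bson.BsonType` so that it compiles without the MongoDB driver, the `HasBsonType` method name is a guess, and only INTEGER is confirmed by the thread; the other constants are illustrative.

```java
// Stand-in for org.bson.BsonType, declared locally so the sketch needs no MongoDB driver.
enum BsonType { INT32, STRING, BOOLEAN, DOUBLE }

// Assumed shape of the interface; the method name is hypothetical.
interface HasBsonType {
    BsonType getBsonType();
}

enum MongoDBDataType implements HasBsonType {
    // Only INTEGER appears in the reviewed snippet; the remaining constants are illustrative.
    INTEGER(BsonType.INT32),
    STRING(BsonType.STRING),
    BOOLEAN(BsonType.BOOLEAN),
    DOUBLE(BsonType.DOUBLE);

    private final BsonType bsonType;

    MongoDBDataType(BsonType bsonType) {
        this.bsonType = bsonType;
    }

    @Override
    public BsonType getBsonType() {
        return bsonType;
    }
}
```

Compared with a constant-specific body per enum value, the constructor keeps each mapping on one line and makes it impossible to forget the override for a newly added type.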

};

public static MongoDBDataType getRandom(MongoDBGlobalState state) {
MongoDBDataType[] valuesWithoutString;
Contributor

The logic here is quite complicated. Could it be simplified (e.g., by using a list)?

Contributor Author

Okay, I found a way to make it a bit more readable, check it out and tell me what you think.


private String getRandomizedRegexOptions() {
List<String> s = Randomly.subset("i", "m", "x", "s");
return s.stream().reduce("", (current, newVal) -> current + newVal);
Contributor

.collect(Collectors.joining()) would be simpler here.

Contributor Author

After entering it, IntelliJ actually suggested String.join("", s), which is probably even more readable. I will do it like this; if you want, we can still go with a stream, though.
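The three variants discussed in this thread produce the same string. The sketch below puts them side by side; `RegexOptionsJoin` and its method names are hypothetical, and a fixed list stands in for the result of `Randomly.subset`, which is SQLancer's own helper and not reproduced here.

```java
import java.util.List;
import java.util.stream.Collectors;

final class RegexOptionsJoin {
    // Original version: folds the flags together with string concatenation.
    static String viaReduce(List<String> flags) {
        return flags.stream().reduce("", (current, newVal) -> current + newVal);
    }

    // Reviewer's suggestion: a joining collector with no delimiter.
    static String viaCollector(List<String> flags) {
        return flags.stream().collect(Collectors.joining());
    }

    // IDE suggestion adopted by the author: no stream needed at all.
    static String viaStringJoin(List<String> flags) {
        return String.join("", flags);
    }
}
```

For a flag list such as "i" and "m", all three return "im"; for an empty subset, all three return the empty string.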

See Pull request sqlancer#319 on github.com/sqlancer for more information.
Contributor

mrigger commented Mar 30, 2021

LGTM. Thanks a lot!

@mrigger mrigger merged commit 1be16f0 into sqlancer:master Mar 30, 2021
@pfu3tz pfu3tz deleted the DBConnection branch March 30, 2021 09:39