Document Oriented Query Partitioning Technique Implementation #319
Adds support for MongoDB version 4.4. With this commit, random collections are created with optional validation options that can enforce the schema on inserts. Additionally, the execution is logged as mongo shell code.
Random document insertion is now supported, with the option to either follow the schema validation or use random types. An option has also been added to randomly set an inserted value to null; validation has to be turned off for these options, though. Additionally, random indexes are created over random columns, either ascending or descending, and when multiple indexes are added for a single collection, a composite index is created out of the other ones.
The project stage is similar to SELECT in SQL: it specifies which columns are returned. The lookup stage is similar to SQL's LEFT OUTER JOIN; to support it, we specifically create new random columns that serve as the join columns.
The query AST has several key differences from the SQL version. The core is the MongoDBSelect class, which holds the projection and lookup lists. The filter is then a tree similar to the SQL version. Executing and logging differ, because one uses API calls and the other produces MongoDB shell commands, so there are two visitors: the ToQueryVisitor and the ToLogVisitor.
Computed functions allow arithmetic operations to be performed in queries. They are part of the projection pipeline stage, where a new field is added that holds the result. This update adds support for random computed fields with functions such as add, multiply, pow, sqrt and more.
…on in MongoDB
Similar to the binary comparison, the regular expression is a new leaf node that can be added to the tree. The existing random string generator is used to generate the pattern.
… regular expression
Changes to the MongoDBComparatorHelper allow expected errors to be ignored if they are thrown during query execution. This feature is mainly necessary for the ill-formed regular expressions that the randomized string generator produces.
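As a rough illustration of ignoring expected errors, the sketch below matches an exception message against a set of known fragments. It is not the PR's code: the class name `ExpectedErrorsSketch`, the method `isExpected`, and the substring-matching strategy are all assumptions made for this example.

```java
import java.util.Set;

// Hypothetical sketch; the names and the substring matching are assumptions.
final class ExpectedErrorsSketch {

    // Returns true if the exception message contains one of the expected fragments.
    static boolean isExpected(Throwable error, Set<String> expectedFragments) {
        String message = error.getMessage();
        if (message == null) {
            return false; // nothing to match against, so treat it as unexpected
        }
        for (String fragment : expectedFragments) {
            if (message.contains(fragment)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could wrap query execution in a try/catch and only report the exception when `isExpected` returns false.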
Because the randomly generated computed-field tree has leaf nodes of random types, some exceptions are ignored, for example a wrong type, a value that is not positive for sqrt, and similar issues.
The options are flags that can be set in a string such as "im". The expression generator now generates a random valid set of options for the regular expression operator in MongoDB.
… documents in MongoDB
This new variation executes the pipeline query once with a count as the last pipeline stage and compares the output to the size of the result set obtained without the count stage.
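The count check can be sketched roughly as follows. This is a minimal in-memory illustration, not the PR's code: `CountCheckSketch`, `runPipeline` and `runPipelineWithCount` are hypothetical names, and a real implementation would execute aggregation pipelines against MongoDB instead of supplying lists.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: compares a counted result with the size of the full result set.
final class CountCheckSketch {

    // Returns true if the count reported by the "count" variant matches the
    // number of documents returned by the variant without the count stage.
    static <T> boolean countsAgree(Supplier<List<T>> runPipeline, Supplier<Long> runPipelineWithCount) {
        List<T> documents = runPipeline.get();
        long reportedCount = runPipelineWithCount.get();
        return documents.size() == reportedCount;
    }
}
```

A mismatch between the two values would indicate a potential bug in the system under test.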
Until now, not was simulated by nor(id exists, bool_expression), which led us to believe that the underlying structure works fine. After the rework, the not is evaluated and every logical operation is inverted by the new NegateVisitor. At the lowest level, type problems remain.
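One plausible reading of "every logical operation inverted" is pushing the negation down the tree with De Morgan's laws. The sketch below shows that idea on a tiny expression tree. It is an assumption-laden illustration: it uses a `negate()` method on each node rather than a visitor, and the names `NegateSketch`, `Node`, `Leaf` and `Binary` are hypothetical and not taken from the PR's NegateVisitor.

```java
// Hypothetical, simplified expression tree; names are illustrative only.
final class NegateSketch {

    interface Node {
        Node negate();   // returns the logical inverse of this node
        String render(); // renders the node as a readable string
    }

    // A leaf such as a comparison; negation just flips a flag.
    static final class Leaf implements Node {
        final String text;
        final boolean negated;

        Leaf(String text, boolean negated) {
            this.text = text;
            this.negated = negated;
        }

        public Node negate() {
            return new Leaf(text, !negated);
        }

        public String render() {
            return negated ? "not(" + text + ")" : text;
        }
    }

    // A binary logical operation; op is assumed to be "and" or "or".
    static final class Binary implements Node {
        final String op;
        final Node left;
        final Node right;

        Binary(String op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }

        // De Morgan: not(a and b) == not(a) or not(b), and vice versa.
        public Node negate() {
            String inverted = op.equals("and") ? "or" : "and";
            return new Binary(inverted, left.negate(), right.negate());
        }

        public String render() {
            return op + "(" + left.render() + ", " + right.render() + ")";
        }
    }
}
```

Negating a tree twice yields an equivalent tree again, which can serve as a quick sanity check.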
This change reveals some issues with how we form queries for MongoDB. The core of the problem seems to be that neither greater-or-equal nor less-than includes null values, and it is rather hard to define a query that finds the values that are null. The query that only projects includes all of them.
…serting data
For the four ArangoDB data types integer, double, string and boolean, this commit supports creating collections (tables), keeping track of the schema, and inserting randomized data.
Random queries are now generated for ArangoDB that support binary comparisons, binary logic operations such as or/and, and the unary prefix not. Everything is also logged, and the results are checked with the new ComparatorHelper.
A new generator and query has been added to support the new functionality.
Similar to MongoDB, ArangoDB creates variables for computed values; they are calculated with the special keyword LET.
This oracle generates a random query, executes it and if the result set is not empty, chooses a document at random and removes it from the collection. The query is executed again to check if the document is really removed and at the end a new document is generated and inserted to make sure that the dataset is not decreasing in size.
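The steps of this oracle can be sketched on an in-memory collection as follows. This is a hedged illustration, not the PR's implementation: `RemovalOracleSketch` and its parameters are hypothetical, a `List` stands in for the database collection, and the sketch assumes that documents are distinct (as they would be with unique keys).

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical in-memory sketch of the removal check.
final class RemovalOracleSketch {

    static <T> void check(List<T> collection, Predicate<T> query, Supplier<T> newDocument, Random random) {
        List<T> firstResult = collection.stream().filter(query).collect(Collectors.toList());
        if (firstResult.isEmpty()) {
            return; // nothing to remove, so there is nothing to check
        }
        // Choose a document at random and remove it from the collection.
        T removed = firstResult.get(random.nextInt(firstResult.size()));
        collection.remove(removed);

        // Execute the query again; the removed document must not reappear
        // (assumes documents are distinct).
        List<T> secondResult = collection.stream().filter(query).collect(Collectors.toList());
        if (secondResult.contains(removed)) {
            throw new AssertionError("removed document is still returned: " + removed);
        }
        // Insert a fresh document so that the dataset does not shrink.
        collection.add(newDocument.get());
    }
}
```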
When using Cosmos, make sure to set the configuration string in CosmosProvider.
Thanks a lot for the PR and the clean code! I only have some minor nitpicks. Perhaps you can go over them and see which ones are worth addressing (also considering the risk of introducing an untested change).
Would you also be able to add a check for our CI (see https://github.com/sqlancer/sqlancer/blob/master/.github/workflows/main.yml)?
    return result;
} catch (Exception e) {
    Main.nrUnsuccessfulActions.addAndGet(1);
    if (e instanceof IgnoreMeException) {
I think it would be cleaner to put this in a separate catch block above (i.e., catch (IgnoreMeException e) { throw e; } catch (Exception e) ...).
Yes, I agree. Note that I took the general structure from ComparatorHelper, but there the number of exception types is much greater and counting is not done directly in this helper.
After running PMD, I now remember why I put it there: the AvoidRethrowingException PMD rule does not allow this. So I just put it in front of the counter for now.
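The arrangement described here, with the instanceof check placed in front of the counter inside a single catch block, might look roughly like the sketch below. The nested `IgnoreMeException` and the `nrUnsuccessfulActions` field are hypothetical stand-ins for the project's own types, and rethrowing after counting is an assumption, since the original snippet is cut off at that point.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; stand-in types replace the project's classes.
final class IgnoreBeforeCountSketch {

    // Stand-in for the project's IgnoreMeException.
    static final class IgnoreMeException extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Stand-in for Main.nrUnsuccessfulActions.
    static final AtomicLong nrUnsuccessfulActions = new AtomicLong();

    static <T> T run(Callable<T> action) throws Exception {
        try {
            return action.call();
        } catch (Exception e) {
            // Checked first so that ignored exceptions are not counted as failures.
            if (e instanceof IgnoreMeException) {
                throw e;
            }
            nrUnsuccessfulActions.addAndGet(1);
            throw e; // assumption: what happens after counting is not shown in the diff
        }
    }
}
```

A single catch block avoids a dedicated catch clause that only rethrows, which is the pattern the AvoidRethrowingException rule flags.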
private final List<ArangoDBSchema.ArangoDBTable> schemaTables = new ArrayList<>();

public synchronized void addTable(ArangoDBSchema.ArangoDBTable table) {
Any reason why this is synchronized? I might have forgotten.
This is a relic from a different version of keeping track of the schema. I took it out and also removed it from the MongoDB provider.
try {
    database.drop();
} catch (Exception ignored) {
It would be great to add a comment here to explain why the exception is ignored.
I added a comment. It is just simpler to ignore the exception than to check whether the database already exists. For some reason, ArangoDB throws an exception if the database does not exist.
In MongoDB, for instance, I can drop databases without worrying about their existence...
public abstract class ArangoDBQueryAdapter extends Query<ArangoDBConnection> {
    @Override
    public String getQueryString() {
        throw new UnsupportedOperationException();
Perhaps you could add a short description string as an argument to explain why the operation is not supported.
This is actually just to make sure that this function is not called, because it is a SQL function. With ArangoDB, it would make sense to support it again, but logging is done differently and not through this function, so I avoided logging twice.
I added a comment that this function should not be used, but I can switch to implementing it if you want.
ArangoDBSchema.ArangoDBColumn column = null;
while (column == null) {
    ArangoDBSchema.ArangoDBTable randomTable = globalState.getSchema().getRandomTable();
    column = randomTable.getRandomColumn();
Can it happen that the column is null?
Actually, it cannot, thanks.
@Override
public String getDatabaseVersion() throws Exception {
    return "4.4, Java API 4.1";
Should we really be that specific here? I worry that this might be outdated quickly.
It turned out that it was not so easy to get the version through the API; I could not get it to run, so I just hardcoded it. I just looked into it again, and now we run a command that gives us the correct version that is used on the server. I added it here.
}

@Override
public void check() throws Exception {
Would it make sense to add an assertion here to check that no null values are generated?
I am not sure what you mean by that. I will just bring this up in the next meeting.
public class MongoDBSchema extends AbstractSchema<MongoDBGlobalState, MongoDBSchema.MongoDBTable> {

    public enum MongoDBDataType implements HasBsonType {
        INTEGER {
A minor nitpick: I would add a constructor here, so that you can pass the BSON type directly as an argument (i.e., as INTEGER(BsonType.INT32)).
Of course, makes sense.
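A minimal sketch of the suggested enum constructor follows. To keep it compilable without the MongoDB driver, the nested `BsonType` enum is a hypothetical stand-in for org.bson.BsonType. The `getBsonType` method name and the constants other than INTEGER are assumptions, since the diff only shows INTEGER.

```java
// Hypothetical sketch of passing the BSON type through an enum constructor.
final class EnumConstructorSketch {

    // Stand-in for org.bson.BsonType so that the sketch has no dependencies.
    enum BsonType { INT32, STRING, BOOLEAN, DOUBLE }

    // Assumed shape of the HasBsonType interface.
    interface HasBsonType {
        BsonType getBsonType();
    }

    enum MongoDBDataType implements HasBsonType {
        INTEGER(BsonType.INT32),
        STRING(BsonType.STRING),   // assumed constant
        BOOLEAN(BsonType.BOOLEAN), // assumed constant
        DOUBLE(BsonType.DOUBLE);   // assumed constant

        private final BsonType bsonType;

        MongoDBDataType(BsonType bsonType) {
            this.bsonType = bsonType;
        }

        @Override
        public BsonType getBsonType() {
            return bsonType;
        }
    }
}
```

Compared with one anonymous body per constant, the constructor keeps the mapping in a single line per constant and a single getter.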
};

public static MongoDBDataType getRandom(MongoDBGlobalState state) {
    MongoDBDataType[] valuesWithoutString;
The logic here is quite complicated. Could it be simplified (e.g., by using a list)?
Okay, I found a way to make it a bit more readable, check it out and tell me what you think.
private String getRandomizedRegexOptions() {
    List<String> s = Randomly.subset("i", "m", "x", "s");
    return s.stream().reduce("", (current, newVal) -> current + newVal);
.collect(Collectors.joining()) would be simpler here.
After entering it, IntelliJ actually suggested String.join("", s), which is probably even more readable. I will do it like this; if you want, we can still go with a stream, though.
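For comparison, the three variants discussed in this thread can be placed side by side. The class and method names below are hypothetical, and a plain list replaces the project's Randomly.subset call so that the sketch is self-contained.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch comparing three ways to concatenate the option flags.
final class JoinOptionsSketch {

    // Original approach: fold the strings together with reduce.
    static String viaReduce(List<String> options) {
        return options.stream().reduce("", (current, newVal) -> current + newVal);
    }

    // Reviewer's suggestion: a joining collector without a delimiter.
    static String viaCollector(List<String> options) {
        return options.stream().collect(Collectors.joining());
    }

    // IDE suggestion: String.join with an empty delimiter, no stream needed.
    static String viaStringJoin(List<String> options) {
        return String.join("", options);
    }
}
```

All three produce the same string for the same input list; the difference is readability only.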
See Pull request sqlancer#319 on github.com/sqlancer for more information.
LGTM. Thanks a lot!
Includes extended support for MongoDB, Cosmos and ArangoDB.