-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add expression equivalence testing #4947
Conversation
a304dbd
to
0ba6dd5
Compare
@@ -259,7 +259,7 @@ public String toString() { | |||
@SuppressWarnings("unchecked") | |||
static <T> Set<T> setOf(Iterable<Literal<T>> literals) { | |||
Literal<T> lit = Iterables.get(literals, 0); | |||
if (lit instanceof Literals.StringLiteral && lit.value() instanceof CharSequence) { | |||
if (lit instanceof Literals.StringLiteral) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is always true because StringLiteral
wraps a CharSequence
.
api/src/main/java/org/apache/iceberg/expressions/BoundLiteralPredicate.java
Outdated
Show resolved
Hide resolved
api/src/main/java/org/apache/iceberg/expressions/BoundTransform.java
Outdated
Show resolved
Hide resolved
* @param other another expression | ||
* @return true if the expressions are equivalent | ||
*/ | ||
default boolean isEquivalentTo(Expression other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases we use is
prefix for boolean methods and in which no?
We are not very consistent in the code right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically try to make the method name form a natural sentence in English, like if (someFilter.isEquivalentTo(other))
reads like "if some filter is equivalent to other". In some cases it makes more sense to use other words, like has
.
I wasn't sure what to do with ExpressionUtil.equivalent
since it compares two things, but areEquivalent(f1, f2, schema)
doesn't feel right. I just left that as ExpressionUtil.equivalent
. Suggestions are welcome!
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java
Outdated
Show resolved
Hide resolved
* @param spec a partition spec | ||
* @return true if the expression will select whole partitions in the given spec | ||
*/ | ||
public static boolean selectsPartitions(Expression expr, PartitionSpec spec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I correct this should allow us to optimize metadata-only deletes by handling obvious cases without the need to plan files with potential matches and invoking evaluators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I'll follow up with that once this is merged.
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me. One question about case sensitivity when building projections.
Thanks for reviewing, @aokolnychyi! |
@rdblue after I picked up this commit in master, my local build fails with
Do you need to add something to .palantir/revapi.yml? |
This adds
ExpressionUtil.equivalent
andExpressionUtil.selectsPartitions
to determine whether an expression will select whole files in a partition spec. This is needed to determine whether a filter is guaranteed to delete only whole files, or whether a row-level plan should be used. It is also needed to implement aggregate pushdown with non-trivial filters because filters must match whole files in order to use file metadata to satisfy a query.