-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Add expression equivalence testing #4947
Changes from 4 commits
0ba6dd5
4aa8308
b3fbd70
c1f29d2
0e93e17
1f3c38c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,6 +21,7 @@ | |
|
||
import java.util.regex.Pattern; | ||
import java.util.stream.Collectors; | ||
import org.apache.iceberg.PartitionSpec; | ||
import org.apache.iceberg.transforms.Transform; | ||
import org.apache.iceberg.transforms.Transforms; | ||
import org.apache.iceberg.types.Types; | ||
|
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) { | |
return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE); | ||
} | ||
|
||
/** | ||
* Returns whether two unbound expressions will accept the same inputs. | ||
* <p> | ||
* If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if | ||
* this returns false the expressions may return the same evaluation for the same input. That is, expressions may | ||
* be equivalent even if this returns false. | ||
* | ||
* @param left an unbound expression | ||
* @param right an unbound expression | ||
* @param struct a struct type for binding | ||
* @return true if the expressions are equivalent | ||
*/ | ||
public static boolean equivalent(Expression left, Expression right, Types.StructType struct) { | ||
rdblue marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return Binder.bind(struct, Expressions.rewriteNot(left)) | ||
.isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right))); | ||
} | ||
|
||
/** | ||
* Returns whether an expression selects whole partitions for a partition spec. | ||
* <p> | ||
* For example, ts < '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does | ||
* not select whole partitions in a daily spec, [days(ts)]. | ||
* | ||
* @param expr an unbound expression | ||
* @param spec a partition spec | ||
* @return true if the expression will select whole partitions in the given spec | ||
*/ | ||
public static boolean selectsPartitions(Expression expr, PartitionSpec spec) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Am I correct this should allow us to optimize metadata-only deletes by handling obvious cases without the need to plan files with potential matches and invoking evaluators? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Cool, I'll follow up with that once this is merged. |
||
return equivalent( | ||
Projections.inclusive(spec).project(expr), | ||
rdblue marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Projections.strict(spec).project(expr), | ||
spec.partitionType()); | ||
} | ||
|
||
private static class ExpressionSanitizer extends ExpressionVisitors.ExpressionVisitor<Expression> { | ||
private static final ExpressionSanitizer INSTANCE = new ExpressionSanitizer(); | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -259,7 +259,7 @@ public String toString() { | |
@SuppressWarnings("unchecked") | ||
static <T> Set<T> setOf(Iterable<Literal<T>> literals) { | ||
Literal<T> lit = Iterables.get(literals, 0); | ||
if (lit instanceof Literals.StringLiteral && lit.value() instanceof CharSequence) { | ||
if (lit instanceof Literals.StringLiteral) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is always true because |
||
Iterable<T> values = Iterables.transform(literals, Literal::value); | ||
Iterable<CharSequence> charSeqs = Iterables.transform(values, val -> (CharSequence) val); | ||
return (Set<T>) CharSequenceSet.of(charSeqs); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which cases we use
is
prefix for boolean methods and in which no?We are not very consistent in the code right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I typically try to make the method name form a natural sentence in English, like
if (someFilter.isEquivalentTo(other))
reads like "if some filter is equivalent to other". In some cases it makes more sense to use other words, likehas
.I wasn't sure what to do with
ExpressionUtil.equivalent
since it compares two things, butareEquivalent(f1, f2, schema)
doesn't feel right. I just left that asExpressionUtil.equivalent
. Suggestions are welcome!