Content-Length: 429042 | pFad | https://github.com/apache/iceberg/pull/4947

ED API: Add expression equivalence testing by rdblue · Pull Request #4947 · apache/iceberg · GitHub
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Add expression equivalence testing #4947

Merged
merged 6 commits into from
Jun 3, 2022

Conversation

rdblue
Copy link
Contributor

@rdblue rdblue commented Jun 2, 2022

This adds ExpressionUtil.equivalent and ExpressionUtil.selectsPartitions to determine whether an expression will select whole files in a partition spec. This is needed to determine whether a filter is guaranteed to delete only whole files, or whether a row-level plan should be used. It is also needed to implement aggregate pushdown with non-trivial filters because filters must match whole files in order to use file metadata to satisfy a query.

@github-actions github-actions bot added the API label Jun 2, 2022
@rdblue rdblue force-pushed the add-expression-equivalence branch from a304dbd to 0ba6dd5 Compare June 2, 2022 20:48
@@ -259,7 +259,7 @@ public String toString() {
@SuppressWarnings("unchecked")
static <T> Set<T> setOf(Iterable<Literal<T>> literals) {
Literal<T> lit = Iterables.get(literals, 0);
if (lit instanceof Literals.StringLiteral && lit.value() instanceof CharSequence) {
if (lit instanceof Literals.StringLiteral) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always true because StringLiteral wraps a CharSequence.

* @param other another expression
* @return true if the expressions are equivalent
*/
default boolean isEquivalentTo(Expression other) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which cases we use is prefix for boolean methods and in which no?
We are not very consistent in the code right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I typically try to make the method name form a natural sentence in English, like if (someFilter.isEquivalentTo(other)) reads like "if some filter is equivalent to other". In some cases it makes more sense to use other words, like has.

I wasn't sure what to do with ExpressionUtil.equivalent since it compares two things, but areEquivalent(f1, f2, schema) doesn't feel right. I just left that as ExpressionUtil.equivalent. Suggestions are welcome!

* @param spec a partition spec
* @return true if the expression will select whole partitions in the given spec
*/
public static boolean selectsPartitions(Expression expr, PartitionSpec spec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I correct this should allow us to optimize metadata-only deletes by handling obvious cases without the need to plan files with potential matches and invoking evaluators?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I'll follow up with that once this is merged.

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me. One question about case sensitivity when building projections.

@rdblue rdblue merged commit ff7c217 into apache:master Jun 3, 2022
@rdblue
Copy link
Contributor Author

rdblue commented Jun 3, 2022

Thanks for reviewing, @aokolnychyi!

@wypoon
Copy link
Contributor

wypoon commented Jun 6, 2022

@rdblue after I picked up this commit in master, my local build fails with

Execution failed for task ':iceberg-api:revapi'.
> There were Java public API/ABI breaks reported by revapi:
  
  java.method.addedToInterface: Method was added to an interface.
  
  old: <none>
  new: method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)
  
  SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING
  
  From old archive: <none>
  From new archive: iceberg-api-0.14.0-SNAPSHOT.jar
  
  If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for:
  
    * Just this break:
        ./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \
          --code "java.method.addedToInterface" \
          --new "method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)"
    * All breaks in this project:
        ./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}"
    * All breaks in all projects:
        ./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}"
  ----------------------------------------------------------------------------------------------------

Do you need to add something to .palantir/revapi.yml?

@rdblue
Copy link
Contributor Author

rdblue commented Jun 6, 2022

@wypoon, I merged a PR that added BoundTerm.isEquivalentTo earlier today: c9efc4b

@kbendick, can you make sure we're running the checks on PRs correctly?

@wypoon
Copy link
Contributor

wypoon commented Jun 6, 2022

@wypoon, I merged a PR that added BoundTerm.isEquivalentTo earlier today: c9efc4b

Thanks. I haven't picked up #4965. I'll do so then.

namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
namrathamyske pushed a commit to namrathamyske/iceberg that referenced this pull request Jul 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: https://github.com/apache/iceberg/pull/4947

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy