[lake/iceberg] Iceberg encoding strategy #1350
@MehulBatra Thanks for the PR. I left a few minor comments. PTAL.
fluss-common/src/main/java/com/alibaba/fluss/row/encode/iceberg/IcebergBinaryRowWriter.java
fluss-common/src/main/java/com/alibaba/fluss/row/encode/iceberg/IcebergKeyEncoder.java
fluss-common/src/test/java/com/alibaba/fluss/row/encode/iceberg/IcebergKeyEncoderTest.java
Hi @luoyuxia, I tried to take inspiration from
Currently our CI runs on Java 8. Iceberg 1.9.1 requires Java 11 or later, so using it in my code fails the CI check. I am trying to downgrade the version to 1.4.3 to support Java 8; my assumption is that CI will pass after that.
Thanks for the explanation. I think it makes sense to be inspired by Iceberg's encoding. But keep in mind that we use it for a different purpose:
If that's the case, we can downgrade the version to 1.4.3 first and leave a TODO for when #1195 is resolved.
@MehulBatra Thanks for the PR. I left a few minor comments.
Agreed, Iceberg does use these to store partition stats and min/max values for filtering.
4f7644c to 179f1db
We don't need to encode the values in an Iceberg-compatible format. We just need to make sure the bucket partition transform aligns with Iceberg. Iceberg buckets on the literal value directly, without an encoding step, which is different from Paimon. Fluss needs to encode because we need to align with the process by which Fluss writes rows, which will
So, for Iceberg, although we encode the key, we still need to decode the bytes back into the literal value (an int value, a long value, etc.) and then apply the Iceberg bucket strategy. See
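To illustrate the decode-then-bucket flow described above, here is a minimal sketch rather than the code in this PR. It assumes, purely for illustration, that the key is a single integral column encoded as an 8-byte little-endian long. The class and method names (`IcebergBucketSketch`, `decodeLong`, `murmur3Long`, `bucket`) are hypothetical and are not Fluss or Iceberg APIs. The hash follows the Iceberg spec's bucket transform: a 32-bit Murmur3 hash (x86 variant, seed 0) of the value, masked to non-negative and taken modulo the bucket count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative sketch only; names are hypothetical, not Fluss or Iceberg APIs. */
final class IcebergBucketSketch {

    // Assumption: the key column was encoded as an 8-byte little-endian long.
    static long decodeLong(byte[] encoded) {
        return ByteBuffer.wrap(encoded).order(ByteOrder.LITTLE_ENDIAN).getLong();
    }

    // 32-bit Murmur3 (x86 variant, seed 0) over the 8 little-endian bytes of a long.
    // The Iceberg spec hashes int values as longs, so one routine covers both.
    static int murmur3Long(long value) {
        int h1 = 0;
        h1 = mixH1(h1, mixK1((int) value));          // low 4 bytes
        h1 = mixH1(h1, mixK1((int) (value >>> 32))); // high 4 bytes
        return fmix(h1, 8);
    }

    private static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    private static int fmix(int h1, int length) {
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Bucket transform: mask the hash to non-negative, then take it modulo N.
    static int bucket(long literal, int numBuckets) {
        return (murmur3Long(literal) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // Stand-in for an encoded key: the value 34 as 8 little-endian bytes.
        byte[] encoded =
                ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(34L).array();
        long literal = decodeLong(encoded);
        System.out.println("hash=" + murmur3Long(literal) + " bucket=" + bucket(literal, 16));
    }
}
```

The point of the sketch is that the bucket is computed from the decoded literal, not from the encoded bytes, so the result depends only on the value and the bucket count. The Iceberg spec lists 2017239379 as the expected hash for the int or long value 34, which makes a convenient check for any implementation.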
@MehulBatra Thanks for the update. I only left minor comments; this should be ready to merge in the next iteration.
pom.xml
@@ -98,6 +98,7 @@
<curator.version>5.4.0</curator.version>
<netty.version>4.1.104</netty.version>
<arrow.version>15.0.0</arrow.version>
<iceberg.version>1.4.3</iceberg.version>
Remove this; we don't need it in the root pom.xml.
    <!-- paimon bundle, only for test purpose -->
    <dependency>
        <groupId>org.apache.paimon</groupId>
        <artifactId>paimon-bundle</artifactId>
        <version>${paimon.version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.iceberg</groupId>
        <artifactId>iceberg-api</artifactId>
        <version>${iceberg.version}</version>
        <scope>test</scope>
    </dependency>
</dependencies>
Paimon and other packages also take their versions from the root pom.xml, so I followed the same pattern for iceberg-api in the fluss-common pom.xml.
Sorry, I missed that the test needs it. I'd like to retract this comment.
14d9e40 to 9aaa307
@MehulBatra I appended a commit to refine the pom file. LGTM.
Thank you @luoyuxia for the review and feedback!
Purpose
Implement Iceberg encoding strategy for Fluss.
Linked Issue: #1341
Brief change log
DataLakeFormat.ICEBERG enum value
IcebergKeyEncoder using the binary writer for key encoding
Tests
Wrote unit tests to verify that the encoding matches between Fluss and Iceberg
API and Format
None
Documentation