Create empty snapshot for metadata operations #7075
Comments
I don't think we need to create a new snapshot for every metadata operation, but I think it would be reasonable to create empty snapshots when we need to create a branch and there is no current snapshot. And I also think it would be reasonable to create a snapshot when the schema changes to signal when in history that happened.
How do we revert a table to an "empty state" without a snapshot for the empty table? Do I have to rebuild the table?
@rdblue, my team and I came across a similar problem with schema updates. The mentioned pull request handles creating empty snapshots for empty tables, but I don't see changes that address creating snapshots for schema updates. If that is the case and nobody is already working on it, can my team and I contribute to the remainder of this issue?
Can I take this up if not already done?
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
Feature Request / Improvement
based on #6965 (comment)
Creating an empty snapshot for all metadata operations, so that a table never has a state with no snapshot, might simplify various use cases.
(1) For branching, `main` does not need to be a special case, compared to custom branches, that exists only after the first data write.
(2) For time travel, the schema is currently derived from the snapshot ID at the specific time. If a table added data at t0 and then, for example, has a schema update at t1, creating an empty snapshot at t1 means that traveling to t0 and to t1 will yield different results because the schema has changed, which makes more sense.
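To make point (2) concrete, here is a toy Python sketch. It is not Iceberg code: `ToyTable`, `Snapshot`, `schema_as_of`, and the `empty_snapshot_on_schema_change` flag are hypothetical names made up for illustration. It assumes that time travel picks the latest snapshot at or before the requested timestamp and uses the schema recorded with that snapshot.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Snapshot:
    """Hypothetical snapshot record; added_rows == 0 means an empty snapshot."""
    timestamp: int
    schema_id: int
    added_rows: int = 0


@dataclass
class ToyTable:
    """Toy model of a table, only for illustrating the proposal."""
    empty_snapshot_on_schema_change: bool = False
    current_schema_id: int = 0
    snapshots: List[Snapshot] = field(default_factory=list)

    def append(self, ts: int, rows: int) -> None:
        # A data write always produces a snapshot tied to the current schema.
        self.snapshots.append(Snapshot(ts, self.current_schema_id, rows))

    def update_schema(self, ts: int) -> None:
        # A schema change bumps the schema ID. Under the proposal it also
        # records an empty snapshot so the change is visible in history.
        self.current_schema_id += 1
        if self.empty_snapshot_on_schema_change:
            self.snapshots.append(Snapshot(ts, self.current_schema_id, 0))

    def schema_as_of(self, ts: int) -> int:
        # Assumed time-travel rule: use the latest snapshot at or before ts.
        times = [s.timestamp for s in self.snapshots]
        i = bisect_right(times, ts)
        if i == 0:
            raise ValueError("no snapshot at or before the requested time")
        return self.snapshots[i - 1].schema_id


def schemas_at_t0_and_t1(empty_snapshot_on_schema_change: bool) -> Tuple[int, int]:
    table = ToyTable(empty_snapshot_on_schema_change=empty_snapshot_on_schema_change)
    table.append(ts=0, rows=10)   # data added at t0
    table.update_schema(ts=1)     # schema update at t1
    return table.schema_as_of(0), table.schema_as_of(1)


if __name__ == "__main__":
    print(schemas_at_t0_and_t1(False))  # no empty snapshot at t1
    print(schemas_at_t0_and_t1(True))   # empty snapshot at t1
```

In this toy model, without the empty snapshot both t0 and t1 resolve to the schema of the t0 snapshot; with it, t1 resolves to the updated schema.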
However, doing so might have other implications and affect behavior of existing operations like snapshot expiration.
Also we will have to keep backwards compatibility and still deal with tables with no snapshot, so maybe we do not gain much and have to live with the current situation.
Would like to know what others think.
cc @rdblue @aokolnychyi @RussellSpitzer @danielcweeks
Query engine
None