Bug Report: `Reshard Cancel` deletes shards from the topology · Issue #17965 · vitessio/vitess · GitHub
Open
arthurschreiber opened this issue Mar 14, 2025 · 0 comments
Overview of the Issue

When cancelling a reshard workflow via the Reshard Cancel command, Vitess currently "cleans up" the target shards of the reshard workflow by deleting them from the topology.

This seems unintended, as it causes two problems:

  1. The new shards (and their tablet records) in the topology are gone, but the tablets are left running. That's unexpected. If we want to keep this behaviour, it should be an optional flag and not enabled by default.
  2. Because the new shards and their tablets are deleted from the topology first, the workflow cleanup step doesn't actually clean up anything: at that point Vitess no longer knows that the tablets exist. This means the vttablets for the deleted shards just keep executing the workflow.

This seems to be happening here:

log.Infof("Removing target shards")
if err := sw.dropTargetShards(ctx); err != nil {
	return nil, err
}

And here:

if err := sw.dropTargetShards(ctx); err != nil {
	return nil, err
}

I don't think we should be dropping the target shards here. Instead, we should only drop the tables on the target shards, to restore them to the state they had before resharding was started.
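
A minimal sketch of the proposed behaviour, again as a toy model rather than actual Vitess code. The `shard` type, `cancelOptions`, its `DropShards` field and `cancelReshard` are all hypothetical names; `DropShards` stands in for the optional, off-by-default flag suggested earlier in this issue.

```go
package main

import "fmt"

// shard is a toy model of a target shard. Hypothetical, not a Vitess type.
type shard struct {
	tables          []string // tables copied in by the reshard workflow
	workflowRunning bool
}

// cancelOptions holds a hypothetical opt-in flag. The zero value keeps
// the target shards in the topology.
type cancelOptions struct {
	DropShards bool
}

// cancelReshard stops the workflow and drops the copied tables while the
// topology still knows about the shard. The shard record itself is only
// removed when the caller explicitly asks for it.
func cancelReshard(topo map[string]*shard, targets []string, opts cancelOptions) {
	for _, name := range targets {
		s, ok := topo[name]
		if !ok {
			continue
		}
		s.workflowRunning = false
		s.tables = nil // restore the pre-reshard state of the shard
		if opts.DropShards {
			delete(topo, name)
		}
	}
}

// newTargets builds two target shards with an active workflow (assumed layout).
func newTargets() map[string]*shard {
	return map[string]*shard{
		"-80": {tables: []string{"customer"}, workflowRunning: true},
		"80-": {tables: []string{"customer"}, workflowRunning: true},
	}
}

func main() {
	topo := newTargets()
	cancelReshard(topo, []string{"-80", "80-"}, cancelOptions{})
	for name, s := range topo {
		fmt.Println(name, "tables:", len(s.tables), "running:", s.workflowRunning)
	}
}
```

The point of the ordering is that cleanup happens before any shard record is removed, so the tablets can still be reached when the workflow is stopped.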

Reproduction Steps

Create new shards and start a Reshard workflow. Cancel the Reshard workflow via the Reshard Cancel command.

Notice that the shards and tablets have been deleted from the topology. Check the tablet status pages to see that the tablets are still running the reshard workflow.

Binary Version

v19 and later

Operating System and Environment details

N/A

Log Fragments

N/A

arthurschreiber added the "Type: Bug" and "Component: vtctldclient" labels and removed the "Needs Triage" label on Mar 14, 2025.