apiserver OOM due to terminate all watchers for a specified crd cacher #123074
Comments
/sig api-machinery
This isn't a bug, but it is something that could be improved. /remove-kind bug

To know the RV to send as a final bookmark, we would have to wait for the new cacher to be instantiated and synced before terminating the old watchers. That complicates the handoff between the old handler and the new handler. We also have to account for writes to the custom resource itself that happen while the CRD is updated, the new cacher is instantiated, the new handler is set up, and the old handler is terminated, so that the RV we issue never results in the re-established watcher missing events.
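To illustrate the bookmark idea being discussed, here is a minimal sketch, not actual apiserver code: before closing a watcher, the server emits one final bookmark event carrying a sufficiently fresh RV so the client can resume from it instead of its possibly stale last-observed RV. The `event` type and `terminateWithBookmark` function are hypothetical names for this sketch.

```go
package main

import "fmt"

// event is a stripped-down stand-in for a watch event: just a type
// ("ADDED", "BOOKMARK", ...) and the resource version it carries.
type event struct {
	typ string
	rv  uint64
}

// terminateWithBookmark sends a final bookmark with finalRV before
// closing the watch channel. For this to be safe, finalRV must be at
// least as new as any write committed during the cacher swap; otherwise
// a watcher resuming from finalRV could miss events.
func terminateWithBookmark(ch chan event, finalRV uint64) {
	ch <- event{typ: "BOOKMARK", rv: finalRV}
	close(ch)
}

func main() {
	ch := make(chan event, 1)
	terminateWithBookmark(ch, 1042)
	for ev := range ch {
		fmt.Printf("%s rv=%d\n", ev.typ, ev.rv)
	}
}
```

The hard part, as the comment above notes, is not emitting the bookmark but choosing `finalRV`: it has to come from the new, synced cacher while writes may still be landing.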
Sorry, it's a feature. A complete solution to this issue would be quite complex. Is it possible, as a temporary measure, to send the latest resource version (RV) from the cacher back to the client before closing? Additionally, could we implement a rate limit during shutdown, similar to how kube-apiserver drains all watchers when it shuts down, to reduce the probability of encountering this problem?
/triage accepted
This issue has not been updated in over 1 year, and should be re-triaged. You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
What happened?
When the CRD spec changes, all watchers connected to that CRD's cacher are terminated. This causes each informer to re-establish its watch from the last RV it observed. That RV is almost always lower than the global RV after the cacher is recreated, so the cacher returns a "too old resource version" error and the informer falls back to a full relist, which may OOM the kube-apiserver.
What did you expect to happen?
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)