Allow HPA to scale out when no matched Pods are ready #130130
Comments
/sig autoscaling
cc sig autoscaling maintainers @gjtempleton @MaciekPytel: is this a bug or a known issue?
Refer to #51650? (I will take a look tomorrow.)
The logic was added in #60886. See also the algorithm details at https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details (quoted content) and #67252.
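(For reference, the basic formula on that algorithm-details page is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). Below is a minimal Go illustration of just that documented formula, not the actual controller code.)

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas illustrates the documented HPA formula:
//   desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
// This is only a worked illustration, not the controller implementation.
func desiredReplicas(currentReplicas int32, currentMetricValue, desiredMetricValue float64) int32 {
	ratio := currentMetricValue / desiredMetricValue
	return int32(math.Ceil(float64(currentReplicas) * ratio))
}

func main() {
	// 4 replicas averaging 200m CPU against a 100m target -> 8 replicas.
	fmt.Println(desiredReplicas(4, 200, 100))
}
```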
Hi @pacoxu, thanks for your explanation. However, we are using external metrics to drive our scaling, so I think --horizontal-pod-autoscaler-cpu-initialization-period would not apply to our use case.
Hi @pacoxu, could you please follow up on this issue?
I'd argue that this is a known issue/design choice rather than a bug. As you pointed out @pacoxu, the behaviour for object and external metrics was set to be roughly the same as for resource metrics by #60886 (#33593 was the original source of a lot of these choices). Definitely a valid use case that we don't currently support nicely.
/triage accepted |
/remove-triage accepted |
@gjtempleton Thanks for your explanation. Does your team have any plans to address this problem?
@zheyli you can answer that question self-service: the backlog of issues is public, so you can search through them yourself.
Given #130130 (comment)
@gjtempleton So instead of always using ready_pods (which becomes 0 in our problem case), we could fall back to the current replica count when no pods are ready (see the sketch below).
This way we don't get stuck at 0 when pods are unhealthy, and it should be pretty simple to implement.
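A minimal sketch of that fallback idea, assuming a simplified usage-ratio calculation (the helper below is hypothetical, not the actual replica calculator code):

```go
package main

import (
	"fmt"
	"math"
)

// recommendReplicas sketches the proposed fallback: keep deriving the
// recommendation from the ready pod count, but use the current replica
// count as the base when no pods are ready, so the recommendation cannot
// get stuck at 0. Hypothetical helper, not the real replica calculator.
func recommendReplicas(usageRatio float64, readyPods, currentReplicas int32) int32 {
	base := readyPods
	if base == 0 {
		base = currentReplicas
	}
	return int32(math.Ceil(usageRatio * float64(base)))
}

func main() {
	// Metric at 2x its target, 5 current replicas, none of them ready:
	// with the fallback we recommend 10 replicas instead of 0.
	fmt.Println(recommendReplicas(2.0, 0, 5))
}
```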
Having a quick look at the code, this seems sane to me. I'm just not 100% sure if there are edge cases we need to be aware of, or if this will change the behaviour in a way that's not expected for the user.
I'll try to write something and see where it goes.
Hi @omerap12, thank you for taking this on. Using current replicas as a fallback during the extreme case of all pods becoming unready can prevent the emergency situation of being stuck at zero pods, but I think users relying on external metrics would benefit from having the option to use the replica count instead of ready pods from the start. In the scenario where pods start becoming unready due to an increase in external metric utilization, since the HPA does not account for unready pods, we do not see the scaling behavior we would expect. The HPA does slowly scale up, but not nearly as aggressively as it should according to the scale-up policy. Instead, utilization continues to cause pods to become unready until there is potentially service degradation or interruption.
No problem at all :) In this case, I think it's better to trigger an alert for the team rather than expecting the HPA to handle it on its own.
I agree with everything you're saying. This seems to be a major downside of using external metrics for HPA. The most common scenario we have seen for pods becoming unready is that our Puma workers are exhausted on several pods. The average of our external metric, Puma worker utilization, reaches the scaling target, but since there are unready pods the HPA does not scale as quickly as it should. We have also had some Puma worker latency due to slow DB queries, which increased load on the DB; in the end the interim solution was to scale up the DB until optimization could be performed, and the HPA saved us from an actual outage. In the event pods can't connect to the DB, I would rather set up alerting for that type of event than have external-metric-based HPA not work properly.
Yeah, exactly. Since the HPA doesn't really know why pods are unready, scaling based on them could cause more problems, especially if the issue isn't high load. I think it's safer not to scale based on unready pods.
What happened?
One of our production pools cannot scale up when its metrics reach their threshold, because all of its pods became unready at that moment under peak traffic. Digging into the source code, we found that the HPA calculates the desired replica count from the ready pod count, so the recommended replica count is always 0.
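For illustration, a simplified model of that calculation (assumed shape, not the exact replica_calculator.go code) shows why the recommendation collapses to 0 when no pods are ready:

```go
package main

import (
	"fmt"
	"math"
)

// recommendedReplicas models the behaviour described above: the
// recommendation is derived from the ready pod count, so with zero ready
// pods it is always 0, however high the metric is.
func recommendedReplicas(usageRatio float64, readyPodCount int32) int32 {
	return int32(math.Ceil(usageRatio * float64(readyPodCount)))
}

func main() {
	fmt.Println(recommendedReplicas(5.0, 0)) // 0: all pods unready, so the HPA never scales out
	fmt.Println(recommendedReplicas(5.0, 4)) // 20: scales as expected when 4 pods are ready
}
```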
What did you expect to happen?
Could you please explain why the ready pod count is used, and whether there is any way to refine the implementation?
How can we reproduce it (as minimally and precisely as possible)?
Create a deployment whose pods are all unready, or run a load test against a deployment so that all of its pods crash.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)