Prometheus readiness check based on remote write success #16178
Unanswered
juan-ramirez-sp
asked this question in
Q&A
Replies: 0 comments
We run HA Prometheus in our production clusters, with at least 2 Prometheus nodes running at all times.
We have some large Prometheus clusters that take ~20 minutes to start up and another ~20 minutes to begin remote writing to Mimir.
Our liveness/readiness probes work well for the first 20 minutes, but once the first Prometheus is marked ready, the second node goes down. This causes samples to stop arriving in Mimir, because neither instance is remote writing yet: both are still loading something during startup. After some time remote write kicks in and we get those samples, but it would be good to prevent this scenario from occurring.
Has anyone created a way to mark Prometheus as ready only once it's remote writing?
Would it make sense to add a new HTTP endpoint that reports whether remote write has succeeded in sending samples? For example:
/-/remoteWriteReady
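For illustration, here is a minimal sketch of what such a check could look at today, done outside Prometheus by reading its own `/metrics` output. It assumes the `prometheus_remote_storage_queue_highest_sent_timestamp_seconds` metric is exposed by the Prometheus version in use (metric names have changed between releases), that sample lines carry no trailing timestamp, and that a lag threshold of 120 seconds is acceptable. The function names, the URL, and the threshold are hypothetical, not part of Prometheus.

```python
import time
import urllib.request

# Assumed metric name; verify it against the /metrics output of your version.
SENT_METRIC = "prometheus_remote_storage_queue_highest_sent_timestamp_seconds"


def parse_metric_values(metrics_text, metric_name):
    """Return every sample value for metric_name in text exposition format."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is "<name>{labels} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        # Assumes no trailing timestamp, so the last field is the value.
        values.append(float(line.rsplit(" ", 1)[1]))
    return values


def remote_write_ready(metrics_text, now, max_lag_seconds=120.0):
    """True only if every remote write queue has sent a recent sample."""
    sent = parse_metric_values(metrics_text, SENT_METRIC)
    if not sent:
        return False  # no remote write queue has reported yet
    return all(ts > 0 and now - ts <= max_lag_seconds for ts in sent)


def check(url="http://localhost:9090/metrics"):
    """Fetch /metrics from a (hypothetical) local Prometheus and evaluate it."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    return remote_write_ready(text, time.time())
```

A small wrapper that exits 0 or 1 based on `check()` could back an exec-style readiness probe in a sidecar. That is a workaround rather than a replacement for a built-in endpoint, since the probe has to scrape and parse the full metrics page on every run.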
We could always make Prometheus start up faster by tweaking block size settings or reducing its size, but I'm curious how we think this can scale in an environment like ours.