Prometheus readiness check based on remote write success #16178

juan-ramirez-sp · 2025-03-06T21:16:41Z

We run HA Prometheus in our production clusters, with at least two Prometheus nodes running at all times.

We have some large Prometheus clusters that take ~20 minutes to start up and another ~20 minutes to begin remote writing to Mimir.

Our liveness/readiness probes work well for the first 20 minutes, but once the first Prometheus is marked ready, the second node goes down. Samples then stop arriving in Mimir, because both Prometheus instances are still working through startup. After some time remote write kicks in and we get those samples, but it would be good to prevent this scenario from occurring.

Has anyone created a way to mark Prometheus as ready only once it is remote writing?

Does it make sense to add a new HTTP endpoint that reports whether remote write has succeeded in sending samples?

/-/remoteWriteReady
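
Until something like that endpoint exists, one possible workaround is an external check that reads Prometheus's own `/metrics` output and only reports ready once remote write has recently sent samples. The sketch below is a rough illustration, not an existing Prometheus feature. It assumes the metric `prometheus_remote_storage_queue_highest_sent_timestamp_seconds` is exposed by the running version (worth verifying on your own `/metrics` page), and the function names, the default URL `http://localhost:9090/metrics`, and the 120-second lag threshold are all hypothetical choices.

```python
import urllib.request

# Assumption: this metric holds, per remote-write queue, the timestamp of the
# newest sample that was successfully sent. Verify the name on your version.
SENT_TS_METRIC = "prometheus_remote_storage_queue_highest_sent_timestamp_seconds"


def parse_metric_values(metrics_text, metric_name):
    """Return every sample value for metric_name from text-format metrics.

    Assumes sample lines end with the value (no trailing timestamp).
    """
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        try:
            values.append(float(line.rsplit(" ", 1)[1]))
        except (IndexError, ValueError):
            continue  # ignore lines that do not end in a number
    return values


def remote_write_ready(metrics_text, now, max_lag_seconds=120.0):
    """True only if every remote-write queue has sent a recent sample.

    `now` is a Unix timestamp in seconds. A queue that has not sent anything
    yet reports 0, which is far older than `now`, so it counts as not ready.
    """
    sent = parse_metric_values(metrics_text, SENT_TS_METRIC)
    if not sent:
        return False  # no remote-write queue is reporting at all
    return all(now - ts <= max_lag_seconds for ts in sent)


def check(url="http://localhost:9090/metrics", now=None):
    """Fetch /metrics from a (hypothetical) local Prometheus and evaluate it."""
    import time

    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8", errors="replace")
    return remote_write_ready(text, time.time() if now is None else now)
```

A script that exits with `0 if check() else 1` could then be wired into an exec-style readiness probe. One trade-off to keep in mind: in Kubernetes, a failing readiness probe also removes the pod from Service endpoints, so a remote-write outage would make that Prometheus unreachable through the Service as well.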

We could always make Prometheus start up faster by tweaking block size settings or reducing the size. I'm curious how we think we can make this scale in an environment like this.

Replies: 0 comments
