Cron job staggering / randomization / concurrency limiting #91652

Open
stuartpb opened this issue Jun 2, 2020 · 38 comments
Assignees
Labels
area/workload-api/cronjob, kind/feature, needs-triage, sig/apps

Comments

@stuartpb

stuartpb commented Jun 2, 2020

What would you like to be added: CronJobs should have fields allowing their start time to be randomized (as systemd's Timers have), or even distributed at equal intervals across a set of related CronJobs. At the very least, it would be useful to specify concurrency policies like "do not run more than two Jobs with the selected annotation", so that scheduling is postponed until that policy can be met.

Why is this needed: I have a cluster with very few cores, and KubeApps adds one CronJob per repo in the cluster to sync it, each scheduled to run every 10 minutes. Each sync would only take a few seconds, but because every repo's CronJob fires at the same time, it causes a "Three Stooges effect", and most of the synchronization jobs end up timing out for lack of available processing. If I could stagger these synchronizations, they could all succeed, and my cluster wouldn't be momentarily starved of resources while they run.
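
For illustration, a hypothetical shape for such fields could look like the sketch below (randomizedDelaySeconds and concurrencyGroup are made-up names for this illustration, not part of the CronJob API; the image is a placeholder):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: repo-sync
spec:
  schedule: "*/10 * * * *"
  # hypothetical: delay each scheduled start by a random 0-120 s
  randomizedDelaySeconds: 120
  # hypothetical: postpone scheduling while 2 or more Jobs matching
  # this selector are still active
  concurrencyGroup:
    maxConcurrent: 2
    selector:
      matchLabels:
        sync-group: repos
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: sync
            image: example.invalid/repo-sync  # placeholder image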

@stuartpb stuartpb added the kind/feature label Jun 2, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig label Jun 2, 2020
@stuartpb
Author

stuartpb commented Jun 2, 2020

/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps label and removed the needs-sig label Jun 2, 2020
@pacoxu
Member

pacoxu commented Jun 3, 2020

Do you mean something like RandomizedDelaySec?

Here is a workaround: run a command like `sleep $(shuf -i 10-20 -n 1)` before your job's command:

[root@dce-10-7-177-7 ~]# docker logs 2a17fadddca1
Wed Jun  3 03:25:01 UTC 2020
Wed Jun  3 03:25:09 UTC 2020
Hello from the Kubernetes cluster
[root@dce-10-7-177-7 ~]# kubectl get pod
NAME                     READY   STATUS      RESTARTS   AGE
dao-2048                 1/1     Running     2          17h
hello-1591154640-qk4ng   0/1     Completed   0          83s
hello-1591154700-5ftww   0/1     Completed   0          23s
myapp-pod                0/1     Init:0/2    0          73m
[root@dce-10-7-177-7 ~]# kubectl get job
NAME               COMPLETIONS   DURATION   AGE
hello-1591154640   1/1           69s        86s
hello-1591154700   1/1           18s        26s
[root@dce-10-7-177-7 ~]# kubectl get cronjob
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        32s             2m
[root@dce-10-7-177-7 ~]# kubectl get cronjob -o yaml
apiVersion: v1
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    creationTimestamp: "2020-06-03T03:23:32Z"
    name: hello
    namespace: default
    resourceVersion: "1005216"
    selfLink: /apis/batch/v1beta1/namespaces/default/cronjobs/hello
    uid: a3ab0566-11ba-47d7-8e70-8918a55926f0
  spec:
    concurrencyPolicy: Allow
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        template:
          metadata:
            creationTimestamp: null
          spec:
            containers:
            - args:
              - /bin/sh
              - -c
              - date;sleep $(shuf -i 5-10 -n 1);date;echo Hello from the Kubernetes
                cluster
              image: busybox
              imagePullPolicy: Always
              name: hello
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: OnFailure
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
    schedule: '*/1 * * * *'
    successfulJobsHistoryLimit: 3
    suspend: false
  status:
    lastScheduleTime: "2020-06-03T03:25:00Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@kow3ns
Member

kow3ns commented Jul 27, 2020

/assign @soltysh

@kow3ns
Member

kow3ns commented Jul 27, 2020

/cc @MikeSpreitzer

@MikeSpreitzer
Member

I briefly thought about the new API Priority and Fairness feature in the apiservers, but that will not directly do the trick, because your concern is not with apiserver requests but rather with the workload running in pods.

A generalization of leader-election to directly enforce a concurrency limit would probably do the trick, but may be heavier weight than is needed here. I like the random delay idea.

@Iridias

Iridias commented Oct 9, 2020

I would like to have the Jenkins syntax supported!
In Jenkins you can put an "H" instead of a number, e.g. in the minutes field, to indicate that the task is performed every hour at an unspecified but consistent time for each task.

@pjastrzabek

Randomizing execution would be awesome. +1 for 'H' support (Hashed) https://en.wikipedia.org/wiki/Cron#Non-standard_characters

@kohtala

kohtala commented Jan 15, 2021

The Jenkins syntax can be found at https://www.jenkins.io/doc/book/pipeline/syntax/#cron-syntax

once in every two hours slot between 9 AM and 5 PM every weekday (perhaps at 10:38 AM, 12:38 PM, 2:38 PM, 4:38 PM)

H H(9-16)/2 * * 1-5

But maybe a more modern route is to go with options like those of systemd.timer, using the systemd.time format.

This sample from snapd.snap-repair.timer:

OnCalendar=*-*-* 5,11,17,23:00
RandomizedDelaySec=2h
AccuracySec=10min
OnStartupSec=15m

It does not have the concurrency limit mentioned earlier, but AccuracySec could allow adjusting pod starts based on load, for example. With the status info on previously run Jobs, there would be potential for some smart scheduling, if anyone finds that useful.
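
Until something along those lines exists, a Jenkins-style "H" stagger can be approximated today by hashing a fixed per-CronJob string (e.g. the CronJob's own name) into a deterministic sleep offset in the container command. A sketch only: "repo-sync-foo" and the 300 s window are placeholders, and the image is assumed to provide cksum and cut, as busybox does:

containers:
- name: sync
  image: busybox
  args:
  - /bin/sh
  - -c
  # hash a fixed per-CronJob string into a 0-299 s offset, so each job in
  # the set always lands in the same slot instead of a random one
  - offset=$(( $(echo -n "repo-sync-foo" | cksum | cut -d " " -f 1) % 300 ));
    echo "sleeping ${offset}s"; sleep "$offset"; echo "running the actual sync"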

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jun 1, 2021
@soltysh
Contributor

soltysh commented Jun 2, 2021

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jun 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Aug 31, 2021
@TimShilov

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Aug 31, 2021
@Nuxij

Nuxij commented Nov 3, 2021

Is this planned?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Feb 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 18, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@szuecs
Member

szuecs commented Aug 7, 2023

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 7, 2023
@k8s-ci-robot
Contributor

@szuecs: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shoffmeister

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Aug 26, 2023
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Sep 29, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 27, 2024
@thomaschaaf

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jan 27, 2024
@gilad-aperio

Is this being worked on?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jul 8, 2024
@kragniz
Member

kragniz commented Aug 1, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Aug 1, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Oct 30, 2024
@kragniz
Member

kragniz commented Oct 30, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Oct 30, 2024
@Exagone313

We have a use case where we run a periodic synchronization cronjob across projects, and as all these cronjobs start their pods at the same time, database load increases significantly (the underlying database is the same for all these projects). This load increase would be reduced if the cronjobs did not all start at the same time.

For such short-lived cronjobs, the suggested "workaround" of using an initContainer with a random sleep command is not a good fit for Kubernetes offerings where you pay for pod usage rather than for nodes (e.g. GKE Autopilot): adding these sleep commands would increase the cost, as the pods may live a bit longer (though it also depends on the database load in my case...).
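
For reference, the initContainer flavour of that workaround (with exactly the cost caveat above) might look something like this sketch; the 0-120 s range is arbitrary, and the image is assumed to provide shuf, as busybox does in the earlier example:

initContainers:
- name: stagger
  image: busybox
  # random 0-120 s delay before the main container starts; this keeps the
  # pod alive longer, which is what drives up cost on pay-per-pod offerings
  command: ["/bin/sh", "-c", "sleep $(shuf -i 0-120 -n 1)"]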

@adk-swisstopo

Another way to resolve this would be to implement the RANDOM_DELAY variable like cronie does.
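
For context, in cronie that is a crontab variable giving an upper bound, in minutes, for a random startup delay; roughly (the command path is a placeholder):

# cronie crontab: delay job starts by a random 0-30 minutes
RANDOM_DELAY=30
*/10 * * * * /usr/local/bin/repo-sync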

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label May 21, 2025
@kragniz
Member

kragniz commented May 21, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label May 21, 2025