Cron job staggering / randomization / concurrency limiting #91652

Open
stuartpb opened this issue Jun 2, 2020 · 38 comments
Assignees
Labels
area/workload-api/cronjob, kind/feature, needs-triage, sig/apps

Comments

@stuartpb

stuartpb commented Jun 2, 2020

What would you like to be added: CronJobs should have fields allowing their start time to be randomized (as systemd's Timers have), or even distributed at equal intervals across a set of related CronJobs. At the very least, it would be useful to specify concurrency policies like "do not run more than two Jobs with the selected annotation", so that scheduling is postponed until that policy can be met.

Why is this needed: I have a cluster with very few cores, and KubeApps adds one CronJob per repo in the cluster to sync it, each scheduled to run every 10 minutes. Each sync would only take a few seconds, but because every repo's CronJob fires at the same time, it causes a "Three Stooges effect", and most of the synchronization jobs end up timing out for lack of available processing. If I could stagger these synchronizations, they could all succeed, and my cluster wouldn't be momentarily starved of resources while they run.
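
For illustration, a hypothetical shape for such fields could look like the sketch below (randomizedDelaySeconds and concurrencyGroup are made-up names for this illustration, not part of the CronJob API; the image is a placeholder):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: repo-sync
spec:
  schedule: "*/10 * * * *"
  # hypothetical: delay each scheduled start by a random 0-120 s
  randomizedDelaySeconds: 120
  # hypothetical: postpone scheduling while 2 or more Jobs matching
  # this selector are still active
  concurrencyGroup:
    maxConcurrent: 2
    selector:
      matchLabels:
        sync-group: repos
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: sync
            image: example.invalid/repo-sync  # placeholder image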

@stuartpb stuartpb added the kind/feature label Jun 2, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig label Jun 2, 2020
@stuartpb
Author

stuartpb commented Jun 2, 2020

/sig apps

@k8s-ci-robot k8s-ci-robot added the sig/apps label and removed the needs-sig label Jun 2, 2020
@pacoxu
Member

pacoxu commented Jun 3, 2020

Do you mean something like RandomizedDelaySec?

Here is a workaround: run a command like `sleep $(shuf -i 10-20 -n 1)` before your job's command:

[root@dce-10-7-177-7 ~]# docker logs 2a17fadddca1
Wed Jun  3 03:25:01 UTC 2020
Wed Jun  3 03:25:09 UTC 2020
Hello from the Kubernetes cluster
[root@dce-10-7-177-7 ~]# kubectl get pod
NAME                     READY   STATUS      RESTARTS   AGE
dao-2048                 1/1     Running     2          17h
hello-1591154640-qk4ng   0/1     Completed   0          83s
hello-1591154700-5ftww   0/1     Completed   0          23s
myapp-pod                0/1     Init:0/2    0          73m
[root@dce-10-7-177-7 ~]# kubectl get job
NAME               COMPLETIONS   DURATION   AGE
hello-1591154640   1/1           69s        86s
hello-1591154700   1/1           18s        26s
[root@dce-10-7-177-7 ~]# kubectl get cronjob
NAME    SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello   */1 * * * *   False     0        32s             2m
[root@dce-10-7-177-7 ~]# kubectl get cronjob -o yaml
apiVersion: v1
items:
- apiVersion: batch/v1beta1
  kind: CronJob
  metadata:
    creationTimestamp: "2020-06-03T03:23:32Z"
    name: hello
    namespace: default
    resourceVersion: "1005216"
    selfLink: /apis/batch/v1beta1/namespaces/default/cronjobs/hello
    uid: a3ab0566-11ba-47d7-8e70-8918a55926f0
  spec:
    concurrencyPolicy: Allow
    failedJobsHistoryLimit: 1
    jobTemplate:
      metadata:
        creationTimestamp: null
      spec:
        template:
          metadata:
            creationTimestamp: null
          spec:
            containers:
            - args:
              - /bin/sh
              - -c
              - date;sleep $(shuf -i 5-10 -n 1);date;echo Hello from the Kubernetes
                cluster
              image: busybox
              imagePullPolicy: Always
              name: hello
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            dnsPolicy: ClusterFirst
            restartPolicy: OnFailure
            schedulerName: default-scheduler
            securityContext: {}
            terminationGracePeriodSeconds: 30
    schedule: '*/1 * * * *'
    successfulJobsHistoryLimit: 3
    suspend: false
  status:
    lastScheduleTime: "2020-06-03T03:25:00Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@kow3ns
Member

kow3ns commented Jul 27, 2020

/assign @soltysh

@kow3ns
Member

kow3ns commented Jul 27, 2020

/cc @MikeSpreitzer

@MikeSpreitzer
Member

I briefly thought about the new API Priority and Fairness feature in the apiservers, but that will not directly do the trick, because your concern is not with apiserver requests but rather with the workload running in pods.

A generalization of leader-election to directly enforce a concurrency limit would probably do the trick, but may be heavier weight than is needed here. I like the random delay idea.

@Iridias

Iridias commented Oct 9, 2020

I would like to have the Jenkins syntax supported!
In Jenkins you can put an "H" instead of a number, e.g. in the minutes field, to indicate that the task is performed every hour at an unspecified but consistent time for each task.

@pjastrzabek

Randomizing execution would be awesome. +1 for 'H' support (Hashed) https://en.wikipedia.org/wiki/Cron#Non-standard_characters

@kohtala

kohtala commented Jan 15, 2021

The Jenkins syntax can be found at https://www.jenkins.io/doc/book/pipeline/syntax/#cron-syntax

once in every two hours slot between 9 AM and 5 PM every weekday (perhaps at 10:38 AM, 12:38 PM, 2:38 PM, 4:38 PM)

H H(9-16)/2 * * 1-5

But maybe a more modern route is to go with options like those of systemd.timer, using the systemd.time format.

This sample from snapd.snap-repair.timer:

OnCalendar=*-*-* 5,11,17,23:00
RandomizedDelaySec=2h
AccuracySec=10min
OnStartupSec=15m

It does not have the concurrency limit mentioned earlier, but AccuracySec could allow adjusting pod starts based on load, for example. With the status info on previously run Jobs, there would be potential for some smart scheduling, if anyone finds that useful.
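
Until something along those lines exists, a Jenkins-style "H" stagger can be approximated today by hashing a fixed per-CronJob string (e.g. the CronJob's own name) into a deterministic sleep offset in the container command. A sketch only: "repo-sync-foo" and the 300 s window are placeholders, and the image is assumed to provide cksum and cut, as busybox does:

containers:
- name: sync
  image: busybox
  args:
  - /bin/sh
  - -c
  # hash a fixed per-CronJob string into a 0-299 s offset, so each job in
  # the set always lands in the same slot instead of a random one
  - offset=$(( $(echo -n "repo-sync-foo" | cksum | cut -d " " -f 1) % 300 ));
    echo "sleeping ${offset}s"; sleep "$offset"; echo "running the actual sync"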

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jun 1, 2021
@soltysh
Contributor

soltysh commented Jun 2, 2021

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jun 2, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Aug 31, 2021
@TimShilov

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Aug 31, 2021
@Nuxij

Nuxij commented Nov 3, 2021

Is this planned?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Feb 1, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Mar 3, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) Jul 18, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@szuecs
Member

szuecs commented Aug 7, 2023

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 7, 2023
@k8s-ci-robot
Contributor

@szuecs: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shoffmeister

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten label Aug 26, 2023
@github-project-automation github-project-automation bot moved this to Needs Triage in SIG Apps Sep 29, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jan 27, 2024
@thomaschaaf

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Jan 27, 2024
@gilad-aperio

Is this being worked on?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Jul 8, 2024
@kragniz
Member

kragniz commented Aug 1, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Aug 1, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label Oct 30, 2024
@kragniz
Member

kragniz commented Oct 30, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label Oct 30, 2024
@Exagone313

We have a use case where we run a periodic synchronization cronjob across projects, and as all these cronjobs start their pods at the same time, database load increases significantly (the underlying database is the same for all these projects). This load increase would be reduced if the cronjobs did not all start at the same time.

For such short-lived cronjobs, the suggested "workaround" of using an initContainer with a random sleep command is not a good fit for Kubernetes offerings where you pay for pod usage rather than for nodes (e.g. GKE Autopilot): adding these sleep commands would increase the cost, as the pods may live a bit longer (though it also depends on the database load in my case...).
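
For reference, the initContainer flavour of that workaround (with exactly the cost caveat above) might look something like this sketch; the 0-120 s range is arbitrary, and the image is assumed to provide shuf, as busybox does in the earlier example:

initContainers:
- name: stagger
  image: busybox
  # random 0-120 s delay before the main container starts; this keeps the
  # pod alive longer, which is what drives up cost on pay-per-pod offerings
  command: ["/bin/sh", "-c", "sleep $(shuf -i 0-120 -n 1)"]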

@adk-swisstopo

Another way to resolve this would be to implement the RANDOM_DELAY variable like cronie does.
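
For context, in cronie that is a crontab variable giving an upper bound, in minutes, for a random startup delay; roughly (the command path is a placeholder):

# cronie crontab: delay job starts by a random 0-30 minutes
RANDOM_DELAY=30
*/10 * * * * /usr/local/bin/repo-sync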

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale label May 21, 2025
@kragniz
Member

kragniz commented May 21, 2025

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale label May 21, 2025