Linux Scheduler
[Figure: sched.c (schedule(), scheduler_tick(), try_to_wake_up()) connected through hooks to the scheduling classes RT, ..., CFS; CPU 0 to CPU 3]
Volker Seeker
06/12/2013
1. Task Classification
2. Scheduler Skeleton
3. Completely Fair Scheduler (CFS)
4. Real Time Scheduling (RT)
5. Load Balancing CFS
6. Load Balancing RT
1. Task Classification
Task Types
Real Time vs Normal/Other
Efficient vs Responsive
Server vs Desktop vs HPC
1. Task Classification
Scheduling Classes
include/linux/sched.h
struct sched_class
    (*enqueue_task)
    (*dequeue_task)
    (*check_preempt_curr)
    (*pick_next_task)
    (*select_task_rq)
    ...
kernel/sched_rt.c
rt_sched_class
    .next = &fair_sched_class
    .enqueue_task = &enqueue_task_rt
    .dequeue_task = &dequeue_task_rt
    ...
kernel/sched_fair.c
fair_sched_class
    .next = &idle_sched_class
    .enqueue_task = &enqueue_task_fair
    .dequeue_task = &dequeue_task_fair
    ...
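The class chain above can be sketched in a few lines of user-space C. This is a simplified illustration, not kernel code: struct task, the rt_runnable and fair_runnable flags, and the reduced struct sched_class with only two members are assumptions made for the example, and the stop class is left out. It shows how following the .next pointers visits the classes from highest to lowest priority.

```c
#include <stddef.h>

/* Simplified stand-in for a task (assumption, not the kernel's task_struct). */
struct task { int id; };

/* Reduced scheduling class: only the link and one hook are modelled. */
struct sched_class {
    const struct sched_class *next;        /* next lower-priority class */
    struct task *(*pick_next_task)(void);  /* NULL if nothing is runnable */
};

/* Hypothetical demo state: one task per class plus "runnable" flags. */
static struct task rt_task   = { 1 };
static struct task fair_task = { 2 };
static struct task idle_task = { 0 };
static int rt_runnable;
static int fair_runnable;

static struct task *pick_next_task_rt(void)   { return rt_runnable ? &rt_task : NULL; }
static struct task *pick_next_task_fair(void) { return fair_runnable ? &fair_task : NULL; }
static struct task *pick_next_task_idle(void) { return &idle_task; }

/* Chain in priority order: rt -> fair -> idle. */
static const struct sched_class idle_sched_class = { NULL, pick_next_task_idle };
static const struct sched_class fair_sched_class = { &idle_sched_class, pick_next_task_fair };
static const struct sched_class rt_sched_class   = { &fair_sched_class, pick_next_task_rt };

/* Ask each class in turn; the first class with a runnable task wins. */
struct task *pick_next_task(void)
{
    const struct sched_class *class;

    for (class = &rt_sched_class; class != NULL; class = class->next) {
        struct task *p = class->pick_next_task();
        if (p != NULL)
            return p;
    }
    return NULL; /* not reached here: the idle class always returns a task */
}
```

With both flags cleared the idle task is returned; setting fair_runnable selects the fair task, and setting rt_runnable makes the RT task win regardless of the fair class.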
1. Task Classification
Scheduling Class Priorities
Highest
    stop: per CPU stop task
    rt: real time tasks, Priority 0 - 99
    fair: normal tasks, Priority 100 - 139
    idle: per CPU idle task
Lowest
2. Scheduler Skeleton
Scheduler Entry Point
kernel/sched.c
schedule()
    Per CPU runqueue with sub runqueues for CFS and RT
    Remove the previous task from the runqueue or put it back
    Different task selected than before?
2. Scheduler Skeleton
Calling the Scheduler
1. Timer Interrupt
2. Currently running task goes to sleep
3. Sleeping task wakes up
2. Scheduler Skeleton
Calling the Scheduler - Timer
kernel/sched.c
scheduler_tick() {
    ...
    sched_class->task_tick()
    ...
}
kernel/sched_X.c
X_sched_class
task_tick_X() {
    update task's scheduling entity
    is it another task's turn yet?
    Yes: Invoke scheduling
}
[Figure: CPU 0 to CPU 3]
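The tick path above can be sketched as follows. This is a minimal user-space model under stated assumptions: the fields ran_ticks and turn_ticks and the fixed-length "turn" are hypothetical and only stand in for whatever accounting a real class does in its task_tick hook. The sketch does not switch tasks itself; it only marks that scheduling should be invoked.

```c
#include <stdbool.h>

/* Simplified task: the tick bookkeeping fields are hypothetical. */
struct task {
    unsigned long ran_ticks;   /* ticks consumed in the current turn */
    unsigned long turn_ticks;  /* ticks the task may run per turn */
    bool need_resched;         /* set when scheduling should be invoked */
};

struct sched_class {
    void (*task_tick)(struct task *curr);
};

/* Class hook: update the accounting, then decide whether the turn is over. */
static void task_tick_X(struct task *curr)
{
    curr->ran_ticks++;
    if (curr->ran_ticks >= curr->turn_ticks)
        curr->need_resched = true;   /* "Yes: Invoke scheduling" */
}

const struct sched_class X_sched_class = { task_tick_X };

/* Generic part: called on every timer interrupt, delegates to the class. */
void scheduler_tick(struct task *curr, const struct sched_class *class)
{
    class->task_tick(curr);
}
```

For a task with turn_ticks set to 2, the first call leaves need_resched clear and the second call sets it.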
2. Scheduler Skeleton
Calling the Scheduler - Sleep
2. Scheduler Skeleton
Calling the Scheduler - Wake up
kernel/sched.c
try_to_wake_up() {
    ...
    sched_class->enqueue_task()
    sched_class->check_preempt_curr()
    task.state = TASK_RUNNING
    ...
}
kernel/sched_X.c
X_sched_class
enqueue_task_X() {
    put task into runqueue
}
check_preempt_curr_X() {
    Awoken task higher prio than current task?
    Yes: Invoke scheduling
}
[Figure: CPU 0 to CPU 3]
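The wake-up path can be modelled in the same style. The sketch below is an illustration under assumptions, not the kernel implementation: struct task, struct rq, the on_rq flag, and the convention that a lower prio value means higher priority are choices made for the example.

```c
#include <stdbool.h>

/* State names follow the kernel's; the values here are arbitrary. */
enum { TASK_RUNNING, TASK_INTERRUPTIBLE };

struct task {
    int prio;     /* assumption: lower value means higher priority */
    int state;
    bool on_rq;   /* hypothetical flag: task is on a runqueue */
};

struct rq {
    struct task *curr;   /* task currently running on this CPU */
    bool need_resched;   /* set when scheduling should be invoked */
};

struct sched_class {
    void (*enqueue_task)(struct rq *rq, struct task *p);
    void (*check_preempt_curr)(struct rq *rq, struct task *p);
};

static void enqueue_task_X(struct rq *rq, struct task *p)
{
    (void)rq;         /* a real class would insert p into its runqueue here */
    p->on_rq = true;
}

static void check_preempt_curr_X(struct rq *rq, struct task *p)
{
    /* Awoken task higher prio than current task? Then invoke scheduling. */
    if (p->prio < rq->curr->prio)
        rq->need_resched = true;
}

const struct sched_class X_sched_class = { enqueue_task_X, check_preempt_curr_X };

void try_to_wake_up(struct rq *rq, struct task *p, const struct sched_class *class)
{
    class->enqueue_task(rq, p);
    class->check_preempt_curr(rq, p);
    p->state = TASK_RUNNING;
}
```

Waking a task with prio 100 while a task with prio 120 is running sets need_resched; waking a task with prio 130 does not.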
3. Completely Fair Scheduler (CFS)
Priorities
Time elapses slower for higher priorities
[Figure: tasks with the same priority compared with tasks of different priorities; time labels 10ms, 5ms, 1ms]
How does it work out for I/O bound and CPU bound tasks?
3. Completely Fair Scheduler (CFS)
vruntime
virtual runtime, used as key in the RB-Tree
[Figure: red-black tree with keys 19, 25, 27, 31, 34, 49, 65, 98 and NIL leaves; the node with the smallest key (19) has the gravest need for CPU]
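A small sketch may help connect the two ideas: virtual runtime advances more slowly for heavier (higher priority) tasks, and the task with the smallest vruntime runs next. The helper names update_vruntime and pick_leftmost are hypothetical, and the linear scan is a simplification standing in for the leftmost lookup in the RB-tree. The scaling by NICE_0_LOAD (1024) divided by the task's load weight follows the way CFS weights runtime, with 1024 being the weight of a nice 0 task.

```c
#include <stddef.h>
#include <stdint.h>

#define NICE_0_LOAD 1024u  /* load weight of a nice 0 task */

struct sched_entity {
    uint64_t vruntime;     /* virtual runtime in ns, the ordering key */
    unsigned int weight;   /* load weight; larger means higher priority */
};

/* Charge delta_exec_ns of real runtime, scaled inversely by the weight. */
void update_vruntime(struct sched_entity *se, uint64_t delta_exec_ns)
{
    se->vruntime += delta_exec_ns * NICE_0_LOAD / se->weight;
}

/* Return the entity with the smallest vruntime, or NULL if n == 0.
 * A linear scan replaces the RB-tree lookup to keep the sketch short. */
struct sched_entity *pick_leftmost(struct sched_entity *ses, size_t n)
{
    struct sched_entity *min = NULL;

    for (size_t i = 0; i < n; i++)
        if (min == NULL || ses[i].vruntime < min->vruntime)
            min = &ses[i];
    return min;
}
```

If a weight 1024 task and a weight 3121 task (the kernel's weight for nice -5) both run for 10 ms, the first is charged the full 10 ms of vruntime and the second roughly a third of that, so the heavier task is picked again sooner.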
3. Completely Fair Scheduler (CFS)
[Figure: CPU shares of tasks T0 to T3 owned by User 1 and User 2, and of groups G1 (50%) and G2 (50%); task share labels 25% and 16.7%]
4. Soft-Real-Time Scheduling
Scheduling Modes
Scheduling of tasks with strict timing requirements.
RT scheduling is reliable, but the kernel does not guarantee that deadlines will be met.
SCHED_FIFO
    no time slices
    a task runs until termination or yielding the CPU
    [Figure: T1, T2, T3]
SCHED_RR
    [Figure: T1, T2, T3]
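The difference between the two modes can be illustrated with a toy simulation. This is a sketch under assumptions: all tasks share one priority, no new tasks arrive, work is measured in abstract ticks, and run_queue is a hypothetical helper, not a kernel or libc function. A quantum of 0 models SCHED_FIFO (no time slices); a positive quantum models round-robin time slicing as in SCHED_RR.

```c
#include <stddef.h>

/* Runs n tasks of equal priority and records the order in which they finish.
 * remaining[i] is the work of task i in ticks and is consumed by the call.
 * quantum == 0: each task runs until it terminates (FIFO-like).
 * quantum  > 0: each task runs at most quantum ticks per turn (RR-like). */
void run_queue(unsigned int *remaining, size_t n, unsigned int quantum,
               size_t *order)
{
    size_t done = 0;

    /* Tasks without any work count as finished immediately. */
    for (size_t i = 0; i < n; i++)
        if (remaining[i] == 0)
            order[done++] = i;

    while (done < n) {
        for (size_t i = 0; i < n; i++) {
            unsigned int run;

            if (remaining[i] == 0)
                continue;
            run = (quantum == 0 || remaining[i] < quantum) ? remaining[i] : quantum;
            remaining[i] -= run;
            if (remaining[i] == 0)
                order[done++] = i;
        }
    }
}
```

With work of 3, 1, and 2 ticks, the FIFO-like run finishes the tasks in queue order (0, 1, 2), while a quantum of 1 finishes the shortest task first (1, 2, 0).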
4. Soft-Real-Time Scheduling
Runqueue Priority Queues
Priority Bitmap
    Allows finding the highest priority task in O(1)
Priority Queues
    One task queue for each priority
[Figure: priority bitmap and priority queues up to priority 99; priority 7: T1 (RR), T2 (FIFO); priority 9: T0 (RR), T4 (RR), T5 (RR); priority 12: T3 (FIFO)]
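A bitmap lookup of this kind might look as follows. The sketch makes several assumptions: struct prio_array and the helper names are hypothetical, a counter per priority stands in for the real task queue, lower index means higher priority (as in the kernel's internal numbering), and __builtin_ctzll is a GCC/Clang builtin rather than standard C. The lookup is O(1) because the number of bitmap words is fixed, independent of how many tasks are queued.

```c
#include <stdint.h>

#define MAX_RT_PRIO   100
#define BITS_PER_WORD 64
#define NWORDS        ((MAX_RT_PRIO + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct prio_array {
    uint64_t bitmap[NWORDS];      /* bit p set: queue p is non-empty */
    int nr_queued[MAX_RT_PRIO];   /* stand-in for one task queue per priority */
};

void prio_enqueue(struct prio_array *a, int prio)
{
    a->nr_queued[prio]++;
    a->bitmap[prio / BITS_PER_WORD] |= (uint64_t)1 << (prio % BITS_PER_WORD);
}

void prio_dequeue(struct prio_array *a, int prio)
{
    /* Clear the bit only when the last task of this priority leaves. */
    if (a->nr_queued[prio] > 0 && --a->nr_queued[prio] == 0)
        a->bitmap[prio / BITS_PER_WORD] &= ~((uint64_t)1 << (prio % BITS_PER_WORD));
}

/* Returns the lowest set index (highest priority here), or -1 if empty. */
int prio_first(const struct prio_array *a)
{
    for (int w = 0; w < NWORDS; w++)
        if (a->bitmap[w] != 0)
            return w * BITS_PER_WORD + __builtin_ctzll(a->bitmap[w]);
    return -1;
}
```

Queueing tasks at priorities 9, 7, and 12 makes prio_first() return 7; once the priority 7 queue is empty it returns 9.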
5. Load Balancing CFS
Scheduling Domains
Handle the topology variety of modern processors.
Domain Specific Balancing Policy
    how often to do balancing
    how far to move tasks
    how long before cache cools down
    ...
5. Load Balancing CFS
Active Balancing
    triggered regularly by scheduler_tick()
    pulls tasks over from the busiest group per domain
Idle Balancing
    triggered as soon as a CPU goes idle in schedule()
    checks if the average idle time is larger than the migration cost
    pulls tasks like active balancing
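The two decisions named above, whether idle balancing is worth attempting and which group to pull from, can be sketched as plain functions. Both helper names, the nanosecond units, and the use of a simple per-group load number are assumptions for illustration; the kernel's load metrics and group selection are more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Idle balancing only pays off if the CPU is expected to stay idle
 * longer than it costs to migrate a task. */
bool idle_balance_worthwhile(uint64_t avg_idle_ns, uint64_t migration_cost_ns)
{
    return avg_idle_ns > migration_cost_ns;
}

/* Returns the index of the group with the highest load, or -1 if n == 0.
 * On a tie the first group with that load is returned. */
int busiest_group(const unsigned int *load, size_t n)
{
    int busiest = -1;

    for (size_t i = 0; i < n; i++)
        if (busiest < 0 || load[i] > load[(size_t)busiest])
            busiest = (int)i;
    return busiest;
}
```

For example, with an average idle time of 0.8 ms and an assumed migration cost of 0.5 ms the check passes, and for group loads 3, 7, 5 the second group (index 1) is the one to pull from.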
5. Load Balancing CFS
kernel/sched.c
select_task_rq(flag) {
    ...
    sched_class->select_task_rq(flag)
    ...
}
fair_sched_class
select_task_rq_fair(flag) {
    ...
}
Returns optimal CPU to put the task on
SD_BALANCE_EXEC
SD_BALANCE_FORK
    used in wake_up_new_task() upon a fork command
SD_BALANCE_WAKE
    used in try_to_wake_up() upon waking up a task that already ran
6. Load Balancing RT
Root Domains and CPU Pri
RT load balancing aims to make sure that the N highest priority tasks on the system are running at all times, where N is the number of CPUs.
Root Domain
    scope for RT scheduling decisions
    overall overload and priority state
CPU Pri
[Figure: struct root_domain with struct cpupri and the overload mask, covering CPU 0 to CPU 3]
6. Load Balancing RT
Push
post_schedule()
A low priority task is pushed to another CPU if
    a lower prio task wakes up on a CPU with a higher prio task running
    a higher prio task wakes up on a CPU and preempts a lower prio task
[Figure: T2 (93) needs to be pushed away from CPU 4 (94); other CPUs: CPU 0 (84), CPU 2 (87), CPU 7 (84)]
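Choosing where to push a task can be sketched as a search for the CPU running the lowest priority work. The function name find_push_target and the flat array of per-CPU priorities are assumptions for this example; the kernel consults its CPU priority structure instead of scanning. Following the figure, a larger number is treated as a higher priority here.

```c
#include <stddef.h>

/* cpu_prio[i] is the priority currently running on CPU slot i
 * (larger number means higher priority, as in the figure).
 * Returns the slot running the lowest priority, provided that priority
 * is below task_prio; returns -1 if the task cannot preempt anywhere.
 * On a tie the first matching slot is returned. */
int find_push_target(const int *cpu_prio, size_t ncpus, int task_prio)
{
    int target = -1;

    for (size_t i = 0; i < ncpus; i++) {
        if (cpu_prio[i] >= task_prio)
            continue;   /* the task could not run there right away */
        if (target < 0 || cpu_prio[i] < cpu_prio[(size_t)target])
            target = (int)i;
    }
    return target;
}
```

Using the numbers from the figure, with slots for CPU 0 (84), CPU 2 (87), CPU 4 (94), and CPU 7 (84), a task of priority 93 is pushed to the first slot running priority 84.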
6. Load Balancing RT
Pull
pre_schedule()
A high priority task is pulled from another CPU if
    the priority of the task to be scheduled would be lower than that of the previous one
[Figure: CPU 0 (85) and CPU 1 (90); tasks T0 (89), T1 (85), T2 (90), T3 (87), T4 (83); labels: prev, next, Pulled over]