KS2010: Deadline scheduling
Deadline scheduling does away with the classic notion of process priorities. Instead, each process requests scheduling of a maximum amount of CPU time within a specific deadline. The scheduler can then either arrange things to ensure that the deadline will be met or reject the request if the CPU would be overcommitted. Dario Faggioli chose the case of a video player application as his main example for how this can be useful; with a deadline for each frame, the player can produce skip-free video even in the presence of significant contention for the CPU. Deadline scheduling can thus make a number of problems go away.
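For reference, here is a minimal sketch of what such a request looks like from user space, using the sched_setattr() interface under which this work eventually reached the mainline in Linux 3.14; the patches being discussed at the summit used an earlier, different ABI, and the 5ms-per-33ms-frame numbers are purely illustrative.

    /* Ask for 5ms of CPU time out of every 33ms frame period.  The
     * structure layout matches the sched_setattr(2) ABI; glibc does
     * not wrap this call, so the raw syscall is used. */
    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;        /* used by SCHED_OTHER/BATCH */
        uint32_t sched_priority;    /* used by SCHED_FIFO/RR */
        uint64_t sched_runtime;     /* CPU time needed... */
        uint64_t sched_deadline;    /* ...before this deadline... */
        uint64_t sched_period;      /* ...in every period */
    };

    int main(void)
    {
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = SCHED_DEADLINE,
            .sched_runtime  =  5 * 1000 * 1000,   /* 5ms  */
            .sched_deadline = 33 * 1000 * 1000,   /* 33ms */
            .sched_period   = 33 * 1000 * 1000,
        };

        /* The scheduler runs its admission test here; the call fails
         * with EBUSY if the reservation would overcommit the CPU. */
        if (syscall(SYS_sched_setattr, 0, &attr, 0) < 0) {
            perror("sched_setattr");
            return 1;
        }
        /* ... decode frames, yielding when each one is done ... */
        return 0;
    }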
Linus wasn't convinced. He has never been entirely impressed by the realtime work (he pronounced that "realtime is bullshit" in the session) and does not see deadline scheduling as the right answer to this problem. He is more optimistic about group scheduling, and especially about the recently-posted per-tty task groups patch. Peter Zijlstra defended deadline scheduling, citing its ability to reject tasks which would overcommit the CPU, but Linus wasn't going for it. He said that multimedia people don't want the video player to be rejected if the scheduler can't guarantee the deadlines; they want a best-effort attempt to play the video.
Beyond that, he said, the problem with tasks like video playback is almost never the CPU scheduler. Video skips tend to be caused by I/O problems; the latencies appear in the I/O scheduler or somewhere in the virtual memory subsystem. Messing with the CPU scheduler is the wrong approach.
Peter said that, regardless, deadline scheduling is a feature that a number of people want. Should they press on with the work, or is it hopeless? Linus grumbled some more about how too much time goes into CPU scheduling and not enough into the other parts of the problem. But, he said, if the scheduler people want deadline scheduling in the end, he'll pull the patch.
Linus complained that, in practice, it's impossible to set the deadlines correctly. In the end, application developers have to request the absolute worst-case execution time, even though the application will almost never need that much CPU time. That's a problem because users often want to overcommit the resources - including the processor - on their systems. In practice, it almost always works because the worst-case CPU time is not actually needed.
Peter replied that deadline scheduling is safer because it can be made available to unprivileged applications. The maximum amount of CPU time can be bounded, as can the worst-case execution time; the scheduler can deny requests which would overcommit the available CPU time. Linus replied that, on a server with multiple users, that approach is still not safe; there is no way to keep users from interfering with each other. The truth of the matter is that multi-user servers are probably not a place where deadline scheduling will make sense.
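(For a single CPU, the admission test Peter described reduces, in its simplest form, to the classic earliest-deadline-first utilization bound: a set of reservations, each asking for runtime Q_i out of every period T_i, is feasible only if

    \sum_i \frac{Q_i}{T_i} \le 1

The real test has to be more conservative on multiprocessors, and the sum is typically capped below one to leave bandwidth for ordinary tasks, but that is the idea.)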
In general, Linus said, deadline scheduling makes him nervous. He has seen attempts to fix things with scheduler tweaks before; he's afraid that they can end up making things worse. The scenario of using deadline scheduling on desktop systems is, he said, not realistic; the real problems are elsewhere. Deadline scheduling will require system tuning, and that just doesn't work on desktop systems. There is no way to tune for everybody, and desktop users tend to be uninterested in and incapable of tuning their systems themselves.
Ted Ts'o said that video playback is an easy example with which to illustrate deadline scheduling but, perhaps, not the best use case. This theme was to return a couple of times in the session; video was almost certainly not the best example to choose for this particular crowd. He asked for an alternative use case - something which cannot be fixed with changes elsewhere in the system. Thomas Gleixner said that OS X is using deadline scheduling for desktop tasks and it works great. The desktop example was used because it is easy to understand, but there are also a lot of industrial applications which benefit from a non-priority-based scheduling mechanism. Deadline scheduling, he said, is all about describing the work that must be done instead of tweaking priorities.
Tim Bird said that, in his job, he has spent 18 months tweaking embedded systems for proper performance. CPU scheduling, he said, is never the real issue; the hard problems are elsewhere. So they are unlikely to look at deadline scheduling in the next ten years.
Ted closed the session with the note that it's important to better identify the users for deadline scheduling. Without that, it will be hard to know whether the associated ABI is right.
Index entries for this article:
Kernel - Realtime/Deadline scheduling
Kernel - Scheduler/Deadline scheduling
Realtime is not BS
Posted Nov 2, 2010 16:21 UTC (Tue) by daniel (guest, #3181) [Link] (1 responses)
> [Linus] pronounced that "realtime is bullshit" in the session
For that matter, SMP is bullshit and so is virtual memory. The common element shared by all three of these varieties of bullshit is that each is essential to some important things that we do with computers. If Linus means "bullshit" as in not useful or important, then he is just plain wrong, as he is from time to time (see the long, slow process of cluing him in to the value of kernel preemption). Perhaps Linus has never had to deal with the reality of programming a multi-ton intelligent machine that can kill you in an instant, or cause unbelievable amounts of monetary damage if certain realtime deadlines are ever missed. I have. The experience is purifying. There is absolutely no substitute for being sure that deadlines will be met, if it is important to meet them. There is no question about that, except perhaps in Linus's mind. The only valid question is how best to achieve it.
Until Linux has a usable CONFIG_RT compile option, just as it has CONFIG_SMP, it will not be useful for a large and important class of industrial and scientific applications. Believe it or not, this void is now filled largely by Windows or DOS systems, often with hardware offload to some black box with largely unknown behavior characteristics. Do these systems suck? Yes they do. Do they kill people? Yes they can, and most probably have. Can we do something about it? Yes we can, but it's hard if our fearless leader is busy pulling in a different direction from the people on the ground who actually have a clue about the importance of the application area.
Realtime is not BS
Posted Nov 2, 2010 19:38 UTC (Tue) by caitlinbestler (guest, #32532) [Link]
But there is still a valid discussion on whether it can be achieved in the same set of kernel code as more general scheduling.
And Linus's comment that addressing CPU resources alone is not enough is very much on target. A true deadline scheduler needs to schedule CPU, memory bandwidth, and I/O bandwidth. Whether that can be achieved side-by-side with a kernel that defaults to "best effort" is a major question.
KS2010: Deadline scheduling
Posted Nov 2, 2010 17:04 UTC (Tue) by daniel (guest, #3181) [Link] (3 responses)
> Linus complained that, in practice, it's impossible to set the deadlines correctly. In the end, application developers have to request the absolute worst-case execution time, even though the application will almost never need that much CPU time.
Where there's a will there's a way. Designate some subset of CPUs as realtime-only; then the worst that can happen is that a task which would overcommit one of those CPUs cannot be moved onto it.
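For illustration, the confinement half of that idea might look like the sketch below; the choice of CPU 3 is hypothetical, and keeping ordinary tasks off that CPU would be done separately (with the isolcpus= boot parameter, for example).

    /* Pin the current task to a CPU set aside for realtime work and
     * switch it to a realtime scheduling class. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t set;
        struct sched_param sp = { .sched_priority = 50 };

        CPU_ZERO(&set);
        CPU_SET(3, &set);          /* the designated realtime-only CPU */
        if (sched_setaffinity(0, sizeof(set), &set) < 0) {
            perror("sched_setaffinity");
            return 1;
        }
        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
            perror("sched_setscheduler");
            return 1;
        }
        /* ... realtime work, uncontended by best-effort tasks ... */
        return 0;
    }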
KS2010: Deadline scheduling
Posted Nov 3, 2010 19:30 UTC (Wed) by tcucinotta (guest, #69261) [Link] (2 responses)
If actually needed, given a certain allocation, one can compute the expected probability that the application will meet its deadlines; more generally, one can embrace stochastic analysis techniques in order to assess the expected performance of an application on a probabilistic basis.
Also, one can use adaptive scheduling techniques to dynamically change the allocation of computing power in response to fluctuations (foreseen and/or observed) in the application workload (as happens in multimedia, for example).
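A heavily simplified sketch of such an adaptive loop, assuming a set_reservation() wrapper around whatever call installs the deadline parameters, plus hypothetical decode_one_frame() and frame_runtime_ns() helpers (the latter reporting the CPU time the last frame actually consumed):

    #include <stdint.h>

    /* Hypothetical helpers, named here for illustration only. */
    void decode_one_frame(void);
    uint64_t frame_runtime_ns(void);
    int set_reservation(uint64_t runtime_ns, uint64_t period_ns);

    void adaptive_playback(void)
    {
        const uint64_t period = 33 * 1000 * 1000;  /* one 33ms frame */
        uint64_t budget = 5 * 1000 * 1000;         /* initial guess  */

        for (;;) {
            decode_one_frame();
            uint64_t used = frame_runtime_ns();

            /* Exponentially smoothed demand estimate; the fixed point
             * works out to roughly 120% of steady-state usage, leaving
             * about 20% headroom for workload fluctuations. */
            budget = (3 * budget + 2 * (used + used / 5)) / 5;
            if (budget > period / 2)   /* never claim more than half a CPU */
                budget = period / 2;
            if (set_reservation(budget, period) < 0)
                break;                 /* admission refused; fall back */
        }
    }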
KS2010: Deadline scheduling
Posted Nov 8, 2010 1:47 UTC (Mon) by vonbrand (guest, #4458) [Link] (1 responses)
That might work for soft real time (whatever that means); for the "multi-ton machine that could kill you in an instant" application mentioned above, this is very far from acceptable.
"Real time" is a system property (the kernel with all its areas, the computer on which it runs, and the application code). Trying to fix that by scheduling the CPU alone is certainly bullshit.
KS2010: Deadline scheduling
Posted Nov 8, 2010 20:40 UTC (Mon) by tcucinotta (guest, #69261) [Link]
I agree absolutely with all the statements. However, I also think we should take an incremental approach; otherwise we end up with nothing. For this round, we're discussing how to improve CPU scheduling. This is only one of the essential building blocks needed to support real-time applications and increase the predictability of software. Of course, a comprehensive approach to the problem needs to consider other resources as well, like networking and disk transfers, as well as the IRQ subsystem architecture. Still, improving CPU scheduling is a little step that is strongly needed, has a relevant impact on the problem, and can be done without subverting the way scheduling and resource management are handled inside the kernel.
KS2010: Deadline scheduling
Posted Nov 3, 2010 19:18 UTC (Wed) by tcucinotta (guest, #69261) [Link]
Applications may always have a fall-back mechanism: try to go (really) real-time; if that's not possible, then go best-effort, possibly with some UI trick to let the user know he's running too many RT applications. That already happens with RT priorities: applications (e.g., jack) try to acquire RT priority; if that fails, they log a warning or error and run anyway without it.
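That pattern is simple enough to show in full; a sketch of the jack-style fallback, using the long-standing sched_setscheduler() interface:

    /* Try for a realtime priority; if refused, warn and carry on as
     * an ordinary best-effort task. */
    #include <sched.h>
    #include <stdio.h>

    static void go_realtime_if_possible(void)
    {
        struct sched_param sp = { .sched_priority = 10 };

        if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
            /* Typically EPERM for an unprivileged process without a
             * suitable RLIMIT_RTPRIO; continue without realtime. */
            fprintf(stderr, "warning: realtime priority unavailable, "
                            "running best-effort\n");
        }
    }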
Or, you can have more elaborate user-space middleware that allows overload situations to be managed; e.g., have a look at the FRESCOR Application Level Contracts (ALC): http://www.frescor.org/index.php?page=publications.
Hard and soft real time
Posted Nov 4, 2010 9:36 UTC (Thu) by csimmonds (subscriber, #3130) [Link] (7 responses)
With hard real time, you must not miss the deadline, ever. An example from a system I have worked on recently: printing sell-by dates on bottles going along a production line. If one bottle is not printed, the whole production line has to be halted, the problem fixed, and then everything restarted, which is very expensive. This is hard real time.
You can achieve hard real-time behaviour with a priority-based scheduler, but it is not easy. Much better to use a deadline scheduler, which gives you the guarantees that you want. So I vote very energetically for inclusion of the deadline scheduler. It may not be useful on the desktop or server, but that is a small part of what Linux does.
Chris.
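To put numbers behind Chris's claim: for the textbook model of independent periodic tasks on a single CPU (Liu and Layland, 1973), a fixed-priority, rate-monotonic scheduler is only guaranteed to meet every deadline when the total utilization U stays below a bound that shrinks toward about 69%:

    U \le n\,(2^{1/n} - 1) \longrightarrow \ln 2 \approx 0.693 \quad (n \to \infty)

An earliest-deadline-first scheduler, by contrast, is guaranteed to meet every deadline for any U \le 1. That gap is the formal version of "gives you the guarantees that you want" without leaving CPU time unused.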
Hard and soft real time
Posted Nov 4, 2010 13:34 UTC (Thu) by zmower (subscriber, #3005) [Link] (2 responses)
So would you like to amend your statement to say that embedded hard realtime on unicore processors is a very small part of what Linux should do?
Hard and soft real time
Posted Nov 4, 2010 15:36 UTC (Thu) by Shewmaker (guest, #1126) [Link] (1 responses)
Here's a well-written paper from this year.
DP-FAIR: A Simple Model for Understanding Optimal Multiprocessor Scheduling
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5562...
A short overview presentation of that paper.
https://systems.soe.ucsc.edu/sites/default/files/webform/...
This same research group is also working on solving the other parts of the problem. Linus is correct: we can't solve these problems with just CPU scheduling.
Efficient Guaranteed Disk Request Scheduling with Fahrrad (2008)
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1...
Work on memory and network resources is in earlier stages, but the result will hopefully be a coherent general theory of real-time performance management.
Hard and soft real time
Posted Nov 4, 2010 20:15 UTC (Thu) by zmower (subscriber, #3005) [Link]
As for Linux, even if you have optimal scheduling for all the subsystems, the combined effect is still chaotic.
Ada conference videos etc. are here: http://www.disca.upv.es/jorge/ae2010/outcome.html
Hard and soft real time
Posted Nov 4, 2010 19:20 UTC (Thu) by dlang (guest, #313) [Link] (3 responses)
You could also have a hard-real-time task that has a 5-second deadline to get its work done; non-real-time Linux can accomplish that today (except in the face of failing hardware).
Even things with fairly short deadlines can be made statistically reliable in many cases. It's only when you start getting into extremely short deadlines, or systems that also handle non-critical processes competing for resources, that you start having real problems.
"hard" real-time
Posted Nov 4, 2010 20:06 UTC (Thu) by dmarti (subscriber, #11625) [Link] (2 responses)
"Hard" real-time is where, if you miss a deadline, the robot chops off the user's head or the user's plane crashes.
"hard" real-time
Posted Nov 4, 2010 20:37 UTC (Thu) by dlang (guest, #313) [Link] (1 responses)
There are _very_ few situations where a single missed deadline proves fatal to the system (or the user :-)
In engineering, the assumption is that you may have unexpected loads or sub-par materials, so every design includes a safety margin, which makes it statistically unlikely that too many things will go wrong and the item will fail.
Even on the Space Shuttle, components as critical as the heat-resistant tiles are not individually critical; it's expected that some number of them will be damaged or fall off on any flight. When too many of them get damaged, you have the Columbia disintegrating, but that doesn't translate into zero tolerance of tile failure.
"hard" real-time
Posted Nov 4, 2010 22:26 UTC (Thu) by csimmonds (subscriber, #3130) [Link]
Chris.