Hierarchical group I/O scheduling
The "completely fair queueing" (CFQ) I/O scheduler tries to divide the available bandwidth on any given device fairly between the processes which are contending for that device. "Bandwidth" is measured not in the number of bytes transferred, but in the amount of time that each process gets to submit requests to the queue; in this way, the code tries to penalize processes which create seek-heavy I/O patterns. (There is also a mode based solely on the number of I/O operations submitted, but your editor suspects it sees relatively little use.) The CFQ scheduler also supports group scheduling, but in an incomplete way.
Imagine the group hierarchy shown on the right; here we have three control groups (plus the default root group), and four processes running within those groups. If every process were contending fully for the available I/O bandwidth, and they all had the same I/O priority, one would expect that bandwidth to be split equally between P0, Group1, and Group2; thus P0 should get twice as much I/O bandwidth as either P1 or P3. If more processes were to be added to the root, they should be able to take I/O bandwidth at the expense of the processes in the other control groups. Similarly, the creation of new control groups underneath Group1 should not affect anybody outside of that branch of the hierarchy. In current kernels, though, that is not how things work.
With the current implementation of CFQ group scheduling, the above hierarchy is effectively flattened into a single level of groups.
The CFQ group scheduler currently treats all groups, including the root group, as equals at the same level in the hierarchy. Every group is a top-level group. This flat grouping will be adequate for a number of situations, but there will be other users who want the full hierarchical model. That is why control groups were made to be hierarchical in the first place, after all.
The hierarchical CFQ group scheduling patch set from Gui Jianfeng aims to make that feature available. These patches introduce a new cfq_entity structure which is used for the scheduling of both processes and groups; it is clearly modeled after the sched_entity structure used in the CPU scheduling code. With this in place, the I/O scheduler can just give bandwidth to the top-level cfq_entity which has run up the least "vdisktime" so far; if that entity happens to be a group, the scheduling code drops down a level and repeats the process. Sooner or later, the entity which is scheduled for I/O will be an actual process, and the scheduler can start dispatching I/O requests.
This patch set is on its fourth revision; the previous iterations have led to significant changes. It appears that a few things still need fixing, but this work seems to be getting closer to being ready.
One thing is worth bearing in mind: contemporary Linux kernels have two I/O bandwidth controllers, the proportional bandwidth controller (built into the CFQ scheduler) and the throttling controller (built into the block layer). The group scheduling changes only apply to the proportional bandwidth controller. Arguably there is less need for full group scheduling with the throttling controller, which puts absolute caps on the bandwidth available to specific control groups.
Controlling I/O bandwidth has a lot of applications; providing some isolation between customers on a shared hosting service is an obvious example. But this feature may yet prove to have value on the desktop as well; many interactivity problems come down to contention for I/O bandwidth. Anybody who has tried to start an office suite while simultaneously copying a video image on the same drive understands how bad it can be. If the group I/O scheduling feature can be made to "just work" like the group CPU scheduling, we may have made another step toward a truly responsive Linux desktop.