Reconsidering the scheduler's wake_wide() heuristic
Posted Jul 30, 2017 22:51 UTC (Sun) by glenn (subscriber, #102223)
In reply to: Reconsidering the scheduler's wake_wide() heuristic by nix
Parent article: Reconsidering the scheduler's wake_wide() heuristic
I researched (https://tinyurl.com/y7lxzcy4) enhancing a deadline-based scheduler with cache-topology-aware CPU selection, and I studied the potential benefits for workloads where producer/consumer processes can be described as a directed graph (you see workloads like this in video and computer-vision pipelines). I hesitate to generalize too much from my scheduler and experiments, but I think some of the broader findings apply to Linux's general scheduler.
To my surprise, I discovered something obvious that I should have realized earlier in my research: (1) for producers/consumers that share little data, cache locality is not very important, since the overhead due to lost cache affinity is negligible; and (2) for producers/consumers that share a LOT of data, cache locality is also not very important, since most of the shared data get evicted from the cache anyway, either self-evicted or evicted by unrelated work running concurrently. In both cases, getting scheduled on an available CPU matters more. Cache-aware scheduling is useful only for producers/consumers that share a moderate amount of data ("goldilocks workloads"). Moreover, you must strive to schedule a consumer soon after its producer(s) produce, or the shared data may be evicted from the cache by unrelated work scheduled concurrently.
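To make the decision rule concrete, here is a minimal userspace C sketch of how such a heuristic might pick a consumer's CPU. This is an illustration only, not code from my scheduler or from the kernel: estimated_shared_bytes(), any_idle_cpu(), idle_cpu_sharing_llc_with(), and both thresholds are hypothetical placeholders.

    #include <stddef.h>

    #define LLC_SIZE    (32u * 1024 * 1024)  /* assumed 32 MiB last-level cache */
    #define SMALL_SHARE (64u * 1024)         /* hypothetical "little data" cutoff */

    /* Hypothetical: bytes the consumer is expected to read from the producer. */
    extern size_t estimated_shared_bytes(int producer, int consumer);

    /* Hypothetical CPU-selection helpers; return -1 when no CPU is free. */
    extern int any_idle_cpu(void);
    extern int idle_cpu_sharing_llc_with(int producer_cpu);

    static int select_consumer_cpu(int producer, int producer_cpu, int consumer)
    {
            size_t shared = estimated_shared_bytes(producer, consumer);

            /*
             * Cases (1) and (2): little sharing means affinity loss is
             * negligible; heavy sharing means the data will mostly be
             * evicted anyway. Either way, any available CPU will do.
             */
            if (shared < SMALL_SHARE || shared > LLC_SIZE / 2)
                    return any_idle_cpu();

            /*
             * Goldilocks range: cache locality pays off, but only if the
             * consumer runs soon after the producer, before unrelated
             * work evicts the shared lines. Prefer a CPU behind the
             * producer's LLC, but fall back rather than wait.
             */
            int cpu = idle_cpu_sharing_llc_with(producer_cpu);
            return cpu >= 0 ? cpu : any_idle_cpu();
    }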