Predictive ELF bitmaps
When the kernel executes a program, it must retrieve the code from disk, which it normally does by demand paging it in as required by the execution path. If the kernel could somehow know which pages would be needed, it could page them in more efficiently. Andi Kleen has posted an experimental set of patches that do just that.
Programs do not know about their layout on disk, nor is their path through the executable file optimized to reduce seeking, but with some information about which pages will be needed, the kernel can optimize the disk accesses. If one were to gather a list of the pages that get faulted in as a program runs, that information could be saved for future runs. It could then be turned into a bitmap indicating which of the pages should be prefetched.
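As a rough illustration of that last step, the sketch below turns a list of faulted file offsets into a one-bit-per-page bitmap. It is not taken from Kleen's patches: the function names, the 4096-byte page size, and the bit ordering (lowest bit first within each byte) are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size; a real tool would query the system


def build_bitmap(fault_offsets, file_size, page_size=PAGE_SIZE):
    """Return a bytearray with one bit per file page; a set bit means
    the page was faulted in during the training run."""
    num_pages = (file_size + page_size - 1) // page_size
    bitmap = bytearray((num_pages + 7) // 8)
    for offset in fault_offsets:
        page = offset // page_size
        if page < num_pages:  # ignore offsets past the end of the file
            bitmap[page // 8] |= 1 << (page % 8)
    return bitmap


def pages_to_prefetch(bitmap):
    """List the page indices whose bits are set, in ascending order."""
    return [
        byte_index * 8 + bit
        for byte_index, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]
```

With one bit per page, a 10-page executable needs only two bytes of bitmap, which is why the storage location matters more than the storage size.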
Once you have such a bitmap, where to store it becomes a problem. Kleen's method uses a "hack" to the ELF format on disk, putting the bitmap at the end of the executable. This has a number of drawbacks: it requires a seek to get the information, it modifies the executable each time the bitmap is trained, and it allows only a single usage pattern system-wide. It does have one very nice attribute, though: the bitmap and executable stay in sync. If the executable changes, due to an upgrade for instance, the bitmap would get cleared in the process. Alternative bitmap storage locations, somewhere in users' home directories for example, do not have this property.
Andrew Morton questions whether this need be done in the kernel at all:
Ulrich Drepper does not want to see the ELF format abused in the fashion it was for this patch. Kleen doesn't either, but used it as an expedient. Drepper thinks the linker should be taught to emit a new header type which would store the bitmap. It would be near the beginning of the ELF file, eliminating the seek. A problem with that approach is that old binaries would not be able to take advantage of the technique; a re-linking would be required.
Then the question arises, how does that bitmap get initialized? Drepper suggests that systemtap be used:
Kleen's patch walks the page tables for a process when it is exiting, setting a bit in the bitmap if that page has been faulted in. Drepper sees this as suboptimal:
The problem is in finding the balance between just prefetching the entire executable, which might be very wasteful, and prefetching the subset of pages that are most commonly used. It will take some heuristics to make that decision. As Drepper points out, recording the entire runtime of a program "will result in all the pages of a program to be marked (unless you have a lot of dead code in the binary and it's all located together)."
The place where Drepper sees a need for kernel support is in providing a bitmap interface to madvise() so that any holes in the pages that get prefetched do not get filled by the readahead mechanism. The current interface would require a call to madvise() for each contiguous region, which could add up to a large number of calls. Both he and Morton favor the bulk of the work being done in user space.
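To make the call-count concern concrete, here is a minimal user-space sketch that splits a bitmap into contiguous runs and issues one advice call per run. The helper names and bit ordering are assumptions for illustration. `mmap.madvise()` needs Python 3.8 or later, and `mmap.MADV_WILLNEED` is only defined on platforms that support it. This is not the bitmap interface Drepper proposes, which does not exist in the current API.

```python
import mmap


def contiguous_runs(bitmap, num_pages):
    """Return (first_page, page_count) for each run of set bits."""
    runs = []
    start = None
    for page in range(num_pages):
        is_set = bitmap[page // 8] & (1 << (page % 8))
        if is_set and start is None:
            start = page  # a new run begins here
        elif not is_set and start is not None:
            runs.append((start, page - start))
            start = None
    if start is not None:  # run extends to the last page
        runs.append((start, num_pages - start))
    return runs


def advise_runs(mapping, runs, page_size=mmap.PAGESIZE):
    """Issue one MADV_WILLNEED hint per run on an mmap object and
    return how many calls were needed."""
    for first_page, count in runs:
        mapping.madvise(mmap.MADV_WILLNEED,
                        first_page * page_size, count * page_size)
    return len(runs)
```

A scattered bitmap produces nearly one run per set bit, so the number of calls grows with fragmentation rather than with the amount of data prefetched.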
Overall, there is lots more work to do before "predictive bitmaps" make their way into a Linux system—if they ever do. To start with, some benchmarking will have to be done to show that performance improves enough to consider making a change like this. David Miller expresses some pessimism about the approach:
Frankly, based upon my experiences then and what I know now, I think it's a lose to do this.
It is an interesting idea though, one that will likely crop up again if this particular incarnation does not go anywhere. Since the biggest efficiency gain is from reducing seeks, though, it may not be interesting long-term. As Morton says, "solid-state disks are going to put a lot of code out of a job."
Index entries for this article | |
---|---|
Kernel | Prefetching |
Predictive ELF bitmaps
Posted Mar 27, 2008 4:35 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link] (12 responses)
"solid-state disks are going to put a lot of code out of a job." Won't it be nice, too? Eliminating one of the worst bottlenecks in computing architecture would be such a boon to performance.
Solid state disk: show me
Posted Mar 27, 2008 5:10 UTC (Thu)
by jreiser (subscriber, #11027)
[Link] (11 responses)
Solid-state disks are going to put a lot of code out of a job.
I was promised delivery of a solid state disk for 1978, thirty years ago. I won't hold my breath this time, either.
Solid state disk: show me
Posted Mar 27, 2008 5:21 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link] (10 responses)
We actually have them now. Flash drives are ubiquitous, and a major system vendor (Apple) recently put out a line of machines (Macbook Air) featuring solid-state disks.
Solid state disk: show me
Posted Mar 27, 2008 6:24 UTC (Thu)
by jwb (guest, #15467)
[Link] (5 responses)
They're also not very fast, especially when writing. We're getting close, but we're not there yet. Relatedly, I recently noticed an annoying habit of web browsers on laptop computers. When the disk is stopped, it's normally much faster to fetch an item over HTTP than to read it from the cache. But popular web browsers insist on consulting the cache which, on my laptop, takes 1-2 seconds while the disk spins up. An interesting lesson in the relative costs of fetching data.
Solid state disk: show me
Posted Mar 27, 2008 6:36 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link]
Yes, that's probably because the browser caching behavior is still based on older assumptions about network and disk speeds. It made sense back when dialup was the norm. In this era of ubiquitous broadband, though...
Solid state disk: show me
Posted Mar 27, 2008 9:04 UTC (Thu)
by pointwood (guest, #2814)
[Link] (1 responses)
"They're also not very fast, especially when writing. We're getting close, but we're not there yet." Yes, we are getting close. From what I read, we can look forward to 100MB/s writes in the very near future. Furthermore, they scale really well: http://www.nextlevelhardware.com/storage/battleship/
Solid state disk: show me
Posted Apr 18, 2008 18:31 UTC (Fri)
by ranmachan (guest, #21283)
[Link]
"Yes, we are getting close. From what I read, we can look forward to 100MB/s writes in the very near future." But is that for continuous writes or random writes? The latter case matters more. I replaced the hard disk in my notebook with a compact flash, which can do 20MB/s continuous writes (which - while not exactly fast - would be more than enough performance), but slows to a crawl (in the KB/s range) on random writes, especially when cache flushes are involved (FS metadata updates, fsync&co take ages).
Solid state disk: show me
Posted Mar 27, 2008 11:37 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
You can avoid the not-fast problem by rsyncing /usr/bin and /usr/lib (and parts of /usr/share?) into battery-backed/flash RAM, so most of the time very few writes would be needed. (Obviously rsync for this purpose is a kludge; some sort of replicated write at the kernel level seems preferable. Perhaps dm mirroring could do this, but it'd be forced to replicate all of /usr into flash, whether we care about it or not...)
Ramback
Posted Apr 1, 2008 18:36 UTC (Tue)
by dmarti (subscriber, #11625)
[Link]
Or do something like the "Ramback" patch from Daniel Phillips (LWN article): "The core idea behind Ramback is that all of that memory is turned into a ramdisk, but with a persistent device attached to it. In normal conditions, all application I/O involves only the ramdisk, and is, thus, quite fast."
Solid state disk: show me
Posted Mar 29, 2008 0:22 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (3 responses)
And as in 1978, they are more expensive than moving-head disks. Even after you count the cost of the slowness and the engineering to avoid the seeks.
That's on average, of course. Solid state storage gets used more and more every year as more applications fall out of the disk-costs-less category.
I think one of the great pastimes in the computer industry these days is guessing when more than half of storage will be solid state. While non-professionals have been saying "a couple of years" for about 20 years, I'm now beginning to hear 5 years, for new storage, from credible professionals.
I know some day the use of moving parts in a computer will be a subject of ridicule. It already seems perverse.
Solid state disk: show me
Posted Mar 29, 2008 1:11 UTC (Sat)
by nix (subscriber, #2304)
[Link] (2 responses)
Clarke had it taking a million years in _The City and the Stars_. It seems like we might manage the no-moving-parts dictum, in computers at least, a good bit sooner than that. (I definitely agree with that dictum: `no machine shall contain any moving parts.' As a recipe for long-lived hardware, it has few equals.)
Solid state disk: show me
Posted Mar 29, 2008 2:00 UTC (Sat)
by zlynx (guest, #2285)
[Link] (1 responses)
Even with zero moving parts, thermal migration of the material on the silicon will limit the life of a computer system.
Solid state disk: show me
Posted Mar 29, 2008 13:21 UTC (Sat)
by nix (subscriber, #2304)
[Link]
Yeah, but it's *better*. ;} (now the fix-every-atom-in-place, fix-reality-to-a-virtual-backdrop tech they had in Diaspar, *that* was advanced. And probably physically impossible, but it feels like it should have worked. Ah, Clarke wrote some good stuff in his day.)
Predictive ELF bitmaps
Posted Mar 27, 2008 4:50 UTC (Thu)
by ikm (guest, #493)
[Link]
Any benchmarks? Does all this actually make a difference?
Predictive ELF bitmaps for old ELF files
Posted Mar 27, 2008 5:03 UTC (Thu)
by jreiser (subscriber, #11027)
[Link]
the linker should be taught to emit a new header type which would store the bitmap. It would be near the beginning of the ELF file, eliminating the seek. A problem with that approach is that old binaries would not be able to take advantage of the technique; a re-linking would be required.
PT_GNU_STACK currently uses only .p_flags, and the default linker script has inserted PT_GNU_STACK for a couple years. So re-linking could be avoided in nearly all cases. Just set .p_offset to ALIGN_UP(old_file_size, 4), etc.
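The ALIGN_UP(old_file_size, 4) step in that suggestion is plain power-of-two rounding. A minimal sketch of the arithmetic follows; the function name is hypothetical, and the sketch deliberately leaves out the remaining program-header fields that the comment covers only with "etc."

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment.
    alignment must be a power of two for the mask trick to work."""
    return (value + alignment - 1) & ~(alignment - 1)


# Example: a 10-byte file would get its appended data at offset 12,
# while a file whose size is already a multiple of 4 is left alone.
example_offsets = [align_up(size, 4) for size in (10, 12, 13)]
```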
Predictive ELF bitmaps
Posted Mar 27, 2008 16:55 UTC (Thu)
by mezcalero (subscriber, #45103)
[Link] (1 responses)
Why not attach those bitmaps to the files in an extended attribute?
Predictive ELF bitmaps
Posted Mar 27, 2008 17:33 UTC (Thu)
by jake (editor, #205)
[Link]
> Why not attach those bitmaps to the files in an extended attribute?
This was discussed as part of the thread and I meant to mention it in the article. The basic problem is that most filesystems limit xattrs to 4K (total for keys and data) and they are not stored with the inode so a seek must be done to get to them.
jake
Predictive ELF bitmaps
Posted Mar 29, 2008 0:29 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (1 responses)
I'd rather see a bigger solution to the problem. The text of a program isn't the only thing that needs to get paged in. Shared libraries and data files are in there, and programs often invoke other programs.
And in different situations, vastly different parts of the program get paged in.
I'd like to be able to define a "procedure" by saying, "tell me all the file pages that get paged in between Time 0 and Time 1, as I start a program or maybe a few in that interval." Then I save that trace somewhere -- not necessarily bound to a particular executable -- and the next time I perform a similar procedure, I explicitly ask to have those pages paged in before I start. Maybe with a script that has the name of the trace file in it.
Bootup is an obvious place to exploit something like this.
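A user-space sketch of the replay half of such a "procedure" might look like the following. Everything here is assumed for illustration: the trace is taken to be a list of (path, page index) pairs, the page size is fixed at 4096 bytes, and the function names are hypothetical. The only real interface used is `os.posix_fadvise()`, which is available on Unix systems; how the trace would be recorded is left open.

```python
import os

PAGE_SIZE = 4096  # assumed page size


def coalesce(trace):
    """Group (path, page) pairs by file and merge adjacent pages into
    (first_page, page_count) ranges, dropping duplicates."""
    by_file = {}
    for path, page in trace:
        by_file.setdefault(path, set()).add(page)
    plan = {}
    for path, pages in by_file.items():
        ranges = []
        for page in sorted(pages):
            if ranges and ranges[-1][0] + ranges[-1][1] == page:
                first, count = ranges[-1]
                ranges[-1] = (first, count + 1)  # extend the current range
            else:
                ranges.append((page, 1))
        plan[path] = ranges
    return plan


def prefetch(plan, page_size=PAGE_SIZE):
    """Ask the kernel to start reading every range in the plan."""
    for path, ranges in plan.items():
        fd = os.open(path, os.O_RDONLY)
        try:
            for first, count in ranges:
                os.posix_fadvise(fd, first * page_size, count * page_size,
                                 os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)
```

Because the plan is keyed by file path rather than tied to one executable, it can cover shared libraries and data files as well as program text.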
Predictive ELF bitmaps
Posted Apr 9, 2008 8:09 UTC (Wed)
by hensema (guest, #980)
[Link]
That's exactly what Windows Vista does. Vista predictively pages in data that's likely to be read in the near future. Vista uses a self-learning algorithm for this.
Predictive ELF bitmaps
Posted Apr 5, 2008 18:07 UTC (Sat)
by anton (subscriber, #25547)
[Link]
Stefan Strauß-Haslinglehner (a student of mine) did his master's thesis (in German) on prefetching disk blocks on program startup based on a training run. This covered any blocks (e.g., shared libraries and data files), not just those in the binary.