Predictive ELF bitmaps

By Jake Edge
March 26, 2008

When the kernel executes a program, it must retrieve the code from disk, which it normally does by demand paging it in as required by the execution path. If the kernel could somehow know which pages would be needed, it could page them in more efficiently. Andi Kleen has posted an experimental set of patches that do just that.

Programs do not know about their layout on disk, nor is their path through the executable file optimized to reduce seeking, but with some information about which pages will be needed, the kernel can optimize the disk accesses. If one were to gather a list of the pages that get faulted in as a program runs, that information could be saved for future runs. It could then be turned into a bitmap indicating which of the pages should be prefetched.
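As a rough illustration of that last step, the sketch below turns a list of faulted file offsets into a one-bit-per-page bitmap. It is not taken from Kleen's patches; the function names, the 4096-byte page size, and the bit ordering (least significant bit first within each byte) are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size; real code would ask the system


def offsets_to_bitmap(fault_offsets, file_size, page_size=PAGE_SIZE):
    """Return a bytearray with one bit per file page; a set bit means 'prefetch'."""
    n_pages = (file_size + page_size - 1) // page_size
    bitmap = bytearray((n_pages + 7) // 8)
    for off in fault_offsets:
        page = off // page_size
        if 0 <= page < n_pages:  # ignore offsets past the end of the file
            bitmap[page // 8] |= 1 << (page % 8)
    return bitmap


def bitmap_to_pages(bitmap):
    """Return the sorted page indices whose bits are set."""
    return [
        byte_idx * 8 + bit
        for byte_idx, byte in enumerate(bitmap)
        for bit in range(8)
        if byte & (1 << bit)
    ]
```

A ten-page executable needs only two bytes of bitmap, which is why appending it to the file is cheap in terms of space, whatever its other drawbacks.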

Once you have such a bitmap, where to store it becomes a problem. Kleen's method uses a "hack" to the ELF format on disk, putting the bitmap at the end of the executable. This has a number of drawbacks: a seek is needed to reach the bitmap, the executable is modified each time you train, and only a single usage pattern is allowed system-wide. It does have one very nice attribute, though: the bitmap and executable stay in sync. If the executable changes, due to an upgrade for instance, the bitmap is cleared in the process. Alternative bitmap storage locations (somewhere in users' home directories, for example) do not have this property.

Andrew Morton questions whether this need be done in the kernel at all:

Can't this all be done in userspace? Hook into exit() with an LD_PRELOAD, use /proc/self/maps and the new pagemap code to work out which pages of which files were faulted in, write that info into the elf file (or a separate per-executable shadow file), then use that info the next time the app is executed, either with an LD_PRELOAD or just a wrapper.
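A minimal sketch of the data-gathering half of that suggestion might look like the following. It is not Morton's code and was not posted in the thread. It assumes the documented text layout of /proc/<pid>/maps and the documented pagemap layout of one 64-bit entry per virtual page with bit 63 meaning "page present". The function names are hypothetical, and a present page is only an approximation of "faulted in from this file", since private copies of file pages also show up as present.

```python
import struct

PAGEMAP_ENTRY = struct.Struct("=Q")  # one native-endian 64-bit entry per page
PRESENT_BIT = 1 << 63                # pagemap: page is present in RAM


def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, file_offset, path)."""
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    offset = int(fields[2], 16)
    path = fields[5].strip() if len(fields) > 5 else ""
    return start, end, offset, path


def page_is_present(entry_bytes):
    """Decode one 8-byte pagemap entry and test its 'present' bit."""
    (entry,) = PAGEMAP_ENTRY.unpack(entry_bytes)
    return bool(entry & PRESENT_BIT)


def resident_file_pages(pid="self", page_size=4096):
    """Yield (path, file_page_index) for present pages of file-backed mappings.

    Linux only; page_size=4096 is an assumption for this sketch.
    """
    with open(f"/proc/{pid}/maps") as maps, \
            open(f"/proc/{pid}/pagemap", "rb", buffering=0) as pagemap:
        for line in maps:
            start, end, offset, path = parse_maps_line(line)
            if not path.startswith("/"):
                continue  # skip anonymous memory, [heap], [stack], ...
            pagemap.seek((start // page_size) * 8)
            data = pagemap.read(((end - start) // page_size) * 8)
            for i in range(len(data) // 8):
                if page_is_present(data[i * 8:(i + 1) * 8]):
                    yield path, offset // page_size + i
```

Run from an exit hook, the pairs yielded by `resident_file_pages()` could be grouped by path and written to a shadow file or into the executable itself; that half is left out here.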

Ulrich Drepper does not want to see the ELF format abused in the fashion it was for this patch. Kleen doesn't either; he used the hack as an expedient. Drepper thinks the linker should be taught to emit a new header type which would store the bitmap. It would be near the beginning of the ELF file, eliminating the seek. A problem with that approach is that old binaries would not be able to take advantage of the technique; a re-linking would be required.

Then the question arises, how does that bitmap get initialized? Drepper suggests that systemtap be used:

To fill in the bitmaps one can have separate a separate tool which is explicitly asked to update the bitmap data. To collect the page fault data one could use systemtap. It's easy enough to write a script which monitors the minor page faults for each binary and writes the data into a file. The binary update tool and can use the information from that file to generate the bitmap.

Kleen's patch walks the page tables for a process when it is exiting, setting a bit in the bitmap if that page has been faulted in. Drepper sees this as suboptimal:

Over many uses of a program all kinds of pages will be needed. Far more than in most cases. The prefetching should really only cover the commonly used code paths in the program. If you pull in everything, this will have advantages if you have that much page cache to spare. In that case just prefetching the entire file is even easier. No, such an improved method has to be more selective.

The problem is in finding the balance between just prefetching the entire executable—which might be very wasteful—and prefetching the subset of pages that are most commonly used. It will take some heuristics to make that decision. As Drepper points out, recording the entire runtime of a program "will result in all the pages of a program to be marked (unless you have a lot of dead code in the binary and it's all located together)."
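One simple heuristic, offered here purely as an illustration and not something proposed in the thread, is to record several runs and keep only the pages that were touched in at least some fraction of them. The function name and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def select_common_pages(runs, min_fraction=0.5):
    """Return pages touched in at least min_fraction of the recorded runs.

    runs is a list of collections of page indices, one per training run.
    """
    if not runs:
        return set()
    # set(run) makes sure a page counts at most once per run
    counts = Counter(page for run in runs for page in set(run))
    threshold = min_fraction * len(runs)
    return {page for page, n in counts.items() if n >= threshold}
```

With `min_fraction=1.0` only pages needed by every run survive; as it approaches zero the result grows toward the union of all runs, which is the "everything gets marked" case Drepper warns about.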

The place where Drepper sees a need for kernel support is in providing a bitmap interface to madvise() so that any holes in the pages that get prefetched do not get filled by the readahead mechanism. The current interface would require a call to madvise() for each contiguous region, which could add up to a large number of calls. Both he and Morton favor the bulk of the work being done in user space.
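To make the cost of the existing interface concrete, the sketch below converts a bitmap into contiguous runs and issues one madvise() call per run. It assumes Python 3.8 or later on Linux, where `mmap.madvise()` and `mmap.MADV_WILLNEED` are available, and the same least-significant-bit-first layout used for illustration elsewhere. The function names are hypothetical, and this is the per-region approach the article describes, not the bitmap interface Drepper is asking for.

```python
import mmap


def bitmap_to_runs(bitmap, n_pages):
    """Return (first_page, page_count) for each contiguous run of set bits."""
    runs = []
    start = None
    for page in range(n_pages):
        is_set = (bitmap[page // 8] >> (page % 8)) & 1
        if is_set and start is None:
            start = page                       # a new run begins
        elif not is_set and start is not None:
            runs.append((start, page - start))  # the run just ended
            start = None
    if start is not None:
        runs.append((start, n_pages - start))   # run reaches the last page
    return runs


def prefetch(path, bitmap, n_pages):
    """Hint the kernel to read in each marked run; one madvise() call per run."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as m:
        for first, count in bitmap_to_runs(bitmap, n_pages):
            offset = first * mmap.PAGESIZE
            length = min(count * mmap.PAGESIZE, len(m) - offset)
            if length > 0:
                m.madvise(mmap.MADV_WILLNEED, offset, length)
```

A bitmap with many isolated pages produces one run, and so one system call, per page, which is the overhead a bitmap-aware madvise() would avoid.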

Overall, there is lots more work to do before "predictive bitmaps" make their way into a Linux system—if they ever do. To start with, some benchmarking will have to be done to show that performance improves enough to consider making a change like this. David Miller expresses some pessimism about the approach:

I wrote such a patch ages ago as well.

Frankly, based upon my experiences then and what I know now, I think it's a lose to do this.

It is an interesting idea though, one that will likely crop up again if this particular incarnation does not go anywhere. Since the biggest efficiency gain is from reducing seeks, though, it may not be interesting long-term. As Morton says, "solid-state disks are going to put a lot of code out of a job."

Index entries for this article
Kernel	Prefetching

Predictive ELF bitmaps

Posted Mar 27, 2008 4:35 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (12 responses)

"solid-state disks are going to put a lot of code out of a job."

Won't it be nice, too?  Eliminating one of the worst bottlenecks in computing architecture
would be such a boon to performance.

Solid state disk: show me

Posted Mar 27, 2008 5:10 UTC (Thu) by jreiser (subscriber, #11027) [Link] (11 responses)

Solid-state disks are going to put a lot of code out of a job.

I was promised delivery of a solid state disk for 1978, thirty years ago. I won't hold my breath this time, either.

Solid state disk: show me

Posted Mar 27, 2008 5:21 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (10 responses)

We actually have them now.  Flash drives are ubiquitous, and a major system vendor (Apple)
recently put out a line of machines (Macbook Air) featuring solid-state disks.

Solid state disk: show me

Posted Mar 27, 2008 6:24 UTC (Thu) by jwb (guest, #15467) [Link] (5 responses)

They're also not very fast, especially when writing.  We're getting close, but we're not there
yet.

Relatedly, I recently noticed an annoying habit of web browsers on laptop computers.  When the
disk is stopped, it's normally much faster to fetch an item over HTTP than to read it from the
cache.  But popular web browsers insist on consulting the cache which, on my laptop, takes 1-2
seconds while the disk spins up.  An interesting lesson in the relative costs of fetching
data.

Solid state disk: show me

Posted Mar 27, 2008 6:36 UTC (Thu) by flewellyn (subscriber, #5047) [Link]

Yes, that's probably because the browser caching behavior is still based on older assumptions
about network and disk speeds.  It made sense back when dialup was the norm.  In this era of
ubiquitous broadband, though...

Solid state disk: show me

Posted Mar 27, 2008 9:04 UTC (Thu) by pointwood (guest, #2814) [Link] (1 responses)

"They're also not very fast, especially when writing.  We're getting close, but we're not
there yet."

Yes, we are getting close. From what I read, we can look forward to 100MB/s writes in the very
near future. Furthermore, they scale really well:
http://www.nextlevelhardware.com/storage/battleship/

Solid state disk: show me

Posted Apr 18, 2008 18:31 UTC (Fri) by ranmachan (guest, #21283) [Link]

"Yes, we are getting close. From what I read, we can look forward to 100MB/s writes in the
very near future."

But is that for continuous writes or random writes?
The latter case matters more.  I replaced the hard disk in my notebook with a compact flash,
which can do 20MB/s continuous writes (which - while not exactly fast - would be more than
enough performance), but slows to a crawl (in the KB/s range) on random writes, especially
when cache flushes are involved (FS metadata updates, fsync&co take ages).

Solid state disk: show me

Posted Mar 27, 2008 11:37 UTC (Thu) by nix (subscriber, #2304) [Link] (1 responses)

You can avoid the not-fast problem by rsyncing /usr/bin and /usr/lib (and parts of
/usr/share?) into battery-backed/flash RAM, so most of the time very few writes would be
needed.

(Obviously rsync for this purpose is a kludge; some sort of replicated write at the kernel
level seems preferable. Perhaps dm mirroring could do this, but it'd be forced to replicate
all of /usr into flash, whether we care about it or not...)

Ramback

Posted Apr 1, 2008 18:36 UTC (Tue) by dmarti (subscriber, #11625) [Link]

Or do something like the "Ramback" patch from Daniel Phillips (LWN article): "The core idea behind Ramback is that all of that memory is turned into a ramdisk, but with a persistent device attached to it. In normal conditions, all application I/O involves only the ramdisk, and is, thus, quite fast."

Solid state disk: show me

Posted Mar 29, 2008 0:22 UTC (Sat) by giraffedata (guest, #1954) [Link] (3 responses)

And as in 1978, they are more expensive than moving-head disks. Even after you count the cost of the slowness and the engineering to avoid the seeks.

That's on average, of course. Solid state storage gets used more and more every year as more applications fall out of the disk-costs-less category.

I think one of the great pastimes in the computer industry these days is guessing when more than half of storage will be solid state. While non-professionals have been saying "a couple of years" for about 20 years, I'm now beginning to hear 5 years, for new storage, from credible professionals.

I know some day the use of moving parts in a computer will be a subject of ridicule. It already seems perverse.

Solid state disk: show me

Posted Mar 29, 2008 1:11 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

Clarke had it taking a million years in _The City and the Stars_. It seems 
like we might manage the no-moving-parts dictum, in computers at least, a 
good bit sooner than that.

(I definitely agree with that dictum: `no machine shall contain any moving 
parts.' As a recipe for long-lived hardware, it has few equals.)

Solid state disk: show me

Posted Mar 29, 2008 2:00 UTC (Sat) by zlynx (guest, #2285) [Link] (1 responses)

Even with zero moving parts, thermal migration of the material on the silicon will limit the
life of a computer system.

Solid state disk: show me

Posted Mar 29, 2008 13:21 UTC (Sat) by nix (subscriber, #2304) [Link]

Yeah, but it's *better*. ;}

(now the fix-every-atom-in-place, fix-reality-to-a-virtual-backdrop tech 
they had in Diaspar, *that* was advanced. And probably physically 
impossible, but it feels like it should have worked. Ah, Clarke wrote some 
good stuff in his day.)

Predictive ELF bitmaps

Posted Mar 27, 2008 4:50 UTC (Thu) by ikm (guest, #493) [Link]

Any benchmarks? Does all this actually make a difference?

Predictive ELF bitmaps for old ELF files

Posted Mar 27, 2008 5:03 UTC (Thu) by jreiser (subscriber, #11027) [Link]

the linker should be taught to emit a new header type which would store the bitmap. It would be near the beginning of the ELF file, eliminating the seek. A problem with that approach is that old binaries would not be able to take advantage of the technique; a re-linking would be required.

PT_GNU_STACK currently uses only .p_flags, and the default linker script has inserted PT_GNU_STACK for a couple years. So re-linking could be avoided in nearly all cases. Just set .p_offset to ALIGN_UP(old_file_size, 4), etc.

Predictive ELF bitmaps

Posted Mar 27, 2008 16:55 UTC (Thu) by mezcalero (subscriber, #45103) [Link] (1 responses)

Why not attach those bitmaps to the files in an extended attribute?

Predictive ELF bitmaps

Posted Mar 27, 2008 17:33 UTC (Thu) by jake (editor, #205) [Link]

> Why not attach those bitmaps to the files in an extended attribute?

This was discussed as part of the thread and I meant to mention it in the article.  The basic
problem is that most filesystems limit xattrs to 4K (total for keys and data) and they are not
stored with the inode so a seek must be done to get to them.

jake

Predictive ELF bitmaps

Posted Mar 29, 2008 0:29 UTC (Sat) by giraffedata (guest, #1954) [Link] (1 responses)

I'd rather see a bigger solution to the problem. The text of a program isn't the only thing that needs to get paged in. Shared libraries and data files are in there, and programs often invoke other programs.

And in different situations, vastly different parts of the program get paged in.

I'd like to be able to define a "procedure" by saying, "tell me all the file pages that get paged in between Time 0 and Time 1, as I start a program or maybe a few in that interval." Then I save that trace somewhere -- not necessarily bound to a particular executable -- and the next time I perform a similar procedure, I explicitly ask to have those pages paged in before I start. Maybe with a script that has the name of the trace file in it.

Bootup is an obvious place to exploit something like this.

Predictive ELF bitmaps

Posted Apr 9, 2008 8:09 UTC (Wed) by hensema (guest, #980) [Link]

That's exactly what Windows Vista does. Vista predictively pages in data that's likely to be
read in the near future. Vista uses a self-learning algorithm for this.

Predictive ELF bitmaps

Posted Apr 5, 2008 18:07 UTC (Sat) by anton (subscriber, #25547) [Link]

Stefan Strauß-Haslinglehner (a student of mine) did his master's thesis (in German) on prefetching disk blocks on program startup based on a training run. This covered any blocks (e.g., shared libraries and data files), not just those in the binary.
