Experiments On Page Size, Program Access Patterns, and Virtual Memory Performance


D. J. Hatfield


Abstract: The assumption about virtual memory systems that as overhead (time for access and software page management) decreases, page size should be reduced is not always a good one. Recent experiments indicate that larger page sizes can provide better performance for programs that make highly localized use of memory space.

Introduction

One parameter that affects the performance of memory hierarchies is the size of the block of data (e.g., the page) transferred between the memories. We are specifically interested in determining the best page size for a virtual memory system.

Key factors influencing the choice of page size are the time required to transfer a page between memories and the patterns of use of memories made by programs. There is a penalty involved in using large pages, since much of the data transferred may never be used. There is also a penalty for using small pages, which results from the time involved in separately transferring many small pages that may be used together during program execution.

Consideration of these penalties leads to one working hypothesis about page size: as overhead (times for hardware access and software page management) decreases, page size should also be decreased; as overhead approaches zero, a smaller page is always better than a larger page [1-5]. Investigations of the access patterns of programs indicate that this hypothesis is not always good and that the problem of determining the best page size may not have a simple solution.

The work described in this paper is an extension of some recent experiments to investigate the effectiveness of automatic repacking of programs and program pieces into pages of virtual memory so as to reduce the page exceptions generated by instruction or data references across page boundaries [6]. In this paper, we first describe relevant portions of the procedures used in the earlier study. We then consider more thoroughly the usual basis for choosing page sizes and compare that with our experimental results.

Paging simulators

In order to compare the effectiveness of different placements of the same set of programs (or subroutines, control sections, common arrays, data areas, etc.), paging simulators were developed that accurately mimicked [7] the software and, whenever necessary, the hardware associated with actual page replacement algorithms. The paging simulators were accurate for a single user paging against himself in either a fixed amount of space or space that varied dynamically in response to his varying requirements. Simulator input was a sequence of page requests generated by processing a full instruction trace of a real program running under a real operating system. More precisely, since all the studies were performed on the IBM System/360 model 67, the sequence was one of page sets. For an instruction to be processed on the model 67, all addresses involved must be in physical memory at once. If the instruction and data are both in the same page, only one page is needed; but if the instruction crosses one page boundary and the data cross another, three or four pages are required. The instruction images from the trace were examined sequentially, and when the set of pages required for an instruction changed, the current set was recorded and replaced by the new set. Instructions that used a subset of the pages in the current set did not cause the set to be recorded and replaced by the subset, since for a single user no page exception could result from the requirements of the subset [8].

The sequence of page sets was processed by a page management simulator, which was given either static or dynamic constraints on the number of page frames available to the program during its execution. Since only a sequence of sets of virtual page numbers is supplied to the

D. J. HATFIELD IBM J. RES. DEVELOP.


simulator, it needs no information about page size and in fact is independent of page size.

The page size must be specified in the process of generating page-set sequences from the full instruction trace. Each instruction or data address from the trace is mapped into the corresponding address for the specific placement into pages of programs, subroutines, and data areas. This address is then divided by the page size to give the page number. Therefore, a page-set sequence can be generated for any desired page size.

Our initial reason for looking at different page sizes for different placements was to determine whether improvements in performance (fewer page exceptions for a fixed memory constraint) gained by packing for one page size would also apply to double- and half-size pages without further repacking. Since the packing algorithm uses the page size as well as the size of the program and data areas to be packed, it was not obvious that the overall placement of programs and data derived from packing for a particular page size would prove effective for other page sizes.

Because of the great likelihood that page sizes are some power of 2, the half-page and the double-page were first examined. Results were gratifying in terms of our original objectives, in that placements made on the assumption of 4096-byte pages proved to be good placements for 2048- and 8192-byte pages as well [9]. But more interesting was the comparison of page exceptions for page sizes of n/2 and n bytes, where n in the cases examined took on the values of 2048, 4096, 8192, and 16,384.

Expected results

Before examining the actual results, here is a summary of the results expected based both on our intuition and on the literature [1-5]. If a program is divided into pages of size n and then into pages of size n/2, execution of the program with the smaller size pages would seem to imply less data transfer resulting from page exceptions. The reason for this is that we would not expect the program to always use both halves of the larger size pages. If only half of the larger size pages were always used, the length of the sequence of pages for both page sizes would be the same. Hence a request for large page i would correspond to a request for either small page 2i or small page 2i - 1. If the correspondence from large to small pages always involved the same half of the large page i, the request sequence would be exactly the same except for renumbering, and the same number of exceptions would result for the same number of page frames. But the same number of page frames implies half the space, so that for the same amount of space the number of exceptions should be appreciably less. And the time for data transfer would also be less, since each page is half as long. This situation clearly favors the smaller page size.

At the other extreme (i.e., both halves of a larger page are always used) every time large page i is requested, the corresponding sequence of small pages needed would be 2i - 1, 2i. Then the overall sequence of small pages would be exactly twice as long as the sequence of large pages; for the same amount of real memory space, there would be exactly twice the number of small page exceptions as large page exceptions. But again, since the small pages are only half as long, the total number of bits transferred would be the same. Any extra overhead for small pages would be due to access time plus software time associated with updating and searching page tables.

In addition, we had assumed that an effective upper bound on small page exceptions is twice the number of large page exceptions, and that, as the density of use within large pages decreases, the ratio of small page exceptions to large page exceptions decreases also. Of course one would expect the advantage of the small page size to increase as the amount of space available decreased.

Experimental results

Some measurements on a program that we characterized as having low-density memory use (program A in Fig. 1) generally confirmed these expectations. The vertical axis in the figure represents real memory space and the vertical lines represent memory usage. The horizontal axis represents execution time. Figure 2 shows the curves of page exceptions versus available space for 2K-byte pages (solid line) and 4K-byte pages (dotted line). The page replacement algorithm was first-in, first-out (FIFO). The dashed line gives the ratio of exceptions for 2K-byte pages to exceptions for 4K-byte pages. The horizontal line at 1000 exceptions indicates a ratio of one-to-one, and the line at 2000 exceptions indicates a ratio of two-to-one. Over most real memory sizes, the 2K-byte pages give fewer exceptions than the 4K-byte pages. Only when the program has all the space it needs does the ratio approach two-to-one. The height at which the dashed line stops indicates the ratio of the total number of 2K-byte pages needed to the total number of 4K-byte pages needed.

Figure 1 Memory usage by program A (horizontal axis: execution time).

Figure 2 Page exceptions for program A (horizontal axis: available memory).

When this program was repackaged to increase the localization of memory use by placing together in memory space program parts used close together in time (see Fig. 3), the number of exceptions for both 2K- and 4K-byte pages was fewer than in the previous case. The increased localization was indicated by: fewer page references (for both 2K and 4K) required for completion of the program; a smaller working set size, measured over 2500-, 5000-, 7500-, and 10,000-instruction intervals; and an increase in the ratio of program instruction and data transfers within pages to those transfers from one page to another. However, the ratio was much less favorable to the smaller page size, especially in the right half of the curve, which is the region of moderate to low paging activity and therefore clearly the most desirable region (Fig. 4). In fact, throughout most of this region, the exception ratio was above what we refer to as the break-even level for the hardware-software environment within which the experiments were run.

JANUARY 1972 VIRTUAL MEMORY PERFORMANCE

Figure 3 Memory usage by program A after repackaging (horizontal axis: execution time).

Figure 4 Page exceptions for program A after repackaging (horizontal axis: available memory).

The break-even level is determined as follows: the time to process a page exception is composed of the time for data transfer, the access time for the device where the page resides, and the software time involved in handling the page exception interruption and finding a page to replace. Under the assumptions that the device speed and the overall paging load are the same for both page sizes, the access time $\alpha$ will be the same for both the large and the small page size. The transfer time will be $\beta$ for the large page size and $\beta/2$ for the small. The software overhead to replace a page is difficult to determine as a function of page size. There are potentially more small pages to look through, but the condition for which the search is made may be distributed in the same manner through small and large pages, so that the depth of search would be the same. Since it is easy to stipulate conditions that would favor either size, the software overhead $\delta$ was assumed to be the same for both cases.

The number of small page exceptions that can be processed in the time it takes to process one large page exception is

$$\frac{\alpha + \beta + \delta}{\alpha + \beta/2 + \delta}.$$

For CP-67 version 3 on the model 67 with an IBM 2301 paging drum, this number is 1.14, largely because $\alpha$ is large compared to $\beta$ and $\delta$.

Figure 5 Memory usage by program B (horizontal axis: execution time).

For devices with access time reduced with respect to transfer time, this number is closer to 2.0. On the other hand, if the I/O operation time is totally overlapped with program execution, the ratio reduces to $\delta_n / \delta_{n/2}$. The value of this ratio is dependent on the program, the page size, and the page replacement algorithm, and will not be discussed further in this paper.
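The break-even level discussed above can be illustrated with a short calculation. The following is a minimal Python sketch, assuming the ratio has the form (access + transfer + software) / (access + transfer/2 + software), which is how the text defines the three time components. The function name and the sample timings are hypothetical and are not measurements from the experiments.

```python
def break_even_ratio(access, transfer, software):
    """Small-page exceptions that fit in the time of one large-page exception.

    access   -- device access time, taken to be the same for both page sizes
    transfer -- transfer time for a large page (a half-size page takes transfer / 2)
    software -- software overhead per exception, assumed equal for both sizes
    """
    large = access + transfer + software
    small = access + transfer / 2 + software
    return large / small


# Hypothetical timings in arbitrary units, chosen only for illustration.
print(break_even_ratio(access=10.0, transfer=3.0, software=0.5))  # access-dominated device
print(break_even_ratio(access=0.0, transfer=3.0, software=0.0))   # transfer-dominated limit
```

Under these assumptions, a measured ratio of small page exceptions to large page exceptions that exceeds the returned value means the smaller page size spends more total time on exception handling than the larger one.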
Given a break-even level of 1.14, it is clear that the repacked program favored the larger page size throughout most of the desirable performance region. For programs with greater localization of heavily used memory, the bias to the large page size is also greater (Figs. 5 and 7). Both of these programs have a more stable working set and show a sharper bend in the page exception curve than do either the original or the repackaged version of the first program. And in the page exception graphs (Figs. 6 and 8) the smaller page size often resulted in more than twice the exceptions, contrary to our intuitive expectations. After checking our page replacement simulators and finding no logical errors, we tried to find models that would predict ratios in excess of two to one [5].

Figure 6 Page exceptions for program B (horizontal axis: available memory).

Figure 7 Memory usage by program C (horizontal axis: execution time).

Figure 8 Page exceptions for program C (horizontal axis: available memory).

It was not difficult to find examples of page request sequences giving two, three, and four times the number of exceptions for the smaller page size. The examples in Fig. 9 are simple sequences that parallel the activity of real programs in an environment involving more real pages and longer strings of requests between page exceptions. The vertical boxes at the left represent an initial stack of pages ordered for removal when a page exception occurs. The bottommost member of the stack is to be removed first. Corresponding to two requests for large page i are two requests for either one or both halves of the page as some combination of the numbers 2i and 2i - 1. Page requests that cause page exceptions are underlined in both cases. In the first and second and the fourth and fifth examples, the request sequence and exception pattern can cycle indefinitely. This is because the removal stack at an earlier position in the sequence is reestablished later in the sequence, defining a cycle that will regenerate itself as many times as desired. In these cases, the number of exceptions given is that for one cycle (all examples are for single page sequences, but can be broken into sets without destroying the phenomenon).

Figure 9 Examples of page request sequences.

It should be emphasized that these sequences are the result of the accessing patterns of programs and not an artifact resulting from an instruction or a word of data spanning a half-page boundary. The occurrence of an instruction or data reference spanning a half-page boundary was less than one percent in the real programs examined. In other words, if we consider a string of program instructions translated into page references (one or more page references per instruction) first for pages of size n and then for pages of size n/2, the page reference strings for n and n/2 are essentially the same length. Note that such a reference string is not the same length as the compressed string of page sets given to the page management simulator. For the single-user case, however, both the full and the compressed string contain the same information.

These examples all have some things in common. Most of the time, both halves of the large page are used. Page exceptions for the large page are far enough apart so that between the corresponding small page exceptions there are more than enough changes of state to significantly reorder the stack. On the other hand, between the times of the large page exceptions, the stack is not significantly reordered. The large page that causes a page exception usually corresponds to two small page requests, and both usually cause a page exception. These conditions would be expected from programs characterized as high-density users of memory, and for those programs only when the paging rate for the large page size is relatively low (i.e., the righthand side of the graphs). In addition, the stipulation that the removal stack for the large page size not be reordered by the time a page exception occurs would imply that the phenomenon is more likely to occur with FIFO than with least recently used (LRU) replacement. This follows because the FIFO stack can change only at page exception time, while the LRU stack can change with every instruction. This has been observed experimentally, with the LRU algorithm seldom giving an exception ratio greater than 2:1 for real programs. Figure 10 shows LRU replacement applied to the program shown in Fig. 7 for FIFO. As yet we have been unable to prove that there is a replacement algorithm using only the past history of page requests that cannot generate more than twice the exceptions with half size pages. Figure 11 shows the program displayed in Fig. 5 passed by a replacement algorithm that selects a page for removal by examining a single "used" bit for the page. It can easily be shown that the MIN algorithm, which gives the minimum number of page exceptions for any request string, cannot produce more than twice the exceptions for the half page size [11]. But for other algorithms, especially those in use today, there seem to be no guarantees.
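The effect described in this section can be reproduced on a toy scale. The sketch below is a hypothetical Python example, not the simulator used in these experiments: it works on single page references rather than page sets, and the address trace, memory size, and function names are invented for illustration. It maps each address to a page number by dividing by the page size, then counts page exceptions under FIFO and LRU for 4096-byte and 2048-byte pages in the same 8192 bytes of memory.

```python
from collections import OrderedDict, deque


def page_string(addresses, page_size):
    """Map each byte address to its page number for the given page size."""
    return [addr // page_size for addr in addresses]


def fifo_exceptions(pages, frames):
    """Count page exceptions under first-in, first-out replacement."""
    resident, queue, exceptions = set(), deque(), 0
    for page in pages:
        if page in resident:
            continue  # a hit never changes the FIFO removal order
        exceptions += 1
        if len(queue) == frames:
            resident.discard(queue.popleft())  # remove the oldest arrival
        queue.append(page)
        resident.add(page)
    return exceptions


def lru_exceptions(pages, frames):
    """Count page exceptions under least-recently-used replacement."""
    stack, exceptions = OrderedDict(), 0
    for page in pages:
        if page in stack:
            stack.move_to_end(page)  # a hit reorders the LRU stack
            continue
        exceptions += 1
        if len(stack) == frames:
            stack.popitem(last=False)  # remove the least recently used page
        stack[page] = True
    return exceptions


HALF = 2048    # small page size in bytes
MEMORY = 8192  # real memory: 2 large frames or 4 small frames

# Invented trace, written as half-page indices (large page i = halves 2i, 2i + 1).
warmup = [0, 2, 3, 1]
cycle = [4, 5, 2, 3, 0, 1, 4, 5, 2, 3, 0, 1]
trace = [half * HALF for half in warmup + cycle * 2]


def compare(policy):
    """Return (large-page exceptions, small-page exceptions) for one policy."""
    large = policy(page_string(trace, 2 * HALF), MEMORY // (2 * HALF))
    small = policy(page_string(trace, HALF), MEMORY // HALF)
    return large, small


print("FIFO (large, small):", compare(fifo_exceptions))
print("LRU  (large, small):", compare(lru_exceptions))
```

On this trace FIFO takes 8 exceptions with 4096-byte pages and 28 with 2048-byte pages, a ratio of 3.5, while LRU takes 14 and 28, a ratio of exactly 2. The trace was constructed so that the small-page removal order falls out of step with the large-page order; it says nothing about how often real programs behave this way.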

Figure 10 LRU replacement for program C (horizontal axis: available memory).

Figure 11 “Used bit” replacement for program B (horizontal axis: available memory).

One other not obvious characteristic of the page ratio curves should be noted. Instead of progressively favoring the smaller page size as available space is reduced, the curves (for as far as we have measured them) reach a minimum near the middle of the available space range, with the minimum growing sharper and shifting to the left as usage density increases. Several modes of page request behavior could account for this, but we have not yet been able to specify analytically the degree to which real programs resemble these models.

For instance, the amount of memory space involved in a memory cycle can be the determining factor. If a program is cycling through r pages and has only r - 1 page frames available, both LRU and FIFO replacement algorithms will generate a page exception for every page request. If the same program is run on pages half as large, the cycle may involve far fewer than 2r small pages. Any cycle using 2(r - 1) or fewer small pages will generate no page exceptions, compared with one exception per page request with large pages. But if the available space is further constricted so that the cycle does not fit for either large or small pages, both cases will generate one exception per request, and the large size page will become competitive again with respect to the small. The curve representing the ratio of small to large page exceptions will climb as the available space is reduced.

The effect of more than doubling the exceptions for the half-size page has been noted for a large size page of 16,384, 8192, 4096 and 2048 bytes. It is a characteristic of programs that make highly localized use of memory and that therefore perform well on systems using relocation hardware for address translation and is also a characteristic of those programs in the region of low paging activity. It seems to affect all implementable replacement algorithms, especially those that seldom permanently alter the removal stack between exceptions. Because it is related to the page request string and not the page size, it can apply to small pages (of 64 and 32 bytes) as well as large ones, and so can be encountered in machines with caches as well as those using relocation hardware.

The question of page size (usually termed slot size) for a cache is usually more complex than that of page size for main memory. Typically the replacement algorithms used for caches are only locally LRU. The ratio of retrieval time for slots of n and n/2 bytes depends on more than the addressing and data transfer time from memory. In addition, the degree of interleaving of memory modules determines how many bytes can be sent to the cache at once. For the System/360 model 85, a 16-byte draw and a 4-way interleaving implies that 64 bytes can be brought into the cache just as quickly as 32 bytes. For each module of an interleaved memory, the memory design can determine whether successive accesses can be attempted as soon as the addressing and transfer hardware are ready, or whether a memory cycle must elapse between one access and another. Individual system designs are complex enough so that the ratio of retrieval times for n and n/2 bytes can range anywhere from one to a little more than two.

It is not really surprising that more localized use of memory would favor larger pages, since if a program stayed uniformly within one large block, the optimal page size would be the size of the block. The possible degree of mismatch between page size and memory use pattern, however, implies that careful study is required in order to decide the best performing page size for a program or a programming system, and that the assumption, “the smaller the page size the fewer wasted I/O transfers,” is not always correct. The relation between page reference patterns and page replacement algorithms gives rise to behavior that is not yet well understood and may require stricter definitions of program locality.

References and notes

1. M. Joseph, “Analysis of Paging and Program Behavior,” Computer Journal 13, No. 1, 49 (February 1970).
2. L. Belady, “A study of replacement algorithms for a virtual-storage computer,” IBM Systems Journal 5, No. 2, 93-94 (1966).
3. B. Randell, “A note on storage fragmentation and program segmentation,” Communications of the ACM 12, No. 7, 365-366 (July 1969).
4. M. H. J. Baylis, D. G. Fletcher, and D. J. Howarth, “Paging Studies Made on the ICT Atlas Computer,” Proceedings IFIP Congress 1968 2, 835-836 (1968).
5. P. J. Denning, “Virtual Memory,” Computing Surveys 2, No. 3, 169 (September 1970).
6. D. Hatfield and J. Gerald, “Program Restructuring for Virtual Memory,” IBM Systems Journal 10, 168 (1971).
7. The limit of this accuracy is the set of all events triggered by changes in virtual memory or execution cycle requirements. Timing considerations introduced by the speed of different memory or data transfer devices were ignored, and to this extent the simulation was not complete.
8. The average size of a page set for the programs we have examined varies between two and three pages. Naturally the difference in performance between a sequence of single pages and a sequence of sets becomes more critical as the available paging space is reduced.
9. The improvements in the double page case were greater than with the full page for which the repackaging was performed. The repackaging programs strung together pages that communicated with one another and so created very well packed double pages. Splitting the full pages in half destroyed some of the effect of page packing and the improvements for the half page case were not as great as with the full.
10. I am grateful for discussion of the problem and for some examples of sequences of page requests that give more than twice the exceptions at the half page size to John Pomerantz, Department of Computer Science, University of Chicago.
11. An algorithm can be devised that gives exactly twice the number of exceptions for the half page sequence by looking ahead in the request string, as does the MIN algorithm. The MIN algorithm itself must do at least this well, so cannot generate more than twice the page exceptions with half pages. For a discussion of the extension of the MIN algorithm to handle page sets, see L. A. Belady, “Use of the minimum page replacement algorithm to produce specified memory states,” IBM T. J. Watson Research Center Report.

Received August 26, 1971

The author is located at the IBM DPD Scientific Center, Cambridge, Massachusetts 02139.
