programming_with_invariants__1986

The document discusses the use of invariants in programming to enhance efficiency and comprehension in code, particularly in the context of imperative programming languages. It proposes a method that reduces the occurrence of modification operations by maintaining and exploiting invariants, which can improve program performance without sacrificing too much efficiency. The paper outlines the implementation of this approach in a programming language called SETL, demonstrating how invariants can facilitate the development of high-performance algorithms and compilers.

Programming with Invariants
Robert Paige, Rutgers University

The use of a restricted class of invariants as part of a language supports both the accurate synthesis of high-level programs and their translation into efficient implementations.

Based on efficiency considerations, the most practical programming languages in current use foster programming in which low-level retrieval, modification, and control operations are interspersed. This confusing mixture makes programs hard to construct and difficult to understand, but easy to compile into efficient code. High-level languages such as APL or SETL propose to minimize this problem by making it possible to write programs with fewer and more abstract operations. Functional languages offer another solution: eliminating error-prone modification code so that programs can be written more uniformly in terms of retrieval and control operations. The combination of both these solutions has also been proposed.

However, these remedies sacrifice too much efficiency to make them competitive with languages such as Fortran. Furthermore, these approaches have not demonstrated convincingly that they make correct implementations of large and difficult programs, such as optimizing compilers or distributed operating systems, substantially easier to construct or understand.*
*In computability theory, the step-counter predicate (which inputs a program P, data I, and count N, and decides whether P halts on input I before N computational steps) is far more compact and understandable in the form of conventional imperative code than as a primitive recursive function specification. Although the first certified Ada compiler was written in SETL, it was highly inefficient, and much more efficient Ada compilers were certified soon after.

We propose another solution that exploits a simple semantic characterization of modification operations (sometimes called imperative code) to (1) reduce the occurrence of modification operations in programs, (2) make programming with modification operations easier, and (3) make programs easier to understand.

This solution is based on the observation that modification operations frequently serve to maintain specialized kinds of invariants that are essential to high program performance. These invariants are typically equalities of the form

(1) E = f(x1, ..., xn)

where f is a computationally expensive n-variate function and E is a variable uniquely associated with the value of f. Invariant (1) can be maintained within a program region B (where the value of f is required with high frequency) by updating E whenever any of the variables on which f depends is modified in B. Consequently, we can avoid recomputing f each time its value is needed inside B, since its value is kept stored in the variable E.

This approach, which is an elementary kind of finite differencing, can be usefully applied to a rich class of invariants that includes invariant (1) and the following type of existential form:

(2) ∃y1 ∈ f(x1, ..., xn), ∃y2 ∈ y1, ..., ∃yk ∈ y(k-1) | yk = E

When finite differencing maintains invariant (2) in a program region B, all occurrences of the indeterminate expression ∋∋...∋f(x1, ..., xn) (involving k applications of the SETL choice operator*) are replaced by occurrences of E.

*If s is a set, the SETL choice operation ∋s yields an arbitrary element chosen from s.

We prefer to specify both invariants (1) and (2) more uniformly in terms of the reduction notation,

    E - f(x1, ..., xn)
and
    E - ∋∋...∋f(x1, ..., xn)

because it actively conveys our particular interest in using invariants for performing substitutions. With respect to the invariant lhs - rhs, we say that rhs is the replaceable term, and lhs is the replacing term.

The imperative code that maintains these invariants (by updating E) is very difficult to compose and debug because (1) the programmer is not aware of the invariants being implemented, (2) the invariants are too numerous, and (3) the interlocking dependencies among the invariants complicate the code to maintain them.

There is some evidence that a major part of the hand crafting that goes into the construction of high-performance optimizing compilers consists of this very sort of imperative code** (see the useless code elimination algorithm discussed in Goldberg and Paige [1]). This could explain why optimizing compilers are so costly to construct. However, our previous investigations [2] suggest that, while this imperative code itself contains a high level of structural detail and complexity, the process of constructing it could be quite simple. If this hypothesis is correct, then the problem to be coded (perhaps an optimizing compiler) is not so complex, and its implementation should not be so labor-intensive.

**In combining useless code elimination with four other conventional global optimizations, Jiazhen Cai and I have observed that the code involved in maintaining invariants predominates over all other code.

The basic idea behind our solution is stated as follows. We regard an efficient program P as arising naturally from a less efficient but more perspicuous program P' by maintaining and exploiting a collection of well-chosen invariants within regions of P. To exploit this idea, we incorporate invariants as a primitive programming language construct with user facilities to define them, bind them to programs, and delineate program regions where they are maintained.

A user-defined invariant encapsulates the way an invariant is established and maintained by itself, separately from the way it depends on other invariants. The more difficult task of generating efficient code to establish and maintain a whole collection of interdependent invariants is done automatically by the system. The implementation is based on the transformational techniques of finite differencing and stream processing. [1,2]

This approach allows the dynamic part of a program to be separated from the functional part in a way that fosters both program comprehension and efficiency. The imperative code to establish and maintain invariants is captured within the invariant definitions. The actual invariants (1) and (2) can be regarded as dynamic functions or relations.

Previously, we reported on more restricted but fully automatic versions of finite differencing [2,3] in which the invariants and the code regions where they are maintained are discovered automatically. The papers just cited illustrate a fully automatic finite differencing applied to such examples as topological sorting, graph reachability, the banker's algorithm, the available expressions problem, useless code elimination, attribute closure, and many more. The semiautomatic application of finite differencing discussed in this article allows invariants to be used in many more contexts.

This article describes and illustrates an extension of SETL [4] with invariants and an implementation method based on a partial difference calculus that improves the calculus described in Paige and Koenig. [2] The material is presented in three main sections. The first section gives the definition and implementation design of a single invariant E - f(x1, ..., xn). The second section presents a chain rule mechanism for maintaining collections of interdependent invariants. The third section explains how to maintain invariants E - f(x1, ..., xn) in which the same variable is substituted for more than one of the parameters x1, ..., xn. These three sections also illustrate the partial difference calculus with examples that show how invariants can facilitate the use of dynamic data structures in programming, the interfacing of system modules, and the introduction of access paths. Relevant background material on programming methodology, compiler methodology, and related work is presented in the sidebar on pp. 58-61.

The use of invariants described here is a central feature of a prototype transformational programming system called RAPTS that has been operational for more than four years. [5,6]

Ganesa, the Hindu god of prudence and policy, is represented with an elephant's head, an emblem of sagacity. He frequently rides a rat, whose conduct is esteemed for its wisdom and foresight.
Edward Moor, Hindu Pantheon (originally published in 1810), Philosophical Research Society, Los Angeles, 1976

A shorter version of this article appears in Conf. Record HICSS-19, Hawaii International Conference on System Sciences, January 8-10, Honolulu.

0740-7459/86/0100/0056$01.00 © 1986 IEEE. IEEE Software, January 1986.
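As a rough illustration of invariant (1) outside SETL, the following Python sketch keeps a variable E equal to f(s) = sum(s) while a set s is modified. The class name, the method names, and the choice of sum as a stand-in for an expensive f are all assumptions made for this example; it is not the RAPTS implementation.

```python
class MaintainedSum:
    """Keeps E equal to f(s) = sum(s) while s changes (hypothetical example)."""

    def __init__(self, items=()):
        self.s = set(items)
        self.E = sum(self.s)      # establish the invariant E = f(s) once

    def add(self, x):
        if x not in self.s:
            self.E += x           # difference code: adjust E instead of recomputing f
            self.s.add(x)

    def remove(self, x):
        if x in self.s:
            self.s.remove(x)
            self.E -= x           # difference code for deletion


m = MaintainedSum({3, 1, 4})
m.add(5)
m.remove(1)
# Inside the maintained region, every use of sum(m.s) can be replaced by m.E.
assert m.E == sum(m.s)
```

Each update costs constant time here, whereas recomputing sum(s) at every use would cost time proportional to the size of s.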

Individual invariants: Example 1 - from selection sort to heap sort

It is often possible to obtain efficient algorithms by using customized dynamic data structures to avoid costly recomputations. Such data structures can be seen as implementations of invariants. For example, consider a heap data structure: a complete binary tree with a key value at each node in which the key value at every nonleaf node x must not exceed the key values of any of the children of x (which implies that the minimum key is stored in the root).

The tree is represented as a vector v in which the parent of each component v(i), i > 1, is stored in v(floor(i/2)) and the root is stored in v(1). This heap data structure can be used to maintain the minimum value of a set s dynamically relative to additions of new values into s and deletions of the minimum value from s. Heaps illustrate a most basic kind of invariant that can be used to transform a simple selection sort directly into a heap sort.

Selection sort. Consider the following selection sort routine, which sorts a set of numbers q into a tuple t:

(3) read(q);              $ Read a set of numbers q, and
    t := [ ];             $ initialize t to the empty tuple.
    (while q /= { })      $ While q is nonempty, find its
        t with:= min/q;   $ minimum value, add it to the end
        q less:= min/q;   $ of t, and delete it from q.
    end while;
    print(t);

Because this routine repeatedly searches for the minimum value of q, it has a nonoptimal running time of O(n^2), where n is the initial number of elements in q. But since the search operation min/q occurs within a region of code in which q is modified only by deletion of its minimum element, we can obtain a speedup by maintaining q as a heap.

Based on this idea, heap sort can be seen as arising directly from selection sort by a transformation that maintains the equality E = min/q as an invariant at the two program points where min/q is computed. Such an invariant would make these occurrences of min/q redundant and replaceable by occurrences of E.
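For readers less familiar with SETL, routine (3) can be transliterated into Python roughly as follows. The function name is an assumption, and input and output are replaced by a parameter and a return value; the sketch computes the minimum once per iteration, whereas (3) writes min/q twice.

```python
def selection_sort(q):
    """Python rendering of routine (3): sort a set of numbers into a list."""
    q = set(q)            # work on a copy so the caller's set is untouched
    t = []                # t := [ ]
    while q:              # (while q /= { })
        m = min(q)        # min/q: a linear scan on every iteration
        t.append(m)       # t with:= min/q
        q.remove(m)       # q less:= min/q
    return t


result = selection_sort({5, 2, 9, 1})
```

The call to min inside the loop is the repeated search that gives the quadratic running time discussed above.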

Background

Our treatment of invariants and their implementation by finite differencing unifies two different developments. One development has to do with programming methodology: the informal but essential principles that facilitate the manual construction of programs and the synthesis of algorithms. Another has to do with compiler methodology: the formal techniques of syntactic analysis and symbolic manipulation by which programs can be improved automatically or semiautomatically.

Programming methodology

The development of invariants in programming methodology can be traced back to the 16th century, when Henry Briggs discovered the method of numerical finite differencing for efficiently generating accurate tables of polynomials. [1,2] The maintenance of difference polynomials as invariants was the key idea that allowed Briggs to compute each successive value of a dth degree polynomial with only d additions instead of d additions and d multiplications. Nowadays, it is widely accepted that invariants play a major role in the design of efficient algorithms and software systems (a different use of invariants is central to program verification as shown in Floyd [3] and Hoare [4]).

By placing the essential idea underlying Briggs's algorithm in minimal form and applying the idea in the more general context of program development, Dijkstra became perhaps the most visible advocate of finite differencing. [5] Through numerous examples, Dijkstra distilled a lucid and convincing principle of finite differencing, which he called "avoiding redoing work known to have been done" [5, p. 152] and "[trading] variable space for computation speed." [6, p. 5] Similar to Dijkstra, Gries [7] and Reynolds [8] presented several compelling examples that illustrate finite differencing as part of a programming methodology.

Compiler methodology

The more formal idea of uncovering invariants by syntactic analysis and maintaining them by an automatic or semiautomatic implementation of finite differencing can also be traced back to Briggs. His method, though applied manually, was an algorithm for speeding up the calculation of successive polynomial values. Babbage's analytic difference engine [1] was an early attempt to mechanize Briggs's technique.

An elementary form of numerical finite differencing proposed by Bauer and Samelson [9] was eventually im-

Transformation. The essential steps of this transformation, which we regard as a kind of finite differencing, are as follows:

* On entry to the while loop, store the set q into the tuple minheap that implements a heap. This establishes the invariant

    (4) minheap(1) = min/q

  on entry to the while loop.*
* Within the while loop immediately after q is diminished, insert the classical siftdown code (see Aho, Hopcroft, and Ullman [7]), which updates minheap so that the new value of q is stored as a heap. This establishes invariant (4) at both points in the selection sort where min/q occurs.
* Invariant (4), which is maintained by the preceding steps, makes both occurrences of min/q in the selection sort (3) redundant, so they can be replaced by occurrences of minheap(1).

*Assume that both minheap(1) and min/q have the unique undefined value Ω when q = { }.

This transformation resembles classical strength reduction but extends that technique in several essential ways. Classical strength reduction was limited to equalities E = f, where f is a numerical expression. [8] Following Earley, [9] we consider expressions f of any data type, including user-defined types. For the min/q example, q could be any totally ordered set.

This example also illustrates how one invariant can be used to maintain another. That is, invariant (4), which is used directly to speed up the selection sort, is implied by the invariant that keeps q stored as a heap. The heap is called the defining invariant, and invariant (4) is called a replacing invariant. Also note that, because a heap representation for q is not unique, the heap invariant is actually a membership minheap ∈ heaps(q), where heaps(q) is the set of all possible heap representations of q.

The preceding transformation can be applied within the RAPTS system by defining the heap invariant, binding it to the selection sort (3), and indicating that the invariant should be maintained and exploited within the selection sort's while loop.
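The three steps above can be imitated by hand in Python. This is only a sketch under stated assumptions: the function name is hypothetical, Python's standard heapq module stands in for the classical siftup and siftdown code, and because heapq uses 0-based lists, minheap[0] plays the role of minheap(1).

```python
import heapq


def heap_sort(q):
    """Selection sort (3) with min/q replaced by the root of a maintained heap."""
    q = set(q)
    minheap = list(q)
    heapq.heapify(minheap)        # establish: minheap[0] == min(q) on loop entry
    t = []
    while q:
        m = minheap[0]            # replaces both occurrences of min/q
        t.append(m)               # t with:= min/q
        q.remove(m)               # q less:= min/q
        heapq.heappop(minheap)    # restore the heap right after q is diminished
    return t


result = heap_sort({5, 2, 9, 1})
```

The set q is kept only to mirror routine (3); once the heap is maintained, the loop no longer needs to scan q at all.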

plemented within an Algol compiler. A more general finite differencing transformation, which John Cocke called strength reduction, was later developed as a powerful global compiler optimization by Cocke and Schwartz, [10] Cocke and Kennedy, [11] and Allen, Cocke, and Kennedy. [12]

In the beginning of the 1970s, Jay Earley initiated an exciting investigation to extend, formalize, and automate major aspects of the more general principle of finite differencing used implicitly by algorithm designers and advocated explicitly by Dijkstra. Earley incorporated equality invariants E = f(x1, ..., xn), which he called intentional variables, as part of his proposed programming language, Vers2. [13] He also suggested a way to maintain such invariants within code regions by a method he expected to arise from his iterator inversion transformation. [14]

Fong and Ullman, and later Fong, developed an interesting method reminiscent of the calculus of variations that was the first implementation of a subset of Earley's set theoretic invariants. [15-17] Fong's implementation has since been improved by Rosen [18] and Tarjan. [19] Paige subsequently developed a general finite difference method capable of implementing invariants for numerical expressions previously handled by strength reduction and for all the expressions investigated by Earley. [20-22]

However, Fong and Paige only discussed implementations for predefined classes of invariants and did not use a notational facility for specifying and applying differencing rules within a programming language. This was initiated by Koenig and Paige, [23] who presented notations for specifying database user views as arbitrary set theoretic invariants that could be maintained across ad hoc database queries and transactions by finite differencing.

Other related work should be mentioned. Burstall and Darlington have often used their fold and unfold transformations to implement invariants. [24] However, these transformations, which Sintzoff calls transformational gotos, [25] are at a much lower semantic level of abstraction than finite differencing. Abstract data types can also be used to implement invariants, and Pepper actually specified a simple but general finite differencing scheme (but lacking a chain rule) within the CIP language. [26]

Invariant definition. An invariant definition has the following four parts:

* A header for the heap invariant is denoted by

    Define Invariant E - heap(s)

  where heap identifies the defining invariant to be implemented, s is the set valued argument, and E is a tuple storing a heap data structure for s. Since there can be many heaps for a set s, E must be just one of them.

* The second part of an invariant definition is called the differencing section. This part contains the code that should be executed to restore the invariant (by updating E) when it is falsified at program points where its argument s is modified. This code is called partial difference code.

  In the case of a heap, we can preserve the heap invariant prior to augmenting s by executing the well-known siftup operations. [7] Because this partial difference code is executed just before the parameter modification s with:= z, it is called predifference code, for which we write

    ∂-E<s with:= z> = /..siftup code../

  The heap invariant can be restored just after the minimum value of s is deleted from s by executing the siftdown routine. Since the siftdown operation should be performed immediately after the modification s less:= min/s, it is called the postdifference of E with respect to s less:= min/s, for which we write

    ∂+E<s less:= min/s;> = /..siftdown code../

  For the assignment s := { }, which assigns the empty set to s, we can keep s stored as a heap E by setting E to the empty tuple just prior to the modification to s. That is,

    ∂-E<s := { };> = E := [ ];

* Code that establishes the heap invariant on entry to the program region where the difference rule is applied is contained in the initialization section. For this example, the heap can be established easily (though not in the most efficient way) by executing

    ∂-E<s := { }>
    (for x ∈ s)
        ∂-E<s with:= x>
    end;
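To make the differencing and initialization sections more concrete, here is a hand-written Python sketch of the heap invariant. Everything about its shape is an assumption made for illustration: the class and method names are hypothetical, the vector is padded at index 0 so that the parent of position i is at i // 2 as in the text, only E is stored (the set s itself is left to the caller), and None stands in for the undefined value.

```python
class HeapInvariant:
    """Sketch of an invariant E - heap(s); names are hypothetical."""

    def __init__(self, s):
        # Initialization section: start from the empty tuple, then apply the
        # predifference for 's with:= x' once for each x in s.
        self.E = [None]                     # index 0 unused: parent(i) == i // 2
        for x in s:
            self.pre_with(x)

    def pre_with(self, z):
        """Predifference for 's with:= z' (z assumed not yet in s): sift up."""
        E = self.E
        E.append(z)
        i = len(E) - 1
        while i > 1 and E[i // 2] > E[i]:
            E[i // 2], E[i] = E[i], E[i // 2]
            i //= 2

    def post_less_min(self):
        """Postdifference for 's less:= min/s': drop the root, then sift down."""
        E = self.E
        if len(E) == 1:
            return                          # heap already empty
        last = E.pop()
        if len(E) == 1:
            return                          # the root was the only element
        E[1] = last
        i, n = 1, len(E) - 1
        while 2 * i <= n:
            c = 2 * i                       # left child
            if c + 1 <= n and E[c + 1] < E[c]:
                c += 1                      # right child is smaller
            if E[i] <= E[c]:
                break
            E[i], E[c] = E[c], E[i]
            i = c

    def min(self):
        """Root of the heap, usable in place of min/s; None if s is empty."""
        return self.E[1] if len(self.E) > 1 else None


h = HeapInvariant({5, 2, 9, 1})
out = []
while h.min() is not None:
    out.append(h.min())
    h.post_less_min()
```

Draining the structure this way yields the elements in increasing order, which is the behavior the heap sort derivation relies on.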

However, the semantics of abstract data types attempts to reinterpret instead of transform code. This detracts from the very specific nature of what is involved when invariants are maintained and used to improve program performance. Furthermore, abstract data types are more general than partial difference rules, and their specialization for this purpose seems awkward.

A similar but more general use of invariants than that discussed in this article is found in computability theory, where invariants play an important role in the notion of machine (and language) simulation. Simulation led Hoare to develop a formal basis for replacing abstract variables and operations with more efficient concrete variables and operations. [27] Justification for this replacement is the maintenance of an invariant (called a representation function) between a set of concrete variables and each abstract variable. Gries and Prins [28] recently extended Hoare's idea so that the invariant relationship between abstract and concrete variables could be denoted by arbitrary representation relations. They propose a new language construct called a module that facilitates automatic representation changes. They also discuss some of the issues involved in an implementation.

Schwartz proposed a language feature more general than ours for specifying and applying rules that involve syntactic substitution and insertion operations within program regions. [29] He illustrated this construct with examples much like those first suggested by Earley and sketched an implementation based on a nodal span parsing method. Schwartz showed how his rules could be used to maintain what he calls relationships, which are more general than invariants. Date has elaborated extensively on a general rule system similar to Schwartz's for defining and enforcing database integrity constraints in the presence of updates. [30]

References

1. H. Goldstine, The Computer from Pascal to Von Neumann, Princeton University Press, Princeton, N.J., 1972.
2. H. Goldstine, A History of Numerical Analysis, Springer-Verlag, New York, 1977.
3. R. Floyd, "Assigning Meaning to Programs," Proc. Symp. Applied Mathematics, Vol. 19, American Mathematical Society, Providence, R.I., 1967.
4. C.A.R. Hoare, "An Axiomatic Basis for Computer Programming," Comm. ACM, Vol. 12, No. 10, Oct. 1969, pp. 576-581.
5. E.W. Dijkstra, A Discipline of Programming, Prentice-Hall, Englewood Cliffs, N.J., 1976.
6. E.W. Dijkstra, A Personal Perspective, Springer-Verlag, New York, 1982.
7. D. Gries, The Science of Programming, Springer-Verlag, New York, 1981.
8. J. Reynolds, The Craft of Programming, Prentice-Hall, Englewood Cliffs, N.J., 1981.

* Finally, to indicate that all occurrences of min/s should be replaced by E(1) within the region where heap is maintained, we specify the replacing invariant as the reduction

    E(1) - min/s

  Note that E(1) is the replacing term and min/s is the replaceable term for the heap invariant.

Heap sort. Once the heap invariant above is defined, it can be used to improve the selection sort (3) by inserting the declaration

    Declare Invariant minheap - heap(q);

at the top of the sort routine. This declaration binds the new program variable minheap to parameter E and the set valued program variable q to parameter s. The syntactic region in which the invariant is preserved is called a maintain block, which is delineated by the header "maintain minheap" and the trailer "end maintain."

Given the preceding heap definition, the following code transforms a selection sort into a heap sort.

    Declare Invariant minheap - heap(q);
    read(q);
    t := [ ];
    maintain minheap;
    (while q /= { })
        t with:= min/q;
        q less:= min/q;
    end while;
    end maintain;
    print(t);

Correctness issues. In general, partial finite difference rules can be used to maintain invariants E - f(x1, ..., xn) within single-entry, single-exit program regions B. Each invariant definition has one defining invariant I and one or more replacing invariants (which could include the defining invariant) implied by I. The set of replacing and replaceable terms for an invariant I consists of the set of replacing and replaceable terms occurring in all of the replacing invariants associated with I.

To understand the differencing method, it is useful to separate the maintain block into the code that establishes a

9. K. Samelson and F. L. Bauer, "Sequential Formula Translation," Comm. ACM, Vol. 3, No. 2, Feb. 1960, pp. 76-83.
10. J. Cocke and J. T. Schwartz, Programming Languages and Their Compilers, lecture notes, CIMS, New York University, 1969.
11. J. Cocke and K. Kennedy, "An Algorithm for Reduction of Operator Strength," Comm. ACM, Vol. 20, No. 11, Nov. 1977, pp. 850-856.
12. F. E. Allen, J. Cocke, and K. Kennedy, "Reduction of Operator Strength," in Program Flow Analysis, S. Muchnick and N. Jones, eds., Prentice-Hall, Englewood Cliffs, N.J., 1981, pp. 79-101.
13. J. Earley, "High-Level Operations in Automatic Programming," Proc. Symp. Very High Level Languages, Vol. 9, No. 4, Apr. 1974.
14. J. Earley, "High-Level Iterators and a Method for Automatically Designing Data Structure Representation," J. Computer Languages, Vol. 1, No. 4, 1976, pp. 321-342.
15. A. Fong and J. Ullman, "Induction Variables in Very High Level Languages," Proc. Third ACM Symp. Princ. Programming Languages, Jan. 1976, pp. 104-112.
16. A. Fong, "Elimination of Common Subexpressions in Very High Level Languages," Proc. Fourth ACM Symp. Princ. Programming Languages, Jan. 1977, pp. 48-57.
17. A. Fong, "Inductively Computable Constructs in Very High Level Languages," Proc. Sixth ACM Symp. Princ. Programming Languages, Jan. 1979, pp. 21-28.
18. B.K. Rosen, "Degrees of Availability," in Program Flow Analysis, S. Muchnick and N. Jones, eds., Prentice-Hall, Englewood Cliffs, N.J., 1981, pp. 55-76.
19. R. Tarjan, "A Unified Approach to Path Problems," J. ACM, Vol. 28, No. 3, July 1981, pp. 577-593.
20. R. Paige, Formal Differentiation, UMI Research Press, Ann Arbor, Mich., 1981 (revision of PhD dissertation, New York University, June 1979).
21. R. Paige and S. Koenig, "Finite Differencing of Computable Expressions," ACM TOPLAS, Vol. 4, No. 3, July 1982, pp. 402-454.
22. R. Paige, "Transformational Programming - Applications to Algorithms and Systems," Proc. 10th ACM Symp. Princ. Programming Languages, Jan. 1983, pp. 73-87.
23. S. Koenig and R. Paige, "A Transformational Framework for the Automatic Control of Derived Data," Proc. Seventh Int'l Conf. Very Large Databases, Sept. 1981, pp. 306-318.
24. R. Burstall and J. Darlington, "A Transformation System for Developing Recursive Programs," J. ACM, Vol. 24, No. 1, Jan. 1977, pp. 44-67.
25. M. Sintzoff, "Understanding and Expressing Software Construction," in Program Transformation and Programming Environments, P. Pepper, ed., Springer-Verlag, New York, 1984, pp. 169-180.
26. P. Pepper, private communication, 1982.
27. C.A.R. Hoare, "Proof of Correctness of Data Representations," Acta Informatica, Vol. 1, No. 19, 1972, pp. 271-281.
28. D. Gries and J. Prins, "A New Notion of Encapsulation," SIGPLAN Conf. Programming Languages and Environments, July 1985, pp. 131-139.
29. J.T. Schwartz, "Some Syntactic Suggestions for Transformational Programming," SETL Newsletter, No. 205, New York University, Dept. of Computer Science, Mar. 1978.
30. C. Date, Introduction to Database Systems, Vol. II, Addison-Wesley, Reading, Mass., 1982.

defining invariant E - f(x1, ..., xn), denoted by establish(E), and the code that maintains and exploits E within B, denoted by ∂E<B> and called the differential of E with respect to B. The differential code block ∂E<B> is formed from B by

* inserting the difference code blocks ∂-E<dx> and ∂+E<dx> immediately before and after each modification dx (occurring in B) to a parameter x on which f depends, and
* for each replacing invariant h(E) - g(x1, ..., xn), substituting h(E) for all occurrences of g(x1, ..., xn) within B.*

*We assume that bound variables occurring within g(x1, ..., xn) can be renamed appropriately.

Thus, the maintain block

(5) maintain E
        B
    end maintain

is equivalent to the code

    establish(E)
    ∂E<B>

The actual predifference and postdifference code blocks are not unique, but they must obey the following correctness condition. For a defining invariant E - f(x1, ..., xn), if dxi is a modification to a parameter xi, i = 1, ..., n, then the predifference and postdifference code blocks that calculate the new value of E from its old value must satisfy the following general Hoare formula:

    {E - f(x1, ..., xn)}
    ∂-E<dxi>
    dxi
    ∂+E<dxi>
    {E - f(x1, ..., xn)}

It is also essential to the correctness of this transformation that f(x1, ..., xn) is well-defined within the region where E is maintained. In other words, the values of the variables x1, ..., xn must belong to the domain of f. Also, within these difference code blocks, only modifications to E and variables local to these blocks are allowed. As in the heap example, the predifference or postdifference code blocks can be empty.

Implementation issues. For finite differencing to actually improve code, the cumulative expense of establishing a defining invariant E - f on entry to a program region B, plus the cost of maintaining it and computing replacing terms (that result from substitutions indicated by the replacing invariants) inside B, must be less than the total cost of computing the replaceable terms within B before optimization. As in classical strength reduction [8] and the finite differencing techniques of Fong and Ullman [10] and Paige and Koenig, [2] analysis of the control flow properties of B is important in deciding whether program improvement is likely. Our implementation of maintain blocks supports this goal of program improvement by enforcing three easy conditions:

* The partial difference rules invoked by a maintain block (5) must include difference code with respect to all modifications occurring within the program region B to variables on which the expression f depends.
* If a modification dx to a variable x on which f depends contains an occurrence of any replaceable term, then the predifference code ∂-E<dx> must not modify E.
* There must not be any occurrences of replaceable terms within any of the difference code blocks for E.

The first condition above is a correctness check, while the other two conditions are for improved performance. That is, they ensure that all occurrences of replaceable terms within B are made redundant and that no new occurrences of such terms are introduced within difference code.

Note that the second condition above is satisfied for the min/q example because we use postdifference (and not predifference) code relative to deletions of min/q from q. This choice avoids the need for potentially costly copy operations and makes the occurrence of min/q within the deletion q less:= min/q redundant. Paige and Koenig provided a more theoretical treatment and formal correctness proof of finite differencing. [2]

The monkey god Hanuman served the mighty Indian conqueror, Rama.
Hindu Pantheon, Philosophical Research Society, 1976

Collections of invariants: Example 2 - forming a bibliography

Systems programs that are designed independently of one another often interact according to some standard pattern

62 IEEE SOFTWARE
file is often passed to a text formatter. However, if the system is unable to anticipate this pattern and the formatter blindly processes an entire tagged text file (even though the file is only edited in a minor way relative to the last time it was processed by the formatter), then potentially costly redundant computations will be performed.

Partly to overcome this problem, the approach taken by Janus [11] interfaces editing and text processing in the form of an incremental text formatter. Janus keeps the final text file in a form that can be easily updated at low cost whenever slight editing changes are made to the tagged text file.

In this next, more complicated example, the use of invariants and finite differencing are shown to provide a convenient way to implement an interface between editing and formatting for the simple task of bibliography formation.

Abstract code. Suppose that the text formatter constructs a bibliography from file efile (generated by the editor) by executing the following code:

    (6) bibfrom(sort({x: x ∈ multiref(efile)}))

where multiref(efile) yields a multiset of references occurring within efile to cited material; the expression

    {x: x ∈ multiref(efile)}

forms a set from this multiset; and bibfrom constructs the final bibliography from a sorted list of elements from this set. Note that these operations are easy to understand but are costly to compute.

Maintaining invariants. To improve the performance of (6), we can maintain a collection of four different interdependent invariants that serve to interface editing and formatting as an efficient incremental activity. These invariants can be partly determined from an inner to outer subexpression analysis of code (6).

The first invariant, citelist = multi(efile), keeps the bibliographic references occurring in efile stored within the multiset citelist. When efile is updated by substring insertion and deletion, the citelist invariant can be maintained by performing inexpensive operations that avoid scanning all of efile. This greatly speeds up the task of bibliography formation at the expense of a small amount of additional space. Whenever a string z is added to efile by an operation, insert(efile,z), we can locate all the citations within z and add them one by one to citelist. If a constant number of such insertions and similar deletions (where each insertion and deletion of a single citation takes unit time to compute) can be expected to occur between consecutive formatter runs, then an order of magnitude improvement in running time will result.

The following invariant definition and declaration will achieve this goal.

    (7) define invariant E = multi(str)
          ∂-E<insert(str,z)> = (for x ∈ multiref(z))
                                   E with:= x;
                               end for;
          E = multiref(str)
        end
    and
        declare citelist = multi(efile);

After citelist is maintained, it is important to avoid the costly computation {x: x ∈ citelist}, which computes the set of references occurring in citelist. The value of this set expression can also be preserved using the invariant definition and declaration given below.

    define invariant E = setfromtuple(q)
      ∂-E<q with:= z> = if #[x ∈ q | x = z] = 0 then
                            E with:= z;
                        end if;
      E = {x: x ∈ q}
    end
    and
    declare citeset = setfromtuple(citelist);

Maintaining the two different invariants, citelist and citeset, within a sequence B of updates to efile raises the problem of how to combine the difference code for citelist and citeset so that both invariants are maintained correctly and efficiently. It is always correct to maintain the innermost invariant citelist first, then to maintain citeset; that is,

    (8) ∂citeset<∂citelist<B>>

In particular, if B were just the simple update, insert(efile,z), then code (8) would yield

    (9) (for x ∈ multiref(z))
            if #[y ∈ citelist | y = x] = 0 then
                citeset with:= x;
            end if;
            citelist with:= x;
        end for;
        insert(efile,z);

The ordering above is also the only correct ordering because variable citelist, upon which citeset depends, is undefined within B. That is, the transformation

    ∂citelist<∂citeset<B>>

cannot be correct.

Continuing with this example, we note that the difference code for citeset occurring within code block (9) is inefficient, because of the costly embedded calculation

    (10) #[y ∈ citelist | y = x]

which computes the number of references to x occurring in the multiset citelist. This problem can be overcome, however, by maintaining counts of all occurrences of each different citation within citelist. A single invariant definition and a declaration for maintaining all these counts as invariants is given below:

    (11) define invariant E(w) = count(t,w)
           ∂-E<t with:= z> = E(z) +:= 1;
           E(w) = #[x ∈ t | x = w]
         end
    and
         declare citecount(x) = count(citelist,x);

Note that for this example, the parameter E and the program variable citecount are specified in a pointwise function format. The significance of this is that x is a pattern that matches any citation. The replacing invariant within (11) indicates that occurrences of citecount(x) will replace occurrences of expression (10) for any value of x. Of course, for this to work, the way in which citecount is maintained must be correctly integrated with the other two invariants citelist and citeset.

Because citecount depends on citelist, the code to maintain invariant citelist must be inserted first. Because expression (10), which occurs within the predifference code for citeset, is involved in the old value of citelist before it is modified, expression (10) would be made redundant by the old value of citecount before the predifference code for citecount is executed. Consequently, in order to avoid making any copies of efile, citelist, citeset, and citecount, it is necessary to insert the code that maintains the citecount invariant after inserting the code for maintaining citeset and citelist; that is,

    ∂citecount<∂citeset<∂citelist<insert(efile,z)>>> =

    (for x ∈ multiref(z))
        if citecount(x) = 0 then
            citeset with:= x;
        end if;
        citecount(x) +:= 1;
        citelist with:= x;
    end for;
    insert(efile,z);

One more improvement to the bibliography code (6) is to eliminate the expensive operation sort(citeset), which sorts all citations occurring in the set citeset. This is achieved by storing all these citations within a search tree. The following invariant definition and declaration introduces this crucial data structure

    define invariant E = searchtree(s)
      ∂-E<s with:= z> = insertst(E,z);
      inorder(E) = sort(s)
    end
    and
    declare sorted = searchtree(citeset);

Based on the preceding rules for ordering the way invariants are maintained, it would be correct to maintain sorted either after citeset or after citecount. Assuming that sorted is maintained after citecount, the collective predifference code for citelist, citeset, citecount, and sorted, relative to the modification insert(efile,z), is

    (12) (for x ∈ multiref(z))
             if citecount(x) = 0 then
                 insertst(sorted,x);
                 citeset with:= x;
             end if;
             citecount(x) +:= 1;
             citelist with:= x;
         end for;
         $the modification insert(efile,z) would appear here$

The collective predifference code relative to delete(efile,z) is similar. Maintenance of these four invariants allows us to replace the costly bibliography code (6) with the following more efficient code,

    bibfrom(inorder(sorted));

Since sorted represents an invariant that is essential to the task of bibliography formation, it is worth examining the two possible predifference code blocks (code block (12) and the predifference code relative to deletions) to determine if we can avoid maintaining any of the other three invariants and still maintain sorted. A simple useless code elimination procedure determines that only citecount, the set of reference counts, is necessary; citelist and citeset need not be maintained, and the code to maintain them can be safely removed from the predifference code (12).

The preceding example illustrates a few basic principles of finite differencing that allow us to automate the potentially tedious task of ordering the way a collection of invariants should be maintained in a single-entry, single-exit program region B. The chain rule and method discussed below improve the treatment of finite differencing found in Paige and Koenig [2].

Maintaining invariants.

Definition (chain rule). Let Ei = fi, i = 1,...,n, be n defining invariants in which each expression fi, i = 1,...,n, depends only on variables v1,...,vk, E1,...,Ei-1. The collective differential of E1,...,En with respect to a program region B, denoted by ∂{E1,...,En}<B>, is a new code block formed from B by replacing each modification dx occurring in B with the differential code ∂{E1,...,En}<dx>.

We can reduce the differential ∂{E1,...,En}<dx> to the simpler differential

    (13) ∂{E2,...,En}<∂-E1<dx> dx ∂+E1<dx>>

if none of the difference code for E2,...,En inserted into (13) introduces any occurrences of replaceable terms associated with E1. If the simplification to (13) can occur, we refer to E1 as a minimal invariant for the differential ∂{E1,...,En}<dx>. This simplification rule leads to the following general chain rules defining predifference and postdifference code blocks for collections of invariants:

    ∂-{E1,...,En}<dx> = ∂{E2,...,En}<∂-E1<dx>>
                        ∂-{E2,...,En}<dx>
    ∂+{E1,...,En}<dx> = ∂+{E2,...,En}<dx>
                        ∂{E2,...,En}<∂+E1<dx>>

The reduction of the collective differential

    ∂{E1,...,En}<dx>

depends on finding minimal invariants, which can be done by a simple graph theoretic analysis. Consider a directed acyclic graph (called a dag) whose nodes are labeled with variables v1,...,vk,E1,...,En. We draw an edge from a node labeled Ei to a node labeled y if the variable y occurs in expression fi. We call this dag the data dependency dag for f1,...,fn.

Next, for each predecessor node (labeled E, say) of the node labeled x, we determine the difference code for E relative to the modification dx. We successively determine the difference code for each invariant E' relative to each of the modifications previously determined for the successor nodes of E' until the entire dag is processed.

After this, we augment the data dependency dag with an arc from a node labeled E' to a node labeled E'' if any part of the difference code just determined for E'' contains an occurrence of a replaceable term g associated with E'. (When this is the case, we say that g is an auxiliary expression for E''.) Consequently, we know that E is a minimal invariant for ∂{E1,...,En}<dx> if all its successor nodes in the augmented data dependency dag are either the node labeled x or nodes that have no paths to x.

To see how the preceding analysis applies to the bibliography example, consider the collective differential

    (14) ∂{sorted, citeset, citelist, citecount}<insert(efile,z)>

The augmented data dependency dag for (14) is shown in Figure 1. It is easy to see that citelist is a minimal invariant. In fact for this case we can simply sort the dag topologically and obtain a correct ordering of invariants; that is,

    ∂citecount<∂sorted<∂citeset<∂citelist<insert(efile,z)>>>>

[Figure 1. Augmented dependency dag (broken edge results from augmenting the data dependency dag). Nodes: sorted, citeset, citecount, citelist, efile.]

Establishing invariants. Although it is possible to establish a collection of invariants in a straightforward way using the initialization section of the invariant definition as in the heap sort example, we can produce much better code based on an improved version of the stream processing transformation found in Goldberg and Paige [1] and Paige and Koenig [2]. Stream processing is a technique that improves the time and space requirements of several iterative expressions by calculating them in a single pass and by avoiding intermediate calculations.

The method of stream processing discussed by Goldberg and Paige depends on a few preliminary notions. If we can efficiently evaluate an expression E = f(x1,...,xn) by searching through a set or tuple valued parameter xi, then xi is called a stream parameter for f. For example, efile is a stream parameter for citelist, because we can compute citelist efficiently by searching through efile. That is,

    ∂-citelist<efile := ''>
    (for line = newline(efile))              $extract a new line starting
        ∂-citelist<insert(efile,line)>       $from the left of efile
    end for;

Also, since citelist is a stream parameter for citeset, both of the invariants citelist and citeset can be established together in the same loop using a loop combining technique we call vertical fusion:

    (15) ∂citeset<∂-citelist<efile := ''>>
         (for line = newline(efile))
             ∂citeset<∂-citelist<insert(efile,line)>>
         end for;

As was pointed out earlier, the difference code used to construct citeset within loop (15) contains a costly embedded expression that does not have to be computed if citecount is kept invariant at the appropriate point. Since citelist is a stream parameter for both citecount and citeset, citecount can be constructed at the same time as citeset by another loop-combining technique called horizontal fusion. Although this kind of fusion technique would ordinarily allow citecount and citelist to be constructed in an arbitrary order, in this case citecount must come first, because citecount is an auxiliary expression for citelist. That is,

    ∂citecount<∂citeset<∂-citelist<efile := ''>>>
    (for line = newline(efile))
        ∂citecount<∂citeset<∂-citelist<insert(efile,line)>>>
    end for;

Finally, since citeset is a stream parameter for sorted, all four invariants can be computed efficiently using a single search through efile. That is,

    ∂sorted<∂citecount<∂citeset<∂-citelist<efile := ''>>>>
    (for line = newline(efile))
        ∂sorted<∂citecount<∂citeset<∂-citelist<insert(efile,line)>>>>
    end for;

We indicate the stream parameters for an invariant within the initialization part of the invariant definition. For example, initialization for the multi-invariant (7) would contain
January 1986 65
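As an illustration of the bibliography example developed above, the following Python sketch maintains the two invariants that survive useless code elimination, citecount and sorted, under insertions into efile, in the spirit of code (12). It is a sketch under stated assumptions rather than the article's SETL implementation: the helper multiref is a hypothetical stand-in that treats every \cite{key} as a citation, text is only appended to efile, a sorted Python list kept with bisect replaces the search tree, and deletions are not handled.

```python
import bisect
import re
from collections import Counter


def multiref(text):
    """Hypothetical stand-in for multiref: all \\cite{key} keys in text, with repeats."""
    return re.findall(r"\\cite\{([^}]*)\}", text)


class Bibliography:
    """Keeps citecount and sorted consistent with efile as text is appended."""

    def __init__(self):
        self.efile = ""
        self.citecount = Counter()  # citecount(x) = occurrences of citation x
        self.sorted = []            # sorted list standing in for the search tree

    def insert(self, z):
        # Predifference code: runs before efile itself is modified and scans
        # only the inserted text z, never the whole file.
        for x in multiref(z):
            if self.citecount[x] == 0:
                bisect.insort(self.sorted, x)  # role of insertst(sorted, x)
            self.citecount[x] += 1
        self.efile += z                        # the modification insert(efile, z)

    def bibliography(self):
        return list(self.sorted)               # role of inorder(sorted)


def from_scratch(efile):
    """Direct evaluation in the style of code (6): rescans the whole file."""
    return sorted(set(multiref(efile)))


bib = Bibliography()
for chunk in [r"See \cite{knuth} and \cite{aho}. ", r"Also \cite{aho}, \cite{dijkstra}."]:
    bib.insert(chunk)
    # The maintained value agrees with the recomputed one after every update.
    assert bib.bibliography() == from_scratch(bib.efile)
print(bib.bibliography())  # prints ['aho', 'dijkstra', 'knuth']
```

The assertion inside the loop plays the role of the invariant: after each update the cheaply maintained result equals the value that the costly expression would have produced.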
[Box] Stream processing implementation

In general, the code to implement

    establish {E1,...,En}

is produced with the following logic. Given a dependency dag (directed acyclic graph) for E1,...,En and the designated stream edges (edges leading to stream parameters) for E1,...,En, we want to establish these n invariants with a minimal number of loops. Let D be the set of edges in the dependency dag. First we find the smallest sequence of trees t1,...,tk (with edges leading toward the root) satisfying the following four conditions:

* F is a subset of the stream edges in D such that for each Ei, i = 1,...,n, only one edge leading out of Ei in the dependency dag is included in F.
* {t1,...,tk} partitions F.
* There are no edges in D - ti, i = 1,...,k, leading from an internal vertex of ti to an internal vertex of tj where i < j.
* Each edge [E1,E2] occurring in the augmented dependency graph but not in the original dependency graph indicates that E1 is an auxiliary expression for E2. If the two stream edges in F leading away from E1 and E2 lead to the same vertex, then these stream edges must belong to the same tree; otherwise the stream edge for E1 must belong to a tree scheduled before the tree that contains the stream edge for E2.

The last condition above is new and extends the method of Goldberg and Paige [1] to handle auxiliary expressions.

Let t1,...,tk be a minimal sequence of trees satisfying the conditions just above. Each tree ti represents a loop, Li, that establishes all of the invariants associated with the nonroot vertices of ti, i = 1,...,k. The sequence of loops, L1,...,Lk, is the code that establishes all the invariants E1,...,En.

We generate each loop Li, i = 1,...,k, in the following way: Suppose that ti contains the nonroot vertices labeled E1,...,Em. Suppose also that variable S labels the root. Finally, suppose that S is initialized by the instruction start(S) (for example, efile := '' for the bibliography example), searched through by the generator for x = next(S) (an example is (for line = newline(efile))), and augmented with the new element x by the instruction augment(S,x) (for example, insert(efile,line)). Then Li is produced by the following code:

    (1) ∂-{E1,...,Em}<start(S)>
        (for x = next(S))
            ∂-{E1,...,Em}<augment(S,x)>
        end for;

Note that the procedure (1), which translates trees to loops, provides a new unified treatment of finite differencing and stream processing. It is intriguing to consider Steven D. Johnson's observation [2] that the stream processing technique of Friedman and Wise [3,4] can be regarded as a restricted form of our technique (even though their technique is tailored to infinite streams while ours is fashioned around finite streams).

The reason this makes sense is that stream processing is a scheduling problem for searching through streams. Scheduling searches through infinite streams is simplified by the fact that these searches begin but never end. Finite streams are more complicated, because whenever a search through a stream completes, we need to choose another stream to be searched.

References
1. A. Goldberg and R. Paige, "Stream Processing," ACM Symp. Lisp and Functional Programming, Aug. 1984, pp. 53-62.
2. S.D. Johnson, private communication, 1984.
3. D. Friedman and D. Wise, "CONS Should Not Evaluate Its Arguments," in Automata, Languages and Programming, S. Michaelson and R. Milner, eds., Edinburgh University Press, Edinburgh, Scotland, 1976, pp. 257-284.
4. D. Friedman and D. Wise, "Aspects of Applicative Programming for File Systems," Proc. ACM Conf. Language Design for Reliable Software, Mar. 1977, pp. 41-55.

[End of box]

    stream: str, ∂-E<str := ''>
            (for line = newline(str))
                ∂-E<insert(str,line)>
            end for;

The collective establish operator, denoted by establish{E1,...,En}, yields code that establishes the n invariants Ei = fi, i = 1,...,n. A general stream processing method for implementing efficient code to establish these invariants is described in the box above.

Based on the collective differential and establish operators, we can define collective maintain blocks with respect to single-entry, single-exit program regions B. The collective maintain block is denoted

    maintain {E1,...,En}
        B
    end maintain

and means the same as

    establish {E1,...,En}
    ∂{E1,...,En}<B>

Parameterized Invariants: Example 3 - finding the largest orbit

The computable partial difference calculus just described can be used to maintain invariants of the form E = f(x1,x2,...,xn), where the variables x1,x2,...,xn are distinct identifiers. An interesting question arises when more than one of these variables have the same identifier. In the case of classical strength reduction, where only a limited number of arithmetic operations are considered, the approach is to treat expressions with multiple occurrences of the same variable using different rules. Thus, for products, separate difference rules are defined for both i*j and i*i. For the more general finite differencing framework considered here, the standard strength reduction approach could require 2^n different invariants to cover all the ways in which one variable can have multiple occurrences in f(x1,...,xn). In this section, we show that partial difference rules are complete in the sense that only n different partial
difference rules are needed to handle all of these 2^n possible cases.

Suppose we want to maintain the invariant E = h(x,f(x)), which involves two occurrences of x. This is easily handled by maintaining two invariants, E1 = f(x) and E = h(x,E1), and by using the chain rule to combine partial difference rules for f and h in the standard way. So if dx were a modification to x, then the differential ∂{E,E1}<dx> would be

    ∂E<∂-E1<dx>>
    ∂-E<dx>
    dx
    ∂+E<dx>
    ∂E<∂+E1<dx>>

Handling invariants such as E = f(x,x) can be reduced to the preceding case by maintaining a copy of x; that is, by maintaining the invariants E1 = x and E = f(x,E1). This approach wastes space by storing an extra copy of x, but it is efficient in terms of time. (A copy is maintained, but a copy operation is not performed except to establish E1.) Unfortunately, this technique could prove too costly in space usage for the case of E = f(x,...,x,xm+1,...,xn), where m - 1 extra copies of x are maintained.

The accompanying box describes a way to avoid all but one extra copy of x. Although maintaining an extra copy of an argument for expressions with multiple argument occurrences is significantly better than the first naive approach, the need to conserve space is important enough to consider an alternative technique that can sometimes avoid keeping any additional copies at all.

[Box] A single-copy method for parameterized invariants

It is always possible to maintain invariants of the form E = f(x,...,x,xm+1,...,xn) using only one extra copy of x. Let the two copies be denoted x and x'. In deriving this technique, it is useful to consider the invariant E' = f(x1,...,xm,xm+1,...,xn) formed from E by replacing the m occurrences of x by m distinct identifiers x1,...,xm. Suppose that each variable x1,...,xm,x' has the same value as x, xold, at a program point p where x is modified by a change dx. Suppose also that just after p each of these variables is modified in the same way as x (by dx1,...,dxm,dx'), so they all have the same value as the new value of x, xnew. Then E' can be maintained through this sequence of modifications by executing the following code:

    (1) p: dx
           ∂-E'<dx1>
           dx1
           ∂+E'<dx1>
           ∂-E'<dx2>
           dx2
           ∂+E'<dx2>
           ...
           ∂-E'<dxm>
           dxm
           ∂+E'<dxm>
           dx'

Since the values of x,x1,...,xm are equal to xold immediately before and to xnew just after code block (1) is executed, the invariant E' = f(x1,...,xm,xm+1,...,xn) holds both before and after code block (1) also. Observe that after the modification dx1 in code block (1) all occurrences of variable x1 within the predifference and postdifference code blocks represent the new value of x, xnew. More generally, after each modification dxi, i = 1,...,m, all occurrences of xi within predifference and postdifference code blocks represent xnew also. Furthermore, prior to each modification dxi in code block (1), i = 1,...,m, all occurrences of the variable xi within predifference and postdifference code blocks represent the old value of x, xold.

To transform the code (1) into difference code for E, we take the following steps: Within the difference code in code block (1), replace all variable occurrences that represent xold by x', and replace all occurrences that represent xnew by x. These substitutions remove all occurrences of x1,...,xm from the difference code for E' without altering the semantics of code (1). Finally, this code can be transformed into difference code for E by replacing all occurrences of E' by E and eliminating all the modifications dx1,...,dxm.

[End of box]

Improved method, an example. But before describing this approach in its full generality, it is useful to illustrate the ideas with a simple example. Consider the problem that inputs an arbitrary finite function f and a finite set q and outputs the largest subset s of q for which the restricted function f|s is a total function that maps s into itself. This problem, which we call the largest orbit problem, can be formally specified in terms of the following deterministic selection expression:

    (16) the s ⊆ q ∩ domain f | f[s] ⊆ s maximizing s,

which yields the unique largest subset s of q ∩ domain f satisfying f[s] ⊆ s.

To implement the preceding problem specification, we first turn the predicate f[s] ⊆ s into the equivalent forms s ⊆ f^-1[s], then into s - f^-1[s] = {}, and finally into

    (17) {x ∈ s | f(x) ∉ s} = {}

Because the predicate f(x) ∉ s appearing in expression (17) is antimonotone in s (that is, the predicate cannot change from false to true when s is augmented), the left-hand-side expression in (17) is monotone. Consequently, we can use a classical fixed point argument to implement specification (16) with the following code:

    (18) s := q ∩ domain f;
         (converge)                          $repeat until s is unchanged
             s less:= ∋{x ∈ s | f(x) ∉ s};   $pick any element
         end converge;

Code (18) uses a naive negative workset strategy that starts with an approximation that includes the solution and then repeatedly whittles away at this and successive approximations until the solution is reached. Its running time, however, is too slow, essentially O(#q + #f^2). It can be speeded up to run in O(#q + #f) by maintaining the following two invariants:
    E1 = {x ∈ s | f(x) ∉ s}
    E2 = {[f(x),x]: x ∈ s}

where E2 is an access path (or index) into the set s that helps realize efficient difference code for E1.

Observe that invariant E1 has two occurrences of s. Suppose that our library of invariant definitions contains E2 and the following invariant

    E3 = {x ∈ s1 | f(x) ∉ s2}

but not E1. We show how to derive a correct invariant definition of E1 from E3.

Let the differencing rules for E3 and E2 be given as follows:

    ∂-E3<s1 less:= z> = if f(z) ∉ s2 then
                            E3 less:= z;
                        end if;
    ∂-E3<s2 less:= z> = (for x ∈ {u ∈ s1 | f(u) = z})
                            E3 with:= x;
                        end for;
    ∂-E2<s1 less:= z> = E2 less:= [f(z),z];

By the chain rule, the collective partial differencing code for E2 and E3 is given by

    (19) ∂-{E2,E3}<s1 less:= z> = if f(z) ∉ s2 then
                                      E3 less:= z;
                                  end if;
                                  E2 less:= [f(z),z];
         ∂-{E2,E3}<s2 less:= z> = (for x ∈ E2{z})
                                      E3 with:= x;
                                  end for;

To turn difference rules (19) into rules for E1 and E2 we assume that s1 and s2 initially have the same values and are then each modified by deletion of z. If s1 is modified first, then ∂-{E2,E3}<s1 less:= z> references s2, which has the old value of s. Consequently, the code for ∂{E1,E2}<s less:= z> can be correctly derived from rules (19) by replacing all occurrences of s2 and E3 within (19) by s and E1, respectively.

Let the term that results from the parallel substitution of all free occurrences of the variables x1,...,xn by the terms t1,...,tn within the term t be denoted by

    [t] x1,...,xn \ t1,...,tn

Then a correct rule for ∂{E1,E2}<s less:= z> is

    [∂-{E2,E3}<s1 less:= z>] E3,s2\E1,s    $the old value of s is referenced
    s less:= z;
    [∂-{E2,E3}<s2 less:= z>] E3\E1

By changing our assumption so that s2 is modified first, we can derive the following, different rule:

    [∂-{E2,E3}<s2 less:= z>] E3\E1
    s less:= z;
    [∂-{E2,E3}<s1 less:= z>] E3,s2\E1,s    $the new value of s is referenced

Thus, to speed up code (18) we need only specify the following maintain block:

    (20) maintain {E1,E2}
             (for x ∈ domain f | x ∈ q)
                 s with:= x;
             end for;
             (while ∃z ∈ {x ∈ s | f(x) ∉ s})
                 s less:= z;
             end while;
         end maintain;

Note that the SETL optimizer can improve code (20) still further by implementing set- and map-valued variables with conventional data structures [12]. However, by making use of invariants to specify and generate data structures (as we did in the heap example discussed earlier), we can obtain an even better orbit program that runs in O(#f + #q) time. More generally, our notion of invariant can be used conveniently to reformulate the SETL data structure selection and aggregation transformations.

[Box] Maintaining parameterized invariants using no copies

It is often possible to maintain invariants of the form E = f(x,...,x,xm+1,...,xn) using no extra copies of x. To determine the difference code for E relative to a modification dx to x, it is useful to consider the invariant E' = f(x1,...,xm,xm+1,...,xn) formed from f by rewriting the m occurrences of x by m distinct identifiers.

Suppose that the postdifference code for E' relative to the change dxi, i = 1,...,m, is empty. Assume that the initial value of each of the variables x1,...,xm is the old value of x, xold, and let each of these variables in turn be modified in the same way (so that their final values are all equal to the new value of x, xnew).

Then all occurrences of xi,...,xm within ∂E'<dxi> represent the old value of x, xold, and all occurrences of x1,...,xi-1 within ∂E'<dxi> represent xnew, i = 1,...,m. If there exists some m-permutation π and some number L = 1,...,m such that ∂E'<dxπ(i)> references only xold, i = 1,...,L, and ∂E'<dxπ(i)> references only xnew, i = L+1,...,m, then the following code realizes ∂E<dx>:

    [∂-E'<dxπ(1)>] x1,...,xm,E' \ x,...,x,E      $references xold
    ...
    dx
    [∂-E'<dxπ(L+1)>] x1,...,xm,E' \ x,...,x,E    $references xnew
    ...
    [∂-E'<dxπ(m)>] x1,...,xm,E' \ x,...,x,E      $references xnew

A more specialized variant of the preceding rule was first presented in Paige [1] and proved correct. The problem of deciding whether such a permutation exists was proved to be NP-complete by Peter Gacs [2].

References
1. R. Paige, Formal Differentiation, UMI Research Press, Ann Arbor, Mich., 1981 (revision of PhD dissertation, New York University, June 1979).
2. P. Gacs, private communication, 1980.

[End of box]
The preceding example leads to a general method described in the accompanying box that generates difference code for parameterized expressions in which no extra copies of arguments are required.

This paper has developed a programming paradigm based on invariants and an implementation based on an improved finite difference calculus. Although we introduced invariants as an extension of the SETL language, the same method can be used to add invariants to any imperative programming language.

Programming with invariants involves an interesting separation of functional and imperative programming paradigms in a way that makes programming and the translation of programs into efficient code easier. This article illustrated this style of programming with three simple examples, but applications to other more significant examples should be apparent.

For future work, it would be interesting to consider whether finite differencing can be generalized to handle dynamic programming. It would also be worthwhile investigating how to implement and exploit more general invariants than the ones discussed here. Finally, the use of invariants and their implementation by finite differencing presented here is only one of several basic program transformations.

Our goal is to formalize other principles of programming methodology and to capture them as part of a small but widely applicable collection of powerful transformations. These transformations, along with directions to combine and apply them, could form a useful supplement to a programming language. For highly restricted languages they might even be used to form a fully automatic compiler.

Acknowledgments
Part of this work was done while I was visiting Yale University on sabbatical from Rutgers University and while I was a summer faculty member at IBM Yorktown Heights.
I am grateful to David Gries, Brent Hailpern, Lee Hoevel, and Ken Perry for their helpful comments on earlier drafts of this article. This work is partly based upon work supported by the National Science Foundation under grant MCS-8212936 and by the Office of Naval Research under grant N00014-84-K-0444.

References
1. A. Goldberg and R. Paige, "Stream Processing," ACM Symp. Lisp and Functional Programming, Aug. 1984, pp. 53-62.
2. R. Paige and S. Koenig, "Finite Differencing of Computable Expressions," ACM TOPLAS, Vol. 4, No. 3, July 1982, pp. 402-454.
3. R. Paige, Formal Differentiation, UMI Research Press, Ann Arbor, Mich., 1981 (revision of PhD dissertation, New York University, June 1979).
4. J.T. Schwartz, On Programming: An Interim Report on the SETL Project, Installments I and II, CIMS, New York University, New York, 1974.
5. R. Paige, "Transformational Programming: Applications to Algorithms and Systems," Proc. 10th ACM Symp. Princ. Programming Languages, Jan. 1983, pp. 73-87.
6. R. Paige, "Supercompilers: Extended Abstract," in Program Transformation and Programming Environments, P. Pepper, ed., Springer-Verlag, New York, 1984, pp. 331-340.
7. A. Aho, J. Hopcroft, and J. Ullman, Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Mass., 1974.
8. J. Cocke and K. Kennedy, "An Algorithm for Reduction of Operator Strength," Comm. ACM, Vol. 20, No. 11, Nov. 1977, pp. 850-856.
9. J. Earley, "High-Level Iterators and a Method for Automatically Designing Data Structure Representation," Journal of Computer Languages, Vol. 1, No. 4, pp. 321-342.
10. A. Fong and J. Ullman, "Induction Variables in Very High Level Languages," Proc. Third ACM Symp. Princ. Programming Languages, Jan. 1976, pp. 104-112.
11. D. Chamberlin et al., "Janus: An Interactive System for Document Composition," Proc. ACM SIGPLAN/SIGOA Symp. Text Manipulation, June 1981.
12. R. Dewar, A. Grand, S.C. Liu, J.T. Schwartz, and E. Schonberg, "Programming by Refinement, as Exemplified by the SETL Representation Sublanguage," ACM TOPLAS, Vol. 1, No. 1, July 1979, pp. 27-49.

Robert Paige is an associate professor in Rutgers University's Department of Computer Science. His current research interests include program development methodology. He received a BA from Occidental College and an MS and PhD from New York University's Courant Institute.
His address is Rutgers University, Department of Computer Science, New Brunswick, NJ 08903.