PL Unit-1 Material
PRELIMINARY CONCEPTS
1. Reasons for Studying Concepts of Programming Languages
It is believed that the depth at which we think is influenced by the expressive power of the language in
which we communicate our thoughts. It is difficult for people to conceptualize structures they cannot
describe, verbally or in writing. The language in which programmers develop software places limits on the kinds of
control structures, data structures, and abstractions they can use. Awareness of a wider variety of programming language
features can reduce such limitations in software development. Can language constructs be simulated in other
languages that do not support those constructs directly?
Many programmers, when given a choice of languages for a new project, continue to use the language with
which they are most familiar, even if it is poorly suited to new projects. If these programmers were familiar
with other languages available, they would be in a better position to make informed language choices.
Programming languages are still in a state of continuous evolution, which means continuous learning is
essential. Programmers who understand the concepts of object-oriented programming will have an easier time learning
Java. Once a thorough understanding of the fundamental concepts of languages is acquired, it becomes
easier to see how concepts are incorporated into the design of the language being learned.
Understanding of implementation issues leads to an understanding of why languages are designed the
way they are. This in turn leads to the ability to use a language more intelligently, as it was designed to
be used.
The more languages you know, the better your understanding of programming language
concepts becomes.
In some cases, a language became widely used, at least in part, because those in positions to choose languages
were not sufficiently familiar with programming language concepts. Many believe that ALGOL 60 was a better language
than Fortran; however, Fortran was much more widely used. This is attributed to the fact that programmers
and managers did not understand the conceptual design of ALGOL 60.
2. Programming Domains
Scientific applications
Systems programming
– The operating system and all of the programming support tools of a computer system are collectively known as its system software.
– System software needs to be efficient because it is in almost continuous use.
Scripting languages
– PHP is a scripting language used on Web server systems. Its code is embedded in HTML documents.
The code is interpreted on the server before the document is sent to a requesting browser.
3.1 Readability
– Too many features make the language difficult to learn. Programmers tend to learn a subset of the
language and ignore its other features. “ALGOL 60”
– Feature multiplicity, that is, having more than one way to accomplish
a particular operation, is also a complicating characteristic.
– Ex “Java”:
count = count + 1
count += 1
count++
++count
– Although the last two statements have slightly different meanings from each other and from the others, all
four have the same meaning when used as stand-alone expressions.
– Operator overloading where a single operator symbol has more than one meaning.
– Although this is a useful feature, it can lead to reduced readability if users can create their own
overloading and do not do it sensibly.
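The point about user-defined overloading can be illustrated with a minimal sketch. The `Vec2` class below is a hypothetical type invented for this example; it overloads `+` sensibly and then, deliberately, overloads `-` in a misleading way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2-D vector type used only for this illustration."""
    x: int
    y: int

    def __add__(self, other):
        # Sensible overloading: '+' keeps its usual meaning of addition.
        return Vec2(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        # Poor overloading: '-' is defined to add as well, so a reader
        # of 'a - b' is misled about what the expression computes.
        return Vec2(self.x + other.x, self.y + other.y)


a = Vec2(1, 2)
b = Vec2(3, 4)
sensible = a + b      # component-wise sum, as a reader would expect
misleading = a - b    # also a sum, which hurts readability
```

Both expressions are legal, but only the first matches what the operator symbol suggests to a reader.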
Orthogonality
– Meaning is context independent. Pointers should be able to point to any type of variable or data
structure. The lack of orthogonality leads to exceptions to the rules of the language.
– A relatively small set of primitive constructs can be combined in a relatively small number of ways to
build the control and data structures of the language.
– The more orthogonal the design of a language, the fewer exceptions the language rules
require.
– The most orthogonal programming language is ALGOL 68. Every language construct has a type, and
there are no restrictions on those types.
Control Statements
– It became widely recognized that indiscriminate use of goto statements severely reduced program
readability.
loop1:
if (incr >= 20) go to out;
loop2:
if (sum > 100) go to next;
sum += incr;
go to loop2;
next:
incr++;
go to loop1;
out:
– Basic and Fortran in the early 70s lacked the control statements that allow strong restrictions on the
use of goto’s, so writing highly readable programs in those languages was difficult.
– The control statement design of a language is now a less important factor in readability than it was in
the past.
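For comparison, the goto-based fragment above can be written with nested loops. The following is a sketch in Python rather than in the notation of the original fragment; the function name `run_nested` is hypothetical, and `total` stands in for `sum`.

```python
def run_nested(incr, total):
    """Structured version of the goto example, using nested while loops."""
    while incr < 20:            # replaces: if (incr >= 20) go to out
        while total <= 100:     # replaces: if (sum > 100) go to next
            total += incr
        incr += 1               # the statement at label 'next'
    return incr, total


result = run_nested(1, 0)
```

The nesting makes the control flow visible from the layout alone, which is what the goto version lacks.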
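– The presence of adequate facilities for defining data types and data structures in a language is another
significant aid to readability. For example, in a language without a Boolean type, a flag must be set with a numeric code such as
timeout = 1
whereas a language with a Boolean type allows the clearer
timeout = true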
Syntax Considerations
Identifier forms: Restricting identifiers to very short lengths detracts from readability. In ANSI BASIC
(1978), an identifier could consist only of a single letter or a single letter followed by a single digit.
Special Words: Program appearance, and thus program readability, is strongly
influenced by the forms of a language’s special words (e.g., while, class, for). C uses braces to delimit the bodies of
all control structures, so it can be difficult to determine which group is being ended. Fortran 95 allows
programmers to use special words as legal variable names, which can be confusing.
Form and Meaning: Designing statements so that their appearance at least partially indicates their
purpose is an obvious aid to readability. In C, for example, the meaning of the reserved word static depends on the context in which it appears.
If used on the definition of a variable inside a function, it means the variable is created at compile time.
If used on the definition of a variable that is outside all functions, it means the variable is visible only in
the file in which its definition appears.
3.2 Writability
It is a measure of how easily a language can be used to create programs for a chosen problem domain.
Most of the language characteristics that affect readability also affect writability.
– A smaller number of primitive constructs and a consistent set of rules for combining them is much
better than simply having many primitives.
– Abstraction means the ability to define and then use complicated structures or operations in ways that
allow many of the details to be ignored.
– An example of process abstraction is the use of a subprogram to implement a sort algorithm that is required several
times in a program, instead of replicating it in all places where it is needed.
Expressivity
– It means that a language has relatively convenient, rather than cumbersome, ways of specifying
computations.
3.3 Reliability
A program is said to be reliable if it performs to its specifications under all conditions.
Type checking: testing for type errors in a given program, either by the compiler or during
program execution.
– The earlier errors are detected, the less expensive it is to make the required repairs. Java requires type
checking of nearly all variables and expressions at compile time.
Exception handling: the ability to intercept run-time errors, take corrective measures, and then continue
is a great aid to reliability.
Aliasing: it is having two or more distinct referencing methods, or names, for the same
memory cell.
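Aliasing can be shown with a short sketch. Python is used here only for illustration; the names `original`, `alias`, `snapshot`, and `clear_scores` are hypothetical.

```python
def clear_scores(scores):
    # 'scores' is another name for the caller's list, not a copy.
    scores.clear()


original = [90, 75, 60]
alias = original            # two names, one list object
alias.append(100)           # the change is visible through 'original' too

snapshot = list(original)   # an independent copy, not an alias
clear_scores(alias)         # empties the shared list through the parameter
```

After the call, `original` is empty even though it was never mentioned in `clear_scores`, which is why aliasing makes programs harder to reason about.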
3.4 Cost
– Categories
– Compiling programs
– Executing programs
– Maintaining programs: Maintenance costs can be as high as two to four times as much as
development costs.
We use imperative languages, at least in part, because we use von Neumann machines
Programming methodologies
1950s and early 1960s: Simple applications; worry about machine efficiency
Late 1960s: People efficiency became important; readability, better control structures
– Structured programming
– data abstraction
5. Language Categories
Imperative
– C, Pascal
Functional
– LISP, Scheme
Logic
– Rule-based
Object-oriented
– C++, Java
Reliability vs. cost of execution
– Example: Java demands that all references to array elements be checked for proper indexing, which
leads to increased execution costs
Readability vs. writability
– Example: APL provides many powerful operators (and a large number of new symbols),
allowing complex computations to be written in a compact program but at the cost of
poor readability
Writability (flexibility) vs. reliability
– Example: C++ pointers are powerful and very flexible but are unreliable
Compilation
Pure Interpretation
The operating system and language implementation are layered over the machine interface of a
computer.
• Translates a high-level program (source language) into machine code (machine language)
• Slow translation, fast execution
• The compilation process has several phases:
– lexical analysis: converts characters in the source program into lexical units
– syntax analysis: transforms lexical units into parse trees, which represent the syntactic structure
of the program
– semantic analysis: generates intermediate code
– code generation: machine code is generated
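The first phase, lexical analysis, can be sketched in a few lines. This is a minimal illustration and not a production lexer; the token categories (`NUMBER`, `IDENT`, `OP`) and the function name `tokenize` are assumptions made for the example.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=;()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Convert a character string into a list of (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":      # whitespace is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


units = tokenize("sum = sum + 42;")
```

The resulting list of lexical units is what a syntax analyzer would consume in the next phase.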
The Process of Compilation and program execution takes place in several phases. The
following figure specifies different phases of a general Compiler
Preprocessor
Programming Environments
Genealogy of common high-level programming languages
The study of programming languages is divided into the examination of the syntax and semantics of their
constructs.
Syntax – It is the form / structure of the expressions, statements, and program units.
Semantics – It is the meaning of the expressions, statements, and program units.
Terminology
A lexeme is the lowest level syntactic unit of a language (e.g., *, sum, begin).
1. Recognizers - used in compilers (a recognizer reads a string and determines whether it is in the given
language or not)
Language recognizers:
Suppose we have a language L that uses an alphabet ∑ of characters. To define L formally using the
recognition method, we would need to construct a mechanism R, called a recognition device, capable of
reading strings of characters from the alphabet ∑. R would indicate whether a given input string was or
was not in L. In effect, R would either accept or reject the given string. Such devices are like filters,
separating legal sentences from those that are incorrectly formed. If R, when fed any string of characters
over ∑, accepts it only if it is in L, then R is a description of L. Because most useful languages are, for
all practical purposes, infinite, this might seem like a lengthy and ineffective process. Recognition
devices, however, are not used to enumerate all the sentences of a language—they have a different
purpose. The syntax analysis part of a compiler is a recognizer for the language the compiler translates.
In this role, the recognizer need not test all possible strings of characters from some set to determine
whether each is in the language.
Language Generators
A language generator is a device that can be used to generate the sentences of a language. The syntax-
checking portion of a compiler (a language recognizer) is not as useful a language description for a
programmer because it can be used only in trial-and-error mode. For example, to determine the correct
syntax of a statement using a compiler, the programmer can only submit a speculated version and note
whether the compiler accepts it. On the other hand, it is often possible to determine whether the syntax
of a statement is correct by comparing it with the structure of the generator.
Context-Free Grammars
Example
<stmt> -> <single_stmt>
| begin <stmt_list> end
Syntactic lists are described in BNF using recursion.
An example grammar:
<program> -> <stmts>
<stmts> -> <stmt> | <stmt>; <stmts>
<stmt> -> <var> = <expr>
<var> -> a | b | c | d
<expr> -> <term> + <term> | <term> - <term>
<term> -> <var> | const
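An example derivation:
<program> => <stmts> => <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const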
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the
one that is expanded.
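A recognizer for the example grammar can be sketched as a recursive-descent routine with one function per nonterminal. This is an illustrative sketch only; it assumes the input tokens are separated by spaces, and the function names are hypothetical.

```python
VARS = {"a", "b", "c", "d"}


def recognize(sentence):
    """Return True if the space-separated tokens form a <program>."""
    tokens = sentence.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() == symbol:
            pos += 1
            return True
        return False

    def var():                       # <var> -> a | b | c | d
        nonlocal pos
        if peek() in VARS:
            pos += 1
            return True
        return False

    def term():                      # <term> -> <var> | const
        return var() or expect("const")

    def expr():                      # <expr> -> <term> + <term> | <term> - <term>
        if not term():
            return False
        if not (expect("+") or expect("-")):
            return False
        return term()

    def stmt():                      # <stmt> -> <var> = <expr>
        return var() and expect("=") and expr()

    def stmts():                     # <stmts> -> <stmt> | <stmt> ; <stmts>
        if not stmt():
            return False
        if expect(";"):
            return stmts()
        return True

    # <program> -> <stmts>, and all input must be consumed.
    return stmts() and pos == len(tokens)
```

For instance, `recognize("a = b + const")` accepts the sentence derived above, while `recognize("a = b")` rejects it because `<expr>` requires an operator.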
Dept. of CSE. MECS 21
Syntax Graphs - put the terminals in circles or ellipses and put the non-terminals in
rectangles; connect with lines with arrowheads
Example: expressions of the form id + id
- id's can be either int_type or real_type
- types of the two id's must be the same
- type of the expression must match its expected type
BNF:
A parse tree is a hierarchical representation of a derivation. Every internal node of a parse tree is labeled with a
non-terminal symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse tree describes
one instance of an abstraction in the sentence.
Ambiguous grammar
A grammar is ambiguous if it generates a sentential form that has two or more distinct parse trees.
An ambiguous expression grammar:
Extended BNF (EBNF)
1. Optional parts are placed in brackets ([ ]):
<proc_call> -> ident [(<expr_list>)]
2. Alternative parts of RHSs are placed in parentheses and separated with vertical bars:
<term> -> <term> (+ | -) const
3. Repetitions (0 or more) are placed in braces ({ }):
<ident> -> letter {letter | digit}
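BNF
<expr> -> <expr> + <term>
| <expr> - <term>
| <term>
<term> -> <term> * <factor>
| <term> / <factor>
| <factor>
EBNF
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}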
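EBNF repetition braces map naturally onto loops in a hand-written parser. The sketch below assumes the usual EBNF form of the expression grammar, `<expr> -> <term> {(+ | -) <term>}` and `<term> -> <factor> {(* | /) <factor>}`, and further assumes that a `<factor>` is just a numeric literal; the function name `evaluate` is hypothetical.

```python
def evaluate(tokens):
    """Parse and evaluate a token list using one loop per EBNF repetition."""
    pos = 0

    def factor():
        nonlocal pos
        value = float(tokens[pos])   # assumption: a factor is a numeric literal
        pos += 1
        return value

    def term():                      # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():                      # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because each loop folds operands from left to right, the parser gives `-` and `/` the left associativity that the left-recursive BNF rules express.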
Static semantics has little to do with the meaning of programs during execution; it concerns the legal forms of programs (syntax rather than semantics).
It has been introduced to specify language rules that cannot be specified, or are cumbersome to specify, with BNF rules.
Example: A floating-point value cannot be assigned to an integer type variable (context-free, but cumbersome to describe in BNF).
CFGs cannot describe all of the syntax of programming languages, so attribute grammars are an extension of CFGs.
- Additions to CFGs to carry some semantic information along through the parse tree
Primary value of AGs:
Attribute grammars are context-free grammars to which we have added attributes, attribute computation
functions, and predicate functions.
Attributes: are associated with grammar symbols (terminals and non-terminals) and are like variables.
Attribute computation function: semantic functions associated with grammar rules. They are used to
specify how attribute values are computed.
Predicate function: static semantic rules of the language that are associated with grammar rules.
For each grammar symbol x there is a set A(x) of attribute values. Each set A(x) consists of two disjoint sets S(x) and I(x), called the synthesized and inherited attributes, respectively.
Each rule has a set of functions that define certain attributes of the non-terminals in the rule called
semantic functions.
Each rule has a (possibly empty) set of predicates to check for attribute consistency called predicate
function.
Let X0 -> X1 ... Xn be a rule.
Functions of the form S(X0) = f(A(X1), ..., A(Xn)) define synthesized attributes.
Functions of the form I(Xj) = f(A(X0), ..., A(Xn)), for 1 <= j <= n, define inherited attributes.
Initially, there are intrinsic attributes on the leaves.
Attributes:
Semantic rules:
<var>[1].env <- <expr>.env
<var>[2].env <- <expr>.env
<expr>.actual_type <- <var>[1].actual_type
Predicates:
<var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type
If all attributes were inherited, the tree could be decorated in top-down order.
If all attributes were synthesized, the tree could be decorated in bottom-up order.
In many cases, both kinds of attributes are used, and some combination of top-down and bottom-up
order must be used.
<var>[1].env <- <expr>.env
<var>[2].env <- <expr>.env
<var>[1].actual_type == <var>[2].actual_type
<expr>.actual_type <- <var>[1].actual_type
<expr>.actual_type == <expr>.expected_type
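The decoration order above can be traced with a small sketch for the rule `<expr> -> <var>[1] + <var>[2]`. It assumes that `env` is a dictionary mapping variable names to type names and that each `<var>.actual_type` is obtained by a lookup in its `env`; the function name `check_expr` and the sample environment are hypothetical.

```python
def check_expr(var1, var2, env, expected_type):
    """Decorate <expr> -> <var>[1] + <var>[2]; return <expr>.actual_type."""
    # Inherited: <var>[1].env <- <expr>.env and <var>[2].env <- <expr>.env
    var1_env = env
    var2_env = env
    # Assumption: each <var>.actual_type comes from a lookup in its env.
    var1_type = var1_env[var1]
    var2_type = var2_env[var2]
    # Predicate: <var>[1].actual_type == <var>[2].actual_type
    if var1_type != var2_type:
        raise TypeError(f"operand types differ: {var1_type} vs {var2_type}")
    # Synthesized: <expr>.actual_type <- <var>[1].actual_type
    expr_actual_type = var1_type
    # Predicate: <expr>.actual_type == <expr>.expected_type
    if expr_actual_type != expected_type:
        raise TypeError(f"expected {expected_type}, got {expr_actual_type}")
    return expr_actual_type


sample_env = {"A": "int_type", "B": "int_type", "C": "real_type"}
ok_type = check_expr("A", "B", sample_env, "int_type")
```

The inherited attributes flow down from `<expr>` first, the synthesized type flows back up, and the predicates are checked as soon as their operands are available.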
Dynamic semantics describes the meaning of the expressions, statements, and program units
of a programming language.
Several reasons motivate the need for a methodology and notation for describing semantics:
Software developers and compiler designers typically determine the semantics of a language by reading the language
manual written in English, but such manuals are often imprecise and incomplete. A complete
specification of the syntax and semantics of a programming language could be used by a tool to
generate a compiler for the language automatically.
There are three approaches to describing the semantics of imperative programming languages:
Operational Semantics
Denotational Semantics
Axiomatic Semantics
Operational Semantics
Machine language and real computers are not used for operational semantics because they are too
large and complex.
The detailed characteristics of the particular computer would make actions difficult to understand.
The process:
Build a translator (translates source code to the intermediate code of an idealized computer)
C Statement                      Meaning
for (expr1; expr2; expr3) {      expr1;
    ...                          loop: if expr2 == 0 goto out
}                                    ...
                                     expr3;
                                     goto loop
                                 out: ...
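The translation of the for statement into lower-level goto code can be made executable on a tiny idealized machine. This is a sketch only: the instruction names (`exec`, `label`, `if_zero_goto`, `goto`) and the functions `lower_for` and `run` are hypothetical, and the program state is assumed to be a dictionary of variable values.

```python
def lower_for(expr1, expr2, expr3, body):
    """Translate for (expr1; expr2; expr3) body into goto-style instructions."""
    return [
        ("exec", expr1),
        ("label", "loop"),
        ("if_zero_goto", expr2, "out"),   # loop: if expr2 == 0 goto out
        ("exec", body),
        ("exec", expr3),
        ("goto", "loop"),
        ("label", "out"),
    ]


def run(program, state):
    """Execute the instruction list on a minimal idealized machine."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        if ins[0] == "exec":
            ins[1](state)
        elif ins[0] == "if_zero_goto" and not ins[1](state):
            pc = labels[ins[2]]
            continue
        elif ins[0] == "goto":
            pc = labels[ins[1]]
            continue
        pc += 1
    return state


# for (i = 0; i < 5; i = i + 1) total = total + i;
program = lower_for(
    lambda s: s.update(i=0),
    lambda s: s["i"] < 5,
    lambda s: s.update(i=s["i"] + 1),
    lambda s: s.update(total=s["total"] + s["i"]),
)
final_state = run(program, {"total": 0})
```

The meaning of the for statement is then whatever state changes the simple machine performs while running the translated code.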
The intermediate language is meant to be convenient for the virtual machine.
• Natural operational semantics: Highest Level, interest is in the result of complete program.
• Structural operational semantics: Lowest Level, precise meaning of program is specified with a
complete sequence of state changes when the program is executed.
Denotational Semantics
Denotational semantics is the most rigorous and most widely used formal notation for describing the meaning of
program constructs.
- It is based on recursive function theory and is the most abstract semantics description method.
- It was originally developed by Scott and Strachey.
- The difference between denotational and operational semantics: in operational semantics, the state
changes are defined by coded algorithms; in denotational semantics, they are defined by rigorous
mathematical functions.
Syntactic domain: set of {0,1}
Semantic domain: set of all nonnegative decimal numbers
Function:
Mbin('0') = 0
Mbin('1') = 1
Mbin(<bin_num> '0') = 2 * Mbin(<bin_num>) + 0
Mbin(<bin_num> '1') = 2 * Mbin(<bin_num>) + 1
<dec_num> -> '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
| <dec_num> ('0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9')
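The mapping function Mbin translates directly into a recursive function. The sketch below also includes `m_dec`, an analogous function for `<dec_num>` that is an assumption built by analogy with Mbin (base 10 instead of base 2) and is not given in the notes; numerals are assumed to be passed as strings.

```python
DIGITS = "0123456789"


def m_bin(numeral):
    """Mbin: map a binary numeral (a string of 0s and 1s) to its value."""
    if numeral == "0":
        return 0
    if numeral == "1":
        return 1
    if not numeral or numeral[-1] not in "01":
        raise ValueError(f"not a binary numeral: {numeral!r}")
    # Mbin(<bin_num> d) = 2 * Mbin(<bin_num>) + d
    return 2 * m_bin(numeral[:-1]) + (0 if numeral[-1] == "0" else 1)


def m_dec(numeral):
    """Assumed analogue of Mbin for <dec_num>."""
    if len(numeral) == 1 and numeral in DIGITS:
        return DIGITS.index(numeral)
    if not numeral or numeral[-1] not in DIGITS:
        raise ValueError(f"not a decimal numeral: {numeral!r}")
    # Mdec(<dec_num> d) = 10 * Mdec(<dec_num>) + d
    return 10 * m_dec(numeral[:-1]) + DIGITS.index(numeral[-1])
```

Each recursive call peels off the rightmost digit, mirroring the left-recursive grammar rules.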
The denotational semantics of a program could be defined in terms of state changes in an ideal computer. Here it
is defined in terms of only the values of all the program’s variables.
- The state of a program is the values of all its current program variables:
s = {<i1, v1>, <i2, v2>, ..., <in, vn>}
Each i is the name of a variable, and the associated v’s are the current values of these variables.
Let VARMAP be a function that, when given a variable name and a state, returns the current value of the
variable:
VARMAP(ij, s) = vj
These state changes are used to define the meanings of programs and program constructs.
Expressions
For a binary expression <binary_expr> in state s, Me yields:
if (Me(<binary_expr>.<left_expr>, s) == undef OR
    Me(<binary_expr>.<right_expr>, s) == undef)
then error
else
  if (<binary_expr>.<operator> == '+') then
    Me(<binary_expr>.<left_expr>, s) + Me(<binary_expr>.<right_expr>, s)
  else Me(<binary_expr>.<left_expr>, s) * Me(<binary_expr>.<right_expr>, s)
Assignment Statements
An assignment statement evaluates an expression and assigns the value to the target variable.
The meaning function maps a state to a state.
Ma(x := E, s) =
  if Me(E, s) == error
  then error
  else s’ = {<i1,v1’>,<i2,v2’>,...,<in,vn’>},
    where for j = 1, 2, ..., n,
      if ij == x   # comparison of names
      then vj’ = Me(E, s)
      else vj’ = VARMAP(ij, s)
Logical Pretest Loops
• Msl and Mb map statement lists and states to states, and Boolean expressions to Boolean values, respectively.
Ml(while B do L, s) =
if Mb(B, s) == undef
then error
else if Mb(B, s) == false
then s
else if Msl(L, s) == error
then error
else Ml(while B do L, Msl(L, s))
• The meaning of the loop is the value of the program variables after the statements in the loop
have been executed the prescribed number of times, assuming there have been no errors
• In essence, the loop has been converted from iteration to recursion, where the recursive control
is mathematically defined by other recursive state mapping functions
- Recursion, when compared to iteration, is easier to describe with mathematical rigor
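The mapping functions for expressions, assignments, and pretest loops can be sketched as ordinary recursive functions. The encoding is an assumption made for this example: a state is a dictionary, `None` stands for undef, expressions are integers, variable names, or `(op, left, right)` tuples, and a statement list is a list of `(target, expr)` assignments.

```python
ERROR = object()   # sentinel for the 'error' result
UNDEF = None       # value of a variable that has not been assigned


def m_e(expr, s):
    """Me: map an expression and a state to a value or ERROR."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        value = s.get(expr, UNDEF)          # VARMAP(expr, s)
        return ERROR if value is UNDEF else value
    op, left, right = expr
    lv, rv = m_e(left, s), m_e(right, s)
    if lv is ERROR or rv is ERROR:
        return ERROR
    return lv + rv if op == "+" else lv * rv


def m_b(cond, s):
    """Mb: map a Boolean expression ('<' or '==') to a truth value or ERROR."""
    op, left, right = cond
    lv, rv = m_e(left, s), m_e(right, s)
    if lv is ERROR or rv is ERROR:
        return ERROR
    return lv < rv if op == "<" else lv == rv


def m_a(target, expr, s):
    """Ma: build a new state in which only 'target' has a new value."""
    value = m_e(expr, s)
    if value is ERROR:
        return ERROR
    return {name: (value if name == target else s[name]) for name in s}


def m_sl(stmts, s):
    """Msl: apply a list of assignments in order, propagating ERROR."""
    for target, expr in stmts:
        s = m_a(target, expr, s)
        if s is ERROR:
            return ERROR
    return s


def m_l(cond, body, s):
    """Ml: the loop's meaning, defined by recursion rather than iteration."""
    test = m_b(cond, s)
    if test is ERROR:
        return ERROR
    if not test:
        return s
    after = m_sl(body, s)
    if after is ERROR:
        return ERROR
    return m_l(cond, body, after)


start = {"i": 0, "total": 0, "n": UNDEF}
loop_body = [("total", ("+", "total", "i")), ("i", ("+", "i", 1))]
end = m_l(("<", "i", 4), loop_body, start)
```

Note that no state is ever modified in place: each function returns a new state, which matches the view of statements as mappings from states to states.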
Axiomatic Semantics
Approach: Define axioms or inference rules for each statement type in the language (to allow
transformations of expressions to other expressions)
- An assertion before a statement (a precondition) states the relationships and constraints among variables
that are true at that point in execution.
- An assertion following a statement is a postcondition; it describes the new constraints on those variables after
execution.
Example:
But the least restrictive precondition that will guarantee the validity of the associated postcondition is x > 0.
So, x > 0 is the weakest precondition.
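The idea of a weakest precondition can be checked by brute force on a small range of inputs. The statement and postcondition used below, `sum = 2 * x + 1` with postcondition `sum > 1`, are an assumed example chosen to be consistent with the result x > 0 stated above; the alternative precondition x > 10 is likewise only illustrative.

```python
def statement(x):
    # Assumed example statement: sum = 2 * x + 1
    return 2 * x + 1


def postcondition(total):
    # Assumed example postcondition: sum > 1
    return total > 1


candidates = range(-50, 51)

# Inputs for which executing the statement establishes the postcondition.
satisfying = {x for x in candidates if postcondition(statement(x))}

weakest = {x for x in candidates if x > 0}      # candidate weakest precondition
stronger = {x for x in candidates if x > 10}    # valid but more restrictive
```

On this range, `x > 0` admits exactly the inputs that lead to the postcondition, while `x > 10` admits only some of them, which is what makes it a valid but not weakest precondition.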
Program proof process: The postcondition for the whole program is the desired results. Work
back through the program to the first statement. If the precondition on the first statement is the
same as the program spec, the program is correct.
An axiom is a logical statement that is assumed to be true. Therefore, it is an inference rule without an
antecedent.
An inference rule is a method of inferring the truth of one assertion on the basis of the values of other assertions.
In the form (S1, S2, ..., Sn) / S, if S1, S2, ..., Sn are true, then the truth of S can be inferred.
To use axiomatic semantics for correctness proofs or for formal semantic specification, either an axiom
or an inference rule must exist for each kind of statement in the language.
2. {I} B {I} (evaluation of the Boolean must not change the validity of I)
- The loop invariant I is a weakened version of the loop postcondition, and it is also a precondition.
- I must be weak enough to be satisfied prior to the beginning of the loop, but when combined with the
loop exit condition, it must be strong enough to force the truth of the postcondition.
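The role of a loop invariant can be illustrated with run-time assertions. The loop below (count `y` up until it equals `x`) and its invariant `y <= x` are an assumed example, and the function name `count_up` is hypothetical; assertions only test particular runs and do not replace a proof.

```python
def count_up(x, y):
    """Increment y until it equals x, checking the invariant I: y <= x."""
    assert y <= x          # I must hold before the loop begins
    while y != x:          # loop condition B
        y += 1
        assert y <= x      # the loop body must preserve I
    # On exit, I and (not B) hold: y <= x and y == x, which forces y == x.
    assert y == x          # the loop postcondition
    return y
```

The invariant is weak enough to hold initially for any `y <= x`, yet together with the exit condition `y == x` it is strong enough to give the postcondition.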