Records & Arrays
Record (Structure) Types
• Record types allow related data of
heterogeneous types to be stored and
manipulated together.
• Algol 68, C, and Common Lisp use the term
structure instead of record.
• Fortran 90 simply calls its records "types".
• Structures in C++ are defined as a special form of
class.
• Java uses classes in all cases.
• C# uses a reference model for variables of class
types, and a value model for variables of struct types.
Records and Variants contd..
PASCAL:
type element = record
    name : array[1..2] of char;
    atomic_number : integer;
    atomic_weight : real;
    metallic : Boolean
end;
C:
struct element {
    char name[2];
    int atomic_number;
    double atomic_weight;
    _Bool metallic;
};
ML:
type element = {
    name : string,
    atomic_number : int,
    atomic_weight : real,
    metallic : bool
};
Records contd..
Each record component is known as a field.
To refer to a given field of a record, most languages
use "dot" notation.
Ex (C): struct element copper;
copper.name[0]='C'; copper.name[1]='u';
ML differs from most languages in specifying that
the order of record fields is insignificant.
ML record value example:
{name="Cu", atomic_number=29, atomic_weight=63.5
Records (Structures)
• Memory layout and its impact
(structures)
Figure 7.2 Likely memory layout for packed element records (the fields are
placed together, without holes). The atomic_number and atomic_weight fields are
nonaligned, and can only be read or written (on most machines) via
multi-instruction sequences.
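The effect of alignment on the element record can be sketched with Python's standard struct module. This is only an illustration under stated assumptions: the format codes (2s, i, d, ?) are chosen here to mirror the C declaration, '@' asks for the machine's native alignment, and '=' asks for a packed layout with no holes. The helper name layout is hypothetical.

```python
import struct

# Format codes chosen (as an assumption) to mirror the C struct element:
# char name[2]; int atomic_number; double atomic_weight; _Bool metallic;
FIELDS = [("name", "2s"), ("atomic_number", "i"),
          ("atomic_weight", "d"), ("metallic", "?")]


def layout(prefix):
    """Return (offsets, total_size) for FIELDS.

    prefix '@' = native alignment (holes may appear between fields),
    prefix '=' = packed, standard sizes, no holes.
    """
    offsets = {}
    fmt = prefix
    for field_name, code in FIELDS:
        fmt += code
        # calcsize(fmt) is where this field ends; subtracting the field's
        # own size gives the offset at which it starts.
        offsets[field_name] = struct.calcsize(fmt) - struct.calcsize(prefix + code)
    # Note: unlike a C compiler, the struct module adds no trailing padding.
    return offsets, struct.calcsize(fmt)


aligned_offsets, aligned_size = layout("@")
packed_offsets, packed_size = layout("=")

print("aligned:", aligned_offsets, "size", aligned_size)
print("packed: ", packed_offsets, "size", packed_size)
```

In the packed layout atomic_number starts at offset 2 and atomic_weight at offset 6, so neither sits on a multiple of its own size; the aligned layout inserts holes to avoid this, at the cost of space.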
Records (Structures)
• Memory layout and its impact
(structures)
struct {
    char name[2];
    int age;
    boolean scholarship;
}
University Question
UQ: How are records represented in programming
languages?
Record types allow related data of heterogeneous
types to be stored and manipulated together.
Each record component is known as a field.
To refer to a given field of a record,
most languages use "dot" notation.
copper.name[0]='C'; copper.name[1]='u';
PASCAL RECORD
type element = record
    name : array[1..2] of char;
    atomic_number : integer;
    atomic_weight : real;
    metallic : Boolean
end;
Arrays
• Arrays are the most common and important
composite data types
• An array is an aggregate of homogeneous data
elements in which an individual element is identified
by its position in the aggregate, relative to the first
element.
• A reference to an array element in a program often
includes one or more non-constant subscripts.
• Such references require a run-time calculation to
determine the memory location being referenced.
Arrays & indexes
• Indexing is a mapping from indices to elements.
• The mapping can be shown as:
map(array_name, index_value_list) → an element
• C-based languages use [ ] to delimit array indices.
• Two distinct types are involved in an array type:
o The element type, and
o The type of the subscripts.
Ex: A(3) in FORTRAN and Ada, A[3] in Pascal and C
Array requirements
• Subscript Types:
FORTRAN, C - int only
Pascal - any ordinal type (int, boolean, char, enum)
Ada - int or enum (includes boolean and char)
Java - integer types only
• Index range checking
C, C++, Perl, and Fortran do not specify range
checking
Java, ML, C# specify range checking
In Ada, the default is to require range checking, but
it can be turned off
Arrays: Declaration
• In C:
char upper[26];
• In Pascal:
var upper : array ['a'..'z'] of char;
• In Fortran:
character, dimension (1:26) :: upper
character (26) upper ! shorthand
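A Pascal-style array with a non-integer subscript type and index range checking can be imitated in Python. The class below is a minimal sketch, not a standard library facility: the name OrdinalArray and its methods are hypothetical, and it assumes single-character subscripts mapped to positions with ord.

```python
class OrdinalArray:
    """Hypothetical helper: an array indexed by a character range low..high,
    with range checking on every access (as Pascal, Ada, or Java would do)."""

    def __init__(self, low, high, fill=None):
        self.low, self.high = low, high
        self._data = [fill] * (ord(high) - ord(low) + 1)

    def _offset(self, index):
        # Range check: reject anything that is not one character in low..high.
        if not (isinstance(index, str) and len(index) == 1
                and self.low <= index <= self.high):
            raise IndexError(f"subscript {index!r} outside {self.low!r}..{self.high!r}")
        # Map the subscript to a position relative to the first element.
        return ord(index) - ord(self.low)

    def __getitem__(self, index):
        return self._data[self._offset(index)]

    def __setitem__(self, index, value):
        self._data[self._offset(index)] = value


# Counterpart of: var upper : array ['a'..'z'] of char;
upper = OrdinalArray("a", "z")
for c in "abcdefghijklmnopqrstuvwxyz":
    upper[c] = c.upper()

print(upper["c"])
```

An access such as upper["A"] raises IndexError here, whereas a language that does not specify range checking would leave the behavior of an out-of-range subscript undefined.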
1. Contiguous elements
a) Column major: consecutive memory locations
hold elements that differ by 1 in the initial
subscript.
• A[2,4] is followed by A[3,4] in memory
• used by Fortran
b) Row major: consecutive memory locations hold
elements that differ by 1 in the final subscript.
• A[2,4] is followed by A[2,5] in memory
• used by most other languages (e.g., C and Pascal)
Arrays
Figure Row- and column-major memory layout for two-dimensional arrays. In row-major order, the
elements of a row are contiguous in memory; in column-major order, the elements of a column are
contiguous. The second cache line of each array is shaded, on the assumption that each element is
an eight-byte floating-point number, that cache lines are 32 bytes long (a common size), and that
the array begins at a cache line boundary. If the array is indexed from A[0,0] to A[9,9], then in
the row-major case elements A[0,4] through A[0,7] share a cache line; in the column-major case
elements A[4,0] through A[7,0] share a cache line.
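The cache-line claim in the caption can be checked with a short sketch. The sizes come from the caption (8-byte elements, 32-byte cache lines, a 10 × 10 array starting at a cache line boundary); the function names are illustrative only.

```python
ELEM_SIZE = 8      # bytes per element (from the caption)
LINE_SIZE = 32     # bytes per cache line (from the caption)
ROWS = COLS = 10   # array indexed A[0,0] .. A[9,9]


def row_major_offset(i, j):
    # Elements of a row are contiguous: the final subscript varies fastest.
    return (i * COLS + j) * ELEM_SIZE


def col_major_offset(i, j):
    # Elements of a column are contiguous: the initial subscript varies fastest.
    return (j * ROWS + i) * ELEM_SIZE


def cache_line(offset):
    # Line 0 is the first cache line, line 1 the second, and so on.
    return offset // LINE_SIZE


all_indices = [(i, j) for i in range(ROWS) for j in range(COLS)]
row_line_1 = [ij for ij in all_indices if cache_line(row_major_offset(*ij)) == 1]
col_line_1 = [ij for ij in all_indices if cache_line(col_major_offset(*ij)) == 1]

print("second cache line, row-major:   ", row_line_1)
print("second cache line, column-major:", col_line_1)
```

Traversing the array in the order that matches its layout touches each cache line once; traversing it in the other order jumps 80 bytes between consecutive accesses.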
Array Layout
2. Row pointers
• an option in C
• allows rows to be put anywhere - nice for big
arrays on machines with segmentation
problems
• avoids multiplication
• nice for matrices whose rows are of
different lengths
• e.g. an array of strings
• requires extra space for the pointers
Arrays
Figure Contiguous array allocation v. row pointers in C. The declaration on the left is a
true two-dimensional array. The slashed boxes are NUL bytes; the shaded areas are holes. The
declaration on the right is a ragged array of pointers to arrays of characters. In both
cases, we have omitted bounds in the declaration that can be deduced from the size of the
initializer (aggregate). Both data structures permit individual characters to be accessed
using double subscripts, but the memory layout (and corresponding address arithmetic) is
quite different.
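The space and address-arithmetic trade-off between the two layouts can be estimated with a small model. Everything specific here is an assumption made for illustration: the sample strings, the 8-byte pointer size, and the helper names.

```python
# Sample data and pointer size are assumptions for illustration only.
names = ["Sunday", "Monday", "Tuesday", "Wednesday",
         "Thursday", "Friday", "Saturday"]
POINTER_SIZE = 8


def contiguous_bytes(strings):
    # True 2-D array: every row is as long as the longest string plus its NUL.
    row_len = max(len(s) for s in strings) + 1
    return len(strings) * row_len


def row_pointer_bytes(strings, pointer_size=POINTER_SIZE):
    # Ragged array: each string takes only what it needs (plus NUL),
    # but one pointer per row is stored as well.
    return sum(len(s) + 1 for s in strings) + len(strings) * pointer_size


# Contiguous layout: one flat buffer, element (i, j) found by multiplication.
ROW_LEN = max(len(s) for s in names) + 1
flat = b"".join(s.encode("ascii").ljust(ROW_LEN, b"\0") for s in names)


def contiguous_char(i, j):
    return chr(flat[i * ROW_LEN + j])


# Row-pointer layout: follow the row reference, then index within the row.
rows = [s.encode("ascii") + b"\0" for s in names]


def row_pointer_char(i, j):
    return chr(rows[i][j])


print(contiguous_bytes(names), row_pointer_bytes(names))
print(contiguous_char(3, 0), row_pointer_char(3, 0))
```

With these particular strings the contiguous array wastes 13 bytes in holes, while the row-pointer version avoids the holes and the multiplication but pays for seven pointers.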
Array Operations
• The most common array operations are
assignment, catenation, comparison for
equality and inequality, and slices.
• The C-based languages do not provide any array
operations, except through the methods of Java,
C++, and C#.
• Perl supports array assignments but does not
support comparisons.
• Ada allows array assignments and also provides
catenation, specified by the ampersand (&).
• Python provides array assignment, although it
is only a reference change. Python also has
operations for array catenation (+) and
element membership (in).
• Python includes two different comparison
operators: one that determines whether the
two variables reference the same object (is)
and one that compares all corresponding
objects in the referenced objects, regardless of
how deeply they are nested, for equality (==).
• Fortran 95+ includes a number of
array operations that are called
elemental because they are
operations between pairs of array
elements.
• For example, the add operator (+)
between two arrays results in an
array of the sums of the element
pairs of the two arrays.
• F# includes many array operators in
its Array module. Among these are
Array.append, Array.copy, and
Array.length.
• In APL, the four basic arithmetic
operations are defined for vectors
(single-dimensioned arrays) and
matrices, as well as scalar operands.
• For example, A + B
is a valid expression, whether A and B
are scalar variables, vectors, or matrices.
• APL also includes a collection of unary operators
for vectors and matrices.
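The Python operations listed above, plus an imitation of a Fortran-style elemental add, can be tried directly. The variable names and the elemental_add helper are illustrative; Python lists do not add element by element on their own, since + on lists means catenation.

```python
a = [1, 2, 3]
b = a            # assignment is only a reference change: b names the same list
c = [1, 2, 3]    # a distinct list with equal contents

catenated = a + [4, 5]     # catenation with +
has_two = 2 in a           # element membership with in
same_object = a is b       # identity comparison: same object?
equal_values = a == c      # equality comparison: same contents?
distinct_objects = a is c  # False: equal, but two separate objects

# == compares corresponding elements no matter how deeply they are nested.
nested_equal = [[1, [2, 3]], 4] == [[1, [2, 3]], 4]


def elemental_add(x, y):
    """Sketch of an elemental operation: add corresponding element pairs."""
    if len(x) != len(y):
        raise ValueError("arrays must have the same length")
    return [p + q for p, q in zip(x, y)]


sums = elemental_add([1, 2, 3], [10, 20, 30])

b.append(99)               # visible through a as well, since a is b
print(catenated, has_two, same_object, equal_values, distinct_objects)
print(nested_equal, sums, a)
```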
Rectangular and Jagged
Arrays
• A rectangular array is a multidimensioned
array in which all of the rows have the same
number of elements and all of the columns have the
same number of elements.
• A jagged array is one in which the lengths of
the rows need not be the same. For example, a
jagged matrix may consist of three rows, one with 5
elements, one with 7 elements, and one with 12
elements.
• C, C++, and Java support jagged arrays but not
rectangular arrays.
• In those languages, a reference to an
element of a multidimensioned array uses a
separate pair of brackets for each dimension. For
example, myArray[3][7]
• Fortran, Ada, C#, and F# support
rectangular arrays. (C# and F# also
support jagged arrays.) In these cases, all
subscript expressions in references to
elements are placed in a single pair of
brackets.
For example, myArray[3, 7]
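A jagged matrix like the one described (rows of 5, 7, and 12 elements) can be modeled in Python as a list of lists, which behaves like an array of row references. The helper is_rectangular is a hypothetical name used only for this sketch.

```python
# Jagged: three rows of different lengths, as in the example above.
jagged = [[0] * 5, [0] * 7, [0] * 12]

# Rectangular: every row has the same length. A comprehension is used so
# that each row is a separate list rather than three references to one row.
rectangular = [[0] * 4 for _ in range(3)]


def is_rectangular(rows):
    """True when all rows have the same number of elements."""
    return len({len(row) for row in rows}) <= 1


# One pair of brackets per dimension, as in C, C++, and Java.
jagged[2][11] = 42

print([len(row) for row in jagged], is_rectangular(jagged))
print([len(row) for row in rectangular], is_rectangular(rectangular))
```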
Arrays: Slice
• A slice or section is a rectangular portion of an
array
• A slice is some substructure of an array;
nothing more than a referencing
mechanism
• Slice examples: consider the following Python
declarations:
• vector = [2, 4, 6, 8, 10, 12, 14, 16]
• mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
• The syntax of a Python slice reference is a pair
of numeric expressions separated by a colon.
• vector[3:6] is a three-element array with the fourth
through sixth elements of vector (those elements with
the subscripts 3, 4, and 5).
• A row of a matrix is specified by giving just one
subscript.
• For example, mat[1] refers to the second row of mat.
• A part of a row can be specified with the same syntax
as a part of a single-dimensioned array.
• For example, mat[0][0:2]
refers to the first and second elements of the first row
of mat, which is [1, 2].
• Python also supports more complex slices of arrays.
• For example, vector[0:7:2] references every other
element of vector, up to but not including the
element with the subscript 7, starting with the
subscript 0, which is [2, 6, 10, 14].
• Perl supports slices of two forms: a list of specific
subscripts or a range of subscripts. For example,
@list[1..5] = @list2[3, 5, 7, 9, 13];
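The slice examples above can be run as written. The last part imitates the Perl slice assignment in Python; the contents of list1 and list2 are made up for the illustration. Note that slicing a Python list builds a new list object, so changing the slice does not change the original.

```python
vector = [2, 4, 6, 8, 10, 12, 14, 16]
mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

middle = vector[3:6]         # subscripts 3, 4, and 5
second_row = mat[1]          # one subscript selects a whole row
row_part = mat[0][0:2]       # first two elements of the first row
every_other = vector[0:7:2]  # start 0, stop before 7, step 2

# A list slice is a new list: modifying it leaves vector untouched.
middle[0] = -1

# Imitation of Perl's  @list[1..5] = @list2[3, 5, 7, 9, 13];
# (sample data below is an assumption for illustration).
list1 = [0] * 8
list2 = list(range(100, 120))
list1[1:6] = [list2[i] for i in (3, 5, 7, 9, 13)]

print(middle, second_row, row_part, every_other)
print(vector)
print(list1)
```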
Array Categories
Static
Fixed stack-dynamic
Stack-dynamic
Fixed heap-dynamic
Heap-dynamic
• Static array: subscript ranges are statically bound
and storage allocation is static (before run time).
• Advantage: efficiency (no dynamic allocation).
Ex: Arrays declared in C and C++ functions with the
static modifier are static.
Array Categories
• Fixed stack-dynamic: subscript ranges are statically bound,
but the allocation is done at elaboration time during execution.
• Advantages: Space efficiency. A large array in one subprogram
can use the same space as a large array in different
subprograms.
• Ex: Arrays declared in C and C++ functions without the static
modifier are fixed stack-dynamic arrays.
• A stack-dynamic array is one in which the subscript ranges
are dynamically bound and the storage allocation is
dynamic (done during execution). Once bound, they remain fixed
during the lifetime of the variable.
• Advantage: flexibility. The size of the array need not be known
until the array is about to be used.
Array Categories
S3 = size of elem-type
S2 = (U3-L3+1)* S3 (* size of a row *)
S1 = (U2-L2+1) * S2 (* size of a plane *)
Figure Virtual location of an array with nonzero lower bounds. By computing the constant portions of an array index at
compile time, we effectively index into an array whose starting address is offset in memory, but whose lower bounds are all
zero.
Arrays: Address calculations
• Given an array A[1..8, 1..5, 1..7] of integers, calculate the
address of element A[5,3,6] using the row-major and column-major
methods, if the base address BA = 900.
Solution (row-major order, 4-byte integers):
Address(A[i,j,k]) = BA + (i-L1)*S1 + (j-L2)*S2 + (k-L3)*S3
Let i=5, j=3, k=6, L1=L2=L3=1, U1=8, U2=5, U3=7
S3 = 4 (size of element type)
S2 = (U3-L3+1)*S3 = (7-1+1)*4 = 28
S1 = (U2-L2+1)*S2 = (5-1+1)*28 = 140
Address(A[5,3,6]) = 900 + (5-1)*S1 + (3-1)*S2 + (6-1)*S3
= 900 + 4*140 + 2*28 + 5*4
= 1536
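The worked solution can be reproduced with a short sketch that generalizes the S1, S2, S3 formulas to any number of dimensions. The function names are hypothetical. The column-major strides and the virtual origin (the compile-time constant described in the figure caption) are added here as an extension of the slide's row-major solution, using the same bounds and a 4-byte element.

```python
def strides_row_major(bounds, elem_size):
    """bounds is a list of (L, U) pairs; the last subscript varies fastest."""
    strides = [0] * len(bounds)
    size = elem_size
    for d in range(len(bounds) - 1, -1, -1):
        strides[d] = size
        low, high = bounds[d]
        size *= high - low + 1
    return strides


def strides_col_major(bounds, elem_size):
    """The first subscript varies fastest."""
    strides = []
    size = elem_size
    for low, high in bounds:
        strides.append(size)
        size *= high - low + 1
    return strides


def address(base, bounds, strides, index):
    # BA + (i-L1)*S1 + (j-L2)*S2 + (k-L3)*S3
    return base + sum((i - low) * s
                      for i, (low, _), s in zip(index, bounds, strides))


def virtual_origin(base, bounds, strides):
    # Constant part BA - (L1*S1 + L2*S2 + L3*S3), computable at compile time.
    return base - sum(low * s for (low, _), s in zip(bounds, strides))


bounds = [(1, 8), (1, 5), (1, 7)]
BA, ELEM = 900, 4

row = strides_row_major(bounds, ELEM)
col = strides_col_major(bounds, ELEM)

row_addr = address(BA, bounds, row, (5, 3, 6))
col_addr = address(BA, bounds, col, (5, 3, 6))

# With the virtual origin, indexing needs no subtraction of lower bounds.
origin = virtual_origin(BA, bounds, row)
via_origin = origin + 5 * row[0] + 3 * row[1] + 6 * row[2]

print(row, row_addr)
print(col, col_addr)
print(origin, via_origin)
```

The row-major result matches the hand calculation; the column-major strides are 4, 32, and 160, which gives a different address for the same element.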
University Question
UQ: What are the memory layouts used in arrays? How
is the address calculation done in three-dimensional
arrays?
Strings
• String literal syntax is much the same across languages:
characters in quotes.
• Pascal has one kind of quotes, Ada has two:
'A' is a character, "A" is a string.
• The allowed length of strings is a design issue:
fixed-length strings: Pascal, Ada, Fortran;
variable-length strings: C, Java, Perl.
• A character may be treated as a string of length 1, or
as a separate data structure.
• Many languages (Pascal, Ada, C, Prolog) treat strings
as special cases of arrays or lists.
String operations
Typical operations on strings
string × string → string // concatenation
string × int × int → string // substring
string → characters // decompose into an array or list
characters → string // convert an array or list into a string
string → integer // length
string → boolean // is it empty?
string × string → boolean // equality, ordering
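Each of the typical string operations has a direct counterpart in Python. The function names below are hypothetical wrappers written for this sketch, and the two integers of the substring operation are assumed to be a start position and a length (the slide does not say which convention is meant).

```python
def concat(s, t):
    # string × string → string
    return s + t


def substring(s, start, length):
    # string × int × int → string (assumed: start position and length)
    return s[start:start + length]


def decompose(s):
    # string → characters (a list of one-character strings)
    return list(s)


def compose(chars):
    # characters → string
    return "".join(chars)


def length(s):
    # string → integer
    return len(s)


def is_empty(s):
    # string → boolean
    return len(s) == 0


def equal(s, t):
    # string × string → boolean (equality)
    return s == t


def precedes(s, t):
    # string × string → boolean (ordering by character codes)
    return s < t


word = concat("cop", "per")
print(word, substring(word, 1, 3), decompose("Cu"), compose(["C", "u"]))
print(length(word), is_empty(""), equal("Cu", "Cu"), precedes("Ag", "Cu"))
```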