Abstract Data Types - Ics2105
RECALL
DATA TYPES
Recall: A type is a named construct that specifies a set of values and a set of allowable operations that can be carried out on them. A data type is a classification identifying one of various kinds of data, such as real-valued, integer (whole number), Boolean (true or false) or date values. A data type determines:
The possible values for that type.
The operations that can be done on values of that type.
The meaning of the data.
The way values of that type can be stored.
TYPE CLASSES
i) Primitive types
Primitive data types are predefined types of data that are supported directly by the programming language (basic or built-in types). For example, numeric, character and Boolean types are primitive data types (many languages also provide a built-in string type). Programmers can use these data types when creating variables in their programs.
ii) Composite types
Types built from primitive types (or from other composite types). This can be done in a number of ways; the ways the components are combined are called data structures. Composing primitive types into a compound type generally results in a new type. Common composite types are:
Arrays: store a number of elements of the same type in a specific order, accessed using an integer index. An array may be fixed-length or expandable.
Record (tuple or struct): Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
Union: A union type definition specifies which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". By contrast, a record could be defined to contain both a float and an integer, whereas a union holds only one value at a time.
Tagged union (also called a variant, variant record, discriminated union or disjoint union): contains an additional field indicating its current type, for enhanced type safety.
Set: an abstract data structure that stores values without any particular order and with no repeated values. Values are not usually retrieved from a set; instead, one tests a value for membership to obtain a Boolean "in" or "not in".
Objects: contain a number of data fields, like a record, together with program code (methods) for accessing or modifying them. Data structures that contain no code, like those above, are called plain old data structures.
iii) Abstract types
Types that do not specify an implementation. For instance, a stack (which is an abstract type) can be
implemented as an array (a contiguous block of memory containing multiple values), or as a linked list (a set of
non-contiguous memory blocks linked by pointers).
Data Structures Lecture Notes: Lecture One: Abstract Data Types Isaiah Mulang’
isaiah.mulang@jkuat.ac.ke 0711250239
Abstract types can be handled by code that does not know or "care" what underlying types are contained in
them. Programming that is agnostic about concrete data types is called generic programming. Arrays and
records can also contain underlying types, but are considered concrete because they specify how their contents
or elements are laid out in memory.
Examples include smart pointers (the abstract counterpart of a pointer), hash tables (dictionaries or maps), queues, stacks, trees and graphs.
MODULARITY
Modularity is a design technique of breaking a program down into smaller, more manageable subtasks or subsections called modules. In structured programming this is achieved through functions, procedures or blocks.
A module is therefore a contiguous section of code that can be separated from the rest of the program; given specific inputs (resources), it performs a specific task and produces specific results.
Modularity is a technique that keeps the complexity of a large program manageable by systematically
controlling the interaction of its components. You can focus on one task at a time in a modular program without
other distractions.
A module is any named program unit that can be implemented as an independent entity.
A well-designed module has a single purpose and presents a narrow interface to other modules.
Modularity is mainly achieved via abstraction.
Modules include:
A single, stand-alone function
A method of a class
A class
Several functions or classes working closely together
Other blocks of code
Advantages of modularization
Makes code easy to write & read.
Modularized code is easy to debug.
Isolates errors and eliminates redundancies.
ABSTRACTION
Abstraction is a mode of thought in which we concentrate on general ideas rather than on specific manifestations of those ideas. In programming, abstraction is the distinction made between what a piece of code does and how it is implemented; e.g. in C++, consider the distinction between a .h header file (what the program does) and a .cpp (or, in C, .c) source file (how the program does it, i.e. its implementation).
TYPES OF ABSTRACTION
There are three major types of abstraction:
i) Procedural / Functional Abstractions
Concentrating on what the function does and what it requires to perform the task, rather than on the specific steps (the how) it takes to complete the task. This is achieved by function prototyping, function calls and parameter lists (Ref: Supplement 1: Procedural Abstraction).
For example, suppose that a program needs to operate on a sorted array of names: it may need to search the array for a given name or display the names in alphabetical order. The program thus needs a function S that sorts an array of names. Although the rest of the program knows that function S will sort an array, it should not care how S accomplishes its task.
ii) Data Abstraction
Data abstraction asks that you think in terms of what you can do to a collection of data, independently of how you do it.
Data abstraction is a technique that allows you to develop each data structure in relative isolation from the
rest of the solution. The other modules of the solution will “know” what operations they can perform on the
data, but they should not depend on how the data is stored or how the operations are performed. Again, the
terms of the contract are what and not how. Thus, data abstraction is a natural extension of functional
abstraction.
iii) Control Abstraction (will be discussed under topic “Recursion”)
Terms
*Information Hiding: The shielding of data or information from outside entities, i.e. from unauthorized access, modification or manipulation.
*Encapsulation: Wrapping the data and the operations for accessing and manipulating it together into a single entity.
The description of an ADT’s operations must be rigorous enough to specify completely their effect on the data,
yet it must not specify how to store the data nor how to carry out the operations. For example, the ADT
operations should not specify whether to store the data in consecutive memory locations or in disjoint memory
locations. You choose a particular data structure when you implement an ADT.
Recall that a data structure is a construct that you can define within a programming language to store a
collection of data. For example, arrays and structures, which are built into C++, are data structures. However,
you can invent other data structures.
Note
Review your knowledge of arrays and structs as composite types; this will be done in class.
Abstract Data Types
Contents
Abstract data types
Representation independence
Rep exposure
Abstraction function & rep invariant
In this lecture, we look at a powerful idea, abstract data types, which enable us to separate how we use a data
structure in a program from the particular form of the data structure itself. Abstract data types address a
particularly dangerous dependency: client code that relies on the data type’s internal representation. We’ll see why this is dangerous and how it can be avoided. We’ll also discuss the classification of operations, and some principles of good design for abstract data types.
Information hiding. Hiding details of a module’s implementation from the rest of the system, so that
the details can be changed later without changing the rest of the system.
Separation of concerns. Making a feature (or “concern”) the responsibility of a single module, rather
than spreading it across multiple modules.
As a software engineer, you should know these terms, because you will run into them frequently. The fundamental purpose of all of these ideas is to help achieve the three properties we care about: safety from bugs, ease of understanding, and readiness for change.
User-Defined Types
In the early days of computing, a programming language came with built-in types (such as integers, Booleans,
strings, etc.) and built-in procedures, e.g. for input and output. Users could define their own procedures: that’s
how large programs were built.
A major advance in software development was the idea of abstract types: that one could design a programming
language to allow user-defined types too. This idea came out of the work of many researchers, notably Dahl (the
inventor of the Simula language), Hoare (who developed many of the techniques we now use to reason about
abstract types), Parnas (who coined the term information hiding and first articulated the idea of organizing
program modules around the secrets they encapsulated), and at MIT, Barbara Liskov and John Guttag, who did
seminal work in the specification of abstract types and in programming language support for them.
The key idea of data abstraction is that a type is characterized by the operations you can perform on it. A
number is something you can add and multiply; a string is something you can concatenate and take substrings
of; a boolean is something you can negate, and so on. In a sense, users could already define their own types in
early programming languages: you could create a record type date, for example, with integer fields for day,
month and year. But what made abstract types new and different was the focus on operations: the user of the
type would not need to worry about how its values were actually stored, in the same way that a programmer can
ignore how the compiler actually stores integers. All that matters is the operations.
In Java, as in many modern programming languages, the separation between built-in types and user-defined
types is a bit blurry. The classes in java.lang, such as Integer and Boolean, are built-in; whether you regard
all the collections in java.util as built-in is less clear (and not very important anyway). Java complicates
the issue by having primitive types that are not objects. The set of these types, such as int and boolean,
cannot be extended by the user.
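The operations of an abstract type can be classified as follows:
Creators create new objects of the type: t → T
Producers create new objects from old objects of the type: T, t → T
Observers take objects of the abstract type and return objects of a different type: T, t → t
Mutators change objects: T, t → void | t | T
These show informally the shape of the signatures of operations in the various classes. Each T is the abstract type itself; each t is some other type. In general, when a type is shown on the left, it can occur more than once. For example, a producer may take two values of the abstract type; string concatenation takes two strings. The occurrences of t on the left may also be omitted; some observers take no non-abstract arguments (e.g., size), and some take several.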
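Here are some examples of abstract data types, along with their operations:
int is the primitive integer type of C++ and Java. Its creators are the numeric literals (0, 1, 2, …), its producers are the arithmetic operators (+, -, *, /), and its observers are the comparison operators (==, !=, <, >). int is immutable, so it has no mutators.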
This classification gives some useful terminology, but it’s not perfect. In complicated data types, there may be
an operation that is both a producer and a mutator, for example. Some people use the term producer to imply
that no mutation occurs.
iii. Adequate set of operations
The set of operations should be adequate; there must be enough to do the kinds of computations clients are
likely to want to do. A good test is to check that every property of an object of the type can be extracted. For
example, if there were no get operation, we would not be able to find out what the elements of a list are. Basic
information should not be inordinately difficult to obtain. The size method is not strictly necessary for List,
because we could apply get on increasing indices until we get a failure, but this is inefficient and inconvenient.
2. Representation Independence
A good abstract data type should be representation independent. This means that the use of an abstract type is
independent of its representation (the actual data structure or data fields used to implement it), so that changes
in representation have no effect on code outside the abstract type itself. For example, the operations offered by
List are independent of whether the list is represented as a linked list or as an array.
You won’t be able to change the representation of an ADT at all unless its operations are fully specified with
preconditions (requires), postconditions (effects), and frame conditions (modifies), so that clients know what to
depend on, and you know what you can safely change.
3. Preserving Invariants
Finally, and perhaps most important, a good abstract data type should preserve its own invariants. An invariant
is a property of a program that is always true. Immutability is one crucial invariant that we have already
encountered: once created, an immutable object should always represent the same value, for its entire lifetime.
When an ADT preserves its own invariants, reasoning about the code becomes much easier. If you can count on
the fact that Strings never change, you can rule out that possibility when you are debugging code that uses
Strings, or when you’re trying to establish an invariant for another ADT. Contrast that with a string class
that guarantees that it will be immutable only if its clients promise not to change it. Then you’d have to check
all the places in the code where the string might be used.