KCS301_DS_unit1_Introduction and algorithms


UNIT-1 (Part -1)

Topics Covered:

Introduction: Basic Terminology, Elementary Data Organization, Algorithm, Efficiency of an Algorithm,
Time and Space Complexity, Asymptotic notations: Big-Oh, Time-Space trade-off. Abstract Data Types
(ADT)

Introduction:
A data structure is the structural representation of the logical relationships between elements of data,
i.e. a way of organizing data items that takes into account their relationship to each other. The choice
of data structure affects the design of both the structural and functional aspects of a program.

Basic Terminology, Elementary Data Organization

Basic Terminology
 Data − Data are values or set of values.

A data definition describes a particular piece of data and should have the following characteristics.

 Atomic − Definition should define a single concept


 Traceable − Definition should be able to be mapped to some data element.
 Accurate − Definition should be unambiguous.
 Clear and Concise − Definition should be understandable.

  Data Item − A data item refers to a single unit of values.

  Group Items − Data items that are divided into sub-items are called group items.

  Elementary Items − Data items that cannot be divided are called elementary items.

  Attribute and Entity − An entity is something that has certain attributes or properties, which
may be assigned values.

  Entity Set − Entities with similar attributes form an entity set.

 Field − Field is a single elementary unit of information representing an attribute of an entity.

 Record − Record is a collection of field values of a given entity.

 File − File is a collection of records of the entities in a given entity set.


[Figures: Data and Data Item; Entity; Entity Set]

Data Type
A data type is a way to classify various types of data, such as integer, string, etc. There are two kinds of data type −

 Built-in Data Type


 Derived Data Type
Built-in Data Type
Those data types for which a language has built-in support are known as Built-in Data types. For
example, most of the languages provide following built-in data types.

 Integers
 Boolean (true, false)
 Floating (Decimal numbers)
 Character and Strings
Derived Data Type
Data types that are implementation independent, in that they can be implemented in one way or another,
are known as derived data types. They are normally built from a combination of primary (built-in)
data types and the associated operations on them. For example −

 List
 Array
 Stack
 Queue
Abstract Data Type

An abstract data type (ADT) is a mathematical model for data types where a data type is defined by its
behavior (semantics) from the point of view of a user of the data, specifically in terms of possible values,
possible operations on data of this type, and the behavior of these operations. This contrasts with data
structures, which are concrete representations of data, and are the point of view of an implementer, not a
user.

A data type can be considered abstract when it is defined in terms of the operations on it and its
implementation is hidden.

Data Structures: A data structure is a way of organizing data so that it can be used efficiently.

Algorithm + Data Structure = Program

Difference

Data types
Data types are classifications of data in a programming language, for example integers, characters,
floats etc.
Abstract data types
An ADT is a purely theoretical concept. It specifies how a type should behave, i.e. which operations
are allowed; the implementation is not part of an ADT.
Example: stacks. A stack ADT can have the operations push, pop and peek. These three operations define
what the type is, irrespective of the language of the implementation.
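
The stack ADT described above can be sketched in C as follows. This is only one possible implementation, assuming an array of fixed capacity; the names `Stack`, `STACK_CAPACITY`, `stack_init`, `stack_is_empty`, `stack_push`, `stack_pop` and `stack_peek` are hypothetical and chosen for illustration. A user of the ADT relies only on the three operations, not on the array behind them.

```c
#include <assert.h>
#include <stdbool.h>

#define STACK_CAPACITY 100   /* hypothetical fixed capacity for this sketch */

/* One possible representation; users of the ADT only call the functions below. */
typedef struct {
    int items[STACK_CAPACITY];
    int top;                 /* index of the top element, -1 when empty */
} Stack;

void stack_init(Stack *s) { s->top = -1; }

bool stack_is_empty(const Stack *s) { return s->top == -1; }

/* push: add an item on top; returns false if the stack is full. */
bool stack_push(Stack *s, int value) {
    if (s->top == STACK_CAPACITY - 1) return false;
    s->items[++s->top] = value;
    return true;
}

/* pop: remove and return the top item; the stack must not be empty. */
int stack_pop(Stack *s) {
    assert(!stack_is_empty(s));
    return s->items[s->top--];
}

/* peek: return the top item without removing it; the stack must not be empty. */
int stack_peek(const Stack *s) {
    assert(!stack_is_empty(s));
    return s->items[s->top];
}
```

Replacing the array with a linked list would change the struct and the function bodies but not the way the three operations are called, which is the point of separating the ADT from its implementation.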

Primitive and Non-Primitive data Types:

A data type specifies the type of data stored in a variable. Data types can be classified into two kinds:
primitive data types and non-primitive data types.

PRIMITIVE DATATYPE
The primitive data types are the basic data types that are available in most of the programming
languages. The primitive data types are used to represent single values.

The primitive data types are Integer, Float and Double, Character, String and Boolean. Boolean is used
to represent the logical values true and false.

NON-PRIMITIVE DATATYPES: The data types that are derived from primary data types are known
as non-primitive data types. These data types are used to store groups of values.
The non-primitive data types are

 Arrays
 Structure
 Union
 Linked list
 Stacks
 Queue etc

Characteristics of a Data Structure


 Correctness − Data Structure implementation should implement its interface correctly.

 Time Complexity − Running time or execution time of operations of data structure must be as
small as possible.

 Space Complexity − Memory usage of a data structure operation should be as little as possible.

Need of Data Structure


Types of Data Structures

Data structures:
There are two types of data structure: linear and nonlinear.

Linear data structures:


In linear data structures, values are arranged in a linear fashion. Arrays, lists, stacks and queues are
examples of linear data structures in which values are stored in a sequence.
  Array: An array is a collection of homogeneous data elements described by a single name.
Each element of an array is referenced by a subscripted variable or value, called a subscript or
index, enclosed in parentheses or brackets. If an element of an array is referenced by a single
subscript, the array is known as a one-dimensional or linear array; if two subscripts are
required to reference an element, the array is known as a two-dimensional array, and so on.

 List: A list is an ordered set consisting of a varying number of elements to which insertion
and deletion can be made. A list can be implemented by using pointers. Each element is
referred to as a node; therefore a list can be defined as a collection of nodes.
[Figure: a list]

 Stack: It is an ordered collection of items into which new data items may be added/inserted
and from which items may be deleted at only one end, called the top of the stack. Since all
additions and deletions in a stack are done at the top of the stack, the last element added will
be the first removed from the stack. That is why the stack is also called Last-in-First-out (LIFO).

 Queue:A queue is logically a first in first out (FIFO or first come first serve) linear data
structure. It is a homogeneous collection of elements in which new elements are added at one
end, called the rear, and existing elements are deleted from the other end, called the front. The basic
operations that can be performed on a queue are: 1. Insert (or add) an element to the queue
(enqueue). 2. Delete (or remove) an element from the queue (dequeue).

 FILES: A file is typically a large list that is stored in the external memory (e.g., a magnetic
disk) of a computer.

Nonlinear data structures:

Nonlinear data structures are the opposite of linear ones: the data values are not arranged in a
sequence.
  Graph: A graph is a nonlinear data structure. Graph representations have found application in
almost all subjects, such as geography and engineering, and in solving games and puzzles. A graph G
consists of a set of vertices V (also called nodes), V = {v1, v2, v3, v4, ...}, and a set of edges
E = {e1, e2, e3, ..., em}. A graph can be represented as G = (V, E), where V is a finite and non-empty
set of vertices and E is a set of pairs of vertices called edges. Each edge ‘e’ in E is identified
with a unique pair (a, b) of nodes in V, denoted by e = [a, b].

Consider a graph G in Fig. 9.1. Its vertex set V and edge set E can be written as: V =
{v1, v2, v3, v4, v5, v6} and E = {e1, e2, e3, e4, e5, e6} = {(v1, v2), (v2, v3), (v1, v3), (v3,
v4), (v3, v5), (v5, v6)}. There are six vertices and six edges in the graph.

 Tree: Many real life problems can be represented and solved using trees. Trees are very
flexible, versatile and powerful nonlinear data structures. A tree is an ideal data structure for
representing hierarchical data. A tree can be defined as a finite set of one or
more data items (or nodes) such that: there is a special node called the root of the tree, and the
remaining nodes (or data items) are partitioned into a number of mutually exclusive (i.e.,
disjoint) subsets, each of which is itself a tree, called a subtree.
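
Returning to the graph example above (vertices v1 to v6 and six edges), one common way to store G = (V, E) is an adjacency matrix. The sketch below is illustrative only: it assumes the edges are undirected, stores v1..v6 as indices 0..5, and uses the hypothetical names `EDGES`, `build_adjacency` and `degree`.

```c
#include <assert.h>

#define NUM_VERTICES 6

/* Edge list from the example; v1..v6 are stored as indices 0..5. */
static const int EDGES[][2] = {
    {0, 1}, {1, 2}, {0, 2}, {2, 3}, {2, 4}, {4, 5}
};
#define NUM_EDGES ((int)(sizeof EDGES / sizeof EDGES[0]))

/* Fill adj so that adj[a][b] == 1 exactly when a and b share an edge. */
void build_adjacency(int adj[NUM_VERTICES][NUM_VERTICES]) {
    for (int i = 0; i < NUM_VERTICES; i++)
        for (int j = 0; j < NUM_VERTICES; j++)
            adj[i][j] = 0;
    for (int e = 0; e < NUM_EDGES; e++) {
        int a = EDGES[e][0], b = EDGES[e][1];
        adj[a][b] = 1;
        adj[b][a] = 1;   /* assumption: the graph is undirected */
    }
}

/* Degree of vertex v: the number of edges that touch it. */
int degree(int adj[NUM_VERTICES][NUM_VERTICES], int v) {
    assert(v >= 0 && v < NUM_VERTICES);
    int d = 0;
    for (int j = 0; j < NUM_VERTICES; j++)
        d += adj[v][j];
    return d;
}
```

With this representation, v3 (index 2) has degree 4 because it appears in four of the six pairs, and v6 (index 5) has degree 1.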

Selection of Data Structure: There are many considerations to take into account when choosing the
best data structure for a specific program:

 Size of data.
  Speed and manner of data use.
  Data dynamics, such as changes and edits.
 Size of required storage.
 Fetch time of any information from data structure.
Algorithm:

An algorithm is a step-by-step procedure to solve a problem. In everyday language, an algorithm is
defined as a sequence of statements used to perform a task. In computer science, an
algorithm can be defined as follows:
An algorithm is a sequence of unambiguous instructions used for solving a problem, which
can be implemented (as a program) on a computer.
Algorithms are used to convert our problem solution into step by step statements. These
statements can be converted into computer programming instructions which form a program.
This program is executed by a computer to produce a solution. Here, the program takes required
data as input, processes data according to the program instructions and finally produces a result
as shown in the following picture.

Specifications of Algorithms
Every algorithm must satisfy the following specifications...

1. Input - Every algorithm takes zero or more input values supplied from outside.
2. Output - Every algorithm must produce at least one output as a result.
3. Definiteness - Every statement/instruction in an algorithm must be clear and
unambiguous (only one interpretation).
4. Finiteness - For all different cases, the algorithm must produce a result within a finite
number of steps.
5. Effectiveness - Every instruction must be basic enough to be carried out, and it must also
be feasible.

Example of an Algorithm
Let us consider the following problem for finding the largest value in a given list of values.
Problem Statement: Find the largest number in the given list of numbers?
Input: A list of positive integer numbers. (List must contain at least one number).
Output: The largest number in the given list of positive integer numbers.

Consider the given list of numbers as 'L' (input) and the largest number as 'max' (Output).
Algorithm

Step 1: Define a variable 'max' and initialize with '0'.
Step 2: Compare first number (say 'x') in the list 'L' with 'max', if 'x' is larger than 'max',
set 'max' to 'x'.
Step 3: Repeat step 2 for all numbers in the list 'L'.
Step 4: Display the value of 'max' as a result.

Code using C Programming Language


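/* Returns the largest value in L[0..listSize-1].
   Assumes listSize >= 1 and that all values are positive, as stated above. */
int findMax(const int L[], int listSize)
{
    int max = 0, i;                  /* Step 1: 0 is below every positive input */
    for (i = 0; i < listSize; i++)   /* Step 3: visit every number in the list */
    {
        if (L[i] > max)              /* Step 2: compare with the current maximum */
            max = L[i];
    }
    return max;                      /* Step 4: report the result */
}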

Efficiency of an Algorithm:

A measure of the average execution time necessary for an algorithm to complete work on a set
of data. Algorithm efficiency is characterized by its order. For example, a bubble
sort algorithm sorting N items typically takes time proportional to N^2, so its efficiency is of
the order of N^2, usually written O(N^2).
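
To make the bubble sort claim concrete, the following C sketch counts the comparisons a plain bubble sort performs. It assumes the simplest variant, with no early exit when the array is already sorted; the function name `bubble_sort` and its return value are illustrative choices. For N items this variant always performs N(N-1)/2 comparisons, which grows in proportion to N squared.

```c
#include <assert.h>

/* Sort a[0..n-1] in ascending order with plain bubble sort (no early exit)
   and return the number of element comparisons performed. */
long bubble_sort(int a[], int n) {
    assert(n >= 0);
    long comparisons = 0;
    for (int pass = 0; pass < n - 1; pass++) {
        /* After each pass the largest remaining value has moved to the end. */
        for (int i = 0; i < n - 1 - pass; i++) {
            comparisons++;
            if (a[i] > a[i + 1]) {
                int tmp = a[i];
                a[i] = a[i + 1];
                a[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```

For example, sorting 5 items takes 4 + 3 + 2 + 1 = 10 comparisons, and doubling the input to 10 items takes 45, roughly four times as many.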

Time and Space Complexity:

There are two main complexity measures of the efficiency of an algorithm:

 Time complexity is a function describing the amount of time an algorithm takes in terms
of the amount of input to the algorithm. "Time" can mean the number of memory
accesses performed, the number of comparisons between integers, the number of times
some inner loop is executed, or some other natural unit related to the amount of real time
the algorithm will take. We try to keep this idea of time separate from "wall clock" time,
since many factors unrelated to the algorithm itself can affect the real time (like the
language used, type of computing hardware, proficiency of the programmer, optimization in
the compiler, etc.). It turns out that, if we choose the units wisely, these other factors
do not matter and we get an independent measure of the efficiency of the algorithm.
 Space complexity is a function describing the amount of memory (space) an algorithm
takes in terms of the amount of input to the algorithm. We often speak of "extra" memory
needed, not counting the memory needed to store the input itself. Again, we use natural
(but fixed-length) units to measure this. We can use bytes, but it's easier to use, say,
number of integers used, number of fixed-sized structures, etc. In the end, the function we
come up with will be independent of the actual number of bytes needed to represent the
unit. Space complexity is sometimes ignored because the space used is minimal and/or
obvious, but sometimes it becomes as important an issue as time.
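
The idea of "extra" memory can be illustrated with two ways of reversing an array, sketched below in C. Both take time proportional to the number of elements n, but the first needs only a constant amount of extra space while the second allocates n extra integers. The function names `reverse_in_place` and `reversed_copy` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Reverse a[0..n-1] in place: O(n) time, O(1) extra space (one temporary). */
void reverse_in_place(int a[], int n) {
    assert(n >= 0);
    for (int i = 0, j = n - 1; i < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}

/* Return a newly allocated reversed copy: O(n) time, O(n) extra space.
   The input is left unchanged. The caller must free() the result;
   NULL is returned if the allocation fails. */
int *reversed_copy(const int a[], int n) {
    assert(n > 0);
    int *out = malloc((size_t)n * sizeof *out);
    if (out == NULL) return NULL;
    for (int i = 0; i < n; i++)
        out[i] = a[n - 1 - i];
    return out;
}
```

Measured in "number of integers used", the first function uses 1 extra unit regardless of n and the second uses n, whatever the size of an integer in bytes happens to be.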

Asymptotic notations: Big Oh, Big Theta and Big Omega

Whenever we want to analyze an algorithm, we need to calculate its complexity. But the
calculated complexity of an algorithm does not give the
exact amount of resource required. So instead of the exact amount of resource, we
represent the complexity in a general form (notation) that captures the basic nature of the
algorithm. We use that general form (notation) for the analysis process.

Asymptotic notation of an algorithm is a mathematical representation of its complexity.

We mainly use three types of asymptotic notation:

1. Big - Oh (O)
2. Big - Omega (Ω)
3. Big - Theta (Θ)

Big - Oh Notation (O)


Big - Oh notation is used to define an upper bound on an algorithm's time complexity.
That means Big - Oh notation bounds from above the time required by an algorithm as the
input size grows.
It is most often quoted for the worst case of an algorithm's time complexity, although an upper
bound can be stated for any case.
Big - Oh Notation can be defined as follows.
Consider function f(n) as the time complexity of an algorithm and g(n) as its most significant
term. If there exist constants C > 0 and n0 >= 1 such that f(n) <= C g(n) for all n >= n0, then
we can represent f(n) as O(g(n)).

f(n) = O(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-
Axis and time required is on Y-Axis

In above graph after a particular input value n0, always C g(n) is greater than f(n) which indicates
the algorithm's upper bound.
Example
Consider the following f(n) and g(n).
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as O(g(n)) then it must satisfy f(n) <= C g(n) for some constant C >
0 and all n >= n0, with n0 >= 1
f(n) <= C g(n)
⇒3n + 2 <= C n
The above condition is TRUE for C = 4 and all n >= 2, since 3n + 2 <= 4n exactly when n >= 2.
By using Big - Oh notation we can represent the time complexity as follows...
3n + 2 = O(n)

Big - Omega Notation (Ω)


Big - Omega notation is used to define a lower bound on an algorithm's time complexity.
That means Big-Omega notation bounds from below the time required by an algorithm as the
input size grows.
It is most often quoted for the best case of an algorithm's time complexity, although a lower
bound can be stated for any case.
Big - Omega Notation can be defined as follows:

Consider function f(n) as the time complexity of an algorithm and g(n) as its most significant
term. If there exist constants C > 0 and n0 >= 1 such that f(n) >= C g(n) for all n >= n0, then
we can represent f(n) as Ω(g(n)).

f(n) = Ω(g(n))
Consider the following graph drawn for the values of f(n) and C g(n) for input (n) value on X-
Axis and time required is on Y-Axis
In above graph after a particular input value n0, always C g(n) is less than f(n) which indicates
the algorithm's lower bound.
Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Ω(g(n)) then it must satisfy f(n) >= C g(n) for some constant C >
0 and all n >= n0, with n0 >= 1
f(n) >= C g(n)
⇒3n + 2 >= C n
The above condition is TRUE for C = 1 and all n >= 1, since 3n + 2 >= n for every n >= 1.
By using Big-Omega notation we can represent the time complexity as follows...
3n + 2 = Ω(n)

Big - Theta Notation (Θ)


Big - Theta notation is used to define a tight bound on an algorithm's time complexity, that is,
an upper bound and a lower bound of the same order. That means Big - Theta notation indicates
that the time required grows at the same rate as g(n), up to constant factors.
It is sometimes described as the average case, but it is a statement about growth rate and can
be applied to the best, worst or average case.
Big - Theta Notation can be defined as follows:

Consider function f(n) as the time complexity of an algorithm and g(n) as its most significant
term. If there exist constants C1 > 0, C2 > 0 and n0 >= 1 such that C1 g(n) <= f(n) <= C2 g(n)
for all n >= n0, then we can represent f(n) as Θ(g(n)).

f(n) = Θ(g(n))
Consider the following graph drawn for the values of f(n), C1 g(n) and C2 g(n) for input (n) value on X-
Axis and time required is on Y-Axis
In the above graph, after a particular input value n0, C1 g(n) is always less than f(n) and C2 g(n) is
always greater than f(n), which indicates the algorithm's tight bound.

Example
Consider the following f(n) and g(n)...
f(n) = 3n + 2
g(n) = n
If we want to represent f(n) as Θ(g(n)) then it must satisfy C1 g(n) <= f(n) <= C2 g(n) for some
constants C1 > 0, C2 > 0 and all n >= n0, with n0 >= 1
C1 g(n) <= f(n) <= C2 g(n)
⇒C1 n <= 3n + 2 <= C2 n
The above condition is TRUE for C1 = 1, C2 = 4 and all n >= 2.
By using Big - Theta notation we can represent the time complexity as follows...
3n + 2 = Θ(n)
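
The constants used in the three examples can be sanity-checked numerically. The C sketch below tests the inequality C1 g(n) <= f(n) <= C2 g(n) for f(n) = 3n + 2 and g(n) = n over a finite range of n; the function name `theta_bounds_hold` and the `limit` parameter are hypothetical. A loop can only test finitely many values, so this is a check of the chosen constants, not a proof; the algebra above covers all larger n.

```c
#include <assert.h>
#include <stdbool.h>

/* f(n) = 3n + 2 and g(n) = n, as in the examples. */
static long f(long n) { return 3 * n + 2; }
static long g(long n) { return n; }

/* Return true if c1*g(n) <= f(n) <= c2*g(n) for every n in [n0, limit].
   Only finitely many n are tested. */
bool theta_bounds_hold(long c1, long c2, long n0, long limit) {
    assert(c1 > 0 && c2 > 0 && n0 >= 1);
    for (long n = n0; n <= limit; n++) {
        if (c1 * g(n) > f(n) || f(n) > c2 * g(n))
            return false;
    }
    return true;
}
```

With C1 = 1, C2 = 4 and n0 = 2 the check succeeds, while starting at n0 = 1 fails because f(1) = 5 exceeds 4 g(1) = 4. This is why the example requires n >= 2.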
Common Asymptotic Notations
Following is a list of some common asymptotic notations:
Constant − O(1)

Logarithmic − O(log n)

Linear − O(n)

n log n − O(n log n)

Quadratic − O(n^2)

Cubic − O(n^3)

Polynomial − n^O(1)

Exponential − 2^O(n)

Time-Space Trade-off:

The best algorithm to solve a given problem is one that requires less space in memory and takes
less time to complete its execution. But in practice, it is not always possible to achieve both these
objectives. As we know there may be more than one approach to solve a particular problem. One
approach may take more space but takes less time to complete its execution while the other
approach may take less space but takes more time to complete its execution. We may have to
sacrifice one for the other. If space is our constraint, then we have to choose a program
that requires less space at the cost of more execution time. On the other hand, if time is our
constraint then we have to choose a program that takes less time to complete its execution at the
cost of more space. This is why we say that there exists a time-space trade-off among
algorithms.
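
A small illustration of this trade-off, not taken from the notes themselves, is computing Fibonacci numbers in C. The first function below uses almost no extra memory but repeats work, so its running time grows exponentially with n; the second spends a table of cached results to compute each value only once. The names `fib_slow`, `fib_fast` and `MAX_N` are hypothetical.

```c
#include <assert.h>

#define MAX_N 90   /* fib(90) still fits in a signed 64-bit integer */

/* Less space, more time: plain recursion with an exponential number of calls. */
long long fib_slow(int n) {
    assert(n >= 0 && n <= MAX_N);
    return n < 2 ? n : fib_slow(n - 1) + fib_slow(n - 2);
}

/* More space, less time: a table of MAX_N + 1 cached results
   means each value is computed only once. */
long long fib_fast(int n) {
    static long long memo[MAX_N + 1];   /* zero-initialized; 0 means "not computed yet" */
    assert(n >= 0 && n <= MAX_N);
    if (n < 2) return n;
    if (memo[n] == 0)
        memo[n] = fib_fast(n - 1) + fib_fast(n - 2);
    return memo[n];
}
```

Both functions return the same values, but `fib_fast(50)` finishes immediately while `fib_slow(50)` needs billions of calls. The price is the extra array of 91 integers.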

Abstract Data Types (ADT):

An abstract data type (ADT) is a mathematical model with a collection of operations defined on
that model. The abstract data type encapsulates a data type in the sense that the definition of the
type and all its operations are localized in one place and their internals are not visible to the
user of the ADT. To the user, the declaration of the ADT and its operations are what matter.
Implementing an abstract data type means translating it into statements of a programming language
and choosing the data structure that represents it. For example, classes, structures, unions and
enumerated data types are language constructs that can be used to implement abstract data types.

ADTs are built up from primitive data types. Because an ADT can be application or implementation
specific, ADTs can be created as needed.

Here are some examples.

 stack: operations are "push an item onto the stack", "pop an item from the stack", "ask
if the stack is empty"; an implementation may be as an array, linked list, etc.
 queue: operations are "add to the end of the queue", "delete from the beginning of the
queue", "ask if the queue is empty"; an implementation may be as an array or linked
list or heap.
 search structure: operations are "insert an item", "ask if an item is in the structure", and
"delete an item"; an implementation may be as an array, linked list, tree, a hash table.

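The queue operations listed above ("add to the end", "delete from the beginning", "ask if the queue is empty") can be sketched in C as follows. This is one possible implementation, assuming a fixed-capacity circular array; the names `Queue`, `QUEUE_CAPACITY`, `queue_init`, `queue_is_empty`, `queue_enqueue` and `queue_dequeue` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 8   /* hypothetical fixed capacity for this sketch */

typedef struct {
    int items[QUEUE_CAPACITY];
    int front;   /* index of the oldest element */
    int count;   /* number of elements currently stored */
} Queue;

void queue_init(Queue *q) { q->front = 0; q->count = 0; }

/* "Ask if the queue is empty". */
bool queue_is_empty(const Queue *q) { return q->count == 0; }

/* "Add to the end of the queue"; returns false if the queue is full. */
bool queue_enqueue(Queue *q, int value) {
    if (q->count == QUEUE_CAPACITY) return false;
    int rear = (q->front + q->count) % QUEUE_CAPACITY;   /* wrap around the array */
    q->items[rear] = value;
    q->count++;
    return true;
}

/* "Delete from the beginning of the queue"; the queue must not be empty. */
int queue_dequeue(Queue *q) {
    assert(!queue_is_empty(q));
    int value = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;
    q->count--;
    return value;
}
```

Elements come out in the order they went in (first in, first out). A linked-list implementation would offer the same three operations without the fixed capacity.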