Unit - 4 CTPS

Unit 4
Text Processing and Abstraction

Text Processing – String basics – String operations – Indexing – Length –
Concatenation – Naming – Substring – Searching; Abstraction – Class diagram –
Use case diagram – Activity diagram – Selection – Repetition – Control
abstraction in activity diagram – States and State diagram; Spreadsheets –
Structure – Numbers – Operators – Cell references – Functions; Problems:
Keyword searching in Text – Text Line Length adjustment – Summation of Set of
Numbers – Generation of Fibonacci Sequence – Generating prime numbers –
Raising a number to a larger power
4.1 Text Processing
What is text processing?
 Although information stored on computers involves numbers, the majority of data
is textual rather than numeric.
 Example: your name, social security number, address, and Facebook
status are all textual in nature.
 Computer programmers use the term string when referring to textual data.
 A string is simply a piece of text, or more formally, an ordered sequence
of individual characters.
 A character is usually a letter of the alphabet, but a character might also
be a punctuation symbol such as a comma, semicolon, or question mark.
 A character might even be a nonprintable character such as a tab or a
linefeed. The length of a string is the number of characters contained in
the string.
 The length of a string may be zero, if the string contains no characters,
and it may be larger than zero.
 No string has a negative length since it is not possible to have a sequence
that contains a negative number of characters.
4.2 String Basics
 In most programming languages, string data is denoted by using double
quotes to surround the text. For example, “Hello” is a string having five
characters.
 Any sequence of characters that is enclosed by double-quotes is known as
a string literal.
 The double quotes are not part of the string itself; they simply serve to
notify the computer that the enclosed text is a string literal.
 Strings are commonly represented as arrays of characters.
 The characters in a string are indexed such that the first character has an
index of 0, the second character has an index of 1, and so on.
 If we again consider the string “Hello”, we note that the character at
index 0 is H, the character at index 1 is e, and the character at index 4 is o.
 Note that the length of the string “Hello” is 5 while the largest index is 4.
This observation suggests that for any string of length N, the largest valid
index is N – 1.
 Since the smallest possible index is always 0, we note that for any string
of length N, the only valid indices are in the interval 0 to N – 1.
4.2.1 String Operations

Indexing
 Strings are often analyzed by using indices to access the individual
characters of the string.
 Given a string literal we can access the character at a particular index by
using a bracket notation. This is referred to as an indexing operation.
 For example, suppose we want the fourth letter of the string literal “popcorn”.
 Since the fourth character has an index of 3, we write the number 3 inside
the brackets following the string literal: “popcorn”[3].
 This expression produces a string containing one letter, a lowercase c. Of
course, if the index we write inside the brackets is not valid, the
resulting expression is an error.
 For example, the expression “popcorn”[15] produces an error since the
only valid indices are in the interval 0 to 6.
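The indexing operation can be tried out in Python, which happens to use the same bracket notation. This is only a sketch: the variable names are illustrative, and the exact error raised for an invalid index (IndexError here) is specific to Python.

```python
word = "popcorn"

# Index 3 is the fourth character because counting starts at 0.
fourth = word[3]

# An index outside 0..6 is invalid; Python reports it as an IndexError.
try:
    word[15]
    out_of_range = False
except IndexError:
    out_of_range = True

print(fourth, out_of_range)
```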
Length
 We can obtain the length of a string literal by typing “.length” after the
string.
 For example, we can obtain the length of the string literal “popcorn” this way.
 The expression “popcorn”.length produces the number 7 since the
length of the string literal “popcorn” is 7.
 In other words, the code “popcorn”.length is interchangeable with the
value produced by the code, the number 7.
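As a rough illustration in Python, where the length is obtained with the built-in len function rather than a “.length” suffix, the relationship between length N and the valid indices 0 to N – 1 can be checked directly. The variable names are illustrative.

```python
word = "popcorn"

n = len(word)                    # number of characters in the string
valid_indices = list(range(n))   # 0, 1, ..., n - 1
last_char = word[n - 1]          # the largest valid index is n - 1

print(n, valid_indices, last_char)
```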
Concatenation
 String concatenation is another common string processing operation.
 String concatenation is an operation that takes two strings and splices
them to form a third string as output.
 String concatenation is usually expressed, oddly enough, as a plus symbol
(+).
 Although we usually think of the plus symbol as referring to the
mathematical addition of two numbers, the plus symbol is also employed
to concatenate two strings.
 The two strings “pop” and “corn” can be concatenated to produce the
string “popcorn”.
Naming
 Variables are bound to data through a name binding operation using
the left-arrow symbol (←).
 On the left of this symbol must be a variable name and a value must
occur on the right of the arrow.
 For example, in the sequence of actions x ← “pop”, y ← “corn”,
z ← x + y we tell the computer to (1) bind the name x
to the string literal “pop”, (2) bind the name y to the string literal
“corn” and (3) bind the name z to the string “popcorn”, a string that is
produced by concatenating the strings referred to by the variables x
and y.
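The same three bindings can be sketched in Python, assuming the usual translation of the left-arrow symbol into Python's assignment operator (=).

```python
x = "pop"      # bind the name x to the string "pop"
y = "corn"     # bind the name y to the string "corn"
z = x + y      # concatenate the strings named by x and y

print(z)
```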
Substring
 The substring function allows us to obtain part of a string if we know
the indices of the first and last characters that we want to extract from
the string.
 For example, suppose the function substring is applied to the variable x,
which is bound to the string “computational thinking”, with the indices 3 and 5.
 We are telling the computer to give us the sequence of characters
starting from the character at index 3 and ending with the character at
index 5, which is the string “put”.
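A small Python sketch of this operation follows. The helper name substring and its parameter names are hypothetical. Note that Python's own slice notation excludes the end index, so the helper adds 1 to include the last character, matching the description above.

```python
def substring(s, first, last):
    """Return the characters of s from index first through index last, inclusive."""
    # Python slices stop just before the end index, hence last + 1.
    return s[first:last + 1]

x = "computational thinking"
part = substring(x, 3, 5)
print(part)
```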
Searching
 The indexOf function searches a string for a character and
returns the index of the first occurrence of the character.
 For example, the value produced by “popcorn”.indexOf(“c”) is the number 3
since the first lowercase c occurs at index 3 in the string literal “popcorn”.
 In other words, the code “popcorn”.indexOf(“c”) is interchangeable
with the number 3, the data the code produces.
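In Python the closest built-in counterpart is the string method find, used here as an assumed stand-in for indexOf. It returns the index of the first occurrence, or -1 when the character does not occur.

```python
word = "popcorn"

first_c = word.find("c")   # first (and only) c
first_o = word.find("o")   # o occurs twice; the first occurrence is reported
missing = word.find("z")   # z does not occur at all

print(first_c, first_o, missing)
```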
4.3 Abstraction
 An abstraction is anything that allows us to concentrate on important
characteristics while deemphasizing less important, perhaps distracting,
details.
 For example, you are using abstraction if you tell someone that you just
saw a red Corvette. In this case the color and model of the car were
considered to be the most important, while details of the model year,
engine displacement, wheel dimensions, and so forth are omitted.
Abstraction: Control Structures and Class Diagrams

A control structure is a mechanism for specifying the proper order in which
instructions must be performed. Algorithms are composed of four fundamental
control structures.
 Sequential control
 Selection
 Repetition
 Control abstraction
1. Sequential control ensures that one instruction executes before
another.
2. Selection occurs whenever the algorithm must choose (select)
among different options.
3. Repetition consists of executing one or more instructions
repeatedly.
4. Control abstraction occurs when one instruction in an algorithm
consists of a reference to executing a sub algorithm.
 Data abstraction is no less important than control abstraction. A standard
way to diagram data abstraction is known as a class diagram.
 Class diagrams use a rectangle to denote a single class of objects. The
class diagram rectangle abstracts the group of objects in terms of two
things.
 Attributes
 Operations
The theory behind this kind of data abstraction is that objects in our world can
be explained in terms of (1) what they are and (2) what they can do. An object’s
attributes capture what it is, and operations that can be performed upon the
object define what it can do. For example, consider the class diagram for a
thermostat.
Thermostat class diagram.

 The rectangle on the left abstractly describes the thermostat pictured
to its right. A class diagram rectangle consists of three parts,
diagrammed in three horizontal compartments. The name of the class
of objects (in this case Thermostat) is in the top compartment. The
middle compartment lists attributes, and the bottom compartment lists
operations. The thermostat can be abstracted into three attributes:
(1) the current position of the upper left switch (COOL, OFF, or
HEAT);
(2) the current position of the upper-right fan switch (either ON or
AUTO);
(3) the current setting of the rotary temperature dial.
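One way to see how a class diagram maps onto code is the Python sketch below. The three attributes follow the list above; the attribute names, default values, temperature units, and all three operations are assumptions made for illustration and are not taken from the diagram.

```python
from dataclasses import dataclass

@dataclass
class Thermostat:
    # Attributes: what the object is (names and defaults are hypothetical).
    mode_switch: str = "OFF"      # COOL, OFF, or HEAT
    fan_switch: str = "AUTO"      # ON or AUTO
    temperature_dial: int = 70    # dial setting; units are assumed

    # Operations: what can be done to the object (hypothetical).
    def set_mode(self, mode: str) -> None:
        if mode not in ("COOL", "OFF", "HEAT"):
            raise ValueError(f"invalid mode: {mode}")
        self.mode_switch = mode

    def set_fan(self, fan: str) -> None:
        if fan not in ("ON", "AUTO"):
            raise ValueError(f"invalid fan setting: {fan}")
        self.fan_switch = fan

    def set_temperature(self, setting: int) -> None:
        self.temperature_dial = setting

t = Thermostat()
t.set_mode("HEAT")
t.set_temperature(68)
print(t)
```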
Abstraction: Use Case Diagrams
 Abstraction is useful during software design both for expressing the
instructions of an algorithm and for describing an algorithm’s data.
 Actors of a use case diagram are grouped into roles. The reason for
the different roles is to group users according to their shared actions.
 Use case diagrams are a technique for depicting a system (a software
system or some other system) by way of the interactions between
users and the system.
 The two main components of a use case diagram are actors and use
cases. Each actor represents a group of users of similar type, and a
use case is an action that can be performed by the system.
 Lines are drawn from each actor to the particular actions that this
type of user can perform.
 A simple use case diagram can describe the work of a cab driver. The
single actor shown in this diagram is the cab driver, and there are
four use cases to abstract four significant actions performed by the
actor.
Cab driver use case diagram.

 A use case diagram can also describe a more complex system, such as a
computer program for online student course registration. In this case a
rectangle has been drawn around the actions to indicate that they
collectively represent the functionality of the registration system.
Registration system use case diagram.
 Use case diagrams may also depict relationships between actions.
 Two of the more common relationships are labeled «extend» and
«include».
Grocery store use case diagram.
 An «extend» relationship occurs whenever one action is an extension
or specialized version of another. An «include» relationship results
from one action making use of another as part of its function.
Activity Diagrams
 An algorithm is a group of instructions for performing some task.
The variety of such tasks is limitless.
 Computer scientists are most interested in those algorithms that lead
to software systems, but the design techniques used to develop
software algorithms are generally applicable to many types of
algorithms.
 The term control flow was introduced to mean order of instruction
execution, and diagramming control flow is the best way to think of
activity diagrams.
 Some people refer to this kind of diagram as a flow diagram or
flowchart. The picture elements of an activity diagram are quite
simple; just five symbols are needed for most diagrams.
Symbols used in activity diagrams.

 A rectangle represents a single activity. An activity may correspond to
many instructions in a computer program. In non-software
algorithms, an activity denotes some action to be performed.
 The action corresponding to the activity is described inside the
activity rectangle.
 Arrows are used to connect activities and other symbols, indicating
the control flow—the order in which the activities must occur.
 An arrow from the rectangle for Activity A to Activity B specifies
that A must be performed prior to B.
 Decision/merge diamonds are used in activity diagram locations
where the control flow must split or join together as we shall see
later.
 Start and end symbols are used to denote where the algorithm
begins or completes execution.
Activity diagram for a morning routine.

Selection In Activity Diagrams
Five different forms of control flow:
1. Sequential execution
2. Selection
3. Repetition
4. Control abstraction
5. Concurrency
 Selection occurs anytime that an algorithm must make a choice.

 In an activity diagram a selection is depicted as a split in the flow.
Using a decision symbol for selection.

 The figure shows how the decision symbol (a small diamond) is used to
denote such a split. This diagram shows that following Activity A
the algorithm contains a choice of next executing Activity B or
Activity C.
 Choices in activity diagrams are based upon conditions, where a
condition is any statement that must be either true or false (i.e., a
logical expression).
 Each arrow extending from a decision split needs to be labeled
with a condition (enclosed in square brackets) for which the
associated path is chosen.
 Activity B is executed in the event that Condition 1 is true
following the execution of Activity A.
 If Condition 2 is true, then Activity C is chosen. A properly written
activity diagram crafts the conditions in such a way that either
Condition 1 is true or Condition 2 is true, but both cannot be true at
once.
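In code, a decision split with two mutually exclusive conditions corresponds to an if/else statement. The Python sketch below is only an illustration: the function name and the returned labels are hypothetical, and Condition 2 is modelled as "Condition 1 is false", which guarantees that exactly one path is taken.

```python
def next_activity(condition_1: bool) -> str:
    """Choose the activity that follows Activity A."""
    if condition_1:          # [Condition 1]
        return "Activity B"
    else:                    # [Condition 2], assumed to be "not Condition 1"
        return "Activity C"

print(next_activity(True), next_activity(False))
```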
Activity diagram for taking a photo.

Repetition In Activity Diagrams
 The decision/merge symbol also serves as a notation for depicting
algorithms with repetition. This algorithm describes how to place a call to
a specific phone number on a typical cell phone.
 The first activity for placing a call is to turn on the cell phone, then the
user must press the proper button to initiate the telephone app.
 The repetition in this algorithm occurs when the caller dials a particular
phone number.
 Notice that the activity diagram pictures this as a choice based upon
whether the phone number is completely or incompletely dialed.
 In the case that the number is incomplete, the diagram indicates that the
caller presses the next button and returns to another choice based upon
phone number completeness.
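The loop in this diagram can be sketched as a while statement in Python. The function name dial and the representation of the phone number as a string of digits are assumptions for illustration. The loop condition plays the role of the decision diamond: while the number is incomplete, the next button is pressed and the check is repeated.

```python
def dial(phone_number: str):
    """Press one button at a time until the whole number has been dialed."""
    dialed = ""
    presses = 0
    while dialed != phone_number:             # [number incomplete]
        dialed += phone_number[len(dialed)]   # press the next button
        presses += 1
    return dialed, presses                    # [number complete]

print(dial("5551234"))
```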
Activity diagram for an online auction.

Control Abstraction In Activity Diagrams
 More complex activities can be abstracted as a rectangle. Zooming in on
the rectangle expands it into subactivities. The online auction figure
expands one such activity into more detail.
 This box contains a five-step sequence that is the sub algorithm. Sub
algorithms may contain all of the notations that are permitted in any other
activity diagram, including their own starting and ending symbols.
 The diagram intentionally makes the sub algorithm appear as a blowup of
the more abstract activity rectangle.
 Software applications for manipulating activity diagrams often use zoom-
in and zoom-out commands to manage such sub algorithms.
Activity diagram for an online auction with sub algorithm.
States And State Diagrams
 Activity diagrams are useful for diagramming the behavior (control flow)
of algorithms. However, modern computer applications often rely on the
concept of state, suggesting an alternative kind of design diagram.
 Physical matter can be in one of three states: solid, liquid, or gaseous.
 Bits of computer memory can be in one of two states: 0 or 1. A home’s
climate control system is in one of three states: off or heat or cool.
 It is computational state that is of the most concern to computer scientists.
 Imagine that you could take a snapshot of the bit values of every piece of
your computer’s memory. Such a snapshot captures the computational state of
the machine at that instant.
 Suppose that an airline application allows users to check flight status,
make reservations, and investigate personal information regarding
pending flights and frequent flier miles. We can think of these three broad
functionalities as three separate states:
1. A state for checking flight status
2. A state for updating reservations
3. A state for checking and changing my account
 State diagrams use many of the same notations as activity diagrams.
 However, the meanings of the symbols are quite different for activity and
state diagrams.
 Rectangles symbolize states in a state diagram, and arrows denote
potential transitions from one state to another.
 The symbols used at the beginning and ending of a control flow in an
activity diagram are also used to indicate a starting state and ending state
for state diagrams.
State diagram notations
 The figure shows a simple initial example of a state diagram. This state
diagram depicts the three possible states of water: in a liquid form, a solid
form (Ice), and a gaseous form (Water Vapor).
 The four transitions illustrate how this kind of matter changes state.
 For example, cooling liquid water to a temperature below 0°C causes water
to freeze into ice. Similarly, heating the liquid form of water above 100°C
results in a transformation from liquid to gas.
 This particular state diagram has the somewhat unusual property of
needing no start or end state.
State diagram for water transitions.
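A state diagram translates naturally into a table of transitions. The Python sketch below is an assumption-laden illustration: the text describes only the freezing and boiling transitions, so the melting and condensing entries, the event labels, and the function name next_state are all hypothetical.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Liquid", "cool below 0C"): "Ice",
    ("Liquid", "heat above 100C"): "Water Vapor",
    ("Ice", "heat above 0C"): "Liquid",               # assumed transition
    ("Water Vapor", "cool below 100C"): "Liquid",     # assumed transition
}

def next_state(state: str, event: str) -> str:
    """Follow a transition if one exists; otherwise stay in the same state."""
    return TRANSITIONS.get((state, event), state)

state = "Liquid"
state = next_state(state, "cool below 0C")
print(state)
```

Because there is no start or end state, any of the three states can serve as the initial value of state.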
Keyword Searching in Text

Problem:
Count the number of times a particular word occurs in a given text.
Algorithm Development:
Consider determining whether the word SENTENCE occurs in the text
THIS IS A SENTENCE OF TEXT
Words are separated by non-alphabetic characters (except that the first word
has no preceding separator).
Two words match when they agree character-for-character from the first to the
last character.
Store the word SENTENCE in an array called word. Start by comparing the
first character of the text with the first character of word. When they match,
increase the pointer i to 2 in order to check the next character.
We must then compare the character chr just read with word[2]. If chr = ‘E’ the
matching process can continue. If i reaches 9, all eight characters of the word
have been matched.
When a mismatch is found, reset the pointer into the word array.
The steps to be taken when a complete word match is found are
If word-match then
(a) Increment match count by 1,
(b) Set pre to the most recent character read,
(c) Reset pointer i to the first position of the word array.
Algorithm Description:
1. Establish the word and wordlength wlength of the search-word.
2. Initialise the match-count nmatches, set preceding character and set
pointer for word array i to 1.
3. While not at end-of-file do
  (a) while not end-of-line do
    (a.1) read next character;
    (a.2) if current text character chr matches ith character in word then
      (2.a) extend partial match i by 1,
      (2.b) if a word-pattern match then
        (b.1) read next character post,
        (b.2) if preceding and following character not alphabetic then
          (2.a) update match count nmatches,
        (b.3) re-initialize pointer to word array i,
        (b.4) save following character post as preceding character
    else
      (2’.a) save current text character as preceding character for match,
      (2’.b) reset word array pointer i to first position
  (b) read past end-of-line.
4. Return word match count nmatches.
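The goal of the algorithm, counting whole-word matches bounded by non-alphabetic characters, can be sketched in Python as follows. This is a simplification rather than the character-pointer algorithm itself: instead of tracking the pointer i and the preceding and following characters, it collects each run of letters and compares the whole run with the search word. The function name count_word is hypothetical.

```python
def count_word(text: str, word: str) -> int:
    """Count whole-word, case-sensitive occurrences of word in text."""
    if not word:
        return 0
    nmatches = 0
    current = []                     # letters of the word being read
    for chr_ in text + " ":          # trailing blank flushes the last word
        if chr_.isalpha():
            current.append(chr_)
        else:
            # A non-alphabetic character ends the current word.
            if "".join(current) == word:
                nmatches += 1
            current = []
    return nmatches

print(count_word("THIS IS A SENTENCE OF TEXT", "SENTENCE"))
```

Because whole runs of letters are compared, a longer word such as SENTENCES is not counted as a match for SENTENCE.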
Text Line Length adjustment

Problem:
Given a set of lines of text of arbitrary length, reformat the text so that no
lines of more than n characters (e.g., 40) are printed. In each output line the
maximum number of words that occupy n characters or fewer should be printed,
and no word should extend across two lines. Paragraphs should also remain
indented.
Partially filled lines should be avoided.
Each word is added to the output line together with its trailing blank.
1. While not at end-of-file do
(a) fill up output line;
(b) write out output line up to the last full word;
(c) move any partial word at the end of the line to the beginning of the
line.
2. Write out any partially filled line.
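To find the end of the last full word, scan backwards from position 41 until a
blank is found:
back := 41;
while a[back] <> ‘ ’ do
back := back – 1;
In this case back starts at 41. When it encounters O, back becomes 40; for N,
back becomes 39; for A, back becomes 38. When it encounters the blank space it
exits the loop.
The partial word is then moved to the beginning of the line:
c := 0;
for j := back + 1 to 41 do
begin
c := c + 1;
a[c] := a[j]
end
1st iteration
j = back + 1 = 38 + 1 = 39
c = 0 + 1 = 1
a[1] = A
2nd iteration
j = 40
c = 2
a[2] = O
3rd iteration
j = 41
c = 3
a[3] = T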
1. Establish the line length limit limit and add one to it to allow for a space.
For example, a limit of 40 gives 41.
2. Initialize word and line character counts to zero and end-of-line flag to false.
3. While not end-of-file do
  (a) read and store next character;
  (b) if character is a space then
    (b.1) if a new paragraph then
      (1.a) move to next line and reset character count for new line,
    (b.2) add current word length to current line length,
    (b.3) if current word causes line length limit to be exceeded then
      (3.a) move to next line and set line length to current word length,
    (b.4) write out current word and its trailing space and reinitialize
    character count,
    (b.5) turn off end-of-input-line flag,
    (b.6) if at end-of-input-line then
      (6.a) set end-of-input-line flag and move to next input line.
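The core idea, placing as many whole words as fit on each output line, can be sketched in Python. This is a simplified word-based version, not the character-by-character algorithm above: it assumes the whole text is available as one string, ignores paragraph indentation, and lets a word longer than the limit occupy a line by itself. The function name wrap_words is hypothetical.

```python
def wrap_words(text: str, limit: int = 40) -> list:
    """Greedily pack whole words into lines of at most limit characters."""
    lines = []
    current = ""                       # the output line being filled
    for word in text.split():
        if not current:
            current = word             # first word on a fresh line
        elif len(current) + 1 + len(word) <= limit:
            current += " " + word      # word plus separating blank still fits
        else:
            lines.append(current)      # write out the line of full words
            current = word             # the word starts the next line
    if current:
        lines.append(current)          # write out any partially filled line
    return lines

for line in wrap_words("the quick brown fox jumps", 10):
    print(line)
```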
Summation of set of numbers
Problem:
Given a set of n numbers design an algorithm that adds these numbers and
returns the resultant sum. Assume n is greater than or equal to zero.
Write the numbers down one under the other and start adding up the right-hand
column. For example, consider the addition of 421, 583 and 714:
421
583
714
…8
For example, s = 421 + 583 + 714
s = a + b + c
General equation
Assign s := 0
s := s + a[1]
s := s + a[2]
.
.
.
In general: s := s + a[i+1]
Generation of Fibonacci sequence

Problem:
Generate and print the first n terms of the Fibonacci series, where n >= 1.
The first few terms are
0 1 1 2 3 5 8 13 21 …
Each term beyond the first two is derived from the sum of its two nearest
predecessors.
New term = preceding term + term before preceding term
Let us define
a – as the term before preceding term
b – as the preceding term
c – new term
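Then we start with a = 0 and b = 1, the first two terms.
For n > 2, each new term is c = a + b.
Once c has been computed, the old value of a loses its relevance, so a and b
can be updated to hold b and c.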
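A minimal Python sketch of the generation step, using the names a and b as defined above. The function name fibonacci and the choice to return the terms as a list are assumptions. The simultaneous assignment replaces a and b with the next pair in one step, so no separate variable c is needed.

```python
def fibonacci(n):
    """Return the first n terms of the Fibonacci sequence, n >= 1."""
    terms = []
    a, b = 0, 1                # the first two terms
    for _ in range(n):
        terms.append(a)
        a, b = b, a + b        # new term = preceding term + term before it
    return terms

print(fibonacci(9))
```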
Generating Prime Numbers

Problem:
Design an algorithm to establish all the primes in the first n positive
integers.
A prime number is a positive integer greater than 1 that is exactly divisible
only by 1 and itself.
The first few prime numbers are
2 3 5 7 11 13 17 19 23 29 31 37 …
All prime numbers apart from 2 are odd.
Starting x at 1 and using the increment
x = x + 2
gives us the sequence of candidate numbers
3 5 7 9 11 13 15 17 19 21 23 …
So far we have eliminated numbers divisible by 2
Can we extend this to eliminating numbers divisible by 3,5 and so on?
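Write down the odd sequence of numbers with the multiples of 3 removed:
5 7 11 13 17 19 23 25 29 31 35 37 …
Beyond 5 the increments alternate between 2 and 4.

Initially dx = 4, and before each step we apply
dx = abs(dx - 6)
so that dx alternates between 2 and 4.
When the multiples of each successive prime are crossed out of a list of
integers, the numbers that are not crossed out are prime numbers.
This method is called the “Sieve of Eratosthenes”, named after the Greek
mathematician Eratosthenes.
A storage problem arises when the list grows large.
To avoid this, we test each candidate x only against the primes up to √x.
For example, when x = 1000 we need only consider primes up to 31, because
√1000 ≈ 31.6.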
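The ideas above can be combined into a short Python sketch: candidates skip multiples of 2 and 3 by alternating increments of 2 and 4, and each candidate is tested only against primes no larger than its square root. This is trial division rather than a full sieve, and the function name primes_up_to is hypothetical. Here dx starts at 2 and is toggled after each step, which produces the same candidate sequence as starting at 4 and toggling first.

```python
def primes_up_to(n):
    """Return all primes among the first n positive integers."""
    primes = [p for p in (2, 3) if p <= n]
    x, dx = 5, 2
    while x <= n:
        is_prime = True
        for p in primes:
            if p * p > x:          # no divisor can exceed the square root of x
                break
            if x % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(x)
        x += dx
        dx = abs(dx - 6)           # alternate 2, 4, 2, 4, ...
    return primes

print(primes_up_to(37))
```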
Raising a number to a large power
Problem:
Given some integer x, compute the value of x^n where n is a positive integer
greater than 1.
Evaluating the expression
p = x^n, where x and n are given, is a straightforward task.
A simple method of evaluation is to accumulate the product p by multiplying by
x for n iterations.
p := 1
for i := 1 to n do
p := p * x
However, it is sometimes worth striving for higher efficiency than this method
offers when evaluating large powers.
Consider the evaluation of x^10. In our stepwise approach we have
x^2 = x * x, x^3 = x^2 * x, …, x^10 = x^9 * x
Notice that the power of x is increasing by one at each step.
For example,
x^4 = x^2 * x^2
requires only two multiplications (one to form x^2 and one to square it).
Multiplying powers of x corresponds to adding their exponents.
We have to choose the smallest pair of exponents that add up to 10,
for example 5 + 5 = 10.
All other combinations, e.g. 6 + 4, will leave us with a larger power to
evaluate (in this case x^6).
For example,
x^2 = x * x
x^4 = x^2 * x^2
x^5 = x^4 * x
x^10 = x^5 * x^5
With this scheme we have used only 4 multiplications rather than 9 to
evaluate x^10.
Consider the case of evaluating x^23, which requires only 7 multiplications.
Two conditions apply
This means our algorithm needs two parts
The odd/even power information is stored in an array 1/0

We can use the following standard form:
product := product * psequence
Initially
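One common way to realise this idea is binary exponentiation, sketched below in Python. The names product and psequence follow the standard form above; the function name power and the exact loop structure are assumptions rather than the notes' own code. The odd/even information is obtained from n itself at each step instead of being stored in an array first.

```python
def power(x, n):
    """Compute x**n by repeated squaring, n a positive integer."""
    product = 1
    psequence = x                             # runs through x, x^2, x^4, x^8, ...
    while n > 0:
        if n % 2 == 1:                        # odd power: include this term
            product = product * psequence
        n = n // 2
        psequence = psequence * psequence     # square for the next binary digit
    return product

print(power(2, 10))
```

For simplicity this version also multiplies into the initial product of 1 and squares once more at the end, so it performs slightly more multiplications than the minimal counts quoted above, but the number of loop iterations still grows with the number of binary digits of n rather than with n itself.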
