Python - Module 3

HKBK COLLEGE OF ENGINEERING, BENGALURU
syllabus
Module - 3 : Lists
 List methods
1) append( )
Adds a new element to the end of a list:
>>> t = ['a', 'b', 'c']
>>> t.append('d')
>>> print(t)
['a', 'b', 'c', 'd']
2) extend( )
Takes a list as an argument and appends all of the elements:
>>> t1 = ['a', 'b', 'c']
>>> t2 = ['d', 'e']
>>> t1.extend(t2)
>>> print(t1)
['a', 'b', 'c', 'd', 'e']
This example leaves t2 unmodified.
3) sort( )
Arranges the elements of the list from low to high:
>>> t = ['d', 'c', 'e', 'b', 'a']
>>> t.sort()
>>> print(t)
['a', 'b', 'c', 'd', 'e']
Most list methods are void; they modify the list and return None.
If we accidentally write t = t.sort(), t will refer to None and the list is lost.
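The point about void methods can be checked directly. The sketch below is only an illustration: it contrasts the in-place sort method with the built-in sorted() function, which is not covered in these slides and is assumed here as the standard way to get a new sorted list.

```python
# sort() modifies the list in place and returns None;
# sorted() (a built-in, not covered in the slides) returns a new list.
letters = ['d', 'c', 'e', 'b', 'a']

result = letters.sort()                  # in place, so result is None
ordered_copy = sorted(['d', 'c', 'a'])   # new sorted list

print(result)        # None
print(letters)       # ['a', 'b', 'c', 'd', 'e']
print(ordered_copy)  # ['a', 'c', 'd']
```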

 List methods -Deleting Elements
4) pop( )
To delete an element from a list when the index of the element is known, use pop:
>>> t = ['a', 'b', 'c']
>>> x = t.pop(1)
>>> print(t)
['a', 'c']
>>> print(x)
b
pop modifies the list and returns the element that was removed.
If we don’t provide an index, it deletes and returns the last element:
>>> t = ['a', 'b', 'c']
>>> x = t.pop( )
>>> print(t)
['a', 'b']
>>> print(x)
c
5) del
If we don’t need the removed value, use the del operator:
>>> t = ['a', 'b', 'c']
>>> del t[1]
>>> print(t)
['a', 'c']
6) remove( )
If we know the element we want to remove (but not its index), use remove:
>>> t = ['a', 'b', 'c']
>>> t.remove('b')
>>> print(t)
['a', 'c']
The return value from remove is None.
To remove more than one element, we can use del with a slice index:
>>> t = ['a', 'b', 'c', 'd', 'e', 'f']
>>> del t[1:5]
>>> print(t)
['a', 'f']
As usual, the slice selects all the elements up to, but not including, the second index.
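As a rough side-by-side comparison, the sketch below applies pop, del and remove to one list. The variable names are only illustrative. It also shows one behaviour the slides do not mention: remove raises ValueError when the value is not in the list.

```python
t = ['a', 'b', 'c', 'd', 'e']

last = t.pop()       # removes and returns the last element, 'e'
second = t.pop(1)    # removes and returns the element at index 1, 'b'
del t[0]             # removes 'a'; nothing is returned
t.remove('d')        # removes by value; returns None

# remove raises ValueError if the value is not present
try:
    t.remove('z')
    found = True
except ValueError:
    found = False

print(last, second, t, found)  # e b ['c'] False
```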

 Lists and functions
There are a number of built-in functions that can be used on lists, which let us quickly look through a list without writing our own loops:
>>> nums = [3, 41, 12, 9, 74, 15]
>>> print(len(nums))
6
>>> print(max(nums))
74
>>> print(min(nums))
3
>>> print(sum(nums))
154
>>> print(sum(nums)/len(nums)) # finding average
25.666666666666668
The sum() function only works when the list elements are numbers.
The other functions (max(), len(), etc.) work with lists of strings and other types that can be compared.
We could rewrite the program that computed the average of a list of numbers entered by the user using a list.
The program to compute an average without a list:
total = 0
count = 0
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    total = total + value
    count = count + 1
average = total / count

print('Average:', average)
The program to compute an average with a list:

numlist = list() # empty list
while True:
    inp = input('Enter a number: ')
    if inp == 'done':
        break
    value = float(inp)
    numlist.append(value)
average = sum(numlist) / len(numlist)

print('Average:', average)
 Lists and Strings
A string is a sequence of characters and a list is a sequence of values, but a list of characters is not the same as a string.
To convert from a string to a list of characters, use list:
>>> s = 'HKBK'
>>> t = list(s)
>>> print(t)
['H', 'K', 'B', 'K']
The list function breaks a string into individual letters. To break a string into words, use the split method:
>>> s = 'Welcome to HKBKCE'
>>> t = s.split()
>>> print(t)
['Welcome', 'to', 'HKBKCE']
>>> print(t[2])
HKBKCE
We can call split with an optional argument called a delimiter that specifies which characters to use as word
boundaries.
The following example uses a hyphen as a delimiter:
>>> s = 'spam-spam-spam'
>>> delimiter = '-'
>>> s.split(delimiter)
['spam', 'spam', 'spam']
join is the inverse of split. It takes a list of strings and concatenates the elements.
join is a string method, so we have to invoke it on the delimiter and pass the list as a parameter:
>>> t = ['Welcome', 'to', 'HKBKCE']
>>> delimiter = ' '
>>> delimiter.join(t)
'Welcome to HKBKCE'
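The inverse relationship between split and join can be checked with a short round trip. This is a minimal sketch using the same sample strings as the slides; the variable names are illustrative.

```python
sentence = 'Welcome to HKBKCE'

words = sentence.split()      # split on whitespace
rebuilt = ' '.join(words)     # join the words with a single space

parts = 'spam-spam-spam'.split('-')   # split on an explicit delimiter
joined = '-'.join(parts)              # join with the same delimiter

print(words)                  # ['Welcome', 'to', 'HKBKCE']
print(rebuilt == sentence)    # True
print(joined)                 # spam-spam-spam
```

The round trip reproduces the original here only because the words are separated by single spaces; split() with no argument collapses runs of whitespace.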
 Objects and values
a = 'banana'
b = 'banana'
We know that a and b both refer to a string, but we don’t know whether they refer to the same string.
There are two possible states:
In one case, a and b refer to two different objects that have the same value.
In the second case, they refer to the same object.
To check whether two variables refer to the same object, use the is operator.
>>> a = 'banana'
>>> b = 'banana'
>>> a is b
True
In this example, Python only created one string object, and both a and b refer to it.
But when we create two lists, we get two objects:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a is b
False
In this case, the two lists are equivalent, because they have the same elements, but not identical, because they are not the same object.
If two objects are identical, they are also equivalent, but if they are equivalent, they are not necessarily identical.
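The distinction between equivalent and identical corresponds to the == and is operators. The sketch below is an illustration with made-up variable names: == compares values, while is checks whether two names refer to the same object.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with the same elements
c = a           # another name for the same list object

print(a == b)   # True: equivalent (same elements)
print(a is b)   # False: not identical (two objects)
print(a is c)   # True: identical (one object, two names)
```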
 Aliasing
If a refers to an object and we assign b = a, then both variables refer to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> b is a
True
The association of a variable with an object is called a reference.

In this example, there are two references to the same object.
An object with more than one reference has more than one name, so we say that the object is aliased.
If the aliased object is mutable, changes made with one alias affect the other:
>>> a = [1, 2, 3]
>>> b = a
>>> b[0] = 17
>>> print(a)
[17, 2, 3]
For immutable objects like strings, aliasing is not as much of a problem.

a = 'banana'
b = 'banana'
It almost never makes a difference whether a and b refer to the same string or not, because neither name can be used to change the string.
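When an independent list is needed rather than an alias, one common approach (not covered in the slides, so treat it as an assumption about what the reader wants) is to copy the list with a full slice. A minimal sketch:

```python
a = [1, 2, 3]
alias = a        # second name for the same list
clone = a[:]     # new list containing the same elements

alias[0] = 17    # changes the one shared list

print(a)         # [17, 2, 3]
print(clone)     # [1, 2, 3]
```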
 List arguments
When we pass a list to a function, the function gets a reference to the list.
If the function modifies a list parameter, the caller sees the change.
For example, delete_head removes the first element from a list:
def delete_head(t):
    del t[0]
Here’s how it is used:
>>> letters = ['a', 'b', 'c']
>>> delete_head(letters)
>>> print(letters)
['b', 'c']
The parameter t and the variable letters are aliases for the same object.
It is important to distinguish between operations that modify lists and operations that create new lists.
For example, the append method modifies a list, but the + operator creates a new list:
>>> t1 = [1, 2]
>>> t2 = t1.append(3)
>>> print(t1)
[1, 2, 3]
>>> print(t2)
None
>>> t3 = t1 + [4]
>>> print(t3)
[1, 2, 3, 4]
>>> t2 is t3
False
This difference is important when we write functions that are supposed to modify lists.
For example, this function does not delete the head of a list:
def bad_delete_head(t):
    t = t[1:] # WRONG!
The slice operator creates a new list and the assignment makes t refer to it, but none of that has any effect on the list that was passed as an argument.
An alternative is to write a function that creates and returns a new list.
For example, tail returns all but the first element of a list:
def tail(t):
    return t[1:]
This function leaves the original list unmodified. Here’s how it is used:
>>> letters = ['a', 'b', 'c']
>>> rest = tail(letters)
>>> print(rest)
['b', 'c']
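The three functions from this section can be run together to see which of them affect the caller's list. This is a small sketch: the function bodies are the ones shown above, and after_bad is an illustrative name for a snapshot of the list.

```python
def delete_head(t):
    del t[0]          # modifies the caller's list in place

def bad_delete_head(t):
    t = t[1:]         # rebinds the local name only

def tail(t):
    return t[1:]      # builds and returns a new list

letters = ['a', 'b', 'c']

bad_delete_head(letters)
after_bad = list(letters)   # snapshot: still ['a', 'b', 'c']

rest = tail(letters)        # ['b', 'c']; letters is unchanged
delete_head(letters)        # letters is now ['b', 'c']

print(after_bad, rest, letters)
```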
Module - 3 : Dictionaries
 Dictionary
• A dictionary is like a list, but more general.
• In a list, the index positions have to be integers; in a dictionary, the indices can be
(almost) any type.
• A dictionary is a mapping between a set of indices (which are called keys) and a set of values.
• Each key maps to a value.
• The association of a key and a value is called a key-value pair or sometimes an item.
• In the examples that follow, the keys and the values are all strings.

The function dict() creates a new dictionary with no items.
Because dict is the name of a built-in function, we should avoid using it as a variable name.
>>> city_capital = dict()
>>> print(city_capital)
{}
The curly brackets, { }, represent an empty dictionary.
To add items to the dictionary, use square brackets:
>>> city_capital['KAR'] = 'Bangalore'
This line creates an item that maps from the key 'KAR' to the value 'Bangalore'.
If we print the dictionary again, we see a key-value pair with a colon between the key and the value:
>>> print(city_capital)
{'KAR': 'Bangalore'}
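>>> city_capital = {'KAR': 'Bangalore', 'TN': 'Chennai', 'AP': 'Hyderabad'}
>>> print(city_capital)
{'KAR': 'Bangalore', 'TN': 'Chennai', 'AP': 'Hyderabad'}
In Python 3.7 and later, a dictionary keeps its key-value pairs in the order in which they were inserted.
In older versions of Python, the order of items in a dictionary is unpredictable, so a program should not rely on it.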
Use the keys to look up the corresponding values:

>>> print(city_capital['TN'])
Chennai
If the key isn’t in the dictionary, we get an exception:
>>> print(city_capital['KN'])
KeyError: 'KN'
The len function works on dictionaries; it returns the number of key-value pairs:
>>> len(city_capital)
3
The in operator works on dictionaries; it tells whether something appears as a key in the dictionary.
>>> 'KAR' in city_capital
True
>>> 'KAN' in city_capital
False
To see whether something appears as a value in a dictionary, use the method values, which returns a collection of the values that we can convert to a list, and then use the in operator:
>>> city = list(city_capital.values())
>>> 'Bangalore' in city
True
• The in operator uses different algorithms for lists and dictionaries.
• For lists, it uses a linear search algorithm. As the list gets longer, the search time gets
longer in direct proportion to the length of the list.
• For dictionaries, Python uses a data structure called a hash table that has a remarkable property: the in operator takes about the same amount of time no matter how many items there are in a dictionary.
 Dictionary as a set of counters
text = 'Good Morning God Mother'

d = dict()
for ch in text:
    if ch not in d:
        d[ch] = 1
    else:
        d[ch] = d[ch] + 1
print(d)
Output:
{'G': 2, 'o': 5, 'd': 2, ' ': 3, 'M': 2, 'r': 2, 'n': 2, 'i': 1, 'g': 1, 't': 1, 'h': 1, 'e': 1}
We are effectively computing a histogram, which is a statistical term for a set of counters (or frequencies).
• Dictionaries have a method called get that takes a key and a default value.
• If the key appears in the dictionary, get returns the corresponding value; otherwise it returns the default value.
For example:
>>> counts = {'Python': 1, 'PCD': 42, 'Rupee': 100}
>>> print(counts.get('Rupee', 0))
100
>>> print(counts.get('Dollar', 0))
0
We can use get to write our histogram loop more concisely:
text = 'Good Morning God Mother'

d = dict()
for ch in text:
    d[ch] = d.get(ch, 0) + 1
print(d)
Output:
{'G': 2, 'o': 5, 'd': 2, ' ': 3, 'M': 2, 'r': 2, 'n': 2, 'i': 1, 'g': 1, 't': 1, 'h': 1, 'e': 1}
Because the get method automatically handles the case where a key is not in a dictionary, we can reduce four lines down to one and eliminate the if statement.
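For comparison, the standard library also provides collections.Counter, which is not part of these slides. The sketch below assumes the same sample string and checks that the get-based loop and Counter produce the same counts.

```python
from collections import Counter

text = 'Good Morning God Mother'

# Histogram with dict.get, as in the loop above
manual = {}
for ch in text:
    manual[ch] = manual.get(ch, 0) + 1

# Counter builds the same mapping in one call
counted = Counter(text)

print(manual == dict(counted))   # True
print(counted.most_common(1))    # [('o', 5)]
```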
 Dictionaries and files
One of the common uses of a dictionary is to count the occurrences of words in a file that contains written text.
Write a Python program to read through the lines of the file, break each line into a list of words, and then loop through each of the words in the line and count each word using a dictionary.
We will see that we have two for loops.
The outer loop reads the lines of the file and the inner loop iterates through each of the words on that particular line. This is an example of a pattern called nested loops, because one loop (the inner loop) runs inside the other (the outer loop).
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()

counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
HKBK.txt:
HKBK College of Engineering was established in 1997.
Teaching is more than a profession, for the faculty members of HKBKCE
HKBKCE is situated in Bangalore
We are proud to be a HKBK Students.
Output:
Enter the file name: hkbk.txt
{'HKBK': 2, 'College': 1, 'of': 2, 'Engineering': 1, 'was': 1, 'established': 1, 'in': 2, '1997.': 1, 'Teaching': 1, 'is': 2, 'more': 1, 'than': 1, 'a': 2, 'profession,': 1, 'for': 1, 'the': 1, 'faculty': 1, 'members': 1, 'HKBKCE': 2, 'situated': 1, 'Bangalore': 1, 'We': 1, 'are': 1, 'proud': 1, 'to': 1, 'be': 1, 'Students.': 1}
 Looping and dictionaries
If we use a dictionary as the sequence in a for statement, it traverses the keys of the dictionary.
This loop prints each key and the corresponding value:
counts = { 'BNG' : 1 , 'MYS' : 42, 'SMG': 100}

for key in counts:
    print(key, counts[key])
Output:
BNG 1
MYS 42
SMG 100
If we want to print the keys in alphabetical order:
1. Make a list of the keys in the dictionary using the keys method available in dictionary objects.
2. Sort that list and loop through the sorted list, looking up each key.
Program for printing out key-value pairs in sorted order:
counts = {'SMG' : 1 , 'MYS' : 42, 'BNG' : 100}

lst = list(counts.keys())
print(lst)
lst.sort()
for key in lst:
    print(key, counts[key])
Output:
['SMG', 'MYS', 'BNG']
BNG 100
MYS 42
SMG 1
 Advanced text parsing
The Python split function looks for spaces and treats words as tokens separated by spaces, so a word with attached punctuation, or with different capitalization, is counted as a different word.
We can solve both the punctuation and the case sensitivity problems by using the string methods lower and translate together with the constant string.punctuation.
translate is the most subtle of the methods:
line.translate(str.maketrans(fromstr, tostr, deletestr))
It replaces the characters in fromstr with the character in the same position in tostr and deletes all characters that are in deletestr.
The fromstr and tostr can be empty strings and the deletestr parameter can be omitted.
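A small sketch of the two ways translate can be used, on one of the sample lines. The second mapping ('ae' to '43') is made up purely for illustration.

```python
import string

line = 'Hello, How are You?'

# Empty fromstr and tostr: only delete the punctuation characters
no_punct = line.translate(str.maketrans('', '', string.punctuation))

# Replace 'a' with '4' and 'e' with '3', and delete '?'
swapped = line.translate(str.maketrans('ae', '43', '?'))

print(no_punct.lower())   # hello how are you
print(swapped)            # H3llo, How 4r3 You
```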
>>> import string

>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
HKBKpunch.txt
Greetings from HKBKCE!

Hello, How are You?
Teaching is more than a profession, for the faculty members of HKBKCE
#counting words program
import string
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
counts = dict()
for line in fhand:
    line = line.rstrip()
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
print(counts)
Output:
Enter the file name: HKBKpunch.txt
{'greetings': 1, 'from': 1, 'hkbkce': 3, 'hello': 1, 'how': 1, 'are': 2, 'you': 1, 'hkbk': 2, 'college': 1, 'of': 2, 'engineering': 1, 'was': 1, 'established': 1, 'in': 2, '1997': 1, 'teaching': 1, 'is': 2, 'more': 1, 'than': 1, 'a': 2, 'profession': 1, 'for': 1, 'the': 1, 'faculty': 1, 'members': 1, 'situated': 1, 'bangalore': 1, 'we': 1, 'proud': 1, 'to': 1, 'be': 1, 'students': 1}
Module - 3 : Tuples
 Tuples are immutable
 A tuple is a sequence of values much like a list.
 The values stored in a tuple can be any type, and they are indexed by integers.
 The important difference is that tuples are immutable.
 Tuples are also comparable and hashable so we can sort lists of them and use tuples as key
values in Python dictionaries.
 Syntactically, a tuple is a comma-separated list of values:

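t = 'a', 'b', 'c', 'd', 'e'
or
t = ('a', 'b', 'c', 'd', 'e')
To create a tuple with a single element, include the final comma. Without the comma, Python treats ('a') as a string in parentheses:
>>> t1 = ('a',)
>>> type(t1)
<class 'tuple'>
>>> t2 = ('a')
>>> type(t2)
<class 'str'>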
Another way to construct a tuple is the built-in function tuple.

With no argument, it creates an empty tuple:
>>> t = tuple()
>>> print(t)
()
If the argument is a sequence (string, list, or tuple), the result of the call to tuple is a tuple with
the elements of the sequence:
>>> t = tuple('lupins')
>>> print(t)
('l', 'u', 'p', 'i', 'n', 's')
Because tuple is the name of a constructor, avoid using it as a variable name.
Most list operators also work on tuples.

The bracket operator indexes an element:
>>> t = ('a', 'b', 'c', 'd', 'e')
>>> print(t[0])
a
The slice operator selects a range of elements:

>>> print(t[1:3])
('b', 'c')
But if we try to modify one of the elements of the tuple, we get an error:
>>> t[0] = 'A'
TypeError: 'tuple' object does not support item assignment
We can’t modify the elements of a tuple, but we can replace one tuple with another:
>>> t = ('A',) + t[1:]

>>> print(t)
('A', 'b', 'c', 'd', 'e')
 Comparing tuples
 The comparison operators work with tuples and other sequences.

 Python starts by comparing the first element from each sequence.
 If they are equal, it goes on to the next element, and so on, until it finds elements that differ.
 Subsequent elements are not considered (even if they are really big).
>>> (0, 1, 2) < (0, 3, 4)

True
>>> (0, 1, 2000000) < (0, 3, 4)
True
 The sort function works the same way.

 It sorts primarily by first element, but in the case of a tie, it sorts by second element, and so
on.
 This feature lends itself to a pattern called DSU for
Decorate a sequence by building a list of tuples with one or more sort keys preceding the elements from the sequence,
Sort the list of tuples using the Python built-in sort, and
Undecorate by extracting the sorted elements of the sequence.
#Program to sort list of words from shortest to longest
txt = 'HKBK College of Engineering was established in 1997'

words = txt.split()
t = list()
for word in words:
    t.append((len(word), word))
print(t)
t.sort()
print(t)
res = list()
for length, word in t:
    res.append(word)
print(res)
Output:
[(4, 'HKBK'), (7, 'College'), (2, 'of'), (11, 'Engineering'), (3, 'was'), (11, 'established'), (2, 'in'), (4, '1997')]
[(2, 'in'), (2, 'of'), (3, 'was'), (4, '1997'), (4, 'HKBK'), (7, 'College'), (11, 'Engineering'), (11, 'established')]
['in', 'of', 'was', '1997', 'HKBK', 'College', 'Engineering', 'established']
#Program to sort list of words from longest to shortest
txt = 'HKBK College of Engineering was established in 1997'

words = txt.split()
t = list()
for word in words:
    t.append((len(word), word))
t.sort(reverse=True)
res = list()
for length, word in t:
    res.append(word)
print(res)
Output:
['established', 'Engineering', 'College', 'HKBK', '1997', 'was', 'of', 'in']
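As an aside, the built-in sorted() function with a key argument (not covered in these slides) can order the words by length without building the decorated list. The sketch below is only a comparison, with illustrative names. The two approaches differ on ties: DSU compares the words themselves when lengths are equal, while a key of len leaves equal-length words in their original order.

```python
words = 'HKBK College of Engineering was established in 1997'.split()

# DSU, longest first: ties are broken by comparing the words
decorated = [(len(word), word) for word in words]
decorated.sort(reverse=True)
dsu_longest_first = [word for length, word in decorated]

# sorted() with key=len compares lengths only;
# equal-length words keep their original relative order
key_longest_first = sorted(words, key=len, reverse=True)

print(dsu_longest_first)
print(key_longest_first)
```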
 Tuple assignment
 One of the unique syntactic features of the Python language is the ability to have a tuple on
the left side of an assignment statement.
 This allows us to assign more than one variable at a time when the left side is a sequence.
 In the example below, we have a two-element list (which is a sequence) and assign the first and second elements of the sequence to the variables x and y in a single statement.
>>> m = [ 'have', 'fun' ]
>>> x, y = m
>>> x
'have'
>>> y
'fun'
This is equivalent to assigning each element by index:
>>> m = [ 'have', 'fun' ]
>>> x = m[0]
>>> y = m[1]
>>> x
'have'
>>> y
'fun'
Parentheses around the left side are optional:
>>> m = [ 'have', 'fun' ]
>>> (x, y) = m
>>> x
'have'
>>> y
'fun'
Tuple assignment allows us to swap the values of two variables in a single statement:
>>> a, b = b, a
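The number of variables on the left and the number of values on the right must be the same:
>>> a, b = 1, 2, 3
ValueError: too many values to unpack (expected 2)
More generally, the right side can be any kind of sequence (string, list, or tuple). For example, to split an email address into a user name and a domain:
>>> addr = 'monty@python.org'
>>> uname, domain = addr.split('@')
The return value from split is a list with two elements; the first element is assigned to uname, the second to domain.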
>>> print(uname)
monty
>>> print(domain)
python.org
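The swap, the unpacking of a split result, and the mismatch error can be seen together in one short script. This is a minimal sketch; the starting values are arbitrary.

```python
a, b = 1, 2
a, b = b, a            # swap without a temporary variable

uname, domain = 'monty@python.org'.split('@')

# Three values cannot be unpacked into two variables
try:
    x, y = 1, 2, 3
    unpacked = True
except ValueError:
    unpacked = False

print(a, b, uname, domain, unpacked)   # 2 1 monty python.org False
```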
 Dictionaries and tuples
Dictionaries have a method called items.
items returns a sequence of tuples, where each tuple is a key-value pair; list converts it to a list:
>>> d = {'b':1, 'a':10, 'c':22}
>>> t = list(d.items())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]
The items appear in the order in which they were inserted (Python 3.7 and later); they are not sorted by key.

Because tuples are comparable, we can sort the list of tuples.
Converting a dictionary to a list of tuples is a way for us to output the contents of a dictionary sorted by key:
>>> d = {'b':1, 'a':10, 'c':22}
>>> t = list(d.items())
>>> t
[('b', 1), ('a', 10), ('c', 22)]
>>> t.sort()
>>> t
[('a', 10), ('b', 1), ('c', 22)]
The new list is sorted in ascending alphabetical order by the key value.
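The same idea can be written with the built-in sorted() function, which the slides do not cover; it is assumed here as a compact alternative. The sketch below also sorts by value using a key function; the lambda and the variable names are only illustrative.

```python
d = {'b': 1, 'a': 10, 'c': 22}

# Sorting the (key, value) tuples orders them by key
by_key = sorted(d.items())

# A key function that picks the value orders them by value instead
by_value = sorted(d.items(), key=lambda pair: pair[1], reverse=True)

print(by_key)     # [('a', 10), ('b', 1), ('c', 22)]
print(by_value)   # [('c', 22), ('a', 10), ('b', 1)]
```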
 Multiple assignment with dictionaries
Combining items, tuple assignment, and for gives a code pattern for traversing the keys and values of a dictionary in a single loop:
d = {'a':10, 'b':1, 'c':22}

for key, val in list(d.items()):
    print(val, key)
This loop has two iteration variables because items returns a sequence of tuples, and key, val is a tuple assignment that successively iterates through each of the key-value pairs in the dictionary.
For each iteration through the loop, both key and val are advanced to the next key-value pair in the dictionary (in insertion order in Python 3.7 and later).
The output of this loop is:
10 a
1 b
22 c
 The most common words
#program to count words
import string
fhand = open('HKBKpunch.txt')
counts = dict()
for line in fhand:
    line = line.translate(str.maketrans('', '', string.punctuation))
    line = line.lower()
    words = line.split()
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1

# Sort the dictionary by value
lst = list()
for key, val in list(counts.items()):
    lst.append((val, key))

lst.sort(reverse=True)

# Each tuple in lst is (count, word)
for val, key in lst[:10]:
    print(val, key)
Output:
3 hkbkce
2 of
2 is
2 in
2 hkbk
2 are
2 a
1 you
1 we
1 was
The fact that this complex data parsing and analysis can be done with an easy-to-understand Python program is one reason why Python is a good choice as a language for exploring information.
 Using tuples as keys in dictionaries
Assuming that we have defined the variables last, first, and number, we could write a dictionary assignment statement as follows:
directory[last,first] = number
The expression in brackets is a tuple. We could use tuple assignment in a for loop to traverse this dictionary.
for last, first in directory:
    print(first, last, directory[last,first])
#Using tuples as keys in dictionaries
phone=dict()
phone['mustafa','syed']=22223333
for lname, fname in phone:
    print(fname, lname, phone[lname,fname])
Output:
syed mustafa 22223333

Module - 3 : Regular expressions
The task of searching and extracting from a string is done in Python with a very powerful library called regular expressions.
Regular expressions are almost their own little programming language for searching and parsing strings.
The regular expression library ‘re’ must be imported into the program before we can use it.
The simplest use of the regular expression library is the search( ) function.
# Search for lines that contain 'Bangalore'
import re
fhand = open('HKBK.txt')
for line in fhand:
    if re.search('Bangalore', line):
        print(line)
HKBK.txt:
HKBK College of Engineering was established in 1997.
Teaching is more than a profession, for the faculty members of HKBKCE
Output:
The power of the regular expressions comes when we add special characters to
the search string that allow us to more precisely control which lines match the
string.
For example, the caret (^) character is used in regular expressions to match “the
beginning” of a line.
# Search for lines that start with 'HKBK'
import re
fhand = open('HKBK.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('^HKBK', line):
        print(line)
HKBK.txt:
HKBK College of Engineering was established in 1997.
Teaching is more than a profession, for the faculty members of HKBKCE
Output:
 Character matching in regular expressions
The most commonly used special character is the period or full stop, which matches any character. This lets us build even more powerful regular expressions.
Example:
The regular expression “F..m:” would match any of the strings:
“From:”, “Fxxm:”, “F12m:”, or “F!@m:”
# Search for lines that start with 'H', followed by 2 characters, followed by 'K'
import re
fhand = open('search.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('^H..K', line):
        print(line)
search.txt:
HKBK College of Engineering was established in 1997.
Teaching is more than a profession, for the faculty members of HKBKCE
HISK is sample sentence
Output:
HKBK College of Engineering was established in 1997.
HISK is sample sentence
This is particularly powerful when combined with the ability to indicate that a character can be repeated any number of times using the “*” or “+” characters in a regular expression.
These special characters apply to the single character immediately to their left:
1. The asterisk ‘*’ matches zero or more repetitions of the preceding character.
2. The plus sign ‘+’ matches one or more repetitions of the preceding character.
# Search for lines that start with From: and have an at (@) sign
import re
fhand = open('from.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('^From:.+@', line):
        print(line)
from.txt:
From: syedmustafa@gmail.com
Professor and HOD
From: nikilravi@yahoo.co.in
Email for sample
From: madhu@hotmail.com
Output:
From: syedmustafa@gmail.com
From: nikilravi@yahoo.co.in
From: madhu@hotmail.com
 Extracting data using regular expressions
If we want to extract data from a string in Python, we can use the findall( ) method to extract all of the substrings which match a regular expression.
The example program below extracts anything that looks like an email address from any line, regardless of format.
For example, we want to pull the email addresses from each of the following lines:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

Return-Path: <postmaster@collab.sakaiproject.org>
for <source@collab.sakaiproject.org>;
Received: (from apache@localhost)
Author: stephen.marquard@uct.ac.za
# program to extract only Email Ids.
import re
s = 'A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
lst = re.findall(r'\S+@\S+', s)
print(lst)
Output:
['csev@umich.edu', 'cwen@iupui.edu']
The findall() method searches the string in the second argument and returns a list of all of the strings that look like email addresses.
The pattern uses the two-character sequence \S, which matches a non-whitespace character, so \S+@\S+ matches an at sign with at least one non-whitespace character on each side.
# Search for lines that have an at sign between characters
import re
fhand = open('mbox.txt')
for line in fhand:
    line = line.rstrip()
    x = re.findall(r'\S+@\S+', line)
    if len(x) > 0:
        print(x)
mbox.txt:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
for <source@collab.sakaiproject.org>;
Received: (from apache@localhost)
Author: stephen.marquard@uct.ac.za
Output:
['stephen.marquard@uct.ac.za']
['<postmaster@collab.sakaiproject.org>']
['<source@collab.sakaiproject.org>;']
['apache@localhost)']
['stephen.marquard@uct.ac.za']
Square brackets are used to indicate a set of multiple acceptable characters we are
willing to consider matching.
[a-z] –matches any one character from the range a to z
[A-Z] –matches any one character from the range A to Z
[0-9] –matches any one number from the range 0 to 9
New regular expression: [a-zA-Z0-9]\S*@\S*[a-zA-Z]
This looks for substrings that start with a single lowercase letter, uppercase letter, or number (“[a-zA-Z0-9]”), followed by zero or more non-blank characters (“\S*”), followed by an at-sign, followed by zero or more non-blank characters (“\S*”), followed by an uppercase or lowercase letter.
Remember that the “*” or “+” applies to the single character immediately to the left of the plus or asterisk.
# Search for lines that have an at sign between characters
# The characters must be a letter or number
import re
fhand = open('mbox.txt')
for line in fhand:
    x = re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)
    if len(x) > 0:
        print(x)
Output:
['stephen.marquard@uct.ac.za']
['postmaster@collab.sakaiproject.org']
['source@collab.sakaiproject.org']
['apache@localhost']
['stephen.marquard@uct.ac.za']
 Combining searching and extracting
If we want to find numbers on lines that start with the string “X-” such as:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
we don’t just want any floating-point numbers from any lines.
We only want to extract numbers from lines that have the above syntax.
Regular expression: ^X-.*: [0-9.]+
This matches lines that start with “X-”, followed by zero or more characters (“.*”), followed by a colon (“:”) and then a space.
After the space, we look for one or more characters that are either a digit (0-9) or a period (“[0-9.]+”).
Note that inside the square brackets, the period matches an actual period (i.e., it is not a wildcard between the square brackets).
# Search for lines that start with 'X' followed by any non whitespace characters
# and ':' followed by a space and any number. The number can include a decimal.
import re
fhand = open('mboxno.txt')
for line in fhand:
    line = line.rstrip()
    if re.search(r'^X\S*: [0-9.]+', line):
        print(line)
mboxno.txt:
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to stephen.marquard@uct.ac.za using -f
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan 5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0.8000
Output:
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0.8000
''' Search for lines that start with 'X' followed by any non whitespace characters and
':' followed by a space and any number. The number can include a decimal.
Then print the number if it is greater than zero. '''
import re
hand = open('mboxno.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'^X\S*: ([0-9.]+)', line)
    if len(x) > 0:
        print(x)
Output (for the same mboxno.txt):
['0.8475']
['0.0.8000']
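findall returns the captured text as strings, so a program that needs numbers still has to convert them. The sketch below is only an illustration: it uses an in-memory list of lines instead of the file, and it shows that the character set [0-9.]+ also accepts a malformed value such as 0.0.8000, which float() rejects.

```python
import re

# A few lines in the style of the sample file (held in a list for this sketch)
lines = [
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.0.8000',   # malformed value, as in the sample file
    'X-DSPAM-Result: Innocent',
]

values = []
rejected = []
for line in lines:
    for text in re.findall(r'^X\S*: ([0-9.]+)', line):
        try:
            values.append(float(text))
        except ValueError:
            # [0-9.]+ matches any run of digits and periods
            rejected.append(text)

print(values)     # [0.8475]
print(rejected)   # ['0.0.8000']
```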
# Search for lines that start with 'Details: rev=' followed by numbers and '.'
# Then print the number if it is greater than zero
import re
hand = open('mbox-short.txt')
for line in hand:
    x = re.findall(r'^Details:.*rev=([0-9.]+)', line)
    if len(x) > 0:
        print(x)
Output:
['39772']
['39771']
['39770']
['39769']
['39766']
['39765']
''' Search for lines that start with From and a character, followed by a two digit
number between 00 and 99 followed by ':'. Then print the number if it is greater than
zero [extract hours] '''
import re
hand = open('mbox-short.txt')
for line in hand:
    line = line.rstrip()
    x = re.findall(r'^From .* ([0-9][0-9]):', line)
    if len(x) > 0:
        print(x)
Output:
['09']
['18']
['16']
['15']
['15']
['14']
 Escape character to match the actual character such as a dollar sign or caret
# escape char example
import re
x = 'We just received $10.00 for cookies.'
y = re.findall(r'\$[0-9.]+', x)
print(y)
Output:
['$10.00']
Special characters and character sequences:
 ^ Matches the beginning of the line.
 $ Matches the end of the line.
 . Matches any character (a wildcard), except a newline by default.
 \s Matches a whitespace character.
 \S Matches a non-whitespace character (opposite of \s).
 * Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s).
 *? Applies to the immediately preceding character and indicates to match zero or more of the preceding character(s) in “non-greedy mode”.
 + Applies to the immediately preceding character and indicates to match one or more of the preceding character(s).
 +? Applies to the immediately preceding character and indicates to match one or more of the preceding character(s) in “non-greedy mode”.
 [aeiou] Matches a single character as long as that character is in the specified set. In this example, it would match “a”, “e”, “i”, “o”, or “u”, but no other characters.
 [a-z0-9] You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.
 [^A-Za-z] When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.
 ( ) When parentheses are added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().
 \b Matches the empty string, but only at the start or end of a word.
 \B Matches the empty string, but not at the start or end of a word.
 \d Matches any decimal digit; equivalent to the set [0-9].
 \D Matches any non-digit character; equivalent to the set [^0-9].
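A few of the entries in the list above can be tried out directly. The sketch below is illustrative only: the sample strings and the address alice@example.com are made up for this example and do not come from mbox-short.txt.

```python
import re

text = 'From: <alice@example.com> sent 3 files at 09:14'  # hypothetical sample line

# Greedy: '.+' grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r'<.+>', 'a <b> c <d> e')
# Non-greedy: '.+?' stops at the first '>' it can reach.
lazy = re.findall(r'<.+?>', 'a <b> c <d> e')

# \d matches a digit; the parentheses select the part that findall() returns.
hours = re.findall(r'(\d\d):\d\d', text)

# \S+ matches a run of non-whitespace characters on each side of '@'.
emails = re.findall(r'\S+@\S+', text)

# [^A-Za-z] matches any single character that is not a letter.
non_letters = re.findall(r'[^A-Za-z]', 'ab1 c!')

# \b anchors the match at a word boundary, so 'concat' is skipped.
whole_words = re.findall(r'\bcat\b', 'cat concat cat.')

print(greedy)       # ['<b> c <d>']
print(lazy)         # ['<b>', '<d>']
print(hours)        # ['09']
print(emails)       # ['<alice@example.com>']
print(non_letters)  # ['1', ' ', '!']
print(whole_words)  # ['cat', 'cat']
```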
Module - 4 : Classes and Objects
 CLASS
class: A programmer-defined type. A class definition creates a new class object.
class object: An object that contains information about a programmer-defined type. The class object can be used to create instances of the type.
instance: An object that belongs to a class.
instantiate: To create a new object.
attribute: One of the named values associated with an object.
embedded object: An object that is stored as an attribute of another object.
shallow copy: To copy the contents of an object, including any references to embedded objects; implemented by the copy function in the copy module.
deep copy: To copy the contents of an object as well as any embedded objects, and any objects embedded in them, and so on; implemented by the deepcopy function in the copy module.
object diagram: A diagram that shows objects, their attributes, and the values of the attributes.
# class and object example
>>> class point:
        x
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    class point:
  File "<pyshell#28>", line 2, in point
    x
NameError: name 'x' is not defined
>>>
>>> class point:
        x=0
>>> point
<class '__main__.point'>
# Because point is defined at the top level, its “full name” is __main__.point
>>> p=point()
# Creating a new object is called instantiation, and the object is an instance of the class.
>>> p
<__main__.point object at 0x06177EB0>
# The return value is a reference to a point object, which we assign to p.
When we print an instance, Python tells us what class it belongs to and where it is stored in memory
(the prefix 0x means that the following number is in hexadecimal).
>>> class point:
        x=10
>>> p=point() # p is an object (an instance of point)
>>> p.x=10 # assigning to attribute x of the object p
>>> print(p.x)
10
>>> class point:
        x=0
        y=0
>>> p=point()
>>> p.x=10
>>> p.y=20
>>> print(p.x,p.y)
10 20
>>> p.x #read the value of an attribute
10
>>> p.y
20
>>> x=p.x
>>> x
10
>>> '(%g, %g)' % (p.x, p.y)
'(10, 20)'
>>>
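The sessions above give x and y a value in the class body and then assign p.x and p.y on an object. A minimal sketch of how the two interact is shown below; the second object q is introduced here only for comparison.

```python
class Point:
    """Represents a point in 2-D space. Class-level defaults: x, y."""
    x = 0
    y = 0

p = Point()
q = Point()  # hypothetical second object, used only for comparison

# Assigning through an instance creates an attribute on that instance only.
p.x = 10

print(p.x)      # 10, found on the instance p
print(q.x)      # 0, falls back to the class attribute
print(Point.x)  # 0, the class attribute is unchanged

# vars() shows the attributes stored on the instance itself.
print(vars(p))  # {'x': 10}
print(vars(q))  # {}
```

Reading p.x looks on the instance first and then on the class, which is why q still sees the value 0 from the class body.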
class Point:
    """ Represents a point in 2-D space. attributes: x, y """
    x=0
    y=0
p=Point()
p.x=10
p.y=20
print(p.x,p.y)

class Point:
    pass
p=Point()
p.x=10
p.y=20
print(p.x,p.y)
# class and object example
# An object that is an attribute of another object is embedded
class Point:
    pass
class Rectangle:
    pass
p=Point()
p.x=10
p.y=20
print(p.x,p.y)
box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0
print(box.width,box.height,box.corner.x,box.corner.y)
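The glossary defines shallow copy and deep copy in terms of the copy module. The sketch below applies both to a Rectangle with an embedded Point to show the difference. The names shallow and deep are chosen here for illustration.

```python
import copy

class Point:
    """Represents a point in 2-D space."""

class Rectangle:
    """Represents a rectangle. Attributes: width, height, corner."""

box = Rectangle()
box.width = 100.0
box.height = 200.0
box.corner = Point()
box.corner.x = 0.0
box.corner.y = 0.0

shallow = copy.copy(box)   # new Rectangle that shares the embedded Point
deep = copy.deepcopy(box)  # new Rectangle with its own copy of the Point

print(shallow is box)                # False
print(shallow.corner is box.corner)  # True
print(deep.corner is box.corner)     # False

# A change to the embedded Point is visible through the shallow copy only.
box.corner.x = 5.0
print(shallow.corner.x)  # 5.0
print(deep.corner.x)     # 0.0
```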
Thank You
