Supercharged Python - Take Your Code To The Next Level
Brian Overland
John Bennett
From the Library of Vineeth Babu
◗ You’re rusty: If you’ve dabbled in Python but you’re a little rusty, you may
want to take a look at Chapter 1, “Review of the Fundamentals.” Otherwise,
you may want to skip Chapter 1 or only take a brief look at it.
◗ You know the basics but are still learning: Start with Chapters 2 and 3, which
survey the abilities of strings and lists. This survey includes some advanced abilities
of these data structures that people often miss the first time they learn Python.
◗ Your understanding of Python is strong, but you don’t know everything yet:
Start with Chapter 4, which lists 22 programming shortcuts unique to Python
that most people take a long time to fully learn.
◗ You want to master special features: You can start in an area of specialty. For
example, Chapters 5, 6, and 7 deal with text formatting and regular expressions.
The two chapters on regular expression syntax, Chapters 6 and 7, start
with the basics but then cover the finer points of this pattern-matching technology.
Other chapters deal with other specialties. For example, Chapter 8
describes the different ways of handling text and binary files.
◗ You want to learn advanced math and plotting software: If you want to do
plotting, financial, or scientific applications, start with Chapter 12, “The ‘numpy’
(Numeric Python) Package.” This is the basic package that provides an underlying
basis for many higher-level capabilities described in Chapters 13 through 15.
Note: We sometimes use Notes to point out facts you’ll eventually want to know
but that diverge from the main discussion. You might want to skip over Notes the
first time you read a section, but it’s a good idea to go back later and read them.
Key Syntax: The Key Syntax icon introduces general syntax displays, into which you supply
some or all of the elements. These elements are called “placeholders,” and they
appear in italics. Some of the syntax, especially keywords and punctuation,
is in bold and intended to be typed in as shown. Finally, square brackets,
when not in bold, indicate an optional item. For example:
set([iterable])
This syntax display implies that iterable is an iterable object (such as a
list or a generator object) that you supply. And it’s optional.
Square brackets, when in bold, are intended literally, to be typed in as
shown. For example:
list_name = [obj1, obj2, obj3, …]
Ellipses (…) indicate a language element that can be repeated any number
of times.
Performance Tip: Performance tips are like Notes in that they constitute a short digression
from the rest of the chapter. These tips address the question of how you
can improve software performance. If you’re interested in that topic, you’ll
want to pay special attention to these notes.
Have Fun
When you master some or all of the techniques of this book, you should make
a delightful discovery: Python often enables you to do a great deal with a rel-
atively small amount of code. That’s why it’s dramatically increasing in popu-
larity every day. Because Python is not just a time-saving device, it’s fun to be
able to program this way . . . to see a few lines of code do so much.
We wish you the joy of that discovery.
Register your copy of Supercharged Python on the InformIT site for conve-
nient access to updates and/or corrections as they become available. To start
the registration process, go to informit.com/register and log in or create
an account. Enter the product ISBN (9780135159941) and click Submit.
Look on the Registered Products tab for an Access Bonus Content link
next to this product, and follow that link to access any available bonus
materials. If you would like to be notified of exclusive offers on new edi-
tions and updates, please check the box to receive email from us.
From John
I want to thank my coauthor, Brian Overland, for inviting me to join him on
this book. This allows me to pass on many of the things I had to work hard to
find documentation for or figure out by brute-force experimentation. Hope-
fully this will save readers a lot of work dealing with the problems I ran into.
chapter. You might want to take a look at the global statement at the end of
this chapter, however, if you’re not familiar with it. Many people fail to under-
stand this keyword.
If it helps you in the beginning, you can think of variables as storage loca-
tions into which to place values, even though that’s not precisely what Python
does.
What Python really does is make a, b, and c into names for the values 10,
20, and 30. By this we mean “names” in the ordinary sense of the word. These
names are looked up in a symbol table; they do not correspond to fixed places
in memory! The difference doesn’t matter now, but it will later, when we get
to functions and global variables. These statements, which create a, b, and c
as names, are assignments.
In any case, you can assign new values to a variable once it’s created. So
in the following example, it looks as if we’re incrementing a value stored in a
magical box (even though we’re really not doing that).
>>> n = 5
>>> n = n + 1
>>> n = n + 1
>>> n
7
What’s really going on is that we’re repeatedly reassigning n as a name for
an increasingly higher value. Each time, the old association is broken and n
refers to a new value.
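The renaming behavior just described can be observed directly. The following is a minimal sketch (the name old is ours, chosen only for illustration): it gives the value 5 a second name before n is reassigned, and shows that the old value itself is never changed.

```python
n = 5
old = n          # old and n are now two names for the same value, 5.
n = n + 1        # n is reassigned to name a new value, 6.
print(old, n)    # old still names 5; only the name n has moved.
```

If n were a box whose contents were modified in place, you might expect old to change as well. It does not.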
Assignments create variables, and you can’t use a variable name that hasn’t
yet been created. IDLE complains if you attempt the following:
>>> a = 5
>>> b = a + x # ERROR!
Because x has not yet been assigned a value, Python isn’t happy. The solu-
tion is to assign a value to x before it’s used on the right side of an assignment.
In the next example, referring to x no longer causes an error, because it’s been
assigned a value in the second line.
>>> a = 5
>>> x = 2.5
>>> b = a + x
>>> b
7.5
Python has no data declarations. Let us repeat that: There are no data dec-
larations. Instead, a variable is created by an assignment. There are some
other ways to create variables (function arguments and for loops), but for
the most part, a variable must appear on the left of an assignment before it
appears on the right.
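To see what the complaint looks like without halting a script, you can catch the error. This is only a sketch: the name x_undefined is hypothetical and is deliberately never assigned, and the try/except syntax used here is standard Python that this chapter has not yet introduced.

```python
a = 5
try:
    b = a + x_undefined        # x_undefined has never been assigned a value.
except NameError as err:
    message = str(err)
    print('NameError:', message)
```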
following:
Then choose Run Module from the Run menu. When you’re prompted to
save the file, click OK and enter the program name as hyp.py. The program
then runs and prints the results in the main IDLE window (or “shell”).
Alternatively, you could enter this program directly into the IDLE environ-
ment, one statement at a time, in which case the sample session should look
like this:
>>> side1 = 5
>>> side2 = 12
>>> hyp = (side1 * side1 + side2 * side2) ** 0.5
>>> hyp
13.0
Let’s step through this example a statement or two at a time. First, the val-
ues 5 and 12 are assigned to variables side1 and side2. Then the hypotenuse
of a right triangle is calculated by squaring both values, adding them together,
and taking the square root of the result. That’s what ** 0.5 does. It raises a
value to the power 0.5, which is the same as taking its square root.
(That last factoid is a tidbit you get from not falling asleep in algebra class.)
The answer printed by the program should be 13.0. It would be nice to
write a program that calculated the hypotenuse for any two values entered by
the user; but we’ll get that soon enough by examining the input statement.
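As a cross-check on the ** 0.5 technique, the standard math module offers sqrt and hypot functions that should give the same answer. The sketch below assumes the same side lengths as the example; the names hyp_power, hyp_sqrt, and hyp_lib are ours.

```python
import math

side1 = 5
side2 = 12
hyp_power = (side1 * side1 + side2 * side2) ** 0.5   # Same formula as above.
hyp_sqrt = math.sqrt(side1 ** 2 + side2 ** 2)        # Explicit square root.
hyp_lib = math.hypot(side1, side2)                   # Library function.
print(hyp_power, hyp_sqrt, hyp_lib)
```

All three approaches compute the same hypotenuse; which one to use is largely a matter of readability.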
Before moving on, you should know about Python comments. A comment is
text that’s ignored by Python itself, but you can use it to put in information
helpful to yourself or other programmers who may need to maintain the program.
All text from a hash mark (#) to the end of the line is a comment.
For example:
side1 = 5 # Initialize one side.
side2 = 12 # Initialize the other.
hyp = (side1 * side1 + side2 * side2) ** 0.5
print(hyp) # Print results.
◗ The first character must be a letter or an underscore (_), but the remaining
characters can be any combination of underscores, letters, and digits.
◗ However, names with a leading underscore are, by convention, intended to be
private, and names that start and end with double underscores may have special meaning,
such as __init__ or __add__, so avoid using names that start with double
underscores.
◗ Avoid any name that is a keyword, such as if, else, elif, and, or, not,
class, while, break, continue, yield, import, and def.
◗ Also, although you can use capitals if you want (names are case-sensitive),
names with an initial capital are generally reserved for special types, such as class
names. The prevailing Python convention is to stick to all-lowercase for most
variable names.
Within these rules, there is still a lot of leeway. For example, instead of
using boring names like a, b, and c, we can use i, thou, a_jug_of_wine, and
loaf_of_bread, because it’s more fun (with apologies to Omar Khayyam).
i = 10
thou = 20
a_jug_of_wine = 30
loaf_of_bread = 40
inspiration = i + thou + a_jug_of_wine + loaf_of_bread
print(inspiration, 'percent good')
This prints the following:
100 percent good
offers a shortcut, just as C and C++ do. Python provides shortcut assignment
ops for many combinations of different operators within an assignment.
n = 0 # n must exist before being modified.
n += 1 # Equivalent to n = n + 1
n += 10 # Equivalent to n = n + 10
n *= 2 # Equivalent to n = n * 2
n -= 1 # Equivalent to n = n - 1
n /= 3 # Equivalent to n = n / 3
The effect of these statements is to start n at the value 0. Then they add 1
to n, then add 10, and then double that, resulting in the value 22, after which 1
is subtracted, producing 21. Finally, n is divided by 3, producing a final result
of n set to 7.0.
Table 1.1 shows that exponentiation has a higher precedence than the mul-
tiplication, division, and remainder operations, which in turn have a higher
precedence than addition and subtraction.
Consequently, parentheses are required in the following statement to pro-
duce the desired result:
hypot = (a * a + b * b) ** 0.5
This statement adds a squared to b squared and then takes the square root
of the sum.
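To see why the parentheses matter, compare the two versions side by side. This sketch assumes a is 3 and b is 4; the names with_parens and without_parens are ours.

```python
a = 3
b = 4
with_parens = (a * a + b * b) ** 0.5     # (9 + 16) ** 0.5
without_parens = a * a + b * b ** 0.5    # 9 + 4 * (4 ** 0.5)
print(with_parens)      # 5.0
print(without_parens)   # 17.0
```

Without parentheses, exponentiation binds first and applies only to the last b, so the expression no longer computes a hypotenuse.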
The way that Python interprets integer and floating-point division (/) depends
on the version of Python in use. In Python 3.0, the rules are as follows:
◗ Division of any two numbers (integer and/or floating point) always results in a
floating-point result. For example:
4 / 2 # Result is 2.0
7 / 4 # Result is 1.75
ground division (//). This also works with floating-point values.
4 // 2 # Result is 2
7 // 4 # Result is 1
23 // 5 # Result is 4
8.0 // 2.5 # Result is 3.0
◗ You can get the remainder using remainder (or modulus) division.
23 % 5 # Result is 3
Note that in remainder division, a division is carried out first and the quo-
tient is thrown away. The result is whatever is left over after division. So 5
goes into 23 four times but results in a remainder of 3.
In Python 2.0, the rules are as follows:
Python also supports a divmod function that returns quotient and remain-
der as a tuple (that is, an ordered group) of two values. For example:
quot, rem = divmod(23, 10)
The values returned in quot and rem, in this case, will be 2 and 3 after exe-
cution. This means that 10 divides into 23 two times and leaves a remainder
of 3.
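The relationship between divmod, ground division, and remainder division can be checked in a few lines. In this sketch, the names same and rebuilt are ours.

```python
quot, rem = divmod(23, 10)
print(quot, rem)                                   # 2 3

# divmod gives the same results as // and % used separately.
same = (quot == 23 // 10) and (rem == 23 % 10)

# Multiplying back and adding the remainder recovers the original number.
rebuilt = quot * 10 + rem
print(same, rebuilt)                               # True 23
```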
In Python 2.0, the input function works differently: it instead evaluates the
string entered as if it were a Python expression. To achieve the same result as
the Python 3.0 input function, use the raw_input function in Python 2.0.
The input function prints the prompt string, if specified; then it returns
the string the user entered. The input string is returned as soon as the user
presses the Enter key; but no newline is appended.
Key Syntax
input(prompt_string)
To store the string returned as a number, you need to convert to integer
(int) or floating-point (float) format. For example, to get an integer use this
code:
n = int(input('Enter integer here: '))
Or use this to get a floating-point number:
x = float(input('Enter floating pt value here: '))
The prompt is printed without an added space, so you typically need to
provide that space yourself.
Why is an int or float conversion necessary? It is necessary whenever you
want to get a number. When you get any input by using
the input function, you get back a string, such as “5”. Such a string is fine for
many purposes, but you cannot perform arithmetic on it without performing
the conversion first.
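One way to make the conversion more forgiving is to wrap it in a small helper. The function to_number below is hypothetical (it is not part of Python); it is a sketch that tries int first, then float, and returns None if neither conversion works. It takes the string as an argument so you can try it without typing at a prompt.

```python
def to_number(text):
    """Convert text to int if possible, else float; return None on failure."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass              # This conversion failed; try the next one.
    return None

print(to_number('5') + 1)      # 6
print(to_number('2.5') * 2)    # 5.0
print(to_number('five'))       # None
```

In a real program you might call it as to_number(input('Enter a number: ')) and prompt again whenever it returns None.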
Python 3.0 also supports a print function that—in its simplest form—
prints all its arguments in the order given, putting a space between each.
Key Syntax
print(arguments)
Python 2.0 has a print statement that does the same thing but does not use
parentheses.
The print function has some special arguments that can be entered by
using the name.
Python script that’s a complete program. For example, you can enter the fol-
lowing statements into a text file and run it as a script.
side1 = float(input('Enter length of a side: '))
side2 = float(input('Enter another length: '))
hyp = ((side1 * side1) + (side2 * side2)) ** 0.5
print('Length of hypotenuse is:', hyp)
Note: Mixing tab characters with actual spaces can cause errors even though it
might not look wrong. So be careful with tabs!
Because there is no “begin block” and “end block” syntax, Python relies on
indentation to know where statement blocks begin and end. The critical rule
is this:
✱ Within any given block of code, the indentation of all statements (that is, at
the same level of nesting) must be the same.
the function until you’re done—after which, enter a blank line. Then run
the function by typing its name followed by parentheses. Once a function is
defined, you can execute it as often as you want.
So the following sample session, in the IDLE environment, shows the process
of defining a function and calling it twice. For clarity, user input is in bold.
>>> def main():
        side1 = float(input('Enter length of a side: '))
        side2 = float(input('Enter another length: '))
        hyp = (side1 * side1 + side2 * side2) ** 0.5
        print('Length of hypotenuse is: ', hyp)
>>> main()
Enter length of a side: 3
Enter another length: 4
Length of hypotenuse is: 5.0
>>> main()
Enter length of a side: 30
Enter another length: 40
Length of hypotenuse is: 50.0
As you can see, once a function is defined, you can call it (causing it to exe-
cute) as many times as you like.
The Python philosophy is this: Because you should do this indentation any-
way, why shouldn’t Python rely on the indentation and thereby save you the
extra work of putting in curly braces? This is why Python doesn’t have any
“begin block” or “end block” syntax but relies on indentation.
else:
    print('a is not greater than b')
    c = -10
An if statement can also have any number of optional elif clauses.
Although the following example has statement blocks of one line each, they
can be larger.
age = int(input('Enter age: '))
if age < 13:
    print('You are a preteen.')
elif age < 20:
    print('You are a teenager.')
elif age <= 30:
    print('You are still young.')
else:
    print('You are one of the oldies.')
You cannot have empty statement blocks; to have a statement block that
does nothing, use the pass keyword.
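Here is a short sketch of pass in use. The function classify is hypothetical; it only illustrates a block that is deliberately left empty for now.

```python
def classify(age):
    if age < 13:
        pass                  # Placeholder: nothing to do for this case yet.
    elif age < 20:
        return 'teenager'
    return 'other'

print(classify(10))           # other
print(classify(15))           # teenager
```

Removing the pass line would leave the if block empty, which Python rejects as a syntax error.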
Here’s the syntax summary, in which square brackets indicate optional
items, and the ellipses indicate a part of the syntax that can be repeated any
number of times.
Key Syntax
if condition:
    indented_statements
[ elif condition:
    indented_statements ]...
[ else:
    indented_statements ]
while condition:
    indented_statements
n = 10 # This may be set to any positive integer.
i = 1
while i <= n:
    print(i, end=' ')
    i += 1
Let’s try entering these statements in a function. But this time, the function
takes an argument, n. Each time it’s executed, the function can take a differ-
ent value for n.
>>> def print_nums(n):
        i = 1
        while i <= n:
            print(i, end=' ')
            i += 1
>>> print_nums(3)
1 2 3
>>> print_nums(7)
1 2 3 4 5 6 7
>>> print_nums(8)
1 2 3 4 5 6 7 8
It should be clear how this function works. The variable i starts as 1, and
it’s increased by 1 each time the loop is executed. The loop is executed again
as long as i is equal to or less than n. When i exceeds n, the loop stops, and no
further values are printed.
Optionally, the break statement can be used to exit from the nearest
enclosing loop. And the continue statement can be used to continue to the
next iteration of the loop immediately (going to the top of the loop) but not
exiting as break does.
Key Syntax
break
For example, you can use break to exit from an otherwise infinite loop.
True is a keyword that, like all words in Python, is case-sensitive. Capitaliza-
tion matters.
n = 10 # Set n to any positive integer.
i = 1
while True: # Always executes!
    print(i)
    if i >= n:
        break
    i += 1
Note the use of i += 1. If you’ve been paying attention, this means the
same as the following:
i = i + 1 # Add 1 to the current value and reassign.
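The continue statement mentioned earlier can be combined with break in the same loop. The following sketch (the list name odds is ours) collects the odd numbers from 1 to n, skipping the even ones.

```python
n = 10
i = 0
odds = []
while True:
    i += 1
    if i > n:
        break            # Exit the loop entirely.
    if i % 2 == 0:
        continue         # Skip even numbers; go back to the top of the loop.
    odds.append(i)
print(odds)              # [1, 3, 5, 7, 9]
```

Note that i is incremented at the top of the loop here. If the increment came after continue, an even value of i would never advance and the loop would never end.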
The second app (try it yourself!) is a complete computer game. It secretly
selects a random number between 1 and 50 and then requires you, the player,
to try to find the answer through repeated guesses.
The program begins by using the random package; we present more infor-
mation about that package in Chapter 11. For now, enter the first two lines as
shown, knowing they will be explained later in the book.
from random import randint
n = randint(1, 50)
while True:
    ans = int(input('Enter a guess: '))
    if ans > n:
        print('Too high! Guess again. ')
    elif ans < n:
        print('Too low! Guess again. ')
    else:
        print('Congrats! You got it!')
        break
To run, enter all this in a Python script (choose New from the File menu),
and then choose Run Module from the Run menu, as usual. Have fun.
All the operators in Table 1.2 are binary—that is, they take two oper-
ands—except not, which takes a single operand and reverses its logical value.
Here’s an example:
if not (age > 12 and age < 20):
    print('You are not a teenager.')
By the way, another way to write this—using a Python shortcut—is to write
the following:
if not (12 < age < 20):
    print('You are not a teenager.')
This is, as far as we know, a unique Python coding shortcut. In Python
3.0, at least, this example not only works but doesn’t even require parentheses
right after the if and not keywords, because logical not has low precedence
as an operator.
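If you want to convince yourself that the two forms agree, you can test them over a handful of ages. This is only a sketch; the function names is_teen_and and is_teen_chained, and the sample ages, are ours.

```python
def is_teen_and(age):
    return age > 12 and age < 20

def is_teen_chained(age):
    return 12 < age < 20

ages = [5, 12, 13, 19, 20, 40]
results_and = [is_teen_and(a) for a in ages]
results_chained = [is_teen_chained(a) for a in ages]
print(results_and == results_chained)   # True

# No parentheses needed: not applies to the whole chained comparison.
print(not 12 < 25 < 20)                 # True
```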
def function_name(arguments):
    indented_statements
In this syntax, arguments is a list of argument names, separated by com-
mas if there’s more than one. Here’s the syntax of the return statement:
return value
You can also return multiple values:
return value, value ...
Finally, you can omit the return value. If you do, the effect is the same as the
statement return None.
return # Same effect as return None
caller of the function. Reaching the end of a function causes an implicit return—
returning None by default. (Therefore, using return at all is optional.)
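A quick way to confirm the implicit return is to capture the result of a function that has no return statement. The functions greet and bare_return here are hypothetical examples.

```python
def greet(name):
    print('Hello,', name)      # No return statement at all.

def bare_return(name):
    return                     # A return with no value.

result1 = greet('Ada')
result2 = bare_return('Ada')
print(result1 is None, result2 is None)   # True True
```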
Technically speaking, Python argument passing is closer to “pass by ref-
erence” than “pass by value”; however, it isn’t exactly either. When a value is
passed to a Python function, that function receives a reference to the named
data. However, whenever the function assigns a new value to the argument
variable, it breaks the connection to the original variable that was passed.
Therefore, the following function does not do what you might expect. It
does not change the value of the variable passed to it.
def double_it(n):
    n = n * 2
x = 10
double_it(x)
print(x) # x is still 10!
This may at first seem a limitation, because sometimes a programmer needs
to create multiple “out” parameters. However, you can do that in Python by
returning multiple values directly. The calling statement must expect the values.
def set_values():
    return 10, 20, 30
a, b, c = set_values()
The variables a, b, and c are set to 10, 20, and 30, respectively.
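Applying the same idea to double_it gives a version that does change the caller’s variable, provided the caller reassigns it. This is one possible sketch rather than the only way to do it; the helper min_max is hypothetical and shows two “out” values coming back together.

```python
def double_it(n):
    return n * 2          # Return the new value instead of assigning to n.

x = 10
x = double_it(x)          # The caller reassigns x to the returned value.
print(x)                  # 20

def min_max(a, b):
    # Hypothetical helper: returns the smaller value, then the larger.
    if a <= b:
        return a, b
    return b, a

lo, hi = min_max(7, 3)
print(lo, hi)             # 3 7
```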
Because Python has no concept of data declarations, an argument list in
Python is just a series of comma-separated names—except that each may
optionally be given a default value. Here is an example of a function definition
with two arguments but no default values:
def calc_hyp(a, b):
    hyp = (a * a + b * b) ** 0.5
    return hyp
These arguments are listed without type declaration; Python functions do
no type checking except the type checking you do yourself! (However, you can
check a variable’s type by using the type or isinstance function.)
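If you do want type checking, you have to write it yourself. The following sketch adds an isinstance test to calc_hyp; raising TypeError with this particular message is our choice for illustration, not something Python requires.

```python
def calc_hyp(a, b):
    # Reject anything that is not an int or a float.
    if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
        raise TypeError('calc_hyp expects two numbers')
    return (a * a + b * b) ** 0.5

print(calc_hyp(3, 4))          # 5.0
try:
    calc_hyp('3', 4)           # A string, not a number.
except TypeError as err:
    error_text = str(err)
    print('TypeError:', error_text)
```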
Although arguments have no type, they may be given default values.
The use of default values enables you to write a function in which not all
arguments have to be specified during every function call. A default argument
has the following form:
argument_name = default_value
For example, the following function prints a value multiple times, but the
default number of times is 1:
def print_nums(n, rep=1):
    i = 1
    while i <= rep:
        print(n)
        i += 1
Here, the default value of rep is 1; so if no value is given for the last argument,
it’s given the value 1. Therefore this function call prints the number 5 one time:
print_nums(5)
The output looks like this:
5
Note: Because the function just shown uses n as an argument name, it’s natural
to assume that n must be a number. However, because Python has no
variable or argument declarations, there’s nothing enforcing that; n could just
as easily be passed a string.
But there are repercussions to data types in Python. In this case, a problem
can arise if you pass a nonnumber to the second argument, rep. The value
passed here is repeatedly compared to a number, so this value, if given, needs
to be numeric. Otherwise, an exception, representing a runtime error, is raised.
Named arguments, if used, must come at the end of the list of arguments.
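As a sketch of that rule, here is the print_nums function from earlier in this section called with named arguments. The last call is commented out because Python rejects it before the program runs.

```python
def print_nums(n, rep=1):
    i = 1
    while i <= rep:
        print(n)
        i += 1

print_nums(5, rep=3)        # Positional first, then named: prints 5 three times.
print_nums(n=7, rep=2)      # All arguments named: prints 7 twice.
# print_nums(rep=3, 5)      # SyntaxError: positional argument after a named one.
```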
'H'
However, you cannot assign new values to characters within existing
strings, because Python strings are immutable: They cannot be changed.
How, then, can new strings be constructed? You do that by using a combi-
nation of concatenation and assignment. Here’s an example:
s1 = 'Abe'
s2 = 'Lincoln'
s1 = s1 + ' ' + s2
In this example, the string s1 started with the value 'Abe', but then it ends
up containing 'Abe Lincoln'.
This operation is permitted because a variable is only a name.
Therefore, you can “modify” a string through concatenation without actually
violating the immutability of strings. Why? It’s because each assignment cre-
ates a new association between the variable and the data. Here’s an example:
my_str = 'a'
my_str += 'b'
my_str += 'c'
The effect of these statements is to create the string 'abc' and to assign
it (or rather, reassign it) to the variable my_str. No string data was actually
modified, despite appearances. What’s really going on in this example is that
the name my_str is used and reused, to name an ever-larger string.
You can think of it this way: With every statement, a larger string is created
and then assigned to the name my_str.
In dealing with Python strings, there’s another important rule to keep in
mind: Indexing a string in Python produces a single character. In Python, a
single character is not a separate type (as it is in C or C++), but is merely a
string of length 1. The choice of quotation marks used has no effect on this
rule.
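Both rules, immutability and one-character indexing, can be seen in a few lines. This is a sketch under the assumption that s holds 'Hello'; slicing (s[1:]) is covered in the next chapter.

```python
s = 'Hello'
first = s[0]
print(first, type(first), len(first))   # H <class 'str'> 1
print(first == "H")                     # True: quote style makes no difference.

try:
    s[0] = 'J'                          # Strings are immutable.
except TypeError as err:
    print('TypeError:', err)

s = 'J' + s[1:]                         # Build a new string; reassign the name.
print(s)                                # Jello
```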
[ items ]
Here the square brackets are intended literally, and items is a list of zero or
more items, separated by commas if there are more than one. Here’s an exam-
ple, representing a series of high temperatures, in Fahrenheit, over a summer
weekend:
[78, 81, 81]
Lists can contain any kind of object (including other lists!) and, unlike C or
C++, Python lets you mix the types. For example, you can have lists of strings:
['John', 'Paul', 'George', 'Ringo' ]
And you can have lists that mix up the types:
['John', 9, 'Paul', 64 ]
However, lists that mix types that cannot be compared with each other (such as
strings and integers) cannot be automatically sorted in
Python 3.0, and sorting is an important feature.
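The sketch below shows the failure and one possible workaround: supplying a key function so that every item is compared as a string. The names mixed, sorted_ok, and as_text are ours, and sorting by str is only one of several reasonable choices.

```python
mixed = ['John', 9, 'Paul', 64]
try:
    sorted(mixed)              # Compares a str with an int: not allowed.
    sorted_ok = True
except TypeError:
    sorted_ok = False

as_text = sorted(mixed, key=str)   # Compare items by their string form.
print(sorted_ok)                   # False
print(as_text)                     # [64, 9, 'John', 'Paul']
```

Notice that 64 comes before 9 because the comparison is on the text '64' and '9', not on the numeric values.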
Unlike some other Python collection classes (dictionaries and sets), order
is significant in a list, and duplicate values are allowed. But it’s the long list of
built-in capabilities (all covered in Chapter 3) that makes Python lists really
impressive. In this section we use two: append, which adds an element to a list
dynamically, and the aforementioned sort capability.
Here’s a slick little program that showcases the Python list-sorting capabil-
ity. Type the following into a Python script and run it.
a_list = []
while True:
    s = input('Enter name: ')
    if not s:
        break
    a_list.append(s)
a_list.sort()
print(a_list)
Wow, that’s incredibly short! But does it work? Here’s a sample session:
Enter name: John
Enter name: Paul
Enter name: George
Enter name: Ringo
Enter name: Brian
Enter name:
['Brian', 'George', 'John', 'Paul', 'Ringo']
the group and now all are printed in alphabetical order.
This little program, you should see, prompts the user to enter one name at
a time; as each is entered, it’s added to the list through the append method.
Finally, when an empty string is entered, the loop breaks. After that, the list is sorted
and printed.
It may seem that this loop should double each element of my_lst, but it
does not. To process a list in this way, changing values in place, it’s necessary
to use indexing.
my_lst = [10, 15, 25]
for i in [0, 1, 2]:
    my_lst[i] *= 2
This has the intended effect: doubling each individual element of my_lst,
so that now the list data is [20, 30, 50].
To index into a list this way, you need to create a sequence of indexes of the
form
0, 1, 2, ... N-1
in which N is the length of the list. You can automate the production of such
sequences of indexes by using the range function. For example, to double
every element of a list of length 5, use this code:
my_lst = [100, 102, 50, 25, 72]
for i in range(5):
    my_lst[i] *= 2
This code fragment is not optimal because it hard-codes the length of the
list, that length being 5, into the code. Here is a better way to write this loop:
my_lst = [100, 102, 50, 25, 72]
for i in range(len(my_lst)):
    my_lst[i] *= 2
After this loop is executed, my_lst contains [200, 204, 100, 50, 144].
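Two other idioms are worth knowing, shown here as a sketch rather than as a replacement for the loop above. The built-in enumerate function supplies each index together with its value, and a list comprehension (a Chapter 3 topic) builds a new list instead of changing the old one. The name doubled_again is ours.

```python
my_lst = [100, 102, 50, 25, 72]
for i, value in enumerate(my_lst):
    my_lst[i] = value * 2          # Change the list in place, by index.
print(my_lst)                      # [200, 204, 100, 50, 144]

# A comprehension leaves my_lst alone and produces a new list.
doubled_again = [value * 2 for value in my_lst]
print(doubled_again)               # [400, 408, 200, 100, 288]
```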
The range function produces a sequence of integers as shown in Table 1.3,
depending on whether you specify one, two, or three arguments.
integers. For example, the following loop calculates a factorial number.
n = int(input('Enter a positive integer: '))
prod = 1
for i in range(1, n + 1):
    prod *= i
print(prod)
This loop works because range(1, n + 1) produces integers beginning
with 1 up to but not including n + 1. This loop therefore has the effect of
doing the following calculation:
1 * 2 * 3 * ... * n
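Wrapping the loop in a function makes it easy to check against the standard library. In this sketch, the function name factorial is ours; math.factorial is the library version.

```python
import math

def factorial(n):
    prod = 1
    for i in range(1, n + 1):     # 1, 2, ..., n
        prod *= i
    return prod

print(factorial(5))                          # 120
print(factorial(5) == math.factorial(5))     # True
print(factorial(0))                          # 1, because range(1, 1) is empty
```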
1.17 Tuples
The Python concept of tuple is closely related to that of lists; if anything, the
concept of tuple is even more fundamental. The following code returns a list
of integers:
def my_func():
    return [10, 20, 5]
This function returns values as a list.
my_lst = my_func()
But the following code, returning a simple series of values, actually returns
a tuple:
def a_func():
    return 10, 20, 5
It can be called as follows:
a, b, c = a_func()
Note that a tuple is a tuple even if it’s grouped within parentheses for clar-
ity’s sake.
return (10, 20, 5) # Parens have no effect in
# this case.
The basic properties of a tuple and a list are almost the same: Each is an
ordered collection, in which any number of repeated values are allowed.
1.18 Dictionaries
A Python dictionary is a collection that contains a series of key-value
pairs, each associating a key with a value. Unlike lists, dictionaries are specified with curly
braces, not square brackets.
grade_dict = { }
Additional rules apply to selecting types for use in dictionaries:
◗ In Python version 3.0, the keys should share the same type, or at least a
compatible type, such as integers and floating point, if you want to be able to
compare or sort them. (Python itself requires only that each key be hashable.)
◗ The key type should be immutable (data you cannot change “in place”).
Strings and tuples are immutable, but lists are not.
◗ Therefore, lists such as [1, 2] cannot be used for keys, but tuples, such as
(1, 2), can.
◗ The values may be of any type; however, it is often a good idea to use the same
type, if possible, for all the value objects.
There’s a caution you should keep in mind. If you attempt to get the value
for a particular key and if that key does not exist, Python raises an exception.
To avoid this, use the get method, which returns a default value instead of
raising an exception when the key is missing.
Key Syntax
dictionary.get(key [,default_value])
In this syntax, the square brackets indicate an optional item. If the key
exists, its corresponding value in the dictionary is returned. Otherwise, the
default_value is returned, if specified; or None is returned if there is no
such default value. This second argument enables you to write efficient histo-
gram code such as the following, which counts frequencies of words.
s = (input('Enter a string: ')).split()
wrd_counter = {}
for wrd in s:
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1
What this example does is the following: When it finds a new word, that
word is entered into the dictionary with the value 0 + 1, or just 1. If it finds an
existing word, that word frequency is returned by get, and then 1 is added to
it. So if a word is found, its frequency count is incremented by 1. If the word
is not found, it’s added to the dictionary with a starting count of 1. Which is
what we want.
In this example, the split method of the string class is used to divide a
string into a list of individual words. For more information on split, see Sec-
tion 2.12, “Breaking Up Input Using ‘split’.”
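To try the histogram technique without typing at a prompt, you can start from a fixed string. The sketch below uses a sample sentence of our own and compares the result with Counter from the standard collections module, which does the same counting for you.

```python
from collections import Counter

words = 'the cat saw the other cat'.split()

wrd_counter = {}
for wrd in words:
    wrd_counter[wrd] = wrd_counter.get(wrd, 0) + 1

print(wrd_counter)                       # Counts for each distinct word.
print(wrd_counter == Counter(words))     # True
print(wrd_counter.get('dog'))            # None: no default was given.
```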
1.19 Sets
Sets are similar to dictionaries, but they lack associated values. A set, in effect,
is only a set of unique keys, which has the effect of making a set different from
a list in the following ways:
◗ All its members must be unique. An attempt to add an existing value to a set is
simply ignored.
◗ All its members should be immutable, as with dictionary keys.
◗ Order is never significant.
operation; it has all those elements that appear in one set or the other but not
both. setSub contains elements that are in the first set (setA in this case) but
not the second (setB).
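The set operations can be sketched with the operators |, &, ^, and -. The contents of setA and setB below are sample values of our own, and the names set_union, set_both, and set_xor are hypothetical; only setA, setB, and setSub come from the discussion above.

```python
setA = {1, 2, 3, 4}
setB = {3, 4, 5, 6}

set_union = setA | setB      # Elements in either set.
set_both = setA & setB       # Elements in both sets.
set_xor = setA ^ setB        # In one set or the other, but not both.
setSub = setA - setB         # In setA but not in setB.

setA.add(3)                  # Adding an existing value is simply ignored.
print(len(setA))             # 4
print(setSub == {1, 2})      # True
```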
Appendix C, “Set Methods,” lists all the methods supported by the set
class, along with examples for most of them.
def funcB():
    print(count) # Prints 10, the global value.
Do you see how this works? The first function in this example uses its
own local version of count, because such a variable was created within that
function.
But the second function, funcB, created no such variable. Therefore, it uses the
global version, which was created in the first line of the example (count = 10).
The difficulty occurs when you want to refer to a global version of a vari-
able, but you make it the target of an assignment statement. Python has no
def my_func():
    global count
    count += 1
global foo—and therefore foo is created as a global variable. This works
even though the assignment to foo is not part of module-level code.
In general, there is a golden rule about global and local variables in Python.
It’s simply this:
✱ If there’s any chance that a function might attempt to assign a value to a global
variable, use the global statement so that it’s not treated as local.
Chapter 1 Summary
Chapter 1 covers the fundamentals of Python except for class definitions,
advanced operations on collections, and specialized parts of the library such
as file operations. The information presented here is enough to write many
Python programs.
So congratulations! If you understand everything in this chapter, you are
already well on the way to becoming a fluent Python programmer. The next
couple of chapters plunge into the fine points of lists and strings, the two most
important kinds of collections.
Chapter 3 covers something called “comprehension” in Python (not
to be confused with artificial intelligence) and explains how comprehension
applies not only to lists but also to sets, dictionaries, and other collections. It
also shows you how to use lambda functions.
6 Explain precisely why tab characters can cause a problem with the
indentations used in a Python program (and thereby introduce syntax errors).
7 What is the advantage of having to rely so much on indentation in Python?
8 How many different values can a Python function return to the caller?
9 Recount this chapter’s solution to the forward reference problem for func-
tions. How can such an issue arise in the first place?
10 When you’re writing a Python text string, what, if anything, should guide
your choice of what kind of quotation marks to use (single, double, or triple)?
11 Name at least one way in which Python lists are different from arrays in other
languages, such as C, which are contiguously stored collections of a single
base type.
type(data_object)
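The action is to take the specified data_object and produce the result
after converting it to the specified type, if the appropriate conversion exists.
If not, Python raises an exception: ValueError when a string does not have the
right format, or TypeError when the type cannot be converted at all.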
Here are some examples:
s = '45'
n = int(s)
x = float(s)
If you then print n and x, you get the following:
45
45.0
Likewise, you can use other bases with the int conversion. The following
code uses octal (8) and hexadecimal (16) bases.
n1 = int('775', 8)
n2 = int('1E', 16)
print('775 octal and 1E hex:', n1, n2)
These statements print the following results:
775 octal and 1E hex: 509 30
We can therefore summarize the int conversion as taking an optional sec-
ond argument, which has a default value of 10, indicating decimal radix.
Key Syntax
int(data_object, radix=10)
The int and float conversions are necessary when you get input from the
keyboard (usually by using the input function) or from a text file, and you
need to convert the digit characters into an actual numeric value.
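Because text from a user or a file may not contain valid digits, a conversion is often wrapped in a try block. The following is a minimal sketch; the function name to_number is hypothetical and not part of Python.

```python
def to_number(text):
    """Return text as an int if possible, else as a float, else None."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass    # This conversion failed; try the next one.
    return None

print(to_number('45'))      # Prints 45
print(to_number('45.5'))    # Prints 45.5
print(to_number('abc'))     # Prints None
```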
A str conversion works in the opposite direction. It converts a number into
its string representation. In fact, it works on any type of data for which the
type defines a string representation.
Converting a number to a string enables you to do operations such as
counting the number of printable digits or counting the number of times a
specific digit occurs. For example, the following statements print the length of
the number 1007.
n = 1007
s = str(n) # Convert to '1007'
print('The length of', n, 'is', len(s), 'digits.')
This example prints the following output:
The length of 1007 is 4 digits.
There are other ways to get this same information. You could, for exam-
ple, use the mathematical operation that takes the base-10 logarithm. But
this example suggests what you can do by converting a number to its string
representation.
Note Ë Converting a number to its string representation is not the same as
converting a character to its ASCII or Unicode code point. That's a different
operation, and it must be done one character at a time by using the ord function.
Ç Note
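The difference between the two operations can be seen in a short sketch. The variable names are illustrative only.

```python
n = 1007
s = str(n)                        # String representation: '1007'
codes = [ord(ch) for ch in s]     # One character code per character

print(s)        # Prints 1007
print(codes)    # Prints [49, 48, 48, 55]
```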
str1 != str2 Returns True if str1 and str2 have different contents.
str1 < str2 Returns True if str1 is earlier in alphabetical ordering than str2. For
example, 'abc' < 'def' returns True, but 'abc' < 'aaa' returns False.
(See the note about ordering.)
str1 > str2 Returns True if str1 is later in alphabetical ordering than str2. For example,
'def' > 'abc' returns True, but 'def' > 'xyz' returns False.
str1 <= str2 Returns True if str1 is earlier than str2 in alphabetical ordering or if the
strings have the same content.
str1 >= str2 Returns True if str1 is later than str2 in alphabetical ordering or if the
strings have the same content.
str1 + str2 Produces the concatenation of the two strings, which is the result of simply
gluing str2 contents onto the end of str1. For example, 'Big' + 'Deal'
produces the concatenated string 'BigDeal'.
str1 * n Produces the result of a string concatenated onto itself n times, where n is an
integer. For example, 'Goo' * 3 produces 'GooGooGoo'.
n * str1 Same effect as str1 * n.
str1 in str2 Produces True if the substring str1, in its entirety, is contained in str2.
str1 not in str2 Produces True if the substring str1 is not contained in str2.
str is obj Returns True if str and obj refer to the same object in memory; sometimes
necessary for comparisons to None or to an unknown object type.
str is not obj Returns True if str and obj do not refer to the same object in memory.
Note Ë When strings are compared, Python uses a form of alphabetical order;
more specifically, it uses code point order, which looks at ASCII or Unicode
values of the characters. In this order, all uppercase letters precede all lower-
case letters, but otherwise letters involve alphabetical comparisons, as you'd
expect. Digit comparisons also work as you’d expect, so that '1' is less than '2'.
Ç Note
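A few comparisons make the effect of code point order concrete. This is only a sketch; passing key=str.lower to sorted is one common way to get a case-insensitive ordering, not the only one.

```python
print('Zebra' < 'apple')    # Prints True: 'Z' (90) precedes 'a' (97).
print('1' < '2')            # Prints True.
print('10' < '9')           # Prints True: strings compare character by character.

words = ['banana', 'Cherry', 'apple']
print(sorted(words))                   # Prints ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))    # Prints ['apple', 'banana', 'Cherry']
```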
Concatenation does not automatically add a space between two words. You
have to do that yourself. But all strings, including literal strings such as ' ',
have the same type, str, so Python has no problem carrying out the following:
first = 'Will'
last = 'Shakespeare'
full_name = first + ' ' + last
print(full_name)
This example prints
Will Shakespeare
The string-multiplication operator (*) can be useful when you’re doing
character-oriented graphics and want to initialize a long line—a divider, for
example.
divider_str = '_' * 30
print(divider_str)
This prints the following:
______________________________
The result of this operation, '_' * 30, is a string made up of 30 underscores.
Be careful not to abuse the is and is not operators. These operators test
for whether or not two values are the same object in memory. You could have
two string variables, for example, which both contain the value "cat". Test-
ing them for equality (==) will always return True in this situation, but obj1
is obj2 might not.
When should you use is or is not? You should use them primarily when
you’re comparing objects of different types, for which the appropriate test for
equality (==) might not be defined. One such case is testing to see whether
some value is equal to the special value None, which is unique and therefore
appropriate to test using is.
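The following sketch contrasts the two kinds of test. Whether two equal strings happen to be the same object depends on the Python implementation, so the example deliberately does not print the result of the identity test on strings.

```python
a = 'cat'
b = ''.join(['c', 'a', 't'])    # Same contents, built at run time.

print(a == b)    # Prints True: the contents are equal.
# The expression (a is b) may be True or False depending on the
# implementation, so it should not be used to compare contents.

result = None
if result is None:    # Identity test against the unique None object.
    print('No result yet.')
```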
place within the string.
◗ Slicing is an ability more distinctive to Python. It enables you to refer to an
entire substring of characters by using a compact syntax.
✱ You cannot use indexing, slicing, or any other operation to change values of a
string “in place,” because strings are immutable.
You can use both positive (nonnegative) and negative indexes in any combi-
nation. Figure 2.1 illustrates how positive indexes run from 0 to N–1, where N
is the length of the string.
This figure also illustrates negative indexes, which run backward from –1
(indicating the last character) to –N.
K i n g M e !
0 1 2 3 4 5 6 7
K i n g M e !
–8 –7 –6 –5 –4 –3 –2 –1
Figure 2.1. String indexing in Python
Suppose you want to remove the beginning and last characters from a
string. In this case, you’ll want to combine positive and negative indexes. Start
with a string that includes opening and closing double quotation marks.
king_str = '"Henry VIII"'
If you print this string directly, you get the following:
"Henry VIII"
But what if you want to print the string without the quotation marks? An
easy way to do that is by executing the following code:
new_str = king_str[1:-1]
print(new_str)
The output is now
Henry VIII
Figure 2.2 illustrates how this works. In slicing operations, the slice begins
with the first argument, up to but not including the second argument.
0 1                  –1
" H e n r y   V I I I "
Sliced section includes 1, up to but not including –1
Figure 2.2. String slicing example 1
Here’s another example. Suppose we’d like to extract the second word,
“Bad,” from the phrase “The Bad Dog.” As Figure 2.3 illustrates, the correct
slice would begin with index 4 and extend to all the characters up to but not
including index 7. The string could therefore be accessed as string[4:7].
string[4:7]
0 1 2 3 4 5 6 7 8 9 10
T h e B a d D o g
◗ If both beg and end are positive indexes, end - beg gives the maximum length
of the slice.
◗ To get a string containing the first N characters of a string, use string[:N].
◗ To get a string containing the last N characters of a string, use string[-N:].
◗ To cause a complete copy of the string to be made, use string[:].
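The slicing shortcuts in this list can be tried out on a sample string. The variable names s and n here are illustrative.

```python
s = 'The Bad Dog'
n = 3

print(s[:n])     # First n characters: prints The
print(s[-n:])    # Last n characters: prints Dog
print(s[4:7])    # Characters 4 up to but not including 7: prints Bad

copy_s = s[:]    # Slice covering the whole string.
print(copy_s == s)    # Prints True
```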
Slicing permits a third, and optional, step argument. When positive, the
step argument specifies how many characters to move ahead at a time. A
step argument of 2 means “Get every other character.” A step argument of
3 means “Get every third character.” For example, the following statements
start with the second character in 'RoboCop' and then step through the string
two characters at a time.
a_str = 'RoboCop'
b_str = a_str[1::2] # Get every other character.
print(b_str)
This example prints the following:
ooo
Here’s another example. A step value of 3 means “Get every third charac-
ter.” This time the slice, by default, starts in the first position.
a_str = 'AbcDefGhiJklNop'
b_str = a_str[::3] # Get every third character.
print(b_str)
This example prints the following:
ADGJN
You can even use a negative step value, which causes the slicing to be per-
formed backward through the string. For example, the following function
returns the exact reverse of the string fed to it as an argument.
def reverse_str(s):
    return s[::-1]
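Here is a short sketch of negative steps in use. The is_palindrome helper is a hypothetical addition, included only to show one thing a reversed slice is good for.

```python
def reverse_str(s):
    return s[::-1]

def is_palindrome(s):
    # Hypothetical helper: a string is a palindrome if it equals its reverse.
    return s == s[::-1]

print(reverse_str('RoboCop'))      # Prints poCoboR
print('RoboCop'[::-2])             # Every other character, backward: prints pCbR
print(is_palindrome('racecar'))    # Prints True
print(is_palindrome('RoboCop'))    # Prints False
```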
lowing example confirms that the ASCII code for the letter A is decimal 65.
print(ord('A')) # Print 65.
The chr function is the inverse of the ord function. It takes a character code
and returns its ASCII or Unicode equivalent, as a string of length 1. Calling chr
with an argument of 65 should therefore print a letter A, which it does.
print(chr(65)) # Print 'A'
The in and not in operators, although not limited to use with one-character
strings, often are used that way. For example, the following statements test
whether the first character of a string is a vowel:
s = 'elephant'
if s[0] in 'aeiou':
    print('First char. is a vowel.')
Conversely, you could write a consonant test.
s = 'Helephant'
if s[0] not in 'aeiou':
    print('First char. is a consonant.')
One obvious drawback is that these examples do not work correctly on
uppercase letters. Here’s one way to fix that:
if s[0] in 'aeiouAEIOU':
    print('First char. is a vowel.')
Alternatively, you can convert a character to uppercase before testing it;
that has the effect of creating a case-insensitive comparison.
s = 'elephant'
if s[0].upper() in 'AEIOU':
    print('First char. is a vowel.')
You can also use in and not in to test substrings that contain more than
one character. In that case, the entire substring must be found to produce True.
'bad' in 'a bad dog' # True!
Is there bad in a bad dog? Yes, there is.
immutability, they actually do not.
a_str = 'Big '
a_str += 'Bad '
a_str += 'John'
This technique, of using =, +, and += to build strings, is adequate for simple
cases involving a few objects. For example, you could build a string contain-
ing all the letters of the alphabet as follows, using the ord and chr functions
introduced in Section 2.5, “Single-Character Operations (Character Codes).”
n = ord('A')
s = ''
for i in range(n, n + 26):
    s += chr(i)
This example has the virtue of brevity. But it causes Python to create
entirely new strings in memory, over and over again.
An alternative, which is slightly better, is to use the join method.
Key Syntax
separator_string.join(list)
This method joins together all the strings in list to form one large string.
If this list has more than one element, the text of separator_string is placed
between each consecutive pair of strings. An empty string is a valid separator
string; in that case, all the strings in the list are simply joined together.
Use of join is usually more efficient at run time than concatenation, although
you probably won’t see the difference in execution time unless there are a great
many elements.
n = ord('A')
a_lst = [ ]
for i in range(n, n + 26):
    a_lst.append(chr(i))
s = ''.join(a_lst)
The join method concatenates all the strings in a_lst, a list of strings,
into one large string. The separator string is empty in this case.
Performance Tip Ë The advantage of join over simple concatenation can be seen
in large cases involving thousands of operations. The drawback of concatenation
in such cases is that Python has to create thousands of strings of increasing
size, which are used once and then thrown away, through “garbage
collection.” But garbage collection exacts a cost in execution time, assuming it
is run often enough to make a difference.
Ç Performance Tip
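One way to check the claim on your own system is to time both approaches with the standard timeit module. This is only a rough sketch: the function names are hypothetical, the figures vary from machine to machine, and some Python implementations optimize += on strings, so the gap may be small or absent.

```python
import timeit

def build_concat(n):
    # Build a string of n characters by repeated concatenation.
    s = ''
    for i in range(n):
        s += 'x'
    return s

def build_join(n):
    # Build the same string by collecting pieces and joining them once.
    return ''.join(['x' for i in range(n)])

n = 10000
print('concat:', timeit.timeit(lambda: build_concat(n), number=20))
print('join:  ', timeit.timeit(lambda: build_join(n), number=20))
```

Both functions produce the same string; only the time taken to build it differs.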
Here’s a case in which the approach of using join is superior: Suppose you
want to write a function that takes a list of names and prints them one at a
time, nicely separated by commas. Here’s the hard way to write the code:
def print_nice(a_lst):
    s = ''
    for item in a_lst:
        s += item + ', '
    if len(s) > 0:      # Get rid of trailing
                        # comma+space
        s = s[:-2]
    print(s)
Given this function definition, we can call it on a list of strings.
print_nice(['John', 'Paul', 'George', 'Ringo'])
This example prints the following:
John, Paul, George, Ringo
Here’s the version using the join method:
def print_nice(a_lst):
    print(', '.join(a_lst))
That’s quite a bit less code!
One of the most important functions is len, which can be used with any
of the standard collection classes to determine the number of elements. In
the case of strings, this function returns the number of characters. Here’s an
example:
dog1 = 'Jaxx'
dog2 = 'Cutie Pie'
print(dog1, 'has', len(dog1), 'letters.')
print(dog2, 'has', len(dog2), 'letters.')
This prints the following strings. Note that len reports 9 for “Cutie Pie”
because it counts the space as a character.
Jaxx has 4 letters.
Cutie Pie has 9 letters.
The reversed and sorted functions produce an iterator and a list, respec-
tively, rather than strings. However, the output from these data objects can
be converted back into strings by using the join method. Here’s an example:
a_str = ''.join(reversed('Wow,Bob,wow!'))
print(a_str)
b_str = ''.join(sorted('Wow,Bob,wow!'))
print(b_str)
This prints the following:
!wow,boB,woW
!,,BWbooowww
ter. This requires that each word be capitalized and that no uppercase
letter appear anywhere but at the beginning of a word. There may be
whitespace and punctuation characters in between words.
str.isupper() All letters in the string are uppercase, and there is at least one letter.
(There may, however, be nonalphabetic characters.)
These functions are valid for use with single-character strings as well as
longer strings. The following code illustrates the use of both.
h_str = 'Hello'
if h_str[0].isupper():
    print('First letter is uppercase.')
if h_str.isupper():
    print('All chars are uppercase.')
else:
    print('Not all chars are uppercase.')
This example prints the following:
First letter is uppercase.
Not all chars are uppercase.
This string would also pass the test for being a title, because the first letter
is uppercase and the rest are not.
if h_str.istitle():
    print('Qualifies as a title.')
The effects of the lower and upper methods are straightforward. The first
converts each uppercase letter in a string to a lowercase letter; the second does
the converse, converting each lowercase letter to an uppercase letter. Nonletter
characters are not altered but kept in the string as is.
The result, after conversion, is then returned as a new string. The original
string data, being immutable, isn’t changed “in place.” But the following state-
ments do what you’d expect.
my_str = "I'm Henry VIII, I am!"
new_str = my_str.upper()
my_str = new_str
The last two steps can be efficiently merged:
my_str = my_str.upper()
If you then print my_str, you get the following:
I'M HENRY VIII, I AM!
The swapcase method is used only rarely. The string it produces has an
uppercase letter where the source string had a lowercase letter, and vice versa.
For example, starting again from the original mixed-case string:
my_str = "I'm Henry VIII, I am!"
my_str = my_str.swapcase()
print(my_str)
This prints the following:
i'M hENRY viii, i AM!
me_str = 'John Bennett, PhD'
is_doc = me_str.endswith('PhD')
These methods, startswith and endswith, can be used on an empty
string without raising an error. If the substring is empty, the return value is
always True.
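These edge cases can be verified directly. The last line also uses a standard feature of both methods, which accept a tuple of candidate substrings and return True if any of them matches.

```python
me_str = 'John Bennett, PhD'

print(me_str.endswith('PhD'))            # Prints True
print(me_str.startswith(''))             # Prints True: empty substring
print(''.startswith(''))                 # Prints True
print(''.endswith('PhD'))                # Prints False, with no error raised
print(me_str.endswith(('PhD', 'MD')))    # Prints True: any item in the tuple
```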
Now let’s look at other search-and-replace methods of Python strings.
Key Syntax
n = frank_str.count('doo')
print(n) # Print 3.
You can optionally use the start and end arguments with this same
method call.
print(frank_str.count('doo', 1)) # Print 2
print(frank_str.count('doo', 1, 10)) # Print 1
A start argument of 1 specifies that counting begins with the second char-
acter. If start and end are both used, then counting happens over a target
string beginning with start position up to but not including the end position.
These arguments are zero-based indexes, as usual.
If either or both of the arguments (start, end) are out of range, the count
method does not raise an exception but works on as many characters as it can.
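A brief sketch of this out-of-range behavior, using the same sample string as the other examples in this section:

```python
frank_str = 'doo be doo be doo...'    # 20 characters long

print(frank_str.count('doo', 0, 1000))    # End past the string: prints 3
print(frank_str.count('doo', 50))         # Start past the string: prints 0
print(frank_str.count('doo', -6))         # Negative start, last 6 chars: prints 1
```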
Similar rules apply to the find method. A simple call to this method finds
the first occurrence of the substring argument and returns the nonnegative
index of that instance; it returns –1 if the substring isn’t found.
frank_str = 'doo be doo be doo...'
print(frank_str.find('doo')) # Print 0
print(frank_str.find('doob')) # Print -1
If you want to find the positions of all occurrences of a substring, you can
call the find method in a loop, as in the following example.
frank_str = 'doo be doo be doo...'
n = -1
while True:
    n = frank_str.find('doo', n + 1)
    if n == -1:
        break
    print(n, end=' ')
This example prints every index at which an instance of 'doo' can be
found.
0 7 14
This example works by taking advantage of the start argument. After
each successful call to the find method, the initial searching position, n, is set
to the previous successful find index and then is adjusted upward by 1. This
guarantees that the next call to the find method must look for a new instance
of the substring.
If the find operation fails to find any occurrences, it returns a value of –1.
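The same loop can be packaged as a reusable function that collects the positions in a list instead of printing them. This is a sketch; the name find_all is hypothetical and is not a built-in string method.

```python
def find_all(text, sub):
    """Return a list of every index at which sub starts within text."""
    positions = []
    n = text.find(sub)
    while n != -1:
        positions.append(n)
        n = text.find(sub, n + 1)    # Resume just past the last match.
    return positions

print(find_all('doo be doo be doo...', 'doo'))    # Prints [0, 7, 14]
print(find_all('doo be doo be doo...', 'xyz'))    # Prints []
```

Because the search resumes one position after each match, overlapping occurrences are reported as well.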
The index and rfind methods are almost identical to the find method,
with a few differences. The index method does not return –1 when it fails to
find an occurrence of the substring. Instead it raises a ValueError exception.
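A short sketch of two ways to deal with that exception: catch it, or test with the in operator before calling index.

```python
frank_str = 'doo be doo be doo...'

try:
    pos = frank_str.index('doob')
except ValueError:
    pos = -1      # Fall back to the value that find would have returned.
print(pos)        # Prints -1

if 'be' in frank_str:                # Test first, then index safely.
    print(frank_str.index('be'))     # Prints 4
```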
The rfind method searches for the last occurrence of the substring argu-
ment. By default, this method starts at the end and searches to the left. How-
ever, this does not mean it looks for a reverse of the substring. Instead, it
searches for a regular copy of the substring, and it returns the starting index
number of the last occurrence—that is, where the last occurrence starts.
frank_str = 'doo be doo be doo...'
print(frank_str.rfind('doo')) # Prints 14.
The example prints 14 because the rightmost occurrence of 'doo' starts in
zero-based position 14.
title = '25 Hues of Grey'
new_title = title.replace('Grey', 'Gray')
Printing new_title produces this:
25 Hues of Gray
The next example illustrates how replace works on multiple occurrences
of the same substring.
title = 'Greyer Into Grey'
new_title = title.replace('Grey', 'Gray')
The new string is now
Grayer Into Gray
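The replace method also accepts an optional third argument that limits how many occurrences are replaced. A minimal sketch:

```python
title = 'Greyer Into Grey'

# Replace only the first occurrence.
print(title.replace('Grey', 'Gray', 1))    # Prints Grayer Into Grey

# The original string is unchanged, because strings are immutable.
print(title)                               # Prints Greyer Into Grey
```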
input_str.split(delim_string=None)
The call to this method returns a list of substrings taken from input_str.
The delim_string specifies a string that serves as the delimiter; this
is a substring used to separate one token from another.
If delim_string is omitted or is None, then the behavior of split is to, in
effect, use any sequence of one or more whitespace characters (spaces, tabs,
and newlines) to distinguish one token from the next.
For example, the split method, using the default whitespace delimiter,
can be used to break up a string containing several names.
stooge_list = 'Moe Larry Curly Shemp'.split()
The resulting list, if printed, is as follows:
['Moe', 'Larry', 'Curly', 'Shemp']
With a None or default argument, split treats any number of whitespace
characters in a row as a single delimiter. Here’s an example:
stooge_list = 'Moe    Larry Curly  Shemp'.split()
If, however, a delimiter string is specified, it must be matched precisely to
recognize a divider between one token and the next.
stooge_list = 'Moe    Larry Curly  Shemp'.split(' ')
In this case, the split method recognizes an extra string—although it is
empty—wherever there’s an extra space. That might not be the behavior you
want. The example just shown would produce the following:
['Moe', '', '', '', 'Larry', 'Curly', '', 'Shemp']
Another common delimiter string is a comma, or possibly a comma com-
bined with a space. In the latter case, the delimiter string must be matched
exactly. Here’s an example:
stooge_list = 'Moe, Larry, Curly, Shemp'.split(', ')
In contrast, the following example uses a simple comma as delimiter. This
example causes the tokens to contain the extra spaces.
stooge_list = 'Moe, Larry, Curly, Shemp'.split(',')
The result in this case includes a leading space in the last three of the four
string elements:
['Moe', ' Larry', ' Curly', ' Shemp']
If you don’t want those leading spaces, an easy solution is to use stripping,
as shown next.
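As a minimal sketch of that approach, each token can be passed through the strip method as the list is built. The variable name raw is illustrative, and the list comprehension used here is only one way to write the loop.

```python
raw = 'Moe, Larry, Curly, Shemp'

# Split on commas, then remove leading and trailing spaces from each token.
stooge_list = [name.strip() for name in raw.split(',')]
print(stooge_list)    # Prints ['Moe', 'Larry', 'Curly', 'Shemp']
```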
2.13 Stripping
Once you retrieve input from the user or from a text file, you may want to
place it in the correct format by stripping leading and trailing spaces. You
might also want to strip leading and trailing “0” digits or other characters.
The str class provides several methods to let you perform this stripping.
Key Syntax
trailing asterisks (*) as well as all leading or trailing “0” digits and plus signs (+).
Internal instances of the character to be stripped are left alone. For exam-
ple, the following statement strips leading and trailing spaces but not the space
in the middle.
name_str = ' Will Shakes '
new_str = name_str.strip()
Figure 2.4 illustrates how this method call works.
W i l l S h a k e s
W i l l S h a k e s
Figure 2.4. Python stripping operations
◗ The text of str is placed in a larger print field of size specified by width.
◗ If the string text is shorter than the specified length, the text is justified left,
right, or centered, as appropriate. The center method slightly favors left jus-
tification if it cannot be centered perfectly.
◗ The rest of the result is padded with the fill character. If this fill character is
not specified, then the default value is a white space.
Here’s an example:
new_str = 'Help!'.center(10, '#')
print(new_str)
This example prints
##Help!###
Another common fill character (other than a space) is the digit character
“0”. Number strings are typically right justified rather than left justified.
Here’s an example:
new_str = '750'.rjust(6, '0')
print(new_str)
This example prints
000750
The zfill method provides a shorter, more compact way of doing the
same thing: padding a string of digits with leading “0” characters.
s = '12'
print(s.zfill(7))
But the zfill method is not just a shortcut for rjust; instead, with zfill,
the zero padding becomes part of the number itself, so the zeros are printed
between the number and the sign:
>>> '-3'.zfill(5)
'-0003'
>>> '-3'.rjust(5, '0')
'000-3'
Chapter 2 Summary
The Python string type (str) is an exceptionally powerful data type, even in
comparison to strings in other languages. String methods include the abilities
to tokenize input (splitting); remove leading and trailing spaces (stripping);
convert to numeric formats; and print numeric expressions in any radix.
The built-in search abilities include methods for counting and finding sub-
strings (count, find, and index) as well as the ability to do text replacement.
Chapter 2 Review Questions
1 Does assignment to an indexed character of a string violate Python’s immuta-
bility for strings?
2 Does string concatenation, using the += operator, violate Python’s immutabil-
ity for strings? Why or why not?
3 How many ways are there in Python to index a given character?
4 How, precisely, are indexing and slicing related?
5 What is the exact data type of an indexed character? What is the data type of a
substring produced from slicing?
6 In Python, what is the relationship between the string and character “types”?
7 Name at least two operators and one method that enable you to build a larger
string out of one or more smaller strings.
8 If you are going to use the index method to locate a substring, what is the
advantage of first testing the target string by using in or not in?
9 Which built-in string methods, and which operators, produce a simple
Boolean (true/false) result?
To paraphrase the Lord High Executioner in The Mikado, we’ve got a little
list. . . . Actually, in Python we’ve got quite a few of them. One of the foun-
dations of a strong programming language is the concept of arrays or lists—
objects that hold potentially large numbers of other objects, all held together
in a collection.
Python’s most basic collection class is the list, which does everything an
array does in other languages, but much more. This chapter explores the
basic, intermediate, and advanced features of Python lists.
◗ Specify the data on the right side of an assignment. This is where a list is actu-
ally created, or built.
◗ On the left side, put a variable name, just as you would for any other assign-
ment, so that you have a way to refer to the list.
But it’s much better to use a variable to represent only one type of data and
stick to it. We also recommend using suggestive variable names. For example,
it’s a good idea to use a “list” suffix when you give a name to list collections.
my_int_list = [5, -20, 5, -69]
Here’s a statement that creates a list of strings and names it beat_list:
beat_list = [ 'John', 'Paul', 'George', 'Ringo' ]
You can even create lists that mix numeric and string data.
mixed_list = [10, 'John', 5, 'Paul' ]
But you should mostly avoid mixing data types inside lists. In Python 3.0,
mixing data types prevents you from using the sort method on the list. Inte-
ger and floating-point data, however, can be freely mixed.
num_list = [3, 2, 17, 2.5]
num_list.sort() # Sorts into [2, 2.5, 3, 17]
Another technique you can use for building a collection is to append one
element at a time to an empty list.
my_list = [] # Must do this before you append!
my_list.append(1)
my_list.append(2)
my_list.append(3)
These statements have the same effect as initializing a list all at once, as here:
my_list = [1, 2, 3]
You can also remove list items.
my_list.remove(1) # List is now [2, 3]
The result of this statement is to remove the first instance of an element
equal to 1. If there is no such value in the list, Python raises a ValueError
exception.
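Two common ways to avoid an unhandled ValueError are sketched below: test for membership first, or catch the exception.

```python
my_list = [1, 2, 3]

if 10 in my_list:          # Test first; remove only if present.
    my_list.remove(10)

try:
    my_list.remove(10)     # Not present, so this raises ValueError.
except ValueError:
    print('10 is not in the list.')

print(my_list)    # Prints [1, 2, 3]
```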
List order is meaningful, as are duplicate values. For example, to store a
series of judges’ ratings, you might use the following statement, which
indicates that three different judges assigned the score 1.0, but the third judge
assigned 9.8.
the_scores = [1.0, 1.0, 9.8, 1.0]
The following statement removes only the first instance of 1.0.
the_scores.remove(1.0) # List now equals [1.0, 9.8, 1.0]
The first statement creates a list by building it on the right side of the assign-
ment (=). But the second statement in this example creates no data. It just does
the following action:
Make “b_list” an alias for whatever “a_list” refers to.
The variable b_list therefore becomes an alias for whatever a_list
refers to. Consequently, if changes are made to either variable, both reflect
that change.
b_list.append(100)
a_list.append(200)
b_list.append(1)
print(a_list) # This prints [2, 5, 10, 100, 200, 1]
If instead you want to create a separate copy of all the elements of a list, you
need to perform a member-by-member copy. The simplest way to do that is to
use slicing.
my_list = [1, 10, 5]
yr_list = my_list[:] # Perform member-by-member copy.
Now, because my_list and yr_list refer to separate copies of [1, 10, 5],
you can change one of the lists without changing the other.
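The difference between an alias and a copy can be confirmed with a short sketch. The name alias_list is illustrative.

```python
my_list = [1, 10, 5]
alias_list = my_list    # Second name for the same list object.
yr_list = my_list[:]    # Separate copy of the elements.

my_list.append(99)

print(alias_list)    # Prints [1, 10, 5, 99]: the alias sees the change.
print(yr_list)       # Prints [1, 10, 5]: the copy does not.
print(alias_list is my_list, yr_list is my_list)    # Prints True False
```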
3.3 Indexing
Python supports both nonnegative and negative indexes.
The nonnegative indexes are zero-based, so in the following example,
list_name[0] refers to the first element. (Section 3.3.2 covers negative
indexes.)
my_list = [100, 500, 1000]
print(my_list[0]) # Print 100.
Because lists are mutable, they can be changed “in place” without creat-
ing an entirely new list. Consequently, you can change individual elements by
making one of those elements the target of an assignment—something you
can’t do with strings.
my_list[1] = 55 # Set second element to 55.
0 1 2 3 4 5
100 200 300 400 500 600
Figure 3.1. Nonnegative indexes
Performance Tip Ë Here, as elsewhere, we’ve used separate calls to the print
function because it’s convenient for illustration purposes. But remember that
repeated calls to print slow down your program, at least within IDLE. A
faster way to print these values is to use only one call to print.
print(a_list[0], a_list[1], a_list[2], sep='\n')
Ç Performance Tip
–6 –5 –4 –3 –2 –1
100 200 300 400 500 600
Figure 3.2. Negative indexes
for s in a_list:
    print(s)
This prints the following:
Tom
Dick
Jane
This approach is more natural and efficient than relying on indexing, as in
the following slower version:
for i in range(len(a_list)):
    print(a_list[i])
But what if you want to list the items next to numbers? You can do that by
using index numbers (plus 1, if you want the indexing to be 1-based), but a
better technique is to use the enumerate function.
Key Syntax
enumerate(iter, start=0)
In this syntax, start is optional. Its default value is 0.
This function takes an iterable, such as a list, and produces another iter-
able, which is a series of tuples. Each of those tuples has the form
(num, item)
In which num is an integer in a series beginning with start. The following
statement shows an example, using a_list from the previous example and
starting the series at 1:
list(enumerate(a_list, 1))
This produces the following:
[(1, 'Tom'), (2, 'Dick'), (3, 'Jane')]
We can put this together with a for loop to produce the desired result.
for item_num, name_str in enumerate(a_list, 1):
    print(item_num, '. ', name_str, sep='')
This loop calls the enumerate function to produce tuples of the form (num,
item). Each iteration prints the number followed by a period (“.”) and an
element.
1. Tom
2. Dick
3. Jane
list[beg: end: step] All elements starting with beg, up to but not including end;
but movement through the list is step items at a time.
With this syntax, any or all of the three values may be omit-
ted. Each has a reasonable default value; the default value of
step is 1.
Note Ë When Python carries out a slicing operation, which always includes at
least one colon (:) between the square brackets, the index specifications are
not required to be in range. Python copies as many elements as it can. If it fails
to copy any elements at all, the result is simply an empty list.
Ç Note
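The contrast between slicing and plain indexing can be seen in a short sketch. Slices quietly accept out-of-range values, whereas a single out-of-range index raises IndexError.

```python
a_list = [100, 200, 300, 400, 500, 600]

print(a_list[4:100])    # End out of range: prints [500, 600]
print(a_list[10:20])    # Nothing to copy: prints []

try:
    a_list[10]          # Plain indexing is checked.
except IndexError:
    print('Index out of range.')
```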
Figure 3.3 shows an example of how slicing works. Remember that Python
selects elements starting with beg, up to but not including the element referred
to by end. Therefore, the slice a_list[2:5] copies the sublist [300, 400, 500].
a_list[2:5]
0 1 2 3 4 5
100 200 300 400 500 600
Finally, specifying a value for step, the third argument, can affect the data
produced. For example, a value of 2 causes Python to get every other element
from the range [2:5].
a_list = [100, 200, 300, 400, 500, 600]
b_list = a_list[2:5:2] # Produces [300, 500]
A negative step value reverses the direction in which list elements are
accessed. So a step value of –1 produces values in the slice by going backward
through the list one item at a time. A step value of –2 produces values in the
slice by going backward through the list two items at a time.
The following example starts with the last element and works backwards;
it therefore produces an exact copy of the list—with all elements reversed!
rev_list = a_list[::-1]
Here’s an example:
a_list = [100, 200, 300]
rev_list = a_list[::-1]
print(rev_list) # Prints [300, 200, 100]
The step argument can be positive or negative but cannot be 0. If step is
negative, then the defaults for the other values change as follows:
◗ The default value of beg becomes the last element in the list (indexed as –1).
◗ The default value of end becomes the beginning of the list.
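The following sketch illustrates how the defaults shift when step is negative. It reuses the six-element a_list from earlier in this section; the other names are illustrative only.

```python
a_list = [100, 200, 300, 400, 500, 600]

# With beg and end omitted, a negative step starts at the last element.
every_other_back = a_list[::-2]
print(every_other_back)    # [600, 400, 200]

# Explicit bounds still mean "start at beg, stop before end".
part = a_list[4:1:-1]
print(part)                # [500, 400, 300]

# Moving backward from index 1 can never reach index 4: empty result.
empty = a_list[1:4:-1]
print(empty)               # []

# A step of 0 is the one value that is rejected outright.
try:
    a_list[::0]
except ValueError as err:
    print('ValueError:', err)
```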
[10, 707, 777, 50, 60]
You may even assign into a position of length 0. The effect is to insert new
list items without deleting existing ones. Here’s an example:
my_list = [1, 2, 3, 4]
my_list[0:0] = [-50, -40]
print(my_list) # prints [-50, -40, 1, 2, 3, 4]
The following restrictions apply to this ability to assign into slices:
◗ When you assign to a slice of a list, the source of the assignment must be
another list or collection, even if it has zero or one element.
◗ If you include a step argument in the slice to be assigned to, the sizes of the
two collections—the slice assigned to and the sequence providing the data—
must match in size. If step is not specified, the sizes do not need to match.
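Here is a minimal sketch of both restrictions in action. The starting values are assumptions chosen for illustration.

```python
my_list = [10, 20, 30, 40, 50, 60]

# Without a step, the replacement may be longer or shorter than the slice.
my_list[1:3] = [707, 777, 787]
print(my_list)     # [10, 707, 777, 787, 40, 50, 60]

# The source must be a collection, even when it holds a single element.
try:
    my_list[0:1] = 5
except TypeError as err:
    print('TypeError:', err)

# With a step, the sizes must match exactly: four positions, four values.
my_list[::2] = [0, 0, 0, 0]
print(my_list)     # [0, 707, 0, 787, 0, 50, 0]

try:
    my_list[::2] = [1, 2]      # four positions, only two values
except ValueError as err:
    print('ValueError:', err)
```

Both failed assignments leave the list untouched, so the final contents are those printed after the successful stepped assignment.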
The first two of these operators (+ and *) involve making copies of list
items. But these are shallow copies. (Section 3.7, “Shallow Versus Deep Copy-
ing,” discusses this issue in greater detail.) So far, shallow copying has worked
fine, but the issue will rear its head when we discuss multidimensional arrays
in Section 3.18.
Consider the following statements:
a_list = [1, 3, 5, 0, 2]
b_list = a_list # Make an alias.
c_list = a_list[:] # Member-by-member copy
After b_list is created, the variable name b_list is just an alias for
a_list. But the third statement in this example creates a new copy of the
data. If a_list is modified later, c_list retains the original order.
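To see the difference, the sketch below sorts a_list in place after the three statements above and then prints all three names.

```python
a_list = [1, 3, 5, 0, 2]
b_list = a_list        # Make an alias.
c_list = a_list[:]     # Member-by-member copy

a_list.sort()          # Modify the original in place.

print(a_list)   # [0, 1, 2, 3, 5]
print(b_list)   # [0, 1, 2, 3, 5]  (same object as a_list)
print(c_list)   # [1, 3, 5, 0, 2]  (independent copy)

print(b_list is a_list, c_list is a_list)   # True False
```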
Section 9.10.3, “Comparison Methods.”
Neither an empty list nor the value None necessarily returns True when
applied to the in operator.
a = [1, 2, 3]
None in a # This produces False
[] in a # So does this.
And now you can see the problem. A member-by-member copy was carried
out, but the list within the list was a reference, so both lists ended up referring
to the same data in the final position.
The solution is simple. You need to do a deep copy to get the expected
behavior. To get a deep copy, in which even embedded list items get copied,
import the copy package and use copy.deepcopy.
import copy
Figure 3.5. Deep copying
With deep copying, the depth of copying extends to every level. You could
have collections within collections to any level of complexity.
If changes are now made to b_list after the deep copy, they will have no
further effect on a_list. The last element of a_list will remain
set to [5,10] until changed directly. All this functionality is thanks to deep
copying.
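As a sketch of the deep-copy behavior, assume a_list starts as [1, 2, [5, 10]] (the starting values are an assumption based on the [5,10] mentioned above) and that b_list is produced by copy.deepcopy.

```python
import copy

a_list = [1, 2, [5, 10]]
b_list = copy.deepcopy(a_list)    # Copies the embedded list as well.

b_list[2][0] = 50                 # Change the copy's inner list only.
b_list[2][1] = 100

print(a_list)   # [1, 2, [5, 10]]
print(b_list)   # [1, 2, [50, 100]]
print(a_list[2] is b_list[2])     # False: two separate inner lists
```

Had b_list been created with a_list[:] instead, the two inner lists would be the same object and both lines would show the change.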
You’ll often use len when working with lists. For example, the following
loop doubles every item in a list. It’s necessary to use len to make this a gen-
eral solution.
for i in range(len(a_list)):
    a_list[i] *= 2
The max and min functions produce maximum and minimum elements,
respectively. These functions work only on lists that have elements with com-
patible types, such as all numeric elements or all string elements. In the case of
strings, alphabetical order (or rather, code point order) enables comparisons.
Here’s an example:
a_list = [100, -3, -5, 120]
print('Length of the list is', len(a_list))
print('Max and min are', max(a_list), min(a_list))
This prints the following:
Length of the list is 4
Max and min are 120 -5
The sorted and reversed functions are similar to the sort and reverse
methods, presented in Section 3.11. But whereas those methods reorganize a
list in place, these functions produce new lists.
These functions work on tuples and strings as well as lists, but the sorted
function always produces a list. Here’s an example:
a_tup = (30, 55, 15, 45)
print(sorted(a_tup)) # Print [15, 30, 45, 55]
The reversed function is unusual because it produces an iterable but not
a collection. In simple terms, this means you need a for loop to print it or else
use a list or tuple conversion. Here’s an example:
a_tup = (1, 3, 5, 0)
for i in reversed(a_tup):
    print(i, end=' ')
This prints
0 5 3 1
Alternatively, you can use the following:
print(tuple(reversed(a_tup)))
>>> num_list = [2.45, 1, -10, 55.5, 100.03, 40, -3]
>>> print('The average is ', sum(num_list) / len(num_list))
The average is 26.56857142857143
a_list.append(4)
a_list.extend([4]) # This has the same effect.
If the index is out of range, the method places the new value at the end of
the list if the index is too high to be in range, and it inserts the new value at the
beginning of the list if the index is too low. Here’s an example:
a_list = [10, 20, 40] # Missing 30.
a_list.insert(2, 30 ) # At index 2 (third), insert 30.
print(a_list) # Prints [10, 20, 30, 40]
a_list.insert(100, 33)
print(a_list) # Prints [10, 20, 30, 40, 33]
a_list.insert(-100, 44)
print(a_list) # Prints [44, 10, 20, 30, 40, 33]
The remove method removes the first occurrence of the specified argument
from the list. There must be at least one occurrence of this value, or Python
raises a ValueError exception.
my_list = [15, 25, 15, 25]
my_list.remove(25)
print(my_list) # Prints [15, 15, 25]
You may want to use in, not in, or the count method to verify that a
value is in a list before attempting to remove it.
Here’s a practical example that combines these methods.
In competitive gymnastics, winners are determined by a panel of judges,
each of whom submits a score. The highest and lowest scores are thrown out,
and then the average of the remaining scores is taken. The following function
performs these tasks:
def eval_scores(a_list):
    a_list.remove(max(a_list))
    a_list.remove(min(a_list))
    return sum(a_list) / len(a_list)
Here’s a sample session. Suppose that the_scores contains the judges’
ratings.
the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
The eval_scores function throws out the low and high values (6.0 and
9.9); then it calculates the average of the rest, producing 8.675.
print(eval_scores(the_scores))
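One thing to keep in mind is that eval_scores removes two items from the list passed to it, so the caller's list is shorter afterward. If that side effect is unwanted, a variation along the following lines leaves the original alone. The name eval_scores_copy and the three-score minimum are assumptions made for this sketch.

```python
def eval_scores_copy(scores):
    """Average the scores after dropping one highest and one lowest."""
    if len(scores) < 3:
        raise ValueError('need at least three scores')
    trimmed = sorted(scores)[1:-1]    # sorted() builds a new list
    return sum(trimmed) / len(trimmed)

the_scores = [8.5, 6.0, 8.5, 8.7, 9.9, 9.0]
result = eval_scores_copy(the_scores)

print(round(result, 3))    # 8.675
print(len(the_scores))     # 6 (the original list is unchanged)
```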
# indexed item: use
# last by default.
In this syntax, brackets are not intended literally but instead indicate optional
items.
The count method returns the number of occurrences of the specified element.
It returns the number of matching items at the top level only. Here’s an example:
yr_list = [1, 2, 1, 1,[3, 4]]
print(yr_list.count(1)) # Prints 3
print(yr_list.count(2)) # Prints 1
print(yr_list.count(3)) # Prints 0
print(yr_list.count([3, 4])) # Prints 1
The index method returns the zero-based index of the first occurrence
of a specified value. You may optionally specify start and end indexes; the
searching happens in a subrange beginning with the start position, up to but
not including the end position. An exception is raised if the item is not found.
For example, the following call to the index method returns 3, signifying
the fourth element.
beat_list = ['John', 'Paul', 'George', 'Ringo']
print(beat_list.index('Ringo')) # Print 3.
But 3 is also printed if the list is defined as
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']
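The optional start and end arguments make it possible to find later occurrences or to restrict the search. The following sketch uses the five-element version of beat_list; the names first and second are illustrative.

```python
beat_list = ['John', 'Paul', 'George', 'Ringo', 'Ringo']

first = beat_list.index('Ringo')               # Search the whole list.
second = beat_list.index('Ringo', first + 1)   # Resume after the first hit.
print(first, second)    # 3 4

# Searching only indexes 0 through 2 finds nothing and raises ValueError.
try:
    beat_list.index('Ringo', 0, 3)
except ValueError:
    print('Not found in that subrange.')
```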
list.sort([key=None] [, reverse=False])
list.reverse() # Reverse existing order.
Each of these methods changes the ordering of all the elements in place.
In Python 3, the sort method requires all the elements of the list to
have compatible types, such as all strings or all numbers; the reverse
method has no such requirement. The sort method places all the elements in
lowest-to-highest order by default, or in highest-to-lowest order if
reverse is specified and set to True. If the list consists of
strings, the strings are placed in alphabetical (code point) order.
The following example program prompts the user for a series of strings,
until the user enters an empty string by pressing Enter without any other
input. The program then prints the strings in alphabetical order.
def main():
    my_list = []    # Start with empty list
    while True:
        s = input('Enter next name: ')
        if len(s) == 0:
            break
        my_list.append(s)
    my_list.sort()    # Place all elems in order.
    print('Here is the sorted list:')
    for a_word in my_list:
        print(a_word, end=' ')

main()
Here’s a sample session of this program, showing user input in bold.
Enter next name: John
Enter next name: Paul
Enter next name: George
Enter next name: Ringo
Enter next name: Brian
Enter next name:
Here is the sorted list:
Brian George John Paul Ringo
The sort method has some optional arguments. The first is the key argu-
ment, which by default is set to None. This argument, if specified, is a func-
tion (a callable) that’s run on each element to get that element’s key value.
Those keys are compared to determine the new order. So, for example, if a
three-member list produced key values of 15, 1, and 7, they would be sorted as
middle-last-first.
For example, suppose you want a list of strings to be ordered according to
case-insensitive comparisons. An easy way to do that is to write a function
b_list.sort(key=ignore_case)
If you now print a_list and b_list in an IDLE session, you get the fol-
lowing results (with user input shown in bold):
>>> a_list
['George', 'Ringo', 'brian', 'john', 'paul']
>>> b_list
['brian', 'George', 'john', 'paul', 'Ringo']
Notice how a_list and b_list, which started with identical contents, are
sorted. The first was sorted by ordinary, case-sensitive comparisons, in which
all uppercase letters are “less than” compared to lowercase letters. The second
list was sorted by case-insensitive comparisons, pushing poor old 'Ringo' to
the end.
The second argument is the reverse argument, which by default is
False. If this argument is included and is True, elements are sorted in
high-to-low order.
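Here is a brief sketch that combines both arguments. Instead of a hand-written function, it passes the built-in str.lower method as the key, which is one common way to get case-insensitive ordering; the list contents are assumed for illustration.

```python
names = ['john', 'Ringo', 'paul', 'George', 'brian']

# Case-insensitive, low to high.
names.sort(key=str.lower)
print(names)    # ['brian', 'George', 'john', 'paul', 'Ringo']

# Same key, but high to low.
names.sort(key=str.lower, reverse=True)
print(names)    # ['Ringo', 'paul', 'john', 'George', 'brian']
```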
The reverse method changes the ordering of the list, as you’d expect, but
without sorting anything. Here’s an example:
my_list = ['Brian', 'John', 'Paul', 'George', 'Ringo']
my_list.reverse() # Reverse elems in place.
for a_word in my_list:
    print(a_word, end=' ')
Calling reverse flips the existing order without sorting anything: the last
shall be first, and the first shall be last. Now Ringo becomes the frontman.
Ringo George Paul John Brian
Note Ë Using the key argument, as just explained, is a good candidate for the
use of lambda functions, as explained later in Section 3.14.
Ç Note
[Figure: a traditional stack that starts out holding 0. Push(10) and then Push(20) place 10 and 20 on top; Pop then returns 20, and a second Pop returns 10.]
The push and pop functions on a traditional stack are replaced by the
append and pop methods of a Python list.
The key change that needs to be made—conceptually, at any rate—is to
think of operating on the last element to be added to the end of the list, rather
than to the literal top of a stack.
This end-of-the-list approach is functionally equivalent to a stack. Figure 3.7
illustrates 10 and 20 being pushed on, and then popped off, a list used as a
stack. The result is that the items are popped off in reverse order.
0 10 stk.append(10)
0 10 20 stk.append(20)
0 10 20 stk.pop() -> 20
0 10 stk.pop() -> 10
7 3 +
This adds 7 to 3, which produces 10. Or, to multiply 10 by 5, producing 50,
you use this:
10 5 *
Then—and here is why RPN is so useful—you can put these two expres-
sions together in a clear, unambiguous way, without any need for parentheses:
10 5 * 7 3 + /
This expression is equivalent to the following standard notation, which
produces 5.0:
(10 * 5) / (7 + 3)
Here's another example:
1 2 / 3 4 / +
This example translates into (1/2) + (3/4) and therefore produces 1.25.
Here’s another example:
2 4 2 3 7 + + + *
This translates into
2 * (4 + (2 + (3 + 7)))
which evaluates to 32. The beauty of an RPN expression is that parentheses
are never needed. The best part is that the interpreter follows only a few sim-
ple rules:
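the_stack = []    # Global list used as the stack.

def push(v):
    the_stack.append(v)

def pop():
    return the_stack.pop()

def main():
    s = input('Enter RPN string: ')
    a_list = s.split()
    for item in a_list:
        if item in ('+', '-', '*', '/'):    # Operator: pop two operands.
            op2 = pop()
            op1 = pop()
            if item == '+':
                push(op1 + op2)
            elif item == '-':
                push(op1 - op2)
            elif item == '*':
                push(op1 * op2)
            else:
                push(op1 / op2)
        else:
            push(float(item))    # Operand: convert to a number and push.
    print(pop())    # The final result is left on the stack.

main()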
This application, although not long, could be more compact. We’ve included
dedicated push and pop functions operating on a global variable, the_stack.
A few lines could have been saved by using methods of the_stack directly.
op1 = the_stack.pop()
...
the_stack.append(op1 + op2) # Push op1 + op2.
Revising the example so that it uses these methods directly is left as an exer-
cise. Note also that there is currently no error checking, such as checking to
make sure that the stack is at least two elements in length before an operation
is carried out. Error checking is also left as an exercise.
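One possible approach to both exercises is sketched below. It is not the only solution: it uses a local list in place of the_stack, looks up each operator in a dictionary built from the standard operator module, and raises ValueError for malformed input. The names OPS and eval_rpn are assumptions introduced for this sketch.

```python
import operator

# Map each operator token to the function that carries it out.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def eval_rpn(s):
    """Evaluate a space-separated RPN string and return the result."""
    stack = []
    for item in s.split():
        if item in OPS:
            if len(stack) < 2:
                raise ValueError('too few operands before ' + item)
            op2 = stack.pop()
            op1 = stack.pop()
            stack.append(OPS[item](op1, op2))
        else:
            stack.append(float(item))    # Raises ValueError on bad input.
    if len(stack) != 1:
        raise ValueError('malformed expression')
    return stack[0]

print(eval_rpn('10 5 * 7 3 + /'))       # 5.0
print(eval_rpn('1 2 / 3 4 / +'))        # 1.25
print(eval_rpn('2 4 2 3 7 + + + *'))    # 32.0
```

Returning the value rather than printing it makes the function easier to test, and the dictionary lookup does the work of the if/elif chain.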
Performance Tip The following tip saves you seven lines of code. Instead of
testing for each operator separately, you can use the eval function to take a
string containing a Python expression and evaluate it. You would then need only
one function call to carry out any arithmetic operation in this app.
push(eval(str(op1) + item + str(op2)))
Be careful, however, because the eval function can easily be misused. In
this application, it should be called only if the item is one of the four
operators: +, -, *, or /.
Ç Performance Tip
functools.reduce(function, list)
The action of reduce is to apply the specified function to each succes-
sive pair of neighboring elements in list, accumulating the result, passing it
along, and finally returning the overall answer. The function argument—a
callable—must itself take two arguments and produce a result. Assuming that
a list (or other sequence) has at least four elements, the effect is as follows.
◗ Take the first two elements as arguments to the function. Remember the result.
◗ Take the result from step 1 and the third element as arguments to the func-
tion. Remember this result.
◗ Take the result from step 2 and the fourth element as arguments to the
function.
◗ Continue to the end of the list in this manner.
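The steps above can be sketched as an ordinary loop. The helper my_reduce is hypothetical and exists only to mirror the description; it is compared against functools.reduce itself.

```python
import functools

def my_reduce(function, seq):
    """Hand-written loop that follows the steps listed above."""
    it = iter(seq)
    result = next(it)                      # Begin with the first element.
    for item in it:
        result = function(result, item)    # Combine result with the next one.
    return result

def add(x, y):
    return x + y

nums = [1, 2, 3, 4, 5]
print(my_reduce(add, nums))           # 15
print(functools.reduce(add, nums))    # 15
```

The two differ on an empty sequence: this sketch lets StopIteration escape, whereas functools.reduce raises TypeError unless an initial value is supplied.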
n = 5
a_list = list(range(1, n + 1))
Ç Note
But this usage, while interesting to note, is not usually how a lambda is
used. A more practical use is with the reduce function. For example, here’s
how to calculate the triangle number for 5:
t5 = functools.reduce(lambda x, y: x + y, [1,2,3,4,5])
Here’s how to calculate the factorial of 5:
f5 = functools.reduce(lambda x, y: x * y, [1,2,3,4,5])
Programs create data dynamically, at run time, and assign names to data
objects if you want to refer to them again. The same thing happens with func-
tions (callables); they are created at run time and are either assigned names—
if you want to refer to them again—or used anonymously, as in the last two
examples.
b_list = [i * i for i in a_list]
Perhaps by now you can see the pattern. In this second example, the ele-
ments inside the square brackets can be broken down as follows:
b_list = [ ]
for i in a_list:
    b_list.append(i * i)
[ value for_statement_header ]
new_list = []
for i in my_list:
    if i > 0:
        new_list.append(i)
The result, in this case, is to place the values [10, 12, 13, 15] in new_list.
The following statement, using list comprehension, does the same thing:
new_list = [i for i in my_list if i > 0 ]
The list-comprehension statement on the right, within the square brackets,
breaks down into three pieces in this case:
neg_list:
[-10, -500, -1]
Alternatively, suppose you want to produce the same set, but have it consist
of the squares of positive values from a_list, resulting in {25, 4}. In that
case, you could use the following statement:
my_set = {i * i for i in a_list if i > 0}
Dictionary comprehension is a little more complicated, because in order to
work, it’s necessary to create a loop that generates key-value pairs, using this
syntax:
key : value
Suppose you have a list of tuples that you’d like to be the basis for a data
dictionary.
vals_list = [ ('pi', 3.14), ('phi', 1.618) ]
A dictionary could be created as follows:
my_dict = { i[0]: i[1] for i in vals_list }
Note the use of the colon (:) in the key-value expression, i[0] : i[1].
You can verify that a dictionary was successfully produced by referring to or
printing the following expression, which should produce the number 3.14:
my_dict['pi'] # Produces 3.14.
Here’s another example, which combines data from two lists into a dictio-
nary. It assumes that these two lists are the same length.
keys = ['Bob', 'Carol', 'Ted', 'Alice' ]
vals = [4.0, 4.0, 3.75, 3.9]
grade_dict = { keys[i]: vals[i] for i in range(len(keys)) }
This example creates a dictionary initialized as follows:
grade_dict = { 'Bob':4.0, 'Carol':4.0, 'Ted':3.75,
'Alice':3.9 }
Performance Tip You can improve the performance of the code in this last
example by using the built-in zip function to merge the lists. The
comprehension then is as follows:
grade_dict = { key: val for key, val in zip(keys, vals)}
Ç Performance Tip
idict = {v : k for k, v in phone_dict.items() }
The items method of data dictionaries produces an iterable view of (k, v)
pairs, in which k is a key and v is a value. For each such pair, the value
expression v:k inverts the key-value relationship in producing the new
dictionary, idict.
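Inverting works cleanly only when the values are unique. In the sketch below, the contents of phone_dict are invented for illustration, and two people share a number; the second loop shows one way to keep every name.

```python
phone_dict = {'Ann': '555-1111', 'Bob': '555-2222', 'Cy': '555-1111'}

# A later pair overwrites an earlier one that has the same value.
idict = {v: k for k, v in phone_dict.items()}
print(idict)     # {'555-1111': 'Cy', '555-2222': 'Bob'}

# Alternative: collect all names for each number in a list.
groups = {}
for name, number in phone_dict.items():
    groups.setdefault(number, []).append(name)
print(groups)    # {'555-1111': ['Ann', 'Cy'], '555-2222': ['Bob']}
```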
a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list) # Prints [100, 200, 150]
This approach works because the values of the list are changed in place,
without creating a new list and requiring variable reassignment. But the fol-
lowing example fails to change the list passed to it.
def set_list_vals(list_arg):
    list_arg = [100, 200, 150]
a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list) # Prints [0, 0, 0]
With this approach, the values of the list, a_list, were not changed after
the function returned. What happened?
The answer is that the list argument, list_arg, was reassigned to refer to
a completely new list. The association between the variable list_arg and the
original data, [0, 0, 0], was broken.
However, slicing and indexing are different. Assigning into an indexed item
or a slice of a list does not change what the name refers to; it still refers to the
same list, but the first element of that list is modified.
my_list[0] = new_data # This really changes list data.
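The same idea rescues the function that failed earlier. In this sketch, the assignment targets the slice list_arg[:], which covers the whole list, so the caller's list is changed in place.

```python
def set_list_vals(list_arg):
    # Assign into a slice covering the whole list; list_arg still
    # refers to the caller's list, whose contents are replaced.
    list_arg[:] = [100, 200, 150]

a_list = [0, 0, 0]
set_list_vals(a_list)
print(a_list)    # [100, 200, 150]
```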
Note Ë This chapter describes how to use the core Python language to create
multidimensional lists. Chapter 12 describes the use of the numpy package,
which enables the use of highly optimized routines for manipulating multidi-
mensional arrays, especially arrays (or matrixes) of numbers.
Ç Note
It might seem that list multiplication would solve the problem. It does, in
the case of one-dimensional lists.
big_list = [0] * 100 # Create a list of 100 elements
# each initialized to 0.
This works so well, you might be tempted to just generalize to a second
dimension.
mat = [[0] * 100] * 200
But although this statement is legal, it doesn’t do what you want. The inner
expression, [0] * 100, creates a list of 100 elements. But the code repeats
that data 200 times—not by creating 200 separate rows but instead by creat-
ing 200 references to the same row.
The effect is to create 200 rows that aren’t separate. This is a shallow copy;
you get 200 redundant references to the same row. This is frustrating. The
way around it is to append each of the 200 rows one at a time, which you can
do in a for loop:
mat = [ ]
for i in range(200):
    mat.append([0] * 100)
In this example, mat starts out as an empty list, just like any other.
Each time through the loop, a row containing 100 zeros is appended. After
this loop is executed, mat will refer to a true two-dimensional matrix made up
of 20,000 fully independent cells. It can then be indexed as high as
mat[199][99]. Here’s an example:
mat[150][87] = 3.141592
As with other for loops that append data to a list, the previous example is a
great candidate for list comprehension.
mat = [ [0] * 100 for i in range(200) ]
The expression [0] * 100 is the value part of this list-comprehension
expression; it specifies a one-dimensional list (or “row”) that consists of 100
elements, each set to 0. This expression should not be placed in an additional
pair of brackets, by the way, or the effect would be to create an extra, and
unnecessary, level of indexing.
The expression for i in range(200) causes Python to create, and
append, such a row . . . 200 times.
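A small-scale sketch makes the difference between the two approaches visible. It uses 2 rows of 3 columns rather than 200 by 100 so that the results fit on a line; the names bad and good are illustrative.

```python
# List multiplication: both rows are the same list object.
bad = [[0] * 3] * 2
bad[0][0] = 9
print(bad)                 # [[9, 0, 0], [9, 0, 0]]
print(bad[0] is bad[1])    # True

# List comprehension: each row is created separately.
good = [[0] * 3 for _ in range(2)]
good[0][0] = 9
print(good)                  # [[9, 0, 0], [0, 0, 0]]
print(good[0] is good[1])    # False
```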
mat2 = [[ [0] * 25 for _ in range(20) ]
for _ in range(30) ]
And here is a 10 × 10 × 10 × 10 four-dimensional list:
mat2 = [[[ [0] * 10 for _ in range(10) ]
for _ in range(10) ]
for _ in range(10) ]
You can build matrixes of higher dimensions still, but remember that as
dimensions increase, things get bigger—fast!
Chapter 3 Summary
This chapter has demonstrated just how powerful Python lists are. Many of
these same abilities are realized in functions, such as len, count, and index,
which apply to other collection classes as well, including strings and tuples.
However, because lists are mutable, there are some list capabilities not sup-
ported by those other types, such as sort and reverse, which alter list data
“in place.”
This chapter also introduced some exotic abilities, such as the use of functools
and lambda functions. It also explained techniques for creating
multidimensional lists. Chapter 12 provides more efficient alternatives,
but it’s still useful to know how to create multidimensional lists using the
core language.
2 What’s the most efficient way of creating a Python list that has 1,000 elements
to start with? Assume every element should be initialized to the same value.
3 How do you use slicing to get every other element of a list, while ignoring the
rest? (For example, you want to create a new list that has the first, third, fifth,
seventh, and so on element.)
4 Describe some of the differences between indexing and slicing.
5 What happens when one of the indexes used in a slicing expression is out of
range?
6 If you pass a list to a function, and if you want the function to be able to
change the values of the list—so that the list is different after the function
returns—what action should you avoid?
7 What is an unbalanced matrix?
8 Why does the creation of arbitrarily large matrixes require the use of either
list comprehension or a loop?
4.1 Overview
Python is unusually gifted with shortcuts and time-saving programming
techniques. This chapter begins with a discussion of twenty-two of these
techniques.
Another thing you can do to speed up certain programs is to take advantage
of the many packages that are available with Python. Some of these, such as
re (regular expressions), sys, random, and math, come with the standard
Python download, and all you have to do is include an import statement.
Other packages can be downloaded quite easily with the right tools.
the next physical line. Consequently, you can enter as long a statement as you
want—and you can enter a string of any length you want—without necessar-
ily inserting newlines.
my_str = ('I am Hen-er-y the Eighth, '
'I am! I am not just any Henry VIII, '
'I really am!')
This statement places all this text in one string. You can likewise use open
parentheses with other kinds of statements.
length_of_hypotenuse = ( (side1 * side1 + side2 * side2)
** 0.5 )
A statement is not considered complete until all open parentheses [(] have
been matched by closing parentheses [)]. The same is true for braces and
square brackets. As a result, this statement will automatically continue to the
next physical line.
If you ever write code like this, you should try to break the habit as soon as
you can. It’s better to print the contents of a list or iterator directly.
beat_list = ['John', 'Paul', 'George', 'Ringo']
for guy in beat_list:
    print(guy)
Even if you need access to a loop variable, it’s better to use the enumerate
function to generate such numbers. Here’s an example:
beat_list = ['John', 'Paul', 'George', 'Ringo']
for i, name in enumerate(beat_list, 1):
    print(i, '. ', name, sep='')
This prints
1. John
2. Paul
3. George
4. Ringo
There are, of course, some cases in which it’s necessary to use indexing.
That happens most often when you are trying to change the contents of a list
in place.
a_list += [30, 40]
print('a_list:', a_list)
print('b_list:', b_list)
This code prints
a_list: [10, 20, 30, 40]
b_list: [10, 20, 30, 40]
In this case, the change was made to the list in place, so there was no need
to create a new list and reassign that list to the variable. Therefore, a_list
was not assigned to a new list, and b_list, a variable that refers to the same
data in memory, reflects the change as well.
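The following sketch contrasts the in-place += with ordinary addition followed by reassignment. The starting values are assumed to match the output shown above.

```python
a_list = [10, 20]
b_list = a_list            # Second name for the same list.

a_list += [30, 40]         # In place: both names see the change.
print(b_list)              # [10, 20, 30, 40]
print(a_list is b_list)    # True

a_list = a_list + [50]     # New list: a_list is reassigned.
print(a_list)              # [10, 20, 30, 40, 50]
print(b_list)              # [10, 20, 30, 40]
print(a_list is b_list)    # False
```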
In-place operations are almost always more efficient. In the case of lists,
Python reserves some extra space to grow when allocating a list in memory,
and that in turn permits append operations, as well as +=, to grow lists
efficiently. However, occasionally lists exceed the reserved space and must be
moved. Such memory management is seamless and has little or no impact on
program behavior.
Non-in-place operations are less efficient, because a new object must be
created. That’s why it’s advisable to use the join method to grow large strings
rather than use the += operator, especially if performance is important. The
following example builds a list of 26 one-character strings in place and then
calls join once to combine them into a single string.
str_list = []
n = ord('a')
for i in range(n, n + 26):
    str_list += chr(i)
alphabet_str = ''.join(str_list)
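As a variation, the list can be skipped entirely by handing join a generator expression. This sketch also compares the result with string.ascii_lowercase from the standard library as a sanity check.

```python
import string

n = ord('a')
alphabet_str = ''.join(chr(i) for i in range(n, n + 26))

print(alphabet_str)                              # abcdefghijklmnopqrstuvwxyz
print(alphabet_str == string.ascii_lowercase)    # True
```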
Figures 4.1 and 4.2 illustrate the difference between in-place operations
and non-in-place operations. In Figure 4.1, string data seems to be appended
onto an existing string, but what the operation really does is to create a new
string and then assign it to the variable—which now refers to a different place
in memory.
[Figure 4.1: appending to the string S creates a new string, and the variable is then reassigned to it.]
But in Figure 4.2, list data is appended onto an existing list without the
need to create a new list and reassign the variable.
Figure 4.2. Appending to a list (in-place)
Here’s a summary:
example, suppose you want to assign 1 to a, and 0 to b. The obvious way to do
that is to use the following statements:
a = 1
b = 0
But through tuple assignment, you can combine these into a single
statement.
a, b = 1, 0
In this form of assignment, you have a series of variables on the left side
of the equals sign (=) and a series of values on the right. They must match in
number, with one exception: You can assign a tuple of any size to a single
variable (which itself now represents a tuple as a result of this operation).
a = 4, 8, 12 # a is now a tuple containing three values.
Tuple assignment can be used to write some passages of code more com-
pactly. Consider how compact a Fibonacci-generating function can be in
Python.
def fibo(n):
    a, b = 1, 0
    while a <= n:
        print(a, end=' ')
        a, b = a + b, a
In the last statement, the variable a gets a new value: a + b; the variable b
gets a new value—namely, the old value of a.
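Because the right side is evaluated in full before any assignment takes place, no temporary variable is needed. The sketch below shows the same update in a variation that returns a list (fibo_list is a name invented here), followed by the classic two-variable swap.

```python
def fibo_list(n):
    """Return the Fibonacci numbers that do not exceed n."""
    result = []
    a, b = 1, 0
    while a <= n:
        result.append(a)
        a, b = a + b, a    # Both new values use the old a and b.
    return result

print(fibo_list(50))    # [1, 1, 2, 3, 5, 8, 13, 21, 34]

x, y = 'left', 'right'
x, y = y, x             # Swap without a temporary variable.
print(x, y)             # right left
```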
<class 'int'>
This is not what was wanted in this case. The parentheses were treated as a
no-op, as would any number of enclosing parentheses. But the following state-
ment produces a tuple with one element, although, to be fair, a tuple with just
one element isn’t used very often.
my_tup = (3,) # Assign tuple with one member, 3.
The use of an asterisk (*) provides a good deal of additional flexibility with
tuple assignment. You can use it to split off parts of a tuple and have one (and
only one) variable that becomes the default target for the remaining elements,
which are then put into a list. Some examples should make this clear.
a, *b = 2, 4, 6, 8
In this example, a gets the value 2, and b is assigned to a list:
2
[4, 6, 8]
You can place the asterisk next to any variable on the left, but in no case
more than one. The variable modified with the asterisk is assigned a list of
whatever elements are left over. Here’s an example:
a, *b, c = 10, 20, 30, 40, 50
In this case, a and c refer to 10 and 50, respectively, after this statement is
executed, and b is assigned the list [20, 30, 40].
You can, of course, place the asterisk next to a variable at the end.
big, bigger, *many = 100, 200, 300, 400, 500, 600
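A few more cases, sketched here with illustrative names, show what the starred variable receives, including the case in which nothing is left over.

```python
big, bigger, *many = 100, 200, 300, 400, 500, 600
print(big, bigger, many)    # 100 200 [300, 400, 500, 600]

# With nothing left over, the starred variable gets an empty list.
first, *rest = [7]
print(first, rest)          # 7 []

# The right side can be any iterable, including a string.
*init, last = 'abc'
print(init, last)           # ['a', 'b'] c
```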
example:
def double_me(n):
    n *= 2
a = 10
double_me(a)
print(a) # Value of a did not get doubled!!
When n is assigned a new value, the association is broken between that
variable and the value that was passed. In effect, n is a local variable that is
now associated with a different place in memory. The variable passed to the
function is unaffected.
But you can always use a return value this way:
def double_me(n):
    return n * 2
a = 10
a = double_me(a)
print(a)
Therefore, to get an out parameter, just return a value. But what if you
want more than one out parameter?
In Python, you can return as many values as you want. For example, the
following function applies the quadratic formula and returns both roots.
def quad(a, b, c):
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2
This function has three input arguments and two return values. In
calling the function, it’s important to receive both return values:
x1, x2 = quad(1, -1, -1)
If you return multiple values to a single variable in this case, that variable
will store the values as a tuple. Here’s an example:
>>> x = quad(1, -1, -1)
>>> x
(1.618033988749895, -0.6180339887498949)
Note that this feature—returning multiple values—is actually an applica-
tion of the use of tuples in Python.
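Because the return value is just a tuple, a function can also return a different number of values depending on the input. The sketch below is a hypothetical variation (real_roots is not part of the original example) that returns zero, one, or two real roots.

```python
import math

def real_roots(a, b, c):
    """Return a tuple of the real roots of a*x*x + b*x + c = 0 (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()                      # No real roots.
    if disc == 0:
        return (-b / (2 * a),)         # One repeated root.
    root = math.sqrt(disc)
    return ((-b + root) / (2 * a), (-b - root) / (2 * a))

print(real_roots(1, -1, -1))    # Two roots, about 1.618 and -0.618
print(real_roots(1, 2, 1))      # (-1.0,)
print(real_roots(1, 0, 1))      # ()
```

The caller can check len() of the result before unpacking it.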
◗ Nonempty collections and nonempty strings evaluate as True; so do nonzero
numeric values.
◗ Zero-length collections and zero-length strings evaluate to False; so does
any number equal to 0, as well as the special value None.
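These rules can be checked directly with the built-in bool function. The sample values below are chosen for illustration; note that a list containing 0 and a string containing one space are both nonempty.

```python
samples = [[], [0], '', ' ', 0, 0.0, -1, None, {}, (None,)]
truth = [bool(v) for v in samples]

for v, t in zip(samples, truth):
    print(repr(v), '->', t)
```

Only the empty list, the empty string, the two zeros, None, and the empty dictionary come out as False.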
Here’s an example:
a = b = c = d = e = 100
if a == b == c == d == e:
    print('All the variables are equal to each other.')
For larger data sets, there are ways to achieve these results more efficiently.
Any list, no matter how large, can be tested to see whether all the elements are
equal this way:
if min(a_list) == max(a_list):
    print('All the elements are equal to each other.')
However, when you just want to test a few variables for equality or perform
a combination of comparisons on a single line, the techniques shown in this
section are a nice convenience with Python. Yay, Python!
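Two small sketches round this out. The first uses a chained comparison as a range test. The second is an alternative to the min/max test that also works on an empty list and on elements that support == but not <; both function names are invented here.

```python
def in_range(x, low, high):
    """True if low <= x < high, written as one chained comparison."""
    return low <= x < high

def all_equal(items):
    """True if every element equals the first (True for an empty list)."""
    return all(x == items[0] for x in items)

print(in_range(5, 0, 10))      # True
print(in_range(10, 0, 10))     # False
print(all_equal([3, 3, 3]))    # True
print(all_equal([3, 4, 3]))    # False
print(all_equal([]))           # True
```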
elif n == 3:
    do_volume_subplot(stockdf)
elif n == 4:
    do_movingavg_plot(stockdf)
Code like this is verbose. It will work, but it’s longer than it needs to be.
But Python functions are objects, and they can be placed in a list just like any
other kind of objects. You can therefore get a reference to one of the functions
and call it.
fn = [do_plot, do_highlow_plot, do_volume_subplot,
do_movingavg_plot][n-1]
fn(stockdf) # Call the function
For example, n-1 is evaluated, and if that value is 0 (that is, n is equal to 1),
the first function listed, do_plot, is executed.
This code creates a compact version of a C++ switch statement by calling
a different function depending on the value of n. (By the way, the value 0 is
excluded in this case, because that value is used to exit.)
You can create a more flexible control structure by using a dictionary com-
bined with functions. For example, suppose that “load,” “save,” “update,”
and “exit” are all menu functions. We might implement the equivalent of a
switch statement this way:
menu_dict = {'load':load_fn, 'save':save_fn,
'exit':exit_fn, 'update':update_fn}
(menu_dict[selector])() # Call the function
Now the appropriate function will be called, depending on the string con-
tained in selector, which presumably contains 'load', 'save', 'update',
or 'exit'.
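Here is a self-contained sketch of the dictionary technique. The four menu functions are stand-ins that just return a string, and run_command is a hypothetical helper that uses the get method so that an unrecognized selector does not raise KeyError.

```python
# Placeholder menu functions; real ones would do actual work.
def load_fn():
    return 'loading'

def save_fn():
    return 'saving'

def update_fn():
    return 'updating'

def exit_fn():
    return 'exiting'

menu_dict = {'load': load_fn, 'save': save_fn,
             'exit': exit_fn, 'update': update_fn}

def run_command(selector):
    fn = menu_dict.get(selector)    # None if selector is not a key.
    if fn is None:
        return 'unknown command: ' + selector
    return fn()                     # Call the function.

print(run_command('save'))    # saving
print(run_command('quit'))    # unknown command: quit
```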
False. When you’re certain that you’re comparing a value to a unique object,
then the is keyword works reliably; moreover, it’s preferable in those situa-
tions because such a comparison is more efficient.
a_value = my_function()
if a_value is None:
    # Take special action if None is returned.
    pass
0 1 2 3 4 5 6 7 8 9
Notice that when you’re within IDLE, this for loop is like any other: You
need to type an extra blank line in order to terminate it.
5 7 9 11 13
You can squeeze other kinds of loops onto a line in this way. Also, you don’t
have to use loops but can place any statements on a line that you can manage
to fit there.
>>> a = 1; b = 2; c = a + b; print(c)
3
At this point, some people may object, “But with those semicolons, this
looks like C code!” (Oh, no—anything but that!)
Maybe it does, but it saves space. Keep in mind that the semicolons are
statement separators and not terminators, as in the old Pascal language.
blue = 1
green = 2
black = 3
white = 4
This works fine, but it would be nice to find a way to automate this code.
There is a simple trick in Python that allows you to do that, creating an enu-
meration. You can take advantage of multiple assignment along with use of
the range function:
red, blue, green, black, white = range(5)
The number passed to range in this case is the number of settings. Or, if
you want to start the numbering at 1 instead of 0, you can use the following:
red, blue, green, black, white = range(1, 6)
Note Ë For more sophisticated control over the creation and specification of
enumerated types, you can import and examine the enum package.
import enum
help(enum)
You can find information on this feature at
https://docs.python.org/3/library/enum.html.
Ç Note
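As a minimal sketch of what the enum package offers, the following defines the same five colors with IntEnum, one of the classes the package provides. The class name Color is an assumption chosen for illustration; the numbering starts at 1, matching the range(1, 6) version.

```python
from enum import IntEnum

class Color(IntEnum):
    red = 1
    blue = 2
    green = 3
    black = 4
    white = 5

print(Color.green.value)         # 3
print(Color(2).name)             # blue
print(Color.white > Color.red)   # True
```

Unlike plain integer variables, the members keep their names, which can make debugging output easier to read.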
scores were present. This technique involves several rules.
This technique affects only how numbers appear in the code itself and not
how anything is printed. To print a number with thousands-place separators,
use the format function or method as described in Chapter 5, “Formatting
Text Precisely.”
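A brief sketch of the distinction, assuming a large literal written with underscores (the variable name population is only illustrative): the underscores disappear when the number is printed, and separators appear in output only if you ask for them.

```python
population = 7_800_000_000       # Underscores are ignored by Python.
print(population)                # 7800000000
print(format(population, ','))   # 7,800,000,000
```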
To use Python from the command line, first start the DOS Box applica-
tion, which is present as a major application on all Windows systems. Python
should be easily available because it should be placed in a directory that is part
of the PATH setting. Checking this setting is easy to do while you’re running
a Windows DOS Box.
In Windows, you can also check the PATH setting by opening the Control
Panel, choosing System, and selecting the Advanced tab. Then click Environment
Variables.
You then should be able to run Python programs directly as long as they’re
in your PATH. To run a program from the command line, enter python and
the name of the source file (the main module), including the .py extension.
python test.py
On Windows-based systems, use the following command to download and
install a desired package.
pip install package_name
The package name, incidentally, uses no file extension:
pip install numpy
On Macintosh systems, you may need to use the pip3 utility, which is
downloaded with Python 3 when you install it on your computer. (You may also
have inherited a version of pip, but it will likely be out of date and unusable.)
pip3 install package_name
determin = (b * b - 4 * a * c) ** .5
x1 = (-b + determin) / (2 * a)
x2 = (-b - determin) / (2 * a)
return x1, x2
When this doc string is entered in a function definition, you can get help
from within IDLE:
>>> help(quad)
Help on function quad in module __main__:
quad(a, b, c)
Quadratic Formula function.
◗ The doc string itself must immediately follow the heading of the function.
◗ It must be a literal string, normally written with the triple-quote feature. (You can
actually use any style of quotation mark, but you need triple quotes if you want
the string to span multiple lines.)
◗ The doc string must also be aligned with the “level-1” indentation under the
function heading: For example, if the statements immediately under the func-
tion heading are indented four spaces, then the beginning of the doc string
must also be indented four spaces.
◗ Subsequent lines of the doc string may be indented as you choose, because
the string is a literal string. You can place the subsequent lines flush left or
continue the indentation you began with the doc string. In either case, Python
online help will line up the text in a helpful way.
This last point needs some clarification. The doc string shown in the previ-
ous example could have been written this way:
def quad(a, b, c):
    '''Quadratic Formula function.

This function applies the Quadratic Formula
to determine the roots of x in a quadratic
equation of the form ax^2 + bx + c = 0.
'''
As part of the stylistic guidelines, it’s recommended that you put in a brief
summary of the function, followed by a blank line, followed by more detailed
description.
When running Python from the command line, you can use the pydoc util-
ity to get this same online help shown earlier. For example, you could get help
on the module named queens.py. The pydoc utility responds by printing a
help summary for every function. Note that “py” is not entered as part of the
module name in this case.
python -m pydoc queens
◗ Packages included with the Python download itself. This includes math, random,
sys, os, time, datetime, and os.path. These packages are especially conve-
nient, because no additional downloading is necessary.
◗ Packages you can download from the Internet.
import package_name
For example:
import math
Once a package is imported, you can, within IDLE, get help on its contents.
Here’s an example:
>>> import math
>>> help(math)
If you type these commands from within IDLE, you’ll see that the math
package supports a great many functions.
But with this approach, each of the functions needs to be qualified using
the dot (.) syntax. For example, one of the functions supported is sqrt (square
root), which takes an integer or floating-point input.
>>> math.sqrt(2)
1.4142135623730951
You can use the math package, if you choose, to calculate the value of pi.
However, the math package also provides this number directly.
>>> math.atan(1) * 4
3.141592653589793
>>> math.pi
3.141592653589793
Let’s look at one of the variations on the import statement.
an asterisk (*).
>>> from math import *
>>> print(pi)
3.141592653589793
>>> print(sqrt(2))
1.4142135623730951
The drawback of using this version of import is that with very large and
complex programs, it gets difficult to keep track of all the names you’re using,
and when you import packages without requiring a package-name qualifier,
name conflicts can arise.
So, unless you know what you’re doing or are importing a really small pack-
age, it’s more advisable to import specific symbols than use the asterisk (*).
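For comparison, here is a short sketch of importing specific symbols. Only the two names listed are added to the current namespace; the variable area is just an illustration.

```python
from math import sqrt, pi

# Only sqrt and pi are imported; other math functions would
# still need "import math" and the dot syntax.
area = pi * 2 ** 2
print(sqrt(16))   # 4.0
print(area)
```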
and plotting routines to create impressive-looking graphs.
This package is explored in Chapter 15. It also needs to be downloaded.
new name. You can also assign a different function altogether to the symbolic
name, avg.
def new_func(a_list):
    return (sum(a_list) / len(a_list))

old_avg = avg
avg = new_func
The symbolic name old_avg now refers to the older, and longer, function
we defined before, and the symbolic name avg now refers to the newer function
just defined. We can call old_avg just as we used to call avg.
>>> old_avg([4, 6])
The average is 5.0
5.0
The next function shown (which we might loosely term a “metafunction,”
although it’s really quite ordinary) prints information about another function—
specifically, the function argument passed to it.
def func_info(func):
    print('Function name:', func.__name__)
    print('Function documentation:')
    help(func)
If we run this function on old_avg, which has been assigned to our first
averaging function at the beginning of this section, we get this result:
Function name: avg
Function documentation:
Help on function avg in module __main__:
avg(a_list)
This function finds the average val in a list.
We’re currently using the symbolic name old_avg to refer to the first func-
tion that was defined in this section. Notice that when we get the function’s
name, the information printed uses the name that the function was originally
defined with.
All of these operations will become important when we get to the topic of
“decorating” in Section 4.9, “Decorators and Function Profilers.”
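The point about names can be checked directly. This sketch assumes a simplified one-line avg (not the longer version defined earlier in the section) and shows that a second name refers to the same function object, which still reports the name it was defined with.

```python
def avg(a_list):
    '''Return the average of the values in a list.'''
    return sum(a_list) / len(a_list)

old_avg = avg                 # Second name for the same function object.

print(old_avg is avg)         # True
print(old_avg.__name__)       # avg
print(old_avg([4, 6]))        # 5.0
```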
The brackets are used in this case to show that *args may optionally be
preceded by any number of ordinary positional arguments, represented here
as ordinary_args. The use of such arguments is always optional.
In this syntax, the name args can actually be any symbolic name you want.
By convention, Python programs use the name args for this purpose.
The symbolic name args is then interpreted as a sequence of the arguments passed
(technically a tuple, which you can read just as you would a list); you access its
items by indexing it or using it in a for loop. You can also take its
length as needed. Here’s an example:
def my_var_func(*args):
    print('The number of args is', len(args))
    for item in args:
        print(item)
This function, my_var_func, can be used with argument lists of any length.
>>> my_var_func(10, 20, 30, 40)
The number of args is 4
10
20
30
40
A more useful function would be one that took any number of numeric
arguments and returned the average. Here’s an easy way to write that function.
def avg(*args):
    return sum(args)/len(args)
Now we can call the function with a different number of arguments each
time.
>>> avg(11, 22, 33)
22.0
>>> avg(1, 2)
1.5
The advantage of writing the function this way is that no brackets are
needed when you call this function. The arguments are interpreted as if they
were elements of a list, but you pass these arguments without list syntax.
What about the ordinary arguments we mentioned earlier? Additional
arguments, not included in the list *args, must either precede *args in the
argument list or be keyword arguments.
For example, let’s revisit the avg example. Suppose we want a separate
argument that specifies what units we’re using. Because units is not a key-
word argument, it must appear at the beginning of the list, in front of *args.
def avg(units, *args):
    print(sum(args)/len(args), units)
Here’s a sample use:
>>> avg('inches', 11, 22, 33)
22.0 inches
This function is valid because the ordinary argument, units, precedes the
argument list, *args.
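The other option mentioned earlier, a keyword argument that follows *args, can be sketched as follows. The default value 'inches' and the choice to return a string rather than print are assumptions made for this illustration.

```python
def avg(*args, units='inches'):
    # units follows *args, so it can be passed only by keyword.
    return '{} {}'.format(sum(args) / len(args), units)

print(avg(11, 22, 33))             # 22.0 inches
print(avg(1, 2, units='feet'))     # 1.5 feet
```

With this arrangement the caller can omit units entirely, which is not possible when units is the first positional argument.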
Note Ë The asterisk (*) has a number of uses in Python. In this context, it’s
called the splat or the positional expansion operator. Its basic use is to rep-
resent an “unpacked list”; more specifically, it replaces a list with a simple
sequence of separate items.
The limitation on such an entity as *args is that there isn’t much you can
do with it. One thing you can do (which will be important in Section 4.9,
“Decorators and Function Profilers”) is pass it along to a function. Here’s an
example:
>>> ls = [1, 2, 3]   # Ordinary (packed) list.
>>> print(*ls)       # Print unpacked version.
1 2 3
>>> print(ls)        # Print packed (ordinary) list.
[1, 2, 3]
arguments.
The following example defines such a function and then calls it.
def pr_vals_2(*args, **kwargs):
    for i in args:
        print(i)
    for k in kwargs:
        print(k, ':', kwargs[k])
Note Ë Although args and kwargs are packed into a tuple and a dictionary,
respectively, these symbols can be passed along to another function, as shown
in the next section.
Ç Note
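Here is a sketch of calling such a function. The second function, collect, is a hypothetical helper added only to show what args and kwargs contain once the call has been made.

```python
def pr_vals_2(*args, **kwargs):
    for i in args:
        print(i)
    for k in kwargs:
        print(k, ':', kwargs[k])

pr_vals_2(1, 2, 3, a=100, b=200)

def collect(*args, **kwargs):
    # Hypothetical helper: return the packed values for inspection.
    return args, kwargs

packed_args, packed_kwargs = collect(1, 2, a=100)
print(packed_args)     # (1, 2)
print(packed_kwargs)   # {'a': 100}
```

The call to pr_vals_2 prints the three positional values on separate lines, followed by one line for each keyword and its value.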
F1 = Decorator(F1)
Here’s an example of a decorator function that takes a function as argu-
ment and wraps it by adding calls to the time.time function. Note that time
is a package, and it must be imported before time.time is called.
import time

def make_timer(func):
    def wrapper():
        t1 = time.time()
        ret_val = func()
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
There are several functions involved with this simple example (which, by
the way, is not yet complete!), so let’s review.
◗ There is a function to be given as input; let’s call this the original function (F1
in this case). We’d like to be able to input any function we want, and have it
decorated—that is, acquire some additional statements.
◗ The wrapper function is the result of adding these additional statements to
the original function. In this case, these added statements report the number
of seconds the original function took to execute.
◗ The decorator is the function that performs the work of creating the wrapper
function and returning it. The decorator is able to do this because it internally
uses the def keyword to define a new function.
If you look at this decorator function, you should notice it has an important
omission: The arguments to the original function, func, are ignored. The wrap-
per function, as a result, will not correctly call func if arguments are involved.
The solution involves the *args and **kwargs language features, intro-
duced in the previous section. Here’s the full decorator:
import time

def make_timer(func):
    def wrapper(*args, **kwargs):
        t1 = time.time()
        ret_val = func(*args, **kwargs)
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper
The new function, remember, will be wrapper. It is wrapper (or rather, the
function temporarily named wrapper) that will eventually be called in place
of func; this wrapper function therefore must be able to take any number of
arguments, including any number of keyword arguments. The correct action
is to pass along all these arguments to the original function, func. Here’s how:
ret_val = func(*args, **kwargs)
Returning a value is also handled here; the wrapper returns the same value
as func, as it should. What if func returns no value? That’s not a problem,
because Python functions return None by default. So the value None, in that
case, is simply passed along. (You don’t have to test for the existence of a
return value; there always is one!)
Having defined this decorator, make_timer, we can take any function and
produce a wrapped version of it. Then—and this is almost the final trick—
we reassign the function name so that it refers to the wrapped version of the
function.
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass

count_nums = make_timer(count_nums)
time, and (2) this more elaborate version is what the name, count_nums, will
hereafter refer to. Python symbols can refer to any object, including functions
(callable objects). Therefore, we can reassign function names all we want.
count_nums = wrapper
Or, more accurately,
count_nums = make_timer(count_nums)
So now, when you run count_nums (which now refers to the wrapped ver-
sion of the function), you’ll get output like this, reporting execution time in
seconds.
>>> count_nums(33000)
Time elapsed was 1.063697338104248
The original version of count_nums did nothing except do some count-
ing; this wrapped version reports the passage of time in addition to calling the
original version of count_nums.
As a final step, Python provides a small but convenient bit of syntax to
automate the reassignment of the function name.
@decorator
def func(args):
    statements
This syntax is translated into the following:
def func(args):
    statements
func = decorator(func)
In either case, it’s assumed that decorator is a function that has already
been defined. This decorator must take a function as its argument and return
a wrapped version of the function. Assuming all this has been done correctly,
here’s a complete example utilizing the @ sign.
@make_timer
def count_nums(n):
    for i in range(n):
        for j in range(1000):
            pass
After this definition is executed by Python, count_nums can then be called,
and it will execute count_nums as defined, but it will also add (as part of the
wrapper) a call to print that reports the number of elapsed seconds.
Remember that this part of the trick (the final trick, actually) is to get the
name count_nums to refer to the new version of count_nums, after the new
statements have been added through the process of decoration.
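One side effect of decoration is that the name count_nums now refers to the function defined as wrapper, so its recorded name is wrapper as well. The following sketch is a variation on make_timer, not the version shown above: it assumes you want the original name and doc string preserved and uses functools.wraps from the standard library for that purpose. The simplified body of count_nums is an assumption made so that the example returns a value.

```python
import functools
import time

def make_timer(func):
    @functools.wraps(func)          # Copy func's name and doc string.
    def wrapper(*args, **kwargs):
        t1 = time.time()
        ret_val = func(*args, **kwargs)
        t2 = time.time()
        print('Time elapsed was', t2 - t1)
        return ret_val
    return wrapper

@make_timer
def count_nums(n):
    '''Count to n and return the count.'''
    total = 0
    for i in range(n):
        total += 1
    return total

print(count_nums(1000))        # Timing line, then 1000
print(count_nums.__name__)     # count_nums
```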
4.10 Generators
There’s no subject in Python about which more confusion abounds than gen-
erators. It’s not a difficult feature once you understand it. Explaining it’s the
hard part.
But first, what does a generator do? The answer: It enables you to deal with
a sequence one element at a time.
Suppose you need to deal with a sequence of elements that would take a
long time to produce if you had to store it all in memory at the same time. For
example, you want to examine all the Fibonacci numbers up to 10 to the 50th
power. It would take a lot of time and space to calculate the entire sequence.
Or you may want to deal with an infinite sequence, such as all even numbers.
The advantage of a generator is that it enables you to deal with one member
of a sequence at a time. This creates a kind of “virtual sequence.”
>>> iter1 = reversed([1, 2, 3, 4])
>>> for i in iter1:
        print(i, end=' ')

4 3 2 1
Iterators have state information; after reaching the end of its series, an iter-
ator is exhausted. If we used iter1 again without resetting it, it would produce
no more values.
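A short sketch of exhaustion, using the same reversed iterator: the first pass consumes every value, so a second pass over the same object finds nothing.

```python
iter1 = reversed([1, 2, 3, 4])
first_pass = list(iter1)     # Consumes every value.
second_pass = list(iter1)    # Nothing left to produce.
print(first_pass)            # [4, 3, 2, 1]
print(second_pass)           # []
```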
Here’s what almost everybody gets wrong when trying to explain this pro-
cess: It looks as if the yield statement, placed in the generator function (the
thing on the left in Figure 4.4), is doing the yielding. That’s “sort of” true, but
it’s not really what’s going on.
The generator function defines the behavior of the iterator. But the iterator
object, the thing to its right in Figure 4.4, is what actually executes this behavior.
When you include one or more yield statements in a function, the func-
tion is no longer an ordinary Python function; yield describes a behavior in
which the function does not return a value but sends a value back to the caller
of next. State information is saved, so when next is called again, the iterator
advances to the next value in the series without starting over. This part, every-
one seems to understand.
But—and this is where people get confused—it isn’t the generator function
that performs these actions, even though that’s where the behavior is defined.
Fortunately, you don’t need to understand it; you just need to use it. Let’s start
with a function that prints even numbers from 2 to 10:
def print_evens():
    for n in range(2, 11, 2):
        print(n)
Now replace print(n) with the statement yield n. Doing so changes the
nature of what the function does. While we’re at it, let’s change the name to
make_evens_gen to have a more accurate description.
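Applying the two changes just described (yield n in place of print(n), plus the new name) gives a function along these lines. The call to list is added here only to confirm the values produced.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n

print(list(make_evens_gen()))    # [2, 4, 6, 8, 10]
```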
iterator object, and that’s the object that yields a value. We can save the itera-
tor object (or generator object) and then pass it to next.
>>> my_gen = make_evens_gen()
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
Eventually, calling next exhausts the series, and a StopIteration excep-
tion is raised. But what if you want to reset the sequence of values to the begin-
ning? Easy. You can do that by calling make_evens_gen again, producing a
new instance of the iterator. This has the effect of starting over.
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
>>> my_gen = make_evens_gen() # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
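The StopIteration behavior mentioned above can be observed with a sketch like the following. It also shows the optional second argument to next, a built-in feature that returns a default value instead of raising the exception.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n

my_gen = make_evens_gen()
values = [next(my_gen) for _ in range(5)]   # 2, 4, 6, 8, 10

try:
    next(my_gen)                 # The series is exhausted.
except StopIteration:
    print('No more values.')

print(next(my_gen, None))        # None; the default suppresses the exception.
```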
What happens if you call make_evens_gen every time? In that case, you
keep starting over, because each time you’re creating a new generator object.
This is most certainly not what you want.
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
Generators can be used in for statements, and that’s one of the most fre-
quent uses. For example, we can call make_evens_gen as follows:
for i in make_evens_gen():
    print(i, end=' ')
This block of code produces the result you’d expect:
2 4 6 8 10
But let’s take a look at what’s really happening. The for block calls make_
evens_gen one time. The result of the call is to get a generator object. That
object then provides the values in the for loop. The same effect is achieved by
the following code, which breaks the function call onto an earlier line.
>>> my_gen = make_evens_gen()
>>> for i in my_gen:
        print(i, end=' ')
Remember that my_gen is an iterator object. If you instead referred to
make_evens_gen directly, Python would raise an exception.
for i in make_evens_gen:    # ERROR! Not an iterable!
    print(i, end=' ')
Once you understand that the object returned by the generator function
is the generator object, also called the iterator, you can use it anywhere an
iterable or iterator is accepted in the syntax. For example, you can
convert a generator object to a list, as follows.
>>> my_gen = make_evens_gen()
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]
>>> a_list = list(make_evens_gen())
>>> a_list
[2, 4, 6, 8, 10]
One of the most practical uses of an iterator is with the in and not in
keywords. We can, for example, generate an iterator that produces Fibonacci
numbers up to and including N, but not larger than N.
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a
The yield statement changes this function from an ordinary function to
a generator function, so it returns a generator object (iterator). We can now
determine whether a number is a Fibonacci by using the following test:
n = int(input('Enter number: '))
if n in make_fibo_gen(n):
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
This example works because the iterator produced does not yield an infinite
sequence, something that would cause a problem. Instead, the iterator
terminates once its values exceed n, so the test ends even when n is not a
Fibonacci number.
Remember—and we state this one last time—by putting yield into the
function make_fibo_gen, it becomes a generator function and it returns the
generator object we need. The previous example could have been written as
follows, so that the function call is made in a separate statement. The effect is
the same.
n = int(input('Enter number: '))
my_fibo_gen = make_fibo_gen(n)
if n in my_fibo_gen:
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
As always, remember that a generator function (which contains the yield
statement) is not a generator object at all, but rather a generator factory. This
is confusing, but you just have to get used to it. In any case, Figure 4.4 shows
what’s really going on, and you should refer to it often.
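To try the membership test without typing input, the same generator can be wrapped in a small function. The name is_fibonacci is a hypothetical helper introduced for this sketch.

```python
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a

def is_fibonacci(n):
    # Hypothetical helper: a new generator object is created on each call.
    return n in make_fibo_gen(n)

print(list(make_fibo_gen(50)))   # [1, 2, 3, 5, 8, 13, 21, 34]
print(is_fibonacci(21))          # True
print(is_fibonacci(22))          # False
```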
Figure 4.5. Command-line arguments and argv (the first element is the program name; here, len(argv) = 4)
In most cases, you’ll probably ignore the program name and focus on the
other arguments. For example, here is a program named silly.py that does
nothing but print all the arguments given to it, including the program name.
import sys

for thing in sys.argv:
    print(thing, end=' ')
Now suppose we enter this command line:
python silly.py arg1 arg2 arg3
The Terminal program (in Mac) or the DOS Box prints the following:
silly.py arg1 arg2 arg3
The following example gives a more sophisticated way to use these strings,
by converting them to floating-point format and passing the numbers to the
quad function.
import sys

def quad(a, b, c):
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

def main():
    '''Get argument values, convert, call quad.'''

main()
The interesting line here is this one:
s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
Again, the sys.argv list is zero-based, like any other Python list, but the
program name, referred to as sys.argv[0], typically isn’t used in the program
code. Presumably you already know what the name of your program is, so you
don’t need to look it up.
Of course, from within the program you can’t always be sure that argument
values were specified on the command line. If they were not specified, you
may want to provide an alternative, such as prompting the user for these same
values.
Remember that the length of the argument list is always N+1, where N
is the number of command-line arguments—beyond the program name, of
course.
Therefore, we could revise the previous example as follows:
import sys

def quad(a, b, c):
    determin = (b * b - 4 * a * c) ** .5
    x1 = (-b + determin) / (2 * a)
    x2 = (-b - determin) / (2 * a)
    return x1, x2

def main():
    '''Get argument values, convert, call quad.'''

main()
The key lines in this version are in the following if statement:
if len(sys.argv) > 3:
    s1, s2, s3 = sys.argv[1], sys.argv[2], sys.argv[3]
else:
    s1 = input('Enter a: ')
    s2 = input('Enter b: ')
    s3 = input('Enter c: ')
a, b, c = float(s1), float(s2), float(s3)
If there are at least four elements in sys.argv (and therefore three
command-line arguments beyond the program name itself), the program uses
those strings. Otherwise, the program prompts for the values.
So, from the command line, you’ll be able to run the following:
python quad.py 1 -9 20
The program then prints these results:
x values: 4.0 5.0
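The length test can be exercised without a real command line by passing a list in place of sys.argv. In this sketch, get_coefficients is a hypothetical helper, and returning None when arguments are missing is an assumption; a program might prompt the user at that point instead, as described above.

```python
def get_coefficients(argv):
    '''Return a, b, c as floats from argv, or None if too few arguments.'''
    if len(argv) > 3:
        return float(argv[1]), float(argv[2]), float(argv[3])
    return None

# Simulated command lines; a real program would pass sys.argv.
print(get_coefficients(['quad.py', '1', '-9', '20']))   # (1.0, -9.0, 20.0)
print(get_coefficients(['quad.py']))                    # None
```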
Chapter 4 Summary
A large part of this chapter presented ways to write more compact and
more efficient Python code. Beyond that, you can make your
Python programs run faster if you call the print function as rarely as possible
from within IDLE, or else run programs from the command line only.
A technique helpful in making your code more efficient is to profile it by
using the time and datetime packages to compute the relative speed of the
code, given different algorithms. Writing decorators is helpful in this respect,
because you can use them to profile function performance.
A better approach is to use the str class formatting operator (%) to format
the output, using format specifiers like those used by the C-language “printf”
function. Here’s how you’d revise the example:
print('%d plus %d equals %d.' % (a, b, c))
Isn’t that better?
The expression (a, b, c) is actually a tuple containing three arguments,
each corresponding to a separate occurrence of %d within the format string.
The parentheses in (a, b, c) are strictly required—although they are not
required if there is only one argument.
>>> 'Here is a number: %d.' % 100
'Here is a number: 100.'
These elements can be broken up programmatically, of course. Here’s an
example:
n = 25 + 75
fmt_str = 'The sum is %d.'
print(fmt_str % n)
This example prints the following:
The sum is 100.
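Because a tuple on the right side of % is always treated as the argument list, formatting a tuple as a single item takes a little care. The following sketch, in which the variable point is illustrative, shows the usual workaround of wrapping the value in a one-element tuple.

```python
point = (3, 4)

# Two specifiers consume the two tuple elements.
print('x = %d, y = %d' % point)          # x = 3, y = 4

# To format the tuple itself with one specifier, wrap it
# in a one-element tuple (note the trailing comma).
print('The point is %s.' % (point,))     # The point is (3, 4).
```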
The string formatting operator, %, can appear in either of these two
versions.
%x    Hexadecimal integer.                                          ff09a
%X    Same as %x, but letter digits A–F are uppercase.              FF09A
%o    Octal integer.                                                177
%u    Unsigned integer. (But note that this doesn’t reliably        257
      change signed integers into their unsigned equivalent,
      as you’d expect.)
%f    Floating-point number to be printed in fixed-point format.    3.1400
%F    Same as %f.                                                   33.1400
%e    Floating-point number, printing exponent sign (e).            3.140000e+00
%E    Same as %e but uses uppercase E.                              3.140000E+00
%g    Floating point, using shortest canonical representation.      7e-06
%G    Same as %g but uses uppercase E if printing an exponent.      7E-06
%%    A literal percent sign (%).                                   %
Here’s an example that uses the int conversion, along with hexadecimal
output, to add two hexadecimal numbers: e9 and 10.
h1 = int('e9', 16)
h2 = int('10', 16)
print('The result is %x.' % (h1 + h2))
The example prints
The result is f9.
%[-][width][.precision]c
In this syntax, the square brackets indicate optional items and are not
intended literally. The minus sign (-) specifies left justification within the print
field. With this approach, the default is right justification for all data types.
But the following example uses left justification, which is not the default,
by including the minus sign (-) as part of the specifier.
>>> 'This is a number: %-6d.' % 255
'This is a number: 255   .'
As for the rest of the syntax, a format specifier can take any of the follow-
ing formats.
%c
%widthc
%width.precisionc
%.precisionc
These statements print
Amount is 25.
Amount is 00025.
Amount is 00025.
Finally, the width and precision fields control print-field width and pre-
cision in a floating-point number. The precision is the number of digits to the
right of the decimal point; this number contains trailing zeros if necessary.
Here’s an example:
print('result:%12.5f' % 3.14)
print('result:%12.5f' % 333.14)
These statements print the following:
result:     3.14000
result:   333.14000
In this case, the number 3.14 is padded with trailing zeros, because a pre-
cision of 5 digits was specified. When the precision field is smaller than the
precision of the value to be printed, the number is rounded up or down as
appropriate.
print('%.4f' % 3.141592)
This function call prints the following—in this case with 4 digits of preci-
sion, produced through rounding:
3.1416
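The width, precision, and justification fields can be combined. In this sketch the square brackets are literal characters, included only to make the extent of each print field visible.

```python
print('[%8.2f]' % 3.14159)     # [    3.14]
print('[%-8.2f]' % 3.14159)    # [3.14    ]
print('[%08.2f]' % 3.14159)    # [00003.14]
```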
Use of the %s and %r format characters enables you to work with any classes
of data. These specifiers result in the calling of one of the internal methods
from those classes supporting string representation of the class, as explained
in Chapter 9, “Classes and Magic Methods.”
In many cases, there’s no difference in effect between the %s and %r speci-
fiers. For example, either one, used with an int or float object, will result in
that number being translated into the string representation you’d expect.
You can see those results in the following IDLE session, in which user input
is in bold.
>>> 'The number is %s.' % 10
'The number is 10.'
>>> 'The number is %r.' % 10
'The number is 10.'
From these examples, you can see that both the %s and the %r just print the
standard string representation of an integer.
In some cases, there is a difference between the string representation indi-
cated by %s and by %r. The latter is intended to get the canonical representa-
tion of the object as it appears in Python code.
One of the principal differences between the two forms of representation is
that the %r representation includes quotation marks around strings, whereas
%s does not.
>>> print('My name is %r.' % 'Sam')
My name is 'Sam'.
>>> print('My name is %s.' % 'Sam')
My name is Sam.
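The same difference applies to classes you write yourself. As a sketch, the hypothetical Point class below defines both of the methods involved: __str__, which %s uses, and __repr__, which %r uses.

```python
class Point:
    '''Hypothetical class used only for illustration.'''
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):                  # Used by %s
        return '(%d, %d)' % (self.x, self.y)

    def __repr__(self):                 # Used by %r
        return 'Point(%d, %d)' % (self.x, self.y)

pt = Point(3, 4)
print('%s' % pt)    # (3, 4)
print('%r' % pt)    # Point(3, 4)
```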
there must be an additional argument. So if you want to format two such
data objects at once, you’d need to have four arguments altogether. Here’s an
example:
>>> 'Item 1: %*s, Item 2: %*s' % (8, 'Bob', 8, 'Suzanne')
'Item 1:      Bob, Item 2:  Suzanne'
The arguments, all placed in the tuple following the % operator (with
parentheses required, by the way), are 8, 'Bob', 8, and 'Suzanne'.
The meaning of these four arguments is as follows:
✱ Where you’d normally put an integer as a formatting code, you can instead
place an asterisk (*); and for each such asterisk, you must place a correspond-
ing integer expression in the argument list.
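A typical reason to use the asterisk is that the width is not known until run time. This sketch, with an illustrative list of names, computes the width from the longest item and then right justifies every item in a field of that size.

```python
names = ['Bob', 'Suzanne', 'Al']
width = max(len(name) for name in names)    # 7 in this case

for name in names:
    print('[%*s]' % (width, name))
```

The loop prints each name inside square brackets, padded on the left so that all three lines are the same length.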
Figure 5.1. Flow of control between formatting routines
format(data, spec)
This function returns a string after evaluating the data and then formatting
according to the specification string, spec. The latter argument is a string
containing the specification for printing one item.
The syntax shown next provides a simplified view of spec grammar. It
omits some features such as the fill and align characters, as well as the use of 0
in right justifying and padding a number. To see the complete syntax of spec,
see Section 5.8, “The ‘spec’ Field of the ‘format’ Method.”
[width][,][.precision][type_char]
In this syntax, the brackets are not intended literally but signify optional
items. Here is a summary of the meaning.
The function attempts to place the string representation of the data into a
print field of width size, justifying text if necessary by padding with spaces.
Numeric data is right justified by default; string data is left justified by default.
The comma (,) indicates insertion of commas as thousands place separa-
tors. This is legal only with numeric data; otherwise, an exception is raised.
The precision indicates the total number of digits to print with a float-
ing-point number, or, if the data is not numeric, a maximum length for string
data. It is not supported for use with integers. If the type_char is f, then the
precision indicates a fixed number of digits to print to the right of the decimal
point.
The type_char is sometimes a radix indicator, such as b or x (binary or
hexadecimal), but more often it is a floating-point specifier such as f, which
indicates fixed-point format, or e and g, as described later in Table 5.5.
Table 5.2 gives some examples of using this specification. You can figure
out most of the syntax by studying these examples.
The remainder of this section discusses the features in more detail, particu-
larly width and precision fields.
The thousands place separator is fairly self-explanatory but works only
with numbers. Python raises an exception if this specifier is used with data
that isn’t numeric.
You might use it to format a large number such as 150 million.
>>> n = 150000000
>>> print(format(n, ','))
150,000,000
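The comma can be combined with the other fields of the specification. The following sketch uses repr in one line only so that the quotation marks show the extent of the print field.

```python
n = 150000000
print(format(n, ','))                 # 150,000,000
print(repr(format(n, '15,')))         # '    150,000,000'
print(format(1234567.891, ',.2f'))    # 1,234,567.89
```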
The width character is used consistently, always specifying a minimum
print-field width. The string representation is padded—with spaces by
default—and uses a default of left justification for strings and right justifica-
tion for numbers. Both the padding character and justification can be altered,
however, as explained later in this chapter, in Section 5.8.2, “Text Justifica-
tion: ‘fill’ and ‘align’ Characters.”
Here are examples of justification, padding, and print fields. The single
quotation marks implicitly show the extent of the print fields. Remember that
numeric data (150 and 99, in this case) are right justified by default, but other
data is not.
>>> format('Bob', '10')
'Bob       '
>>> format('Suzie', '7')
'Suzie  '
format_specifying_str.format(args)
Let’s break down the syntax a little. This expression passes through all
the text in format_specifying_str (or just “format string”), except where
there’s a print field. Print fields are denoted as “{}.” Within each print field,
the value of one of the args is printed.
If you want to print data objects and are not worried about the finer issues
of formatting, just use a pair of curly braces, {}, for each argument. Strings are
printed as strings, integers are printed as integers, and so on, for any type of
data. Here’s an example:
fss = '{} said, I want {} slices of {}.'
name = 'Pythagoras'
pi = 3.141592
print(fss.format(name, 2, pi))
This prints
Pythagoras said, I want 2 slices of 3.141592.
The arg values, of course, either can be constants or can be supplied by
variables (such as name and pi in this case).
Curly braces are special characters in this context. To print literal curly
braces, not interpreted as field delimiters, use {{ and }}. Here’s an example:
print('Set = {{{}, {}}}'.format(1, 2))
This prints
Set = {1, 2}
This example is a little hard to read, but the following may be clearer.
Remember that double open curly braces, {{, and double closed curly braces,
}}, cause a literal curly brace to be printed.
fss = 'Set = {{ {}, {}, {} }}'
print(fss.format(15, 35, 25))
This prints
Set = { 15, 35, 25 }
Of course, as long as you have room on a line, you can put everything
together:
print('Set = {{ {}, {}, {} }}'.format(15, 35, 25))
This prints the same output. Remember that each pair of braces defines
a print field and therefore causes an argument to be printed, but {{ and }}
cause printing of literal braces.
✱ A call to the format method must have at least as many arguments as the
format-specification string has print fields, unless fields are repeated as shown
at the end of this section. But if more arguments than print fields appear, the
excess arguments (the last ones given) are ignored.
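The rule above can be checked with a short sketch. The format string and variable names here are illustrative assumptions, not part of the original listing.

```python
# Extra arguments beyond the number of print fields are ignored.
extra = '{} and {}'.format(1, 2, 3)

# Too few arguments for the print fields raises IndexError.
try:
    '{} and {}'.format(1)
    too_few_ok = True
except IndexError:
    too_few_ok = False

print(extra)        # 1 and 2
print(too_few_ok)   # False
```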
These are zero-based indexes, so they are numbered 0, 1, and 2.
print('The items are {2}, {1}, {0}.'.format(10, 20, 30))
This statement prints
The items are 30, 20, 10.
You can also use zero-based index numbers when there are more arguments
than print fields, referring only to the arguments you need. Here’s an example:
fss = 'The items are {3}, {1}, {0}.'
print(fss.format(10, 20, 30, 40))
These statements print
The items are 40, 20, 10.
Note that referring to an out-of-range argument raises an IndexError
exception. In this example there are four arguments, so they are indexed as
0, 1, 2, and 3. No index number was an out-of-range reference in this case.
Print fields can also be matched to arguments according to argument
names. Here’s an example:
fss = 'a equals {a}, b equals {b}, c equals {c}.'
print(fss.format(a=10, c=100, b=50))
print(str(10)) # So does this!
But for some types of data, there is a separate repr conversion that is not
the same as str. The repr conversion translates a data object into its canon-
ical representation in source code—that is, how it would look inside a Python
program.
Here’s an example:
print(repr(10)) # This ALSO prints 10.
In this case, there’s no difference in what gets printed. But there is a dif-
ference with strings. Strings are stored in memory without quotation marks;
such marks are delimiters that usually appear only in source code. Furthermore,
escape sequences such as \n (a newline) are translated into special characters
when they are stored; again \n is a source-code representation, not the actual
storage.
Take the following string, test_str:
test_str = 'Here is a \n newline! '
Printing this string directly causes the following to be displayed:
Here is a
newline!
But applying repr to the string and then printing it produces a different
result, essentially saying, “Show the canonical source-code representation.”
This includes quotation marks, even though they are not part of the string
itself unless they’re embedded. But the repr function includes quotation
marks because they are part of what would appear in Python source code to
represent the string.
print(repr(test_str))
This statement prints
'Here is a \n newline! '
The %s and %r formatting specifiers, as well as the format method, enable
you to control which style of representation to use. Printing a string argument
without repr has the same effect as printing it directly. Here’s an example:
>>> print('{}'.format(test_str))
Here is a
newline!
Using the !r modifier causes a repr version of the argument to be used—
that is, the repr conversion is applied to the data.
>>> print('{!r}'.format(test_str))
'Here is a \n newline! '
The use of !r is orthogonal with regard to position ordering. Either may
be used without interfering with the other. So can you see what the following
example does?
>>> print('{1!r} loves {0!r}'.format('Joanie', 'ChaCha'))
'ChaCha' loves 'Joanie'
The formatting characters inside the curly braces do two things in this case.
First, they use position indexes to reverse “Joanie loves ChaCha”; then the !r
format causes the two names to be printed with quotation marks, part of the
canonical representation within Python code.
Note Ë Where !s or !r would normally appear, you can also use !a, which is
similar to !r but escapes non-ASCII characters so that the result is an
ASCII-only string.
Ç Note
[[fill]align][sign][#][0][width][,][.prec][type]
The items here are mostly independent of each other. Python interprets
each item according to placement and context. For example, prec (precision)
appears right after a decimal point (.) if it appears at all.
When looking at the examples, remember that curly braces and colons are
used only when you use spec with the format method and not the global
format function. With the format function, you might include align, sign,
0, width, precision, and type specifiers, but no curly braces or colon.
Here’s an example:
s = format(32.3, '<+08.3f')
5.8.1 Print-Field Width
One of the commonly used items is print-field width, specified as an integer.
The text to be printed is displayed in a field of this size. If the text is shorter
than this width, it’s justified and the remaining positions are padded with
spaces by default.
Placement: As you can see from the syntax display, the width item is in
the middle of the spec syntax. When used with the format method, width
always follows a colon (:), as does the rest of the spec syntax.
The following example shows how width specification works on two num-
bers: 777 and 999. The example uses asterisks (*) to help illustrate where the
print fields begin and end, but otherwise these asterisks are just literal charac-
ters thrown in for the sake of illustration.
n1, n2 = 777, 999
print('**{:10}**{:2}**'.format(n1, n2))
This prints
**       777**999**
The numeral 777 is right justified within a large print field (10). This
is because, by default, numeric data is right justified and string data is left
justified.
The numeral 999 exceeds its print-field size (2) in length, so it is simply
printed as is. No truncation is performed.
Width specification is frequently useful with tables. For example, suppose
you want to print a table of integers, but you want them to line up.
   10
 2001
    2
   55
  144
 2525
 1984
It’s easy to print a table like this. Just use the format method with a print-
field width that’s wider than the longest number you expect. Because the data
is numeric, it’s right justified by default.
'{:5}'.format(n)
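As a minimal sketch of how that expression might be used in a loop, the following code formats the numbers from the table above. The list name and the choice of a width of 5 are assumptions for illustration.

```python
# Right-justify each integer in a print field five characters wide.
numbers = [10, 2001, 2, 55, 144, 2525, 1984]
rows = ['{:5}'.format(n) for n in numbers]

for row in rows:
    print(row)
```

Every string in rows is exactly five characters long, so the right-hand digits line up when the rows are printed one per line.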
Print-field width is orthogonal with most of the other capabilities. The
“ChaCha loves Joanie” example from the previous section could be revised:
fss = '{1!r:10} loves {0!r:10}!!'
print(fss.format('Joanie', 'ChaCha'))
This prints
'ChaCha'   loves 'Joanie'  !!
The output here is similar to that of the earlier “ChaCha and Joanie”
example but adds a print-field width of 10 for both arguments. Remember
that a width specification must appear to the right of the colon; otherwise it
would function as a position number.
[[fill]align]
Placement: these items, if they appear within a print-field specification,
precede all other parts of the syntax, including width. Here’s an example con-
taining fill, align, and width:
{:->24}
◗ The colon (:) is the first item to appear inside the print-field spec when you’re
working with the format method (but not the global format function).
◗ After the colon, a fill and an align character appear. The minus sign (-) is
the fill character here, and the alignment is right justification (>).
◗ After fill and align are specified, the print-field width of 24 is given.
Because the argument to be printed ('Hey Bill G, pick me!') is 20 characters
in length but the print-field width is 24 characters, four copies of the fill
character, a minus sign in this case, are used for padding.
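The arithmetic can be confirmed with a short sketch, which also shows the same spec passed to the global format function without braces or a colon. The variable names are ours.

```python
s = 'Hey Bill G, pick me!'

# Fill character '-', right justification '>', width 24.
padded = '{:->24}'.format(s)

# The same spec works with the global format function.
same = format(s, '->24')

print(len(s))    # 20
print(padded)    # ----Hey Bill G, pick me!
```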
The fill character can be any character other than a curly brace. Note that
if you want to pad a number with zeros, you can alternatively use the '0' speci-
fier described in Section 5.8.4, “The Leading Zero Character (0).”
The align character must be one of the four values listed in Table 5.3.
Note Ë Remember (and sorry if we’re getting a little redundant about this), all the
examples for the spec grammar apply to the global format function as well.
But the format function, as opposed to the format method, does not use curly
braces to create multiple print fields. It works on only one print field at a time.
Here’s an example:
print(format('Lady', '@<7')) # Print 'Lady@@@'
Ç Note
Notice how there’s an extra space in front of the first occurrence of 25,
even though it’s nonnegative; however, if the print fields had definite widths
assigned—which they do not in this case—that character would produce no
difference.
This next example applies the same formatting to three negative values (–25).
print('results>{: },{:+},{:-}'.format(-25, -25, -25))
This example prints the following output, illustrating that negative num-
bers are always printed with a minus sign.
results>-25,-25,-25
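For a side-by-side comparison, this sketch applies the same three sign characters (space, plus, and minus) to a positive and a negative value. It reuses the format string shown above; the variable names are illustrative.

```python
fss = 'results>{: },{:+},{:-}'

positive = fss.format(25, 25, 25)
negative = fss.format(-25, -25, -25)

print(positive)   # results> 25,+25,25
print(negative)   # results>-25,-25,-25
```

Only the nonnegative case differs from field to field: a space reserves one position for the sign, a plus prints it explicitly, and a minus leaves it out.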
This prints
0000125 0000025156.
Here’s another example:
print('{:08}'.format(375)) # This prints 00000375
The same results could have been achieved by using fill and align char-
acters, but because you can’t specify fill without also explicitly specifying
align, that approach is slightly more verbose.
fss = '{:0>7} {:0>10}'
Although these two approaches—specifying 0 as fill character and specify-
ing a leading zero—are often identical in effect, there are situations in which
the two cause different results. A fill character is not part of the number itself
and is therefore not affected by the comma, described in the next section.
There’s also interaction with the plus/minus sign. If you try the following,
you’ll see a difference in the location where the plus sign (+) gets printed.
print('{:0>+10} {:+010}'.format(25, 25))
This example prints
0000000+25 +000000025
This example prints
The amount on the check was $***4,500,000
The print width of 12 includes room for the number that was printed,
including the commas (a total of nine characters); therefore, this example uses
three fill characters. The fill character in this case is an asterisk (*). The dollar
sign ($) is not part of this calculation because it is a literal character and is
printed as is.
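One format string that produces this output is sketched below. The variable names and the exact wording of the string are assumptions chosen to match the printed result: an asterisk fill character, right justification, a width of 12, and the thousands separator.

```python
amount = 4500000

# '$' is a literal; the print field uses fill '*', align '>', width 12, and ','.
line = 'The amount on the check was ${:*>12,}'.format(amount)

print(line)   # The amount on the check was $***4,500,000
```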
If there is a leading-zero character as described in Section 5.8.4 (as opposed to
a 0 fill character), the zeros are also grouped with commas. Here’s an example:
print('The amount is {:011,}'.format(13000))
This example prints
The amount is 000,013,000
In this case, the leading zeros are grouped with commas, because all the
zeros are considered part of the number itself.
A print-field size of 12 (or any other multiple of 4) creates a conflict with
the comma, because an initial comma cannot be part of a valid number.
Therefore, Python adds an additional leading zero in that special case.
n = 13000
print('The amount is {:012,}'.format(n))
This prints
The amount is 0,000,013,000
But if 0 is specified as a fill character instead of as a leading zero, the zeros
are not considered part of the number and are not grouped with commas.
Note the placement of the 0 here relative to the right justify (>) sign. This time
it’s just to the left of this sign.
print('The amount is {:0>11,}'.format(n))
This prints
The amount is 0000013,000
.precision
Here are some simple examples in which precision is used to limit the total
number of digits printed.
pi = 3.14159265
phi = 1.618
     22.100
   1000.007
Notice how well things line up in this case. In this context (with the f type
specifier) the precision specifies not the total number of digits but the number
of digits just to the right of the decimal point—which are padded with trailing
zeros if needed.
The example can be combined with other features, such as the thousands
separator, which comes after the width but before precision. Therefore, in this
example, each comma comes right after 10, the width specifier.
fss = ' {:10,.3f}\n {:10,.3f}'
print(fss.format(22333.1, 1000.007))
This example prints
 22,333.100
  1,000.007
The fixed-point format f, in combination with width and precision, is
useful for creating tables in which the numbers line up. Here’s an example:
fss = ' {:10.2f}'
for x in [22.7, 3.1415, 555.5, 29, 1010.013]:
    print(fss.format(x))
◗ The fill and align characters are * and <, respectively. The < symbol spec-
ifies left justification, so asterisks are used for padding on the right, if needed.
◗ The width character is 6, so any string shorter than 6 characters in length is
padded after being left justified.
◗ The precision (the character after the dot) is also 6, so any string longer
than 6 characters is truncated.
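The three points above describe a spec with fill *, alignment <, width 6, and precision 6. A minimal sketch, assuming the print field is written as '{:*<6.6}' and using hypothetical names as data, shows padding and truncation together:

```python
fss = '{:*<6.6}'

short = fss.format('Tom')           # shorter than 6: padded on the right
exact = fss.format('Brenda')        # exactly 6: printed as is
long_ = fss.format('Bernadette')    # longer than 6: truncated

print(short)   # Tom***
print(exact)   # Brenda
print(long_)   # Bernad
```

Because width and precision are the same number, every result is exactly six characters long.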
5.8.8 “Type” Specifiers
The last item in the spec syntax is the type specifier, which influences how
the data to be printed is interpreted. It’s limited to one character and has one
of the values listed in Table 5.5.
Placement: When the type specifier is used, it’s the very last item in the
spec syntax.
The next five sections illustrate specific uses of the type specifier.
5.8.11 Displaying Percentages
A common use of formatting is to turn a number into a percentage—for exam-
ple, displaying 0.5 as 50% and displaying 1.25 as 125%. You can perform that
task yourself, but the % type specifier automates the process.
The percent format character (%) multiplies the value by 100 and then
appends a percent sign. Here’s an example:
print('You own {:%} of the shares.'.format(.517))
This example prints
You own 51.700000% of the shares.
If a precision is used in combination with the % type specifier, the preci-
sion controls the number of digits to the right of the decimal point as usual—
but after first multiplying by 100. Here’s an example:
print('{:.2%} of {:.2%} of 40...'.format(0.231, 0.5))
This prints
23.10% of 50.00% of 40...
As with fixed-point format, if you want to print percentages so that they
line up nicely in a table, then specify both width and precision specifiers.
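Here is a minimal sketch of such a table. The width of 8 and the sample values are assumptions for illustration; the percent sign counts toward the width.

```python
fractions = [0.5, 1.25, 0.0375]

# Width 8 and precision 2: two digits after the decimal point, right justified.
rows = ['{:8.2%}'.format(x) for x in fractions]

for row in rows:
    print(row)
```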
The way in which arguments are applied with this method is slightly differ-
ent from the way they work with the formatting operator (Section 5.3).
The difference is this: When you use the format method this way, the data
object comes first in the list of arguments; the expressions that alter format-
ting come immediately after. This is true even with multiple print fields. For
example:
>>> '{:{}} {:{}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
Note that with this technique, strings are left justified by default.
The use of position numbers to clarify order is recommended. Use of these
numbers helps keep the meaning of the expressions clearer and more predictable.
The example just shown could well be revised so that it uses the following
expression:
>>> '{0:{1}} {2:{3}}!'.format('Hi', 3, 'there', 7)
'Hi  there  !'
The meaning of the format is easier to interpret with the position numbers.
By looking at the placement of the numbers in this example, you should be
able to see that position indexes 0 and 2 refer to the data objects (the first and
third arguments to format), while indexes 1 and 3 refer to the print-field
widths (the second and fourth arguments).
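A common reason to use a variable-length print field is that the width is not known until run time. The following sketch computes the width from the data itself; the function name format_column is hypothetical, not something defined by Python or by this chapter.

```python
def format_column(values):
    """Return each value formatted in a field as wide as the widest entry."""
    width = max(len(str(v)) for v in values)
    # Field 0 is the data object; field 1 supplies its print-field width.
    return ['{0:{1}}'.format(v, width) for v in values]

rows = format_column([7, 12345, 88])
for row in rows:
    print('|' + row + '|')
```

The vertical bars are literal characters added only to show where each print field begins and ends.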
Chapter 5 Summary
The Python core language provides three techniques for formatting out-
put strings. One is to use the string-class formatting operator (%) on display
strings; these strings contain print-field specifiers similar to those used in the
C language, with “printf” functions.
The second technique involves the format function. This approach allows
you to specify not only things such as width and precision, but also thousands
place grouping and handling of percentages.
The third technique, the format method of the string class, builds on the
global format function but provides the most flexibility of all with multiple
print fields.
7 What features of the format method do you need, at minimum, to print a
table that lines up floating-point numbers in a nice column?
8 Cite at least one example in which repr and str provide a different represen-
tation of a piece of data. Why does the repr version print more characters?
9 The format method enables you to specify a zero (0) as a fill character or as a
leading zero to numeric expressions. Is this entirely redundant syntax? Or can
you give at least one example in which the result might be different?
10 Of the three techniques—format operator (%), global format function, and
format method of the string class—which support the specification of
variable-length print fields?
2 Write a two-dimensional array program that does the following: Take integer
input in the form of five rows of five columns each. Then, by looking at the
maximum print width needed by the entire set (that is, the number of digits in
the biggest number), determine the ideal print width for every cell in the table.
This should be a uniform width, but one that contains the largest entry in the
table. Use variable-length print fields to print this table.
3 Do the same application just described but for floating-point numbers. The
printing of the table should output all the numbers in nice-looking columns.
Note Ë Regular expression syntax has a variety of flavors. The Python regular-
expression package conforms to the Perl standard, which is an advanced and
flexible version.
Ç Note
But what if you wanted to match a larger set of words? For example, let’s
say you wanted to match the following combination of letters:
✱ The asterisk (*) modifies the meaning of the expression immediately preced-
ing it, so the a, together with the *, matches zero or more “a” characters.
You can break this down syntactically, as shown in Figure 6.1. The literal
characters “c” and “t” each match a single character, but a* forms a unit that
says, “Match zero or more occurrences of ‘a’.”
ca*t
[Figure 6.1. Parsing the expression ca*t: “c” matches “c” exactly, a* matches zero or more “a” characters, and “t” matches “t” exactly.]
The plus sign (+), introduced earlier, works in a similar way. The plus sign,
together with the character or group that precedes it, means “Match one or
more instances of this expression.”
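The difference between the two quantifiers is easiest to see by running both against the same words. This is a minimal sketch; the word list is ours, and a trailing $ is added to each pattern (an assumption) so that the whole word has to match.

```python
import re

words = ['ct', 'cat', 'caaat', 'cbt']

# a* allows zero or more "a" characters; a+ requires at least one.
star = [w for w in words if re.match(r'ca*t$', w)]
plus = [w for w in words if re.match(r'ca+t$', w)]

print(star)   # ['ct', 'cat', 'caaat']
print(plus)   # ['cat', 'caaat']
```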
r'string' or
r"string"
After prompting the user for input, the program then calls the match
function, which is qualified as re.match because it is imported from the re
package.
re.match(pattern, s)
If the pattern argument matches the target string (s in this case), the func-
tion returns a match object; otherwise it returns the value None, which con-
verts to the Boolean value False.
You can therefore use the value returned as if it were a Boolean value. A
match object is treated as true in a condition; None is treated as false.
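A short sketch makes the return values concrete. The pattern and target strings here are illustrative assumptions.

```python
import re

hit = re.match(r'\d\d\d', '555-1234')   # succeeds: returns a match object
miss = re.match(r'\d\d\d', 'abc')       # fails: returns None

# Either result can be tested directly in an if statement.
if hit:
    print('Matched:', hit.group())      # Matched: 555
if not miss:
    print('No match.')
```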
Note Ë If you forget to include r (the raw-string indicator), this particular exam-
ple still works, but your code will be more reliable if you always use the r when
specifying regular-expression patterns. Python string interpretation does not
work precisely the way C/C++ string interpretation does. In those languages,
If you want to restrict positive results to exact matches—so that the entire
string has to match the pattern with nothing left over—you can add the spe-
cial character $, which means “end of string.” This character causes the match
to fail if any additional text is detected beyond the specified pattern.
pattern = r'\d\d\d-\d\d\d-\d\d\d\d$'
There are other ways you might want to refine the regular-expression pat-
tern. For example, you might want to permit input matching either of the fol-
lowing formats:
555-123-5000
555 123 5000
To accommodate both these patterns, you need to create a character set,
which allows for more than one possible value in a particular position. For
example, the following expression says to match either an “a” or a “b”, but not
both:
[ab]
It’s possible to put many characters in a character set. But only one of
the characters will be matched at a time. For example, the following range
matches exactly one character: an “a”, “b”, “c”, or “d” in the next position.
[abcd]
Likewise, the following expression says that either a space or a minus sign
(–) can be matched—which is what we want in this case:
[ -]
In this context, the square brackets are the only special characters; the two
characters inside are literal, and exactly one of them will be matched. The
minus sign often has a special meaning within square brackets, but not when
it appears at the very beginning or end of the characters inside the brackets.
Here’s the full regular expression we need:
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'
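As a quick check of this pattern, the following sketch tries it against a few sample strings of our own choosing. Only the first two use an accepted separator and have nothing left over at the end.

```python
import re

pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'

samples = ['555-123-5000', '555 123 5000', '555.123.5000', '555-123-50001']
results = {s: bool(re.match(pattern, s)) for s in samples}

for s, ok in results.items():
    print(s, '->', ok)
```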
Now, putting everything together with the refined pattern we’ve come up
with in this section, here’s the complete example:
import re
pattern = r'\d\d\d[ -]\d\d\d[ -]\d\d\d\d$'
expression pattern. It’s a good idea to become familiar with all of them. These
include most punctuation characters, such as + and *.
◗ Any characters that do not have special meaning to the Python regular-
expression interpreter are considered literal characters. The regular-expression
interpreter attempts to match these exactly.
◗ The backslash can be used to “escape” special characters, making them into
literal characters. The backslash can also add special meaning to certain ordinary
characters—for example, causing \d to mean “any digit” rather than a “d”.
numbers. This pattern looks for three digits, a minus sign, two digits, another
minus sign, and then four digits.
import re
pattern = r'\d\d\d-\d\d-\d\d\d\d$'
[Figure 6.2. State machine for ca*b. States: 1 (Start), 2, and 3 (Done!). Transitions: “c” from 1 to 2, “a” looping on 2, and “b” from 2 to 3.]
The following list describes how the program traverses this state machine
to find a match at run time. State 1 is the starting point.
◗ A character is read. If it’s a “c”, the machine goes to state 2. Reading any other
character causes failure.
◗ From state 2, either an “a” or a “b” can be read. If an “a” is read, the machine
stays in state 2. It can do this any number of times. If a “b” is read, the machine
transitions to state 3. Reading any other character causes failure.
◗ If the machine reaches state 3, it is finished, and success is reported.
This state machine illustrates some basic principles, simple though it is. In
particular, a state machine has to be compiled and then later traversed at run
time.
Note Ë The state machine diagrams in this chapter assume DFAs (determinis-
tic finite automata), whereas Python actually uses NFAs (nondeterministic
finite automata). This makes no difference to you unless you’re implementing
a regular-expression evaluator, something you’ll likely never need to do.
So if that’s the case, you can ignore the difference between DFAs and NFAs!
You’re welcome.
Ç Note
Here’s what you need to know: If you’re going to use the same regular-
expression pattern multiple times, it’s a good idea to compile that pattern into
a regular-expression object and then use that object repeatedly. The re
package provides a function for this purpose called compile.
regex_object_name = re.compile(pattern)
Here’s a full example using the compile function to create a regular expres-
sion object called reg1.
import re
reg1 = re.compile(r'ca*b$')     # Compile the pattern once.

def test_item(s):
    if re.match(reg1, s):
        print(s, 'is a match.')
    else:
        print(s, 'is not a match!')

test_item('caab')
test_item('caaxxb')
[Figure 6.3. State machine for ca+b. States: 1 (Start), 2, 3, and 4 (Done!). Transitions: “c” from 1 to 2, “a” from 2 to 3, “a” looping on 3, and “b” from 3 to 4.]
Given this pattern, “cb” is not a successful match, but “cab”, “caab”, and
“caaab” are. This state machine requires the reading of at least one “a”. After
that, matching further “a” characters is optional, but it can match as many
instances of “a” in a row as it finds.
[Figure 6.4. State machine for (ax)|(yz). States: 1 (Start), 2, 3, and 4 (Done!). Transitions: “a” from 1 to 2 and “x” from 2 to 4; “y” from 1 to 3 and “z” from 3 to 4.]
order of evaluation. With these parentheses, the alternation operator is
interpreted to mean “either x or y but not both.”
a(x|y)z
The parentheses and the | symbol are all special characters. Figure 6.5
illustrates the state machine that is compiled from the expression a(x|y)z.
[Figure 6.5. State machine for a(x|y)z. States: 1 (Start), 2, 3, and 4 (Done!). Transitions: “a” from 1 to 2, “x” or “y” from 2 to 3, and “z” from 3 to 4.]
This behavior is the same as that for the following expression, which uses a
character set rather than alternation:
a[xy]z
Is there a difference between alternation and a character set? Yes: A character
set always matches one character of text (although it may be part of a more
complex pattern, of course). Alternation, in contrast, may involve groups
longer than a single character. For example, the following pattern matches either
“cat” or “dog” in its entirety, but not “catdog”:
cat|dog
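The comparison can be tried out directly. This sketch uses re.fullmatch, a standard-library function that succeeds only if the entire string matches; the helper name full and the test strings are our own.

```python
import re

def full(pattern, s):
    """Return True if the pattern matches the entire string s."""
    return bool(re.fullmatch(pattern, s))

tests = ['axz', 'ayz', 'axyz', 'az']
group_results = [full(r'a(x|y)z', s) for s in tests]
set_results = [full(r'a[xy]z', s) for s in tests]

# Alternation can choose between multi-character groups.
words = [full(r'cat|dog', s) for s in ['cat', 'dog', 'catdog']]

print(group_results)   # [True, True, False, False]
print(set_results)     # [True, True, False, False]
print(words)           # [True, True, False]
```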
powerful as this language is, it can be broken down into a few major elements.
◗ Meta characters: These are tools for specifying either a specific character or
one of a number of characters, such as “any digit” or “any alphanumeric char-
acter.” Each of these characters matches one character at a time.
◗ Character sets: This part of the syntax also matches one character at a time—
in this case, giving a set of values from which to match.
◗ Expression quantifiers: These are operators that enable you to combine indi-
vidual characters, including wildcards, into patterns of expressions that can
be repeated any number of times.
◗ Groups: You can use parentheses to combine smaller expressions into larger
ones.
r'c[aeiou]t'
This matches any of the following:
cat
cet
cit
cot
cut
We can combine ranges with other operators, such as +, which retains its
usual meaning outside the square brackets. So consider
c[aeiou]+t
This matches any of the following, as well as many other possible strings:
cat
ciot
ciiaaet
caaauuuut
ceeit
Within a range, the minus sign (-) enables you to specify ranges of charac-
ters when the minus sign appears between two other characters in a character
range. Otherwise, it is treated as a literal character.
For example, the following range matches any character from lowercase
“a” to lowercase “n”:
[a-n]
This range therefore matches an “a”, “b”, “c”, up to an “l”, “m”, or “n”. If
the IGNORECASE flag is enabled, it also matches uppercase versions of these
letters.
The following matches any uppercase or lowercase letter, or digit. Unlike
“\w,” however, this character set does not match an underscore (_).
[A-Za-z0-9]
The following matches any hexadecimal digit: a digit from 0 to 9 or an
uppercase or lowercase letter in the range “A”, “B”, “C”, “D”, “E”, and “F”.
[A-Fa-f0-9]
Character sets observe some special rules.
◗ Almost all characters within square brackets ([ ]) lose their special meaning,
except where specifically mentioned here. Therefore, almost everything is
interpreted literally.
◗ A closing square bracket has special meaning, terminating the character set;
therefore, a closing bracket must be escaped with a backslash to be interpreted
literally: “\]”
◗ The minus sign (-) has special meaning unless it occurs at the very beginning
or end of the character set, in which case it is interpreted as a literal minus
sign. Likewise, a caret (^) has special meaning at the beginning of a range but
not elsewhere.
◗ The backslash (\), even in this context, must be escaped to be represented lit-
erally. Use “\\” to represent a backslash.
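These rules are easier to remember after seeing each one in action. The following sketch uses re.findall, which returns every match as a list; the target strings are illustrative assumptions.

```python
import re

ops = re.findall(r'[+*/-]', '3+4-5*6/7')     # minus at the end is literal
letters = re.findall(r'[a-c]', 'a-b-c-d')    # minus between characters: a range
not_digits = re.findall(r'[^0-9]', '1^2')    # caret at the beginning negates the set
with_caret = re.findall(r'[0-9^]', '1^2')    # caret elsewhere is literal
escaped = re.findall(r'[\]\\]', r'a]b\c')    # escaped ] and \ are literal

print(ops)          # ['+', '-', '*', '/']
print(letters)      # ['a', 'b', 'c']
print(not_digits)   # ['^']
print(with_caret)   # ['1', '^', '2']
print(escaped)      # [']', '\\']
```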
6.6.3 Pattern Quantifiers
All of the quantifiers in Table 6.3 are expression modifiers, and not expression
extenders. Section 6.6.4 discusses the implications of “greedy” matching in
detail.
The next-to-last quantifier listed in Table 6.3 is the use of parentheses for
creating groups. Grouping can dramatically affect the meaning of a pattern.
Putting items in parentheses also creates tagged groups for later reference.
The use of the numeric quantifiers from Table 6.3 makes some expres-
sions easier to render, or at least more compact. For example, consider the
phone-number verification pattern introduced earlier.
r'\d\d\d-\d\d\d-\d\d\d\d'
This can be revised as
r'\d{3}-\d{3}-\d{4}'
This example saves a few keystrokes of typing, but other cases might save
quite a bit more. Using these features also creates code that is more readable
and easier to maintain.
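To confirm that the two forms behave the same way, this sketch runs both patterns against a few sample strings of our own and compares the results.

```python
import re

long_pat = r'\d\d\d-\d\d\d-\d\d\d\d'
short_pat = r'\d{3}-\d{3}-\d{4}'

samples = ['555-123-5000', '55-123-5000', '555-1234-500']
long_results = [bool(re.match(long_pat, s)) for s in samples]
short_results = [bool(re.match(short_pat, s)) for s in samples]

print(long_results)    # [True, False, False]
print(short_results)   # [True, False, False]
```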
because of the parentheses; specifically, it’s the group “ab” that is repeated.
c(ab)+
Match “c” exactly.
Now let’s say you’re employed to write these tests. If you use regular expres-
sions, this job will be easy for you—a delicious piece of cake.
The following verification function performs the necessary tests. We can
implement the five rules by using four patterns and performing re.match
with each.
import re
pat1 = r'(\w|[@#$%^&*!]){8,}$'
pat2 = r'.*\d'
pat3 = r'.*[a-zA-Z]'
pat4 = r'.*[@#$%^&*!]'
def verify_passwd(s):
    b = (re.match(pat1, s) and re.match(pat2, s) and
         re.match(pat3, s) and re.match(pat4, s))
    return bool(b)
The verify_passwd function applies four different match criteria to a
target string, s. The re.match function is called with each of four different
patterns, pat1 through pat4. If all four matches succeed, the result is “true.”
The first pattern accepts any character that is a letter, digit, or underscore,
or a character in the set @#$%^&*!, and it requires a match of
eight or more such characters.
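Here is a self-contained sketch of the function applied to a few sample passwords. The sample strings are ours, and the fourth pattern is written with the same punctuation set as the first, which is an assumption on our part.

```python
import re

pat1 = r'(\w|[@#$%^&*!]){8,}$'   # eight or more permitted characters, nothing else
pat2 = r'.*\d'                   # at least one digit
pat3 = r'.*[a-zA-Z]'             # at least one letter
pat4 = r'.*[@#$%^&*!]'           # at least one punctuation character (assumed set)

def verify_passwd(s):
    b = (re.match(pat1, s) and re.match(pat2, s) and
         re.match(pat3, s) and re.match(pat4, s))
    return bool(b)

for candidate in ['secret12#', 'secret123', 'ab1#', 'pass word1#']:
    print(candidate, '->', verify_passwd(candidate))
```

Only the first candidate passes. The second has no punctuation character, the third is too short, and the fourth contains a space, which the first pattern does not permit.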
This example prints the following:
abbccc
a
bb
ccc
The group method, as you can see from this example, returns all or part of
the matched text as follows.
◗ group(0) returns the entire text that was matched by the regular expression.
◗ group(n), starting with 1, returns a matched group, in which a group is
delimited by parentheses. The first such group can be accessed as group(1),
the second as group(2), and so on.
import re
pat = r'(a+)(b+)(c+)'
m = re.match(pat, 'abbcccee')
for i in range(m.lastindex + 1):
    print(i, '. ', m.group(i), sep='')
This example produces the following output:
0. abbccc
1. a
2. bb
3. ccc
In the code for this example, 1 had to be added to m.lastindex. That’s
because the range function produces integers beginning with 0, up to but
not including the argument value. In this case, the groups are numbered 1, 2,
3 (with 0 standing for the entire match), so the range needs to extend to 3;
and the way you do that is by adding 1 to the end of the range.
Table 6.4 summarizes the attributes of a match object.
start(n) Returns the starting position, within the target string, of the group
referred to by n. Positions within a string are zero-based, but the
group numbering is 1-based, so start(1) returns the starting string
index of the first group. start(0) returns the starting string index of
all the matched text.
end(n) Similar to start(n), except that end(n) gets the ending position of
the identified group, relative to the entire target string. Within this
string, the text consists of all characters within the target string,
beginning with the “start” index, up to but not including the “end”
index. For example, start and end values of 0 and 3 means that the
first three characters were matched.
span(n) Returns the information provided by start(n) and end(n) but
returns it in tuple form.
lastindex The highest index number among the groups.
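As a brief illustration of these attributes, the following sketch reuses the pattern and target string from the group example above. The variable names are chosen here only for illustration.

```python
import re

m = re.match(r'(a+)(b+)(c+)', 'abbcccee')

whole_span = m.span(0)   # (start, end) of all the matched text
bb_start = m.start(2)    # index where the second group, 'bb', begins
bb_end = m.end(2)        # index one past the last character of 'bb'
ccc_span = m.span(3)     # start and end of the third group, as a tuple
highest = m.lastindex    # highest group number that took part in the match

print(whole_span, bb_start, bb_end, ccc_span, highest)
```

Slicing the target string with a span, as in 'abbcccee'[3:6], gives back the same text that group(3) returns.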
print('"', m.group(), '" found at ', m.span(), sep='')
In this case, the search string specifies a simple pattern: two or more digits.
This search pattern is easy to express in regular-expression syntax, using the
special characters introduced earlier in this chapter.
\d{2,}
The rest of the code uses the resulting match object, assigning that object
a variable name of m. Using the group and span methods of this object, as
described in Section 6.8, “Using the Match Object,” you can get information
about what was matched and where in the target string the match occurred.
The code in this example prints the following:
"23" found at (9, 11)
This successfully reports that the substring “23” was found by the search:
m.group() produced the substring that was matched, “23,” while m.span()
produced the starting and ending positions within the target string as the
tuple (9, 11).
Here, as elsewhere, the starting position is a zero-based index into the tar-
get string, so the value 9 means that the substring was found starting at the
tenth character. The substring occupies all positions up to but not including
the ending position, 11.
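Here is a self-contained sketch of such a search. The target string is an assumption, chosen only so that the first run of two or more digits begins at index 9; the lone digit at the start is skipped because the pattern requires at least two digits.

```python
import re

# Assumed target string: '23' begins at index 9.
s = '1 set of 23 owls, 999 doves.'

m = re.search(r'\d{2,}', s)   # first run of two or more digits
if m:
    print('"', m.group(), '" found at ', m.span(), sep='')
```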
an underscore. Therefore, the pattern matches any word at least six characters
long.
Finally, let’s write a function useful for the Reverse Polish Notation cal-
culator introduced in Section 3.12. We’d like to break down input into a list
of strings, but we’d like operators (+, *, /, –) to be recognized separately from
numbers. In other words, suppose we have the input
12 15+3 100-*
We’d like 12, 15, 3, 100, and the three operators (+, –, and *) to each be rec-
ognized as separate substrings, or tokens. The space between “12” and “15”
is necessary, but extra spaces shouldn’t be required around the operators. An
easy solution is to use the re.findall function.
import re
s = '12 15+3 100-*'
print(re.findall(r'[+*/-]|\w+', s))
This example prints the following:
['12', '15', '+', '3', '100', '-', '*']
This is exactly what we wanted.
So the results, in this case, are wrong.
To get what was desired, use a two-part solution.
The re.search function reports the first successful match that was found.
Chapter 6 Summary
This chapter explored the basic capabilities of the Python regular-expression
package: how you can use it to validate the format of data input, how to search
for strings that match a specified pattern, how to break up input into tokens,
and how to use regular expressions to do sophisticated search-and-replace
operations.
Understanding the regular-expression syntax is a matter of understanding
ranges and wildcards—which can match one character at a time—and under-
standing quantifiers, which say that you can match zero, one, or any number
of repetitions of a group of characters. Combining these abilities enables you
to use regular expressions to express patterns of unlimited complexity.
In the next chapter, we’ll look at some more examples of regular-expression
use, as well as looking at non-greedy operators and the Scanner interface,
which builds on top of the Python regular-expression package.
Chapter 6 Review Questions
1 What are the minimum number of characters, and the maximum number of
characters, that can be matched by the expression “x*”?
2 Explain the difference in results between “(ab)c+” and “a(bc)+”. Which, if
either, is equivalent to the unqualified pattern “abc+”?
3 When using regular expressions, precisely how often do you need to use the
following statement?
import re
4 When you express a range using square brackets, exactly which characters
have special meaning, and under what circumstances?
5 What are the advantages of compiling a regular-expression object?
6 What are some ways of using the match object returned by functions such as
re.match and re.search?
7 What is the difference between using an alternation, which involves the vertical
bar (|), and using a character set, which involves square brackets?
8 Why is it important to use the raw-string indicator (r) in regular-expression
search patterns? In replacement strings?
9 Which characters, if any, have special meaning inside a replacement string?
In this table, name can be any nonconflicting name you choose, subject to
the standard rules for forming symbolic names.
But it does not return true for any of these:
1,00000
12,,1
0..5.7
To employ this regular-expression pattern successfully with re.findall,
so that you can find multiple numbers, two things need to be done.
First, the pattern needs to end with a word boundary (\b). Otherwise, it
matches two numerals stuck together, even though together they form one
long number that is not valid.
1,20010
This number would be incorrectly accepted, because findall accepts
1,200 and then accepts 10, given the current pattern.
The solution is to use \b, the end-of-word meta character. To get a correct
match, the regular-expression evaluator must find an end-of-word transition:
This can be a space, a punctuation mark, end of line, or the end of the string.
There also remains the issue of tagged groups. The problem is that with the
following string (which now includes the word boundary), grouping is neces-
sary to express all the subpatterns.
r'\d{1,3}(,\d{3})*(\.\d*)?\b'
Let’s review what this means.
(?:expr)
This syntax treats expr as a single unit but does not tag the characters
when the pattern is matched.
Another way to look at this is to say, “To create a group without tagging it,
keep everything the same but insert the characters ?: right after the opening
parentheses.”
Here’s how this nontagging syntax works with the number-recognition
example:
pat = r'\d{1,3}(?:,\d{3})*(?:\.\d*)?\b'
In this example, the characters that need to be inserted are shown in bold
for the sake of illustration. Everything else in the regular-expression pattern
is the same.
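The practical effect of tagging on re.findall can be seen in a short sketch. The sample text is made up for illustration. When a pattern contains tagged groups, findall returns the text of the groups (as tuples) rather than the whole match; with nontagging groups it returns the full matched strings.

```python
import re

text = 'Totals: 1,200 and 35.75 and 1,000,000'   # made-up sample input

tagged = r'\d{1,3}(,\d{3})*(\.\d*)?\b'
untagged = r'\d{1,3}(?:,\d{3})*(?:\.\d*)?\b'

# With tagged groups, each result is a tuple of group contents.
print(re.findall(tagged, text))

# With nontagging groups, each result is the entire matched number.
print(re.findall(untagged, text))
```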
7.3 Greedy Versus Non-Greedy Matching
One of the subtleties in regular-expression syntax is the issue of greedy versus
non-greedy matching. The second technique is also called “lazy.” (Oh what a
world, in which everyone is either greedy or lazy!)
The difference is illustrated by a simple example. Suppose we’re searching
or matching text in an HTML heading, and the regular-expression evaluator
reaches a line of text such as the following:
the_line = '<h1>This is an HTML heading.</h1>'
Suppose, also, that we want to match a string of text enclosed by two angle
brackets. Angle brackets are not special characters, so it should be easy to
construct a regular-expression search pattern. Here’s our first attempt.
pat = r'<.*>'
Now let’s place this into a complete example and see if it works.
import re
pat = r'<.*>'
the_line = '<h1>This is an HTML heading.</h1>'
m = re.match(pat, the_line)
print(m.group())
What we might expect to be printed is the text <h1>. Instead here’s what
gets printed:
<h1>This is an HTML heading.</h1>
As you can see, the regular-expression operation matched the entire line of
text! What happened? Why did the expression <.*> match the entire line of
text rather than only the first four characters?
The answer is that the asterisk (*) matches zero or more characters and
uses greedy rather than non-greedy (lazy) matching. Greedy matching says,
“Given more than one way of successfully matching text, I will match as much
text as I can.”
Take another look at the target string.
'<h1>This is an HTML heading.</h1>'
The first character in the search pattern is <, a literal character, and it
matches the first angle bracket in the target string. The rest of the expres-
sion then says, “Match any number of characters, after which match a closing
angle bracket (>).”
But there are two valid ways to do that.
◗ Match all the characters on the line up to the last character, and then match
the second and final closing angle bracket (>) (greedy).
◗ Match the two characters h1 and then the first closing angle bracket (>)
(non-greedy).
In this case, both approaches to matching are successful. When only one
match is possible, the regular-expression evaluator will either back up or con-
tinue until it finds a valid match. But when there is more than one matching
substring, greedy and non-greedy matching have different effects.
Figure 7.1 illustrates how greedy matching tags the entire line of text in this
example. It matches the first open angle bracket and doesn’t stop matching
characters until it reaches the last closing angle bracket.
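The two behaviors can be compared side by side. This sketch uses the same target line; the only change is the added question mark after the asterisk, which requests non-greedy matching.

```python
import re

the_line = '<h1>This is an HTML heading.</h1>'

greedy = re.match(r'<.*>', the_line).group()    # runs to the last '>'
lazy = re.match(r'<.*?>', the_line).group()     # stops at the first '>'
all_tags = re.findall(r'<.*?>', the_line)       # each tag found separately

print(greedy)
print(lazy)
print(all_tags)
```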
Non-Greedy: <.*?>
# because of the first "?".
lst = re.findall(pat, s, flags=re.DOTALL)
print('There are', len(lst), 'sentences.')
This example prints
There are 3 sentences.
If greedy finding had been used instead but the rest of the code was kept the
same, the example would have reported that only 1 sentence was found.
The first question mark (?) in the regular-expression pattern indicated non-
greedy rather than greedy matching. In contrast, the question mark inside the
square brackets is interpreted as a literal character. As explained in Chapter 6,
almost all special characters lose their special meaning when placed in a char-
acter set, which has the form
[chars]
Note Ë The re.DOTALL flag causes the dot meta character (.) to match end-
of-line characters (\n) as well; by default, the dot matches any character
except a newline. To make your code more concise, you can use the abbreviated
version of the flag: re.S.
Ç Note
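A two-line test string, made up for illustration, shows the effect of the flag on the dot.

```python
import re

s = 'first line\nsecond line'

# By default the dot does not match the newline, so nothing is found.
without_flag = re.findall(r'line.second', s)

# With re.S (short for re.DOTALL) the dot matches the newline too.
with_flag = re.findall(r'line.second', s, flags=re.S)

print(without_flag, with_flag)
```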
(?=expr)
The regular-expression evaluator responds to the look-ahead pattern by
comparing expr to the characters that immediately follow the current posi-
tion. If expr matches those characters, there is a match. Otherwise, there is
no match.
The characters in expr are not tagged. Moreover, they are not consumed;
this means that they remain to be read again by the regular-expression evalua-
tor, as if “put back” into the string data.
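A small sketch, using a made-up string, shows that look-ahead characters are checked but not consumed: each word must be followed by a comma to match, yet the comma is not part of the result and the match ends just before it.

```python
import re

s = 'red, green, blue'

# Find each word that is immediately followed by a comma.
words_before_comma = re.findall(r'\w+(?=,)', s)

m = re.search(r'\w+(?=,)', s)
end_pos = m.end()          # position right after 'red'
next_char = s[end_pos]     # the comma is still there, unread

print(words_before_comma, end_pos, repr(next_char))
```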
Here are the criteria we need to correctly read a sentence from a longer
string of text.
First, begin reading characters by finding a capital letter.
Then read up to the next period, using non-greedy matching, provided that
either one of the following conditions is true.
Note Ë As you’ll see in the upcoming examples, looking ahead for an end of line
requires the re.MULTILINE flag to be correct in all cases.
Ç Note
import re
pat = r'[A-Z].*?[.!?](?= [A-Z]|$)'
m = re.findall(pat, s, flags=re.DOTALL | re.MULTILINE)
The variable m now contains a list of each sentence found. A convenient
way to print it is this way:
for i in m:
print('->', i)
This prints the following results:
-> See the U.S.A. today.
-> It's right here, not
a world away.
-> Average temp. is 66.5.
As we hoped, the result is that exactly three sentences are read, although
one has an embedded newline. (There are, of course, ways of getting rid of
that newline.) But other than that, the results are exactly what we hoped for.
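For reference, here is a self-contained version of this example. The definition of s is an assumption, reconstructed from the printed results, since the original string is not shown in this excerpt.

```python
import re

# Assumed target string, reconstructed from the output shown above.
s = ("See the U.S.A. today. It's right here, not\n"
     "a world away. Average temp. is 66.5.")

pat = r'[A-Z].*?[.!?](?= [A-Z]|$)'
m = re.findall(pat, s, flags=re.DOTALL | re.MULTILINE)

for i in m:
    print('->', i)
```

The abbreviations "U.S.A." and "temp." and the number "66.5" do not end a sentence here, because the character after each of those periods is neither a space followed by a capital letter nor an end of line.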
Now, let’s review the flag settings re.DOTALL and re.MULTILINE. The
DOTALL flag says, “Match a newline as part of a ‘.’ expression, as in ‘.*’ or ‘.+’.”
The MULTILINE flag says, “Enable $ to match at the end of each line (just
before a newline) as well as at the end of the string.” We set both flags so
that a newline (\n) can satisfy both conditions. If the MULTILINE flag is not
set, then the pattern will fail to read complete sentences when a newline
comes immediately after a period, as in the following:
To be or not to be.
That is the question.
So says the Bard.
Without the MULTILINE flag being set, the look-ahead condition would fail
in this case. The look-ahead would mean, “Find a space followed by a capital
letter after the end of a sentence or match the end of the string.” The flag
enables the look-ahead to match an end of line as well as end of string.
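The difference can be checked directly with the three-line text above. In this sketch, the only change between the two calls is the presence of the re.MULTILINE flag.

```python
import re

s = 'To be or not to be.\nThat is the question.\nSo says the Bard.'
pat = r'[A-Z].*?[.!?](?= [A-Z]|$)'

# With both flags, $ can match at the end of every line.
both_flags = re.findall(pat, s, flags=re.DOTALL | re.MULTILINE)

# Without MULTILINE, $ only matches at the very end, so the
# non-greedy match keeps going until the last period.
dotall_only = re.findall(pat, s, flags=re.DOTALL)

print(len(both_flags), 'sentences with MULTILINE')
print(len(dotall_only), 'sentence without it')
```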
What if the final condition for ending a sentence had not been written as a
look-ahead condition but rather as a normal regular-expression pattern? That
is, what if the pattern had been written this way:
r'[A-Z].*?[.!?] [A-Z]|$'
This is the same pattern, except that the final part of this is not written as a
look-ahead condition.
Look-ahead avoids consuming characters that you want to remain to be read.
The solution given in the previous chapter was to test each of these con-
ditions through four separate calls to re.match, passing a different pattern
each time.
While that approach is certainly workable, it’s possible to use look-ahead to
place multiple matching criteria in the same large pattern, which is more effi-
cient. Then re.match needs to be called only once. Let’s use the password-
selection problem to illustrate.
First, we create regular-expression patterns for each of the four criteria.
Then the patterns are glued together to create one long pattern.
pat1 = r'(\w|[!@#$%^&*+-]){8,12}$'
pat2 = r'(?=.*[a-zA-Z])' # Must include a letter.
pat3 = r'(?=.*\d)' # Must include a digit.
pat4 = r'(?=.*[!@#$%^&*+-])' # Must include punc. char.
Note Ë Remember, the minus sign (–) has special meaning when placed inside
square brackets, which create a character set, but not if this sign comes at the
beginning or end of the set. Therefore, this example refers to a literal minus sign.
Ç Note
The various patterns are joined together to create one large pattern. Now
we can test for password strength by a single call to re.match:
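import re
# Look-ahead patterns must come first; the join order shown is assumed.
pat = pat2 + pat3 + pat4 + pat1
passwd = 'HenryThe5!'
if re.match(pat, passwd):
    print('It passed the test!')
else:
    print('Insufficiently strong password.')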
If you run this example, you’ll find that 'HenryThe5!' passes the test for
being a sufficiently strong password, because it contains letters, a digit, and a
punctuation mark (!).
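To see the combined pattern reject weak passwords as well, here is a self-contained sketch. The helper name is_strong and the join order of the four patterns are assumptions; the look-ahead patterns are placed first so that each one inspects the password from its beginning before any characters are consumed.

```python
import re

pat1 = r'(\w|[!@#$%^&*+-]){8,12}$'   # 8 to 12 permitted characters
pat2 = r'(?=.*[a-zA-Z])'             # must include a letter
pat3 = r'(?=.*\d)'                   # must include a digit
pat4 = r'(?=.*[!@#$%^&*+-])'         # must include a punctuation character

# Assumed join order: look-aheads first, consuming pattern last.
pat = pat2 + pat3 + pat4 + pat1

def is_strong(passwd):
    """Return True if passwd meets all four criteria (hypothetical helper)."""
    return re.match(pat, passwd) is not None

for pw in ['HenryThe5!', 'HenryThe55', 'Henry5!', 'HenryTheFifth!']:
    print(pw, is_strong(pw))
```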
(?!expr)
This negative look-ahead syntax says, “Permit a match only if the next
characters to be read are not matched by expr; but in any case, do not con-
sume the look-ahead characters but leave them to be read again by the next
match attempt.”
Here’s a simple example. The following pattern matches abc but only if not
followed by another instance of abc.
pat = r'abc(?!abc)'
If used with re.findall to search the following string, it will find exactly
one copy of abc:
s = 'The magic of abcabc.'
In this case, the second instance of abc will be found but not the first. Note
also that because this is a look-ahead operation, the second instance of abc is
not consumed, but remains to be read; otherwise, that instance would not be
found either.
Here’s the code that implements the example:
import re
pat = r'abc(?!abc)'
s = 'The magic of abcabc.'
m = re.findall(pat, s)
print(m)
Remember what this (admittedly strange) pattern says: “Match ‘abc’ but
only if it’s not immediately followed by another instance of ‘abc’.”
As expected, this example prints a list with just one instance of “abc,”
not two.
['abc']
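To confirm which instance was found, a sketch using re.finditer can report the starting position of each match. In this string the first "abc" begins at index 13 and the second at index 16.

```python
import re

pat = r'abc(?!abc)'
s = 'The magic of abcabc.'

# Collect the starting index of every match.
starts = [m.start() for m in re.finditer(pat, s)]
print(starts)
```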
If you remove newlines this way and run the example again, you’ll get this
output:
-> See the U.S.A. today.
-> It's right here, not a world away.
-> Average temp. is 70.5.
-> It's fun!
if matched against a name, will optionally recognize a middle initial but not
require it. So the following are all successfully matched:
Brian R. Overland
John R. Bennett
John Q. Public
Jane Austen
Mary Shelley
In every case, group(name) can be accessed, where name is 'first',
'mid', or 'last'. However, group('mid') in some cases—where there was
no match of that named group—will return the special value None. But that
can be tested for.
Therefore, we can write the following function to break down a name and
reformat it.
def reorg_name(in_s):
    m = re.match(pat, in_s)
    s = m.group('last') + ', ' + m.group('first')
    if m.group('mid'):
        s += ' ' + m.group('mid')
    return s
By applying this function to each name entered, placing the result into a list,
and then sorting the list, we can store all the names in alphabetical last-name-
first format:
Austen, Jane
Bennett, John R.
Overland, Brian R.
Public, John Q.
Shelley, Mary
The use of named groups was helpful in this case, by giving us a way to
refer to a group—the middle initial and dot—that might not be matched at
all. In any case, being able to refer to the groups as “first,” “mid,” and “last”
makes the code clearer and easier to maintain.
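The following self-contained sketch puts the pieces together. The pattern assigned to pat is an assumption (the original pattern is not shown in this excerpt); it uses a nontagging group so that the optional middle initial is captured without its trailing space.

```python
import re

# Assumed pattern: first name, optional middle initial with dot, last name.
pat = r'(?P<first>\w+) (?:(?P<mid>\w\.) )?(?P<last>\w+)'

def reorg_name(in_s):
    m = re.match(pat, in_s)
    s = m.group('last') + ', ' + m.group('first')
    if m.group('mid'):            # None when no middle initial was matched
        s += ' ' + m.group('mid')
    return s

names = ['Brian R. Overland', 'John R. Bennett', 'John Q. Public',
         'Jane Austen', 'Mary Shelley']
sorted_names = sorted(reorg_name(n) for n in names)
for n in sorted_names:
    print(n)
```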
As a final example in this section, you can use named groups to require
repeating of previously tagged sequences of characters. Chapter 6 showed
how you can use numbers to refer to the repetition of tagged groups.
pat = r'(\w+) \1'
The named-group version of this pattern is
pat = r'(?P<word>\w+) (?P=word)'
This pattern gets a positive match in the following function call:
m = re.search(pat, 'The the dog.', flags=re.I)
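A short sketch shows what this call finds. Because of the re.I flag, the repeated word is matched without regard to case; without the flag, "The" and "the" are not considered a repetition.

```python
import re

pat = r'(?P<word>\w+) (?P=word)'

m = re.search(pat, 'The the dog.', flags=re.I)
print(m.group())          # the whole repeated phrase
print(m.group('word'))    # the text tagged as 'word'

case_sensitive = re.search(pat, 'The the dog.')   # no flag this time
print(case_sensitive)
```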
are numbers, but they could be any substrings that didn’t contain commas or
internal spaces.
Let’s apply this pattern to the RPN interpreter. You can use the re.split
function to split up text such as this:
s = '3 2 * 2 15 * + 4 +'
If you recall how RPN works, you’ll recognize that this is RPN for the
following:
(3 * 2) + (2 * 15) + 4
Let’s apply the regular-expression function to the target string, s:
toks = re.split(pat, s)
Printing toks, a list of tokens, produces
['3', '2', '*', '2', '15', '*', '+', '4', '+']
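Here is a runnable sketch of this split. The separator pattern assigned to pat is an assumption (one or more commas or spaces), since the original definition is not shown in this excerpt.

```python
import re

pat = r'[, ]+'    # assumed separator pattern: runs of commas and/or spaces
s = '3 2 * 2 15 * + 4 +'

toks = re.split(pat, s)
print(toks)
```

Note that re.split returns every token as a string; converting '15' to the integer 15 is left to the caller.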
When the scanner is then run on a target string by calling scan, it returns a
series of objects as it was programmed to do. The beauty of this approach, as
you’ll see, is that you don’t have to worry about separators; you just look for
the tokens you want to find.
Here we summarize this part of the syntax. Unless you employ lambda
functions, this part of the syntax should appear after the functions are defined.
Key Syntax
scanner_name = re.Scanner([
    (tok_pattern1, funct1),
    (tok_pattern2, funct2),
    ...
])
In this syntax, each instance of tok_pattern is a regular expression
describing some kind of token to recognize. Each funct is a previously defined
callable or a lambda. If None is specified as the function, no action is taken for
the associated pattern; it is skipped over.
Before we show how to write the token-processing functions, here’s an
example written for the RPN project:
Note Ë In this example, it’s important that the floating-point pattern is listed
before the integer pattern. Otherwise, a floating-point number such as 11.502
will be read as an integer, 11, followed by a dot (.), followed by another integer.
Ç Note
Later, in Chapter 8, we’ll add variable names (also called identifiers or sym-
bols) to the RPN language. These are the variables within this RPN language.
scanner = re.Scanner ([
(r'[a-zA-Z]\w*', sc_ident),
(r'[*+/-]', sc_oper),
(r'\d+\.\d*', sc_float),
(r'\d+', sc_int),
(r'\s+', None)
])
Now, let’s look at how each of the functions is used.
Key Syntax
function_name(scanner, tok_str)
The first argument, scanner, is a reference to the scanner object itself. You
aren’t required to do anything more with that argument, although it can be
used to pass in additional information.
The second argument, tok_str, is a reference to the substring containing
the token.
Here’s a full example that creates a scanner for a simple RPN interpreter.
import re
scanner = re.Scanner ([
(r'[*+/-]', sc_oper),
(r'\d+\.\d*', sc_float),
(r'\d+', sc_int),
(r'\s+', None)
])
With these definitions in place, we can now call the function scanner.scan.
That function returns a tuple with two outputs: the first is a list of all the
tokens returned by the functions; the second is a string containing text not
successfully scanned. Here are some examples:
print(scanner.scan('3 3+'))
This prints
([3, 3, '+'], '')
Notice that the numbers are returned as integers, whereas the operator, +,
is returned as a one-character string. Here’s a more complex example:
print(scanner.scan('32 6.67+ 10 5- *'))
This prints
([32, 6.67, '+', 10, 5, '-', '*'], '')
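The definitions of the three token-processing functions are not shown in this excerpt. A minimal sketch that reproduces the output above might look like the following; the function bodies are assumptions, with each function simply returning the token in the desired type.

```python
import re

# Hypothetical token-processing functions: each receives the scanner
# object and the token text, and returns the value to place in the list.
def sc_oper(scanner, tok):
    return tok

def sc_float(scanner, tok):
    return float(tok)

def sc_int(scanner, tok):
    return int(tok)

scanner = re.Scanner([
    (r'[*+/-]', sc_oper),
    (r'\d+\.\d*', sc_float),   # listed before the integer pattern
    (r'\d+', sc_int),
    (r'\s+', None)             # white space is skipped
])

tokens, unknown = scanner.scan('32 6.67+ 10 5- *')
print(tokens, repr(unknown))
```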
The scanner object, as you can see, returns a list of tokens, each having the
proper type. However, it does not yet evaluate an RPN string. We still have
a little work to do. Remember that the logic of evaluating RPN is as follows:
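That logic can be sketched as a plain loop over a token list like the ones returned above: numbers are pushed onto a stack, and each operator pops two operands, applies the operation, and pushes the result. The function name eval_rpn is hypothetical and is not part of the RPN project code.

```python
def eval_rpn(tokens):
    """Evaluate a list of RPN tokens (numbers and operator strings)."""
    stack = []
    for tok in tokens:
        if isinstance(tok, str):          # an operator such as '+'
            op2, op1 = stack.pop(), stack.pop()
            if tok == '+':
                stack.append(op1 + op2)
            elif tok == '-':
                stack.append(op1 - op2)
            elif tok == '*':
                stack.append(op1 * op2)
            elif tok == '/':
                stack.append(op1 / op2)
        else:                             # a number: push it
            stack.append(tok)
    return stack.pop()

print(eval_rpn([3, 2, '*', 2, 15, '*', '+', 4, '+']))
```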
In the next section, we’ll show how to best implement this program logic
from within a Scanner object.
scanner = re.Scanner ([
(r'[*+/-]', sc_oper),
(r'\d+\.\d*', sc_float),
(r'\d+', sc_int),
(r'\s+', None)
])
To extend the RPN Interpreter application, we need to make each of the
three functions, sc_oper, sc_float, and sc_int, do its part. The final two
have to put numbers onto the stack. The sc_oper function, however, has to
do more: It has to call a function that pops the top two operands, performs the
operation, and pushes the result onto the stack.
Some of these functions can be made shorter by being written as lambda
functions. Lambdas, first introduced in Chapter 3, are anonymous
functions created on the fly.
But the first line is going to require a more elaborate function that pops
operands and carries out the operation; the function of this lambda is to call
that more elaborate function, bin_op. So the code is now
scanner = re.Scanner([
    (r'[*+/-]', lambda s, t: bin_op(t)),
    (r'\d+\.\d*', lambda s, t: the_stk.append(float(t))),
    (r'\d+', lambda s, t: the_stk.append(int(t))),
    (r'\s+', None)
])
def bin_op(tok):
    op2, op1 = the_stk.pop(), the_stk.pop()
    if tok == '+':
        the_stk.append(op1 + op2)
    elif tok == '*':
        the_stk.append(op1 * op2)
    elif tok == '/':
        the_stk.append(op1 / op2)
    elif tok == '-':
        the_stk.append(op1 - op2)
import re
the_stk = [ ]
scanner = re.Scanner([
    (r'[*+/-]', lambda s, t: bin_op(t)),
    (r'\d+\.\d*', lambda s, t: the_stk.append(float(t))),
    (r'\d+', lambda s, t: the_stk.append(int(t))),
    (r'\s+', None)
])
def bin_op(tok):
    op2, op1 = the_stk.pop(), the_stk.pop()
    if tok == '+':
        the_stk.append(op1 + op2)
    elif tok == '*':
        the_stk.append(op1 * op2)
    elif tok == '/':
        the_stk.append(op1 / op2)
    elif tok == '-':
        the_stk.append(op1 - op2)
def main():
    input_str = input('Enter RPN string: ')
    tokens, unknown = scanner.scan(input_str)
    if unknown:
        print('Unrecognized input:', unknown)
    else:
        print('Answer is', the_stk.pop())
main()
Here is the sequence of actions.
◗ The main function calls scanner.scan, which finds as many tokens (opera-
tors or numbers or both) as it can.
◗ Each time the Scanner object finds such a token, it calls the appropriate func-
tion: bin_op or the append method of the_stk (which is actually a list).
We can revise this code so that it is a little more concise and clear, by pass-
ing operations rather than carrying out each separately.
To understand what’s going on in this version, it’s important to remember
that in Python, functions are first-class objects—that is, they are objects just
like any other. They can therefore be passed directly as arguments.
We can take advantage of that fact by using function objects (callables)
already defined for us in the operator package. To use these, we need to
import the operator package itself.
import operator
We can then refer to callables that define addition, subtraction, and so
on, for two binary operands. The operands are not part of the argument list,
which contains only a single callable. Instead, the operands will be provided
by the bin_op function—by popping values off the stack.
operator.add
operator.sub
operator.mul
operator.truediv
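Each of these callables takes two operands and returns the result, so a function can receive the operation itself as an argument. In this sketch, apply_op is a hypothetical helper that stands in for the role bin_op plays in the program.

```python
import operator

def apply_op(oper, op1, op2):
    """Call whichever two-operand function was passed in (hypothetical helper)."""
    return oper(op1, op2)

total = apply_op(operator.add, 3, 4)          # same as 3 + 4
difference = apply_op(operator.sub, 10, 4)    # same as 10 - 4
product = apply_op(operator.mul, 3, 4)        # same as 3 * 4
quotient = apply_op(operator.truediv, 7, 2)   # same as 7 / 2

print(total, difference, product, quotient)
```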
import re
import operator
the_stk = [ ]
scanner = re.Scanner ([
(r'[+]', lambda s, t: bin_op(operator.add)),
(r'[*]', lambda s, t: bin_op(operator.mul)),
(r'[-]', lambda s, t: bin_op(operator.sub)),
(r'[/]', lambda s, t: bin_op(operator.truediv)),
(r'\d+\.\d*', lambda s, t: the_stk.append(float(t))),
(r'\d+', lambda s, t: the_stk.append(int(t))),
(r'\s+', None)
])
def bin_op(oper):
    op2, op1 = the_stk.pop(), the_stk.pop()
    the_stk.append(oper(op1, op2))
def main():
    input_str = input('Enter RPN string: ')
    tokens, unknown = scanner.scan(input_str)
    if unknown:
        print('Unrecognized input:', unknown)
    else:
        print('Answer is', the_stk.pop())
main()
This last set of changes, you should be able to see, reduces the amount of
code by several lines.
Let’s review. By using this approach, adopting a Scanner object, what has
been gained?
We could have just used the regular expression function, re.findall, to
split up a line of input into tokens and then processed the tokens as part of a
list, one at a time, examining the token and deciding what function to call.
Chapter 7 Summary
In this chapter, we’ve seen many uses for the advanced features of the Python
regular-expression capability.
Two of the more useful features are nontagging groups and the look-ahead
capability. Nontagging groups are useful when you want to form a grammat-
ical unit (a group) but don’t want to store the characters for later use. It turns
out that the re.findall function is much easier to use, in some cases, if you
don’t tag the group. A nontagged group has this syntax:
(?:expr)
The regular-expression look-ahead feature is useful in many situations.
It provides a way to look at upcoming characters, match them or fail to
match them, but not consume any of them. This simply means that the next
regular-expression match attempt (after the look-ahead is completed) will
start from the current position. The look-ahead characters are put back into
the string to be read again.
This feature is so powerful that it enables you to use matching to check
for multiple conditions using a single call to re.match or other matching
function.
The look-ahead feature has the following syntax:
(?=expr)
Finally, this chapter introduced the Scanner class. Use of this feature gives
you maximum flexibility in reading tokens from a file or input string, trans-
forming each one into the desired type of data.
In Chapter 8, “Text and Binary Files,” we’ll reuse much of this grammar in
the context of the ongoing RPN interpreter project.
[Figure: a binary file shown as raw byte values (X0 FF 17 23, 2E 4A 9B 02, 78 62 5E 44) beside a text file containing “I walk the journey of 1,000 miles.”]
Performance Tip  If a data file has a large amount of data and if it’s all
numeric, then programs that use binary format to deal with it (as opposed to
text format, the default) can frequently run several times faster. That’s because
they spend no time on costly numeric-to-text or text-to-numeric conversions.
Ç Performance Tip
1 , 0 0 0 (sp) - 1 0 (sp)
1000 -10
Python download.
◗ Reading and writing bytes directly by encoding them into bytes strings.
◗ Using the struct package to standardize both number and string storage so
that it can be consistently read and written.
◗ Using the pickle package to read and write items as high-level Python
objects. (Try to say “Python pickle package” ten times fast.)
◗ Using the shelve package to treat the whole data file as one big data dictio-
nary made up of Python objects.
You can read and write bytes directly, by using bytes strings contain-
ing embedded hex codes. This is analogous to doing machine-language
programming.
Alternatively, you can use the struct package for converting common
Python built-in types (integers, floating-point, and strings) into “C” types,
placing them into strings, and writing them. This technique—unlike writing
raw bytes—handles difficulties such as packing Python variables into data
fields of specific sizes. In this way, when they are read back, the right number
of bytes are read. This approach is useful when you’re interacting with exist-
ing binary files.
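As a minimal sketch of the struct approach, the following packs a short integer, a regular integer, and a five-byte string into fields of fixed size and then unpacks them. The format string and the values are chosen here only for illustration; the leading '<' requests little-endian byte order with no padding.

```python
import struct

fmt = '<hi5s'    # 2-byte short, 4-byte int, 5-byte bytes string

packed = struct.pack(fmt, 7, 100000, b'hello')
size = struct.calcsize(fmt)          # total number of bytes in the record

# Reading back requires the same format, so the right number
# of bytes is consumed for each field.
unpacked = struct.unpack(fmt, packed)

print(len(packed), size, unpacked)
```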
When you create new binary files, to be read by other Python programs,
you can use the pickle package to “pickle” Python objects. Then you let the
package’s routines worry about how precisely to represent the object when it’s
stored in a file.
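A brief sketch of pickling follows. It writes a dictionary to a binary file in a temporary directory and reads it back; the file name and the data are made up for illustration.

```python
import os
import pickle
import tempfile

record = {'name': 'Ada', 'scores': [90, 85.5], 'active': True}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'record.dat')   # hypothetical file name

    with open(path, 'wb') as f:       # pickle files are binary files
        pickle.dump(record, f)

    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored)
```

The restored object is an equal copy of the original, not the same object.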
Finally, you can use the shelve package, which is built on top of pickling
and is even higher level. The shelving operation pickles data but treats an
entire file as one big dictionary. The location of any desired object, according
to its key, is looked up, and the object is found quickly through random access.
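Here is a minimal sketch of the shelve approach, again using a temporary directory and made-up keys. The shelf is opened, used like a dictionary, closed, and then reopened to show that the objects were stored on disk.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_shelf')   # hypothetical shelf name

    with shelve.open(path) as db:            # write phase
        db['primes'] = [2, 3, 5, 7]
        db['owner'] = {'name': 'Ada'}

    with shelve.open(path) as db:            # read phase, after reopening
        primes = db['primes']
        keys = sorted(db.keys())

print(primes, keys)
```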
◗ Functions that start, end, or repeat processes: These include spawn, kill,
abort, and fork. The fork function spawns a new process based on an exist-
ing one.
◗ Functions that make changes to, or navigate through, the file/directory sys-
tem: These include rename, removedirs, chroot, getcwd (get current work-
ing directory), and rmdir (remove directory). Also included are listdir,
makedirs, and mkdir.
The os and os.path packages can effectively check for the existence of
a file before you try to open it, as well as giving you the ability to delete files
from the disk. You might want to use that one with care.
The following IDLE session checks the working directory, switches to the
Documents subdirectory, and checks the current working directory again.
Then it checks for the existence of a file named pythag.py, confirming that it
exists. The session finally removes this file and confirms that the file has been
removed.
>>> import os
>>> os.getcwd()
'/Users/brianoverland'
>>> os.chdir('Documents')
>>> os.path.isfile('pythag.py')
True
>>> os.remove('pythag.py')
>>> os.path.isfile('pythag.py')
False
Checking for the existence of a file by calling the os.path.isfile func-
tion is often a good idea. Another useful function is os.listdir, which
returns a list of all the names of files in the current directory (by default) or of
a specified directory.
os.listdir()
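The same checks can be run safely from a script. This sketch works inside a temporary directory, so no real files are touched; the file name is borrowed from the session above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'pythag.py')

    before = os.path.isfile(fname)      # file does not exist yet
    with open(fname, 'w') as f:
        f.write('# placeholder\n')
    exists = os.path.isfile(fname)      # now it does

    names = os.listdir(tmp)             # names in the specified directory

    os.remove(fname)
    after = os.path.isfile(fname)       # gone again

print(before, exists, names, after)
```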
One of the most common exceptions is raised by the attempt to open a non-
existent file for reading. That can easily happen because the user might mis-
type a character. The result is that the FileNotFoundError exception gets
raised.
Key Syntax
try:
    statement_block_1
except exception_class:
    statement_block_2
If, during execution of statement_block_1, an exception is raised, that
exception causes the program to terminate abruptly unless the except clause
catches the exception by specifying a matching exception_class. If you
want the program to look for more than one type of exception, you can do so
by using multiple except clauses.
try:
    statement_block_1
except exception_class_A:
    statement_block_A
[ except exception_class_B:
    statement_block_B ]...
In this case, the brackets are not intended literally but indicate optional items.
The ellipses (. . .) indicate that there may be any number of such optional
clauses.
There are also two more optional clauses: else and finally. You can use
either one or both.
Key Syntax
try:
    statement_block_1
except exception_class_A:
    statement_block_A
[ except exception_class_B:
    statement_block_B ]...
[ else:
    statement_block_2 ]
[ finally:
    statement_block_3 ]
The optional else clause is executed if the first statement block completes
execution with no exceptions. The finally clause, if present, is executed
after all the other blocks are, unconditionally.
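The order in which the clauses run can be traced with a small sketch. The function name and the file names are hypothetical; the function records which clauses were executed for a missing file and for an existing one.

```python
import os
import tempfile

def read_first_line(fname):
    """Return (first line or None, list of clauses that ran)."""
    ran = []
    line = None
    try:
        f = open(fname)
    except FileNotFoundError:
        ran.append('except')       # runs only if the open failed
    else:
        ran.append('else')         # runs only if no exception was raised
        with f:
            line = f.readline()
    finally:
        ran.append('finally')      # runs in either case
    return line, ran

with tempfile.TemporaryDirectory() as tmp:
    missing = read_first_line(os.path.join(tmp, 'no_such_file.txt'))
    good_name = os.path.join(tmp, 'notes.txt')
    with open(good_name, 'w') as f:
        f.write('hello\n')
    found = read_first_line(good_name)

print(missing)
print(found)
```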
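while True:    # Loop and prompt are assumed; they are elided in this excerpt.
    fname = input('Enter file name: ')
    try:
        f = open(fname)    # Attempt file open here.
    except FileNotFoundError:
        print('File could not be found. Re-enter.')
    else:
        print(f.read())
        f.close()
        break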
Note that the pickle functions require you to import the pickle package
before using any of its features.
import pickle
There are three methods available for reading from a file; all of them can be
used with text files.
Key Syntax
str = file.read(size=-1)
str = file.readline(size=-1)
list = file.readlines()
The read method reads in the entire contents of the file and returns it as a
single string. That string can then be printed directly to the screen if desired.
If there are newlines, they are embedded into this string.
A size can be specified as the maximum number of characters to read.
The default value of –1 causes the method to read the entire file.
The readline method reads up to the first newline or until the size, if
specified, has been reached. The newline itself, if read, is returned as part of
the string.
Finally, the readlines method reads in all the lines of text in a file and
returns them as a list of strings. As with readline, each string read contains
a trailing newline, if present. (All strings would therefore have a newline,
except maybe the last.)
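The three methods can be compared on a small two-line file. This sketch creates the file in a temporary directory; the contents are chosen so that the last line has no trailing newline.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.txt')
    with open(path, 'w') as f:
        f.write('To be or not to be\nThat is the question.')

    with open(path) as f:
        whole = f.read()                # entire file as one string

    with open(path) as f:
        first5 = f.read(5)              # at most five characters
        rest_of_line = f.readline()     # up to and including the newline

    with open(path) as f:
        lines = f.readlines()           # list of lines, newlines kept

print(repr(whole))
print(repr(first5), repr(rest_of_line))
print(lines)
```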
There are two methods that can be used to write to text files.
Key Syntax
file.write(str)
file.writelines(str | list_of_str)
The write and writelines methods do not automatically append new-
lines, so if you want to write the text into the files as a series of separate lines,
you need to append those newlines yourself.
The difference between the two methods is that the write method returns
the number of characters or bytes written. The writelines method takes
two kinds of arguments: You can pass either a single string or a list of strings.
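A short sketch may help show the difference between the two methods. It writes to a temporary file (the name is hypothetical), captures the value returned by write, and passes both a list and a single string to writelines.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')   # Hypothetical file name.

with open(path, 'w') as f:
    count = f.write('alpha\n')            # Returns the number of characters written.
    f.writelines(['beta\n', 'gamma\n'])   # No newlines are added for you.
    f.writelines('delta\n')               # A single string also works.

with open(path) as f:
    text = f.read()

print(count)
print(repr(text))
```

Every newline in the resulting file was supplied explicitly by the caller.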
A simple example illustrates the interaction between file reading and writing.
with open('file.txt', 'w') as f:
    f.write('To be or not to be\n')
    f.write('That is the question.\n')
    f.write('Whether tis nobler in the mind\n')
    f.write('To suffer the slings and arrows\n')

with open('file.txt', 'r') as f:
    print(f.read())
This example writes out a series of strings as separate lines and then prints
the contents directly, including the newlines.
To be or not to be
That is the question.
Whether tis nobler in the mind
To suffer the slings and arrows
Reading this same file with either readline or readlines—each of which
recognizes newlines as separators—likewise reads in the newlines at the end
of each string. Here’s an example that reads in one line at a time and prints it.
with open('file.txt', 'r') as f:
    s = ' '            # Set to a blank space initially
    while s:
        s = f.readline()
        print(s)
The readline method returns the next line in the file, in which a “line”
is defined as the text up to and including the next newline or end of file. It
returns an empty string only if the end-of-file condition (EOF) has already
been reached. But the print function automatically prints an extra newline
unless you use the end argument to suppress that behavior. The output in this
case is
To be or not to be
file.seek(pos, orig)
file.seekable()
file.tell()
The seekable method is included in case you need to check on whether the
file system or device supports random access operations. Most files do. Trying
to use seek or tell without there being support for random access causes an
exception to be raised.
The seek method is sometimes useful even in programs that don’t use ran-
dom access. When you read a file, in either text or binary mode, reading starts
at the beginning and goes sequentially forward.
What if you want to read a file again, from the beginning? Usually you
won’t need to do that, but we’ve found it useful in testing, in which you want
to rerun the file-read operations. You can always use seek to return to the
beginning.
file_obj.seek(0, 0) # Go back to beginning of the file.
This statement assumes that file_obj is a successfully opened file object.
The first argument is an offset. The second argument specifies the origin
value 0, which indicates the beginning of the file. Therefore, the effect of this
statement is to reset the file pointer to the beginning of the file.
The possible values for the origin are 0, 1, and 2, indicating the beginning,
current position, and end of the file, respectively.
Moving the file pointer also affects writing operations, which could cause
you to write over data you’ve already written. If you move to the end of the
file, any writing operations effectively append data.
Otherwise, random access is often most useful in binary files that have a
series of fixed-length records. In that case, you can directly access a record by
using its zero-based index and multiplying by the record size:
file_obj.seek(rec_size * rec_num, 0)
The tell method is the converse of seek. It returns an offset number that
tells the number of bytes from the beginning of the file. A value of 0 indicates
that the beginning of the file is in fact your current position.
file_pointer = file_obj.tell()
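The interaction of seek and tell can be sketched with a tiny binary file of fixed-length records. The record size, record contents, and file name below are assumptions made for the sake of the example.

```python
import os
import tempfile

REC_SIZE = 4    # Hypothetical record size in bytes.
path = os.path.join(tempfile.mkdtemp(), 'records.dat')

# Write three fixed-length records, one after another.
with open(path, 'wb') as f:
    f.write(b'AAAA' + b'BBBB' + b'CCCC')

with open(path, 'rb') as file_obj:
    rec_num = 2
    file_obj.seek(REC_SIZE * rec_num, 0)   # Jump directly to record 2.
    pos_before = file_obj.tell()
    record = file_obj.read(REC_SIZE)
    pos_after = file_obj.tell()
    file_obj.seek(0, 0)                    # Go back to the beginning.
    pos_reset = file_obj.tell()

print(pos_before, record, pos_after, pos_reset)
```

Reading advances the file pointer by the number of bytes read, which is why tell reports a larger offset after the read.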
def bin_op(action):
    op2, op1 = stack.pop(), stack.pop()
    stack.append(action(op1, op2))

def main():
    while True:
        input_str = input('Enter RPN line: ')
        if not input_str:
            break
        try:
            tokens, unknown = scanner.scan(input_str)
            if unknown:
                print('Unrecognized input:', unknown)
            else:
                print(str(stack[-1]))
        except IndexError:
            print('Stack underflow.')

main()
Here is a sample session:
Enter RPN line: 25 4 *
100.0
Enter RPN line: 25 4 * 50.75-
49.25
Enter RPN line: 3 3* 4 4* + .5^
5.0
Enter RPN line:
scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"-?(\d*\.)?\d+", lambda s, t:
        stack.append(float(t))),
    (r"\d+", lambda s, t: stack.append(int(t))),
    (r"[+]", lambda s, t: bin_op(operator.add)),
    (r"[-]", lambda s, t: bin_op(operator.sub)),
    (r"[*]", lambda s, t: bin_op(operator.mul)),
    (r"[/]", lambda s, t: bin_op(operator.truediv)),
    (r"[\^]", lambda s, t: bin_op(operator.pow)),
])
def bin_op(action):
    op2, op1 = stack.pop(), stack.pop()
    stack.append(action(op1, op2))

def main():
    a_list = open_rpn_file()
    if not a_list:
        print('Bye!')
        return

def open_rpn_file():
    '''Open-source-file function. Open a named
    file and read lines into a list, which is
    returned.
    '''
    while True:
        try:
            fname = input('Enter RPN source: ')
            if not fname:      # Empty entry: user wants to quit.
                return None
            f = open(fname, 'r')
            break
        except FileNotFoundError:
            print('File not found. Re-enter.')
    a_list = f.readlines()
    f.close()
    return a_list

main()
Let’s further assume that there is a file in the same directory that is named
rpn.txt, which has the following contents:
3 3 * 4 4 * + .5 ^
1 1 * 1 1 * + .5 ^
Given this file and the new version of the RPN Interpreter program, here is
a sample session.
Enter RPN source: rppn.txt
File not found. Re-enter.
Enter RPN source: rpn.txt
5.0
1.4142135623730951
The program behaved exactly as designed. When a valid RPN file name was
entered (rpn.txt), the program evaluated each of the lines as appropriate.
Notice that the first line of rpn.txt was left intentionally blank, as a test.
The program simply skipped over it, as designed.
The basic action of this version of the program is to open a text file, which
ideally contains syntactically correct statements in the RPN language. When
it manages to open a valid text file, the open_rpn_file function returns a list
of text lines. The main function then evaluates each member of this list, one
at a time.
But we’re just getting started. The next step is to expand the grammar of
the RPN language so that it enables values to be assigned to variables, just as
Python itself does.
symbol expression =
Note: In the previous example, if op1 does not refer to a variable name, then it
represents a syntax error.
End of Note
sym_tab = { }

scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"[+-]*(\d*\.)?\d+", lambda s, t:
        stack.append(float(t))),
    (r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
        stack.append(t)),
    (r"\d+", lambda s, t: stack.append(int(t))),
    (r"[+]", lambda s, t: bin_op(operator.add)),
    (r"[-]", lambda s, t: bin_op(operator.sub)),
    (r"[*]", lambda s, t: bin_op(operator.mul)),
    (r"[/]", lambda s, t: bin_op(operator.truediv)),
    (r"[\^]", lambda s, t: bin_op(operator.pow)),
    (r"[=]", lambda s, t: assign_op()),
])
def assign_op():
    '''Assignment Operator function: Pop off a name
    and a value, and make a symbol-table entry. Remember
    to look up op2 in the symbol table if it is a string.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op2) == str:    # Source may be another var!
        op2 = sym_tab[op2]
    sym_tab[op1] = op2

def bin_op(action):
    '''Binary Operation evaluator: If an operand is
    a variable name, look it up in the symbol table
    and replace with the corresponding value, before
    being evaluated.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op1) == str:
        op1 = sym_tab[op1]
    if type(op2) == str:
        op2 = sym_tab[op2]
    stack.append(action(op1, op2))

def main():
    a_list = open_rpn_file()
    if not a_list:
        print('Bye!')
        return
def open_rpn_file():
    '''Open-source-file function. Open a named
    file and read lines into a list, which is
    returned.
    '''
    while True:
        try:
            fname = input('Enter RPN source: ')
            if not fname:
                return None
            f = open(fname, 'r')
            break
        except FileNotFoundError:
            print('File not found. Re-enter.')
    a_list = f.readlines()
    f.close()
    return a_list

main()
evaluated, and the result is placed back on the stack. An exception is assign-
ment (=), which doesn’t place anything on the stack (although arguably, maybe
it should).
And there’s a new twist: If a variable name is popped off the stack, it’s
looked up in the symbol table, and the operand is replaced by the variable’s
value before being used as part of an operation.
Note: If you look through the code in this application, you may notice that the
symbol-look-up code is repetitive and could be replaced by a function call.
The function would have to be written in a sufficiently general way that it
would accommodate any operand, but that shouldn’t be hard. This approach
would only save a line here and there, but it’s a reasonable use of code
refactoring, which gathers similar operations and replaces them with a common
function call. For example, right now the code uses
if type(op1) == str:
    op1 = sym_tab[op1]
This could be replaced by a common function call, as follows:
op1 = symbol_look_up(op1)
Of course, you would need to define the function.
End of Note
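One possible definition of such a function is sketched here. The name symbol_look_up comes from the note above; the body, and the sample contents of sym_tab, are assumptions rather than the authors' code.

```python
sym_tab = {'x': 3.0}    # Sample symbol table for illustration.

def symbol_look_up(operand):
    """Return the value of a variable name, or the operand itself
    if it is already a number."""
    if isinstance(operand, str):
        return sym_tab[operand]
    return operand

print(symbol_look_up('x'))
print(symbol_look_up(2.5))
```

With this helper, bin_op and assign_op could each call symbol_look_up on their operands instead of repeating the type test.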
byte_str = file.read(size=-1)
file.write(byte_str)
A byte_str is a string having the special type bytes. In Python 3.0, it’s
necessary to use this type while doing low-level I/O in binary mode. This is a
string guaranteed to be treated as a series of individual bytes rather than char-
acter codes, which may or may not be more than one byte long.
To code a byte string, use the b prefix before the opening quotation mark.
with open('my.dat', 'wb') as f:
    f.write(b'\x01\x02\x03\x10')
The effect of this example is to write four bytes into the file my.dat—
specifically, the hexadecimal values 1, 2, 3, and 10, the last of which is equal to
16 decimal. Notice that this statement uses the “wb” format, a combination of
write and binary modes.
You can also write out these bytes as a list of byte values, each value rang-
ing between 0 and 255:
f.write(bytes([1, 2, 3, 0x10]))
in the previous section—that’s a nonportable and difficult way to do things.
The struct package is an aid in packing and unpacking familiar built-in
types into strings of bytes. It includes a number of function calls.
Key Syntax
import struct
bytes_str = struct.pack(format_str, v1, v2, v3...)
v1, v2, v3... = struct.unpack(format_str, bytes_str)
struct.calcsize(format_str)
The struct.pack function takes a format string (see Table 8.2) and a
series of one or more values. It returns a bytes string that can be written to a
binary file.
Table 8.2 lists C-language data types in the second column. Many other
languages usually have a concept of short and long integers and short and long
floating-point numbers that correspond to these types. (Python integers, how-
ever, have to be “packed,” as shown in this section.)
Note: The integer prefix can be applied to fields other than strings. For
example, '3f' means the same as 'fff'.
End of Note
To write to a binary file using the struct package, follow these steps.
The process of reading from a binary file using the struct package is
similar.
Because these techniques deal with the low-level placement of bytes, there
are some special considerations due to big endian versus little endian and
padding. But first, the next few subsections deal with specific problems:
from struct import pack, unpack, calcsize

def read_num(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('h'))
        t = unpack('h', bss)
        return t[0]
With these definitions in place, you can read and write individual inte-
gers to files, assuming these integers fit into the short-integer (16-bit) format.
Larger values may need a bigger data format.
Here’s an example:
write_num('silly.dat', 125)         # Write the number 125.
print(read_num('silly.dat'))        # Read it back and print it.
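For a complete round trip, a write_num function is needed alongside read_num. The version below is a sketch of what such a counterpart might look like, not necessarily the authors' definition; it writes to a temporary directory so that it can be run anywhere.

```python
import os
import tempfile
from struct import pack, unpack, calcsize

def write_num(fname, n):
    """Write one short (16-bit) integer to a binary file.
    Assumed counterpart to read_num."""
    with open(fname, 'wb') as f:
        f.write(pack('h', n))

def read_num(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('h'))
        return unpack('h', bss)[0]

path = os.path.join(tempfile.mkdtemp(), 'silly.dat')
write_num(path, 125)
value = read_num(path)
print(value)
```

A value too large for the 'h' format, such as 40000, causes pack to raise a struct.error exception, which is why larger values need a bigger format.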
def read_floats(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('fff'))
        return unpack('fff', bss)
The second line reads only 13 characters, as there are only 13 to read. It
prints
I'm Henry the
def read_var_str(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('h'))
        n = unpack('h', bss)[0]
        bss = f.read(n)
        return bss.decode('utf-8')
The write_var_str function has to do some tricks. First, it creates a
string format specifier of the form h<num>s, in which <num> is the length
of the string. In the next example, that format
specifier is h24s, meaning, “Write (and later read) an integer followed by a
string with 24 characters.”
The read_var_str function then reads in an integer—in this case, 24—
and uses that integer to determine exactly how many bytes to read in. Finally,
these bytes are decoded back into a standard Python text string.
Here’s a relevant example:
write_var_str('silly.dat', "I'm Henry the VIII I am!")
print(read_var_str('silly.dat'))
These statements print
I'm Henry the VIII I am!
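The write side of this technique can be sketched as follows. The write_var_str body shown here is an assumption based on the description above (a short integer holding the length, followed by that many bytes); it is paired with read_var_str so that the round trip can be run as is.

```python
import os
import tempfile
from struct import pack, unpack, calcsize

def write_var_str(fname, s):
    """Write a length-prefixed string (assumed counterpart of read_var_str)."""
    b = s.encode('utf-8')
    fmt = 'h' + str(len(b)) + 's'     # For example, 'h24s'.
    with open(fname, 'wb') as f:
        f.write(pack(fmt, len(b), b))

def read_var_str(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('h'))
        n = unpack('h', bss)[0]
        bss = f.read(n)
        return bss.decode('utf-8')

path = os.path.join(tempfile.mkdtemp(), 'silly.dat')
write_var_str(path, "I'm Henry the VIII I am!")
result = read_var_str(path)
print(result)
```

Using the length of the encoded bytes, rather than of the text string, keeps the count correct even if the string contains characters that take more than one byte in UTF-8.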
def read_rec(fname):
    with open(fname, 'rb') as f:
        bss = f.read(calcsize('9s10sf'))
        bname, baddr, rating = unpack(
            '9s10sf', bss)
        name = bname.decode('utf-8').rstrip('\x00')
        addr = baddr.decode('utf-8').rstrip('\x00')
        return name, addr, rating
Here’s a sample usage:
write_rec('goofy.dat', 'Cleo', 'Main St.', 5.0)
print(read_rec('goofy.dat'))
These statements produce the following tuple, as expected:
('Cleo', 'Main St.', 5.0)
Note: The pack function has the virtue of putting in internal padding as
needed, thereby making sure that data types align correctly. For example,
four-byte floating-point values need to start on an address that’s a multiple of
4. In the preceding example, the pack function adds extra null bytes so that
the floating-point value starts on a properly aligned address.
However, the limitation here is that even though using the pack function
aligns everything within a single record, it does not necessarily set up correct
writing and reading of the next record. If the last item written or read is a
string of nonaligned size, then it may be necessary to pad each record with
bytes. For example, consider the following record:
bss = pack('ff9s', 1.2, 3.14,
           'I\'m Henry'.encode('utf-8'))
Padding is a difficult issue, but depending on the system on which the code is
running, occasionally you have to worry about it. The official documentation
for the struct package says that, in native mode, padding is added only between
fields as needed for alignment; no padding is added at the beginning or the end
of the packed structure.
So to align the end of a structure to the alignment requirement of a particular
type (for example, floating point), you end the format string with the code
for that type, giving it a repeat count of 0. In
the following case, that means you need to write a “phantom” floating-point
value to guarantee alignment with the next floating-point type to be written.
bss = pack('ff9s0f', 1.2, 3.14,
           'I\'m Henry'.encode('utf-8'))
End of Note
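The effect of the zero repeat count can be checked by comparing sizes. This is a small sketch; the sizes in the comments assume a platform on which a float occupies four bytes and is aligned on a four-byte boundary, which is the usual case.

```python
from struct import pack, calcsize

name = "I'm Henry".encode('utf-8')      # Nine bytes.

unpadded = pack('ff9s', 1.2, 3.14, name)
padded = pack('ff9s0f', 1.2, 3.14, name)

# Typically 17 bytes: two 4-byte floats plus 9 bytes of string.
print(len(unpadded), calcsize('ff9s'))

# Typically 20 bytes: rounded up to the next multiple of 4.
print(len(padded), calcsize('ff9s0f'))
```

Because the padded record length is a multiple of the float size, a second record written right after it starts on an aligned address.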
◗ If you look closely at the bytes string, both this example and the previous
code reveal the use of little-endian byte
arrangement: Within an integer field, the least significant byte is placed
first. This happens on my system because its processor is little-endian, as
Intel processors are. Other processors may use a different standard; older
Motorola chips, for example, are big-endian.
◗ Second, because the long integer (equal to 100, or hex value 64, which is
displayed as the ASCII character d) must start on
a 32-bit border, 2 bytes of padding are placed between the second argument
and the third. The note at the end of the previous section mentioned this issue.
One of the things that can go wrong is trying to read a data file that was
written on a big-endian processor when your
system uses little-endian, or vice versa. Therefore, the struct functions
enable you to exercise some control by specifying big or little endian at the
beginning of the format string. Table 8.3 lists the low-level modes for
handling binary data.
For example, to pack two short integers and one long integer into a string of
bytes, specifically using little-endian storage, use the following statement:
with open('junk.dat', 'wb') as f:
    bstr = struct.pack('<hhl', 1, 2, 100)
    datalen = f.write(bstr)
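To see what the byte-order prefix changes, the same three values can be packed both ways and compared. This sketch uses only struct.pack and struct.unpack; the < and > prefixes select little-endian and big-endian storage with standard sizes and no padding.

```python
import struct

little = struct.pack('<hhl', 1, 2, 100)
big = struct.pack('>hhl', 1, 2, 100)

print(little)   # b'\x01\x00\x02\x00d\x00\x00\x00'
print(big)      # b'\x00\x01\x00\x02\x00\x00\x00d'
print(struct.unpack('<hhl', little))
```

The character d in the output is simply how Python displays the byte value 100 (hex 64). Unpacking must use the same prefix that was used for packing; otherwise the values come back scrambled.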
[Figure: A pickle file as a sequence of dumped objects, each stored with its type: n (type int, value 10), a_list (type list, value [2, 5, 17]), and a_str (type str, value 'Greetings!').]
The beauty of this protocol is that when you read items back into your
program, you read them as full-fledged objects. To find out the type of each
object read, you can use the type function or simply pass the object to the
print function.
This example prints
<class 'list'> [1, 2, 3]
<class 'str'> Hello!
<class 'float'> 2.3
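A complete round trip that produces this kind of output might look like the following sketch. The file location is an assumption (a temporary directory is used so that the example runs anywhere); note that pickle.dump takes the file object as its second argument.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'goo.dat')   # Hypothetical location.

# Dump three objects, one after another, into the same file.
with open(path, 'wb') as f:
    pickle.dump([1, 2, 3], f)
    pickle.dump('Hello!', f)
    pickle.dump(2.3, f)

# Load them back in the same order.
items = []
with open(path, 'rb') as f:
    for _ in range(3):
        a = pickle.load(f)
        print(type(a), a)
        items.append(a)
```

Each call to pickle.load returns the next object in the file, restored with its original type.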
Pickling is easy to use in part because—in contrast to reading simple
sequences of bytes—the effect is to load a Python object in all its glory. You
can do many things with the object, including taking its type and, if it’s a col-
lection, its length.
if type(a)==list:
    print('The length of a is ', len(a))
The only real limitation to pickling is that when you open a file, you may
not know how many objects have been written. One solution is to load as
many objects as you can until the program raises an EOFError exception.
Here’s an example:
loaded = []
with open('goo.dat', 'rb') as f:
    while True:
        try:
            item = pickle.load(f)
        except EOFError:
            print('Loaded', len(loaded), 'items.')
            break
        print(type(item), item)
        loaded.append(item)
The beauty of this interface is that for very large data sets, it’s potentially
far faster and more efficient than ordinary pickling, or almost any other access
technique. The shelving interface will not, at least for large data sets, read in
the entire dictionary; rather, it will look at an index to determine the location
of a value, and then automatically seek to that location.
Note: By default, when you use a shelf to access, say, stuff['Brian'], what
you get is a copy, and not the original data. So, for example, if d[key] is a
list, the following does not cause changes to the file:
d[key].append(my_item)
However, the following statements do cause changes:
data = d[key]
data.append(my_item)
d[key] = data
End of Note
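The copy behavior described in the note can be observed directly. The following sketch creates a shelf in a temporary directory (the shelf name and the stored values are made up) and compares the two approaches.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'stuff')    # Hypothetical shelf name.

with shelve.open(path) as d:
    d['Brian'] = [1, 2]
    d['Brian'].append(3)      # Modifies a temporary copy only.
    after_append = d['Brian']

    data = d['Brian']         # Get a copy, ...
    data.append(3)            # ... change it, ...
    d['Brian'] = data         # ... and store it back.
    after_assign = d['Brian']

print(after_append)
print(after_assign)
```

The shelve.open function also accepts a writeback=True argument, which caches accessed entries and writes them back when the shelf is closed, at the cost of extra memory.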
Chapter 8 Summary
Python supports flexible, easy techniques for reading and writing to both text
files and binary files. A binary file is a file that is not intended to express all
data as printable characters but instead is used to store numeric values directly.
Binary files have no universally recognized format. Determining a format,
and writing data out in that format, is an important issue in working with
binary. With Python, several high-level options are available.
The struct package enables you to read and write Python values by trans-
lating them into fixed-size, regular data fields. The pickle package enables
you to read and write fully realized Python objects to disk. Finally, the shelve
interface lets you treat the disk file as one large data dictionary, in which the
keys must be strings.
Python also supports interaction with the file systems through the os pack-
age, which includes the os.path subpackage. These packages provide func-
tions for finding and removing files, as well as reading the directory system.
From within IDLE, you can use help(os) and help(os.path) to learn about
the capabilities.
and performance rating (on a scale of 1 to 10). The “write” program should
prompt the user for any number of such records until the user indicates that
they want to quit. The “read” program should read all the records into a list.
5 Write the same read and write programs, but this time use the pickle
interface.
class class_name:
    statements
The statements consist of one or more statements, indented. You can’t
write zero statements, but you can use the pass keyword as a no-op; this is
useful as a placeholder, when you want to define what the class does later.
For example, we could define a Car class this way:
class Car:
    pass
We could also define a Dog and a Cat class:
class Dog:
    pass

class Cat:
    pass
So what do you do with a class in Python? Simple. You can create any number
of objects of that class, also called instances. The following statements
create several instances of Car:
car1 = Car()
car2 = Car()
car3 = Car()
Or you can create instances of the Dog class:
my_dog = Dog()
yr_dog = Dog()
So far, none of these instances does anything. But that’s about to change.
The first thing we can do with a class is to create variables for the class as a
whole. These become class variables, and they are shared by all its instances.
For example, suppose Car was defined this way:
class Car:
    accel = 3.0
    mpg = 25
Now, printing these variables through any instance of Car will produce these values.
print('car1.accel = ', car1.accel)
print('car2.accel = ', car2.accel)
print('car1.mpg = ', car1.mpg)
print('car2.mpg = ', car2.mpg)
These statements print
car1.accel = 3.0
car2.accel = 3.0
car1.mpg = 25
car2.mpg = 25
But here’s the twist: Any one of the instances of Car can be given its own
value for the variable, accel. Doing this overrides the value of the class vari-
able, which has the value 3.0. We can create an instance, my_car, and assign a
value for accel.
my_car = Car()
yr_car = Car()
my_car.accel = 5.0
Figure 9.1 illustrates this relationship. In the my_car object, accel has
become an instance variable; in yr_car, it is still a class variable.
[Figure 9.1: The Car class holds accel = 3.0; the my_car instance holds its own accel = 5.0.]
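The difference between the class variable and the instance variable can be verified with a few statements. This sketch reuses the Car class from above and uses the built-in vars function to show which object actually stores accel.

```python
class Car:
    accel = 3.0
    mpg = 25

my_car = Car()
yr_car = Car()
my_car.accel = 5.0          # Creates an instance variable on my_car only.

print(my_car.accel)         # Instance variable.
print(yr_car.accel)         # Falls back on the class variable.
print(Car.accel)            # The class variable itself is unchanged.
print('accel' in vars(my_car), 'accel' in vars(yr_car))
```

Only my_car has its own accel entry; yr_car has none, so a reference to yr_car.accel is resolved by looking in the class.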
object.var_name = value
For example, we can create a class named Dog, create an instance, and then
give that instance several attributes—that is, instance variables.
class Dog:
    pass
Three data variables are now attached to the object named my_dog. They
are name, breed, and age, and they can all be accessed as such.
print('Breed and age are {} and {}.'.format(
      my_dog.breed, my_dog.age))
This statement prints
class class_name:
    def __init__(self, args):
        statements
The word self is not a keyword but rather the name of the first argument,
which is a reference to the individual object. This argument could be any legal
name, but it’s a universal convention to use self.
The args are arguments—separated by commas if there is more than
one—passed to the object when it’s first created. For example, we could revise
the Dog class definition to include an __init__ method.
class Dog:
    def __init__(self, name, breed, age):
        self.name = name
        self.breed = breed
        self.age = age
Now when an object of class Dog is created, it must be given three
arguments, which are then passed to the __init__ method. Here’s an example:
top_dog = Dog('Handsome Dan', 'Bulldog', 10)
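Putting the class definition and the object creation together gives a short runnable sketch. It also shows what happens when the required arguments are left out; the variable name error_text is used only for this illustration.

```python
class Dog:
    def __init__(self, name, breed, age):
        self.name = name
        self.breed = breed
        self.age = age

top_dog = Dog('Handsome Dan', 'Bulldog', 10)
print(top_dog.name, top_dog.breed, top_dog.age)

# Omitting the arguments raises a TypeError.
try:
    Dog()
except TypeError as err:
    error_text = str(err)
    print('TypeError:', error_text)
```

Once __init__ requires arguments, every instantiation must supply them unless the method gives its parameters default values.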
class class_name:
    def __init__(self, val1, val2, ...):
        self.instance_var1 = val1
        self.instance_var2 = val2
        ...
Python actually uses the __new__ method to create objects, but most of the
time, you’ll want to stick to writing and implementing the __init__ method
to perform initialization. There are two major exceptions.
◗ When you want to use some special technique for allocating memory. That’s
an advanced technique not covered in this book, but there are usually few peo-
ple who need to use it.
◗ When you attempt to subclass an immutable or built-in class. This is a more
common problem, and it’s handled in the next chapter, in Section 10.12,
“Money and Inheritance.”
classes. The issue is that a class must be defined before it’s instantiated. To
instantiate a class means to use it to create an object.
Here’s a situation that poses a problem in forward reference to a class.
class Marriage:
    def __init__(self):
        self.wife = Person('f')
        self.husband = Person('m')

a_marriage = Marriage()    # Instantiate the class.

class Person:
    def __init__(self, gender):
        self.gender = gender
This silly program fails, but not because it’s silly. You should be able to see
the problem: The first few lines are executed, causing the class Marriage to
be defined and therefore come into existence as a class; but the sixth line then
instantiates the class by trying to create an actual object called a_marriage.
That, in turn, shouldn’t be a problem. But when the object comes into
existence and its __init__ method is called, that method tries to create a couple
of objects called wife and husband, objects of the Person class. And that’s
the problem. The Person class has not yet been defined and cannot be used to
create new objects.
The solution is clear: Just move the sixth line to the end of the file. In that
way, both classes are defined before either is instantiated.
a_marriage = Marriage()
In general, forward references to classes are not a problem if you follow a
few rules.
◗ Make sure that all classes are defined before any of them are instantiated.
That’s the main rule.
◗ Show extreme caution about classes that instantiate each other or (God forbid)
a class that creates an instance of itself. Although there’s a trick that enables
you to pull that off, it’s an area in which you really ought not to venture. Fools
rush in where wise men fear to tread.
◗ However, classes containing other classes (in one direction), or those contain-
ing references to instances of classes, are generally not a problem. Beware of
mutual dependencies, however.
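The main rule can be demonstrated with the Marriage and Person classes. In this sketch, the first attempt to instantiate Marriage is wrapped in a try block so that the failure can be observed without stopping the program; the variable name failure is used only for illustration.

```python
class Marriage:
    def __init__(self):
        self.wife = Person('f')       # Person is looked up at call time.
        self.husband = Person('m')

# Instantiating now fails: Person does not exist yet.
try:
    a_marriage = Marriage()
except NameError as err:
    failure = str(err)
    print('NameError:', failure)

class Person:
    def __init__(self, gender):
        self.gender = gender

# Now both classes are defined, so instantiation succeeds.
a_marriage = Marriage()
print(a_marriage.wife.gender, a_marriage.husband.gender)
```

Defining Marriage before Person is harmless in itself, because the name Person is not looked up until the __init__ method actually runs.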
class class_name:
    def method_name(self, arg1, arg2, arg3...):
        statements

obj_name = class_name()
obj_name.method_name(arg1, arg2, arg3...)
Note that the definition of a method—but not the call to that method—
includes the hidden first argument, self, so the definition has one more argu-
ment than the function call.
For example, consider the following class definition.
class Pretty:
Note, also, that within a method, the instance itself is always identified as
self, and the instance variables are identified as self.name.
    def pr(self):
        print('__z = ', self.__z)
Given this class definition, the following statements are perfectly valid and
do exactly what you’d expect.
o = Odd()
o.x # 10
o.y # 20
But the following expression causes an exception to be raised:
o.__z # Error!
This last expression raises an AttributeError because Python replaces __z with a
mangled name, generated from a combination of the class name and the variable
name (here, _Odd__z).
But __z is still accessible, without mangling, within method definitions
of the same class, and that is why the pr method still works. Variable and
method names are always accessible within the same class. But remember that
in Python, such intraclass references need to be qualified with self.
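Name mangling can be observed with a small class. The sketch below assumes a definition of Odd along the lines described above, with x and y set to 10 and 20; the value 30 for __z is an arbitrary choice for illustration.

```python
class Odd:
    def __init__(self):
        self.x = 10
        self.y = 20
        self.__z = 30       # Stored under the mangled name _Odd__z.

    def pr(self):
        print('__z = ', self.__z)

o = Odd()
o.pr()

try:
    o.__z
except AttributeError:
    hidden = True
    print('o.__z is not accessible from outside the class.')

print(o._Odd__z)            # The mangled name still works.
```

The mangled name shows that this mechanism discourages accidental access rather than enforcing true privacy.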
9.7 Inheritance
Python provides support for inheritance, also known as “subclassing.” Sup-
pose you have a class, Mammal, that contains most of the methods you need
to use in your program. However, you need to add or change a few of these
methods. For example, you might want to create a Dog class whose instances
can do anything a Mammal instance can do, plus more things.
The syntax for single inheritance with one base class is shown first.
Key Syntax
class class_name(base_class):
    statements
The effect is to create a new class, class_name, which inherits all the class
variables and methods belonging to base_class. The statements can add
new variables and method definitions, as well as override existing definitions.
Every variable and method name in Python is polymorphic. Names are not
resolved until run time. Consequently, you can call any method of any object,
and it will be correctly resolved.
For example, the following class hierarchy involves a base class, Mammal,
and two subclasses, Dog and Cat. The subclasses inherit the __init__ and
call_out methods from Mammal, but each implements its own version of
the speak method.
class Mammal:
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def speak(self):
        print('My name is', self.name)

    def call_out(self):
        self.speak()
        self.speak()
        self.speak()

class Dog(Mammal):
    def speak(self):
        print('ARF!!')

class Cat(Mammal):
    def speak(self):
        print('Purrrrrrr!!!!')
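A brief usage sketch shows how method calls are resolved at run time in this hierarchy. The class definitions are repeated so that the example runs on its own; the pet names and sizes are made up for illustration.

```python
class Mammal:
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def speak(self):
        print('My name is', self.name)

    def call_out(self):
        self.speak()
        self.speak()
        self.speak()

class Dog(Mammal):
    def speak(self):
        print('ARF!!')

class Cat(Mammal):
    def speak(self):
        print('Purrrrrrr!!!!')

# The names and sizes here are made up for illustration.
pets = [Mammal('Generic', 1), Dog('Otis', 2), Cat('Fluffy', 3)]
for pet in pets:
    pet.speak()          # Each object uses its own class's version.

pets[1].call_out()       # Inherited call_out invokes Dog.speak three times.
```

Although call_out is defined only in Mammal, the call to self.speak inside it is resolved according to the class of the actual object, so a Dog says ARF!! three times.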
◗ The __init__ and __new__ methods, which are automatically called to
initialize and create an object. These were covered in Section 9.3.
◗ Object representation methods, including __format__, __str__, and
__repr__. These are covered in Sections 9.10.1 and 9.10.2.
◗ The format function attempts to call the __format__ method for an object,
and it passes an optional format specifier. Implementing this method enables
a class to return a formatted string representation. The default action is to call
the __str__ method.
◗ The print function calls an object’s __str__ method to print the object;
that is, it calls the __str__ method for that object’s class. If __str__ is not
defined, the class’s __repr__ method is called by default.
◗ The __repr__ method returns a string containing the canonical expression
of an object as it’s represented in Python code. This method often does the
same thing as __str__, but not always. This method is called directly by
IDLE, or when r or !r is used.
◗ Finally, the __repr__ method of the object class—which is the ultimate
base class—may be called as the final default action. This method prints a
simple statement of the object’s class.
[Figure: The str.format method and the format function call the __format__ method, which defaults to __str__, which defaults to __repr__, which defaults to object.__repr__.]
The following example shows how a theoretical Point class could be written
to support both the __str__ and __repr__ methods, as well as __init__.
For the sake of illustration, the __str__ and __repr__ methods return a
string in a slightly different format.
class Point:
big_prime_1 = 1200556037
big_prime_2 = 2444555677
self.x = x
self.y = y
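As a rough illustration of the idea, here is one way a Point class might supply both representation methods. The particular string formats are assumptions chosen for this sketch, not necessarily those used by the authors.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        # Informal form, used by print() and str().
        return '({}, {})'.format(self.x, self.y)

    def __repr__(self):
        # Canonical form, resembling the Python expression.
        return 'Point({!r}, {!r})'.format(self.x, self.y)

pt = Point(3, 4)
print(pt)
print(repr(pt))
print('{}'.format(pt))
```

Because no __format__ method is defined, the format call falls back on __str__, in line with the chain of defaults described earlier.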
◗ To make objects of your class sortable with regard to other objects, define
a less than (<) operation. For example, to put objects of your class into a
Now let’s look at how the rules of symmetry work in Python comparisons
and how this enables us to create classes for which < is defined in both direc-
tions, thereby making the objects mutually sortable with any kind of object.
For example, __add__ for a particular class is invoked when one of its
objects is added to another object. This assumes, however, that the object is
the left operand; if the object is the right operand, then its reflection method
may be called—in this case, __radd__.
The following example utilizes the Fraction class from the fractions
package. But if that package were not supported by Python, you could write
such a class yourself.
import fractions
f = fractions.Fraction(1, 2)
print(f + 1) # Calls Fraction.__add__
print(2 + f) # Calls Fraction.__radd__
As mentioned earlier, the __add__ method could be supported by any
class that recognizes the addition operator (+), even if it’s used for an opera-
tion such as string concatenation.
A hypothetical Point class can provide many good examples of how to
implement these magic methods for arithmetic operators.
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
Class(args)
This syntax creates a new instance of Class, initializing it with the
specified args, which in turn are passed to the class’s __init__ method, if
defined.
Note: If there’s any chance that an instance of the class might be combined in
an operation with another type, and if that type does have code supporting the
interaction, then a binary-operation method should return NotImplemented
whenever it doesn’t support the types; this gives the operand on the right side
of the operation a chance to implement the operation. See Section 9.10.6 for
more information.
End of Note
Suppose you have an expression adding two objects together, each of a dif-
ferent class:
fido = Dog()
precious = Cat()
print(fido + precious)
Python evaluates the expression fido + precious by first checking to
see whether the Dog class implements an __add__ method. There are several
possibilities for what happens next.
◗ The left operand implements an __add__ method and returns a value other
than NotImplemented. Then no method of the right operand needs to be
called.
◗ The left operand (or rather its class) does not implement an __add__ method
at all. In that case, Python checks to see whether the right operand
implements an __radd__ method.
◗ The left operand implements an __add__ method, but that method decides it
does not support interaction with an object like that on the right. Presumably,
the __add__ method has checked the type of the right operand and decided,
“I don’t support addition (+) with objects of this class.” In that case, it should
return NotImplemented. If so, Python checks to see whether the right
operand implements an __radd__ method.
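These possibilities can be traced with a pair of toy classes. In the sketch below, the Dog and Cat definitions and the strings they return are invented purely to show which method ends up handling each addition.

```python
class Dog:
    def __add__(self, other):
        if isinstance(other, Dog):
            return 'two dogs'
        return NotImplemented      # Let the right operand try.

class Cat:
    def __radd__(self, other):
        if isinstance(other, Dog):
            return 'dog plus cat'
        return NotImplemented

fido = Dog()
precious = Cat()
print(fido + fido)                 # Handled by Dog.__add__.
print(fido + precious)             # Falls through to Cat.__radd__.

try:
    fido + 1                       # Neither side supports this.
except TypeError:
    unsupported = True
    print('TypeError: unsupported operand types')
```

When both the forward and the reflected methods return NotImplemented, Python gives up and raises a TypeError on its own.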
In most cases, the reverse-order methods are close echoes of their forward-
order (left operand) versions. For example, it’s easy to write reverse-order
When implementing these methods as true in-place operators (so that the
data object in memory is modified), you should follow this procedure: First,
modify the contents of the instance through which the method is called—that
is, variables accessed through the self argument. Second, return a reference
to the object by using
return self
For example, here’s how the Point class might define the _ _iadd_ _ and
_ _imul_ _ methods:
    def __iadd__(self, other):
        self.x += other.x
        self.y += other.y
        return self
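As a hedged sketch of the procedure just described, the following minimal Point class defines both in-place methods. The coordinate-by-coordinate meaning given to *= is an assumption made for illustration.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __iadd__(self, other):
        self.x += other.x       # step 1: modify the instance itself
        self.y += other.y
        return self             # step 2: return a reference to it

    def __imul__(self, other):
        # Assumption: *= multiplies coordinate by coordinate.
        self.x *= other.x
        self.y *= other.y
        return self


p = Point(1, 2)
alias = p                       # second name for the same object
p += Point(10, 20)
p *= Point(2, 2)
print(p.x, p.y, alias is p)     # 22 44 True
```

Because both methods return self, the name alias still refers to the updated object; no new Point is created.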
The following class definition illustrates the use of the Point class with sim-
ple definitions for several of these methods.
class Point:
    def __init__(self, x = 0, y = 0):
        self.x = x
        self.y = y
The following IDLE session illustrates the use of these conversions. User
input is shown in bold.
>>> p = Point(1, 2.5)
>>> int(p)
3
>>> float(p)
3.5
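One pair of definitions that would reproduce this session is sketched below. The choice to sum the two coordinates is an assumption made here; it is simply one rule consistent with the results shown.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __int__(self):
        # Assumption: convert each coordinate, then add.
        return int(self.x) + int(self.y)

    def __float__(self):
        return float(self.x) + float(self.y)


p = Point(1, 2.5)
print(int(p), float(p))     # 3 3.5
```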
    def pop(self):
        return self.mylist.pop()
    def peek(self):
        return self.mylist[-1]
    def peek(self):
        return self[-1]
Given these few lines of code, this Stack class can carry out all the opera-
tions of the more elaborate class definition shown earlier.
This solution—inheritance—works only when you choose to build your
collection class on top of an existing class, such as list or dict.
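A minimal sketch of the inheritance approach follows. The push method name is an assumption; pop, len, indexing, and iteration are all inherited from list unchanged.

```python
class Stack(list):
    def push(self, item):       # assumed name for the "add to top" operation
        self.append(item)

    def peek(self):
        return self[-1]


st = Stack()
st.push(10)
st.push(20)
print(st.peek(), st.pop(), len(st))     # 20 20 1
```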
◗ Passing a call to _ _iter_ _ along to a collection object contained within the target.
This is the simplest solution. It’s essentially letting someone else handle the job.
◗ Implementing both _ _iter_ _ and _ _next_ _ in the collection class itself.
The _ _iter_ _ method returns self in this case, as well as initializing the
iteration settings. However, such a solution makes it impossible to support
more than one loop at a time.
◗ Responding to the _ _iter_ _ method by creating a custom iterator object
whose entire purpose is to support an iteration through the collection class.
This is the most robust, and recommended, approach.
The next example illustrates the second approach, because it is, in most
cases, relatively simple. To use this approach for the Stack class introduced
earlier, add the following method definitions:
    def __iter__(self):
        self.current = 0
        return self
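An __iter__ method that returns self needs a matching __next__ method. The sketch below completes the second approach and contrasts it with the third; the class names StackIterator and RobustStack, and the assumption that items live in self.mylist, are illustrative only.

```python
class Stack:
    def __init__(self):
        self.mylist = []

    def push(self, item):
        self.mylist.append(item)

    # Second approach: the collection acts as its own iterator.
    def __iter__(self):
        self.current = 0
        return self

    def __next__(self):
        if self.current >= len(self.mylist):
            raise StopIteration
        self.current += 1
        return self.mylist[self.current - 1]


class StackIterator:
    """Third approach: a separate iterator object for each loop."""
    def __init__(self, items):
        self.items = items
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= len(self.items):
            raise StopIteration
        self.current += 1
        return self.items[self.current - 1]


class RobustStack(Stack):
    def __iter__(self):
        return StackIterator(self.mylist)


shared, robust = Stack(), RobustStack()
for st in (shared, robust):
    st.push(1)
    st.push(2)

# Nested loops over the same object expose the difference.
shared_pairs = [(a, b) for a in shared for b in shared]
robust_pairs = [(a, b) for a in robust for b in robust]
print(len(shared_pairs), len(robust_pairs))     # 2 4
```

With the second approach, the inner loop resets and then exhausts the single shared position, so the outer loop stops early. Separate iterator objects keep the two loops independent.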
type(object)
For example, you can test a data object or variable directly to see whether it
has integer type.
n = 5
if type(n) == int:
    print('n is integer.')
isinstance(object, class)
isinstance(object, tuple_of_classes)
The first version determines the class of the object and then returns True if
the object’s class is either the same as the class argument or is derived from
this argument—that is, the object must have a class identical to, or derived
from, the second argument.
The second version of this syntax is the same, except that it enables you to
include a tuple of (that is, an immutable list of) multiple classes.
Here’s an example of the first syntax:
n = 5
if isinstance(n, int):
    print('n is an integer or derived from it.')
Here’s an example of the second syntax. This technique enables you to test
whether n contains any integer or floating-point number.
if isinstance(n, (int, float)):
    print('n is numeric.')
Remember that because of the use of isinstance, rather than type, n
need not have int or float type; a type derived from int or float is suffi-
cient. Such types are uncommon, but you could create one by subclassing.
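The difference between the two tests shows up as soon as a subclass is involved. In this sketch, Celsius is a hypothetical float subclass invented for the demonstration.

```python
class Celsius(float):
    """Hypothetical subclass of float, used only for this demonstration."""


t = Celsius(21.5)
print(type(t) == float)                 # False: exact type is Celsius
print(isinstance(t, float))             # True: derived from float
print(isinstance(t, (int, float)))      # True: matches one class in the tuple
print(isinstance(True, int))            # True: bool is a subclass of int
```

The last line is worth remembering: a test such as isinstance(n, int) also accepts True and False.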
So, for example, suppose you wanted to enable Point objects to support
multiplication by both other Point objects and by numbers. You could do that
by defining an _ _mul_ _ method as follows:
    def __mul__(self, other):
        if type(other) == Point:
            newx = self.x * other.x
            newy = self.y * other.y
            return Point(newx, newy)
        elif type(other) == int or type(other) == float:
            newx = self.x * other
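A complete method along these lines might look like the following sketch. Two choices here are assumptions rather than part of the listing: the type tests use isinstance so that subclasses are accepted, and unsupported operands get NotImplemented so that the right operand has a chance to respond.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __mul__(self, other):
        if isinstance(other, Point):
            # Point * Point: multiply coordinate by coordinate.
            return Point(self.x * other.x, self.y * other.y)
        if isinstance(other, (int, float)):
            # Point * number: scale both coordinates.
            return Point(self.x * other, self.y * other)
        return NotImplemented


a = Point(2, 3) * Point(4, 5)
b = Point(2, 3) * 1.5
print(a.x, a.y, b.x, b.y)       # 8 15 3.0 4.5
```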
>>> d = Dog()
>>> setattr(d, 'breed', 'Great Dane')
>>> getattr(d, 'breed')
'Great Dane'
But actual examples will almost always pass a variable containing a string
when using getattr. Here’s an example:
>>> field = 'breed'
>>> getattr(d, field)
'Great Dane'
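Passing names as variables is what makes these functions useful in loops. In this sketch the field names and values are invented sample data.

```python
class Dog:
    pass


d = Dog()
fields = {'breed': 'Great Dane', 'name': 'Fido', 'age': 5}   # sample data

# Set each attribute by name, then read them all back by name.
for field, value in fields.items():
    setattr(d, field, value)

summary = {field: getattr(d, field) for field in fields}
owner = getattr(d, 'owner', None)   # third argument: default if missing
print(summary, owner)
```

Without the third argument, getattr raises AttributeError when the attribute does not exist.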
Chapter 9 Summary
Python provides a flexible and powerful means to do object oriented program-
ming. The basic concept, that of a class, is essentially a user-defined type. But,
as with other object oriented programming systems (OOPS!), such a type can
include any number of method definitions. A method is a function defined in a
The self argument never explicitly appears in any method call; however,
it must always appear in any method definition intended to be called through
individual instances. The name self is a reference to the object itself.
Python is extremely polymorphic, because variable and function names are
not resolved until run time, that is, until a statement is executed.
Therefore, any number of classes can define attributes of the same name, and
the code that runs is always the code belonging to the particular object’s class.
One of the most distinctive features of Python is that any class may avail
itself of magic methods: method names that have a special meaning to Python
and are automatically invoked under special circumstances. For example,
the _ _init_ _ method is invoked when an instance of the class is initialized.
Magic method names are always characterized by having both leading and
trailing double underscores ( _ _). Therefore, if you avoid using such names
yourself, there is no possibility of naming conflicts.
This chapter presented many of the magic methods supported in Python,
including _ _init_ _ and methods that support arithmetic and other
operations.
◗ The Decimal class, which is a “fixed-point” data type that can hold decimal
fractions, such as 0.02, precisely and without error.
◗ The Money class, which you can download or develop yourself. For the sake
of illustration, this chapter takes the latter approach: developing this class
ourselves.
◗ The Fraction class, which can store fractions such as one-third or one-seventh
precisely and without any rounding errors, something that is not possible with
the other classes.
◗ The complex class, which represents complex numbers from the world of
higher math. Such numbers have both a “real” and an “imaginary” part.
If you’re not familiar with the use of complex numbers from higher mathe-
matics, don’t worry. You can safely ignore these numbers unless you’re doing
the sort of work that requires it. If you’re one of these people, you already
know it.
None of these classes requires you to download anything from the Internet,
and the complex class doesn’t even require anything to be imported. It’s a
built-in class, just as int, float, and str are.
>>> print(d + d + d)
0.3
This example does what you’d expect, but you should already see a twist to
it. The Decimal variable, d, was initialized with a text string. It might seem
much more natural to initialize it with a floating-point value. But look what
happens if you do.
>>> d = Decimal(0.1)
>>> print(d)
0.10000000000000000555111512312578...
This result must seem strange. But there’s a reason for it.
When 0.1 is used to initialize, a floating-point value (type float) is con-
verted to Decimal format. As stated, Decimal can store 0.1 with absolute
precision. But in this case, it first has to be converted from floating point; and
the problem is, the floating-point value already contains the rounding error
within it. This is eating the fruit of a poisoned tree.
How do we get around this problem? Initializing from a string is the best
solution. Using '0.1' as the initializer says, “I want the decimal realization
of what this string represents,” that is, the value without rounding errors.
Let’s look at another example.
>>> d = Decimal('0.1')
>>> print(d + d + d)
0.3
This gives the right answer. Contrast it with the floating-point version.
>>> print(0.1 + 0.1 + 0.1)
0.30000000000000004
Here’s another example. The following use of floating-point arithmetic
shows an even more obvious error that the use of Decimal solves.
>>> print(0.1 + 0.1 + 0.1 - 0.3)
5.551115123125783e-17
>>> d1, d3 = Decimal('0.1'), Decimal('0.3')
>>> print(d1 + d1 + d1 - d3)
0.0
The Decimal class maintains precision. For example, if you perform arith-
metic on instances of Decimal with two places of precision, including trailing
zeros, those two places are maintained, as you can see here:
>>> d1, d3 = Decimal('0.10'), Decimal('0.30')
>>> d1 + d3
Decimal('0.40')
Note Ë If you give an object to the print function, then, by default, it prints the
standard string representation of the number. In the case of Decimal objects,
this representation is a simple sequence of digits, with a decimal point as
appropriate.
1.00
However, if you give a Decimal object as direct input in the IDLE environ-
ment, it prints the canonical representation, which includes the type name and
quotation marks:
Decimal('1.00')
Ç Note
There are some other quirks of behavior of the Decimal class worth noting.
If you multiply two of these objects together, the precision is not maintained
but increased. Here is an example:
>>> d1, d3 = Decimal('0.020'), Decimal('0.030')
>>> print(d1 * d3)
0.000600
However, you can always adjust the precision of such an object by using
the round function, which readjusts the number of digits to the right of the
decimal point (getting rid of trailing zeros), as well as rounding figures up or
down. Here’s an example:
>>> print(round(d1 * d3, 4))
0.0006
>>> print(round(d1 * d3, 3))
0.001
◗ You can multiply integers with Decimal objects freely, as well as add them.
◗ You can also initialize directly and precisely from an integer:
d = Decimal(5)
◗ Adding or multiplying a Decimal object by a floating-point value is an error.
To perform such an operation, you convert the floating point to a Decimal
object—for example, converting from a floating-point value and then round-
ing. Otherwise, arithmetic operations between the two types cause runtime
errors.
So, for example, you can do the following, interacting with integers:
>>> d = Decimal(533)
>>> d += 2
>>> print(round(d, 2))
535.00
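The rule about floating-point values can be checked in the same way. This is a small sketch; the variable names are arbitrary.

```python
from decimal import Decimal

d = Decimal('1.10')

try:
    d + 0.1                     # Decimal + float is not supported
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Convert the float first, then round away its binary rounding error.
total = d + round(Decimal(0.1), 2)
print(mixed_ok, total)          # False 1.20
```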
>>> d2 is d
False
The as_tuple method gives major clues to the internal structure of such
an object.
>>> d = Decimal('15.0')
>>> d.as_tuple()
DecimalTuple(sign=0, digits=(1, 5, 0), exponent=-1)
Here is what this suggests about the internal structure of the object.
And in fact, you can use this same information, if you choose, to construct
a Decimal object directly. Place a tuple inside parentheses, and then use the
information to initialize an object:
>>> d = Decimal((0, (3, 1, 4), -2))
>>> print(d)
3.14
The general structure of such a tuple—a tuple that fully describes the state
of a Decimal object—is shown here.
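In general the tuple has the form (sign, (digit, digit, ...), exponent), where sign is 0 for a positive number and 1 for a negative one. The sketch below takes a Decimal apart and rebuilds it two ways; the variable names are arbitrary.

```python
from decimal import Decimal

sign, digits, exponent = Decimal('-15.0').as_tuple()
print(sign, digits, exponent)           # 1 (1, 5, 0) -1

# Rebuild the object directly from the three parts.
rebuilt = Decimal((sign, digits, exponent))

# Or compute the value by hand: (-1)**sign * digits * 10**exponent.
coefficient = int(''.join(str(dgt) for dgt in digits))      # 150
value = (-1) ** sign * coefficient * Decimal(10) ** exponent
print(rebuilt, value)                   # -15.0 -15.0
```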
total = Decimal('0.00')
while True:
    s = input('Enter amount in dollars and cents (#.##): ')
    if not s:
        break
    d = Decimal(s)
    d = round(d, 2)
    total += d
Given this choice, inheritance is probably the better way to go; it’s also more
in keeping with the spirit of object orientation, which says that the relationship
“A is a kind of B, only more specialized,” is really an inheritance relationship.
[Figure: a Money object contains a Decimal field, dec_amt, and a str field, units.]
class Money():
But if this is all you can do, it’s not impressive. The next thing to add is the
ability to print Money objects in a meaningful and automatic way. Right now,
if you print m1, it’s not very useful.
>>> print(m1)
<_ _main_ _.Money object at 0x103cc6f60>
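One way to get a more useful printout is to define __str__ (and, for the interactive prompt, __repr__). The following is a minimal sketch that assumes the two fields are named dec_amt and units and that USD is the default; the exact output format is a choice made here.

```python
from decimal import Decimal


class Money:
    def __init__(self, v='0', units='USD'):     # assumed signature
        self.dec_amt = Decimal(v)
        self.units = units

    def __str__(self):
        # Used by print() and str().
        return '{} {}'.format(self.dec_amt, self.units)

    def __repr__(self):
        # Used when the object is echoed at the interactive prompt.
        return "Money('{}', '{}')".format(self.dec_amt, self.units)


m1 = Money('1.10', 'USD')
print(m1)               # 1.10 USD
print(repr(m1))         # Money('1.10', 'USD')
```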
So, for example, the value for the USDCAD key is 0.75, meaning that a
Canadian-dollar figure is multiplied by 0.75 to get its equivalent in U.S. dol-
lars. Now the final version of the function can apply the currency-exchange
rate whenever two different currencies are added together.
The dictionary stores the exchange rates as Decimal objects, thereby mak-
ing the subsequent arithmetic easier to perform.
    def __add__(self, other):
        '''Money add function.
        Supports two Money objects added together; if
        the second has a different currency unit, then
        exchange rate must be applied before adding the
        two amounts together. Apply rounding of 2.
        '''
        if self.units != other.units:
            r = Money.exch_dict[self.units + other.units]
            m1 = self.dec_amt
            m2 = other.dec_amt * r
            m = Money(m1 + m2, self.units)
        else:
            m = Money(self.dec_amt + other.dec_amt,
                      self.units)
        m.dec_amt = round(m.dec_amt, 2)
        return m
Let’s step through how this function works. As the comments (or rather,
the doc string) point out, an exchange rate may be applied before the amounts
are added together, assuming the units are not the same (such as U.S. dollars
versus Canadian dollars). Although exchange rates are expressed as floating
point in most locations, we store those rates here as Decimal objects, so that
fewer conversions need to be done.
r = Money.exch_dict[self.units + other.units]
m1 = self.dec_amt
m2 = other.dec_amt * r
m = Money(m1 + m2, self.units)
In either case—whether an exchange rate is applied or whether it isn't—we
also want a rounding factor of 2 to be applied, so that the money is always
expressed with two digits of precision past the decimal point.
m.dec_amt = round(m.dec_amt, 2)
The new Money object, m, is finally returned by the _ _add_ _ function.
Note Ë This function definition works correctly, of course, as long as the three
supported currencies are used. If units other than USD, CAD, or EUR are used, a
KeyError exception results whenever mixed currencies are added.
Ç Note
Putting it all together, here’s the complete Money class. It’s not really com-
plete, of course, because there are many operations we still could add, such as
subtraction and multiplication by integers.
from decimal import Decimal

class Money():
    '''Money Class.
    Stores both a Decimal amount and currency units. When
    objects are added, exchange rate will be applied if
    the currency units differ.
    '''
    exch_dict = {
        'USDCAD': Decimal('0.75'), 'USDEUR': Decimal('1.16'),
        'CADUSD': Decimal('1.33'), 'CADEUR': Decimal('1.54'),
        'EURUSD': Decimal('0.86'), 'EURCAD': Decimal('0.65')
    }
        if self.units != other.units:
            r = Money.exch_dict[self.units + other.units]
            m1 = self.dec_amt
            m2 = other.dec_amt * r
            m = Money(m1 + m2, self.units)
        else:
            m = Money(self.dec_amt + other.dec_amt,
                      self.units)
        m.dec_amt = round(m.dec_amt, 2)
        return m
That’s the (for now) complete class definition—although, as mentioned,
there are many operations you might want to add.
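For experimenting, a self-contained version of the class can be handy. In the sketch below, the exchange-rate table and the __add__ logic follow the code discussed above, while the __init__ signature, the __str__ format, and the add_all helper (a non-interactive stand-in for an adding-machine loop) are assumptions.

```python
from decimal import Decimal


class Money:
    exch_dict = {
        'USDCAD': Decimal('0.75'), 'USDEUR': Decimal('1.16'),
        'CADUSD': Decimal('1.33'), 'CADEUR': Decimal('1.54'),
        'EURUSD': Decimal('0.86'), 'EURCAD': Decimal('0.65')
    }

    def __init__(self, v='0', units='USD'):     # assumed signature
        self.dec_amt = Decimal(v)
        self.units = units

    def __str__(self):                          # assumed output format
        return '{} {}'.format(self.dec_amt, self.units)

    def __add__(self, other):
        if self.units != other.units:
            # Convert the right operand into the left operand's units.
            r = Money.exch_dict[self.units + other.units]
            m = Money(self.dec_amt + other.dec_amt * r, self.units)
        else:
            m = Money(self.dec_amt + other.dec_amt, self.units)
        m.dec_amt = round(m.dec_amt, 2)
        return m


def add_all(entries):
    """Hypothetical helper: sum strings such as '2.00 CAD'.

    The first entry determines the units of the total.
    """
    total = None
    for s in entries:
        m = Money(*s.split())       # '1.05' or '2.00 CAD'
        total = m if total is None else total + m
    return total


print(add_all(['1.05', '2.00 CAD', '1.5 EUR', '1.00', '2.5 CAD']))  # 7.16 USD
```

Because the result takes the units of the left operand, the order of the entries decides the currency of the total.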
money_calc()
The final line of this code, which executes the function, makes it into a
complete program.
There’s a subtlety to this function. It’s desirable to let the first choice of cur-
rency (the units entered for the first line) determine the currency used for the
final answer. This gives the user control of the results. Perhaps you want the
results to be expressed in Canadian dollars or Euros, for example; you simply
need to make sure the first entry uses those units.
The problem is, we’re keeping a running total, and the usual way of keep-
ing a running total is to start with an initial zero value. Here’s an example:
amt = Money('0')
The problem here is that right now, USD is the default value for units; therefore,
this initial choice, through the logic of the program, would predetermine
that every result of this program is expressed in U.S. dollars.
What we’d like to do instead is to let the user determine the currency of the
final results based on the first entry. But that means that we can’t start with an
initial zero value; it has to be set by the user.
Therefore, the variable n is used to record how many entries have been
made. If and only if an item is the first entry, the variable amt is created for the
first time.
if n == 0:
    amt = m
else:
    amt += m
n += 1
Note that addition assignment is supported, for Money as well as integers.
This is a general feature of Python. If there’s an _ _add_ _ function for the
class, you get both + and += operators supported for free, even though you
didn’t write an _ _iadd_ _ function. (However, as explained in Chapter 9, you
can’t take advantage of the fact that += is an in-place operation.)
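This fallback is easy to observe. In the sketch below, Amount is a throwaway class invented for the demonstration; it defines __add__ but no __iadd__.

```python
class Amount:
    """Throwaway class: supports + but defines no __iadd__."""
    def __init__(self, n):
        self.n = n

    def __add__(self, other):
        return Amount(self.n + other.n)


a = Amount(1)
before = a              # remember the original object
a += Amount(2)          # falls back to a = a.__add__(Amount(2))
print(a.n, a is before, before.n)   # 3 False 1
```

The name a now refers to a new object, and the original is left unchanged; that is why the operation is not truly in place.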
When the program runs, it prompts the user for a series of values, just as
other adding machine applications in this book have done. When the user
enters an empty string (by just pressing Enter), the function breaks the loop
and then gives the total.
Here’s a sample session.
Enter money value: 1.05
Enter money value: 2.00 CAD
Enter money value: 1.5 EUR
Enter money value: 1.00
Enter money value: 2.5 CAD
Enter money value:
Total is 7.16 USD
Notice how this session successfully added three different kinds of curren-
cies. The final result is expressed in terms of U.S. dollars because the first
entry, by default, was in U.S. dollars.
Here’s a sample session that gives the result in Canadian dollars:
Enter money value: 1.50 CAD
Enter money value: 1.75 CAD
Enter money value: 2.00 USD
Enter money value: 1.00 USD
Enter money value:
Total is 7.24 CAD
default_curr = 'USD'
Then we need to alter the _ _init_ _ function. This is trickier than it
sounds, because although you can refer to class variables from within a
method definition, you can’t use such a reference in the argument list. So the
following causes an error:
    # This causes an ERROR!
    def __init__(self, v='0', units=Money.default_curr):
It’s frustrating that we can’t do this. However, the following definition of
the _ _init_ _ function works perfectly well, by replacing the default value
(an empty string) with the value stored in default_curr.
    def __init__(self, v='0', units=''):
        self.dec_amt = Decimal(v)
        if not units:
            self.units = Money.default_curr
        else:
            self.units = units
With the changes (shown in bold) made to the _ _init_ _ function, the class
variable, default_curr, now becomes in effect the default value for units.
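The effect can be seen in a stripped-down sketch that keeps only the pieces needed here. The class variable is looked up each time __init__ runs, so a later change affects every object created afterward.

```python
from decimal import Decimal


class Money:
    default_curr = 'USD'

    def __init__(self, v='0', units=''):
        self.dec_amt = Decimal(v)
        # Looked up at call time, not when the method is defined.
        self.units = units if units else Money.default_curr


a = Money('1.00')                   # created while the default is USD
Money.default_curr = 'CAD'
b = Money('2.00')                   # picks up the new default
c = Money('3.00', 'EUR')            # explicit units still win
print(a.units, b.units, c.units)    # USD CAD EUR
```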
Finally, the money_calc function can easily be altered so that the units
entered for the first item become the new default setting for the class. One line
of code needs to be added, about three-quarters of the way through the loop.
if n == 0:                          # If this is first entry...
    amt = m                         # Create amt!
    Money.default_curr = m.units
With this change, the application now enables the user to specify a default
different from U.S. dollars. All they have to do is specify the new default in the
first money object they enter. For example, the user in the following sample
session causes Canadian dollars (CAD) to be the default.
Enter money value: 1.0 CAD
Enter money value: 2.05
Enter money value: .95
Enter money value: 2
Enter money value:
Total is 6.00 CAD
In this case, it’s easy to see that both the units used for the total, and the
units used as the default currency, are Canadian dollars, and not U.S. dollars.
And in this next sample session, you can see that the default remains Cana-
dian dollars, even if a different currency is entered in the middle of the series.
Enter money value: 2.0 CAD
Enter money value: -1
Enter money value: 10 USD
Enter money value: 5.01
Enter money value: -5.01
Enter money value:
Total is 14.30 CAD
You can see that all the Canadian amounts cancel out except for one Cana-
dian dollar. A figure of 10 U.S. dollars was also entered. But the final result
is printed in Canadian dollars—because the first figure was in CAD. So,
although the sum contains 10 U.S. dollars, it’s converted to the equivalent in
CAD, plus the one Canadian dollar that was not canceled out, giving you 10
U.S. dollars in Canadian dollars (13.30), plus one Canadian dollar (1.00), for
a grand total of 14.30.
You should note that changing the default units for the class is a little tricky;
such a change affects all subsequent uses of the class as long as the program is
running. (However, it does not affect future running of the program.) But if
you show a little care, this shouldn’t be a problem.
class Money(Decimal):
m = Money('0.11', 'USD')
print(m, m.units)
Note Ë We can further generalize this code as follows, in which d is data in the
base class, and other_data is data in the subclass, which should be initial-
ized in _ _init_ _.
class MyClass(MySuperClass):
    def __new__(cls, d, other_data):
        return super(MyClass, cls).__new__(cls, d)
Ç Note
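Applied to Money, the pattern might look like the following sketch. The default of 'USD' for units is an assumption; the division of labor is the point: __new__ hands the numeric value to Decimal, and __init__ stores the extra field.

```python
from decimal import Decimal


class Money(Decimal):
    def __new__(cls, v, units='USD'):
        # Decimal is immutable, so its value must be set here.
        return super().__new__(cls, v)

    def __init__(self, v, units='USD'):
        # The additional, subclass-only data is set as usual.
        self.units = units


m = Money('0.11', 'USD')
print(m, m.units)       # 0.11 USD
```

Both methods receive the same arguments, which is why each one must accept v and units even though it uses only one of them.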
Note Ë It may seem unreasonable that Python doesn’t let you use the easy
approach to subclassing another type, as shown earlier with the hypothetical
superclass Thingie.
There are a number of reasons that’s not feasible in Python. For one thing,
if a superclass is immutable, that means its data can never be changed after
it’s created. Also, some built-in classes make use of the _ _new_ _ function to
initialize values, in addition to other actions, so that calling upon the super-
class’s __init__ function is inadequate. The basic rule is this: If subclassing
a built-in type the ordinary way doesn’t work, you might need to override
__new__.
Ç Note
[Figure: a Fraction object contains two int fields, numerator and denominator.]
Some issues arise, but these are all handled smoothly by the class. For
example, 1/2, 2/4, and 100/200 are all mathematically equivalent. But thanks
to internal methods, these are all reduced to the same internal representation
automagically. Here’s an example. First, we need to import the class.
from fractions import Fraction
Be sure to enter this statement exactly as shown. The word fractions is
lowercase and plural; the word Fraction is uppercase and singular! Why the
inconsistency, we’re not sure.
In any case, after the class is imported, it can be used to deal with Fraction
objects in a consistent, highly convenient way. Let’s look again at the problem
of dealing with 1/2, 2/4, and 100/200.
fr1 = Fraction(1, 2)
fr2 = Fraction(2, 4)
fr3 = Fraction(100, 200)
print('The fractions are %s, %s, & %s.' % (fr1, fr2, fr3))
This example prints
The fractions are 1/2, 1/2, & 1/2.
All these Fraction objects are displayed as the same quantity, because
they’re automatically reduced to their simplest form.
>>> if fr1 == fr2 and fr2 == fr3:
        print('They are all equal!')
Note Ë By using one of the shortcuts pointed out in Chapter 4, you can replace
the condition in this example by chaining the comparisons, producing a
shorter version.
>>> if fr1 == fr2 == fr3:
        print('They are all equal!')
Ç Note
Fractions can be specified in other ways. For example, if only one integer
is given during initialization, the class stores it as that integer divided by 1
(which is a ratio, of course). Here’s an example:
>>> fr1 = Fraction(5)
>>> print(fr1)
5
Therefore, 1/2, 1/3, and 5/12, when added together, produce 5/4. You can
verify for yourself that this answer is correct.
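The check takes only a few lines. The sketch also shows that a Fraction can be built from a string.

```python
from fractions import Fraction

total = Fraction(1, 2) + Fraction(1, 3) + Fraction(5, 12)
print(total)                        # 5/4

# Common denominator 12: 6/12 + 4/12 + 5/12 = 15/12, which reduces to 5/4.
print(total == Fraction(15, 12))    # True
print(Fraction('3/6'))              # 1/2 (strings are accepted too)
```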
total = Fraction('0')
while True:
    s = input('Enter fraction (press ENTER to quit): ')
    s = s.replace(' ', '')      # Elim. spaces, just in case.
If that’s a problem, the only thing to be said is “Turn back now, or else aban-
don hope, all ye who enter.” But professional mathematicians have worked
out a series of techniques for dealing with such numbers.
Still with us? Okay. The first thing to be said about Python complex num-
bers is that you can write them as literal numbers. Here’s an example:
z = 2.5 + 1.0j
At first glance, z looks like a real number that is the sum of 2.5 and 1.0
times a variable j. But it’s not. It’s a single object, in which the real portion is
2.5 and the imaginary portion is 1.0.
As with other classes we’ve looked at, the complex class produces objects
that themselves are made up of smaller parts. Figure 10.3 displays the struc-
ture of a complex-number object.
Figure 10.3. Structure of a Python complex number (the complex type contains two float fields, real and imag)
3j
Note Ë You might think that spacing changes things here, that entering 0 + 3j
with internal spaces omitted, resulting in 0+3j, changes the interpretation of
the expression. It does not.
Ç Note
Even the expression 3j can be misleading if you’re not careful. Any such
expression is actually part of a complex number.
>>> z = 3j
>>> print(z.real)
0.0
You can, if you choose, have complex numbers with the imaginary portion
currently set to zero. But the use of j ensures complex type.
>>> z = 2 + 0j
>>> print(z)
(2+0j)
>>> print(z.real, z.imag)
2.0 0.0
And here’s another caveat: When you’re writing code that includes complex
numbers, it’s a good idea to avoid making j a variable.
You can convert other numbers to complex, although the imaginary part
will be assumed to be zero. But complex numbers cannot be converted to these
other types (instead, you must assign from .real and .imag portions); they
also cannot be compared to each other or to other numbers by using >, <, >=,
or <=.
z = complex(3.5) # This is valid; z.imag will be 0.
x = float(z) # Not supported!
x = z.real # But this is valid.
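The unsupported operations fail with a TypeError, which the following sketch records instead of letting the program stop. The list name errors is arbitrary.

```python
z = complex(3.5)            # imaginary part is 0.0

errors = []
attempts = (('float', lambda: float(z)),
            ('ordering', lambda: z < 1j))
for label, attempt in attempts:
    try:
        attempt()
    except TypeError:
        errors.append(label)

print(errors)               # ['float', 'ordering']
print(z == 3.5)             # True: == and != are still supported
print(z.real, z.imag)       # 3.5 0.0
```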
This should give you a good grounding in complex numbers in Python,
although most of the discussion has been about input and output formats,
along with the interpretation of literals.
Mathematically, complex numbers are not difficult to handle, given that
floating-point math is already well supported. Addition is obvious, and multi-
plication follows these rules.
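◗ Multiply the four parts together, using distribution to get four results.
◗ There will be a real portion (real times real).
◗ There will be a portion with two factors of j (imaginary times imaginary).
Because j times j is -1, this portion is real; flip its sign and add it to the
real portion.
◗ There will be two portions with one factor of j each. Add these together to get
the new imaginary portion.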
That’s how it’s done! When you understand these simple rules, complex
math is not such a mystery, after all.
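The rules can be written out and compared with Python's built-in complex multiplication. The function name complex_mul is an invention for this sketch.

```python
def complex_mul(a, b, c, d):
    """Multiply (a + bj) by (c + dj); return (real, imag)."""
    real = a * c - b * d    # real*real, plus j*j portion (j*j is -1)
    imag = a * d + b * c    # the two portions with one factor of j
    return real, imag


z = (2 + 3j) * (4 - 5j)
print(complex_mul(2, 3, 4, -5))     # (23, 2)
print(z)                            # (23+2j)
```

Worked by hand: 8 - 10j + 12j - 15(j*j) = 8 + 15 + 2j = 23 + 2j.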
Chapter 10 Summary
Most programming, or at least much of it, focuses on working with integers
and floating-point numbers, but for certain areas of the data-processing indus-
try, other data types may work better. Foremost among these is a Decimal, or
fixed-point type, which can hold dollar-and-cents figures with more precision
and accuracy than other data types can.
This chapter has shown that Python’s support for alternative data formats
is very strong. You can easily utilize the Decimal, Fraction, and complex
classes in your own programs, without having to download anything off the
Internet; the complex type doesn’t even require importing.
You can also come up with your own classes, building on the existing ones.
And, although you can download a Money class from the Internet, this chap-
ter showed how to start creating your own Money class, using the techniques
introduced in Chapter 9, “Classes and Magic Methods.”
But not everything is as easy as it looks. Inheriting from an immutable class
such as Decimal requires a particular coding technique shown in this chapter.
a floating-point value?
5 How easy is it to combine Decimal objects with integers in an arithmetic
expression?
A series of random numbers should exhibit certain behaviors.
These are easy qualities to test. By running tests with a different number of
trials, you should be able to see the ratio of actual hits to predicted hits get
closer to 1.0. Here’s a function designed to test these qualities.
import random

def do_trials(n):
    hits = [0] * 10
    for i in range(n):
        a = random.randint(0, 9)
        hits[a] += 1
    for i in range(10):
        fss = '{}: {}\t {:.3}'
        print(fss.format(i, hits[i], hits[i]/(n/10)))
This function begins by creating a list with 10 elements. Each of these ele-
ments holds a count of hits: For example, hits[0] will store the number of
times a 0 is generated, hits[1] will store the number of times a 1 is generated,
hits[2] will store the number of times a 2 is generated, and so on.
The first loop generates n random numbers, in which each number is an
integer in the range 0 to 9. The elements in the hits list are then updated as
appropriate.
    for i in range(n):
        a = random.randint(0, 9)
        hits[a] += 1
The key statement within this loop, of course, is the call to random.randint,
which (in this case) produces an integer in the range 0 to 9, inclusive, with a
uniform probability of getting any of the various values.
The second loop then prints a summary of the results, showing how many
times each number 0 to 9 was generated and how that number matches against
the predicted number of hits, which is n/10 in each case.
In the following session, the function is used to generate and record the
results of 100 trials.
>>> do_trials(100)
0: 7 0.7
1: 13 1.3
2: 10 1.0
3: 4 0.4
4: 11 1.1
5: 10 1.0
6: 7 0.7
7: 11 1.1
8: 12 1.2
9: 15 1.5
This run of 100 trials shows that n equal to 100 isn’t nearly enough to get convincingly
uniform results. The ratio of actual hits to predicted hits goes from a
low of 0.4 to a high of 1.5. But running 1000 trials produces more even results.
>>> do_trials(1000)
0: 103 1.03
1: 91 0.91
2: 112 1.12
3: 102 1.02
4: 110 1.1
5: 101 1.01
6: 92 0.92
7: 96 0.96
8: 87 0.87
9: 106 1.06
Here the ratios of actual hits to expected hits (n/10) are much closer, on
the whole, to 1.0. They get closer still if we increase the number of trials to
77,000.
>>> do_trials(77000)
0: 7812 1.01
1: 7700 1.0
3: 7840 1.02
4: 7762 1.01
5: 7693 0.999
6: 7470 0.97
7: 7685 0.998
8: 7616 0.989
9: 7736 1.0
Remember, the ratios of actual hits (ranging from 7470 to 7840) to expected
hits (one-tenth of all the trials) make up the third column.
Although this approach is not entirely scientific, it’s sufficient to confirm
the three qualities we expected to see in random-number behavior. Each of the
10 possible values (0 through 9) is produced roughly one-tenth of the time,
variation is seen, and, as the number of trials increases, variation grows
smaller as a percentage of the number of trials. And that’s what we wanted!
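The shrinking variation can also be summarized in a single number per run. The sketch below is an illustration only: the function name spread and the optional seed parameter are assumptions, and the printed figures will differ from run to run.

```python
import random


def spread(n, seed=None):
    """Return max ratio minus min ratio over the ten digits."""
    rng = random.Random(seed)       # separate generator; seed is optional
    hits = [0] * 10
    for _ in range(n):
        hits[rng.randint(0, 9)] += 1
    ratios = [h / (n / 10) for h in hits]   # actual hits / expected hits
    return max(ratios) - min(ratios)


for n in (100, 10_000, 100_000):
    print(n, round(spread(n), 3))
```

As n grows, the gap between the highest and lowest ratio should generally narrow.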
Here’s a sample session. Assume that the function call random.randint(1, 50)
returns the value 31. The user doesn’t learn that this value has been selected
until the end of the game.
Enter guess: 25
Too low. Enter guess: 37
Too high. Enter guess: 30
Too low. Enter guess: 34
Too high. Enter guess: 32
Too high. Enter guess: 31
Success! You win.
This game can be improved in a couple of ways. First, it should ask users
whether they want to play again after each game. Second, if users get bored
during any given round of the game, they should be able to exit early. Here’s
the improved version.
import random

def play_the_game():
    n = random.randint(1, 50)
    while True:
        guess = int(input('Enter guess (0 to exit): '))
        if guess == 0:
            print('Exiting game.')
            break
        elif guess == n:
            print('Success! You win.')
            break
        elif guess < n:
            print('Too low.', end=' ')
        else:
            print('Too high.', end=' ')

while True:
    play_the_game()
    ans = input('Want to play again? (Y or N): ')
    if not ans or ans[0] in 'Nn':
        break
The shuffle function is one of the most useful in the random package. As you
might guess, it is especially useful for simulating a deck of cards, but it
extends to other situations as well.
The action of shuffle is to rearrange the order of a list so that any element
can appear in any position. The number of elements does not change, nor does
the number of duplicate items, if any. So, for example, suppose you shuffle the
following list:
kings_list = ['John', 'James', 'Henry', 'Henry', 'George']
Next, we use random.shuffle to randomize the order.
random.shuffle(kings_list)
If you now print the list, you’ll see that no matter how the shuffling went,
there are still two Henrys, and one each of John, James, and George. The
order, however, will almost certainly change.
The shuffling algorithm is a fairly universal one.
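One widely used algorithm of this kind is the Fisher-Yates shuffle. The sketch below is offered as an illustration under that assumption; it is not necessarily the exact procedure intended here, and my_shuffle is an invented name.

```python
import random


def my_shuffle(items):
    """Shuffle a list in place (Fisher-Yates sketch)."""
    # Walk from the last position down to 1, swapping each element
    # with a randomly chosen element at or before its position.
    for i in range(len(items) - 1, 0, -1):
        j = random.randint(0, i)        # 0 <= j <= i
        items[i], items[j] = items[j], items[i]


kings_list = ['John', 'James', 'Henry', 'Henry', 'George']
my_shuffle(kings_list)
print(kings_list)       # same names, order varies from run to run
```

Each element is only ever swapped, never copied or dropped, so the list keeps the same contents, duplicates included.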
class Deck():
    def deal(self):
        if self.size - self.current_card < 1:
            random.shuffle(self.card_list)
            self.current_card = 0
            print('Reshuffling...!!!')
        self.current_card += 1
        return self.card_list[self.current_card - 1]
The value “dealt” by a deck, by the way, can be turned into a playing card
with a unique combination of rank and suit.
ranks = ['2', '3', '4', '5', '6', '7', '8', '9',
         '10', 'J', 'Q', 'K', 'A']
suits = ['clubs', 'diamonds', 'hearts', 'spades']
my_deck = Deck(52)
# Deal 12 hands of 5 cards each.
for hand in range(12):
    for card in range(5):
        d = my_deck.deal()
        r = d % 13          # rank index, 0 to 12
        s = d // 13         # suit index, 0 to 3
        print(ranks[r], 'of', suits[s])
    print()
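To run code like this, the Deck class needs an __init__ method. The one in the sketch below is an assumption: it represents cards as the integers 0 through size - 1 and shuffles them once at creation. The card_name helper is also invented here.

```python
import random


class Deck():
    def __init__(self, size):
        # Assumption: cards are the integers 0 .. size-1.
        self.card_list = list(range(size))
        self.size = size
        self.current_card = 0
        random.shuffle(self.card_list)

    def deal(self):
        if self.size - self.current_card < 1:
            random.shuffle(self.card_list)
            self.current_card = 0
            print('Reshuffling...!!!')
        self.current_card += 1
        return self.card_list[self.current_card - 1]


ranks = ['2', '3', '4', '5', '6', '7', '8', '9',
         '10', 'J', 'Q', 'K', 'A']
suits = ['clubs', 'diamonds', 'hearts', 'spades']


def card_name(d):
    """Map a number 0-51 to a unique rank-and-suit combination."""
    return ranks[d % 13] + ' of ' + suits[d // 13]


my_deck = Deck(52)
names = [card_name(my_deck.deal()) for _ in range(52)]
print(len(set(names)))      # 52: every card appears exactly once
```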
The Deck class has some limitations. When the deck is reshuffled, there will
still be some cards in play—that is, cards still on the table. Those do not get
shuffled back in. Instead, the shuffled deck is created from the discard pile only.
new hand is dealt do the cards in play join the discard pile. This creates a rela-
tionship between the deck, the cards in play, and the discard pile, as shown in
Figure 11.1.
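[Figure 11.1: Deal moves cards from Card_list (the remaining deck) to the cards in play; New Hand moves the cards in play to the discard list; Reshuffle returns the discard list to Card_list.]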
At one time, this is how the game of blackjack (also known as twenty-one)
was played in casinos. Occasionally it still is: one standard deck, dealt all the
way down to the last card, and then reshuffled.
We can rewrite the Deck object as follows.
import random

class Deck():
    def deal(self):
        if len(self.card_list) < 1:
            random.shuffle(self.discards_list)
            self.card_list = self.discards_list
            self.discards_list = []
            print('Reshuffling...!!!')
        new_card = self.card_list.pop()
        self.cards_in_play_list.append(new_card)
        return new_card

    def new_hand(self):
        self.discards_list += self.cards_in_play_list
        self.cards_in_play_list.clear()
This class definition has one new method, new_hand, which should be
called whenever a hand is finished. It adds all the cards currently in play to
discards_list and then clears cards_in_play_list.
The changes to the deal method are more involved. Now, instead of just
shuffling the card_list, which normally contains all the cards in the deck,
only the discard pile is shuffled. The shuffled list is then assigned to
card_list; this becomes the new deck to draw from. Then discards_list is
reset to an empty list.
If there is a reshuffle while cards are still on the table, those cards will not
be reshuffled, so the resulting deck size may not be the same. But then how
do those cards in play ever get back into the deck? Simple. They will be added
to the discards at the end of the current hand and then eventually reshuffled
back into the deck.
Note: You might want to make further changes to this class, based on changing
rules of blackjack in Las Vegas casinos. For example, you might want to
accommodate the six-deck “shoe” that most casinos use. That’s actually just
a matter of allocating the right deck size; it doesn’t alter the code shown here.
You also might want to revise some of the methods so that the dealer has a
way to reshuffle early (for example, by writing a new method to do just that).
    def __init__(self, n_decks=1):    # Signature restored; the default of 1 is an assumption
        self.card_list = [num + suit
            for suit in '\u2665\u2666\u2663\u2660'
            for num in 'A23456789TJQK'
            for deck in range(n_decks)]
        self.cards_in_play_list = []
        self.discards_list = []
        random.shuffle(self.card_list)
Note that this version of the program creates a deck that’s a multiple of the
standard 52-card deck. Creating “decks” that have multiple decks within them
might be a good way of simulating a six-deck “shoe” played in Las Vegas.
Given this version of the __init__ method, the Deck object now contains
representations of cards that appear as follows, if you were to print them all.
A♥ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ 9♥ T♥ J♥ Q♥ K♥
A♦ 2♦ 3♦ 4♦ 5♦ 6♦ 7♦ 8♦ 9♦ T♦ J♦ Q♦ K♦
A♣ 2♣ 3♣ 4♣ 5♣ 6♣ 7♣ 8♣ 9♣ T♣ J♣ Q♣ K♣
A♠ 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ T♠ J♠ Q♠ K♠
Here’s a complete version of the revised Deck class, along with a small program
that prints a hand of five cards (as in Poker). This version assumes a six-deck
shoe, although you can easily revise it to use only one deck.
# File deck_test.py
# ---------------------------------------
import random
class Deck():
    def deal(self):
        if len(self.card_list) < 1:
            random.shuffle(self.discards_list)
            self.card_list = self.discards_list
            self.discards_list = []
            print('Reshuffling...!!!')
        new_card = self.card_list.pop()
        self.cards_in_play_list.append(new_card)
        return new_card
    def new_hand(self):
        self.discards_list += self.cards_in_play_list
        self.cards_in_play_list.clear()
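The listing above stops before the constructor and the test program appear. As a hedged sketch of how the pieces might fit together, the following combines the __init__ body shown earlier with deal and new_hand, then deals three five-card hands from a six-deck shoe. The constructor signature (n_decks with a default of 1) and the driver loop are assumptions made for illustration.

```python
import random

class Deck:
    def __init__(self, n_decks=1):     # signature is an assumption
        self.card_list = [num + suit
                          for suit in '\u2665\u2666\u2663\u2660'
                          for num in 'A23456789TJQK'
                          for deck in range(n_decks)]
        self.cards_in_play_list = []
        self.discards_list = []
        random.shuffle(self.card_list)

    def deal(self):
        # When the deck runs out, rebuild it from the discards only.
        if len(self.card_list) < 1:
            random.shuffle(self.discards_list)
            self.card_list = self.discards_list
            self.discards_list = []
            print('Reshuffling...!!!')
        new_card = self.card_list.pop()
        self.cards_in_play_list.append(new_card)
        return new_card

    def new_hand(self):
        # Cards on the table join the discard pile.
        self.discards_list += self.cards_in_play_list
        self.cards_in_play_list.clear()

dk = Deck(6)                            # six-deck shoe: 312 cards
for _ in range(3):
    hand = [dk.deal() for _ in range(5)]
    print(' '.join(hand))
    dk.new_hand()
```

At every point the three lists together hold all 312 cards, which is a convenient invariant to check when modifying the class.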
NUMBER OF STANDARD DEVIATIONS    PERCENT OF POPULATION (AS PREDICTED)
One                              68 percent, on average, should fall within one standard deviation of the mean.
Two                              95 percent, on average, should fall within two standard deviations of the mean.
Three                            99.7 percent, on average, should fall within three standard deviations of the mean.
Here’s how to read Table 11.2. As an example, suppose you have a normal
distribution with a mean of 100 and a standard deviation of 20. You should
expect, in the long run, about 68 percent of the numbers produced by the
normalvariate function to fall between 80 and 120. You should expect 95
percent of the numbers produced to fall between 60 and 140.
Yet the probability distributions in the random package are just that:
probability distributions. In the short run especially, nothing is certain.
For a given trial, the probability that a number will fall into the range 60
to 140, given the conditions outlined here, is 95 percent; there’s a 5 percent
chance of falling outside the range.
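One way to check these percentages empirically is to draw many values and count how many land within one, two, and three standard deviations of the mean. The sketch below does that for a mean of 100 and a standard deviation of 20 (so two standard deviations span 60 to 140); the helper name fraction_within is hypothetical.

```python
import random

def fraction_within(n, mean, sd, k):
    # Fraction of n samples that fall within k standard deviations.
    hits = 0
    for _ in range(n):
        x = random.normalvariate(mean, sd)
        if abs(x - mean) <= k * sd:
            hits += 1
    return hits / n

for k in (1, 2, 3):
    print(k, 'standard deviation(s):',
          round(fraction_within(100_000, 100, 20, k), 3))
```

With 100,000 samples per row, the printed fractions should sit close to 0.68, 0.95, and 0.997, though each run differs slightly.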
But that’s not saying that such occurrences cannot happen. Events with
only a 5 percent probability can and do happen all the time. And events with
probabilities of 1 in a million or less happen every day—every time someone
wins the lottery!
Therefore, if you take only a few sample results, you may not see anything
that looks like a bell-shaped curve. Fortunately, because of the Law of Large
Numbers, demonstrated in Section 11.3, “Testing Random Behavior,” if you
take many sample values, you should see behavior that is fairly predictable.
The following program is designed to take advantage of the Law of Large
Numbers by allowing for an arbitrarily large number of sample results, scal-
ing down the numbers so that they can be easily graphed, and then printing
the resulting graph.
import random
def pr_normal_chart(n):
    hits = [0] * 20
    for i in range(n):
        x = random.normalvariate(100, 30)
        j = int(x/10)
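The listing breaks off at this point. As a rough sketch of how such a chart function could be completed, the version below counts hits in 20 bins of width 10 and prints one row of asterisks per bin. The bounds check, the width parameter, and the scaling relative to the tallest bin are all assumptions made here; they are not taken from the original listing.

```python
import random

def pr_normal_chart(n, width=40):
    hits = [0] * 20
    for _ in range(n):
        x = random.normalvariate(100, 30)
        # Ignore the rare values outside the charted range 0..200.
        if 0 <= x < 200:
            hits[int(x / 10)] += 1
    # Scale so the tallest bin prints `width` asterisks (assumed scaling).
    peak = max(hits)
    for j, count in enumerate(hits):
        bar = '*' * (count * width // peak) if peak else ''
        print(f'{j * 10:3d} {bar}')
    return hits

hits = pr_normal_chart(20_000)
```

Because the bar lengths are scaled to the tallest bin, the chart stays about the same size no matter how many trials are requested.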
Figure 11.2. Normal distribution after 500 trials
But this figure used only 500 trials, which is not that large a sample for sta-
tistical purposes; it should reveal the general pattern but deviate significantly
in places, and it does.
In Figure 11.3, the number of trials is increased from 500 to 199,000.
Because of the scaling written into the function, the overall number of aster-
isks to be printed does not significantly change. But now the shape conforms
much more closely to a mathematically perfect bell curve.
With samples larger than 199,000 (200,000 or so), you should continue to
get results that—at this rough level of granularity—look like a mathemati-
cally perfect bell curve.
Although a function containing yield doesn’t seem to return an object, it
does: It returns a generator object, which is a kind of iterator. The generator
object is what actually yields values at run time.
So—and this is the strange part—the function describes what the generator
does, but the generator itself is actually an object returned by the function!
Admittedly, this is a little counterintuitive.
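Here’s a simple random-number generator, which produces integer values. The
final step reduces each value modulo 2 ** 32 (roughly 4.3 billion, the range of
a four-byte unsigned integer), although the preceding reduction modulo p2
already keeps every result below about 2.4 billion.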
import time
def gen_rand():
    p1 = 1200556037
    p2 = 2444555677
    max_rand = 2 ** 32
    r = int(time.time() * 1000)
    while True:
        n = r
        n *= p2
        n %= p1
        n += r
        n *= p1
        n %= p2
        n %= max_rand
        r = n
        yield n
The result is a random-number generator that (and you can verify this
yourself) seems to meet the obvious statistical tests for randomness quite well.
It is still a relatively simple generator, however, and in no way is intended to
provide the best possible performance. It does observe some basic principles of
randomness.
With this generator function defined, you can test it with the following
code:
>>> gen_obj = gen_rand()
>>> for i in range(10): print(next(gen_obj))
1351029180
211569410
1113542120
1108334866
538233735
1638146995
1551200046
1079946432
1682454573
851773945
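Because a generator object is an iterator, you can also pull a batch of values from it with itertools.islice instead of calling next in a loop. The sketch below does so; the optional seed parameter is an assumption added here so that a run can be repeated, and it is not part of the original function.

```python
import itertools
import time

def gen_rand(seed=None):
    # The optional seed is an assumption added for repeatable runs.
    p1 = 1200556037
    p2 = 2444555677
    max_rand = 2 ** 32
    r = int(time.time() * 1000) if seed is None else seed
    while True:
        n = r
        n *= p2
        n %= p1
        n += r
        n *= p1
        n %= p2
        n %= max_rand
        r = n
        yield n

# Take the first 10,000 values from a fresh generator object.
sample = list(itertools.islice(gen_rand(seed=12345), 10_000))
print(min(sample), max(sample))
```

Each call to gen_rand creates a new, independent generator object, so two generators started from the same seed produce the same sequence.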
CATEGORY                 DESCRIPTION
Hyperbolic functions     The hyperbolic-function category includes hyperbolic versions of the
                         trigonometric and inverse-trigonometric functions. The names are formed
                         by placing an “h” on the end of the name, giving sinh, cosh, and tanh.
Logarithmic functions    The math package provides a flexible set of logarithmic calculations,
                         including support for a variety of bases. These functions are the inverse
                         of exponentiation. They include log2, log10, and log, for finding logs of
                         base 2, 10, and e, respectively. The last can also be used with any base
                         you specify.
Conversion to integer    Several functions enable conversion of floating-point numbers to integer,
                         including both floor (always rounding down) and ceil (always rounding up).
Miscellaneous            These include pow (power, or exponentiation) and square root, sqrt.
The last two data objects in Table 11.4 are provided for full support of
all the states of a floating-point coprocessor. These values are rarely used
in Python, however, because the language does not allow you to get infinity
through division by zero; such an action raises an exception in Python.
The value math.pi, however, is widely used in math and science applications.
Here’s a simple one: Get the diameter of a circle and return its circumference.
from math import pi
def get_circ(d):
    circ = d * pi
    print('The circumference is', circ)
    return circ
One notable omission from this list of constants is the mathematical value
phi, also known as the golden ratio. But this value is relatively easy to produce
yourself: It’s 1 plus the square root of 5, the result of which is then divided by 2.
import math
phi = (1 + math.sqrt(5))/ 2
Or, without the use of the math package, you could calculate it this way:
phi = (1 + 5 ** 0.5)/ 2
In either case, its closest approximation in Python is 1.618033988749895.
Figure 11.4. A right triangle (labels: hypotenuse, opposite side, adjacent side)
as follows. In Python, as in most other programming languages and librar-
ies, these three functions are implemented as sin, cos, and tan functions,
respectively.
sine(θ) = opposite side <B> / hypotenuse <C>
cosine(θ) = adjacent side <A> / hypotenuse <C>
tangent(θ) = opposite side <B> / adjacent side <A>
So, for example, if the opposite side were one half the length of the adjacent
side, then the tangent would be 0.5.
What has this got to do with the height of trees? Plenty. Consider the follow-
ing scenario: A human observer is stationed 1,000 feet from the base of a tree.
He doesn’t know the height of the tree, but he’s certain about the distance to
the base, because this has been measured before. Using his trusty sextant, he
measures the angle of the top of the tree above the horizon. This gives him an
angle, θ.
Figure 11.5 illustrates this scenario.
Figure 11.5. Figuring the height of a tree (the observer stands 1,000 feet from the base)
Now it takes only a little algebra to come up with the correct formula.
Remember the formula for a tangent function.
tangent(θ) = opposite side <B> / adjacent side <A>
Multiplying both sides by A and rearranging, we get the following rule of
calculation.
opposite side <B> = tangent(θ) * adjacent side <A>
So, to get the height of the tree, you find the tangent of the angle of eleva-
tion and then multiply by the distance to the base, which in this case is 1,000
feet. Now it’s easy to write a program that calculates the height of a tree.
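from math import tan, radians

def get_height(dist, angle):
    # The angle is entered in degrees; convert it to radians first.
    return tan(radians(angle)) * dist

def main():
    while True:
        d = float(input('Enter distance (0 to exit): '))
        if d == 0:
            print('Bye!')
            break
        a = float(input('Enter angle of elevation: '))
        print('Height of the tree is', get_height(d, a))

main()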
The core of the program is the line that does the calculation:
return tan(radians(angle)) * dist
Although this is a simple program, it does have one subtlety, or gotcha.
All the Python trig functions use radians. They do not use degrees unless the
degrees are first converted.
A full circle is defined as having 360 degrees; it also is defined as having
2*pi radians. So if the user is going to use degrees—which most people use
in real life—you need to apply the math.radians function to convert from
degrees to radians (or else just multiply by 2*pi/360).
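The degrees-versus-radians gotcha is easy to see in a short, non-interactive sketch. It compares math.radians with the manual conversion and shows what happens if the conversion is forgotten; the parameter name angle_degrees is chosen here for clarity.

```python
from math import tan, radians, pi, isclose

def get_height(dist, angle_degrees):
    return tan(radians(angle_degrees)) * dist

angle = 7.4
by_function = radians(angle)          # library conversion
by_hand = angle * 2 * pi / 360        # manual conversion
print(by_function, by_hand)

print('Correct height:', get_height(1000, angle))
# Forgetting the conversion treats 7.4 as radians, not degrees.
print('Wrong height:  ', tan(angle) * 1000)
```

The two conversions agree to within floating-point rounding, while the unconverted call gives a height that has nothing to do with the tree.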
Here’s a sample session:
Enter distance (0 to exit): 1000
Enter angle of elevation: 7.4
Height of the tree is 129.87732371691982
Enter distance (0 to exit): 800
Enter angle of elevation: 15
Height of the tree is 214.35935394489815
Enter distance (0 to exit): 0
Bye!
Note: In this example, we used the variation on import statement syntax that
imports specific functions. This is often a good approach if you are sure that
there won’t be conflicts with the particular names that were imported.
from math import tan, radians
Other functions from the math package that are frequently useful are the log-
arithmic functions, listed in Table 11.5.
Now, to understand logarithms using base 10, we only need to reverse the
columns (see Table 11.7). You should notice from these tables how slowly the
exponent increases as the amount does. Logarithmic growth is very slow—
and is always overtaken by simple linear growth.
Remember that some of the results in Table 11.7 are approximate. For
example, if you take the base-10 logarithm of 31,622.777, you’ll get a number
very close to 4.5.
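A quick way to see this relationship is to print a few amounts next to their base-10 logarithms. The amounts below are chosen for illustration, apart from 31,622.777, which comes from the text.

```python
from math import log10

# Each tenfold increase in the amount adds only 1 to the logarithm.
for amount in (10, 100, 1000, 10_000, 31_622.777, 100_000):
    print(f'{amount:>12,} -> {log10(amount):.4f}')
```

The row for 31,622.777 shows a logarithm of approximately 4.5, halfway between the exponents for 10,000 and 100,000.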
number, 2, and then guess either 1 or 3 for the next guess. This guarantees you
need never take more than two guesses, even though there are three values.
With more than three, we might need another guess. But two guesses are suf-
ficient for N = 3.
The next “ceiling” should occur at N = 7, and you should see why. It’s
because you can guess the middle value, 4, and then—if this guess is not suc-
cessful—limit yourself to the top three numbers (requiring two more guesses)
or the bottom three numbers (also requiring two more guesses). Therefore,
three guesses are enough for N = 7.
If you think about it, each step up, requiring an additional guess, can be
obtained by doubling N at the last step and adding 1. For example, Figure 11.6
illustrates how the number of guesses needed increases from 1 to 2 to 3, as N
increases from 1 to 3 to 7.
[Figure 11.6: the numbers 1 through 7, with the 1st choice at the middle value and a 2nd choice on each side of it.]
The numbers in the leftmost column of Table 11.8 list ranges of numbers
in the game; not all numbers are listed, but only those that “step up”—that
is, require an additional guess. Each of these is an upper limit—so that for
N = 15 or less, for example, up to four guesses are required. But for any range
greater than 15, more guesses are required.
The numbers in the leftmost column are each 1 less than a power of 2. When
added to 1, they correspond to a power of 2. Therefore, to get the numbers in
the rightmost column—which contains the number of guesses needed—you
must take logarithm base 2 of N+1.
The final step is to round upward to the nearest integer, because the num-
ber of steps taken must be an integer, and not floating point. The correct for-
mula is
Maximum guesses needed = ceiling(log-base-2(N + 1))
Writing the program, with the aid of the math package, is now easy.
from math import log2, ceil
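The formula can be sketched as a one-line function and checked against a brute-force count. The names max_guesses and guesses_for are hypothetical, and the guessing strategy assumed here is always to pick the middle of the remaining range.

```python
from math import log2, ceil

def max_guesses(n):
    # Maximum guesses needed = ceiling(log-base-2(N + 1))
    return ceil(log2(n + 1))

def guesses_for(target, n):
    # Count the guesses a middle-value strategy needs to hit target.
    lo, hi, count = 1, n, 0
    while True:
        mid = (lo + hi) // 2
        count += 1
        if mid == target:
            return count
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1

for n in (1, 3, 7, 15, 16, 100):
    worst = max(guesses_for(t, n) for t in range(1, n + 1))
    print(n, max_guesses(n), worst)
```

For each N, the formula and the worst case found by simulation agree, and the count steps up exactly when N passes 1, 3, 7, and 15.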
Chapter 11 Summary
This chapter explored how two of the most commonly used packages—
random and math—can be used in practical ways in your programs. Both of
these packages come preinstalled with the Python download.
The random package provides a variety of distributions. The most com-
monly used ones were explored in this chapter: randint, returning a random
integer with uniform distribution across a given range; shuffle, which rear-
ranges the contents of a list as if it were a deck of cards; and normalvariate.
The classic normal probability distribution tends to generate values close to
the specified mean. Outlier values are always possible, but the farther a value
is from the mean, the less frequently it’s generated.
The chapter then showed how to use some of the most common functions
from the math package, including tan, which calculates tangents, and the
logarithmic functions. The Python implementation of logarithms includes
log10, log2, and finally log, which calculates logarithms in any base.
(in which you operate on an array or large portions of that array at the same
time), and high-level support for creating and maintaining multidimensional
arrays.
[Figure: a Python list holding ref1, ref2, and ref3. Each of these is a reference to any kind of object; here they refer to 10, 20, and the list [5, 12, 7, 3].]
As Figure 12.2 shows, an array is simpler in its design. The array object itself is
just a reference to a location in memory. The actual data resides at that location.
Because the data is stored in this way, elements must have a fixed length.
They also need to have the same type. You can’t store an arbitrary Python
integer (which may, in theory, take up many bytes of memory), but you can have
integers of fixed length.
[Figure 12.2: the array object refers to a contiguous block of memory holding the values 10, 20, 30.]
Arrays store data more compactly than lists do. However, indexing arrays
turns out to be a bit slower than indexing Python lists, because Python list-
indexing is heavily optimized behavior.
One of the advantages to using the array package is that if you interact with
other processes or C-language libraries, they may require that you pass data in a
contiguous block of memory, which is how the arrays in this chapter are stored.
To use the array package, import it and then create an array by calling
array.array to allocate and initialize an array object. For example, here’s
how to get a simple array of the numbers 1, 2, 3:
import array
a = array.array('h', [1, 2, 3])
Note the use of 'h' here as the first argument. It takes a single-character
string that specifies the data type—in this case, 16-bit (2-byte) integers (lim-
iting the range to plus or minus 32K). We could create a larger array by using
the range function.
import array
a = array.array('h', range(1000))
This works, but notice that you could not create an array of numbers from
1 to 1 million this way (or 0 to 999,999) without increasing the size of the
data type from short integer ('h') to long integer ('l'), because otherwise
you would exceed what can be stored in a 16-bit integer array.
import array
a = array.array('l', range(1_000_000))
Warning: Don’t try printing this one, unless you’re prepared to wait all day!
At this point, you might object that integers in Python are supposed to
be “infinite” or, rather, that the limits on integers are astronomical. That’s
correct, but you give up this flexibility when you deal with fixed-length
structures.
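The fixed-length limit is easy to observe directly. In this short sketch, appending a value that does not fit in a 16-bit element raises OverflowError, while the same range fits comfortably in an array of long integers.

```python
import array

a = array.array('h', [1, 2, 3])
print('Bytes per element:', a.itemsize)

overflowed = False
try:
    a.append(40_000)           # Too large for a 16-bit signed integer
except OverflowError as err:
    overflowed = True
    print('OverflowError:', err)

big = array.array('l', range(1_000_000))
print(len(big), big[-1])
```

The failed append leaves the original array unchanged, so it still holds three elements.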
One of the limitations of the array package and its array type is that it
supports one-dimensional arrays only.
Note: On a Macintosh system, problems may sometimes arise because Python 2.0
may be preloaded. You may download numpy as described in this section but find
that it is not available for use with IDLE, possibly because the version numbers are
not in sync. If that happens, start IDLE by typing idle3 from within Terminal.
idle3
Note: The data type created by the standard numpy routines is called ndarray.
This stands for “N-dimensional array.”
But why use numpy at all? To understand why, consider the problem of add-
ing up a million numbers—specifically, 1 to 1,000,000.
If you’re mathematically inclined, you may know there’s an algebraic for-
mula that enables you to do this in your head. But let’s assume you don’t know
this formula. You can agree that the task is a good benchmark for the speed of
a language. Here’s how you’d sum up the numbers most efficiently using the
core Python language:
a_list = list(range(1, 1_000_001))
print(sum(a_list))
That’s not bad by the standard of most languages. Here is the numpy-based
code to do the same thing. Notice how similar it looks.
import numpy as np
a = np.arange(1, 1_000_001)
print(sum(a))
In either case, the answer should be 500,000,500,000.
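The algebraic formula mentioned above is n(n + 1)/2, and it gives a handy cross-check on both approaches. In this sketch the array is created with an explicit 64-bit integer type, an assumption made so that the total cannot overflow on platforms where the default integer is 32 bits.

```python
import numpy as np

n = 1_000_000
by_formula = n * (n + 1) // 2                  # closed-form sum of 1..n
by_python = sum(range(1, n + 1))               # core Python
by_numpy = int(np.sum(np.arange(1, n + 1, dtype=np.int64)))
print(by_formula, by_python, by_numpy)
```

All three values should match the figure quoted in the text.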
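from time import time
import numpy as np

def benchmarks(n):
    t1 = time()
    a_list = list(range(1, n + 1))     # Core Python
    tot = sum(a_list)
    t2 = time()
    print('Time taken by Python is', t2 - t1)
    t1 = time()
    a = np.arange(1, n + 1)            # Numpy!
    tot = np.sum(a)
    t2 = time()
    print('Time taken by numpy is ', t2 - t1)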
If this function is used to sum the first ten million numbers, here are the
results, measured in seconds:
>>> benchmarks(10_000_000)
Time taken by Python is 1.2035150527954102
Time taken by numpy is 0.05511116981506348
Wow, that’s a difference of almost 22 to 1. Not bad!
Performance Tip: If you isolate the time of doing the actual addition (as
opposed to creating the initial data set), the contrast is significantly
greater still: about 60 times as fast. Creating these more accurate benchmarks
is left as an exercise at the end of the chapter.
But this section serves as an introduction to the most common ways of creat-
ing numpy arrays, as summarized in Table 12.1.
The sections that follow provide details. Many of these functions enable
you to specify a dtype argument, which determines the data type of each and
every element in a numpy array. This feature lets you create arrays of different
base types. A dtype specifier may be either (1) one of the symbols shown in
Table 12.2 or (2) a string containing the name. In the former case, the symbol
should usually be qualified:
import numpy as np
np.int8 # Used as a dtype
'int8' # Also used as a dtype
The last line of Table 12.2 creates a fixed-length string type. Strings
shorter than this length can be assigned to elements of this type. But strings
that are longer are truncated. For an example, see Section 12.5.8, “The ‘full’
Function.”
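The two ways of specifying a dtype can be compared directly. The sketch below also shows truncation with a fixed-length string type; the 'U3' specifier used here (a Unicode string of length 3) is numpy's own notation and is an assumption, since it may not match the symbol listed in Table 12.2.

```python
import numpy as np

a = np.zeros(3, dtype=np.int8)      # dtype given as a qualified symbol
b = np.zeros(3, dtype='int8')       # dtype given as a string
print(a.dtype, b.dtype, a.itemsize)

# Strings longer than the fixed length are truncated on assignment.
names = np.array(['al', 'tommy'], dtype='U3')
print(names)
```

Both spellings produce the same element type, with one byte per element.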
default, 'K', means to preserve the storage of the source data, whatever it is.
'C' means to use row-major order (which is what the C language uses), and
'F' means to use column-major order (which is what FORTRAN uses).
As an example, you can initialize from a Python list to create a one-
dimensional array of integers.
import numpy as np
a = np.array([1, 2, 3])
You can just as easily create a two-dimensional array, or higher, by using a
multidimensional Python list (a list of lists):
a = np.array([[1, 2, 3], [10, 20, 30], [0, 0, -1]])
Displaying this array within IDLE (by entering a at the prompt) produces
array([[ 1,  2,  3],
       [10, 20, 30],
       [ 0,  0, -1]])
numpy is designed to handle arrays with smooth, rectangular shapes. If you
use higher-dimensional input that is “jagged,” the array conversion cannot
produce a true multidimensional array and must fall back on an array of objects.
So from within IDLE, you write this code:
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [10, 20, 300]])
>>> a
array([[  1,   2,   3],
       [ 10,  20, 300]])
But here’s what happens if the second row is made longer than the first.
(Recent releases of numpy raise a ValueError for jagged input like this unless
you request dtype=object explicitly, as this example does.)
>>> a = np.array([[1, 2, 3], [10, 20, 300, 4]], dtype=object)
>>> a
array([list([1, 2, 3]), list([10, 20, 300, 4])],
      dtype=object)
Now the array is a one-dimensional array of objects (each one being a list)
rather than a true two-dimensional array.
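Because the handling of jagged input has changed across numpy releases, a defensive sketch can try the plain conversion first and fall back to an explicit object array. The fallback pattern is a suggestion made here, not something prescribed by the text.

```python
import numpy as np

rows = [[1, 2, 3], [10, 20, 300, 4]]    # jagged: rows differ in length

try:
    a = np.array(rows)                  # Recent releases raise ValueError
except ValueError as err:
    print('ValueError:', err)
    a = np.array(rows, dtype=object)    # Explicit array of list objects

regular = np.array([[1, 2, 3], [10, 20, 300]])
print(a.shape, a.dtype)
print(regular.shape, regular.dtype)
```

The jagged input ends up with shape (2,), whereas the rectangular input has shape (2, 3).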
Five elements (and not four) were required to get this result, because by
default, the linspace function includes the endpoint as one of the values.
Therefore, num was set to 5. Setting it to 6 gets the following results:
>>> a = np.linspace(0, 1.0, num=6)
>>> a
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])
You can specify any number of elements, as long as the number is a positive
integer. You can specify any data type listed in Table 12.2, although some
are more difficult to accommodate. (The bool type produces unsatisfying
results.) Here’s an example:
>>> np.linspace(1, 5, num=5, dtype=np.int16)
array([1, 2, 3, 4, 5], dtype=int16)
In this case, integers worked out well. However, if you specify a range that
would normally require floating-point values and use an integer type, the func-
tion has to convert many or all of the values to integer type by truncating them.
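A small sketch makes the effect of an integer dtype visible. It builds the same five-point range as floating point and as 16-bit integers, and also shows the endpoint argument of linspace, which excludes the final value when set to False.

```python
import numpy as np

as_float = np.linspace(0, 1, num=5)
as_int = np.linspace(0, 1, num=5, dtype=np.int16)
no_endpoint = np.linspace(0, 1, num=5, endpoint=False)

print(as_float)       # Evenly spaced values from 0 to 1 inclusive
print(as_int)         # Fractions are lost in the integer version
print(no_endpoint)    # Five values, stopping short of 1
```

In the integer version, every value below 1 collapses to 0, so only the last element is nonzero.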
The dtype argument determines the data type of each element. By default,
it is set to 'float'. (See Table 12.2 for a list of dtype settings.)
The order argument determines whether the array is stored in row-major
or column-major order. It takes the value 'C' (row-major order, as in C) or
'F' (column-major order, as in FORTRAN). C is the default.
The following example creates a 2 × 2 array made up of 16-bit signed
integers.
import numpy as np
a = np.empty((2, 2), dtype='int16')
Displaying this array in IDLE (and thereby getting its canonical representa-
tion) produces
array([[ 0,  0],
       [ 0, -3]], dtype=int16)
Your results may vary, because the data in this case is uninitialized and
therefore unpredictable.
Here’s another example. Although the numbers may look random, don’t rely on
this “randomness.” It’s better to consider such uninitialized values to be
“garbage,” which means you shouldn’t use them.
a = np.empty((3, 2), dtype='float32')
Displaying this array in IDLE produces
array([[1.4012985e-45, 2.3509887e-38],
       [9.1835496e-41, 3.5873241e-43],
       [1.4012985e-45, 2.3509887e-38]], dtype=float32)
Here’s an example:
a = np.eye(4, dtype='int')
Displaying this array in IDLE produces
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1]])
Or we can create a floating-point version, using the dtype default,
'float', and making it somewhat larger: 6 × 6 instead of 4 × 4.
a = np.eye(6)
The result looks like this when displayed in IDLE:
array([[1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1.]])
Arrays like this have a number of uses, but basically, they provide a way to
do batch processing on large arrays when you want to do something special
with coordinate pairs that match the identity relationship, R = C.
[[1, 1, 1],
[1, 1, 1]]], dtype=int16)
Finally, here’s a one-dimensional array of Booleans. Notice that all the 1
values are realized as the Boolean value True.
>>> a = np.ones(6, dtype=bool)
>>> a
array([ True,  True,  True,  True,  True,  True])
This last kind of array—a Boolean array set to all-True values—will prove
useful when running the Sieve of Eratosthenes benchmark to produce prime
numbers.
Note: The name of this function is tricky, because the English word “zeros”
can also be spelled “zeroes.” Remember to use the shorter spelling, zeros,
only. Ah, English spelling: never mastered even by native English speakers!
Here is a simple example creating a 3 × 3 two-dimensional array using the
default float type.
>>> import numpy as np
>>> a = np.zeros((3,3))
>>> a
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
Here’s another example, this time creating a 2 × 2 × 3 array of integers.
>>> a = np.zeros((2, 2, 3), dtype=np.int16)
>>> a
array([[[0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0]]], dtype=int16)
Finally, here’s a one-dimensional array of Booleans. Notice that all the zero
values are realized as the Boolean value False.
>>> a = np.zeros(5, dtype=bool)
>>> a
array([False, False, False, False, False])
If the dtype argument is either omitted or set to None, the function uses the
data type of the fill_value, which is required for this function.
Here’s a simple example creating a 2 × 2 array in which each element is set
to 3.14.
>>> import numpy as np
>>> a = np.full((2, 2), 3.14)
>>> a
array([[3.14, 3.14],
       [3.14, 3.14]])
Here’s another example; this one creates an array of eight integers, each set
to 100.
>>> a = np.full(8, 100)
>>> a
array([100, 100, 100, 100, 100, 100, 100, 100])
This final example takes advantage of the fact that you can create a numpy
array of strings—provided that all these strings observe a fixed maximum
size.
>>> a = np.full(5,'ken')
>>> a
array(['ken', 'ken', 'ken', 'ken', 'ken'], dtype='<U3')
After the array is created in this way, with strings of size 3, each such string
has, in effect, a maximum size. You can assign a longer string to any of these
array elements, but it will be truncated.
a[0] = 'tommy' # Legal, but only 'tom' is assigned.
ize an array by calling another function that works as if it were transforming
indexes into arguments.
def simple(n):
    return n + 1
[[0 1 2],
[0 1 2],
[0 1 2]]
These are identity arrays along specific axes. In the first array, each element
is equal to its row index; in the second array, each element is equal to its col-
umn index.
The implementation of fromfunction operates on the arrays. As a result,
the callable argument (the other function being called) is executed only
once! But it is executed on one or more arrays—one for each dimension—
enabling batch processing.
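You can confirm that the callable runs only once by recording what it receives. In this sketch, the hypothetical function row_index appends the shape of its first argument to a list each time it is called and then returns that argument unchanged.

```python
import numpy as np

calls = []

def row_index(r, c):
    # r and c arrive as whole index arrays, not as single numbers.
    calls.append(r.shape)
    return r

rows = np.fromfunction(row_index, (3, 3), dtype=int)
cols = np.fromfunction(lambda r, c: c, (3, 3), dtype=int)

print(calls)     # One entry: the function was called a single time
print(rows)
print(cols)
```

The single recorded shape is (3, 3), which shows that the function received a full array of row indexes in one call.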
If you use fromfunction the way it was designed to be used, this underly-
ing implementation works. But if you do unorthodox things, strange results
are possible. Consider the following code, which should produce a 3 × 3 array.
a = np.fromfunction(lambda r, c: 1, (3, 3), dtype='int')
Suppose you want to create the classic multiplication table for numbers from 1
to 10. There is more than one way to do that with numpy. You could create an
empty array, for example, and assign values to the elements.
With numpy, you could also do something similar. You could create an
array initialized to all-zero values, for example, and then write a nested loop
to assign R * C to each element—actually, it’s (R+1)*(C+1).
By far the most efficient approach would be to use fromfunction to create
an array that called a function to generate the values, without writing any loops
at all. This is the numpy philosophy: As much as possible, let the package do all
the work by using batch operations. You should be writing relatively few loops.
Here’s the solution:
import numpy as np
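The rest of the solution is not shown at this point, so the following is a sketch of how the approach just described might look. The function name multy is hypothetical; the (r + 1) * (c + 1) expression follows the text, since indexes start at 0 but the table starts at 1.

```python
import numpy as np

def multy(r, c):
    # r and c are index arrays; the result is computed in one batch.
    return (r + 1) * (c + 1)

a = np.fromfunction(multy, (10, 10), dtype=int)
print(a)
```

No explicit loop appears anywhere: the multiplication is applied to the whole pair of index arrays at once.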
You can improve the appearance by getting rid of the brackets in the dis-
play. That’s relatively easy to do if we first convert to a string and then use the
str class replace method.
s = str(a)
s = s.replace('[', '')
s = s.replace(']', '')
s = ' ' + s
As mentioned in Chapter 4, replacing a character with the empty string
is a convenient way to purge all instances of a character. This example calls
the replace method to get rid of both kinds of brackets. Finally, a space is
inserted at the front of the string to make up for the loss of two open brackets.
Now, printing this string produces a pleasing display.
>>> print(s)
   1   2   3   4   5   6   7   8   9  10
   2   4   6   8  10  12  14  16  18  20
   3   6   9  12  15  18  21  24  27  30
   4   8  12  16  20  24  28  32  36  40
   5  10  15  20  25  30  35  40  45  50
   6  12  18  24  30  36  42  48  54  60
   7  14  21  28  35  42  49  56  63  70
   8  16  24  32  40  48  56  64  72  80
   9  18  27  36  45  54  63  72  81  90
  10  20  30  40  50  60  70  80  90 100
A * n     Array with n multiplied with each element of A.
n ** A    A number raised to the power of each element of A, producing another
          array with the results.
A ** n    Each element in A is raised to the power n.
A / n     Array with n dividing into each element of A.
A // n    Array with n dividing into each element of A but using floor division.
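The scalar operations in these rows can be tried on a small array. The values below are chosen only for illustration.

```python
import numpy as np

A = np.array([1, 2, 3, 4])

print(A * 3)      # Each element multiplied by 3
print(2 ** A)     # 2 raised to the power of each element
print(A ** 2)     # Each element squared
print(A / 2)      # True division: results are floating point
print(A // 2)     # Floor division: results stay integers
```

Each expression produces a new array of the same shape as A and leaves A itself unchanged.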
contains the squares of all the elements in A. We can do this by multiplying
A by itself, which does a member-by-member multiplication of each element.
>>> C = A * A
>>> print(C)
[[  0.   1.   4.   9.]
 [ 16.  25.  36.  49.]
 [ 64.  81. 100. 121.]
 [144. 169. 196. 225.]]
Keep in mind there’s no requirement that numpy arrays have a perfect
square or perfect cubic shape—only that they’re rectangular. You can always
reshape an array. For example, the 4 × 4 array just shown can be reshaped into
a 2 × 8 array.
>>> print(C.reshape((2,8)))
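To try the reshape yourself, you need an array A whose squares match the output shown earlier. The sketch below assumes A holds the floating-point values 0 through 15 in a 4 × 4 layout, which is consistent with that output but is not stated explicitly at this point in the text.

```python
import numpy as np

A = np.arange(16.0).reshape((4, 4))   # Assumed contents: 0.0 .. 15.0
C = A * A                             # Member-by-member multiplication
D = C.reshape((2, 8))                 # Same 16 elements, new shape

print(D)
print(C.shape, D.shape)
```

Reshaping keeps the elements in the same order, so the second row of the 2 × 8 array starts with 64, the first element of the third row of C.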
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50]
We want to zero out all the numbers that are not prime.
The result, clearly, is to produce the middle two rows. Now, how can we get
the middle two columns? Actually, that turns out to be almost as easy.
>>> A[:, 1:3]
array([[ 2,  3],
       [ 6,  7],
       [10, 11],
       [14, 15]])
Take another look at that array expression.
A[:, 1:3]
The colon before the comma says, “Select everything in this dimension”—
in this case, the row dimension. The expression 1:3 selects all the columns
beginning with index 1 (the second column) up to but not including index 3
(the fourth column). Therefore, the expression says, “Select all rows, with the
intersection of columns 1 up to but not including column 3—that is, the sec-
ond and third columns.”
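Row and column slices can also be combined in a single expression. The sketch below assumes A is the 4 × 4 array of integers 1 through 16, which matches the output shown above.

```python
import numpy as np

A = np.arange(1, 17).reshape((4, 4))

middle_rows = A[1:3]          # Rows 1 and 2, all columns
middle_cols = A[:, 1:3]       # All rows, columns 1 and 2
center = A[1:3, 1:3]          # Intersection: the central 2 x 2 block

print(middle_rows)
print(middle_cols)
print(center)
```

The last expression selects only the elements that lie in both the middle rows and the middle columns.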
The general syntax for indexing and slicing an array of N dimensions is
shown here.
[[0 1 0]
[0 1 0]
[0 1 0]]
neighbor_count = np.sum(G[1:4, 1:4]) - G[2, 2]
The result is 2. In the Game of Life, that would indicate that the cell in the
middle is “stable”: In the next generation, it will experience neither a birth nor
a death event.
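The neighbor count can be reproduced with a small grid. This sketch assumes G is a 5 × 5 array of zeros with a vertical bar of three live cells in the middle column, so that the 3 × 3 slice around the center matches the pattern shown above.

```python
import numpy as np

G = np.zeros((5, 5), dtype=int)
G[1:4, 2] = 1                 # Vertical bar of three live cells

# Sum the 3 x 3 block around cell (2, 2), then remove the cell itself.
neighbor_count = np.sum(G[1:4, 1:4]) - G[2, 2]
print(G[1:4, 1:4])
print(neighbor_count)
```

The slice sums to 3, and subtracting the center cell leaves the two neighbors above and below it.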
One way to use this array is to zero out all elements that don’t meet the
condition of being greater than 4, by multiplying the two arrays—B and
(B>4)—together.
>>> print(B * (B > 4))
[[0 0 0]
[0 5 6]
[7 8 9]]
When working with Boolean arrays, you should note that the use of parentheses
is often critical, because comparison operators have lower precedence than the
arithmetic and bitwise operators they are combined with.
But an even better use of a Boolean array is to use it as a mask—in which
case it selects in elements with a corresponding True value in the mask, and
selects out elements with a corresponding False value.
Using a Boolean array as a mask produces a one-dimensional array, regard-
less of the shape of the array operands.
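The difference between multiplying by a Boolean array and using it as a mask shows up clearly in a short sketch. It assumes B is the 3 × 3 array of integers 1 through 9, which is consistent with the output shown earlier.

```python
import numpy as np

B = np.arange(1, 10).reshape((3, 3))

zeroed = B * (B > 4)          # Same shape; failing elements become 0
selected = B[B > 4]           # Mask: one-dimensional array of matches
both = B[(B > 2) & (B < 7)]   # Parentheses are required around each test

print(zeroed)
print(selected)
print(both)
```

Multiplication preserves the 3 × 3 shape, while masking returns only the selected elements as a flat array.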
Key Syntax
◗ B < 7 is another Boolean array, again of the same shape as B.
◗ The expression (B > 2) & (B < 7) uses binary AND (&) to achieve an “and”
effect between these two Boolean arrays.
◗ The resulting Boolean array is assigned to the variable B2. This array will con-
tain True and False values, in effect produced by Boolean operations on the
two arrays which serve as operands.
You can then apply the mask to B itself to get a one-dimensional array of
results in which each element is greater than 2 and less than 7.
>>> print(B[B2])
[3 4 5 6]
In this next example, bitwise OR is used to create a Boolean array from an
“or” operation. That resulting Boolean array is then applied as a mask to B,
and the final result selects all elements of B that are either equal to 1 or greater
than 6.
>>> print(B[ (B == 1) | (B > 6)]) # "OR" operation
[1 7 8 9]
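The masking examples can be gathered into one short script. This is a minimal sketch that assumes B is the 3 × 3 array holding 1 through 9, which is consistent with the printed results; the name either is introduced here for illustration. The last part shows why the parentheses matter.

```python
import numpy as np

# Assumed definition of B, consistent with the printed results above.
B = np.arange(1, 10).reshape(3, 3)

B2 = (B > 2) & (B < 7)        # element-wise "and" of two Boolean arrays
either = (B == 1) | (B > 6)   # element-wise "or"

print(B[B2])       # one-dimensional array of the selected elements
print(B[either])

# Without parentheses, & binds more tightly than the comparisons, so the
# expression becomes a chained comparison that numpy refuses to evaluate.
try:
    B > 2 & B < 7
except ValueError as err:
    print('ValueError:', err)
```

The mask keeps the shape of B, but the selection B[B2] is always one-dimensional, because the selected elements need not form a rectangle.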
The result of these steps is a Boolean array. For each index number higher
than 2, corresponding to a True element, add that index number to results.
Here is an obvious way to implement this algorithm as a Python function:
def sieve(n):
    b_list = [True] * (n + 1)
    for i in range(2, n+1):
        if b_list[i]:
            for j in range(i*i, n+1, i):
                b_list[j] = False
    primes = [i for i in range(2, n+1) if b_list[i]]
    return primes
Can we do better with numpy? Yes. We can improve performance by taking
advantage of slicing and Boolean masking. In keeping with the general flavor
of the algorithm, we use an array of Booleans, indexed from 0 to N, with
elements 0 and 1 set to False because neither number is prime.
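import numpy as np

def np_sieve(n):
    # Create B, setting all elements to True.
    # (The built-in bool is used as the dtype; the np.bool alias is
    # not available in every numpy release.)
    B = np.ones(n + 1, dtype=bool)
    B[0:2] = False                  # 0 and 1 are not prime.
    for i in range(2, n + 1):
        if B[i]:
            B[i*i: n+1: i] = False  # Strike out multiples of i.
    return np.arange(n + 1)[B]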
So where does this implementation of the algorithm manage to do better?
The function still has to loop through the array, one member at a time, look-
ing for each element with the value True. This indicates that the index number
is prime, because its corresponding element in the Boolean array has not been
eliminated yet.
But the inner loop is replaced by a slice operation that sets every element
in the slice to False. Assuming there are many elements, we can perform all
these operations more efficiently with a batch operation rather than a loop.
B[i*i: n+1: i] = False
The other advanced technique used here is a Boolean mask to produce
the final results: a numpy array from 0 to n, inclusive, which after the masking
operation will contain only the prime numbers in that range.
return np.arange(n + 1)[B]
for example, if you’re only interested in speed.
import numpy as np
import time

def np_sieve(n):
    t1 = time.time() * 1000         # Start time, in milliseconds.
    B = np.ones(n + 1, dtype=bool)
    B[0:2] = False
    for i in range(2, n + 1):
        if B[i]:
            B[i*i: n+1: i] = False
    P = np.arange(n + 1)[B]
    t2 = time.time() * 1000         # End time, in milliseconds.
    print('np_sieve took', t2-t1, 'milliseconds.')
You can put in similar lines of timing code to benchmark the non-numpy
version.
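One way to run both versions side by side is sketched below. The helper time_ms is hypothetical, time.perf_counter is swapped in for time.time because it is intended for measuring short intervals, and the three values of n are arbitrary. Timings vary from machine to machine, so none are quoted here.

```python
import time
import numpy as np

def sieve(n):
    """Pure-Python Sieve of Eratosthenes, as in the list-based version."""
    b_list = [True] * (n + 1)
    for i in range(2, n + 1):
        if b_list[i]:
            for j in range(i * i, n + 1, i):
                b_list[j] = False
    return [i for i in range(2, n + 1) if b_list[i]]

def np_sieve(n):
    """numpy version: the inner loop is replaced by a slice assignment."""
    B = np.ones(n + 1, dtype=bool)
    B[0:2] = False
    for i in range(2, n + 1):
        if B[i]:
            B[i * i: n + 1: i] = False
    return np.arange(n + 1)[B]

def time_ms(func, n):
    """Hypothetical helper: return (result, elapsed milliseconds)."""
    t1 = time.perf_counter()
    result = func(n)
    t2 = time.perf_counter()
    return result, (t2 - t1) * 1000

for n in (100, 10_000, 100_000):   # arbitrary sizes chosen for illustration
    p1, ms1 = time_ms(sieve, n)
    p2, ms2 = time_ms(np_sieve, n)
    assert p1 == p2.tolist()       # both versions must agree
    print(n, 'sieve:', round(ms1, 3), 'ms  np_sieve:', round(ms2, 3), 'ms')
```

Returning the result from the timed call, rather than printing inside the sieve itself, keeps the timing code separate from the algorithm and makes it easy to confirm that the two versions produce the same primes.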
What the benchmarks show is that for relatively small numbers, the numpy
version takes more time, and not less, than the other version. But for large N,
especially greater than 1,000, np_sieve starts pulling ahead. Once N gets
greater than 10,000 or so, the numpy version takes half the time the other
version does. That may not be the spectacular result we were looking for, but it’s
an increase in speed of 100 percent. Not bad.
Is this section playing fair? Yes. It’s admittedly possible to implement the
non-numpy version, sieve, by using more lists and more list comprehen-
sion. However, we’ve found that such attempts at code enhancement actually
make the function run more slowly. Therefore, for large N, the numpy version
remains the high-speed champ.
Table 12.6 lists the statistical-analysis functions for numpy arrays. Each of
these works by calling the corresponding method for the ndarray class; so
you can use either the function or the method version.
These functions have a series of important arguments, which we’ll cover
later.
This statement creates an array of 100,000 elements, each of which is a ran-
dom floating-point value, and it does it in a fraction of a second. Even more
astonishing is the speed with which numpy statistical functions process this
array. The following IDLE session demonstrates how quickly you can get stats
on this large data set.
>>> import numpy as np
>>> import numpy.random as ran
>>> A = ran.random(100000)
>>> np.mean(A)
0.49940282901121
>>> np.sum(A)
49940.282901121005
>>> np.median(A)
0.5005147698475437
>>> np.std(A)
0.2889516828729947
If you try this session yourself, you should experience the response times as
instantaneous, even the standard deviation.
Most of these stats are straightforward in meaning. Because the probability
distribution in this case is a uniform distribution from 0.0 to 1.0, you’d reason-
ably expect the mean to be close to 0.5, which it is: approximately 0.4994. The
sum is exactly 100,000 times that, or about 49,940. The median is not the
same as the mean, although you’d expect it to be close to the center of values,
which it is: just over 0.50.
The standard deviation is what statisticians would predict for a uniform
distribution like this: just under 0.29. So roughly 60 percent of the values fall
within one standard deviation (plus or minus) of the mean.
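These figures can be checked directly. For a uniform distribution on the interval from 0.0 to 1.0, the theoretical standard deviation is 1 divided by the square root of 12, and a Boolean mask gives the fraction of values within one standard deviation of the mean. The sketch below uses a seeded generator from np.random.default_rng, an assumption made so that repeated runs use the same data.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded generator, assumed for repeatability
A = rng.random(100_000)            # uniform values in [0.0, 1.0)

predicted_std = 1 / 12 ** 0.5      # theoretical value for a uniform distribution on [0, 1]
within = np.abs(A - np.mean(A)) <= np.std(A)   # Boolean mask
fraction = np.mean(within)         # mean of a Boolean array is the fraction of True values

print(predicted_std)
print(np.std(A), fraction)
```

Taking the mean of a Boolean array works because True counts as 1 and False as 0.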
Using numpy saves you from having to do this calculation yourself, but
it’s useful to review how standard deviation is calculated and what it means.
Assume A and A2 represent arrays, and i refers to elements:
A2 = (i – mean(A)) ^ 2, for all i in A.
std(A) = sqrt(mean(A2))
◗ Figure out the average of the elements in array A. This is also called the mean.
◗ Subtract the mean from each element in A to create a new array full of
“deviations.”
◗ In this array of deviations, square each member, and call the resulting
array A2.
◗ Find the average of all elements in A2, take the square root of the result, and
voila! You have produced the standard deviation of the array you started with.
Although numpy provides the standard-deviation function for free, it’s use-
ful to see what it would take to produce the result through the standard batch
operations. First, getting A2 would be easy enough: Subtracting the mean of
A (a scalar) from A itself gives us an array filled with deviations. All these are
then squared.
A2 = (A - mean(A)) ** 2
Having obtained this new array, we need only get the square root of the
mean of the deviations.
result = (mean(A2)) ** 0.5
Or, in terms of Python code, it requires the np qualifier to call the mean
function:
>>> A2 = (A - np.mean(A)) ** 2
>>> result = (np.mean(A2)) ** 0.5
>>> result
0.2889516828729947
>>> np.std(A)
0.2889516828729947
The results, as you can see, are precisely the same in both cases—calculat-
ing standard deviation “the hard way” and getting it from np.std—which is
good evidence that the numpy routines are following the same algorithm.
It’s instructive to run the standard deviation function on an even larger
array—say, an array of 1 million random numbers—and the equivalent code
in Python, using standard lists.
Now comes the interesting part: If you benchmark this technique with a list
of 1 million elements in size, against a numpy version with an array containing
the same data, the numpy version—getting the standard deviation directly—
beats out the non-numpy version by a factor of more than 100 to 1!
import time
import numpy as np
import numpy.random as ran

def get_std1(ls):
    # Standard deviation using built-in Python operations only.
    t1 = time.time()
    m = sum(ls)/len(ls)
    ls2 = [(i - m) ** 2 for i in ls]
    sd = (sum(ls2)/len(ls2)) ** .5
    t2 = time.time()
    print('Python took', t2-t1)

def get_std2(A):
    # Standard deviation using numpy batch operations.
    t1 = time.time()
    A2 = (A - np.mean(A)) ** 2
    result = (np.mean(A2)) ** .5
    t2 = time.time()
    print('Numpy took', t2-t1)

def get_std3(A):
    # Standard deviation from a single numpy function call.
    t1 = time.time()
    result = np.std(A)
    t2 = time.time()
    print('np.std took', t2-t1)

A = ran.rand(1000000)
get_std1(A)
get_std2(A)
get_std3(A)
Running all three gets the following results, expressed in seconds.
Remember this is the time taken to get standard deviation for 1 million
elements.
Python took 0.6885709762573242
Numpy took 0.0189220905303955
np.std took 0.0059509277343750
You can see how enormous the gains in performance are, as we go from
Python lists, to numpy arrays, to finally using numpy to get standard deviation
directly, with a single function call.
The increase in speed from not using numpy at all, compared to using
np.std (the numpy standard deviation function) is well over 100 to 1. Now
that’s greased lightning!
As you learned in the previous section, the numpy statistical functions can
be used to study this array as one large source of data. If np.mean is applied
directly to the array, for example, it gets the mean of all 12 elements.
>>> np.mean(A)
7.75
Likewise, we can sum the data or get the standard deviation.
>>> np.sum(A)
93
>>> np.std(A) # standard deviation
4.780603169754489
The fun part comes when we collect statistics along an axis: either the row
or the column dimension. Such operations enable the treatment of a numpy
array as if it were a spreadsheet, containing totals for each row and column.
However, it’s easy to get the axis confused. Table 12.7 should help clarify.
For even higher dimensional arrays, the axis settings can run higher. The
axis settings can even be tuples.
Although it may be confusing at first, the way to approach the word “axis”
is to think of it like a Cartesian coordinate system, as the name suggests. Look
at A again.
[[ 4 13 11 8]
[ 7 14 16 1]
[ 4 1 5 9]]
The argument setting axis=0 refers to the first axis, which means rows
(because row-major order is assumed). Therefore, to sum along axis=0 is to
sum along the traditional X axis. Summing as it goes, the function sums each
column in turn, starting with the lowest-numbered column and moving right.
The result is
[15 28 32 18]
The argument setting axis=1 refers to the second axis, which means col-
umns. Therefore, to sum along axis=1 is to sum along the traditional Y axis.
In this case, the sums start with the lowest-numbered row and move down-
ward. The result is
[36 38 19]
When summation is done along the X axis, the numpy package collects data
on the other dimensions. So, although axis=0 refers to rows, columns are
being summed. Figure 12.3 illustrates how this works.
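[Figure 12.3: a 4 × 5 array in which every row is 0 1 2 3 4; summing along axis=1 gives a total of 10 for each row, and summing along axis=0 gives the column totals 0 4 8 12 16]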
Let’s take another example, one in which the effects are easier to see.
Let’s start with an array in which each element is equal to its column number.
B = np.fromfunction(lambda r,c: c, (4, 5),
dtype=np.int32)
Printing this array produces
[[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]
[0 1 2 3 4]]
array([ 0, 4, 8, 12, 16])
>>> np.sum(B, axis = 1) # col, totaling rows.
array([10, 10, 10, 10])
This is admittedly confusing because axis=0, which should refer to rows,
actually sums all the dimensions except rows (in this case, columns). And
axis=1 actually sums all the dimensions except columns (in this case, rows).
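A useful way to remember the rule is that the axis argument names the dimension that is collapsed. The following sketch applies this to the 3 × 4 array A shown earlier; the variable names are chosen here for illustration.

```python
import numpy as np

A = np.array([[4, 13, 11, 8],
              [7, 14, 16, 1],
              [4, 1, 5, 9]])

col_totals = np.sum(A, axis=0)        # collapses the row dimension: one total per column
row_totals = np.sum(A, axis=1)        # collapses the column dimension: one total per row
grand_total = np.sum(A, axis=(0, 1))  # a tuple collapses both dimensions

print(col_totals)
print(row_totals)
print(grand_total)
```

The shape of each result confirms the rule: summing a (3, 4) array along axis=0 leaves shape (4,), and summing along axis=1 leaves shape (3,).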
Can we use this data to produce something like a spreadsheet? What we’d
like to do is to total all the rows, for example, and then concatenate the results
onto the array, using the results as an additional column.
Let’s start by, once again, getting both the starting array and the row-by-row
totals.
B = np.fromfunction(lambda r,c:c, (4, 5), dtype=np.int32)
The following statement then glues on the row, along the bottom of B1.
B2 = np.r_[B1, [B_cols]]
Printing B2 prints the following results:
[[ 0 1 2 3 4 10]
[ 0 1 2 3 4 10]
[ 0 1 2 3 4 10]
[ 0 1 2 3 4 10]
[ 0 4 8 12 16 40]]
So there we have it: transformation of an ordinary array into a spreadsheet
display that includes totals of all rows and columns along the bottom and the
right.
The whole procedure can be placed in a function that will operate on any
two-dimensional array.
def spreadsheet(A):
    AC = np.sum(A, axis = 1)
    A2 = np.c_[A, AC]
    AR = np.sum(A2, axis = 0)
    return np.r_[A2, [AR] ]
For example, suppose you have the following array:
>>> arr = np.arange(15).reshape(3, 5)
>>> print(arr)
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]
Here’s what happens if you use the spreadsheet function and print the
results:
>>> print(spreadsheet(arr))
[[ 0 1 2 3 4 10]
[ 5 6 7 8 9 35]
[ 10 11 12 13 14 60]
[ 15 18 21 24 27 105]]
The spreadsheet function can be altered to print summary statistics for
other operations, such as mean, median, standard deviation (std), and so on.
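One possible way to make that change is to pass the statistical function in as an argument. The name spreadsheet_stat and its func parameter are hypothetical; this is a sketch that assumes func accepts an axis argument, as np.sum, np.mean, np.median, and np.std all do.

```python
import numpy as np

def spreadsheet_stat(A, func=np.sum):
    """Hypothetical variant of spreadsheet: append func of each row and column."""
    AC = func(A, axis=1)          # one statistic per row
    A2 = np.c_[A, AC]             # glue it on as a new column
    AR = func(A2, axis=0)         # one statistic per column, including the new one
    return np.r_[A2, [AR]]        # glue it on as a new bottom row

arr = np.arange(15).reshape(3, 5)
print(spreadsheet_stat(arr))            # same totals as spreadsheet(arr)
print(spreadsheet_stat(arr, np.mean))   # row and column means instead
```

Note that the bottom-right corner holds the statistic of the row statistics. For sums, and for means of equal-length rows, that equals the figure for the whole array; for a function such as np.std it generally does not.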
Chapter 12 Summary
The numpy package supports manipulation and statistical analysis of large
data sets, with abilities that go far beyond those of standard Python arrays.
But this chapter, long though it is, has only begun to explore those abilities.
One simple test of performance speed is to add up a large set of numbers.
In the test of adding up all the numbers from 1 to 1 million, the numpy version
of the program beats the ordinary version by a factor of 10 to 1. But when the
benchmark is run on the manipulation of data and not on array creation, the
contrast is much greater still.
The numpy package provides many ways to create a standard numpy array,
called an ndarray, or an “N-dimensional array.” The type is distinguished by
the ease with which you can create arrays of more than one dimension.
This numpy type has built-in support for statistical analysis, including addi-
tion, mean, median, and standard deviation. You can perform these opera-
tions on rows, columns, and slices.
Much of the power of the numpy ndarray type stems from the ability to take
slices of these arrays, either one-dimensional or higher-dimensional, and then
perform sophisticated batch operations on them—that is, doing many calcula-
tions at once. The slicing ability extends smoothly to any number of dimensions.
In the next chapter, Chapter 13, we’ll explore more advanced capabilities
that are built on top of numpy standard types (ndarray), particularly the abil-
ity to plot mathematical equations.
numpy.info(numpy.function_name)
For example, the following commands, given from within IDLE, provide
manageable chunks of information on specific numpy functions:
import numpy as np
np.info(np.sin)
np.info(np.cos)
np.info(np.power)
Most of the functions listed here are designed to operate on a single numpy
array. A few functions have variations. The numpy power function takes at
least two arguments: X and Y. Either or both can be an array; but if they are
both arrays, they must have the same shape or shapes that numpy can broadcast
together. The effect of the function is to raise each X value to a power
specified by the corresponding element of Y.
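The three combinations of array and scalar arguments can be sketched as follows. The arrays X and Y and the result names are chosen here purely for illustration.

```python
import numpy as np

X = np.array([1, 2, 3, 4])
Y = np.array([3, 2, 1, 0])

same_shape = np.power(X, Y)     # element by element: 1**3, 2**2, 3**1, 4**0
scalar_exp = np.power(X, 2)     # the scalar 2 is applied to every element
scalar_base = np.power(2, X)    # a scalar base with an array of exponents

print(same_shape)
print(scalar_exp)
print(scalar_base)
```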
For example, the following statements raise each of the elements in array A
to the power of 2 (that is, to square each element).
>>> import numpy as np
>>> A = np.arange(6)
>>> print(A)
[0 1 2 3 4 5]
>>> print(np.power(A, 2))
[ 0 1 4 9 16 25]
Other functions are often used in conjunction with the numpy linspace
function, which in turn is heavily used in plotting equations, as you’ll see in
Section 13.3, “Plotting Lines with ‘numpy’ and ‘matplotlib.’”
For example, the following statements combine the linspace function
with the sin function and the constant, pi, to get a series of 10 values that
reflects the value of the sine function as its inputs increase from 0 to pi:
the results rise from 0 toward 1 and then fall back to 0.
>>> A = np.linspace(0, np.pi, 10)
>>> B = np.sin(A, dtype='float16')
>>> print(B)
[0.000e+00 3.420e-01 6.431e-01 8.657e-01 9.849e-01
9.849e-01 8.662e-01 6.431e-01 3.416e-01 9.675e-04]
In this example, the data type float16 was chosen so as to make the num-
bers easier to print. But a still better way to do that is to use some of the for-
matting techniques from Chapter 5. Now the results are easier to interpret.
>>> B = np.sin(A)
>>> print(' '.join(format(x, '5.3f') for x in B))
0.000 0.342 0.643 0.866 0.985 0.985 0.866 0.643 0.342
0.000
This small data sample demonstrates the behavior of the sine function. The
sine of 0 is 0, but as the inputs increase toward pi/2, the results approach 1.0;
then they approach 0 again as the inputs increase toward pi.
Note Ë If neither the pip nor pip3 command worked, check how you spelled
matplotlib. The spelling is tricky. To see the range of commands possible, type
pip help
Ç Note
Two of the major functions used for plotting include plt.plot and plt.show.
The syntax shown here is a simplified view; we’ll show more of it later.
Key Syntax
A = np.linspace(0, 2 * np.pi)
plt.plot(A, np.sin(A))
plt.show()
If you’ve downloaded both numpy and matplotlib, and if you enter this
code as shown, your computer should display the window shown in Figure 13.1.
The window remains visible until you close it.
[Figure 13.1: graph of np.sin(A) for A from 0 to 2 * pi; the Y axis runs from -1.00 to 1.00 and the X axis from 0 to 6]
This is simple code, but let’s step through each part of it. The first thing the
example does is import the needed packages.
import numpy as np
import matplotlib.pyplot as plt
Next, the example calls the numpy linspace function. This function,
remember, generates a set of values, including the two specified endpoints, to
get a total of N evenly spaced values. By default, N is 50.
A = np.linspace(0, 2 * np.pi)
Therefore, array A contains floating-point numbers beginning with 0, end-
ing in 2 * pi, and 48 other values evenly spaced between these two values.
The call to the plot function specifies two arrays: A, which contains 50
values along the X axis, and a second array, which contains the sine of each
element in A.
plt.plot(A, np.sin(A))
The function looks at each element in A and matches it with a correspond-
ing value in the second array, to get 50 (x, y) pairs. Finally, the show function
tells the software to display the resulting graph onscreen.
A = np.linspace(0, 2 * np.pi)
plt.plot(A, np.cos(A))
plt.show()
In this version, each value in A is matched with its cosine value to create an
(x, y) pair. Figure 13.2 shows the resulting graph.
[Figure 13.2: graph of np.cos(A) for A from 0 to 2 * pi; the Y axis runs from -1.00 to 1.00 and the X axis from 0 to 6]
Starting with the value 0 would be a problem, because then 1/N would
cause division by 0. Instead, let’s start with the value 0.1 and run to values as
high as 10. By default, 50 values are generated.
A = np.linspace(0.1, 10)
Now it’s easy to plot and show the results by using A and 1/A to provide val-
ues for the (x, y) pairs. Each value in A gets matched with its reciprocal.
plt.plot(A, 1/A)
plt.show()
Figure 13.3 shows the results.
[Figure 13.3: graph of 1/A for A from 0.1 to 10]
The function creates points by combining values from A and 1/A. So, for
example, the first (x, y) pair is
(0.1, 10.0)
The second point is formed in the same way, combining the next value in A
with its reciprocal in the second set. Here are some points that could be plotted.
(0.1, 10.0), (0.2, 5.0), (0.3, 3.3)...
A less interesting, but illustrative, example is to plot a handful of points and
connect them. Let’s specify five such values:
(0, 1)
(1, 2)
plt.plot([1, 2, 4, 5, 3])
In either case, calling the show function puts the graph on screen, as illus-
trated by Figure 13.4.
[Figure 13.4: line graph connecting the five plotted points; the Y axis runs from 1.0 to 5.0]
Note that you don’t necessarily have to use ascending values. You can use
any points to create arbitrary lines. Here’s an example:
plt.plot([3, 4, 1, 5, 2, 3], [4, 1, 3, 3, 1, 4])
The points to be plotted would be
(3, 4), (4, 1), (1, 3), (5, 3), (2, 1), (3, 4)
Those points, in turn, form a pentagram (more or less), as shown in Fig-
ure 13.5. All the points are plotted, and then line segments are drawn between
one point and the next.
[Figure 13.5: the plotted points connected to form a pentagram; the Y axis runs from 1.0 to 4.0]
The final example in this section shows that you can graph formulas as
complex as you like. This is the beauty of being able to operate directly on
numpy arrays. It’s easy to graph complex polynomials. Here’s an example:
import numpy as np
import matplotlib.pyplot as plt
A = np.linspace(-15, 20)
plt.plot(A, A ** 3 - (15 * A ** 2) + 25)
plt.show()
These statements, when run, graph a polynomial as shown in Figure 13.6.
[Figure 13.6: graph of the polynomial for A from -15 to 20; the Y axis runs from about -6000 to 2000]
A = np.linspace(0, 2 * np.pi)
plt.plot(A, np.sin(A))
plt.plot(A, np.cos(A))
plt.show()
Alternatively, two plot statements could have been combined into one by
placing four arguments in a single statement.
plt.plot(A, np.sin(A), A, np.cos(A))
In either case, Python responds by displaying the graph shown in Figure 13.7.
[Figure 13.7: the sine and cosine curves on the same graph; the Y axis runs from -1.00 to 1.00 and the X axis from 0 to 6]
[Figure: a second graph with the same axis ranges]
This formatting creates a dramatic contrast. But we can create even more
of a contrast by specifying a style for the sine curve. The ^ format symbol
specifies that the curve will be made up of tiny triangles.
While we’re at it, let’s bring in another plotting function, title. This sim-
ple function is used to specify a title for the graph before calling the show
function. The xlabel and ylabel functions specify labels for the axes.
[Figure: graph with the X axis labeled X-AXIS and the Y axis labeled Y-AXIS; the Y axis runs from -1.00 to 1.00 and the X axis from 0 to 6]
If you get help on the plt.plot function, it provides you a reference to all
the formatting characters. These characters can be combined in strings such
as 'og', meaning “Use small green circles for this line.”
The characters used to specify colors are listed in Table 13.2.
Remember that you can specify any combination of color and shape. Here’s
an example:
'b^' # Use blue triangles.
Note Ë Yet another technique for differentiating between lines is to use labels,
along with a legend, to show how the information corresponds to lines of par-
ticular color and/or style. This technique is explained in Chapter 15, “Getting
Financial Data off the Internet.”
Ç Note
This is an amazing fact, especially when you consider that it implies that
compound growth of .001% on a single dollar must eventually overtake a
steady income of a million a year! This is quite true, by the way, but it would
take lifetimes for the compounding fund to overtake the million-dollar fund.
This dynamic is easy to show with a graph. Start by creating a numpy array,
A, representing values along an axis of time. Set it to measure the passing of
60 years.
A = np.linspace(0, 60, 60)
Then we plot a linear-growth function of $2 a year versus a compound-growth
function using 10 percent a year, which is mathematically equivalent to
raising the number 1.1 to a power, N, where N is the number of years.
2 * A # Formula for increase of $2 a year
1.1 ** A # Formula for growth of 10% a year
We’ll use a format string to specify that the first curve is made of little cir-
cles, for the sake of contrast (“o”).
plt.plot(A, 2 * A, 'o', A, 1.1 ** A)
Alternatively, the two curves could be created by separate statements—
with additional spaces inserted here for clarity’s sake.
plt.plot(A, 2 * A, 'o') # +$2 a year (with circles)
plt.plot(A, 1.1 ** A) # Compound 10% a year
Next, let’s specify some useful labels and finally show the graph.
plt.title('Compounded Interest v. Linear')
plt.xlabel('Years')
plt.ylabel('Value of Funds')
plt.show()
Figure 13.10 displays the results. For the first 30 years, the linear fund ($2
a year) outpaces the compound fund. However, between years 30 and 50, the
accelerating growth of plan B becomes noticeable. Plan B finally equals and
surpasses the linear fund shortly before year 50.
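The crossover point can also be found numerically with a Boolean mask. This sketch assumes whole-year steps and starts at year 1, because at year 0 the compound fund already holds its initial dollar while the linear fund holds nothing; the variable names are chosen here for illustration.

```python
import numpy as np

years = np.arange(1, 61)           # whole years 1 through 60
linear = 2 * years                 # $2 a year
compound = 1.1 ** years            # 10% a year on a single dollar

ahead = compound > linear          # Boolean array: True where the compound fund leads
first_year = int(years[ahead][0])  # first whole year in which it leads

print(first_year, linear[first_year - 1], compound[first_year - 1])
```

The first whole year in which the compound fund leads is year 48, which agrees with the graph: the curves cross shortly before year 50.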
[Figure 13.10: linear growth versus compound growth; X axis: Years (0 to 60); Y axis: Value of Funds (0 to 300)]
So if you have 50 years to wait, plan B is the better choice. Eventually, plan B
greatly outperforms plan A if you can wait long enough. The compound growth
will reach thousands, even millions, of dollars long before the other plan does.
Note Ë The labels along the X axis start in year 0 and run to year 60. Section
13.12, “Adjusting Axes with ‘xticks’ and ‘yticks’,” shows how these years
could be relabeled—for example, by starting in the year 2020 and running to
the year 2080—without changing any of the underlying data or calculations.
Ç Note
IQ_A = np.array(IQ_list)
Graphing this data as a histogram is the easiest step of all, because it
requires only one argument. The hist function produces this chart. Then, as
usual, the show function must be called to actually put the results onscreen.
plt.hist(IQ_A)
plt.show()
Wow, that was easy! There are some additional arguments, but they’re optional.
One of the main reasons for providing the show function, by the way, is so that
the graph can be tweaked in various ways before being displayed onscreen. The
following example creates the histogram, gives it a title, and finally shows it.
plt.hist(IQ_A)
plt.title('IQ Distribution of Development Team.')
plt.show()
Figure 13.11 displays the resulting graph.
[Figure 13.11: histogram of the IQ data; the X axis runs from 90 to 140]
This graph reveals a good deal of information. It shows that the frequency
of IQ scores increases until 110, at which point it drops off. There’s a bulge
again around 140.
A more complete syntax is shown here.
Key Syntax
[Figure: histogram with values from 60 to 140 along the X axis and frequencies from 0 to 8000 along the Y axis]
You might wonder: Can you present this data as a completely smooth line
rather than as a series of bars?
You can. numpy provides a histogram function that enables you to gen-
erate frequency numbers for a series of subranges (bins). The general syntax,
shown here, displays the two most important arguments.
Key Syntax
◗ The first element contains the number of values from A that occur in the first
bin (the first subrange).
◗ The second element contains the number of values from A that occur in the
second bin (the second subrange).
◗ And so on.
The value returned is actually a tuple. The first element of this tuple con-
tains the frequency numbers we want to plot. The second element contains the
exact edges of the bin. So, to get the data we need, use the following syntax:
Key Syntax
np.histogram(A, bins)[0]
A = np.random.standard_normal(2000000)
A = A * 10 + 100
B = np.histogram(A, 50)[0]
plt.plot(B)
plt.show()
This code specifies no argument for the “X-axis” array; instead, it’s han-
dled by default. The plotting software uses 0, 1, 2 . . . N–1 for the X coordi-
nates, where N is the length of the B array, which contains the result of the
histogram function.
Figure 13.13 shows the results produced by this example.
[Figure 13.13: smooth curve of bin frequencies; the X axis shows bin numbers 0 to 50 and the Y axis frequencies from 20000 to 160000]
The resulting figure is a smooth, pleasing curve. The X axis, however, may
not be what you expect. The numbers along the X axis show the bin numbers.
That may sound complicated, but it’s only a matter of adding a couple of
lines of code. In this case, X represents the edges of the bins, and then X is
modified to contain the midpoint of each bin. In this way, the frequencies
get plotted against values in the distribution (centered at 100) rather than
bin numbers.
import numpy as np
import matplotlib.pyplot as plt
A = np.random.standard_normal(2000000)
A = A * 10 + 100
B, X = np.histogram(A, 50)
X = (X[1:]+X[:-1])/2 # Use bin centers rather than edges.
plt.plot(X, B) # Plot against values rather than
plt.show() # bin numbers.
The X values are calculated by getting the midpoint of each subrange:
the bottom and top edges (which are one off in position) are averaged.
The expression X[1:] shifts one position, because it starts with the
second element. The expression X[:-1] excludes the last element to make the
lengths equal.
X = (X[1:]+X[:-1])/2 # Use bin centers rather than edges
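The relationship between the two returned arrays is easy to confirm: for 50 bins there are 50 counts but 51 edges, and averaging neighboring edges yields 50 centers. The sketch below uses a seeded generator and a smaller sample than the text, both assumptions made to keep it quick and repeatable; the names counts, edges, and centers are chosen here in place of B and X.

```python
import numpy as np

rng = np.random.default_rng(0)                # seeded generator, assumed for repeatability
A = rng.standard_normal(200_000) * 10 + 100   # smaller sample than in the text

counts, edges = np.histogram(A, 50)
centers = (edges[1:] + edges[:-1]) / 2        # midpoint of each bin

print(len(counts), len(edges), len(centers))
print(counts.sum() == len(A))                 # every value lands in some bin
```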
If you look at the revised plot of the histogram (Figure 13.14), you can see it
plots values centered at 100 with a standard deviation of 10. A standard devi-
ation of 10 means that roughly 95 percent of the area of the curve should fall
within two deviations (80 to 120), and more than 99 percent of the area should
fall within three standard deviations (70 to 130).
[Figure 13.14: the same curve plotted against values in the distribution, centered at 100; the Y axis shows frequencies from 20000 to 160000]
The histogram function has other uses. For example, you can use it to replace
some of the code in Chapter 11, demonstrating the Law of Large Numbers. The
example in that section collected data in a series of bins. The histogram function
does the same thing as the code in that section but does it many times as fast.
Performance Tip  You should observe through this chapter and Chapter 12 that
many numpy functions echo actions that can be performed in Chapter 11,
“The Random and Math Packages,” by importing random and math. But the
numpy versions, especially with large data sets (such as the 2,000,000 random
numbers generated for the most recent examples) will be many times as fast.
So when you have a choice, prefer to use numpy, including its random
subpackage, for large numeric operations.
Ç Performance Tip
Some of the other arguments are occasionally useful. To learn about all of
them, you can get help from within IDLE.
>>> np.info(np.histogram)
[Figure 13.15: a point on the unit circle at angle theta, with X = cos(theta) and Y = sine(theta)]
Each point on the circle corresponds to an angle, which we call theta. For
example, the point on the circle that is 90 degrees counterclockwise from the
starting point has a value of 90 degrees—or rather, the equivalent in radians
(pi / 2). Figure 13.15 shows the bug having traveled about 42 degrees (roughly
equal to 0.733 radians).
At each point on the circle, the X coordinate of the bug’s position is given by
cosine(theta)
Likewise, the Y coordinate of the bug’s position is given by
sine(theta)
By tracking the bug’s journey, we get a set of points corresponding to a trip
around the circle. Each point on this journey corresponds to the following (x, y)
coordinates:
(cosine(theta), sine(theta))
Therefore, to graph a complete circle, we get a set of points corresponding
to many angles on this imaginary trip, from 0 to 2 * pi (equal to 360 degrees).
Then we graph the resulting (x, y) pairs. And we’ll get 1,000 data points to get
a nice, smooth curve.
import numpy as np
import matplotlib.pyplot as plt
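The listing here stops after its import statements. As a minimal sketch of the approach just described, and not necessarily the original listing, the following code generates the 1,000 points and confirms that they lie on the unit circle. The names theta, x, and y are assumptions, and the plotting calls are left as comments so that the sketch needs only numpy to run.

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 1000)   # 1,000 angles from 0 to 2 * pi
x = np.cos(theta)                         # X coordinate for each angle
y = np.sin(theta)                         # Y coordinate for each angle

# Every (x, y) pair lies on the unit circle, so x**2 + y**2 is 1 throughout.
print(np.allclose(x ** 2 + y ** 2, 1.0))

# With matplotlib installed, the pairs could then be graphed like this:
#     import matplotlib.pyplot as plt
#     plt.plot(x, y)
#     plt.axis('equal')   # keep the circle from being drawn as an ellipse
#     plt.show()
```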
[Figure: graph of the resulting circle; the Y axis runs from -1.00 to 1.00]
The first argument, array_data, is a collection containing a relative size
for each category. The labels argument is a collection of strings that label
the corresponding groups referred to in the first argument. The colors argu-
ment is a collection of strings specifying color, using the values listed earlier in
Table 13.2. And all the collections must have the same length.
This is a simple function to use once you see an example. Suppose you have
data on the off-hours activities of your development team, and you want to see
a chart. Table 13.4 summarizes the data to be charted in this example.
It’s an easy matter to place each column of data into its own list. Each list
has exactly four members in this case.
A_data = [3.7, 2.5, 1.9, 0.5]
A_labels = ['Poker', 'Chess', 'Comic Books', 'Exercise']
A_colors = ['k', 'g', 'r', 'c']
Now we plug these figures in to a pie-chart plot, add a title, and display.
The aspect ratio of the pie chart can be fixed using a plt.axis('equal')
statement, just as we did for the circle; otherwise, the pie will appear as an
ellipse rather than a circle.
import numpy as np
import matplotlib.pyplot as plt
[Figure: pie chart with labeled slices, including Exercise, Chess, and Comic Books]
numpy.dot(A, B, out=None)
A and B are two arrays to be combined to form a dot product; the out argu-
ment, if specified, is an array of the correct shape in which to store the results.
The “correct shape” depends on the size of A and B, as explained here.
The dot product of two one-dimensional arrays is simple. The two arrays
must have the same length. The action is to multiply each element in A by its
corresponding element in B, and then sum those products, producing a single
scalar value.
D. P. = A[0]*B[0] + A[1]*B[1] + ... + A[N-1] * B[N-1]
Here’s an example:
>>> import numpy as np
>>> A = np.ones(5)
>>> B = np.arange(5)
>>> print(A, B)
[1. 1. 1. 1. 1.] [0 1 2 3 4]
>>> np.dot(A, A)
5.0
>>> np.dot(A, B)
10.0
>>> np.dot(B, B)
30
You should be able to see that the dot product of B with B is equal to 30,
because that product is equal to the sum of the squares of its members:
D. P. = 0*0 + 1*1 + 2*2 + 3*3 + 4*4
= 0 + 1 + 4 + 9 + 16
= 30
We can generalize this:
D. P.(A, A) = sum(A * A)
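The same identity holds for any two one-dimensional arrays of equal length: the dot product equals the sum of the element-wise products. A short check, using the arrays A and B from the example above (the result names are chosen here for illustration):

```python
import numpy as np

A = np.ones(5)
B = np.arange(5)

dot_bb = np.dot(B, B)
sum_bb = np.sum(B * B)       # element-wise products, then their sum
dot_ab = np.dot(A, B)
sum_ab = np.sum(A * B)

print(dot_bb, sum_bb)
print(dot_ab, sum_ab)
```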
The dot product of two two-dimensional arrays is more
complex. As with ordinary multiplication between arrays, the shapes must
be compatible. However, they need only match in one of their dimensions:
the number of columns in the first array must equal the number of rows in
the second.
Here is the general pattern that describes how a dot product works with two-
dimensional arrays:
(A, B) * (B, C) => (A, C)
Consider the following 2 × 3 array, combined with a 3 × 2 array, whose dot
product is a 2 × 2 array.
A = np.arange(6).reshape(2,3)
B = np.arange(6).reshape(3,2)
C = np.dot(A, B)
print(A, B, sep='\n\n')
print('\nDot product:\n', C)
[[0 1 2]
 [3 4 5]]

[[0 1]
 [2 3]
 [4 5]]

Dot product:
 [[10 13]
 [28 40]]
Here’s the procedure.
◗ Multiply each item in the first row of A by each corresponding item in the first
column of B. Get the sum (10). This becomes C[0,0].
◗ Multiply each item in the first row of A by each corresponding item in the sec-
ond column of B. Get the sum (13). This becomes C[0,1].
◗ Multiply each item in the second row of A by each corresponding item in the
first column of B. Get the sum (28). This becomes C[1,0].
◗ Multiply each item in the second row of A by each corresponding item in the
second column of B. Get the sum (40). This becomes C[1,1].
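The four steps above can be sketched as a pair of nested loops. This is only an illustration written with plain Python lists; the function name dot_2d is our own, and np.dot remains the tool to use in practice.

```python
def dot_2d(a, b):
    '''Return the dot product of two 2-D lists,
    a (m x n) and b (n x p), as an m x p list.'''
    n = len(b)
    if any(len(row) != n for row in a):
        raise ValueError('columns of a must match rows of b')
    result = []
    for row in a:                    # each row of a ...
        new_row = []
        for col in zip(*b):          # ... paired with each column of b
            new_row.append(sum(x * y for x, y in zip(row, col)))
        result.append(new_row)
    return result

A = [[0, 1, 2], [3, 4, 5]]
B = [[0, 1], [2, 3], [4, 5]]
C = dot_2d(A, B)
print(C)    # [[10, 13], [28, 40]]
```

Each inner sum corresponds to one of the bulleted steps: one row of A combined with one column of B.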
You can also take the dot product of a one-dimensional array combined
with a two-dimensional array. In that case, the arrays are evaluated as if
they had the following shapes:
(1, X) * (X, Y) => (1, Y)
>>> print(np.dot([10, 15, 30], B))
[150 205]
Can we come up with intuitive, real-world examples that show the useful-
ness of a dot product? They abound in certain kinds of math and physics, such
as three-dimensional geometry. But there are simpler applications. Let’s say
you own a pet shop that sells three kinds of exotic birds. Table 13.5 shows the
prices.
Let’s further suppose that you have tracked sales figures for two months, as
shown in Table 13.6.
What you’d like to do is get the total bird sales for these two months.
Although it’s not difficult to pick out the data, it’s easier to take the dot prod-
uct and let the function np.dot do all the math for you.
Figure 13.18 shows how to obtain the first element in the result: 150.
Multiply each of the prices by the corresponding sales figure for the first
month, and then add the products.
[Figure 13.18: the one-dimensional array (10, 15, 30) paired with the columns of the 3 x 2 array whose rows are (0, 1), (2, 3), and (4, 5)]
numpy.outer(A, B, out=None)
The action of the function is to calculate the outer product of arrays A and
B and return it. The out argument, if included, specifies a destination array in
which to place the results. It must already exist and be of the proper size.
To obtain the outer product, multiply each element of A by each element of
B, in turn, to produce a two-dimensional array. In terms of shape, here’s how
we’d express the relationship:
(A) * (B) => (A, B)
Simply put, the outer product contains every combination of A * B, so that
if C is the result, then C[x, y] contains A[x] multiplied by B[y].
Here’s a relatively simple example:
>>> import numpy as np
>>> A = np.array([0, 1, 2])
>>> B = np.array([100, 200, 300, 400])
>>> print(np.outer(A, B))
[[  0   0   0   0]
 [100 200 300 400]
 [200 400 600 800]]
In this example, the first element of A is multiplied by each element of B to
produce the first row of the result; that’s why every number in that row is 0,
because 0 multiplied by any value is 0. The second element of A (which is 1) is
multiplied by each element of B to produce the second row, and so on for the
third row.
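To make the rule C[x, y] = A[x] * B[y] concrete, here is a minimal sketch of the same calculation using ordinary Python lists. The function name outer is our own choice for this illustration and is not part of numpy.

```python
def outer(a, b):
    '''Return a 2-D list in which result[x][y] == a[x] * b[y].'''
    return [[x * y for y in b] for x in a]

A = [0, 1, 2]
B = [100, 200, 300, 400]
C = outer(A, B)
for row in C:
    print(row)
```

The result has one row for each element of A and one column for each element of B, which is the (A) * (B) => (A, B) shape relationship.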
One obvious use for the outer product is a problem we solved in Chapter 11,
“The Random and Math Packages”: how to create a multiplication table. The
numpy package supports an even simpler solution, and one that is faster in any
case.
>>> A = np.arange(1,10)
>>> print(np.outer(A, A))
Wow, that’s pretty simple code! The result is
[[ 1  2  3  4  5  6  7  8  9]
 [ 2  4  6  8 10 12 14 16 18]
 [ 3  6  9 12 15 18 21 24 27]
 [ 4  8 12 16 20 24 28 32 36]
 [ 5 10 15 20 25 30 35 40 45]
 [ 6 12 18 24 30 36 42 48 54]
 [ 7 14 21 28 35 42 49 56 63]
 [ 8 16 24 32 40 48 56 64 72]
 [ 9 18 27 36 45 54 63 72 81]]
As in Chapter 11, we can use some string operations to clean up the result,
eliminating the square brackets.
s = str(np.outer(A, A))
s = s.replace('[', '')
s = s.replace(']', '')
print(' ' + s)
We can, if we choose, combine these four statements into two, more com-
pact, statements.
s = str(np.outer(A, A))
print(' ' + s.replace('[', '').replace(']', ''))
Finally, this produces
  1  2  3  4  5  6  7  8  9
  2  4  6  8 10 12 14 16 18
  3  6  9 12 15 18 21 24 27
  4  8 12 16 20 24 28 32 36
  5 10 15 20 25 30 35 40 45
  6 12 18 24 30 36 42 48 54
  7 14 21 28 35 42 49 56 63
  8 16 24 32 40 48 56 64 72
  9 18 27 36 45 54 63 72 81
np.tensordot(A, B [,axes]) Compute the tensor dot product of A and B.
np.kron(A, B) Compute the Kronecker product of A and B.
np.linalg.det(A) Compute the linear-algebra determinant of A.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Make data
ua = np.linspace(0, 2 * np.pi, 100)
va = np.linspace(0, np.pi, 100)
X = 10 * np.outer(np.cos(ua), np.sin(va))
Y = 10 * np.outer(np.sin(ua), np.sin(va))
Z = 10 * np.outer(np.ones(np.size(ua)), np.cos(va))
Most of the calculation here involves getting the sine and cosine of angles
as they run from 0 to 2 * np.pi and then multiplying the results by taking
outer products. Finally, a set of three-dimensional points is described by the
three arrays X, Y, and Z, and the software graphs the surface of the sphere
from that. Figure 13.20 shows the resulting graph.
[Figure 13.20: the plotted sphere; each of the three axes runs from -10.0 to 10.0]
◗ The amount borrowed is $250,000.
Given this data, we can easily use the numpy pmt function to determine the
monthly payment that will be required.
>>> import numpy as np
>>> np.pmt(0.065 / 12, 20 * 12, 250000)
-1863.93283878775
Therefore, the monthly payment, rounded to the nearest cent, is $1,863.93.
This amount is expressed as a negative number, because it represents cash
flowing out: money you pay rather than receive.
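Newer numpy releases have moved the financial functions, pmt included, into the separate numpy-financial package, so np.pmt may not be available in your installation. As a cross-check that needs no package at all, the following sketch applies the standard fixed-payment loan formula directly. The function name and its behavior of returning a positive amount are our own choices, not numpy's.

```python
def monthly_payment(rate, nper, pv):
    '''Return the fixed payment per period for a loan of pv,
    at interest rate `rate` per period, repaid over nper periods.
    The result is positive (the amount paid each period).'''
    if rate == 0:
        return pv / nper             # no interest: split evenly
    return pv * rate / (1 - (1 + rate) ** -nper)

payment = monthly_payment(0.065 / 12, 20 * 12, 250000)
print('%.2f' % payment)     # 1863.93
```

The figure agrees with the value shown above, apart from the sign convention.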
We can write a function enabling the user to tweak the interest rate, years,
and amount to determine the monthly payment, as follows.
import numpy as np

def monthly_payment():
    '''Calculate monthly payment, after
    getting input data and calling np.pmt.'''
    # Print results
    payment = -1 * np.pmt(rate, nper, pv)
    print('The monthly payment is: $%.2f' % payment)
[Figure: percentage of each payment applied to principal (Y axis, 0.2 to 1.0) plotted against months (X axis, 0 to 350)]
Note: The np.arange function generates values up to but not including the
endpoint. That’s why these examples use the endpoints 1.1 and 361 instead of
1.0 and 360.
All that you need to do now is add these two statements (plt.yticks and
plt.xticks) to the program code in the previous section. Let’s also change
the X-axis label to “Years.”
plt.xlabel('Years')
Now the program produces a nice-looking result, as shown in Figure 13.22.
[Figure 13.22: percentage of each payment applied to principal (Y axis, 0 to 100%) plotted against years (X axis, 2020 to 2050)]
array(['To', 'be', 'orNotToBe'], dtype='<U9')
What this example tells us is that Words has been created as an array with
three members, each a string of type U9. Each element is therefore just
large enough to hold a Python string of at most nine characters.
You can always assign a string value that’s shorter than nine characters. But
if you attempt to assign a longer string, it’s truncated to nine characters in this
case.
>>> Words[0] = 'My uncle is Caesar.'
>>> Words[0]
'My uncle '
However, you can optionally specify a longer maximum length for strings,
such as 20.
>>> Words = np.array(('To', 'be', 'orNotToBe'),
dtype = 'U20')
In this example, the length of 'orNotToBe' would determine the maxi-
mum string length by default, but instead the length was specifically deter-
mined by 'U20'. Now you can assign a longer string to an element without its
being truncated.
>>> Words[0] = 'My uncle is Caesar.'
>>> Words[0]
'My uncle is Caesar.'
Before we leave the topic of strings, note that Un denotes a Unicode string,
which accepts standard Python strings. Sn denotes bytes strings.
Very often, when handling large amounts of information, you’ll want to
store records that combine numeric and string data. To do that with numpy
arrays, you’ll need to create structured arrays, storing a combination of data
types. The dtype field enables you to create such structures by using a name
to identify each field.
[ 33 1 103 -2]
Finally, we can collect all the values in the 'color' field and get a list of
strings:
>>> C = X['color']
>>> print(C)
['Red' 'Blue' 'Yellow' 'Blue']
Any of these columns, by the way, can be changed as a group. For example,
to zero out the entire 'b' column, use the following statement, which alters
the contents of X.
>>> X['b'] = 0
◗ IQ, as an integer
◗ Height, as a floating-point number
◗ Age, an integer
◗ Last performance rating (from 0.0 to 4.0), as floating point
◗ College or university, a Unicode string
Here are the contents. Such a file is often called a comma-separated value
(CSV) file.
101, 70.5, 21, 2.3, Harvard
110, 69.5, 22, 3.1, MIT
130, 76.0, 21, 3.5, Cal Tech
120, 72.5, 29, 3.7, Yale
120, 73.2, 33, 2.9, Harvard
105, 68.0, 35, 3.0, U. of Wash.
107, 74.0, 44, 2.7, Tacoma Comm. College
140, 67.0, 30, 3.1, Oregon State
100, 72.5, 31, 2.0, UCLA
Remember, this data set could be thousands of records long, but we’re using
nine records for the sake of illustration.
The first thing we need to do is create a list of tuples to represent the struc-
ture of each record. The name of each column is included, so that we can refer
to it later.
dt=[('IQ', 'i2'), ('Height', 'f4'), ('Age', 'i2'),
('Rating', 'f4'), ('College', 'U30')]
Any of these fields can be a different size. For example, if you needed inte-
gers larger than 2 bytes in size, you could use 4-byte integers ('i4'). And if
you wanted to store more precise floating-point numbers, you could use 'f8'.
But the cost of having such fields is to cause the array to have a bigger foot-
print in memory.
Let’s stick with the settings we’ve used. The following syntax shows how to
read a text file into a numpy array. There are other arguments, which you can
look up online. Some of those enable you to skip specified rows or columns.
(140, 67. , 30, 3.1, ' Oregon State'),
(100, 72.5, 31, 2. , ' UCLA')],
dtype=[('IQ', 'i2'), ('Height', 'f4'),
('Age', 'i2'), ('Rating', 'f4'),
('College', 'U30')])
There’s at least one quirk in this example. The strings all have a leading
space. That’s because the delimiter was only a comma. There are several ways
to solve this problem. The simplest way is probably to make the delimiter into
a combination comma and space (', ').
team_a = np.loadtxt('team_data.txt', dt, delimiter=', ')
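Another way to get rid of the leading spaces is to parse the text with the standard library's csv module before handing the records to numpy. This may be useful if your numpy release accepts only a single-character delimiter (an assumption to check against the version you have installed). The sketch below uses a few lines in the same layout as the team data file; the variable names are our own.

```python
import csv
import io

# A few lines in the same layout as the team data file.
text = '''101, 70.5, 21, 2.3, Harvard
110, 69.5, 22, 3.1, MIT
130, 76.0, 21, 3.5, Cal Tech
'''

# skipinitialspace=True discards the blank that follows each comma.
reader = csv.reader(io.StringIO(text), skipinitialspace=True)
records = [(int(iq), float(ht), int(age), float(rating), college)
           for iq, ht, age, rating, college in reader]

for rec in records:
    print(rec)
```

A list of tuples such as records can then be passed to np.array together with the dt structure defined earlier.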
You can now isolate columns as you choose and manipulate them or ana-
lyze the data, as we demonstrated in the previous section.
iq_a = team_a['IQ']
ht_a = team_a['Height']
age_a = team_a['Age']
rat_a = team_a['Rating']
Here’s a printout of the iq_a array, containing all the elements taken from
the IQ field of each row:
[101 110 130 120 120 105 107 140 100]
You can analyze this data by using the numpy statistical functions.
print('Mean IQ of the dev. team is %.2f.' %
np.mean(iq_a))
print('Std. dev. of team\'s IQ is %.2f.' % np.std(iq_a))
These statements, when executed, print the following:
Mean IQ of the dev. team is 114.78.
Std. dev. of team's IQ is 12.95.
One of the interesting things you can do with multiple columns is find the
Pearson correlation coefficient. This measures the relationship of two arrays
of equal length. A positive correlation means that the more you get of A, the
more you get of B. A perfect correlation (1.0) would mean a perfect linear
relationship, in which any increase in one quantity is always accompanied
by a proportional increase in the other.
Conversely, –1.0 is a perfect negative correlation: the more you have of one,
the less you get of the other.
What is the correlation between height and IQ on the development team?
You can determine that through the following calculation:
>>> np.corrcoef(iq_a, ht_a)[0, 1]
-0.023465749537744517
This result suggests there’s a negative correlation between IQ and height on
this development team, but it’s a tiny one. If the most important thing is to
have a high IQ, the shorter team members have an advantage, but only very
slightly. The correlation is close to 0.0, showing that the two sets of data
(IQ and height) are barely related, if at all.
Note that the return value of the np.corrcoef function is actually a 2 × 2
array. To convert this to a single figure, use the index [0,1] or [1,0].
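To see what the coefficient measures, here is a minimal sketch that computes it from the usual definition, using plain Python and the IQ and height columns from the data file. The function name pearson is our own; np.corrcoef is still the convenient choice when the data is already in arrays.

```python
import math

def pearson(xs, ys):
    '''Return the Pearson correlation coefficient of two
    equal-length sequences of numbers.'''
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

iq = [101, 110, 130, 120, 120, 105, 107, 140, 100]
ht = [70.5, 69.5, 76.0, 72.5, 73.2, 68.0, 74.0, 67.0, 72.5]
r = pearson(iq, ht)
print(round(r, 3))    # -0.023
```

Correlating a column with itself, as in pearson(iq, iq), gives 1.0, which is why the diagonal of the 2 × 2 array returned by np.corrcoef carries no information.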
You can optionally manipulate an array before writing it back out. For
example, suppose you want to change the performance rating system so that
instead of running from 0.0 to 4.0, it runs from 0.0 to 8.0. You can do that by
multiplying the whole column by 2.
team_a['Rating'] *= 2
You can also append new rows of data at any time by using the np.append
function. Here’s an example:
new_a = np.array((100, 70, 18, 5.5, 'C.C.C.'), dtype=dt)
team_a = np.append(team_a, new_a)
Finally, you can write an array back out to a text file by using the savetxt
function, which has a number of arguments.
Chapter 13 Summary
The range of what you can do with the numpy package is amazing. These last
two chapters have been devoted to that topic, and we have yet to exhaust it.
This chapter gave an introduction to plotting two-dimensional graphs. The
basic idea is that you use the plot function of the matplotlib package and
pass two numeric arrays of equal length. The plotting software combines each
element in the first array with its corresponding element in the second array,
and from these combinations gets a sequence of (x, y) pairs. These pairs are
plotted as points, with the plot function drawing lines to connect them as
smoothly as possible.
But that’s only the beginning. Using other functions, you can create pie
charts, histograms, and other figures. And although the math of geometrical
surfaces in three dimensions is more complex, the basic principles for plotting
apply to creating three-dimensional shapes as well.
This chapter also showed how financial projections and linear-algebra
operations are supported by the numpy package, including the ability to graph
functions. Finally, the chapter ended by showing how to store fixed-length
records in numpy arrays, as well as reading and writing them to text files.
Chapter 15, “Getting Financial Data off the Internet,” will show how to get
financial information from the web, download it, and graph it.
with a package. The program is always started by running the main module.
After opening the main module (from within IDLE), open the Run menu and
choose Run Module.
There are some general guidelines for importing. If you follow them, you
should minimize your chances of getting into trouble.
◗ Import another source file by using an import statement, just as you would
with packages, without, however, using the .py extension.
◗ You can import functions as well as module-level variables (that is, variables
not local to a function). You should refer to imported variables through their
qualified names. For example, if e is defined in a module my_math.py, then,
from the main module, refer to it as my_math.e.
◗ Note this exception: If you are never, ever going to change the value of a vari-
able, it’s safe to refer to it directly. For example, if pi is a constant, you can
refer to it simply as pi in another module.
◗ Avoid mutual importation. If mod_a.py imports mod_b.py, then mod_b.py
shouldn’t import mod_a.py. Furthermore, any circular “loop” of importa-
tion should be avoided. If A imports B and if B imports C, then C should not
import A.
def print_this(str1):
    print('Print this %s.' % str1)
Now run the run_me.py file directly. You don’t need to do anything else
to get the other module to be part of the project, assuming it’s in the same
directory.
The action of the import statement is to run the contents of printstuff.py,
which contains one function definition. The action of the def statement is to
create the function as a callable object.
Note: Remember that in an import statement, the .py file extension is always
implied, and it would be a mistake to include it. The rest of the name must
obey the standard rules for forming file names.
◗ From within IDLE, choose the New File command from the File menu. IDLE
responds by providing a plain-text, editable window. Enter the following.
(Comments don’t need to be entered, but the rest should be entered exactly as
shown.)
# File run_me.py ----------------------------
def print_this(str1):
    print('Print this %s.' % str1)
◗ Save this file as printstuff.py.
◗ Open the Windows menu, and choose run_me.py. (This switches focus back
to the first file, or rather, module.)
◗ Run this file by choosing the Run Module command from the Run menu.
#----------------------------------------------
# File printstuff.py
x, y = 100, 200
z = 100
print_z_val()
z = 0
def print_z_val():
    print('Value of z is', z)
If you run this program, it says that the value of z is still 0, which makes it
look like the change to z in the main module (run_me.py) was ignored. What
really happened is that the statement z = 100 created a version of z local to
the main module. The problem is corrected as soon as z = 100 in run_me.py
is changed to
foo_vars.z = 100
This statement causes run_me.py to use the version of z defined in
foo_vars.py and nowhere else; therefore, the assignment of 100 to z now
affects foo_vars.z.
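The difference between the two assignments can be reproduced in a single file. In the sketch below, a stand-in for foo_vars.py is built in memory with types.ModuleType, purely as a convenience for the demonstration; normally the code would sit in its own source file. The helper get_z_val is a hypothetical variation on print_z_val that returns the value instead of printing it.

```python
import types

# Build a stand-in for foo_vars.py in memory (for this sketch only).
foo_vars = types.ModuleType('foo_vars')
exec('''
z = 0
def get_z_val():
    return z
''', foo_vars.__dict__)

z = 100                          # binds a new z in the current module
first = foo_vars.get_z_val()     # 0: the z inside foo_vars is untouched

foo_vars.z = 100                 # rebinds the z that belongs to foo_vars
second = foo_vars.get_z_val()    # 100

print(first, second)             # 0 100
```

A function always looks up global names in the module where it was defined, which is why only the qualified assignment changes what the function sees.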
Here’s another complete example involving exactly two source files: a main
program module and another module, poly.py.
# File do_areas.py ----------------------------------
import poly

def main():
    r = float(input('Enter radius:'))
    print('Area of circle is', poly.get_circle_area(r))
    x = float(input('Enter side:'))
    print('Area of square is', poly.get_square_area(x))

main()

# File poly.py --------------------------------------
def get_circle_area(radius):
    pi = 3.141593
    return pi * radius * radius

def get_square_area(side):
    return side * side
from module_name import sym1, sym2, sym3...
For example, in the example at the end of the previous section, the func-
tion names get_circle_area and get_square_area can be made directly
available by using the from/import syntax. Here is the result.
# File do_areas.py ----------------------------------
from poly import get_circle_area, get_square_area

def main():
    r = float(input('Enter radius:'))
    print('Area of circle is', get_circle_area(r))
    x = float(input('Enter side:'))
    print('Area of square is', get_square_area(x))

main()

# File poly.py --------------------------------------
def get_circle_area(radius):
    pi = 3.141593
    return pi * radius * radius

def get_square_area(side):
    return side * side
Note: The first time a source file is imported within a given project, Python
executes the code in that file. Remember that executing a def statement causes
Python to create a function at run time as a callable object. Unlike C or C++,
Python does not create a function during a “compilation” phase. Instead,
creation of functions is dynamic and can happen at any time during running of
the program.
That’s why it’s necessary for Python to run the code in the named module;
or rather, it runs all the module-level code, which ideally should perform
variable assignments or function definitions (or both). After a module is
executed in this way, it’s not executed again, no matter how many times it’s
imported.
The functions that are created, of course, can be called any number of
times. But the initial action of a def statement is only to create the function as
a callable object, and not yet to call it. That distinction is important.
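# File run_me.py ------------------------------
from module2 import *

pr_nice('x', x)
pr_nice('y', y)

# File module2.py -----------------------------
x = 1000
y = 500
z = 5

def pr_nice(s, n):
    print('The value of %s is %i.' % (s, n))
    print('And z is %i.' % z)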
When run, the program run_me.py prints
The value of x is 1000.
And z is 5.
The value of y is 500.
And z is 5.
As this example demonstrates, the import * syntax causes all module-level
symbols defined in module2.py to be accessible from this module.
But do all the symbols in module2.py really need to be visible in
run_me.py? In this case, they don’t. The module-level variable z is used in the
function definition for pr_nice, but it need not be visible to the main module.
When you use the version of import that uses the asterisk (*), Python
allows you to control access through the use of the __all__ symbol in the
module itself. Here’s how it works.
◗ If the module does not assign a value to the special symbol __all__, then
the importer of the module sees all the module-level (that is, global) symbols,
exactly as you’d expect.
◗ If the module does assign a value to __all__, then only the listed symbols are
visible to the importer of the module.
This syntax implies that all the symbolic names listed for this statement are
placed in strings, and generally, this means names in quotation marks. (See
the upcoming example.)
Consider the previous example. The names x, y, and z are all visible to
the importing module, run_me.py, as is the function name, pr_nice. But z
didn’t need to be visible as an import, because it’s visible internally within the
module named module2. Therefore, the module could have been written this
way:
# File module2.py -----------------------------
x = 1000
y = 500
z = 10
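The effect of __all__ can be checked in a single file. The sketch below builds a stand-in for module2.py in memory and registers it in sys.modules so that the import statement can find it; that setup is only a device for the demonstration, and the list assigned to __all__ is our own guess at a sensible choice for this module.

```python
import sys
import types

# Build a stand-in for module2.py in memory (for this sketch only).
module2 = types.ModuleType('module2')
exec('''
__all__ = ['x', 'y', 'pr_nice']
x = 1000
y = 500
z = 10
def pr_nice(s, n):
    return 'The value of %s is %i. And z is %i.' % (s, n, z)
''', module2.__dict__)
sys.modules['module2'] = module2

# Perform "from module2 import *" inside a fresh namespace.
namespace = {}
exec('from module2 import *', namespace)
imported = sorted(name for name in namespace
                  if not name.startswith('__'))
print(imported)                        # ['pr_nice', 'x', 'y']
print(namespace['pr_nice']('x', 1000))
```

The name z is not imported, yet pr_nice can still use it, because the function looks up z inside module2 itself.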
Note: The effect described in this section, which makes names with a leading
underscore (_) more difficult to import, is separate from the name mangling
that takes place when you’re attempting to access a private attribute (signified
by leading double underscores) from outside a class.
See Section 9.6, “Public and Private Variables,” for more information on
name mangling.
_name
When a name beginning with an underscore is created in a Python module,
it’s not accessible to another module that imports it using the import * syntax.
from mod_a import *
For example, the following program causes errors if run_me.py is run,
because it assumes that _a and __b are accessible, but they’re not.
# File run_me.py -----------------------------
print(_a)
print(__b)
print(_a)
print(__b)
x = 1
y = 2
print(dir())
The output of the program is
My name is __main__.
obtained by taking its file name without the .py extension. A module’s name
is not changed unless it has become the main module—that is, the first mod-
ule to be run.
There are some important consequences of these rules. First, when a
module attempts to import the main module, it potentially creates two copies of
every symbol in the main module, because now (in this example) there exist
both __main__ and mod_a.
The name __main__ can be useful. Sometimes, you may want to test all
the functions in a module, even though it’s not normally the main module.
In that case, you might run it stand-alone, in which case it would become the
main module.
For this reason, the following code is common in professional Python. It
directs Python to run a module-level statement only if the file is actually serv-
ing as the main module.
# File start_me_up.py -----------------------------
def call_me_first():
    print('Hi, there, Python!')
module-level code that will call the functions. That’s the point of testing
__name__ to see whether it is equal to __main__.
To put all this another way, this use of __main__ is an extremely useful
testing tool. It enables you to test different modules individually. Then,
when the overall program is run, modules will no longer be run independently
but only as called by the main program, as usual.
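Here is a minimal sketch of the test, as it is commonly written. The function run_tests is a hypothetical self-test added for illustration, and call_me_first returns its string here (rather than printing it) so that it is easy to check.

```python
def call_me_first():
    return 'Hi, there, Python!'

def run_tests():
    '''Hypothetical self-test for this module.'''
    assert call_me_first() == 'Hi, there, Python!'
    return 'All tests passed.'

if __name__ == '__main__':
    # Reached only when this file is run directly,
    # not when another module imports it.
    print(run_tests())
```

When another module imports this file, __name__ holds the module's own name, so the test code is skipped and only the function definitions take effect.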
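# File run_me.py -----------------------------------
import mod_a

# File mod_a.py ------------------------------------
def funcA(n):
    if n > 0:
        mod_b.funcB(n - 1)

# File mod_b.py ------------------------------------
def funcB(n):
    print(n)
    mod_a.funcA(n)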
This program works, or at least it does until it gets more complex. All the
main module does is import the two modules (mod_a and mod_b) and then
run one of the functions. Although the functions are mutually dependent—
each calls the other—there is an appropriate exit condition, and so the pro-
gram runs fine, producing the following result:
4
3
2
1
0
Wonderful! Except that this code is easy to break. Suppose you add an
innocent-looking statement to mod_a.py, producing the following (with a
new statement shown in bold):
# File mod_a.py ------------------------------------
import mod_b

mod_b.funcB(3)

def funcA(n):
    if n > 0:
        mod_b.funcB(n - 1)
If you save the change to mod_a.py and rerun the program, starting with
run_me.py, the program fails. The error message states that mod_a “has no
attribute funcA.”
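The failure can be reproduced without creating the files by hand. The sketch below writes the three modules into a temporary folder and runs run_me.py in a separate Python process. The exact contents of run_me.py (including the call mod_a.funcA(5)) and the import line in mod_b.py are assumptions made for this demonstration.

```python
import os
import subprocess
import sys
import tempfile

files = {
    'run_me.py': 'import mod_a\nimport mod_b\nmod_a.funcA(5)\n',
    'mod_a.py': ('import mod_b\n'
                 'mod_b.funcB(3)\n'            # the added statement
                 'def funcA(n):\n'
                 '    if n > 0:\n'
                 '        mod_b.funcB(n - 1)\n'),
    'mod_b.py': ('import mod_a\n'
                 'def funcB(n):\n'
                 '    print(n)\n'
                 '    mod_a.funcA(n)\n'),
}

with tempfile.TemporaryDirectory() as folder:
    for name, source in files.items():
        with open(os.path.join(folder, name), 'w') as f:
            f.write(source)
    result = subprocess.run([sys.executable, 'run_me.py'], cwd=folder,
                            capture_output=True, text=True)

print(result.stdout)                    # funcB prints 3 before the failure
print(result.stderr.splitlines()[-1])   # the AttributeError message
```

The call mod_b.funcB(3) runs while mod_a is still being imported, before the def statement for funcA has been executed. At that moment mod_a exists but does not yet contain funcA, so the call back into mod_a fails.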
place in a Python source file named rpn_io.py.
Here’s the resulting program from Chapter 8, but now reorganized into
two files.
#File rpn.py --------------------------------------
import re
import operator
from rpn_io import *
sym_tab = { }
scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"-?(\d*)?\.\d+", lambda s, t:
        stack.append(float(t))),
    (r"-?\d+", lambda s, t: stack.append(int(t))),
    (r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
        stack.append(t)),

def assign_op():
    '''Assignment Operator function: Pop off a name
    and a value, and make a symbol-table entry.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op2) == str:    # Source may be another var!
        op2 = sym_tab[op2]
    sym_tab[op1] = op2

def bin_op(action):
    '''Binary Operation evaluator: If an operand is
    a variable name, look it up in the symbol table
    and replace with the corresponding value, before
    being evaluated.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op1) == str:
        op1 = sym_tab[op1]
    if type(op2) == str:
        op2 = sym_tab[op2]
    stack.append(action(op1, op2))

def main():
    a_list = open_rpn_file()
    if not a_list:
        print('Bye!')
        return

main()
#File rpn_io.py
#------------------------------------------
def open_rpn_file():
    '''Open-source-file function. Open a named
    file and read lines into a list, which is
    returned.
    '''
    while True:
        try:
            fname = input('Enter RPN source: ')
            if not fname:
                return None
            f = open(fname, 'r')
            break
        except OSError:    # File missing or unreadable
            print('File not found. Re-enter.')
    a_list = f.readlines()
    return a_list
This version of the program is functionally identical to the one in Chapter 8,
which reads RPN scripts from a file and uses a symbol table, sym_tab, to
store variable names created as a result of assignments (=).
For example, this program should be able to read a text file, such as one
containing the following RPN script, run it as a program, and print the result.
a_var 3 =
b_var 5 =
a_var b_var * 1 +
If you try entering this into a text file (let’s call it rpn_junk.txt), run the
program, and enter rpn_junk.txt as the file name when prompted, you
should get the correct result, 16.
Note: If you create the RPN source file within IDLE, don’t be surprised if
IDLE adds a .py extension.
Notice the import statement that was added to the main module:
from rpn_io import *
Because the rpn_io module contains only one function, the possibility of
name conflicts is low. But you could import more selectively if you chose.
from rpn_io import open_rpn_file
For the sake of illustration, and because it makes for a better overall design,
we’ll place the code to implement these four directives into the rpn_io file
rather than the main module. This raises an issue. Two of the directives
(INPUT and PRINTVAR) need access to the symbol table (sym_tab) created
in the main module. How does this table get shared with the rpn_io module?
As we showed in Section 14.6, having the two modules import each other is
risky, because it can create interdependencies that cause the program to fail.
The simplest, safest solution is to pass along a reference to sym_tab, the
dictionary that serves as the symbol table.
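A minimal sketch shows why passing the dictionary is enough. The function store_value below is a hypothetical stand-in for a directive handler in rpn_io.py; it is not the program's actual code.

```python
def store_value(name, value, sym):
    '''Hypothetical stand-in for a directive handler: store a
    value in the symbol table that was passed in.'''
    sym[name] = value        # changes the caller's dictionary in place

sym_tab = {}
store_value('side1', 30.0, sym_tab)
print(sym_tab)               # {'side1': 30.0}
```

Because the function receives a reference to the same dictionary object, the entry it adds is visible to the caller, and the rpn_io module never needs to import the main module.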
Just how, you may ask, do you pass a reference? Python always does this
when passing arguments. It’s a vital performance feature. If a function got a
    a_list = open_rpn_file()
    if not a_list:
        print('Bye!')
        return
Here are the functions to be added to the file rpn_io.py. These functions
carry out the four directives looked for by the main function.
def do_prints(s):
    ''' Carry out PRINTS directive by printing a
    string.
    '''
    a_str = get_str(s)
    print(a_str, end='')

def do_println(s):
    ''' Carry out PRINTLN directive: print the
    optional string argument, if specified, and then
    print a newline, unconditionally.
    '''
    if s:
        do_prints(s)
    print()

def get_str(s):
    ''' Helper function for do_prints.
    '''
    a = s.find("'")
    b = s.rfind("'")
    if a == -1 or b == -1:
        return ''
    return s[a+1:b]
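The helper function get_str is worth trying on its own. The following sketch repeats its definition so that the example is self-contained and then calls it on a few sample strings of our own choosing.

```python
def get_str(s):
    '''Return the text between the first and last single
    quotation marks in s, or '' if there are none.'''
    a = s.find("'")
    b = s.rfind("'")
    if a == -1 or b == -1:
        return ''
    return s[a+1:b]

print(repr(get_str("'Enter side 1: '")))   # 'Enter side 1: '
print(repr(get_str("'It's here'")))        # "It's here"
print(repr(get_str('no quotes')))          # ''
```

Because the function uses find for the opening mark and rfind for the closing one, a quotation mark inside the text does not cut the string short.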
PRINTS 'Enter side 2: '
INPUT side2
total side1 side1 * side2 side2 * + =
hyp total 0.5 ^ =
PRINTS 'Hypotenuse equals '
PRINTVAR hyp
Suppose that you write and save this RPN, placing it in a file called
rpn_hyp.txt. Here is a sample session resulting from running the main mod-
ule, rpn.py:
Enter RPN source: rpn_hyp.txt
Enter side 1: 30
Enter side 2: 40
Hypotenuse equals 50.0
The first line shown, “Enter RPN source,” is printed by the Python pro-
gram itself. The second, third, and fourth lines are actually printed by the RPN
script—or rather, they are printed during the evaluation of that RPN script.
Note: Carefully observe the use of the global statement in the code here. The
failure to use this keyword when needed can create some nasty bugs.
The first thing we need to do is add the program counter, pc, to the list
of global variables at the beginning of the main module. After importing the
necessary packages, as well as rpn_io.py, the source code begins with
sym_tab = { } # Symbol table (for variables)
stack = [] # Stack to hold the values.
pc = -1 # Program Counter
The third line, which is here placed in bold, is the one that needs to be added.
In addition, a few lines need to be added to the main function. These are
placed in bold in the following listing.
def main():
    global pc
    a_list = open_rpn_file()
    if not a_list:
        print('Bye!')

                do_printvar(a_line[9:], sym_tab)
            elif a_line.startswith('INPUT'):
                do_input(a_line[6:], sym_tab)
            elif a_line:
                tokens, unknown = scanner.scan(a_line)
                if unknown:
                    print('Unrecognized input:', unknown)
                    break
        except KeyError as e:
            print('Unrecognized symbol', e.args[0],
                  'found in line', pc)
            print(a_list[pc])
            break
Let’s walk through what each of these additions does. First of all, the
global statement is needed. Without it, Python would assume that the use of
pc in the function was a reference to a local variable. Why? It’s because assign-
ments create variables—remember that there are no variable declarations in
Python! Therefore, Python would have to guess what kind of variable was
being created and, by default, variables are local.
The global statement tells Python not to interpret pc as a local variable, even
if it’s the target of an assignment. Python looks for a global (module-level) ver-
sion of pc and finds it.
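The difference is easy to demonstrate. In this sketch, the two function names are our own and exist only for illustration.

```python
pc = -1                      # module-level program counter

def reset_without_global():
    pc = 0                   # creates a new local variable named pc
    return pc

def advance_with_global():
    global pc                # refer to the module-level pc
    pc += 1
    return pc

reset_without_global()
after_local = pc             # still -1: module-level pc is untouched
advance_with_global()
after_global = pc            # now 0
print(after_local, after_global)    # -1 0
```

Note that a statement such as pc += 1 inside a function, with no global statement, does not quietly create a local variable; it raises an exception, because Python treats pc as a local that has not yet been given a value.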
Next, pc is set to –1, in case it needs to be set to the initial position. The
action of the program is to increment pc as each line is read, and we want it to
be 0 after the first line is read.
pc = -1
The next few lines increment pc as mentioned. Should the value then be so
high as to be out of range for the list of strings, the program exits; this means
we’re done!
while True:
    pc += 1
    if pc >= len(a_list):
        break
Finally, code is added to the very end of the main function to catch the
KeyError exception and report a useful error message if this exception is
raised. Then the program terminates.
except KeyError as e:
    print('Unrecognized symbol', e.args[0],
          'found in line', pc)
    print(a_list[pc])
    break
With these changes made, errors in writing variable names trigger more
intelligent error reporting. For example, if the variable side1 was never prop-
erly created (let’s say that the user had entered side11 or side 1 earlier on),
the interpreter now prints a useful message:
Unrecognized symbol side1 found in line 4
total side1 side1 * side2 side2 * + =
This message, should it happen, ought to tell you there is a problem with
the creation of side1 or side2.
conditional_expr line_num ?
If the conditional_expr is any value other than zero, the program
counter, pc, is set to the value of line_num. Otherwise, do nothing.
f2 f1 f2 + =
f1 temp =
PRINTVAR f2
n n 1 - =
n 4 ?
Do you see what this does? We’ll return to that question later.
But look at the last line (n 4 ?). To understand what this does, remember
that our program counter is designed to be zero-based. It didn't have to be,
but that simplified some of the programming. Because the program counter is
zero-based, the last line causes a jump back to the fifth line (temp f2 =)
whenever n is not zero. This forms a loop that continues until n is zero.
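To see why a jump stores the target minus 1 in the program counter, here is a minimal sketch of the loop mechanics. It is not the interpreter itself: run_countdown is a hypothetical function, and its three script lines are decoded by hand rather than by the scanner.

```python
def run_countdown(n):
    '''Simulate a three-line script that counts down from n.'''
    lines = ['PRINTVAR n', 'n n 1 - =', 'n 0 ?']
    printed = []
    pc = -1
    while True:
        pc += 1                  # Advance to the next line.
        if pc >= len(lines):
            break                # Ran off the end: done.
        line = lines[pc]
        if line == 'PRINTVAR n':
            printed.append(n)
        elif line == 'n n 1 - =':
            n -= 1
        elif line == 'n 0 ?':
            if n:                # Jump only if n is not zero.
                pc = 0 - 1       # Target line 0, minus 1,
                                 # because the loop adds 1 back.
    return printed

result = run_countdown(3)
```

Because the loop increments pc at the top of every pass, setting pc to the target minus 1 makes the next line executed the target itself.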
As we promised, the jump-if-not-zero operator, ?, is easy to implement. Just
add one line to the Scanner code and one short function. Here is the revised
Scanner code, with the new line to be entered in bold.
scanner = re.Scanner([
(r"[ \t\n]", lambda s, t: None),
(r"-?(\d*)?\.\d+", lambda s, t:
stack.append(float(t))),
(r"-?\d+", lambda s, t: stack.append(int(t))),
(r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
stack.append(t)),
(r"[+]", lambda s, t: bin_op(operator.add)),
(r"[-]", lambda s, t: bin_op(operator.sub)),
(r"[*]", lambda s, t: bin_op(operator.mul)),
(r"[/]", lambda s, t: bin_op(operator.truediv)),
(r"[\^]", lambda s, t: bin_op(operator.pow)),
stack.append(float(t))),
(r"-?\d+", lambda s, t: stack.append(int(t))),
(r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
stack.append(t)),
(r"[+]", lambda s, t: bin_op(operator.add)),
(r"[-]", lambda s, t: bin_op(operator.sub)),
(r"[*]", lambda s, t: bin_op(operator.mul)),
(r"[/]", lambda s, t: bin_op(operator.truediv)),
(r"[>]", lambda s, t: bin_op(operator.gt)),
(r"[\^]", lambda s, t: bin_op(operator.pow)),
(r"[=]", lambda s, t: assign_op()),
(r"[?]", lambda s, t: jnz_op())
])
That’s it! It may occur to you that adding new operators, as long as they
are standard arithmetic or comparison operators, is so trivial, we should add
them all.
That would be correct, except that if you’re depending on single punctua-
tion marks to represent different operations, you’ll soon run out of symbols on
the keyboard. The problem is potentially solvable by making “LE”, for exam-
ple, stand for “less than or equal to,” but if you use that approach, you need to
rethink how the scanner analyzes tokens.
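One possible way to do that rethinking is sketched below. It rests on an assumption about re.Scanner (a class that, as far as we know, is not covered in the official re documentation): patterns are tried in the order listed, so a keyword pattern must come before the general variable-name pattern. The stripped-down stack, bin_op, and evaluate shown here are simplified stand-ins, with no symbol table.

```python
import operator
import re

stack = []

def bin_op(action):
    '''Simplified binary operation: no symbol-table lookup.'''
    op2, op1 = stack.pop(), stack.pop()
    stack.append(action(op1, op2))

scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"-?\d+", lambda s, t: stack.append(int(t))),
    # The keyword comes BEFORE the name pattern; \b keeps it
    # from swallowing the start of a name such as LEVEL.
    (r"LE\b", lambda s, t: bin_op(operator.le)),
    (r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t: stack.append(t)),
])

def evaluate(text):
    '''Scan text and return the stack plus any unscanned text.'''
    stack.clear()
    tokens, unknown = scanner.scan(text)
    return list(stack), unknown

less_equal = evaluate('3 4 LE')
not_less_equal = evaluate('5 4 LE')
name_only = evaluate('LEVEL')
```

With this ordering, LE acts as an operator while LEVEL is still pushed on the stack as a name.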
Armed with this one additional operator, it’s now possible to make the
Fibonacci script more reliable. Just look at the revised script.
PRINTS 'Enter number of fibos to print: '
INPUT n
f1 0 =
f2 1 =
temp f2 =
f2 f1 f2 + =
f1 temp =
PRINTVAR f2
n n 1 - =
n 0 > 4 ?
The last line now says the following: If n is greater than 0, then jump to
(zero-based) line 4. This improves the script, because if the user enters a nega-
tive number, the RPN program doesn’t go into an infinite loop.
Finally—although this is not necessary for most scripts—let’s add an oper-
ation that gets a random integer in a specified range.
Key Syntax
op1 op2 !
The action of this RPN expression is to call random.randint, passing op1
and op2 as the begin and end arguments, respectively. The random integer
produced in this range is then pushed on the stack.
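One detail worth checking before relying on this operator is that random.randint includes both endpoints, unlike range. The short sketch below draws a batch of values; the variable names are only illustrative.

```python
from random import randint

# Draw many values; every one should fall in 1..50, inclusive.
rolls = [randint(1, 50) for _ in range(1000)]
lowest, highest = min(rolls), max(rolls)
```

So the RPN expression 1 50 ! can produce 50 as well as 1.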
Adding support for this expression is also easy. However, it involves import-
ing from another module, random. The code is easiest to write if we can refer
to randint directly. Therefore, let’s import it this way:
from random import randint
Now we need only add a line to add support for randomization. Again,
here is the revised scanner, with the line to be added in bold.
scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"-?(\d*)?\.\d+", lambda s, t:
        stack.append(float(t))),
    (r"-?\d+", lambda s, t: stack.append(int(t))),
    (r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
        stack.append(t)),
    (r"[+]", lambda s, t: bin_op(operator.add)),
    (r"[-]", lambda s, t: bin_op(operator.sub)),
    (r"[*]", lambda s, t: bin_op(operator.mul)),
    (r"[/]", lambda s, t: bin_op(operator.truediv)),
    (r"[>]", lambda s, t: bin_op(operator.gt)),
    (r"[!]", lambda s, t: bin_op(randint)),
    (r"[\^]", lambda s, t: bin_op(operator.pow)),
    (r"[=]", lambda s, t: assign_op()),
    (r"[?]", lambda s, t: jnz_op())
])
1 01 ?
PRINTS 'Too low! Try again. '
1 01 ?
PRINTS 'Play again? (1 = yes, 0 = no): '
INPUT ans
ans 00 ?
This script is probably still difficult to follow, so it might help you to think
of it with virtual line numbers placed in for the sake of illustration. These line
numbers are imaginary; you can’t actually put them in the file at this point!
However, you might want to write them down on a piece of paper as you’re
programming.
00: n 1 50 ! =
01: PRINTS 'Enter your guess: '
02: INPUT ans
03: ans n > 07 ?
04: n ans > 09 ?
05: PRINTS 'Congrats! You got it! '
06: 1 011 ?
07: PRINTS 'Too high! Try again. '
08: 1 01 ?
09: PRINTS 'Too low! Try again. '
10: 1 01 ?
11: PRINTS 'Play again? (1 = yes, 0 = no): '
12: INPUT ans
13: ans 00 ?
This script takes advantage of a coding trick. If the jump-if-not-zero opera-
tion is given a constant, nonzero value, it amounts to an unconditional jump.
[Figure: structure of the final RPN project. rpn.py imports re, operator, random, and rpn_io.py; rpn.py holds pc, stack[], and sym_tab{}; functions in rpn_io.py are passed a reference to sym_tab.]
import re
import operator
from random import randint
from rpn_io import *
# Scanner: Add items to recognize variable names, which
# are stored in the symbol table, and to perform
# assignments, which enter values into the sym. table.
scanner = re.Scanner([
    (r"[ \t\n]", lambda s, t: None),
    (r"-?(\d*)?\.\d+", lambda s, t:
        stack.append(float(t))),
    (r"-?\d+", lambda s, t: stack.append(int(t))),
    (r"[a-zA-Z_][a-zA-Z_0-9]*", lambda s, t:
        stack.append(t)),
    (r"[+]", lambda s, t: bin_op(operator.add)),
    (r"[-]", lambda s, t: bin_op(operator.sub)),
    (r"[*]", lambda s, t: bin_op(operator.mul)),
    (r"[/]", lambda s, t: bin_op(operator.truediv)),
    (r"[>]", lambda s, t: bin_op(operator.gt)),
    (r"[!]", lambda s, t: bin_op(randint)),
    (r"[\^]", lambda s, t: bin_op(operator.pow)),
    (r"[=]", lambda s, t: assign_op()),
    (r"[?]", lambda s, t: jnz_op())
])
def jnz_op():
    ''' Jump on Not Zero operation.
    After evaluating the operands, test the first op;
    if not zero, set Program Counter to op2 - 1.
    '''
    global pc
    op2, op1 = stack.pop(), stack.pop()
    if type(op1) == str:
        op1 = sym_tab[op1]
    if type(op2) == str:
        op2 = sym_tab[op2]
    if op1:
        pc = int(op2) - 1 # Convert op to int format.

def assign_op():
    '''Assignment Operator function.
    Pop off a name and a value, and make a symbol-table
    entry.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op2) == str: # Source may be another var!
        op2 = sym_tab[op2]
    sym_tab[op1] = op2

def bin_op(action):
    '''Binary Operation function.
    If an operand is a variable name, look it up in
    the symbol table and replace with the corresponding
    value, before being evaluated.
    '''
    op2, op1 = stack.pop(), stack.pop()
    if type(op1) == str:
        op1 = sym_tab[op1]
    if type(op2) == str:
        op2 = sym_tab[op2]
    stack.append(action(op1, op2))
def main():
    '''Main function.
    This is the function that drives the
    program. After opening the file and getting operations
    into a_list, process strings in a_list one at a time.
    '''
    global pc
    dir('__main__')
    a_list = open_rpn_file()
do_println(a_line[8:])
elif a_line.startswith('PRINTVAR'):
do_printvar(a_line[9:], sym_tab)
elif a_line.startswith('INPUT'):
do_input(a_line[6:], sym_tab)
elif a_line:
tokens, unknown = scanner.scan(a_line)
if unknown:
print('Unrecognized input:', unknown)
break
except KeyError as e:
print('Unrecognized symbol', e.args[0],
'found in line', pc)
print(a_list[pc])
break
main()
When this source file is run, it starts the main function, which controls the
overall operation of the program. First, it calls the open_rpn_file function,
located in the file rpn_io.py.
Because this file is not large and there are relatively few functions, the
import * syntax is used here to make all symbolic names in rpn_io.py
directly available.
#File rpn_io.py
------------------------------------------
def open_rpn_file():
    '''Open-source-file function.
    Open a named file and read lines into a list,
    which is then returned.
    '''
    while True:
        try:
            fname = input('Enter RPN source: ')
            if not fname:
                return None
            f = open(fname, 'r')
            break
        except:
            print('File not found. Re-enter.')
    a_list = f.readlines()
    return a_list

def do_prints(s):
    '''Print string function.
    Print string argument s, without adding a newline.
    '''
    a_str = get_str(s)
    print(a_str, end='')

def do_println(s=''):
    '''Print Line function.
    Print an (optional) string and then add a newline.
    '''
    if s:
        do_prints(s)
    print()
def get_str(s):
'''Get String helper function.
Get the quoted portion of a string by getting text
from the first quote mark to the last quote mark. If
these aren't present, return an empty string.
'''
def do_input(s, sym_tab):
    '''Get Input function.
    Get input from the end user and place it in
    the named variable, using a reference to the
    symbol table (sym_tab) passed in as a reference.
    '''
    wrd = input()
    if '.' in wrd:
        sym_tab[s] = float(wrd)
    else:
        sym_tab[s] = int(wrd)
Chapter 14 Summary
In this chapter, we explored various ways of using the import statement in
Python, to create multiple-module projects that can involve any number of
source files.
Using multiple modules in Python does not work in quite the way it works
in other languages. In particular, importing in Python is safer if it’s unidirec-
tional, meaning that A.py can import B.py, but if so, B.py should not import A.py.
You can get away with A and B importing each other, but only if you know
what you’re doing and are careful not to create mutual dependencies.
Likewise, you should show some care in importing module-level variables
from another module. These are best referred to by their qualified names, as
in mod_a.x and mod_a.y. Otherwise, any assignment to such a variable, out-
side the module in which it is created, will cause the creation of a new variable
that is “local” to the module in which it appears.
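A minimal sketch of this pitfall follows. To keep it self-contained, it builds a stand-in module named mod_a in memory with types.ModuleType rather than reading a real mod_a.py file; that construction is an assumption made purely for illustration.

```python
import sys
import types

# Build a stand-in for mod_a.py so that the sketch runs alone.
mod_a = types.ModuleType('mod_a')
mod_a.x = 1
sys.modules['mod_a'] = mod_a

from mod_a import x      # Creates a separate name, x, here.
x = 99                   # Rebinds only this module's x.
unchanged = mod_a.x      # The variable in mod_a is still 1.

import mod_a
mod_a.x = 2              # Qualified name: changes mod_a's x.
changed = mod_a.x        # Now 2.
```

The unqualified assignment never reaches mod_a, whereas the qualified one does.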
Finally, this chapter completed the programming code for the RPN inter-
preter application that has been developed throughout this book. This chapter
added the question mark (?) as a jump-if-not-zero operation, a comparison
(>), and the exclamation mark (!) as a random-number generator. Adding
these operations greatly expanded the extent of what a script written in RPN
can do.
But those additions are far from final. There are many other important fea-
tures you might want to support, such as line labels and better error checking.
These are left as exercises at the end of the chapter.
Does addition (+) replace OR as well? For the most part, it does; however,
the result is sometimes 2 rather than 1 or 0. Can you then create a logical
NOT operator that takes an input of 0 and produces 1, but takes any positive
number and produces 0? What we’re really asking here is, Can you think of a
couple of arithmetic operations that, when put together, do the same thing as
logical OR?
5 The biggest piece still missing from the RPN script language is support for
line labels. These are not exceptionally difficult to add, but they are not triv-
ial, either. Any line that begins with label: should be interpreted as labeling
a line of code. To smoothly implement this feature, you should do a couple
of passes. The first pass should set up a “code table,” excluding blank lines
and compiling a second symbol table that stores labels along with each label’s
value; that value should be an index into the code table. For example, 0 would
indicate the first line.
6 The error checking in this application can be further improved. For example,
can you add error checking that reports a syntax error if there are too many
operators? (Hint: What would be the state of the stack in that case?)
'''
# pip install pandas_datareader
import pandas_datareader.data as web

def load_stock(ticker_str):
    ''' Load stock function.
    Given a string, ticker_str, load information
    for the indicated stock, such as 'MSFT,' into a Pandas
    data frame (df) and return it.
    '''
    df = web.DataReader(ticker_str, 'yahoo')
    df = df.reset_index()
    return df
15.4 Producing a Simple Stock Chart
The next step in creating a stock-market application is to plot the data—
although this section will do it in a minimal way, not putting up legends, titles,
or other information at first.
Here are the contents of version 1 of the second module, stock_plot.
'''File stock_plot_v1.py ---------------------------
def do_plot(stock_df):
''' Do Plot function.
Use stock_df, a stock data frame read from the web.
'''
column = stock_df.Close # Extract price.
plt.title(title_str)
The argument title_str contains text to be placed at the top of the graph
when it’s shown.
Displaying a legend is a two-part operation:
◗ First, when you call the plt.plot method to plot a specific line, pass a named
argument called label. In this argument, pass the text to be printed for the
corresponding line.
◗ Before you call the plt.show function, call plt.legend (no arguments).
We show how this is done by making changes to the do_plot function and
then showing the results. First, here’s the new version of do_plot, with new
and altered lines in bold.
def do_plot(stock_df):
    ''' Do Plot function.
    Use stock_df, a stock data frame read from web.
    '''
    column = stock_df.Close # Extract price.
    column = np.array(column, dtype='float')
    plt.plot(stock_df.Date, column, label='closing price')
    plt.legend()
    plt.title('MSFT Stock Price')
    plt.show() # Show the plot.
Because this information is specific to Microsoft, let’s not graph the Apple
price information yet. In Section 15.8, we’ll show how to graph the two stock
prices side by side.
If you make the changes shown and then rerun the application, a graphical
display is printed, as shown in Figure 15.2.
Figure 15.2. Microsoft stock with title and legend
◗ field, the name of the column (or attribute) you wish to graph
◗ my_str, the name to be placed in the legend, which describes what the partic-
ular line in the chart corresponds to, such as “MSFT” in Figure 15.3
This section, and the ones that follow, show how the do-plot functions
call makeplot and pass along the needed information.
Here’s the definition of makeplot itself. There are some things it doesn’t
do, such as call plt.show or set the title, for good reasons, as you’ll see. But it
does everything else.
def makeplot(stock_df, field, my_str):
    column = getattr(stock_df, field)
    column = np.array(column, dtype='float')
    plt.plot(stock_df.Date, column, label=my_str)
    plt.legend()
Let’s review each of these statements. The first statement inside the defi-
nition causes the specified column to be selected from the data frame, using
a named attribute accessed by the built-in getattr function. The attribute,
such as Close, needs to be passed in as a string by the caller.
The second statement inside the definition converts information stored in
a pandas data frame into numpy format. The third statement does the actual
plotting, using my_str, a string used for the legend, which is added to the plot.
But makeplot does not call plt.show, because that function should not be
called until all the other plots have been put in the desired graph.
With makeplot defined, the rest of the code becomes shorter. For exam-
ple, with makeplot available, the do_plot function in the last section can be
revised as
def do_plot(stock_df, name_str):
    makeplot(stock_df, 'Close', 'closing price')
    plt.title(name_str + ' Stock Price')
    plt.show()
After calling makeplot, all this function has to do is to put up a title—
which we have left as a flexible action—and then call plt.show. The second
argument to makeplot selects the column to be accessed, and the third argu-
ment ('closing price') is a string to be placed in the legend.
For each stock, a separate call is made to load_stock so that we don’t have
to alter the first module, stock_load.py. Both data frames are then handed
to the do_duo_plot so that the two stocks are plotted together, along with a
legend that includes both labels.
def do_duo_plot(stock1_df, stock2_df):
    '''Revised Do Plot function.
    Take two stock data frames this time.
    Graph both.
    '''
    makeplot(stock1_df, 'Close', 'MSFT')
    makeplot(stock2_df, 'Close', 'AAPL')
Note how the built-in getattr function is used to take a string and access
the column to be displayed. This function was introduced in Section 9.12,
“Setting and Getting Attributes Dynamically.” Here this technique is a major
coding convenience.
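The same technique can be tried without downloading any stock data. In this sketch, a SimpleNamespace object stands in for the data frame, and get_column is a hypothetical helper; only the getattr call itself reflects what makeplot does.

```python
from types import SimpleNamespace

# Stand-in for a data frame: attributes named after columns.
stock = SimpleNamespace(Close=[101.5, 102.0],
                        Volume=[3000, 2500])

def get_column(frame, field):
    '''Select a column by name, given the name as a string.'''
    return getattr(frame, field)

closing = get_column(stock, 'Close')
volume = get_column(stock, 'Volume')

# A third argument supplies a default for a missing name.
missing = getattr(stock, 'Open', None)
```

Because the column name is just a string, the caller can decide at run time which column to graph.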
Figure 15.3 displays the result of the do_duo_plot function.
If you look closely at the code, you should see a flaw. “MSFT” and “AAPL”
are hard-coded. That’s fine when Microsoft and Apple are the two stocks you
want to track. But what if you want to look at others—say, “IBM” and “DIS”
(Walt Disney Co.)?
A good design goal is to create flexible functions; you should avoid hard-
coding them so that you don’t have to revise the code very much to accommo-
date new values.
Therefore, for this listing of the latest version of the stock_plot module—
which we’ll call version 2—we’ve revised the code so that the do_duo_plot
function prints the appropriate labels and title, depending on the stocks
passed to it.
plt.show()
There is a caveat to charting stocks this way: Without the help of color
printing or color displays, it may not be easy to differentiate between the lines.
Hopefully, even in this book (the printed version) the differences between the
lines should show up in contrasting shading. But if this isn’t satisfactory, one
approach you might experiment with is using different styles for the two lines,
as described in Chapter 13.
highs and the daily lows for a given stock.
The following code listing—we’ll call this version 3—produces a combined
high/low graph for a stock. New and altered lines, as usual, are shown in bold.
'''File stock_plot_v3.py
---------------------------------
What else can we do with the data? Another useful piece of information is
volume: the number of shares sold on any given day. Here’s another plotting
function, with new lines in bold.
The problem with this graph, as you can see, is that it gives X-axis labels in
months/years/date rather than only month/year, with the result that the date
and time information is crowded together.
But there’s an easy solution. Use the mouse to grab the side of the chart’s
frame and then widen it. As you do so, room is made along the X axis so that
you can see the date and time figures nicely, as shown in Figure 15.8.
With this graph, there’s even more you can do. Within this time period,
there is a day that Microsoft stock had its highest volume of sales. By moving
the mouse pointer to the apex of the line, you can see that this high volume
occurred in late December, and that the number of shares traded that day was
more than 110,000,000: a hundred and ten million shares, worth more than
eleven billion dollars. As Bill would say, that would buy a lot of cheeseburgers.
plt.show()
◗ Start with a date having at least 180 preceding data points. You can start ear-
lier if you want, in which case you should use as many preceding dates as are
available.
◗ Average the closing price of all 180 dates preceding this one. This becomes the
first point in the moving-average line.
◗ Repeat steps 1 and 2 for the next day: You now get a price that is the average of
the 180 prices preceding the second day.
◗ Keep repeating these steps so that you produce a line charting average prices,
each data point in this line averaging the prices of the 180 previous days.
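The steps above can be sketched in plain Python. The moving_average function here is hypothetical and is not part of the chapter’s modules. As an assumption, it follows the convention of the pandas call rolling(window, min_periods=1).mean() used later in this chapter: each window ends on the current day, and the earliest points average whatever data is available.

```python
def moving_average(prices, window):
    '''Return a list of trailing averages of prices.
    Each point averages up to `window` values, ending
    with (and including) the current one.
    '''
    averages = []
    for i in range(len(prices)):
        start = max(0, i - window + 1)
        chunk = prices[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

smooth = moving_average([1, 2, 3, 4, 5], 3)
```

With a window of 3, the last point averages 3, 4, and 5, so the line lags behind a rising price.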
def makeplot(stock_df, field, my_str, avg=0):
    column = getattr(stock_df, field)
    if avg: # Only work if avg is not 0!
        column = column.rolling(avg, min_periods=1).mean()
    column = np.array(column, dtype='float')
    plt.plot(stock_df.Date, column, label=my_str)
    plt.legend()
Notice that this gives makeplot an additional argument, avg, but that
argument, in effect, is optional; it has a default value of 0.
Now let’s graph both a stock and its 180-day moving average, correspond-
ing roughly to six months. New and altered lines, as usual, are in bold.
def do_movingavg_plot(stock_df, name_str):
    ''' Do Moving-Average Plot function.
    Plot price along with 180-day moving average line.
    '''
    makeplot(stock_df, 'Close', 'closing price')
    makeplot(stock_df, 'Close', '180 day average', 180)
    plt.title(name_str + ' Stock Price')
    plt.show()
Figure 15.10 shows the resulting graph—assuming AAPL (Apple) is the
selected stock—containing the stock price as well as the 180-day moving-
average line for those prices. The smoother line is the moving average.
Wow, it works!
And it’s clear how to tweak this example. The following statement replaces
180 with 360, thereby doubling the period of averaging so that for any given
day, the previous 360 days’ prices are averaged in to produce the moving-average
line rather than 180.
makeplot(stock_df,'Close', '360 day average', 360)
import numpy as np
import matplotlib.pyplot as plt
from stock_load import load_stock
def do_simple_plot(stock_df, name_str):
    ''' Do Plot function.
    Plot a simple graph of closing price.
    '''
    makeplot(stock_df, 'Close', 'closing price')
    plt.title(name_str + ' Stock Price')
    plt.show()
'3. Plot of Price with Volume Subplot\n' +
'4. Price plus Moving Average\n')
if n < 0 or n > 4:
n = 0
if n == 0:
return
fn = [do_simple_plot, do_highlow_plot,
do_volume_subplot, do_movingavg_plot][n-1]
fn(stock_df, s)
except:
print('Couldn\'t find stock. Re-try.')
main()
One of the techniques employed here is the use of an open parenthesis to
enable line continuation, helping create the multiline string, menu_str.
Another technique is to index into a list of function names (callables) and
then call the appropriate command. There are other ways to achieve the same
effect—the obvious one being to use a series of if/elif statements—but the
technique used here is compact and efficient.
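The function-list technique can be tried on its own, without any plotting. In the following sketch, the two command functions and run_choice are hypothetical stand-ins that return strings instead of drawing graphs.

```python
def do_simple(name):
    return 'simple plot of ' + name

def do_highlow(name):
    return 'high/low plot of ' + name

def run_choice(n, name):
    '''Call the nth command (1-based); return None if n is
    out of range.
    '''
    commands = [do_simple, do_highlow]
    if n < 1 or n > len(commands):
        return None
    fn = commands[n - 1]     # Index into the list of callables.
    return fn(name)

first = run_choice(1, 'MSFT')
second = run_choice(2, 'AAPL')
invalid = run_choice(5, 'IBM')
```

Adding a new menu command then means adding one function and one list entry, with no new elif clause.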
An interesting aspect of this module is that it calls the load_stock
function, even though that function is not defined in either this module
or the module it imports directly, stock_plot_v4.py. But that module
imports stock_load.py. Consequently, stock_demo imports stock_load
indirectly.
There are many improvements that might still be made, but these are left
for the suggested problems at the end of the chapter.
Chapter 15 Summary
We’ve come a long way in this book. In Chapter 1, our Python programs
showcased a few statements that printed some values. Then, still in the first
chapter, we used Python to print sophisticated sequences such as Fibonacci
numbers. But in this chapter and the ones leading up to it, we added Python
objects such as lists, matrixes, and sophisticated data frames.
The ability to grab information off the Internet, load it into data frames
and arrays, and finally plot it is amazing. Graphical programming is among
the most difficult challenges you can master, but thanks to its packages, Python
reduces the problem to only a few lines of code. Even three-dimensional
graphics were touched on in Chapter 13.
So if we ask the question “Python, what is it good for?,” the answer is “Any-
thing, as long as someone wrote a package for it!”
There are some other things to be said for the language. The most common
things you’d want to do in most programs—get user input and then break
up input into words, phrases, and numbers—are extremely well supported in
Python. And the regular-expression package takes these abilities to a higher
level.
Python’s built-in abilities with lists, strings, dictionaries, and sets are so
strong that one of the biggest challenges is just learning about all the options
you have. You can go for a long time without realizing that collections are
4 In a stock-market chart, why is it important to print a legend? (Note: We’re
not talking about Hercules, King Arthur, or Batman.)
5 How do you restrict the duration of a pandas data frame so that it covers less
than a year’s time?
6 What is a 180-day moving average?
7 Did the final example in this chapter use “indirect” importing? If so, how,
exactly?
3 Yet another desirable feature would be to enable the user to enter any number
of stock-market ticker symbols (assuming they are all valid) and then graph
all the stocks referred to. (Hint: To implement this feature, you might create a
list of such symbols and pass it to one or more for loops that would get a data
frame for each stock and then pass all that data to a function that plotted each
and every stock, before showing the entire graph.)
a. The val may be almost any kind of value; Python will apply the operation bool() to convert before applying to conditionals—
within, say, if or while. See notes that follow.
Other notes:
1 Where operators are at the same level of precedence, they are evaluated left to
right.
2 Parentheses override precedence rules.
3 The special symbol = (not to be confused with ==, which is test for equality)
is part of assignment-statement syntax and is not an operator.
4 With combined assignment-operator symbols (+=, *=, /=, etc.), the entire
expression on the right is evaluated and then the assignment is carried out,
regardless of precedence. For example, if x starts out as 12, then the statement
x /= 3 + 9 sets x to 1, but the statement x = x / 3 + 9 sets x to 13.
5 Assignment-operator symbols include +=, -=, *=, /=, //=, **=, <<=, >>=,
&=, ^=, |=. In each case, x op= y is equivalent to x = x op y; but note 4
applies.
6 As mentioned in Chapter 4, the Boolean operators apply short-circuit logic. If
the first operand is true, the operator and returns the second operand. If the
first operand is false, the operator or returns the second operand. Otherwise,
the first operand is returned without evaluating the second operand.
7 To determine whether a value behaves as true or false, Python applies a Bool-
ean conversion, bool(). For numeric values, zero is “false.” For strings and
other collections, an empty string or collection is “false.” The special value
None is also “false.” In all other cases, the value behaves as if “true.”
(Comparisons, such as n > 1, always return True or False, which are fixed values.)
By combining the last two rules, you should be able to see why Python
responds as follows:
>>> print(None and 100)
None
>>> print(None or 100)
100
>>> print(not(''))
True
8 &, -, ^, and | have specialized uses with set objects, as intersection, difference,
symmetric difference, and union, respectively.
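Notes 4 and 8 are easy to verify directly. The following sketch repeats the arithmetic from note 4 and applies the four set operators from note 8 to two small sets; the variable names are only illustrative.

```python
# Note 4: the whole right side is evaluated first.
x = 12
x /= 3 + 9           # Same as x = x / (3 + 9).

y = 12
y = y / 3 + 9        # Division first, then addition.

# Note 8: the same symbols applied to sets.
a = {1, 2, 3}
b = {3, 4}
both = a & b         # Intersection.
only_a = a - b       # Difference.
one_side = a ^ b     # Symmetric difference.
either = a | b       # Union.
```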
abs(x)
Returns the absolute value of the numeric argument x. The result is non-
negative, so negative arguments are multiplied by -1. In the case of a complex
argument, the function returns the magnitude of the number (the length of the
vector formed by its real and imaginary parts) as a non-negative, real-number
result. The function uses the Pythagorean Theorem to find this value. For example:
>>> c = 3+4j
>>> abs(c)
5.0
In fact, this combination of features—using abs on a complex number—is
a convenient shortcut for invoking the Pythagorean Theorem!
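As a sketch of that shortcut, the hypothetical distance function below finds the distance between two points by building a complex number from the differences in x and y. The call to math.hypot is included only as a cross-check.

```python
import math

def distance(p, q):
    '''Distance between points p and q, each an (x, y) pair.'''
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    return abs(complex(dx, dy))

d = distance((1, 2), (4, 6))     # Legs of 3 and 4.
check = math.hypot(3, 4)
```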
all(iterable)
Returns True if all the elements generated by iterable are “true”—that is,
they produce True after a bool conversion is applied. In general, non-zero
values and non-empty collections evaluate as true. For example:
>>> all([1, 2, 4])
True
>>> all([1, 2, 0])
False
any(iterable)
Returns True if at least one of the elements in iterable is “true,” after a
bool conversion is applied to each item. Remember that non-zero values and
non-empty collections evaluate as “true.” For example:
>>> any([0, 2, 0])
True
>>> any([0, 0, 0])
False
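Both functions accept any iterable, so they combine naturally with generator expressions. The sketch below uses a made-up list of scores, and also shows the results for an empty iterable, which are easy to get backward.

```python
scores = [72, 88, 95]

# Test a condition on every element without building a list.
all_passed = all(s >= 60 for s in scores)
any_perfect = any(s == 100 for s in scores)

# With no elements, all is True and any is False.
empty_all = all([])
empty_any = any([])
```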
ascii(obj)
Produces an ASCII-only representation of the object, obj, returning it as a
string. If non-ASCII characters are found in the output string, they are trans-
lated into escape sequences.
bin(n)
Binary-radix conversion. Returns a string containing a binary-radix repre-
sentation of integer n, including the 0b prefix. Inputs other than integer will
cause Python to raise a TypeError exception.
For example:
>>> bin(7)
'0b111'
bool(obj)
Boolean conversion. This is an important conversion because it’s implic-
itly called as needed by if and while statements, as well as by the filter
function.
This function returns either True or False, depending on the value of
obj. Each class can determine how to evaluate this function by implementing
a __bool__ method (or, failing that, a __len__ method, in which case a length
of zero converts to False); but otherwise, the default behavior is to return True.
Python classes tend to observe the following general guidelines. An object
usually converts to True if it contains any of the following:
◗ A non-zero numeric value. (For complex numbers, this is any value other than
0+0j.)
◗ Non-empty collections, including lists and strings that are not empty.
For example:
>>> class Blah():
        pass
>>> b = Blah()
>>> b.egg = 0
>>> bool(b)
True
>>> bool(b.egg)
False
In this case, the object b automatically converted to True, because no
__bool__ method was defined for the class, but b.egg is equal to 0 and
therefore False.
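By contrast, here is a minimal sketch of a class that does define __bool__. The Basket class is hypothetical; it treats an object as “true” only when it holds at least one item.

```python
class Basket:
    def __init__(self, items=()):
        self.items = list(items)

    def __bool__(self):
        # Called by bool(), if, and while.
        return len(self.items) > 0

empty = Basket()
full = Basket(['egg'])

empty_flag = bool(empty)
full_flag = bool(full)
message = 'has items' if full else 'nothing here'
```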
bytes(source, encoding)
Byte-string conversion function. Converts a source, typically a string, into
a bytes string, in which each element has a byte value between 0 and 255,
inclusive, and is stored in a single byte. In Python 3.0, it’s typical for Python
strings to use Unicode or UTF-8 representation, so that regular strings may
occupy more than one byte per character. Therefore, ordinary strings need
to be converted to produce bytes strings, if they need to contain one byte per
character.
For example:
>>> bs = bytes('Hi there!', encoding='utf-8')
>>> bs
b'Hi there!'
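The difference between characters and bytes shows up as soon as a string contains a non-ASCII character. This short sketch uses the word 'café' as an arbitrary example.

```python
text = 'café'
data = bytes(text, encoding='utf-8')

num_chars = len(text)        # Four characters.
num_bytes = len(data)        # Five bytes: é takes two.
first_byte = data[0]         # Elements are integers, 0 to 255.
restored = data.decode('utf-8')
```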
callable(obj)
Returns True if the object obj can be called as a function. A true value indi-
cates one or more of the following: obj is the name of a function, is created by
a function definition, or is an instance of a class that implements __call__.
chr(n)
Returns the one-character string in which that character has Unicode value n.
The lower part of this range includes ASCII characters. This is the inverse of
the ord function. The domain of this function is 0 through 0x10FFFF, inclusive.
Values of n outside that domain cause a ValueError to be raised.
For example:
>>> chr(97)
'a'
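The following sketch exercises the round trip between chr and ord, and confirms what happens just past the top of the range. The variable names are only illustrative.

```python
letters = [chr(n) for n in range(97, 100)]

# ord reverses chr, even at the top of the range.
round_trip = ord(chr(0x10FFFF))

try:
    chr(0x110000)            # One past the largest legal value.
    out_of_range = False
except ValueError:
    out_of_range = True
```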
complex(real=0, imag=0)
Complex-number conversion. Both real and imag arguments are optional,
and each has a default value of 0. Given numeric input (as opposed to string
input, which is explained in the next entry), the function responds by doing
the following:
◗ Multiply the imag argument by 1j.
◗ Add the result to the real argument.
These simple rules encompass everything the conversion does. For exam-
ple, a zero-value complex number is returned by default:
>>> complex()
0j
You can also provide an argument to the real portion only, creating a value
that includes 0j to indicate its status as a complex rather than a real number.
>>> complex(5.5)
(5.5+0j)
The most common way to use this conversion is to use a real number or
integer for each argument.
>>> complex(3, -2.1)
(3-2.1j)
You can also specify a complex argument for the first argument only, and
the conversion will simply return the number as is.
>>> complex(10-3.5j)
(10-3.5j)
A quirk of this function is that you can specify a complex number for both
arguments! The rules outlined earlier still apply. In the following example,
the imag argument is 5j. This is multiplied by 1j, as usual, and the result is
5j * 1j = -5. That value is then added to the real argument, 1, to produce –4.
>>> complex(1, 5j)
(-4+0j)
complex(complex_str)
The complex-number conversion accepts a lone string argument of the form
'a+bj'—but it must have no internal spaces, and no second argument is
allowed. Optional parentheses are accepted around a+bj. For example:
>>> complex('10.2+5j')
(10.2+5j)
>>> complex('(10.2+5j)')
(10.2+5j)
This function accepts strings that are valid complex numbers even if such
numbers have only a real or only an imaginary portion. 0j will always be pres-
ent in the result even if there is no non-zero imaginary portion. For example:
>>> complex('2')
(2+0j)
>>> complex('3j')
3j
Note Ë It’s possible to produce values that use -0j instead of the usual +0j. For
example, complex(1, -1j) produces (2-0j) even though complex(2, 0)
produces (2+0j). The two results can be compared successfully with ==
but not with is. This phenomenon arises from floating-point representation
including a sign bit; 0.0 and -0.0 are distinct objects, even though numeri-
cally equal.
Ç Note
delattr(obj, name_str)
Delete-attribute function. Removes an attribute from object obj, in which
name_str is a string containing the name of the attribute. For example:
my_dog.breed = 'St Bernard'
...
a_str = 'breed'
delattr(my_dog, a_str)
After this statement is executed, the object my_dog no longer has a 'breed'
attribute.
An AttributeError exception is raised if obj does not already have
the named attribute before the deletion. See also the hasattr and setattr
functions.
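As a minimal sketch of the behavior just described, the following assumes a hypothetical Dog class (it is not defined in this appendix) and uses hasattr to confirm the deletion. The second delattr call is there only to show the exception.

```python
class Dog:              # Hypothetical class, used only for illustration.
    pass

my_dog = Dog()
my_dog.breed = 'St Bernard'

delattr(my_dog, 'breed')              # Same effect as: del my_dog.breed
still_there = hasattr(my_dog, 'breed')

try:
    delattr(my_dog, 'breed')          # The attribute is already gone.
    second_delete_failed = False
except AttributeError:
    second_delete_failed = True

print(still_there, second_delete_failed)    # Prints False True
```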
dir([obj])
Directory function. Returns a list of attributes for the optional argument obj.
If this object is a class, it shows all the attributes of the class. If the object is
not a class, it gets the object’s class and shows all that class’s attributes.
If this argument is omitted, dir returns a list of attributes for the current
context—either the current function or module. For example:
dir() # Get attributes of the module.
divmod(a, b)
Divides a by b and then returns a tuple containing a/b, rounded downward to
the nearest integer, along with the result, a % b, the remainder. This function
is typically used with integers. For example:
quot, rem = divmod(203, 10) # Result is 20, remainder 3
Either or both of the arguments a and b may be float values; but in that
case, both of the return values will be in floating-point format. The quotient
will have no fractional portion, but the remainder may. For example:
>>> divmod(10.6, 0.5)
(21.0, 0.09999999999999964)
The resulting tuple in this case should be (21.0, 0.1), but there is a small
rounding error, as can happen with floating-point values. You can use the
round function to help correct it.
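The following sketch pulls these points together. The choice of 10 digits in the call to round is an arbitrary assumption made for illustration; pick whatever precision suits your data.

```python
quot, rem = divmod(203, 10)
print(quot, rem)                        # Prints 20 3

# The quotient is rounded downward (floored), not toward zero.
neg_quot, neg_rem = divmod(-7, 2)
print(neg_quot, neg_rem)                # Prints -4 1

# Rounding the float remainder to a chosen number of digits (10 here,
# an arbitrary choice) hides the small representation error.
fq, fr = divmod(10.6, 0.5)
fr_rounded = round(fr, 10)
print(fq, fr_rounded)                   # Prints 21.0 0.1
```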
enumerate(iterable, start=0)
Enumeration. Takes an iterable as input and returns a sequence of tuples,
each having the form
number, item
Here, number is an integer in a sequence beginning with start (0 by
default) and increasing by 1 in each position; and item is an item produced by
the iterable.
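For example, the short sketch below (the list of names is sample data, not taken from the text) shows the default numbering and the effect of the start argument.

```python
names = ['Moe', 'Larry', 'Curly']     # Sample data for illustration.

pairs = list(enumerate(names))
print(pairs)       # Prints [(0, 'Moe'), (1, 'Larry'), (2, 'Curly')]

# With start=1, numbering begins at 1; the items are unchanged.
numbered = [f'{num}. {name}' for num, name in enumerate(names, start=1)]
print(numbered)    # Prints ['1. Moe', '2. Larry', '3. Curly']
```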
Note Ë Although you can use the arguments to try to make eval safer, you can
never fully bulletproof this usage if your application should happen to take an
arbitrary string from the user and evaluate it. There are ways hackers can take
advantage of such code to bring down a system. So again, take care.
Ç Note
filter(function, iterable)
Generates a filtered sequence of values. Each item in the iterable is passed
to function in turn. This argument, function, should take one argument
and return True or False.
If True is returned for a given element, the element gets included in the
sequence generated by filter. Otherwise, the item is omitted.
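Here is a small sketch of filter in use. The helper is_even is a hypothetical function written for this example. Because filter produces its values on demand, the result is converted to a list before printing.

```python
def is_even(n):
    """Return True if n is evenly divisible by 2."""
    return n % 2 == 0

evens = list(filter(is_even, range(10)))
print(evens)                   # Prints [0, 2, 4, 6, 8]

# A list comprehension that produces the same result.
evens_lc = [n for n in range(10) if is_even(n)]
```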
float([x])
Floating-point conversion. If the optional argument x is specified, the result of
converting the value to floating point is returned. Types that may be success-
fully converted to floating point include numeric types (such as integers) and
strings containing a valid representation of a floating-point number.
Strings can include numeric representations such as '4.105', '-23E01',
and '10.5e-5'. Positive and negative infinity can also be represented as
'Infinity', '-Infinity', 'inf', and '-inf'.
n = 1
yyy = float(n) # Assign 1.0 to yyy.
amt = float('-23E01') # Assign -23E01 to amt.
If no argument is specified, the value 0.0 is returned. Note that the square
brackets, in this case, indicate that the argument, x, is optional.
amt = float() # Assign 0.0 to amt.
format(obj, [format_spec])
Format-string function, using the extended syntax for formatting described
in Chapter 5, “Formatting Text Precisely.” If the optional format_spec
argument is specified, that argument is interpreted as a spec formatting code,
frozenset([iterable])
Returns a frozenset object, which is an immutable version of the set type.
Note the use of parentheses in the following example, indicating that the
argument is a tuple.
>>> frozenset((1, 2, 2, 3, 3, 4, 1))
frozenset({1, 2, 3, 4})
If the iterable argument is omitted, this function returns an empty
frozenset.
globals()
Returns a data dictionary, giving the names and values of global variables for
the module that’s currently executing.
hasattr(obj, name_str)
Returns True if the object, obj, has the attribute specified in name_str. The
following example assumes that my_dog is an instance of a class, such as Dog:
>>> my_dog.breed = 'Husky'
>>> nm = 'breed'
>>> hasattr(my_dog, nm)
True
hash(obj)
Returns a hash value for the specified object, obj. This hash value is used
by data dictionaries; as long as such a value is provided, the object, obj, can
be used as a key. Classes whose objects do not support this function are not
“hashable” and therefore cannot be used as dictionary keys.
This function is implemented by calling the __hash__ method of the
object’s class. See Chapter 9 for more information.
Very rarely is there any reason to call hash or __hash__ directly, except
for testing purposes. The most important thing to remember is that if you
write a class and want it to serve as a type that can be used as a key, then be
sure to implement the __hash__ magic method.
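The sketch below shows one way a class might support hashing. GridPoint is a hypothetical class invented for this example. It also defines __eq__, because a dictionary lookup compares keys for equality after matching their hash values.

```python
class GridPoint:
    """Hypothetical class whose instances can serve as dictionary keys."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (isinstance(other, GridPoint)
                and (self.x, self.y) == (other.x, other.y))

    def __hash__(self):
        # Objects that compare equal must return equal hash values.
        return hash((self.x, self.y))

labels = {GridPoint(0, 0): 'origin'}
found = labels[GridPoint(0, 0)]        # A distinct but equal object works.
print(found)                           # Prints origin

try:
    hash([1, 2, 3])                    # Lists are not hashable.
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```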
help([obj])
Prints help documentation for the specified object’s class. This is used often
within IDLE. If no object is specified, then an introductory page for the help
system for Python is printed.
hex(n)
Hexadecimal conversion. Returns a string containing a hexadecimal-radix
(base 16) representation of integer n, including the prefix, 0x. Inputs other
than integer cause Python to raise a TypeError exception.
For example:
>>> hex(23)
'0x17'
id(obj)
Returns a unique identifier for the object, obj. If two variables (obj1 and
obj2) have the same identifier, then the expression obj1 is obj2 returns
True—meaning that they refer to the same object in memory. (Note: Do not
confuse this with test for equality, ==, which is a less restrictive condition.)
input([prompt_str])
Input function. Returns a string input by the user after waiting for the user to
input zero or more characters and press Enter. The prompt_str, if specified,
prints a string for prompting the user, such as “Enter name here: ”. No extra
space is automatically printed after this prompt string, so you may want to
provide it yourself. For example:
my_name = input('Enter your name, please: ')
my_age = int(input('Enter your age, please: '))
int(x, base=10)
Integer conversion function. This function takes a numeric value or a string
containing a valid integer representation and then returns an actual integer
value. The base argument, if included, determines how to interpret the string
of digits given as the argument; by default, decimal radix (base 10) is assumed,
but another radix, such as 2 (binary), 8 (octal), or 16 (hexadecimal) may also
be used. For example:
>>> int('1000', 16)
4096
For the first argument, you can specify an object of another numeric type,
such as float. The int conversion truncates the fractional portion. For
example:
>>> int(3.99), int(-3.99)
(3, -3)
To round to the nearest integer (or any significant digit) using a different
rounding scheme, see the round function.
int()
The int conversion function for an empty argument list. This version of int,
with no argument specified, returns the integer value 0.
See also the earlier entry for int.
isinstance(obj, class)
Returns True if object obj is an instance of the specified class or any type
derived from that class. It’s often recommended that you use this function
instead of the type function. (The second argument can also be a tuple of
types. In that case, this function returns True if the object is an instance of
any of the specified classes.)
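As a brief illustration, the following sketch uses two hypothetical classes, Animal and Dog, to contrast isinstance with a direct comparison of types, and then passes a tuple of types as the second argument.

```python
class Animal:                 # Hypothetical classes for illustration.
    pass

class Dog(Animal):
    pass

d = Dog()
checks = [
    isinstance(d, Dog),                 # True
    isinstance(d, Animal),              # True: Dog derives from Animal
    type(d) is Animal,                  # False: type ignores inheritance
    isinstance(3.5, (int, float)),      # True: matches float in the tuple
    isinstance('3.5', (int, float)),    # False: a string is neither
]
print(checks)       # Prints [True, True, False, True, False]
```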
issubclass(class1, class2)
Returns True if class1 is a subclass of class2 or if the two arguments are
the same class. A TypeError exception is raised if class1 is not a class. As
with isinstance, the second argument can also be a tuple of types.
For example:
>>> class Floopy(int):
        pass
>>> f = Floopy()
>>> issubclass(Floopy, int)
True
>>> issubclass(int, Floopy)
False
>>> issubclass(int, int)
True
>>> issubclass(f, int) # TypeError: f is not a class
iter(obj)
Iteration function. This function call assumes that obj is an iterable—
an object that returns an iterator. This includes standard collections and
sequences, as well as generators.
If the argument is not an iterable, then calling iter(obj) causes a
TypeError exception to be raised. If obj is an iterable, the call to iter should
return an iterator object. Such an object does the actual stepping through a
sequence of values by responding to next.
A few examples should clarify. First of all, you can’t legally call
iter(obj) if the object is not an iterable. For example:
>>> gen = iter(5) # This raises a TypeError
However, the call is valid if the target object is a list, even if (as in this case)
it is a short list containing only a single member.
>>> gen = iter([5])
It’s more common, of course, to use iter on a longer list containing at least
two members. The object returned—called an iterator or generator object—
can then be used with next to access a stream of values, one at a time. For
example:
>>> gen = iter([1, 2, 3])
>>> next(gen)
1
>>> next(gen)
2
>>> next(gen)
3
>>> next(gen) # StopIteration exception raised.
The iter function has the effect of calling the __iter__ magic method
in the iterable object’s class (such as a collection or generator), and the next
function has the effect of calling the __next__ method of an iterator object’s
class. Remember that it’s a two-step process.
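To make the two steps concrete, here is a minimal sketch that assumes two hypothetical classes: Countdown, an iterable, and CountdownIterator, the iterator it hands out. Neither appears elsewhere in this book; they exist only to show which magic method each built-in function calls.

```python
class Countdown:
    """Hypothetical iterable: counts down from start to 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):                 # Step 1: called by iter()
        return CountdownIterator(self.start)

class CountdownIterator:
    def __init__(self, current):
        self.current = current

    def __iter__(self):                 # Iterators return themselves.
        return self

    def __next__(self):                 # Step 2: called by next()
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

it = iter(Countdown(3))
first = next(it)                        # 3
rest = list(it)                         # [2, 1]
print(first, rest)                      # Prints 3 [2, 1]
```

A for loop performs both steps automatically, which is why an object of the Countdown class could be used directly in a for statement.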
len(sequence)
Returns the number of elements currently stored in the sequence, which is
usually a collection but may also be a sequence generated by the range func-
tion. In the case of a string, this gives the number of characters in the string.
>>> len('Hello')
5
This function is usually implemented by calling the __len__ method for
the object’s class.
Note that although the sequence generated by the range function supports
this function, there is no guarantee that other generators do.
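The sketch below illustrates both points. Deck is a hypothetical class that supports len by defining __len__; the generator expression at the end does not define that method, so the call raises a TypeError.

```python
class Deck:
    """Hypothetical container that supports len through __len__."""
    def __init__(self, cards):
        self.cards = list(cards)

    def __len__(self):
        return len(self.cards)

deck_size = len(Deck(['A', 'K', 'Q']))
range_size = len(range(10))
print(deck_size, range_size)        # Prints 3 10

gen = (n * n for n in range(10))    # A generator expression
try:
    len(gen)
    gen_has_len = True
except TypeError:                   # Generators do not define __len__.
    gen_has_len = False
```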
list([iterable])
List conversion function. Takes an argument, which must be some kind of
iterable, and returns a list. If a generator object is involved, the source of val-
ues must be finite in number.
If iterable is a string, the function returns a list in which each element is
a one-character string.
>>> print(list('cat'))
['c', 'a', 't']
locals()
Returns a dictionary containing information on values in the local symbol
table. This dictionary should not be altered directly. For example:
>>> def foobar():
        a = 2
        b = 3
        c = 1
        print(locals())
>>> foobar()
{'a': 2, 'b': 3, 'c': 1}
The map function can be used with as few as one iterable argument;
however, in such cases, list comprehension usually offers a better solution.
max(arg1 [, arg2]…)
Returns the maximum value from a series of one or more arguments (brackets
here are not intended literally). See the other entry for max for more informa-
tion on how this function works.
>>> max(1, 3.0, -100, 5.25)
5.25
max(iterable)
Returns the maximum element from a finite iterable (which may be a col-
lection, sequence, or generator object). In Python 3.0, all the elements must
be sortable with regard to all the other elements, or else this function raises a
TypeError exception.
Sorting is enabled by support for the less-than operator (<) for the objects
involved; this means that the appropriate __lt__ magic method must be
defined for every combination of elements.
But note that all built-in numeric types, except for complex, are sortable
with regard to each other.
For example:
>>> from fractions import Fraction
>>> a_list = [1, Fraction('5/2'), 2.1]
>>> max(a_list)
Fraction(5, 2)
See also the previous listing for max.
min(arg1 [, arg2]…)
Returns the minimum value from a series of one or more arguments (brackets
here are not intended literally). See the other entry for min for more informa-
tion on how this function works.
>>> min(1, 3.0, -100, 5.25)
-100
min(iterable)
Returns the minimum element from a finite iterable (which may be a col-
lection, sequence, or generator object). In Python 3.0, all the elements must
be sortable with regard to all the other elements, or else this function raises a
TypeError exception.
Sorting is enabled by support for the less-than operator (<) for the objects
involved; this means that the appropriate __lt__ magic method must be
defined for every combination of elements.
But note that all built-in numeric types, except for complex, are sortable
with regard to each other.
For example:
>>> from fractions import Fraction
>>> a_list = [1, Fraction('5/2'), 2.1]
>>> min(a_list)
1
See also the previous listing for min.
oct(n)
Returns a string containing the integer n in octal representation, including the
octal prefix (0o). Inputs other than integer cause Python to raise a TypeError
exception.
For example:
>>> oct(9)
'0o11'
open(file_name_str, mode='rt')
Attempts to open the file named by the first argument, which may include a
complete path name or a name local to the current directory. If the file open
is successful, a file object is returned. If not, an exception is raised, such as
FileNotFoundError.
The mode is a string that should not contain more than two or three char-
acters. Up to one character may be 't' or 'b', indicating whether the file is
accessed as text or binary. The default is text ('t').
The other character or characters determine whether the file-access mode
is read, write, or read/write. The default is read mode ('r'). Table B.2 shows
the read/write modes that may be combined with 't' (the default) or 'b',
representing text and binary mode, respectively.
Note Ë A file, once open, can be closed by calling the close method of the file
class, which includes many other I/O methods. A file may also be closed auto-
matically by using the with keyword, as explained in Appendix E.
Ç Note
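Here is a minimal sketch of opening, writing, and reading a text file with the with keyword. The file name demo.txt and the use of a temporary directory are assumptions made so that the example does not disturb any real file.

```python
import os
import tempfile

# Build a path inside a temporary directory so no real file is disturbed.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')    # Hypothetical file name

    with open(path, 'w') as f:       # 'w' is the same as 'wt' (text mode)
        f.write('Hello, file!\n')

    with open(path) as f:            # Default mode is 'rt'
        text = f.read()

    closed_after_with = f.closed     # True: the with block closed the file.

print(repr(text), closed_after_with)
```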
ord(char_str)
Ordinal value function. Returns the number that is the ASCII or Unicode
character code for the character contained in char_str. This argument is
assumed to be a string containing exactly one character. If it is not a string or
if it contains more than one character, a TypeError exception is raised.
The ord function is the inverse of the chr function.
>>> chr(ord('a'))
'a'
pow(x, y [, z])
Power function. Returns the same value that x ** y does—that is, it raises
the numeric value x to the power of y. If z is specified, the function returns x
** y % z (divide the result by z and then return the remainder). For example:
>>> pow(2, 4)
16
>>> pow(2, 4, 10)
6
>>> pow(1.1, 100)
13780.61233982238
This last figure represents one dollar compounded at 10 percent annually
for 100 years. The result is more than $13,000. (All good things come to those
who wait!)
range(n)
Returns a sequence of integers, starting with 0 up to but not including n.
Therefore, range(n) produces 0, 1, 2, . . . n-1. You can use this sequence
directly in a for statement; but if you want the sequence to have the full status
of a list (so that you can print or index it), you need to apply a list conversion.
>>> list(range(5))
[0, 1, 2, 3, 4]
The expression range(len(collection)) returns a sequence of integers
corresponding to all the valid, non-negative indexes for the collection.
Remember that the value n is itself not included in the sequence. Instead,
range generates integers from 0 up to but not including n.
repr(obj)
Produces a string representation of obj, similar to the action of the str con-
version function; however, whereas str gives a standard string representa-
tion, repr gives the canonical representation of the object as it appears in
code. Therefore, whereas str(a_string) prints a string as it is, without any
surrounding quotation marks, repr(a_string) prints it with the quotes,
because that’s how it would appear in Python code.
reversed(iterable)
Produces a reverse generator over the elements in the source—that is, it iter-
ates over items in the reverse of the order they have in iterable. You can use
this generator in a for loop, the most typical use. Another thing you can do is
to convert it to a list by using the list conversion function. For example:
>>> print(list(reversed([1, 2, 3])))
[3, 2, 1]
Technically, you can get the reverse generator of a string and attempt to dis-
play meaningful results. This is difficult to do, however, and requires the use
of lists and the join function. Otherwise, look at what happens:
>>> str(reversed('Wow, Bob, wow!'))
'<reversed object at 0x11124bc88>'
The problem is that the reversed function, operating on a string, produces
a generator object and not a string. But there are alternatives. The easiest solu-
tion is to use slicing directly on the string. For example:
>>> 'Wow, Bob, wow!'[::-1]
'!wow ,boB ,woW'
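For comparison, this short sketch shows the join approach mentioned earlier alongside slicing. Both produce the same string.

```python
s = 'Wow, Bob, wow!'

# reversed yields one-character strings; join glues them back together.
rev_joined = ''.join(reversed(s))
rev_sliced = s[::-1]

print(rev_joined)                  # Prints !wow ,boB ,woW
```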
round(x [,ndigits])
Rounds a numeric value x, using ndigits to indicate at which position to do
the rounding: Specifically, ndigits is an integer indicating how many posi-
tions to the right of the decimal point to perform the rounding. Negative numbers
indicate a position to the left of the decimal point.
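A few sample calls may help. The values are arbitrary; the last call omits ndigits, in which case round returns an integer.

```python
a = round(3.14159, 2)       # Two positions to the right of the point
b = round(1234.5678, -2)    # Two positions to the left of the point
c = round(1234.5678)        # ndigits omitted: returns an int
print(a, b, c)              # Prints 3.14 1200.0 1235
```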
set([iterable])
Conversion function for Python sets. If the iterable argument is omitted,
the result is empty. This is the standard way of representing an empty set in
Python, because {} represents an empty dictionary rather than an empty set.
empty_set = set()
If the iterable is not empty, then the resulting set contains all the elements
in the argument, but duplicates are dropped and order is not significant. For
example:
>>> my_list = [11, 11, 3, 5, 3, 3, 3]
>>> my_set = set(my_list)
>>> my_set
{3, 11, 5}
str(obj='')
Returns a string representation of the object, obj. If obj is not specified, this
function returns an empty string.
This conversion is implemented by calling the __str__ method for the
object’s class. If the class has no __str__ method defined, then, by default,
its __repr__ method is called. In many cases, the two methods display the
same results; however, the difference is that the __repr__ method contains
the representation of the object as it appears in code; string objects, for
example, are returned with quotation marks.
Aside from its role in printing, this string conversion has other uses. For
example, you might use this conversion if you want to count the number of 0’s
in a number.
>>> n = 10100140
>>> s = str(n)
>>> s.count('0')
4
str(obj=b'' [, encoding='utf-8'])
This version of str converts a bytes string (which is guaranteed to be made
up of individual bytes) to a standard Python string, which may use two or
more bytes to store a character. For example:
bs = b'Hello!' # Guaranteed to hold exactly six bytes
s = str(bs, encoding='utf-8')
print(s) # Print a normal string, ? bytes per char.
See also the previous entry for str.
sum(iterable [, start])
Produces the sum of the elements in iterable. All the elements must be
numeric; or they must at least support the __add__ method for objects of that
type with each other and with integers. It will not concatenate strings.
This is a super convenient function for use with numeric lists, tuples, and
sets. For example, here’s a simple function that gets the average value of a
numeric collection:
def get_avg(a_list):
    return sum(a_list)/len(a_list)
Here’s an example that executes this function:
>>> get_avg([1, 2, 3, 4, 10])
4.0
The sum function can be used on other kinds of iterables, such as genera-
tors, as long as they produce a finite sequence. For example:
>>> def gen_count(n):
        i = 1
        while i <= n:
            yield i
            i += 1
>>> sum(gen_count(100))
5050
super(type)
Returns a proxy object that delegates method calls to the superclass of the
specified type. This is useful when you’re inheriting from a class and you wish
to call the superclass version of a particular method, such as __init__.
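The following sketch shows the typical use. Animal and Dog are hypothetical classes made up for this example. Inside a method, Python 3 lets you call super with no arguments, which here is equivalent to writing super(Dog, self).

```python
class Animal:                       # Hypothetical classes for illustration.
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def __init__(self, name, breed):
        super().__init__(name)      # Run the Animal version of __init__.
        self.breed = breed

d = Dog('Rex', 'Husky')
print(d.name, d.breed)              # Prints Rex Husky
```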
tuple([iterable])
Tuple conversion: returns an immutable sequence by taking the values from
the iterable, which must be finite in size. The square brackets indicate that
iterable is an optional argument; if omitted, the tuple returned is empty.
type(obj)
Returns the type of obj, which can be compared to other types at run time,
using either test for equality (==) or is. For example:
>>> i = 5
>>> type(i) is int
True
>>> type(i) == int
True
The type function is often useful in Python in determining what the type
of an argument is; it enables you to respond in different ways to different types
of arguments. However, use of isinstance is usually recommended over the
use of type, because isinstance takes subclasses into account.
zip(*iterables)
Returns a sequence of tuples from a series of arguments. For each position, the
tuple in the result is (i1, i2, i3… iN), where N is the number of arguments to
this function and i is a value produced by the corresponding iterable argu-
ment. When the shortest of these arguments is exhausted, the function stops
producing tuples.
That’s a mouthful, but an example should help clarify. The following
example demonstrates how zip can be used to create one list from the sum of
two other lists: each element in a is added to the corresponding element in b.
a = [1, 2, 3]
b = [10, 20, 30]
c = [i[0] + i[1] for i in zip(a, b)]
Printing the results gives us
[11, 22, 33]
The expression zip(a, b), were you to convert it to a list and print it, pro-
duces a list of tuples, as shown:
>>> a_list = list(zip(a, b))
>>> a_list
[(1, 10), (2, 20), (3, 30)]
Compare the first three lines of the previous example with the following
lines, which are more complicated and harder to maintain. This is a longer
way of producing the same result.
a = [1, 2, 3]
b = [10, 20, 30]
c = []
for i in range(min(len(a), len(b))):
    c.append(a[i] + b[i])
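As a variation on the zip example, this sketch unpacks each tuple into two names, x and y (names chosen for this example), rather than indexing it. The second part shows zip stopping when its shortest argument runs out.

```python
a = [1, 2, 3]
b = [10, 20, 30]

# Unpack each tuple into two names instead of indexing with i[0] and i[1].
c = [x + y for x, y in zip(a, b)]
print(c)                                  # Prints [11, 22, 33]

# zip stops as soon as its shortest argument is exhausted.
short = list(zip('abc', [1, 2]))
print(short)                              # Prints [('a', 1), ('b', 2)]
```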
set_obj.add(obj)
Adds an object, obj, to an existing set. The statement has no effect if obj is
already a member of the set. Returns None in either case. For example:
a_set = {1, 2, 3}
a_set.add(4) # Adds 4 to the set.
The set a_set is now equal to {1, 2, 3, 4}.
set_obj.clear()
Clears all the elements from an existing set. Takes no arguments and returns
the value None.
a_set.clear()
set_obj.copy()
Returns a shallow, member-by-member copy of a set. For example:
a_set = {1, 2, 3}
b_set = a_set.copy()
After these statements are executed, b_set has the same contents as a_set,
but they are two separate sets, so changes to one do not affect the other.
set_obj.difference(other_set)
Returns a set that contains all the elements in set_obj that are not in other_set.
For example:
a_set = {1, 2, 3, 4}
b_set = {3, 4, 5, 6}
c = a_set.difference(b_set)
print(c) # Prints {1, 2}
print(b_set.difference(a_set)) # Prints {5, 6}
The difference operator, which uses a minus sign (-) for sets, produces
the same results and is more compact.
print(a_set - b_set) # Prints {1, 2}
set_obj.difference_update(other_set)
Performs the same action as the difference method, except that the results
are placed in set_obj and the value returned is None.
The difference-assignment operator (-=) performs the same action.
a_set -= b_set # Put results of diff. in a_set
set_obj.discard(obj)
Removes the element obj from set_obj. Returns the value None. Performs
the same action as the remove method, except that no exception is raised if
obj is not currently a member of the set.
a_set = {'Moe', 'Larry', 'Curly'}
a_set.discard('Curly')
print(a_set) # Prints {'Moe', 'Larry'}
set_obj.intersection(other_set)
Returns the intersection of set_obj and other_set, which consists of all
objects that are elements of both sets. If the sets have no elements in common,
the empty set is returned. For example:
a_set = {1, 2, 3, 4}
b_set = {3, 4, 5, 6}
print(a_set.intersection(b_set)) # Prints {3, 4}
The intersection operator (&) performs the same action.
print(a_set & b_set) # Prints {3, 4}
set_obj.intersection_update(other_set)
Performs the same action as the intersection method, except that the
results are placed in set_obj and the value returned is None.
The intersection-assignment operator (&=) performs the same action.
a_set &= b_set # Put the intersection in a_set
set_obj.isdisjoint(other_set)
Returns True or False, depending on whether set_obj and other_set are
disjoint—meaning that they have no elements in common.
set_obj.issubset(other_set)
Returns True if set_obj is a subset of other_set; this includes the condition
of the two sets being equal. For example:
{1, 2}.issubset({1, 2, 3}) # Produces the value True.
{1, 2}.issubset({1, 2}) # Also produces True.
set_obj.issuperset(other_set)
Returns True if set_obj is a superset of other_set; this includes the condi-
tion of the two sets being equal. For example:
{1, 2}.issuperset({1}) # Produces the value True.
{1, 2}.issuperset({1, 2}) # Also produces True.
set_obj.pop()
Returns an arbitrary element from the set and then removes that element. For
example:
a_set = {'Moe', 'Larry', 'Curly'}
stooge = a_set.pop()
print(stooge, a_set)
This example prints, or rather may print, the following:
Moe {'Larry', 'Curly'}
set_obj.remove(obj)
Removes the specified element, obj, from the set_obj. This performs the
same action as the discard method, except that remove raises a KeyError
exception if obj is not currently an element.
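The difference between the two methods shows up only when the element is missing, as this short sketch illustrates.

```python
a_set = {'Moe', 'Larry'}

a_set.discard('Curly')          # Not a member: nothing happens.

try:
    a_set.remove('Curly')       # Not a member: KeyError is raised.
    remove_failed = False
except KeyError:
    remove_failed = True

print(remove_failed, a_set == {'Moe', 'Larry'})   # Prints True True
```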
set_obj.symmetric_difference(other_set)
Returns a set consisting of all objects that are a member of set_obj but not
other_set, and vice versa. For example:
a_set = {1, 2, 3, 4}
b_set = {3, 4, 5, 6}
print(a_set.symmetric_difference(b_set))
This code prints the set {1, 2, 5, 6}.
The symmetric-difference operator (^) performs the same action.
print(a_set ^ b_set) # Prints {1, 2, 5, 6}
set_obj.symmetric_difference_update(other_set)
Performs the same action as the symmetric_difference method, except
that the results are placed in set_obj and the value returned is None.
The symmetric-difference-assignment operator (^=) performs the same
action.
a_set ^= b_set # Put the sym. difference in a_set
set_obj.union(other_set)
Returns the union of set_obj and other_set, which is the set containing all
objects that are in either set or both. For example:
a_set = {1, 2, 3, 4}
b_set = {3, 4, 5, 6}
print(a_set.union(b_set)) # Prints {1, 2, 3, 4, 5, 6}
The union operator (|) performs the same action.
print(a_set | b_set) # Prints {1, 2, 3, 4, 5, 6}
set_obj.update(other_set)
Performs the same action as the union method, except that the results are
placed in set_obj and the value returned is None.
The union-assignment operator (|=) performs the same action.
a_set |= b_set # Put the union in a_set
This operator provides an easy way to extend the contents of a set. For
example:
a_set = {1, 2, 3}
a_set |= {200, 300}
print(a_set) # Prints {1, 2, 3, 200, 300}
dict_obj.clear()
Clears all the elements from an existing dictionary. Takes no arguments and
returns the value None.
a_dict.clear()
dict_obj.copy()
Returns a shallow copy of a dictionary by performing a member-by-member
copy. For example:
a_dict = {'pi': 3.14159, 'e': 2.71828 }
b_dict = a_dict.copy()
After these statements are executed, b_dict has the same contents as
a_dict, but they are two separate collections, so changes to one do not affect
the other.
dict_obj.items()
Returns a sequence of tuples, in the format (key, value), containing all the
key-value pairs in the dictionary. For example:
grades = {'Moe':1.5, 'Larry':1.0, 'BillG':4.0}
print(grades.items())
This code prints the following:
dict_items([('Moe', 1.5), ('Larry', 1.0), ('BillG', 4.0)])
dict_obj.keys()
Returns a sequence containing all the keys in the dictionary. For example:
grades = {'Moe':1.5, 'Larry':1.0, 'BillG':4.0}
print(grades.keys())
This code prints the following:
dict_keys(['Moe', 'Larry', 'BillG'])
dict_obj.pop(key [, default_value])
Returns the value associated with key and then removes that key-value pair from
the dictionary. If the key is not found, this method returns the default_value,
if specified; if that argument is not specified, a KeyError is raised. For example:
grades = {'Moe':1.5, 'Larry':1.0, 'BillG':4.0}
print(grades.pop('BillG', None)) # Prints 4.0
print(grades) # Prints grades, with
# BillG removed.
dict_obj.popitem()
Returns an arbitrary key-value pair from the dictionary object and removes it.
(This is not precisely the same as “random object,” because the selection is not
guaranteed to conform to the statistical requirements of true randomness. In
Python 3.7 and later, the pair removed is the one most recently inserted.)
The key-value pair is returned as a tuple. For example:
grades = {'Moe':1.5, 'Larry':1.0, 'BillG':4.0}
print(grades.popitem())
print(grades)
These statements print the following:
('BillG', 4.0)
{'Moe': 1.5, 'Larry': 1.0}
dict_obj.setdefault(key, default_value=None)
Returns the value of the specified key. If the key cannot be found, this method
inserts that key, along with the associated value specified as default_value;
this value is None if not specified. In either case, the value is returned.
For example, the following statement returns the current value associated
with the key 'Stephen Hawking' if it is present; otherwise, it inserts a key-
value pair and returns the new value, 4.0.
print(grades.setdefault('Stephen Hawking', 4.0))
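The sketch below shows both outcomes and then a common idiom that builds on them: grouping items under a key, with each list created on demand. The word list is sample data chosen for this example.

```python
grades = {'Moe': 1.5}

# Key already present: the stored value is returned and left unchanged.
moe = grades.setdefault('Moe', 4.0)

# Key missing: the pair is inserted, and the new value is returned.
hawking = grades.setdefault('Stephen Hawking', 4.0)
print(moe, hawking)         # Prints 1.5 4.0

# A common idiom: group items under a key, creating each list on demand.
words = ['apple', 'avocado', 'banana']      # Sample data
by_letter = {}
for w in words:
    by_letter.setdefault(w[0], []).append(w)
print(by_letter)    # Prints {'a': ['apple', 'avocado'], 'b': ['banana']}
```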
dict_obj.values()
Returns a sequence containing all the associated values in the dictionary. To
treat this sequence as a list, you can use a list conversion. For example:
grades = {'Moe':1.5, 'Larry':1.0, 'Curly':1.0,
'BillG': 4.0}
print(grades.values())
These statements print the following:
dict_values([1.5, 1.0, 1.0, 4.0])
dict_obj.update(sequence)
This method extends the dictionary object by adding all the key-value entries
in sequence to the dict_obj. The sequence argument is either another dic-
tionary or a sequence of tuples containing key-value pairs.
For example, the following statements start with two entries in grades1
and then add three more entries to that dictionary.
grades1 = {'Moe':1.0, 'Curly':1.0}
grades2 = {'BillG': 4.0}
grades3 = [('BrianO', 3.9), ('SillySue', 2.0)]
grades1.update(grades2)
grades1.update(grades3)
print(grades1)
These statements, when executed, print the following:
{'Moe': 1.0, 'Curly': 1.0, 'BillG': 4.0, 'BrianO': 3.9,
'SillySue': 2.0}
main()
The result is to print the data dictionary for the function’s local scope.
{'b': 200, 'a': 100}
Now, whenever the variable a or b is referred to in an expression within this
function, the corresponding value (which can be any type of object) is looked
up in the table, and Python then uses the value associated with the variable
name. If the name is not found, Python then looks at the global symbol table.
Finally, it looks at the list of built-ins. If the name is not found in any of these
places, a NameError exception is raised.
Consequently, Python variables are essentially names. They can be reas-
signed new values (that is, objects) at any time; they can even be assigned to
different types of objects at different times, although that’s mostly discour-
aged—except in the case of polymorphic arguments and duck typing. (“Duck
typing” is discussed in Python Without Fear and other books.)
Variables, therefore, do not occupy fixed places in memory, unlike vari-
ables in other programming languages such as C and C++. A variable has no
attributes of its own, only those of the object it refers to.
Variables work like references to objects. Assigning a new object to an
existing variable replaces its entry in the data dictionary, canceling the old
association. A counterexample would be the use of an assignment opera-
tor, such as +=, on a list, which works as an in-place modification. See Section
4.2.3, “Understand Combined Operator Assignment (+= etc.),” for more
information.
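Here is a small sketch of that distinction, using two names that start out referring to the same list. The += operator changes the list in place, whereas ordinary assignment builds a new object and rebinds only one name.

```python
a = [1, 2]
b = a               # Both names now refer to the same list object.

a += [3]            # In-place modification: b sees the change.
same_after_iadd = a is b
print(b)            # Prints [1, 2, 3]

a = a + [4]         # Builds a new list and rebinds the name a only.
same_after_assign = a is b
print(a, b)         # Prints [1, 2, 3, 4] [1, 2, 3]
```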
A valid symbolic name begins with an underscore (_) or letter. Thereafter,
every character must be a letter, underscore, or digit character.
Note Ë Be careful about tab characters, which Python does not consider equiv-
alent to any number of spaces. If possible, direct your text editor to insert
spaces instead of tab characters (\t).
Ç Note
assert Statement
This statement is helpful as a debugging tool. It has the following syntax:
assert expression, error_msg_str
Python responds by evaluating the expression. If it’s true, nothing hap-
pens. If the expression is false, error_msg_str is printed and the program
break Statement
This statement has a simple syntax.
break
The effect of a break statement is to exit from the nearest enclosing for or
while loop, transferring control to the first statement after the loop, if any.
For example:
total = 0.0
while True:
    s = input('Enter number: ')
    if not s:              # Break on empty-string input.
        break
    total += float(s)      # Only executed on non-empty s.
print(total)
In this case, the effect of the conditional involving the break statement is to
exit the loop after an empty string is entered.
Use of break outside a loop causes a syntax error.
class Statement
This statement creates, or “compiles,” a class definition at run time. The
definition must be syntactically correct, and statements at the class level
are executed immediately, but Python doesn’t need to resolve the symbols
used inside method definitions until those methods are called. (Classes can
therefore refer to each other, as long as the methods involved are not
called until both classes are defined.)
This statement has the following syntax, in which square brackets indicate
an optional item. Here base_classes is a list of zero or more classes,
separated by commas if there is more than one.
class class_name [(base_classes)]:
    statements
The statements consist of one or more statements; these are usually variable
assignments and function definitions. A pass statement, a no-op, may also be
used as a stand-in for statements to be added later.
class Dog:
    pass
Variables created in a class definition become class variables. Functions in
a class definition become methods. A common such method is __init__.
As with other methods, if it’s to be called through an instance, the definition
must begin with an extra argument, self, that refers to the instance itself. For
example:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
Once a class is defined, you can use it to instantiate objects. Arguments
given during object creation are passed to the __init__ method. Note that
this method provides a way to create a uniform set of instance variables for
objects of the class. (As a result of this method, all Point objects will have
x and y attributes.)
my_pt = Point(10, 20) # my_pt.x = 10, my_pt.y = 20
Function definitions inside a class definition can involve decoration with
@classmethod and @staticmethod, which create class and static methods,
respectively.
A class method has access to the symbols defined within the class, and it
starts with an extra argument, which by convention is named cls and refers
to the class itself.
A static method is defined in a class but receives no implicit first
argument, so it has no direct access to instance variables; it can reach
class variables only by naming the class explicitly.
For example, the following code defines a class method, set_xy, and
a static method, bar. Both are methods of class foo, and both can be called
through the class name. They can also be called through any instance of foo.
class foo:
    @classmethod
    def set_xy(cls, n, m):
        cls.x = n
        cls.y = m
    @staticmethod
    def bar():
        return 100
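To show the two calling styles side by side, here is a self-contained sketch. The class name Grid is hypothetical and stands in for any class that defines methods like those above.

```python
class Grid:                      # Hypothetical class, for illustration only.
    x = 0
    y = 0

    @classmethod
    def set_xy(cls, n, m):
        cls.x = n                # Sets class variables, shared by instances.
        cls.y = m

    @staticmethod
    def bar():
        return 100               # No self or cls argument is passed.

Grid.set_xy(10, 20)              # Called through the class name.
g = Grid()
g.set_xy(1, 2)                   # Called through an instance; cls is still Grid.
print(Grid.x, Grid.y, g.bar(), Grid.bar())
```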
continue Statement
This statement has a simple syntax.
continue
The effect of continue is to transfer execution to the top of the enclosing
for or while loop and advance to the next iteration. If continue is executed
inside a for loop, the value of the loop variable is advanced to the next value
in the iterable sequence, unless that sequence has already been exhausted, in
which case the loop terminates.
For example, the following code prints every character in the string
'You moved Dover!' except for uppercase or lowercase “d”.
for let in 'You moved Dover!':
    if let == 'D' or let == 'd':
        continue
    print(let, end='')
The effect of this code is to print:
You move over!
Use of continue outside a loop causes a syntax error.
def Statement
This statement creates, or “compiles,” a function definition at run time. The
definition must be syntactically correct, but Python doesn’t need to resolve
all symbols in the definition until the function is called. (This enables mutual
self-references as long as both functions are defined before either is called.)
def function_name(args):
    statements
In this syntax, args is a list of zero or more arguments, separated by com-
mas if there are more than one:
[arg1 [,arg2]...]
For example:
def hypotenuse(side1, side2):
    total = side1 * side1 + side2 * side2
    return total ** 0.5      # Return square root of total.
Once a function is defined, it may be executed (called) at any time, but
parentheses are always necessary in a function call, whether or not there
are arguments.
def floopy():
    return 100
print(floopy())      # Parentheses required, even with no arguments.
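The point about mutual references can be sketched with two functions that call each other. The names is_even and is_odd are illustrative, and the sketch assumes a non-negative integer argument.

```python
def is_even(n):
    # Refers to is_odd, which is not yet defined when this def executes.
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

# Both functions now exist, so either may be called.
print(is_even(10), is_odd(7))
```

Calling is_even before the second def statement had executed would raise a NameError, because is_odd would not yet exist in the symbol table.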
del Statement
This statement removes one or more symbols from the current context. It has
the following syntax:
del sym1 [, sym2]...
The effect is to remove the specified symbol or symbols, but not necessarily
to destroy the objects they refer to: an object survives as long as other
references to it remain. For example:
a_list = [1, 2, 3]
b_list = a_list # Create alias for the list
del a_list # Remove a_list from symbol table
print(b_list) # List referred to still exists.
elif Clause
The elif keyword is not a separate statement but is used as part of the syntax
in an if statement. See if statement for more information.
else Clause
The else keyword is not a separate statement but is used as part of the syntax
in an if, for, while, or try statement.
except Clause
The except clause is not a separate statement but is used in a try statement.
See try for more information.
for Statement
This statement has the syntax shown below. It’s essentially a “for each” loop.
You need to use the range built-in function if you want it to behave like a
traditional counting loop, such as a FORTRAN DO loop (which you should do
only if necessary or you’re very stubborn). The brackets are not intended
literally but indicate an optional item.
for loop_var in iterable:
    statements
[else:
    statements]      # Executed if first block of statements
                     # finished successfully, without break
One effect is to create loop_var as a variable, referring to the first item
produced by iterable (which is a collection or generated sequence). This
variable continues to exist at the current level of scope. Upon completion of
the loop, the variable should refer to the last item in iterable, assuming
there was no early exit.
The for loop performs statements over and over, just as a while loop
does; but it also sets loop_var to the next value produced by iterable at the
beginning of each iteration (cycle). When the iteration is exhausted, the loop
terminates.
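Here are some examples.
# Print members of the Beatles on separate lines
beat_list = ['John', 'Paul', 'George', 'Ringo']
for guy in beat_list:
    print(guy)
# Return the factorial of n
def factorial(n):
    prod = 1
    for i in range(1, n + 1):
        prod *= i
    return prod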
See Section 4.2.9, “Loops and the ‘else’ Keyword,” for an example of using
an else clause with for.
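As a brief sketch of the else clause in this context, the hypothetical function first_negative below uses it to detect that the loop ran to completion without a break.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            found = v
            break            # Early exit: the else block is skipped.
    else:
        found = None         # Runs only if the loop finished without break.
    return found

print(first_negative([3, -1, 4]), first_negative([3, 1, 4]))
```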
global Statement
This statement has the following syntax, involving one or more variable
names.
global var1 [, var2]...
The effect of the global statement is: “Do not treat this variable, or vari-
ables, as local within the scope of the current function.” It does not, however,
create a global variable; a separate statement is required to do that.
This statement is sometimes necessary because otherwise, assignment to
a global variable from within a function is interpreted as creating a local
variable. If the function only reads the global variable, there’s no problem.
But if it assigns to that name, the code creates a local variable instead.
This is what we call “the local variable trap.”
account = 1000
def clear_account():
    account = 0      # Oops, create new var, as a local
clear_account()
print(account)       # Prints 1000, this is an error!
This simple program ought to create a variable, reset it to 0, and then print
0. But it fails to do so, because the statement account = 0 occurs inside a
function. When executed, the function creates account as a local variable,
which is therefore not connected to the global copy of account.
The solution is to use the global statement, which causes the function to
not treat account as a local variable; it therefore forces Python to refer to the
global version.
def clear_account():
    global account   # Don't make account a local var.
    account = 0      # Reset global copy to 0!
clear_account()
print(account)       # Prints 0, not 1000.
if Statement
This statement has a simple version and a more complete version, included
here (although redundantly) for ease of understanding. The simplest version is
if condition:
    statements
The condition can be any Python object—or expression evaluating to an
object—or a chained series of comparisons, as shown next.
age = int(input('Enter your age: '))
if 12 < age < 20:
    print('You are a teenager.')
All Python objects convert to True or False in this context. The statements
are one or more Python statements.
Here’s the full syntax. Square brackets in this case show optional items.
You can have any number of elif clauses.
if condition:
    statements
[elif condition:
    statements ]...
[else:
    statements]
Here’s an example. This example features only one elif clause, although you
can have any number of them.
age = int(input('Enter age: '))
if age < 13:
    print('Hello, spring chicken!')
elif age < 20:
    print('You are a teenager.')
    print('Do not trust x, if x > 30.')
else:
    print('My, my. ')
    print('We are not getting any younger are we?')
import Statement
The import statement suspends execution of the current module and executes
the named package or module if it hasn’t been executed already. This is neces-
sary, because in Python, function and class definitions are executed (“compiled”)
dynamically, at run time.
The other effect is to make symbols in the named module or package acces-
sible to the current module, depending on which version is used.
import module
import module as short_name
from module import sym1 [, sym2]...
from module import *
The first two versions make symbols accessible but only as qualified names,
as in math.pi or math.e. The third version makes symbols accessible without
qualification, but only those listed. The fourth version makes all symbols in
the module available without qualification.
This last version is the most convenient, but it presents the danger of nam-
ing conflicts if the named module or package defines many symbols. For
large modules and packages with large namespaces, the other versions of the
import statement are recommended.
See Chapter 14, “Multiple Modules and the RPN Example,” for more
information.
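The first three forms of the statement can be compared in a short sketch that uses the standard math module. The short name m is arbitrary.

```python
import math                   # Symbols available as math.name.
import math as m              # Same module, under a shorter name.
from math import pi, sqrt     # Listed symbols available unqualified.

print(math.pi)       # Qualified name.
print(m.e)           # Qualified by the short name.
print(sqrt(16.0))    # No qualification needed for listed symbols.
```

Both math and m refer to the same module object, because a module is executed only once no matter how many times it is imported.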
nonlocal Statement
This statement has a syntax that is similar to that of the global statement.
nonlocal var1 [, var2]...
The purpose of the nonlocal statement is similar to that of the global
statement, with one difference: nonlocal is used to deny local scope and
to prefer an enclosing, yet nonglobal, scope. This only ever happens when a
function definition is nested inside another function definition. For that rea-
son, the use of nonlocal is rare.
See the global statement for more information.
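A minimal sketch of nonlocal in a nested function follows. The names make_counter and increment are hypothetical.

```python
def make_counter():
    count = 0                # Local to make_counter.

    def increment():
        nonlocal count       # Use count from the enclosing function.
        count += 1
        return count

    return increment

counter = make_counter()
counter()
counter()
third = counter()
print(third)
```

Without the nonlocal statement, the line count += 1 would treat count as a local variable of increment and fail, because that local would have no value yet.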
pass Statement
This statement has a simple syntax:
pass
This statement is essentially a no-op. It does nothing at run time, and
its major use is to act as a stand-in or placeholder inside a class or function
definition—that is, it holds a place to be filled in later with other statements.
class Dog:
    pass      # This class has no methods yet.
raise Statement
This statement has the following syntax, in which square brackets are not
intended literally but indicate optional items.
raise [exception_class [(args)]]
The effect is to raise the specified exception, with optional arguments.
Once an exception is raised, it must be handled by the program or else it
causes rude and abrupt termination.
An exception handler can rethrow an exception by using raise. Using the
statement with no exception_class rethrows the exception without chang-
ing it, in effect saying, “I’ve decided not to handle this after all” and passing it
along. Python must then look for another handler.
raise
See the try statement for more information.
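Both uses of raise, with and without an exception class, can be sketched together. The function parse_age and its message text are hypothetical.

```python
def parse_age(text):
    try:
        age = int(text)
    except ValueError:
        print('Could not convert', repr(text))
        raise                          # Rethrow the same exception unchanged.
    if age < 0:
        raise ValueError('age must not be negative')
    return age

outcomes = []
for text in ['42', 'abc', '-5']:
    try:
        outcomes.append(parse_age(text))
    except ValueError as e:
        outcomes.append(str(e))        # Handled here, by the caller.
print(outcomes)
```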
return Statement
This statement has one optional part. Here the square brackets are not
intended literally.
return [return_val]
The action is to exit the current function and return a value to the caller of
the function. If return_val is omitted, the value None is returned by default.
Multiple values may be returned by returning a tuple.
return a, b, c # Exit and return three values.
If return is used outside all functions, the result is a syntax error. See the
def statement for an example of return in context.
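The following sketch shows a tuple return being unpacked by the caller, along with a return that gives no value. The function names are illustrative only.

```python
def min_max(values):
    return min(values), max(values)   # Returns one tuple of two values.

def log_only(message):
    print(message)
    return                            # No value given, so None is returned.

low, high = min_max([3, 1, 4, 1, 5])  # Unpack the returned tuple.
nothing = log_only('done')
print(low, high, nothing)
```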
try Statement
This statement has a fairly complicated syntax, so we break it down into two
major parts: first, the overall syntax. Square brackets indicate optional items.
try:
    statements
[except exception_specifier:
    statements]...
[else:
    statements]
[finally:
    statements]
The first block of statements is executed directly. However, if an excep-
tion is raised during the execution of that block—even during a function
called directly or indirectly by one of these statements—Python checks excep-
tion handlers, as described later in this section. There may be any number of
exception handlers.
The statements in the optional else clause are executed if the first state-
ment block finishes without being interrupted by an exception. The statements
in the optional finally clause execute unconditionally, after other state-
ments in this syntax.
Each except clause has the following syntax:
except [exception [as e]]:
    statements
If exception is omitted, the clause handles all exceptions. If it is specified,
Python considers an exception a match if it has the same class as or a derived
class of exception—or if it is an object whose class is a match. The optional
e symbol is an argument providing information. Python checks each excep-
tion handler in turn, in the order given.
The Exception class is the base class of all error classes, but it does not
catch all exceptions; SystemExit and KeyboardInterrupt, for example, derive
directly from BaseException instead.
Handling an exception means to execute the associated statements and
then transfer execution to the finally clause, if present, or else the end of the
entire try/except block.
>>> def div_me(x, y):
        try:
            quot = x / y
        except ZeroDivisionError as e:
            print("Bad division! Text:", e)
        else:
            print("Quotient is %s." % quot)
        finally:
            print("Execution complete!")
>>> div_me(2, 1)
Quotient is 2.0.
Execution complete!
>>> div_me(2, 0)
Bad division! Text: division by zero
Execution complete!
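Because handlers are checked in the order given, and a handler matches derived classes as well, more specific exception classes are usually listed first. The sketch below illustrates this; the function classify and its helper callables are hypothetical.

```python
import math

def classify(func):
    try:
        func()
    except ZeroDivisionError:
        return 'zero division'       # Most specific handler is listed first.
    except ArithmeticError:
        return 'other arithmetic'    # Base class of ZeroDivisionError.
    except Exception as e:
        return 'general: ' + type(e).__name__
    else:
        return 'no exception'

results = [
    classify(lambda: 1 / 0),           # ZeroDivisionError
    classify(lambda: math.exp(1000)),  # OverflowError, an ArithmeticError
    classify(lambda: int('x')),        # ValueError
    classify(lambda: None),            # Nothing raised
]
print(results)
```

If the ArithmeticError handler were listed first, it would also catch ZeroDivisionError, and the more specific handler would never run.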
while Statement
This statement is a simple loop that has only one version. There is no
“do-while.” Square brackets are not intended literally but indicate an optional
item.
while condition:
    statements
[else:
    statements]      # Executed if first block of statements
                     # finished successfully, without break
At the top of the loop, condition is evaluated: Any “true” value causes the
statements to be executed; then control is transferred back to the top, where
condition is evaluated again. A “false” value causes exit from the loop.
(The true/false value of a non-Boolean condition is determined by applying
a bool conversion and getting either True or False. All Python objects sup-
port such a conversion. The default behavior for Python objects is to convert
to True. In general, objects are “true” except for zero values, None, and empty
collections.)
So, for example, the following code counts from 10 down to 1, printing
the number each time. (Blastoff!)
n = 10
while n > 0:
    print(n, end=' ')
    n -= 1      # Decrement n by 1
As a consequence of the rules for evaluating conditions, you could also
write the code as follows, although it is a little less reliable, because if n is
initialized to a negative number, the result is an infinite loop.
n = 10
while n:
    print(n, end=' ')
    n -= 1      # Decrement n by 1
See continue and break for ways to control execution of the loop.
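The conversion rules for conditions can be checked directly with bool. The sample values below are chosen only for illustration.

```python
# Zero values, None, and empty collections come first; the rest are "true".
values = [0, 0.0, None, '', [], {}, 1, -1, 'a', [0]]

for v in values:
    print(repr(v), bool(v))

truthy = [v for v in values if v]
print(truthy)
```

Note that -1 and the one-element list [0] both count as “true,” which is why a loop written as while n: does not stop when n is negative.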
with Statement
This statement has the following syntax, in which the optional part is shown
in square brackets.
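with expression [as var_name]:
    statements
The expression must produce a context manager, such as an open file object.
Python runs the manager’s setup code, executes the statements, and then
runs the manager’s cleanup code, even if the statements raise an exception.
If the as clause is present, var_name refers to the object that the manager
returns on entry.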
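A common use of this statement is to guarantee that a file is closed. The sketch below assumes that use; the scratch file path is hypothetical and is created in a temporary directory.

```python
import os
import tempfile

# Hypothetical scratch file, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:       # f is the open file object.
    f.write('hello\n')
# The file is closed here, even if write had raised an exception.

with open(path) as f:
    text = f.read()

print(repr(text), f.closed)
```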
yield Statement
The syntax of this statement is similar to that for the return statement, but
its effect is completely different.
yield [yielded_val]
The default setting of yielded_val is None. Using yield outside a func-
tion is a syntax error.
The effect of yield is to turn the current function into a generator factory
rather than an ordinary function. The actual return value of a generator fac-
tory is a generator object, which then yields values in the way defined by the
factory.
This is admittedly one of the most confusing parts of Python, because no
value is actually yielded until a generator object is created and then asked
for its next value. (Whoever said all language designers are logical?) In
fact, this is probably the single most counterintuitive feature in Python.
For clarifying information, see Section 4.10, “Generators.”
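As a minimal sketch of the distinction between the generator factory and the generator object, consider the hypothetical function countdown below.

```python
def countdown(n):                # A generator factory, because it uses yield.
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)               # Creates a generator object; body has not run.
first = next(gen)                # Runs the body up to the first yield.
rest = list(gen)                 # Collects the remaining values.
print(first, rest)
```

Calling countdown does not execute any of its statements; values are produced only as the generator object is advanced, here by next and list.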