Reading Files: Python For Informatics: Exploring Information
Chapter 7
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
http://www.py4inf.com/code/mbox-short.txt
Opening a File
• Before we can read the contents of a file, we must tell Python which file we are going to work with and what we will be doing with it
• The mode argument to open() is optional: use 'r' to read the file and 'w' to write to it
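As a small sketch of the two modes, the snippet below writes a scratch file with 'w' and reads it back with 'r'. The temporary path and the file contents are made up for illustration; print is called with a single argument so the code reads the same in Python 2 (the dialect of these slides) and Python 3.

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' opens the file for writing, creating it or emptying it first.
fout = open(path, 'w')
fout.write('first line\nsecond line\n')
fout.close()

# 'r' opens the file for reading; leaving the mode out means the same thing.
fhand = open(path, 'r')
text = fhand.read()
fhand.close()

print(text)
```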
What is a Handle?
>>> fhand = open('mbox.txt')
>>> print fhand
<open file 'mbox.txt', mode 'r' at 0x1005088b0>
When Files are Missing
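A minimal sketch of what happens when the file is missing: open() raises an exception instead of returning a handle. The file name below is hypothetical and lives in a fresh temporary directory so that it is guaranteed not to exist.

```python
import os
import tempfile

# Hypothetical path inside a brand-new empty directory, so it cannot exist.
missing = os.path.join(tempfile.mkdtemp(), 'stuff.txt')

try:
    fhand = open(missing)
    opened = True
    fhand.close()
except IOError as err:
    # In Python 3 a missing file raises FileNotFoundError, a subclass of
    # OSError, and IOError is an alias of OSError, so this clause catches it.
    opened = False
    print('Could not open %s: %s' % (missing, err))
```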
File Processing
• Each line of a text file ends with a newline character
• Remember: a sequence is an ordered set
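To make the "sequence" idea concrete, this sketch loops over a file handle and collects what the loop hands back. The scratch file and its three lines are invented for the example.

```python
import os
import tempfile

# Hypothetical three-line scratch file.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as fout:
    fout.write('alpha\nbeta\ngamma\n')

# The handle yields one string per line, in file order,
# and each string still carries its trailing newline.
fhand = open(path)
lines = []
for line in fhand:
    lines.append(line)
fhand.close()

print(lines)
```

Each item keeps its trailing newline, which matters as soon as the lines are printed.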
Counting Lines in a File
• Open a file read-only
• Use a for loop to read each line
• Count the lines and print out the number of lines

fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print 'Line Count:', count
$ python open.py
Line Count: 132045
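The same counting loop can be wrapped in a function and tried on a small file whose size is known in advance. The name count_lines and the four-line scratch file are assumptions made for this sketch, not part of the slides.

```python
import os
import tempfile

def count_lines(path):
    """Count lines by looping over the handle, one line at a time."""
    count = 0
    with open(path) as fhand:
        for line in fhand:
            count = count + 1
    return count

# Hypothetical four-line scratch file.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as fout:
    fout.write('one\ntwo\nthree\nfour\n')

line_count = count_lines(path)
print('Line Count: %d' % line_count)
```

Because the loop reads one line at a time, it does not need to hold the whole file in memory, so the same code should cope with a file as large as mbox.txt.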
Reading the *Whole* File
From: zqian@umich.edu
From: rjlowe@iupui.edu
...
OOPS!
What are all these blank lines doing here?
• Each line from the file has a newline at the end
• The print statement adds a newline to each line

From: stephen.marquard@uct.ac.za\n
\n
From: louis@media.berkeley.edu\n
\n
From: zqian@umich.edu\n
\n
From: rjlowe@iupui.edu\n
\n
...
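The doubled newline can be seen without a file at all. This sketch takes one line exactly as the for loop would deliver it and builds the text that printing it produces; repr() is used so the newline characters are visible.

```python
# A line as the for loop delivers it: the newline from the file is still there.
line = 'From: zqian@umich.edu\n'

# Printing the line emits the line itself plus print's own newline.
printed = line + '\n'

print(repr(line))
print(repr(printed))
```

The second repr ends in two newline characters, which is where the blank line after each From: line comes from.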
Searching Through a File (fixed)
• We can strip the whitespace from the right-hand side of the string using rstrip() from the string library
• The newline is considered “white space” and is stripped

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print line

From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
....
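One detail worth seeing in isolation: rstrip() with no argument removes every kind of trailing whitespace, not only the newline. The sample string below, with a space and a tab before the newline, is invented to show the difference.

```python
# Hypothetical line with a space and a tab before the newline.
line = 'From: zqian@umich.edu \t\n'

stripped = line.rstrip()          # removes the space, the tab and the newline
only_newline = line.rstrip('\n')  # removes just the trailing newline

print(repr(stripped))
print(repr(only_newline))
```

If trailing spaces in the data are significant, passing '\n' explicitly is the more conservative choice; for the mail headers in these slides either form should give the same result.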
Skipping with continue
• We can conveniently skip a line by using the continue statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:') :
        continue
    print line
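The skip-with-continue pattern can be tried on a plain list of strings, which behaves like a file handle inside a for loop. The function name from_lines and the sample lines are assumptions for this sketch.

```python
def from_lines(lines):
    """Keep only the lines that start with 'From:', skipping the rest."""
    kept = []
    for line in lines:
        line = line.rstrip()
        if not line.startswith('From:'):
            continue  # jump straight to the next line
        kept.append(line)
    return kept

# Hypothetical sample standing in for the contents of a mail file.
sample = [
    'From: zqian@umich.edu\n',
    'Subject: an example subject\n',
    'From: rjlowe@iupui.edu\n',
]

result = from_lines(sample)
print(result)
```

Putting the uninteresting case first and bailing out with continue keeps the main work of the loop at a single indentation level.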
Using in to select lines
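As a sketch of this idea, the in operator tests whether a fragment appears anywhere in a line, not only at the start. The function name lines_containing, the fragment and the sample lines are all assumptions chosen for illustration.

```python
def lines_containing(lines, fragment):
    """Keep only the lines in which fragment occurs anywhere."""
    kept = []
    for line in lines:
        line = line.rstrip()
        if fragment not in line:
            continue
        kept.append(line)
    return kept

# Hypothetical sample standing in for the contents of a mail file.
sample = [
    'From: stephen.marquard@uct.ac.za\n',
    'From: zqian@umich.edu\n',
    'Cc: stephen.marquard@uct.ac.za\n',
]

result = lines_containing(sample, '@uct.ac.za')
print(result)
```

Unlike startswith(), this also picks up the Cc: line, because the match may occur at any position in the string.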
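Bad File Names

# Ask for a file name, then guard the open() call with try/except
fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

# Count the lines that start with 'Subject:'
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print 'There were', count, 'subject lines in', fname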
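A variation on the same idea, written as a function so it can be tried without typing a file name. The function name count_subject_lines and the scratch file are assumptions. The sketch also swaps two details of the slide code: it catches IOError rather than using a bare except, and it returns None on failure instead of calling exit().

```python
import os
import tempfile

def count_subject_lines(fname):
    """Return the number of 'Subject:' lines, or None if fname cannot be opened."""
    try:
        fhand = open(fname)
    except IOError:
        print('File cannot be opened: %s' % fname)
        return None
    count = 0
    for line in fhand:
        if line.startswith('Subject:'):
            count = count + 1
    fhand.close()
    return count

# Hypothetical scratch file with two subject lines.
folder = tempfile.mkdtemp()
good = os.path.join(folder, 'demo.txt')
with open(good, 'w') as fout:
    fout.write('From: zqian@umich.edu\nSubject: first\nSubject: second\n')

good_count = count_subject_lines(good)
bad_count = count_subject_lines(os.path.join(folder, 'missing.txt'))
print('There were %d subject lines in %s' % (good_count, good))
```

Returning None leaves it to the caller to decide what a bad file name should mean, whereas exit() ends the whole program on the spot.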