Reading Files: Python For Informatics: Exploring Information

The document discusses opening and reading files in Python. It covers opening a file, using a file handle as a sequence, counting and reading lines in a file, searching through a file for specific lines, and handling errors from opening files.

Reading Files

Chapter 7

Python for Informatics: Exploring Information


www.pythonlearn.com
[Figure: computer hardware diagram. Labels: Software, Input and Output Devices, Central Processing Unit ("What Next?"), Main Memory, Secondary Memory. Callouts: "It is time to go find some Data to mess with!" and "Files R Us". Sample contents shown: the code fragment "if x < 3: print" and the opening lines of a mail file (From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 ...).]
File Processing
• A text file can be thought of as a sequence of lines

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008


Return-Path: <postmaster@collab.sakaiproject.org>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

http://www.py4inf.com/code/mbox-short.txt
Opening a File
• Before we can read the contents of the file, we must tell Python
which file we are going to work with and what we will be doing with
the file

• This is done with the open() function

• open() returns a “file handle” - a variable used to perform operations
on the file

• Similar to “File -> Open” in a Word Processor


Using open()
• handle = open(filename, mode)

> returns a handle used to manipulate the file

> filename is a string

> mode is optional and should be 'r' if we are planning to read the
file and 'w' if we are going to write to the file

fhand = open('mbox.txt', 'r')
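
The two modes can be sketched as follows. This is a minimal illustration written for Python 3 (the slides use Python 2 syntax); the file name demo.txt and the temporary directory are assumptions made so the example can run on its own.

```python
import os
import tempfile

# Hypothetical file in a temporary directory so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Mode 'w' opens the file for writing, creating it if it does not exist.
fhand = open(path, 'w')
fhand.write('Hello\nWorld!\n')
fhand.close()

# Mode 'r' opens the file for reading; it is also the default when mode is omitted.
fhand = open(path, 'r')
contents = fhand.read()
fhand.close()

print(contents)
```

Closing the handle after use is a habit worth keeping, although the slides do not cover it.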
What is a Handle?
>>> fhand = open('mbox.txt')
>>> print fhand
<open file 'mbox.txt', mode 'r' at 0x1005088b0>
When Files are Missing

>>> fhand = open('stuff.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'stuff.txt'
The newline Character
• We use a special character called the “newline” to indicate when a line ends

• We represent it as \n in strings

• Newline is still one character - not two

>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'
>>> print stuff
Hello
World!
>>> stuff = 'X\nY'
>>> print stuff
X
Y
>>> len(stuff)
3
File Processing
• A text file can be thought of as a sequence of lines

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008


Return-Path: <postmaster@collab.sakaiproject.org>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
File Processing
• A text file has newlines at the end of each line

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n


Return-Path: <postmaster@collab.sakaiproject.org>\n
Date: Sat, 5 Jan 2008 09:12:18 -0500\n
To: source@collab.sakaiproject.org\n
From: stephen.marquard@uct.ac.za\n
Subject: [sakai] svn commit: r39772 - content/branches/\n
\n
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772\n
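
One way to see those newlines is to print each line with repr(), which shows the \n instead of acting on it. This is a small Python 3 sketch (the slides use Python 2); the file name sample.txt and the temporary directory are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical three-line file built from lines in the sample above.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fout:
    fout.write('To: source@collab.sakaiproject.org\n')
    fout.write('From: stephen.marquard@uct.ac.za\n')
    fout.write('\n')

# Reading the handle line by line keeps the trailing newline on each line.
with open(path) as fhand:
    lines = list(fhand)

for line in lines:
    print(repr(line))
```

Note that the blank line in the file is not empty: it is a string containing a single newline character.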
File Handle as a Sequence
• A file handle open for read can be treated as a sequence of strings
where each line in the file is a string in the sequence

• We can use the for statement to iterate through a sequence

• Remember - a sequence is an ordered set

xfile = open('mbox.txt')
for cheese in xfile:
    print cheese
Counting Lines in a File
• Open a file read-only

• Use a for loop to read each line

• Count the lines and print out the number of lines

fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print 'Line Count:', count

$ python open.py
Line Count: 132045
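
The same counting pattern, as a Python 3 sketch (the slides use Python 2). The helper name count_lines and the small temporary file are hypothetical, chosen so the example runs without mbox.txt.

```python
import os
import tempfile


def count_lines(path):
    """Count the lines in a text file by iterating over its handle."""
    count = 0
    with open(path) as fhand:  # 'with' closes the file when the block ends
        for line in fhand:
            count = count + 1
    return count


# Hypothetical three-line file standing in for mbox.txt.
path = os.path.join(tempfile.mkdtemp(), 'three-lines.txt')
with open(path, 'w') as fout:
    fout.write('one\ntwo\nthree\n')

line_count = count_lines(path)
print('Line Count:', line_count)
```

Because the loop reads one line at a time, this approach does not need to hold the whole file in memory.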
Reading the *Whole* File

• We can read the whole file (newlines and all) into a single string

>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print len(inp)
94626
>>> print inp[:20]
From stephen.marquar
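
A self-contained Python 3 sketch of read() follows (the slides use Python 2). The file name tiny-mbox.txt is an assumption; its two lines are taken from the sample mail above.

```python
import os
import tempfile

text = ('From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n'
        'Return-Path: <postmaster@collab.sakaiproject.org>\n')

# Hypothetical stand-in for mbox-short.txt.
path = os.path.join(tempfile.mkdtemp(), 'tiny-mbox.txt')
with open(path, 'w') as fout:
    fout.write(text)

with open(path) as fhand:
    inp = fhand.read()  # the entire file as one string, newlines included

print(len(inp))
print(inp[:20])
```

Reading the whole file at once is convenient for small files; for very large files the line-by-line loop is usually the safer choice.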
Searching Through a File

• We can put an if statement in our for loop to only print lines
that meet some criteria

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:') :
        print line
OOPS!
What are all these blank lines doing here?

From: stephen.marquard@uct.ac.za

From: louis@media.berkeley.edu

From: zqian@umich.edu

From: rjlowe@iupui.edu
...
OOPS!
What are all these blank lines doing here?

• Each line from the file has a newline at the end

• The print statement adds a newline to each line

From: stephen.marquard@uct.ac.za\n
\n
From: louis@media.berkeley.edu\n
\n
From: zqian@umich.edu\n
\n
From: rjlowe@iupui.edu\n
\n
...
Searching Through a File (fixed)
• We can strip the whitespace from the right-hand side of the
string using rstrip() from the string library

• The newline is considered “white space” and is stripped

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print line

From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
....
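
The fix can be tried on a small file. This is a Python 3 sketch (the slides use Python 2); the file name mini-mbox.txt and the list from_lines are assumptions added for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for mbox-short.txt.
path = os.path.join(tempfile.mkdtemp(), 'mini-mbox.txt')
with open(path, 'w') as fout:
    fout.write('From: stephen.marquard@uct.ac.za\n')
    fout.write('Subject: [sakai] svn commit: r39772 - content/branches/\n')
    fout.write('From: louis@media.berkeley.edu\n')

from_lines = []
with open(path) as fhand:
    for line in fhand:
        line = line.rstrip()  # drop the trailing newline (and any other trailing whitespace)
        if line.startswith('From:'):
            from_lines.append(line)

# print() adds its own newline, so the stripped lines come out single-spaced.
for line in from_lines:
    print(line)
```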
Skipping with continue
• We can conveniently skip a line by using the continue statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:') :
        continue
    print line
Using in to select lines

• We can look for a string anywhere in a line as our selection criteria

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not '@uct.ac.za' in line :
        continue
    print line

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to stephen.marquard@uct.ac.za using -f
From: stephen.marquard@uct.ac.za
Author: stephen.marquard@uct.ac.za
From david.horwitz@uct.ac.za Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to david.horwitz@uct.ac.za using -f
...
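
Here is the same selection as a Python 3 sketch (the slides use Python 2). The file name uct-sample.txt and the list selected are assumptions; the test is written as "not in", which behaves the same as the slide's "not ... in" form.

```python
import os
import tempfile

# Hypothetical stand-in for mbox-short.txt.
path = os.path.join(tempfile.mkdtemp(), 'uct-sample.txt')
with open(path, 'w') as fout:
    fout.write('From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n')
    fout.write('From: louis@media.berkeley.edu\n')
    fout.write('Author: stephen.marquard@uct.ac.za\n')

selected = []
with open(path) as fhand:
    for line in fhand:
        line = line.rstrip()
        if '@uct.ac.za' not in line:
            continue  # skip lines that do not mention the domain
        selected.append(line)

for line in selected:
    print(line)
```

Unlike startswith(), the in operator matches the string at any position, so both the "From" and "Author:" lines are kept.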
Prompt for File Name

fname = raw_input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print 'There were', count, 'subject lines in', fname

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt
Bad File Names

fname = raw_input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print 'File cannot be opened:', fname
    exit()

count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print 'There were', count, 'subject lines in', fname

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: na na boo boo
File cannot be opened: na na boo boo
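
The same idea can be sketched in Python 3 (the slides use Python 2). Two things here are assumptions rather than the slides' approach: the hypothetical helper count_subject_lines returns None instead of calling exit(), and it catches OSError specifically instead of using a bare except, so unrelated errors are not hidden.

```python
import os
import tempfile


def count_subject_lines(fname):
    """Return the number of lines starting with 'Subject:', or None if the file cannot be opened."""
    try:
        fhand = open(fname)
    except OSError:  # covers missing files and permission problems
        print('File cannot be opened:', fname)
        return None
    count = 0
    with fhand:
        for line in fhand:
            if line.startswith('Subject:'):
                count = count + 1
    return count


# Hypothetical stand-in for mbox.txt.
folder = tempfile.mkdtemp()
good = os.path.join(folder, 'mini-mbox.txt')
with open(good, 'w') as fout:
    fout.write('From: stephen.marquard@uct.ac.za\n')
    fout.write('Subject: [sakai] svn commit: r39772 - content/branches/\n')
    fout.write('Subject: [sakai] svn commit: r39772 - content/branches/\n')

good_count = count_subject_lines(good)
bad_count = count_subject_lines(os.path.join(folder, 'na na boo boo'))
print('There were', good_count, 'subject lines in', good)
```

In an interactive script, fname would come from input('Enter the file name: '), the Python 3 replacement for raw_input().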
Summary
• Secondary storage
• Opening a file - file handle
• File structure - newline character
• Reading a file line by line with a for loop
• Searching for lines
• Reading file names
• Dealing with bad files
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance (
www.dr-chuck.com) of the University of Michigan School of
Information and open.umich.edu and made available under a
Creative Commons Attribution 4.0 License. Please maintain this
last slide in all copies of the document to comply with the
attribution requirements of the license. If you make a change,
feel free to add your name and organization to the list of
contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan
School of Information

… Insert new Contributors and Translators here
