10 ch10 ELEC462
(ELEC462)
I/O Redirection and Pipes
Dukyun Nam
HPC Lab@KNU
Contents
● Introduction
Introduction
● Ideas and Skills
○ I/O Redirection: What and Why?
○ Definitions of standard input, output, and error
○ Redirecting standard I/O to files
○ Using fork to redirect I/O for other programs
○ Pipes
○ Using fork with pipes
● System Calls and Functions
○ dup, dup2
○ pipe
Shell Programming
● How do the following commands work?
ls > myfiles
who | sort > userlist
● Questions
○ How does the shell tell a program to send its output to a file instead of the screen?
○ How does the shell connect the output stream of one process to the input stream of
another process?
○ What’s the real meaning of ‘standard input’?
● In this chapter, we’ll focus on a particular form of IPC (Interprocess
communication)
○ (1) Input/output (I/O) redirection
○ (2) pipes
Interprocess Communication (IPC)
● Communication and synchronization facilities
○ Communication
■ These facilities are concerned with exchanging
data between processes
○ Signals
■ Although signals are intended primarily for
other purposes, they can be used as
a synchronization technique in certain circumstances
○ Synchronization
■ These facilities are concerned with synchronizing
the actions of processes
< A taxonomy of UNIX IPC facilities >
* M. Kerrisk, The Linux Programming Interface, No Starch Press, October 2010.
IPC: Communication Facilities
● Data-transfer facilities
○ The key factor distinguishing these facilities is the notion of writing and reading
< Exchanging data between two processes using a pipe >
● Shared memory
○ Shared memory allows processes to exchange
information by placing it in a region of memory
that is shared between the processes
A Shell Application: Watch for Users (cont.)
● Execution
A Shell Application: Watch for Users (cont.)
● who | sort > prev
○ Tells the shell three things:
■ 1) Run the commands who and sort at the same time
■ 2) Send the output of who directly to the input of sort
● who does not need to finish reading the utmp file before sort begins its task
● The two processes are scheduled to run in small time slices, sharing CPU time with other processes on the system
■ 3) Send the output of sort into a file called prev: what if it already exists? (By default, the shell truncates it and overwrites its contents)
< Connecting output of who to input of sort >
A Shell Application: Watch for Users (cont.)
● comm: a command that compares two sorted files line by line
○ Prints three columns: lines only in the first file, lines only in the second file, and lines common to both
○ In the example, it produces exactly the two sets we want
■ Logouts: who left? (comm -23 prev current)
■ Logins: who is new? (comm -13 prev current)
< comm compares two lists and outputs three sets >
A Shell Application: Watch for Users (cont.)
● Three important ideas behind watch.sh
○ (1) Power of shell scripts
■ Easier and quicker to write than C (or other languages that require compiling)
○ (2) Flexibility of software tools
■ Each tool (or command or program) does one specific, general task
○ (3) Use and value of I/O redirection and pipes
● And one more …
○ The script shows how to use the ‘>’ operator to treat files as variables of arbitrary
size and structure
A Shell Application: Watch for Users (cont.)
● Questions
○ How do all of these connected programs work together?
○ What role does the shell play in connecting processes?
○ What role does the kernel play to get the processes to work?
○ What role do the individual programs play?
Facts about Standard I/O & Redirection
● All Linux/Unix I/O redirection is based on
○ The principle of standard streams of data
○ e.g., The task of sort
■ Reads bytes from one stream of data
● Then sorts the bytes it has read
■ Writes the sorted results to another stream
■ Reports any errors to a third stream
○ The three channels for data flow are as follows:
■ Standard input: The stream of data to process
■ Standard output: The stream of result data
■ Standard error: A stream of error messages
< A software tool reads input and writes output and errors >
Fact 1: 3 Standard File Descriptors
● All Linux/Unix tools make use of the three-stream model
○ Each stream is a specific file descriptor (fd)
○ Linux/Unix tools find file descriptors 0, 1, and 2 already open for reading,
writing, and writing, respectively
[Remind] Terminal (tty)
● A terminal driver operates two queues
○ One for input characters transmitted
from the terminal device to the reading
process(es) and the other for output characters
transmitted from processes to the terminal
The Shell, Not the Program, Redirects I/O
● cmd > filename
○ Tells the shell to attach fd 1 to a file
■ By using the output redirection notation as specified above
The Shell, Not the Program, Redirects I/O (cont.)
● listargs prints to standard output the list of command-line
arguments
○ Does not print the redirection symbol
and filename
Some Important Facts …
● The shell doesn’t pass the redirection symbol and filename to the
command
● The redirection request can appear anywhere in the command.
○ Doesn’t require spaces around the redirection symbol (>)
■ Even a command like ‘> listing ls’ is acceptable
○ Doesn’t terminate the command and its arguments: it is just an added request
● Many shells provide notation for redirecting other fds
○ e.g., 2>filename
■ Redirects fd 2, that is, standard error, to the named file
Understanding I/O Redirection
● Goal
○ Understand how I/O redirection works
AND learn how to write programs that use it
● Method: write programs that do
○ sort < data: attach stdin to a file
○ who > userlist: attach stdout to a file
○ who | sort: attach stdout to stdin
Fact 2: the “Lowest-Available-fd” Principle
● The meaning of a file descriptor? An array index!
○ Each process has a collection of files it has open
■ Those “open” files are kept in an array
○ So a file descriptor: simply an index for an item in that array
○ Opening a new connection with file descriptors is like receiving a call on a multiline phone:
the next incoming call is mapped to the lowest available line
How to Attach stdin to a File (cont.)
● How does a Linux program redirect stdin in order for data to
come from a file?
○ Linux processes don’t read from files, but actually from file descriptors
○ If fd 0 is attached to a file, then the attached file becomes the source for
standard input
● There are three methods for attaching stdin to a file
○ Method 1: Close-then-open
○ Method 2: Open-close-dup-close
○ Method 3: Open-dup2-close
Method 1: Close-then-Open
● Step 1) Starting with the three standard streams connected to the
terminal driver
○ File descriptors 0, 1, 2 attached to /dev/tty
■ 0 for reading
■ 1 for writing
■ 2 for writing
Method 1: Close-then-Open (cont.)
● Step 3) Finally, open(filename, O_RDONLY)
○ If the process opens another file,
■ that connection is attached to
the FIRST FREE entry
in the array of I/O channels
Method 1: Close-then-Open (cont.)
● stdinredir1.c
What’s returned?
Method 1: Close-then-Open (cont.)
● Execution
Method 2: Open..close..dup..close
● The system call dup makes a second connection to an existing file
descriptor
○ open(file): open the file to which stdin should be attached
■ Returns a nonzero file descriptor, since 0 is still in use
○ close(0): close fd 0, which now becomes “unused”
○ dup(fd): makes a “duplicate” of fd
■ Uses the lowest-numbered unused fd
■ So what would be the number?
● The duplicate of the connection to the file is located at index 0 in the array of open files
○ close(fd): closes the original connection to the file
■ Leaving only the connection through file descriptor 0
Method 2: Open..close..dup..close
(cont.)
Method 2: Open..close..dup..close
(cont.)
● Execution
Method 3: Open..dup2..close
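● The code for stdinredir2.c includes #ifdef-ed code
○ to replace the close(0) and dup(fd) system calls with dup2(fd, 0)
■ dup2(fd, 0) closes fd 0 first (if it is open) and then makes fd 0 a duplicate of fd, in a single call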
System Call Summary: dup
< A process about to fork and its standard output > < Standard output of child is copied from parent >
Redirecting I/O for Another Program:
who > userlist (cont.)
● Step 3. After child calls close(1): child can close its stdout
< The child can close its standard output >
● Step 4. After child calls creat(“g”,m): child opens a new file, taking fd = 1
< Child opens a new file, getting fd = 1 >
Redirecting I/O for Another Program:
who > userlist (cont.)
● Step 5. After child execs a new program (e.g., who)
○ child runs a program with the new
standard output (e.g., userlist)
Redirecting I/O for Another Program:
who > userlist (cont.)
● Execution
Summary of Redirection to Files
● (1) File descriptors 0, 1, and 2 represent standard input, output, and
error, respectively
● (2) The kernel always uses the lowest numbered unused file
descriptor
● (3) The set of file descriptors is passed unchanged across exec calls
○ To redirect I/O for another program, the shell uses the interval in the child process between fork and exec
■ Reason: in that interval, the child can attach its standard data streams to (external) files before the new program starts
What is a Pipe? How It Works?
● Pipe
○ “One-way” data channel in the kernel
○ Has a “writing” end and “reading” end
■ e.g., who | sort
Creating a Pipe (cont.)
● Execution
○ Creates a pipe and then
○ Uses the pipe to send data to itself
Creating a Pipe (cont.)
● Depicts the flow of bytes:
○ From the keyboard to the process, and from the process into the pipe through apipe[1]
● A pipe is “most effective” when one process writes data and another process reads it, on the same host
○ Of course, a single process can both write to and read from a pipe, as this demo does
Using fork to Share a Pipe (cont.)
● Shows how to combine pipe and fork
○ To create a pair of processes that communicate through a pipe
● pipedemo2.c
Using fork to Share a Pipe (cont.)
● pipedemo2.c (cont.)
Using fork to Share a Pipe (cont.)
● Execution
The Finale: Combining All Skills
● Let’s write a general-purpose program, called pipe.
○ It takes the names of two programs as arguments, as in the following examples:
pipe who sort
pipe ls head
○ The logic of the program is as follows:
The Finale: Combining All Skills (cont.)
● pipe.c
The Finale: Combining All Skills (cont.)
● pipe.c (cont.)
The Finale: Combining All Skills (cont.)
● Execution
Similarities between Pipes and Files
● Pipes look like regular files
○ Use write() to put data into a pipe
○ Use read() to get the data from a pipe
○ Data appears as a sequence of bytes without any particular block or record structure
Differences between Pipes and Files
● Reading from Pipes
○ 1. read on a pipe blocks
■ When a process tries to read from an empty pipe, the call blocks until some bytes are written into
the pipe
○ 2. Reading EOF on a pipe
■ When all writers close the writing end of the pipe, attempts to read from the pipe return 0,
which means the end of file
○ 3. Multiple readers can cause trouble
■ A pipe is a queue: a first-in-first-out structure
■ When a process reads bytes from a pipe, those bytes are removed from the pipe
■ If two processes try to read from the same pipe, one process will get some of the bytes from
the pipe, and the other process will get the other bytes
Differences between Pipes and Files (cont.)
● Writing to Pipes
○ 4. write to a pipe blocks until there is space
■ Pipes have a finite capacity, far lower than the file-size limit on disk files
■ A write call to a full pipe blocks until enough space becomes available
○ 5. write guarantees a minimum chunk size
■ The kernel will not split up a write of up to PIPE_BUF bytes; POSIX requires PIPE_BUF to be at least 512 bytes
■ Linux guarantees an unbroken buffer size of 4K bytes (PIPE_BUF = 4096) for pipes
○ 6. write fails if no readers
■ If all readers have closed the reading end of the pipe, an attempt to write to the pipe leads
to trouble
■ The kernel has two ways of notifying a process that a write is no longer valid:
● 1) It sends SIGPIPE to the process, which by default terminates it
● 2) If the process is not killed (e.g., it ignores or catches SIGPIPE), write returns -1 and sets errno to EPIPE
Summary
● Input/output redirection allows separate programs to work as a
team, each program a specialist
● Linux assumes that programs read input from fd 0 (stdin), write
results to fd 1 (stdout), and report errors to fd 2 (stderr)
● The log-in procedure sets up fds 0, 1, and 2
○ These connections and all open file descriptors are passed from parent to
child and across the exec system call
Summary (cont.)
● System calls that create fds always use the lowest-numbered free fd
● Redirecting std input/output/error means changing where fds 0, 1,
or 2 connect
● A pipe is a data queue in the kernel with each end attached to an fd
○ The pipe system call creates a pipe
● Both ends of a pipe are copied to a child process when the parent
calls fork
● Pipes can connect only related processes (e.g., a parent and its child, or children of a common parent), since the pipe's fds are inherited through fork