Cobol Vsam File Access
COBOL
Using Files
• File Descriptors
• File Organizations and Access Modes
• File Open Modes and I/O Operations/Verbs
• I/O operations on SEQUENTIAL files
• I/O operations on INDEXED files
• Random Access form of READ, WRITE, and REWRITE for INDEXED files
• Sequential Access form of READ, WRITE, and REWRITE for INDEXED files
• Random observations based on program testing
• Random Access form of READ, WRITE, and REWRITE for RELATIVE files
File Descriptors
The DATA DIVISION of a COBOL (sub)program contains two sections, the FILE
SECTION and the WORKING-STORAGE SECTION. The latter is used to describe,
via "data description entries" (level numbers, PICTURE clauses, etc.), the hierarchical
structure of data items that exist during execution of the program. The former is used to
describe, in a similar way, the layout of records in any files that the program uses. For
each file that the program uses, the FILE SECTION contains a "file description entry",
the beginning of which is signaled by the keyword FD. The typical form of such an entry
(the general form includes a number of optional clauses not shown here) is as follows:
FD <file-name>
[RECORD CONTAINS <integer-literal> CHARACTERS]
[DATA RECORD IS <data-record-name>].
Note on notation: Square brackets surrounding an entity indicate that its appearance is
optional.
Immediately after the file description entry comes the "data description entry" for the
file's data record (beginning with the level number 01). Here is a typical example:
FD  Employee-File
    RECORD CONTAINS 65 CHARACTERS
    DATA RECORD IS Employee-Rec.
01  Employee-Rec.
    02  Employee-ID        PIC X(10).
    02  Employee-Name.
        03  Last-Name      PIC X(20).
        03  First-Name     PIC X(12).
        03  Middle-Init    PIC X.
    02  Position.
        03  Job-Code       PIC X(4).
        03  Department     PIC X(3).
        03  Manager-ID     PIC X(10).
    02  Hourly-Pay         PIC 9(3)V99.
The above says that Employee-File is a file in which each record has a length of 65
characters, with the first ten containing an employee ID, the next twenty containing an
employee's last name, etc., etc.
When a COBOL program executes, enough main memory is allocated to hold not only
the data items described in the WORKING-STORAGE SECTION but also those
described in the FILE SECTION (i.e., one data record from each file). Thus, one could
view the data record of a file as being a one-record buffer for that file. When a record is
retrieved from a file (via the READ verb), it is placed into the file's data record.
Similarly, when a record is written to a file (via the WRITE verb), it is the contents of the
file's data record that are written into the file.
Note: From class discussion, you should recall that there are also file buffers that are not
directly accessible by the application programmer. An input buffer holds (typically)
several records that already have been read in (physically) but are waiting to be read in
logically (via READ) by the COBOL program. An output buffer holds (typically) several
records that already have been written logically (via WRITE) by the COBOL program
but are waiting to be written (physically) into a file.
File Organizations and Access Modes
A file's organization (i.e., the way it is structured) imposes restrictions upon how it can be
accessed (i.e., upon which access modes are applicable to it).
A file whose organization is SEQUENTIAL (which is the default) allows only the
SEQUENTIAL access mode, which means that its records may be accessed (i.e., read or
written) only in logical order, one after another. (This restriction makes sense, as such a
file has no index (or any other auxiliary fast-search-enabling structure) associated with it
to allow for efficient access to arbitrary records.)
An INDEXED file is one for which an index exists, thereby making it possible to locate a
record quickly, given the value of its key field (i.e., the indexing field). A RELATIVE file
is one that allows access by relative record number (RRN).
A file whose organization is INDEXED or RELATIVE allows any of the three access
modes to be applied to it: SEQUENTIAL, RANDOM, or DYNAMIC. The notion of
SEQUENTIAL access, as it applies to INDEXED and RELATIVE files, is the same as
with SEQUENTIAL files: records are accessed in their logical order. In INDEXED files,
the logical order of records corresponds to increasing order of key field value. (For
example, if Employee-ID were the key field of the Employee-File described above, then
the record containing 'Jones00001' in that field would occur before the record
containing 'Simpson012', as the former value is less than the latter according to
COBOL's rules for ordering character strings.) In RELATIVE files, the logical order of
records corresponds to their RRN's, with record i coming before record j if and only if i
< j.
As for RANDOM access, in the case of an INDEXED file it means access according to
the value stored in the field that is specified as the key of the file (in the SELECT
statement for the file). (Such a file has an index for which its key field is the indexing
field.) For example, if Employee-File (see below) has as its key the field Employee-ID
(an alphanumeric string of length ten), we have the ability to READ or WRITE a record
whose Employee-ID field contains a specified value, such as 'Simpson032'.
In the case of a RELATIVE file, RANDOM access means access according to the logical
position of a record within the file. A position is given in terms of a relative record
number (RRN), which is simply a positive integer. For example, we can issue a command
to READ or WRITE the record in position 327.
The organization of a file and the access mode to be used on that file by a particular
COBOL program are specified in a SELECT statement appearing in the FILE-
CONTROL paragraph of the INPUT-OUTPUT SECTION in the ENVIRONMENT
DIVISION. The form taken by the SELECT statement depends upon the file's
organization. (Note: In order to keep things simple, we do not describe the SELECT
statement in all its generality.) For a sequential file, it looks like this:
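A minimal sketch of such a statement follows. The file name Report-File, the external name "Report.dat", and the status item Report-Status are hypothetical names chosen for illustration; the clauses themselves are standard COBOL.

```cobol
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> Report-File, "Report.dat", Report-Status: hypothetical names.
           SELECT Report-File
               ASSIGN TO "Report.dat"
               ORGANIZATION IS SEQUENTIAL
               ACCESS MODE IS SEQUENTIAL
               FILE STATUS IS Report-Status.
```

Because SEQUENTIAL is the default for both organization and access mode, the two middle clauses could be omitted; they are spelled out here only to make the choice visible.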
The data item specified in the FILE STATUS clause should be one defined with a PIC
X(2) picture clause. Each time an I/O operation is performed on the file, a two-digit code,
called the file status code, is placed into this data item. The file status code indicates
whether the operation completed successfully (value "00") or whether something
"unusual" occurred (e.g., value "41" indicates an attempt to OPEN a file that was already
open, "10" indicates the end-of-file condition, etc., etc.). For more details, see page 301
of Comprehensive COBOL.
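As a sketch of how the status code might be tested, the fragment below declares a two-character status item with condition names for the codes mentioned above and checks it after an OPEN. The names Employee-Status, Status-OK, Status-EOF, and Status-Already-Open are assumptions, and the clause FILE STATUS IS Employee-Status is assumed to appear in the SELECT statement for Employee-File.

```cobol
       WORKING-STORAGE SECTION.
      *> Status item named in the file's FILE STATUS clause.
       01  Employee-Status         PIC X(2).
           88  Status-OK           VALUE "00".
           88  Status-EOF          VALUE "10".
           88  Status-Already-Open VALUE "41".

       PROCEDURE DIVISION.
           OPEN INPUT Employee-File
      *> The code is set after every I/O operation on the file.
           IF NOT Status-OK
               DISPLAY "OPEN failed, file status = " Employee-Status
               STOP RUN
           END-IF
```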
When the file has INDEXED organization, the SELECT statement must also name the
file's key field in a RECORD KEY clause. For example, the SELECT statement for the
Employee file mentioned above might look like this:
SELECT Employee-File
ASSIGN TO "Employees.dat"
ORGANIZATION IS INDEXED
ACCESS MODE IS RANDOM
RECORD KEY IS Employee-ID.
The data-name specified in the RECORD KEY clause must be one of the fields within
the file's data record; it must be (or becomes, if the file doesn't yet exist) an indexing field
of the file (which is to say that, if the file already exists, so must an index on that field).
The SELECT statement for a file with RELATIVE organization takes one of two forms,
depending upon the access mode.
That is, for a RELATIVE file, if SEQUENTIAL access mode is chosen, specifying its
RELATIVE KEY is optional (and seemingly useless!), but specifying the RELATIVE
KEY is mandatory if the access mode is RANDOM or DYNAMIC. Whenever random
access is made to a RELATIVE file, the contents of the field that was identified as its
RELATIVE KEY are taken to be the RRN of the record to be accessed.
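A minimal sketch of random access to a RELATIVE file, gathering the relevant pieces from three different places in a program into one fragment. Parts-File, "Parts.dat", and Part-RRN are hypothetical names; the file is assumed to have been opened in INPUT or I-O mode before the READ.

```cobol
      *> FILE-CONTROL paragraph (all names here are hypothetical):
           SELECT Parts-File
               ASSIGN TO "Parts.dat"
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS Part-RRN.

      *> WORKING-STORAGE SECTION: the RELATIVE KEY item is an
      *> unsigned integer that is NOT part of the file's record.
       01  Part-RRN                PIC 9(6).

      *> PROCEDURE DIVISION: random READ of the record at RRN 327.
           MOVE 327 TO Part-RRN
           READ Parts-File
               INVALID KEY
                   DISPLAY "No record in position " Part-RRN
           END-READ
```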
Note: Simply including the clause ORGANIZATION IS INDEXED (or RELATIVE),
when SELECT-ing a file, does not magically transform the specified file into one having
the appropriate structure. If, for example, you created a file using a standard file editor
and then tried to SELECT it using the ORGANIZATION IS INDEXED (or RELATIVE)
clause within the SELECT statement, you would not achieve the desired results. Rather,
to construct an INDEXED (or RELATIVE) file, you would create it via the execution of
some COBOL program in which the file is opened for OUTPUT and records are written
to that file. For an example, see this program, which creates a new INDEXED file,
populating it with the records in an already-existing text file. End of Note.
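A minimal sketch of such a file-building program (not the course's own) is given below. It assumes a text file named "Employees.txt" in which each line is laid out like Employee-Rec, and it relies on ORGANIZATION IS LINE SEQUENTIAL, which is a compiler extension rather than standard COBOL. The program name, the file names, and the paragraph names are all hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BuildIdx.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      *> LINE SEQUENTIAL (text files) is a compiler extension.
           SELECT Text-File
               ASSIGN TO "Employees.txt"
               ORGANIZATION IS LINE SEQUENTIAL.
           SELECT Employee-File
               ASSIGN TO "Employees.dat"
               ORGANIZATION IS INDEXED
               ACCESS MODE IS RANDOM
               RECORD KEY IS Employee-ID.
       DATA DIVISION.
       FILE SECTION.
       FD  Text-File.
       01  Text-Rec                PIC X(65).
       FD  Employee-File.
       01  Employee-Rec.
           02  Employee-ID         PIC X(10).
           02  FILLER              PIC X(55).
       WORKING-STORAGE SECTION.
       01  Text-Eof-Flag           PIC X VALUE "N".
           88  Text-Eof            VALUE "Y".
       PROCEDURE DIVISION.
       Main-Para.
           OPEN INPUT Text-File
      *> Opening for OUTPUT creates the indexed file and its index.
           OPEN OUTPUT Employee-File
           PERFORM UNTIL Text-Eof
               READ Text-File
                   AT END SET Text-Eof TO TRUE
                   NOT AT END PERFORM Store-Rec
               END-READ
           END-PERFORM
           CLOSE Text-File
           CLOSE Employee-File
           STOP RUN.
       Store-Rec.
      *> RANDOM access lets the text lines arrive in any key order.
           WRITE Employee-Rec FROM Text-Rec
               INVALID KEY
                   DISPLAY "Duplicate key skipped: " Employee-ID
           END-WRITE.
```

RANDOM access was chosen for the indexed file so that the input need not be sorted by key; with SEQUENTIAL access the records would have to be written in increasing key order.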
File Open Modes and I/O Operations/Verbs
There are four "file open modes": INPUT, OUTPUT, EXTEND, and I-O. A COBOL
program "announces its intention" to access a file by opening it, via the OPEN verb.
When opening a file, one of these four modes must be specified, as in
OPEN INPUT Course-File
When the program is finished using a file (perhaps only temporarily) it closes it via the
CLOSE verb, as in
CLOSE Course-File
A file opened in INPUT mode is one that may be accessed only via the READ verb (plus
the START verb, if the file is INDEXED or RELATIVE). A file opened in OUTPUT
mode is one that may be accessed only via the WRITE verb; furthermore, if the file
existed prior to being opened, its contents are destroyed (so that, when execution ends,
the file contains only those records written to the file during execution of the program). A
file opened in EXTEND mode, which applies only to SEQUENTIAL files, is one that
may be accessed only via the WRITE verb; furthermore, the file must have existed prior
to being opened (unless the word OPTIONAL appeared in the SELECT statement for that
file), and any records written to it during execution are placed after the ones already
there. A file opened in I-O mode is one on which both reading and writing of
records may be carried out, via the READ and REWRITE verbs. (The WRITE, DELETE,
and START verbs may be applied, too, if the file is INDEXED or RELATIVE.)
Note that a file may be opened more than once during execution of a program, possibly
with different open modes each time. However, a file that is open must be closed (via the
CLOSE verb) before it can be opened again. For example, a program may open a file for
OUTPUT, write records into it, close it, open it for INPUT, and then read records from it.
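The sequence just described can be sketched as follows, assuming a hypothetical SEQUENTIAL file Scratch-File whose data record is Scratch-Rec and a hypothetical condition name Scratch-Eof.

```cobol
      *> Scratch-File, Scratch-Rec, Scratch-Eof: hypothetical names.
           OPEN OUTPUT Scratch-File
           MOVE "first record" TO Scratch-Rec
           WRITE Scratch-Rec
           CLOSE Scratch-File

      *> The file must be closed before it is opened in another mode.
           OPEN INPUT Scratch-File
           READ Scratch-File
               AT END SET Scratch-Eof TO TRUE
               NOT AT END DISPLAY Scratch-Rec
           END-READ
           CLOSE Scratch-File
```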
I/O operations on SEQUENTIAL files
The OPEN and CLOSE verbs were described above (although not in full generality---see
a COBOL reference for more details).
Here we consider the remaining verbs that may be applied to a SEQUENTIAL file:
READ, WRITE, and REWRITE. Which of these three operations are applicable to a file
depends upon the mode into which the file was opened:
          +---------+---------+----------+-------+
          |               M o d e                |
Operation |                                      |
          |  INPUT  | OUTPUT  |  EXTEND  |  I-O  |
          +---------+---------+----------+-------+
READ      |    x    |         |          |   x   |
          +---------+---------+----------+-------+
WRITE     |         |    x    |    x     |       |
          +---------+---------+----------+-------+
REWRITE   |         |         |          |   x   |
          +---------+---------+----------+-------+
Example:
READ Employee-File
AT END SET Employee-Eof TO TRUE
NOT AT END PERFORM Process-Employee
END-READ
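The effect of this command is as follows: if the file contains a record that has not
yet been read, the next such record is placed into the file's data record and the
statement in the NOT AT END clause is executed; otherwise (i.e., every record has
already been read), the statement in the AT END clause is executed.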
Note: As mentioned above, by the file's "data record" we mean the 01-level data item
declared in the data description entry immediately following the file's file description
entry (the stuff coming after the keyword FD). In the example above, the data record is
Employee-Rec.
The presence or absence of the word NEXT within this form of the READ statement makes
no difference.
Note that, in COBOL, the end-of-file condition does not become true until an attempt is
made to READ beyond the last record in the file. (When this attempt is made, the
imperative statement following the AT END clause of the READ statement is executed.)
This is in contrast to Ada and Pascal, in which the end-of-file condition becomes true
immediately after the last record has been read. For this reason, a typical file processing
loop in COBOL has a somewhat different form than an equivalent loop in Ada or Pascal.
Consider this Ada-like pseudocode:
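A sketch of the contrast: the Ada-like loop appears as comments, followed by a COBOL loop that uses a "priming" READ before the loop and a second READ at the bottom of the loop body. The file name f, the condition name f-Eof, and the paragraph Process-f-Rec are assumptions made for illustration.

```cobol
      *> Ada-like pseudocode (tests for end-of-file BEFORE reading):
      *>     open f for input
      *>     while not End_Of_File(f) loop
      *>         read a record from f; process it
      *>     end loop
      *>     close f
      *>
      *> COBOL version (end-of-file is detected only BY a READ):
           OPEN INPUT f
           READ f
               AT END SET f-Eof TO TRUE
           END-READ
           PERFORM UNTIL f-Eof
               PERFORM Process-f-Rec
               READ f
                   AT END SET f-Eof TO TRUE
               END-READ
           END-PERFORM
           CLOSE f
```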
In order to make the COBOL version a little more concise, we could place the READ
statement into a separate paragraph ---call it Read-f-Rec--- and then replace each of the
two occurrences of the READ statement by PERFORM Read-f-Rec.
Syntactic format of the WRITE verb applied to a SEQUENTIAL file (necessarily opened
in OUTPUT or EXTEND mode):
WRITE <data-record-name> [FROM <data-name>]
The effect is that the file's data record (after the specified data item has been copied into
it, if the FROM clause is present) is written at the end of the file (i.e., after the last record
in the file). Recall that opening a file in OUTPUT mode destroys the file's previous
contents, whereas opening a file in EXTEND mode leaves its contents intact, allowing
the program to write new records after the ones already there.
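A minimal sketch of appending to an existing SEQUENTIAL file, assuming a hypothetical file Log-File with data record Log-Rec and a WORKING-STORAGE item WS-Log-Line of the same length.

```cobol
      *> Log-File, Log-Rec, WS-Log-Line: hypothetical names.
           OPEN EXTEND Log-File
           MOVE "Payroll run completed" TO WS-Log-Line
      *> FROM copies WS-Log-Line into Log-Rec, then writes it.
           WRITE Log-Rec FROM WS-Log-Line
           CLOSE Log-File
```

Had the file been opened with OPEN OUTPUT instead, the same WRITE would have produced a file containing only this one record.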
Note that the WRITE verb cannot be applied to a SEQUENTIAL file opened in I-O
mode, as this mode allows REWRITE-ing but not WRITE-ing.
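Syntactic format of the REWRITE verb applied to a SEQUENTIAL file (necessarily opened
in I-O mode):
REWRITE <data-record-name> [FROM <data-name>]
The effect is that the file's data record (or, if the FROM clause is present, the specified
data item) is written to the file, replacing the record most recently read from the file. An
example program that uses the REWRITE verb appears within the course web pages.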
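As a small illustration of the READ-then-REWRITE pattern (a sketch, not the course's example program), the fragment below updates every record in place. It assumes that Employee-File is declared with SEQUENTIAL organization and that Employee-Eof is a condition name defined elsewhere; the 5% raise is a made-up task.

```cobol
      *> Sketch: give every employee a 5% raise (hypothetical task).
           OPEN I-O Employee-File
           PERFORM UNTIL Employee-Eof
               READ Employee-File
                   AT END SET Employee-Eof TO TRUE
                   NOT AT END
                       COMPUTE Hourly-Pay ROUNDED = Hourly-Pay * 1.05
      *>               Replaces the record that was just read.
                       REWRITE Employee-Rec
               END-READ
           END-PERFORM
           CLOSE Employee-File
```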
Note that, for reasons that I have never seen explained anywhere, the READ verb refers
to the file whereas the WRITE and REWRITE verbs refer to the file's data record.
I/O operations on INDEXED files
As noted above, an INDEXED file may have any of three access modes
---SEQUENTIAL, RANDOM, or DYNAMIC--- and may be opened in any of three
modes ---INPUT, OUTPUT, or I-O. Which I/O operations are applicable to an
INDEXED file depends upon both its access mode and its open mode:
            +---------+---------+---------+-------+
File        |         |     O p e n   M o d e     |
Access      |         |                           |
Mode        |  Verb   |  INPUT  | OUTPUT  |  I-O  |
            +---------+---------+---------+-------+
SEQUENTIAL  | READ    |    x    |         |   x   | (sequential form only)
            | WRITE   |         |    x    |       | (sequential form only)
            | REWRITE |         |         |   x   | (sequential form only)
            | DELETE  |         |         |   x   |
            | START   |    x    |         |   x   | (surprising!)
            +---------+---------+---------+-------+
RANDOM      | READ    |    x    |         |   x   | (random form only)
            | WRITE   |         |    x    |   x   | (random form only)
            | REWRITE |         |         |   x   | (random form only)
            | DELETE  |         |         |   x   |
            | START   |         |         |       |
            +---------+---------+---------+-------+
DYNAMIC     | READ    |    x    |         |   x   | (either form)
            | WRITE   |         |    x    |   x   | (either form)
            | REWRITE |         |         |   x   | (either form)
            | DELETE  |         |         |   x   |
            | START   |    x    |         |   x   |
            +---------+---------+---------+-------+
As suggested in the remarks to the right of the table above, each of the READ, WRITE,
and REWRITE verbs has two forms, one for sequential access and one for random
access.
Random Access form of READ, WRITE, and REWRITE for INDEXED files
This section pertains to an INDEXED file for which, in the program under consideration,
the ACCESS MODE has been specified to be either RANDOM or DYNAMIC.
To read a record ---with a specified value in its key field--- from an INDEXED file
opened in either INPUT or I-O mode:
1. Place desired value into the key field (in the file's data record)
2. READ <file-name> [INTO data-name]
   [INVALID KEY <imperative statement>]
   [NOT INVALID KEY <imperative statement>]
   END-READ
For example,
DISPLAY 'Enter course ID:' WITH NO ADVANCING
ACCEPT Course-ID
READ Course-File
INVALID KEY DISPLAY 'No such record'
NOT INVALID KEY PERFORM Display-Course-Rec
END-READ
The effect is that, if a record with the specified value in the key field exists in the file,
that record is read into the file's data record (and is then copied into the data item
specified in the INTO clause, if present), and, if present, the imperative statement in the
NOT INVALID KEY clause is executed. Otherwise, if the INVALID KEY clause is
present, the imperative statement there is executed.
To write a record into an INDEXED file opened in I-O or OUTPUT mode:
1. Place desired contents into file's data record (or the data item specified in the
FROM clause).
2. WRITE <data-record> [FROM data-name]
   [INVALID KEY <imperative statement>]
   [NOT INVALID KEY <imperative statement>]
   END-WRITE
Example:
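A minimal sketch of such a WRITE, assuming the Employee-File declarations shown earlier and RANDOM or DYNAMIC access. The field values are made up, and the remaining fields of Employee-Rec would be filled in the same way before the WRITE.

```cobol
      *> Values are hypothetical; Employee-Rec is the FD record above.
           MOVE "Simpson032" TO Employee-ID
           MOVE "Simpson"    TO Last-Name
           MOVE "Homer"      TO First-Name
           WRITE Employee-Rec
               INVALID KEY
                   DISPLAY "Record " Employee-ID " already exists"
               NOT INVALID KEY
                   DISPLAY "Record " Employee-ID " added"
           END-WRITE
```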
The effect is that, if the file contains no record whose key field matches that currently in
the file's data record (or, in the case that the FROM clause is present, that currently in the
specified data item), the data record is written, as a new record, into the file, and, if the
NOT INVALID KEY clause is present, the imperative statement there is executed.
Otherwise (i.e., there exists a record in the file whose key field equals that of the data
record), if the INVALID KEY clause is present, the imperative statement there is
executed.
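To replace an existing record in an INDEXED file opened in I-O mode, place the new
contents (including the key of the record to be replaced) into the file's data record and
apply the REWRITE verb, which accepts the same optional FROM, INVALID KEY, and
NOT INVALID KEY clauses as WRITE.
Example:
REWRITE Employee-Rec
    INVALID KEY
        DISPLAY 'Cannot REWRITE; no record with that key exists'
    NOT INVALID KEY
        DISPLAY 'Record rewritten successfully'
END-REWRITE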
The effect is that, if there exists a record in the file having the same value in its key field
as the file's data record (or, in the case that the FROM clause is present, the data item
specified there), that record is replaced by the contents of the data record (or the FROM
data item) and the NOT INVALID KEY clause's imperative statement is executed.
Otherwise, the INVALID KEY clause's imperative statement is executed.
The difference between REWRITE and WRITE is that the former can only replace an
existing record whereas the latter can only insert a new record.
To delete a record from an INDEXED file opened in I-O mode:
1. READ the record (into the file's data record). (Note: Depending upon the
implementation, it may suffice to place the desired value into the key field of the
file's data record, without necessarily doing so by reading the corresponding
record. However, it is a good idea to READ first anyway, just to verify that the
record to be deleted is really there.)
2. DELETE <file-name>
   [INVALID KEY <imperative statement>]
   [NOT INVALID KEY <imperative statement>]
   END-DELETE
Example:
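A minimal sketch of such a DELETE, assuming Employee-File is open in I-O mode with RANDOM or DYNAMIC access. Here the key is simply placed into the key field rather than obtained by a prior READ.

```cobol
           DISPLAY "Enter ID to delete: " WITH NO ADVANCING
           ACCEPT Employee-ID
      *> DELETE names the file; the key field selects the record.
           DELETE Employee-File
               INVALID KEY
                   DISPLAY "No record with key " Employee-ID
               NOT INVALID KEY
                   DISPLAY "Record " Employee-ID " deleted"
           END-DELETE
```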
The effect is that, if the file contains a record whose key field matches that of the file's
data record, that record is deleted from the file and the imperative statement in the NOT
INVALID KEY clause, if present, is executed. Otherwise, the imperative statement in the
INVALID KEY clause, if present, is executed.
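Note that, when the access mode is SEQUENTIAL, either the REWRITE or DELETE verb
may be applied to a record only if the most recent I/O operation on the file was a
successful READ of that record. When the access mode is RANDOM or DYNAMIC, the
COBOL standard imposes no such requirement: the record affected is the one whose key
matches the value in the key field of the file's data record.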
Sequential Access form of READ, WRITE, and REWRITE for INDEXED files
This section pertains to an INDEXED file for which, in the program under consideration,
the ACCESS MODE has been specified to be either SEQUENTIAL or DYNAMIC.
To position the file pointer (i.e., to seek) to the first record satisfying a specified
condition in an INDEXED file opened in I-O or INPUT mode:
Example:
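A minimal sketch of START, assuming Employee-File is open in INPUT or I-O mode with SEQUENTIAL or DYNAMIC access. The paragraph Read-Next-Employee is hypothetical, as is the choice of "M" as the starting value.

```cobol
      *> Position at the first record whose key is not less than
      *> "M" padded with spaces; no record is retrieved by START.
           MOVE "M" TO Employee-ID
           START Employee-File KEY IS NOT LESS THAN Employee-ID
               INVALID KEY
                   DISPLAY "No employee ID at or after " Employee-ID
               NOT INVALID KEY
                   PERFORM Read-Next-Employee
           END-START
```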
The effect is to place the file pointer to the first record (i.e., the one having smallest key
value) satisfying the condition specified, so that a sequential READ will cause that to be
the record read in. If no record satisfies the specified condition (e.g., the key value sought
is larger than any in the file), the imperative statement in the INVALID KEY clause, if
present, is executed. Otherwise, the imperative statement in the NOT INVALID KEY
clause, if present, is executed.
To read "the next" record (i.e., the one following the record most recently read, or the one
"found" by an application of the START verb) in an INDEXED file opened in either
INPUT or I-O mode:
Example:
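A minimal sketch of reading sequentially from the current position to the end of the file, assuming the condition name Employee-Eof is defined elsewhere. With DYNAMIC access the word NEXT is what selects the sequential form of READ; with SEQUENTIAL access it may be omitted.

```cobol
      *> Reads from the current position to the end of the file.
           PERFORM UNTIL Employee-Eof
               READ Employee-File NEXT RECORD
                   AT END SET Employee-Eof TO TRUE
                   NOT AT END DISPLAY Employee-ID " " Last-Name
               END-READ
           END-PERFORM
```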
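To replace the record most recently read from an INDEXED file opened in I-O mode,
apply REWRITE to the file's data record, leaving the value of its key field unchanged.
To write a new record into an INDEXED file, apply WRITE to the file's data record. Under
SEQUENTIAL access, WRITE is not permitted in I-O mode: the file must be open for
output, and records must be written in increasing order of key value. Under DYNAMIC
access, the file may be open in either OUTPUT or I-O mode, and the record is placed
according to its key value.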
Random observations based on program testing
A syntax error occurs (at least with the compiler used for these tests) if you attempt to
open an INDEXED file in EXTEND mode.
When a file whose SELECT statement specifies DYNAMIC access mode is opened in
OUTPUT or I-O mode:
• Sequential WRITE seems to place the record in the "right" place (according
to its key value), rather than at the end of the file.
• Sequential REWRITE replaces the record at the current file pointer position (i.e.,
the position of the last record read in by either sequential or random READ, or the
one found via START).
• If sequential REWRITE immediately follows START, then a sequential READ
will read in the rewritten record! However, if REWRITE follows a READ (sequential
or random), the next record read by a sequential READ is the following one.
• REWRITE (both kinds) need not be preceded by a READ of the record to be
rewritten. There seems to be no difference between the two kinds of REWRITE!
Oddly, trying to REWRITE with a non-present key field causes a run-time error,
rather than the INVALID KEY clause being fired.
• When in SEQUENTIAL access mode, for some reason the program must
include a random WRITE and/or REWRITE, if there is a sequential one.
However, I could not get WRITE (sequential or random) to work at all; each
time, a run-time error occurred. Both random and sequential REWRITE do
work, provided that the key of the new record matches that of the record just read in.
(Having positioned to a record via START is not enough.)
OPEN
To connect the VSAM data set to your COBOL program for processing.
WRITE
To add records to a file or load a file.
START
To establish the current location in the cluster for a READ NEXT statement.
START does not retrieve a record; it only sets the current record pointer.
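Three factors together determine which input and output statements you can use
for a given VSAM data set: the file organization (ESDS, KSDS, or RRDS), the access
mode (sequential, random, or dynamic), and the mode in which the file is opened
(INPUT, OUTPUT, I-O, or EXTEND).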
The following table shows the possible combinations of statements and open modes
for sequential files (ESDS). The X indicates that you can use a statement with the
open mode shown at the top of the column.
Access mode  COBOL statement  OPEN INPUT  OPEN OUTPUT  OPEN I-O  OPEN EXTEND
Sequential   OPEN             X           X            X         X
             WRITE                        X                      X
             START
             READ             X                        X
             REWRITE                                   X
             DELETE
             CLOSE            X           X            X         X
The following table shows the possible combinations of statements and open modes
you can use with indexed (KSDS) files and relative (RRDS) files. The X indicates that
you can use the statement with the open mode shown at the top of the column.
Table 2. I/O statements for VSAM relative and indexed files
Access mode  COBOL statement  OPEN INPUT  OPEN OUTPUT  OPEN I-O  OPEN EXTEND
Sequential   OPEN             X           X            X         X
             WRITE                        X                      X
             START            X                        X
             READ             X                        X
             REWRITE                                   X
             DELETE                                    X
             CLOSE            X           X            X         X
Random       OPEN             X           X            X
             WRITE                        X            X
             START
             READ             X                        X
             REWRITE                                   X
             DELETE                                    X
             CLOSE            X           X            X
Dynamic      OPEN             X           X            X
             WRITE                        X            X
             START            X                        X
             READ             X                        X
             REWRITE                                   X
             DELETE                                    X
             CLOSE            X           X            X
The fields that you code in the FILE STATUS clause are updated by VSAM after each
input-output statement to indicate the success or failure of the operation.