Day 08 - Binary IO
❖ Introduction to Binary I/O
❖ Text I/O vs. Binary I/O
❖ Binary I/O Classes
❖ Object I/O
❖ Random-Access File
❖ Final Touches
Text I/O vs. Binary I/O
❖ Computers do not differentiate between binary files and text files. All files are stored in binary format, and thus all files are essentially binary files. Text I/O is built upon binary I/O to provide a level of abstraction for character encoding and decoding, as shown in the figure below. Encoding and decoding are performed automatically for text I/O. The JVM converts Unicode to a file-specific encoding when writing a character, and it converts a file-specific encoding to Unicode when reading a character.
Proposition
❑ Suppose you write the string "199" to a file using text I/O; each character is written to the file. Since the Unicode for character 1 is 0x0031, the Unicode 0x0031 is converted to a code that depends on the encoding scheme for the file. (Note that the prefix 0x denotes a hex number.) In the United States, the default encoding for text files on Windows is ASCII-compatible. The ASCII code for character 1 is 49 (0x31 in hex) and for character 9 is 57 (0x39 in hex). Thus, to write the characters 199, three bytes (0x31, 0x39, and 0x39) are sent to the output.
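A minimal sketch of the encoding step described above. To keep the result independent of the platform default, it assumes the US-ASCII charset and writes into an in-memory ByteArrayOutputStream instead of a real file; the class and method names are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class TextEncodingDemo {
  // Encode a string with text I/O. The charset is fixed to US-ASCII here
  // (an assumption for the demo) rather than relying on the platform default.
  static byte[] encodeAsText(String s) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (PrintWriter out = new PrintWriter(
        new OutputStreamWriter(bytes, StandardCharsets.US_ASCII))) {
      out.print(s); // each char is converted to one ASCII byte
    } // closing the PrintWriter flushes the encoded bytes
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    for (byte b : encodeAsText("199")) {
      System.out.printf("0x%02X ", b); // prints 0x31 0x39 0x39
    }
    System.out.println();
  }
}
```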
Text I/O vs. Binary I/O
❖ Binary I/O does not require conversions. If you write a numeric value to a file using binary I/O, the exact value in memory is copied into the file. For example, a byte-type value 199 is represented as 0xC7 (199 = 12 * 16^1 + 7) in memory and appears exactly as 0xC7 in the file, as shown in the figure below. When you read a byte using binary I/O, one byte value is read from the input.
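The binary side can be sketched the same way: FileOutputStream.write(int) stores the value's low 8 bits with no encoding step, so the file holds exactly one byte, 0xC7. The temporary file and the class name are assumptions made for this illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BinaryByteDemo {
  // Write the single byte value 199 with binary I/O and read it back.
  static int roundTrip(File file) throws IOException {
    try (FileOutputStream out = new FileOutputStream(file)) {
      out.write(199); // stores the low 8 bits: 0xC7, no encoding step
    }
    try (FileInputStream in = new FileInputStream(file)) {
      return in.read(); // read() returns the byte as an int in 0..255
    }
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("byte-demo", ".dat"); // temp file name is arbitrary
    file.deleteOnExit();
    int value = roundTrip(file);
    System.out.printf("value=%d hex=0x%X length=%d%n", value, value, file.length());
    // prints value=199 hex=0xC7 length=1
  }
}
```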
Text I/O vs. Binary I/O
❖ In general, you should use text input to read a file created by a text editor or a text output
program, and use binary input to read a file created by a Java binary output program.
❖ On the other hand, binary I/O is more efficient than text I/O, because binary I/O does not
require encoding and decoding. Binary files are independent of the encoding scheme on
the host machine and thus are portable. Java programs on any machine can read a binary
file created by a Java program. This is why Java class files are binary files. Java class
files can run on a JVM on any machine.
Binary I/O Classes
❖ The binary I/O classes have two “root” classes: InputStream and OutputStream. The
abstract InputStream is the root class for reading binary data, and the abstract
OutputStream is the root class for writing binary data. The design of the Java I/O
classes is a good example of applying inheritance, where common operations are
generalized in superclasses, and subclasses provide specialized operations.
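import java.io.*;

// Class name chosen for illustration
public class TestFileStream {
  public static void main(String[] args) throws IOException {
    try (
      // Create an input stream for the file
      FileInputStream input = new FileInputStream("temp.dat")) {
      // Read values from the file; read() returns -1 at the end of the stream
      int value;
      while ((value = input.read()) != -1)
        System.out.print(value + " ");
    }
  }
}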
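The fragment above only reads temp.dat and assumes the file already exists. A minimal sketch that first writes it with FileOutputStream and then reads it back might look like this; the values 1 to 10 and the class name are illustrative assumptions.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class WriteThenReadBytes {
  // Write the values 1..10 as single bytes, then read them back until read() returns -1.
  static List<Integer> writeAndRead(File file) throws IOException {
    try (FileOutputStream output = new FileOutputStream(file)) {
      for (int i = 1; i <= 10; i++)
        output.write(i); // write(int) stores only the low 8 bits
    }
    List<Integer> values = new ArrayList<>();
    try (FileInputStream input = new FileInputStream(file)) {
      int value;
      while ((value = input.read()) != -1) // -1 signals end of stream
        values.add(value);
    }
    return values;
  }

  public static void main(String[] args) throws IOException {
    // "temp.dat" is created in the current working directory
    System.out.println(writeAndRead(new File("temp.dat"))); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
  }
}
```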
Nobita
I’ve been using FileInputStream and FileOutputStream to read and write files. They work fine, but I noticed that I have to handle everything as raw bytes. Isn’t there a better way to handle data like numbers or text directly without converting them?
Good question! While they are great for raw data, they’re not ideal when you need to work with higher-level data types like integers, floating-point numbers, or even strings. In this case, we can use DataInputStream and DataOutputStream.
Nobita
Oh, interesting! So, how do they make things easier?
Well, instead of manually converting data into bytes, DataInputStream and DataOutputStream allow you to read and write primitive data types directly.
Binary I/O Classes - DataInputStream/DataOutputStream
❖ Filter streams are streams that filter bytes for some purpose. The basic byte input stream provides a read
method that can be used only for reading bytes. If you want to read integers, doubles, or strings, you need
a filter class to wrap the byte input stream. Using a filter class enables you to read integers, doubles, and
strings instead of bytes and characters.
❖ FilterInputStream and FilterOutputStream are the base classes for filtering data. When you need to
process primitive numeric types, use DataInputStream and DataOutputStream to filter bytes.
DataInputStream reads bytes from the stream and converts them into appropriate primitive-type values or
strings. DataOutputStream converts primitive-type values or strings into bytes and outputs the bytes to
the stream.
Binary I/O Classes - DataInputStream/DataOutputStream
❖ DataInputStream implements the methods defined in the DataInput interface to read
primitive data-type values and strings. DataOutputStream implements the methods
defined in the DataOutput interface to write primitive data-type values and strings.
Primitive values are copied from memory to the output without any conversions.
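A minimal sketch of wrapping byte streams with DataOutputStream and DataInputStream. An in-memory buffer stands in for a file here (an assumption to keep the example self-contained), and the values written are arbitrary. Note that the values must be read back in the same order they were written.

```java
import java.io.*;

public class DataStreamDemo {
  // Write an int, a double and a string; bytes go to an in-memory buffer
  // (a stand-in for a FileOutputStream in this sketch).
  static byte[] write(int i, double d, String s) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeInt(i);    // 4 bytes
      out.writeDouble(d); // 8 bytes
      out.writeUTF(s);    // 2-byte length + modified UTF-8 bytes
    }
    return buffer.toByteArray();
  }

  // Read the values back in exactly the order they were written.
  static String read(byte[] data) throws IOException {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int i = in.readInt();
      double d = in.readDouble();
      String s = in.readUTF();
      return i + " " + d + " " + s;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = write(42, 85.5, "Hi");
    System.out.println(data.length + " bytes: " + read(data)); // 16 bytes: 42 85.5 Hi
  }
}
```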
Binary I/O Classes - DataInputStream/DataOutputStream
❖ Characters in a string may be written in several ways since a Unicode character consists
of two bytes.
❑ The writeChar(char c) method writes the Unicode of character c to the output.
❑ The writeChars(String s) method writes the Unicode for each character in the string s
to the output.
❑ The writeBytes(String s) method writes the lower byte of the Unicode for each
character in the string s to the output. The high byte of the Unicode is discarded. The
writeBytes method is suitable for strings that consist of ASCII characters, since an
ASCII code is stored only in the lower byte of a Unicode. If a string consists of non-
ASCII characters, you have to use the writeChars method to write the string.
❑ The writeUTF(String s) method writes two bytes of length information to the output
stream, followed by the modified UTF-8 representation of every character in the
string s.
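To compare the string-writing methods listed above, the sketch below counts how many bytes each one produces for the same ASCII string. The helper name and the sample string "Java" are illustrative assumptions.

```java
import java.io.*;

public class StringWriteSizes {
  // Return how many bytes one DataOutputStream write method produces for s.
  static int bytesFor(String method, String s) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      switch (method) {
        case "writeChars": out.writeChars(s); break; // 2 bytes per char
        case "writeBytes": out.writeBytes(s); break; // low byte of each char only
        case "writeUTF":   out.writeUTF(s);   break; // 2-byte length + modified UTF-8
        default: throw new IllegalArgumentException(method);
      }
    }
    return buffer.size();
  }

  public static void main(String[] args) throws IOException {
    for (String m : new String[] {"writeChars", "writeBytes", "writeUTF"})
      System.out.println(m + "(\"Java\") -> " + bytesFor(m, "Java") + " bytes");
    // writeChars -> 8, writeBytes -> 4, writeUTF -> 6
  }
}
```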
Binary I/O Classes - DataInputStream/DataOutputStream
❖ For example,
import java.io.*;
Proposition
❑ BufferedInputStream/BufferedOutputStream does not contain new methods. All the methods in
BufferedInputStream/BufferedOutputStream are inherited from the InputStream/OutputStream classes.
BufferedInputStream/BufferedOutputStream manages a buffer behind the scenes and automatically
reads/writes data from/to disk on demand.
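Because the buffered classes add no new methods, they are typically placed between a file stream and a data stream. A minimal sketch, assuming a temporary file and an arbitrary count of ints; the class and method names are hypothetical.

```java
import java.io.*;

public class BufferedDataDemo {
  // Write count ints through a buffer, then read them back and return their sum.
  static long writeAndSum(File file, int count) throws IOException {
    // BufferedOutputStream adds no new methods; it only batches the underlying writes.
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(file)))) {
      for (int i = 0; i < count; i++)
        out.writeInt(i);
    } // close() flushes any bytes still held in the buffer
    long sum = 0;
    try (DataInputStream in = new DataInputStream(
        new BufferedInputStream(new FileInputStream(file)))) {
      for (int i = 0; i < count; i++)
        sum += in.readInt();
    }
    return sum;
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("buffered-demo", ".dat");
    file.deleteOnExit();
    System.out.println(writeAndSum(file, 100_000) + " from " + file.length() + " bytes");
  }
}
```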
That’s almost everything about binary I/O.
Do you have any questions?
Uhm… Good! Now, let's take a step back and see the bigger picture.
We've explored many essential classes for handling different types
of data. But there's one important element we haven't touched on
yet: Object I/O. While we've mastered reading and writing
primitive data types and bytes, what about entire objects?
Object I/O
❖ DataInputStream/DataOutputStream enables you to perform I/O for primitive-type values and strings, whereas ObjectInputStream/ObjectOutputStream enables you to perform I/O for objects in addition to primitive-type values and strings. Since ObjectInputStream/ObjectOutputStream contains all the functions of DataInputStream/DataOutputStream, you can replace DataInputStream/DataOutputStream completely with ObjectInputStream/ObjectOutputStream.
You know, all of the streams you have used so far are known as read-only or write-only streams. These streams are called sequential streams. A file that is opened using a sequential stream is called a sequential-access file. The contents of a sequential-access file cannot be updated in place. However, it is often necessary to modify files. Java provides the RandomAccessFile class to allow data to be read from and written to any location in a file. A file that is opened using the RandomAccessFile class is known as a random-access file.
Random-Access Files
❖ Java provides the RandomAccessFile class to allow data to be read from and written to any location in the file. The RandomAccessFile class implements the DataInput and DataOutput interfaces; therefore, it can use all the methods they define, as shown below:
When creating a RandomAccessFile, you specify a mode. The two most common modes are r and rw: mode r means that the stream is read-only, and mode rw indicates that the stream allows both reading and writing. (Java also accepts rws and rwd, which additionally require updates to be written synchronously to the storage device.)
Random-Access Files
❖ A random-access file consists of a sequence of bytes. A special marker called a file
pointer is positioned at one of these bytes. A read or write operation takes place at the
location of the file pointer. When a file is opened, the file pointer is set at the beginning
of the file. When you read or write data to the file, the file pointer moves forward to the
next data item. For example, if you read an int value using readInt(), the JVM reads 4
bytes from the file pointer, and now the file pointer is 4 bytes ahead of the previous
location, as shown below:
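A minimal sketch of the file-pointer movement just described, using getFilePointer() and seek(). The temporary file, the values written, and the class name are assumptions for illustration.

```java
import java.io.*;

public class FilePointerDemo {
  // Return the file pointer position after each step of writing and reading.
  static long[] trace(File file) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
      raf.setLength(0);                        // start from an empty file
      raf.writeInt(7);                         // 4 bytes
      raf.writeDouble(2.5);                    // 8 bytes
      long afterWrite = raf.getFilePointer();  // 12
      raf.seek(0);                             // move back to the beginning
      long start = raf.getFilePointer();       // 0
      raf.readInt();                           // reads 4 bytes
      long afterInt = raf.getFilePointer();    // 4
      raf.readDouble();                        // reads 8 more bytes
      long afterDouble = raf.getFilePointer(); // 12
      return new long[] {afterWrite, start, afterInt, afterDouble};
    }
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("pointer-demo", ".dat");
    file.deleteOnExit();
    System.out.println(java.util.Arrays.toString(trace(file))); // [12, 0, 4, 12]
  }
}
```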
Final Touches
❖UTF-8
❑ UTF-8 is a coding scheme that allows systems to operate with both ASCII and Unicode. Most operating systems
use ASCII. Java uses Unicode. The ASCII character set is a subset of the Unicode character set. Since most
applications need only the ASCII character set, it is a waste to represent a 7-bit ASCII character (stored in one byte) as a 16-bit
Unicode character. The modified UTF-8 scheme stores a character using one, two, or three bytes. Characters are
coded in one byte if their code is less than or equal to 0x7F, in two bytes if their code is greater than 0x7F and
less than or equal to 0x7FF, or in three bytes if their code is greater than 0x7FF.
❑ The initial bits of a UTF-8 character indicate whether a character is stored in one byte, two bytes, or three bytes.
If the first bit is 0, it is a one-byte character. If the first bits are 110, it is the first byte of a two-byte sequence. If
the first bits are 1110, it is the first byte of a three-byte sequence. The length of the string is stored in the two bytes
preceding the UTF-8 characters; strictly, this is the number of encoded bytes, which equals the number of characters only
when every character is ASCII. For example, writeUTF("ABCDEF") actually writes eight bytes (i.e., 00 06 41 42 43 44 45 46)
to the file, because the first two bytes store the length of the string (6).
❑ The writeUTF(String s) method converts a string into a series of bytes in the UTF-8 format and writes them into
an output stream. The readUTF() method reads a string that has been written using the writeUTF method.
❑ The UTF-8 format has the advantage of saving a byte for each ASCII character, because a Unicode character
takes up two bytes and an ASCII character in UTF-8 only one byte. If most of the characters in a long string are
regular ASCII characters, using UTF-8 is more efficient.
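A minimal sketch that prints the exact bytes writeUTF produces, confirming the ABCDEF example and the one-, two-, and three-byte ranges described above. The non-ASCII sample characters (U+00E9 and U+4E2D) and the helper name are illustrative choices.

```java
import java.io.*;

public class ModifiedUtf8Demo {
  // Return the bytes writeUTF produces for s, formatted as hex.
  static String utfBytes(String s) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      out.writeUTF(s);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : buffer.toByteArray())
      hex.append(String.format("%02X ", b)); // negative bytes print as unsigned hex
    return hex.toString().trim();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(utfBytes("ABCDEF")); // 00 06 41 42 43 44 45 46
    System.out.println(utfBytes("\u00E9")); // code 0x80..0x7FF, two bytes: 00 02 C3 A9
    System.out.println(utfBytes("\u4E2D")); // code above 0x7FF, three bytes: 00 03 E4 B8 AD
  }
}
```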
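Final Touches
❖ Detecting the End of a File
❑ If you keep reading data past the end of a DataInputStream or ObjectInputStream with methods such as readInt()
or readDouble(), an EOFException will occur. This exception can be used to detect the end of a file. (The plain
InputStream.read() method does not throw it; it returns -1 instead.)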
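A minimal sketch of the EOFException idiom: keep calling readInt() and treat the exception as the normal end of the loop. The in-memory data and the class name are assumptions for illustration.

```java
import java.io.*;

public class EofDemo {
  // Read ints until DataInputStream signals the end of the data with EOFException.
  static int countInts(byte[] data) throws IOException {
    int count = 0;
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      while (true) {
        in.readInt();
        count++;
      }
    } catch (EOFException ex) {
      // End of input reached: this is the expected way the loop ends
    }
    return count;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      for (int i = 0; i < 5; i++) out.writeInt(i);
    }
    System.out.println("ints read: " + countInts(buffer.toByteArray())); // ints read: 5
  }
}
```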
Final Touches
❖Serialization
❑ Many classes in the Java API implement Serializable. All the wrapper classes for
primitive type values, java.math.BigInteger, java.math.BigDecimal,
java.lang.String, java.lang.StringBuilder, java.lang.StringBuffer, java.util.Date, and
java.util.ArrayList implement java.io.Serializable. Attempting to store an object that
does not implement the Serializable interface causes a NotSerializableException.
❑ When a serializable object is stored, the class of the object is encoded; this includes
the class name and the signature of the class, the values of the object’s instance
variables, and the closure of any other objects referenced by the object. The values
of the object’s static variables are not stored.
❑ If an object is an instance of Serializable but contains nonserializable instance data
fields, can it be serialized? The answer is no. To enable the object to be serialized,
mark these data fields with the transient keyword to tell the JVM to ignore them
when writing the object to an object stream.
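A minimal sketch of the transient rule, using a hypothetical Job class whose Thread field is not serializable. Without the transient keyword, writeObject would throw NotSerializableException; with it, the field is skipped and comes back as null, because field initializers do not run during deserialization.

```java
import java.io.*;

public class TransientDemo {
  // A hypothetical class: the Thread field is not serializable, so it is marked transient.
  static class Job implements Serializable {
    private static final long serialVersionUID = 1L;
    String name;
    transient Thread worker = new Thread(() -> {}); // skipped when writing

    Job(String name) { this.name = name; }
  }

  // Serialize and deserialize through an in-memory buffer.
  static Job copy(Job job) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(job);
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(buffer.toByteArray()))) {
      return (Job) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Job restored = copy(new Job("backup"));
    System.out.println(restored.name + " " + restored.worker); // backup null
  }
}
```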
Final Touches
❖Serialization
❑ If an object is written to an object stream more than once, will it be stored in
multiple copies? No, it will not. When an object is written for the first time, a serial
number is created for it. The JVM writes the complete contents of the object along
with the serial number into the object stream. After the first time, only the serial
number is stored if the same object is written again. When the objects are read
back, their references are the same since only one object is actually created in the
memory.
❑ For consistency, this course uses the extension .txt to name text files and .dat
to name binary files.
❑ When a stream is no longer needed, always close it using the close() method or
automatically close it using a try-with-resources statement. Not closing streams may
cause data corruption in the output file, or other programming errors.
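The serial-number rule for objects written more than once can be checked with a short sketch: the same ArrayList is written twice, and the two objects read back are compared with ==. The in-memory buffer and the class name are assumptions for illustration.

```java
import java.io.*;
import java.util.ArrayList;

public class SharedReferenceDemo {
  // Write the same list object twice, then check whether the copies read back are one object.
  static boolean sameAfterRead() throws IOException, ClassNotFoundException {
    ArrayList<String> list = new ArrayList<>();
    list.add("A");
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(list); // full contents + serial number
      out.writeObject(list); // only a reference to the same serial number
    }
    try (ObjectInputStream in = new ObjectInputStream(
        new ByteArrayInputStream(buffer.toByteArray()))) {
      Object first = in.readObject();
      Object second = in.readObject();
      return first == second; // reference comparison, not equals()
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(sameAfterRead()); // true
  }
}
```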
Final Touches
❖Binary I/O Classes
❑ An instance of FileInputStream can be used as an argument to construct a Scanner,
and an instance of FileOutputStream can be used as an argument to construct a
PrintWriter. You can create a PrintWriter to append text into a file using
new PrintWriter(new FileOutputStream("temp.txt", true));
If temp.txt does not exist, it is created. If temp.txt already exists, new data are
appended to the file.
❑ You have to read data in the same order and format in which they are stored. For
example, since names are written in UTF-8 using writeUTF, you must read names
using readUTF.
❑ You should always use buffered I/O to speed up input and output. For small files,
you may not notice performance improvements. However, for large files (over 100
MB), you will see substantial improvements using buffered I/O.
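A minimal sketch of the first point above: a PrintWriter built on FileOutputStream(file, true) appends text, and a Scanner built on FileInputStream reads it back. The helper name is hypothetical, and main uses temp.txt as in the text, so each run adds one more line.

```java
import java.io.*;
import java.util.Scanner;

public class AppendDemo {
  // Append one line to a text file, then count its lines with a Scanner.
  static int appendAndCount(File file, String line) throws IOException {
    // true = append mode: existing contents are kept, and the file is created if missing
    try (PrintWriter out = new PrintWriter(new FileOutputStream(file, true))) {
      out.println(line);
    }
    int lines = 0;
    try (Scanner in = new Scanner(new FileInputStream(file))) {
      while (in.hasNextLine()) {
        in.nextLine();
        lines++;
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    File file = new File("temp.txt");
    System.out.println(appendAndCount(file, "hello")); // grows by one on every run
  }
}
```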
Final Touches
❖Object I/O
❑ Multiple objects or primitives can be written to the stream. The objects must be read
back from the corresponding ObjectInputStream with the same types and in the
same order as they were written. Java’s safe casting should be used to get the desired
type.
❑ The readObject() method may throw java.lang.ClassNotFoundException, because
when the JVM restores an object, it first loads the class for the object if the class has
not been loaded. Since ClassNotFoundException is a checked exception, the main
method must declare that it throws it. An ObjectInputStream is created to read input from
object.dat. You have to read the data from the file in the same order and format as
they were written to the file. A string, a double value, and an object are read. Since
readObject() returns an Object, it is cast and assigned to a variable.
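A minimal sketch of the program described above: a string, a double, and a java.util.Date are written to object.dat and read back in the same order, with an instanceof check before the cast. The specific values and the class name are illustrative assumptions, not the original slide program.

```java
import java.io.*;
import java.util.Date;

public class ObjectStreamDemo {
  // Write a string, a double and a Date object, then read them back in the same order.
  static String roundTrip(File file, Date date) throws IOException, ClassNotFoundException {
    try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
      out.writeUTF("John");
      out.writeDouble(85.5);
      out.writeObject(date);
    }
    try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
      String name = in.readUTF();
      double score = in.readDouble();
      Object obj = in.readObject();                              // returns Object
      Date restored = (obj instanceof Date) ? (Date) obj : null; // check before casting
      return name + " " + score + " " + date.equals(restored);
    }
  }

  // ClassNotFoundException is checked, so main declares it
  public static void main(String[] args) throws IOException, ClassNotFoundException {
    System.out.println(roundTrip(new File("object.dat"), new Date())); // John 85.5 true
  }
}
```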
Final Touches
❖Random-Access Files
public static void main(String[] args) throws IOException {
  try ( // Create a random access file
    RandomAccessFile inout = new RandomAccessFile("inout.dat", "rw")) {
    // Clear the file to destroy the old contents, if any
    inout.setLength(0);
  }
}
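After clearing the file, a random-access program can write data, overwrite a value in place, and append at the end. A minimal sketch of that pattern, assuming 200 ints and index 10 as the value to update; these numbers and the class name are illustrative.

```java
import java.io.*;

public class RandomAccessUpdateDemo {
  // Write 0..199 as ints, overwrite the 11th one, append a value, and report results.
  static String updateInPlace(File file) throws IOException {
    try (RandomAccessFile inout = new RandomAccessFile(file, "rw")) {
      inout.setLength(0);          // discard any old contents
      for (int i = 0; i < 200; i++)
        inout.writeInt(i);         // 200 ints * 4 bytes = 800 bytes
      inout.seek(10 * 4);          // move to the 11th int (index 10)
      inout.writeInt(555);         // overwrite it in place
      inout.seek(inout.length());  // move to the end of the file
      inout.writeInt(999);         // append a new int
      inout.seek(10 * 4);
      int eleventh = inout.readInt();
      return inout.length() + " " + eleventh;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(updateInPlace(new File("inout.dat"))); // 804 555
  }
}
```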
Thanks!
Any questions?
For an in-depth understanding of Java, I highly recommend
referring to the textbooks. This slide provides a brief overview
and may not cover all the details you're eager to explore!