Day 08 - Binary IO

Binary I/O in Java

Uploaded by

Thảo Phương
Binary I/O

Faculty of Information Technology, Hanoi University


What You Are About To Achieve
❖ By the end of this lecture, we are going to:
❑ Understand the differences between Text I/O and Binary I/O.
❑ Learn how to work with core classes for binary data manipulation.
❑ Explore the concept of Object I/O, including Serialization and Deserialization.
❑ Implement file reading and writing operations using Binary I/O.
❑ Discover how to use Random-Access Files for non-sequential file operations.
❖ Introduction to Binary I/O
❖ Text I/O vs. Binary I/O
❖ Binary I/O Classes
❖ Object I/O
❖ Random-Access File
❖ Final Touches
Introduction to Binary I/O
❖ As mentioned a few weeks ago, files can be classified as either text or binary. A file that
can be processed (read, created, or modified) using a text editor such as Notepad on
Windows or vi on UNIX is called a text file. All the other files are called binary files.
You cannot read binary files using a text editor - they are designed to be read by
programs. For example, Java source programs are text files and can be read by a text
editor, but Java class files are binary files and are read by the JVM.
❖ Although it is not technically precise, you can envision a text file as consisting of a
sequence of characters and a binary file as consisting of a sequence of bits. Characters in a
text file are encoded using a character encoding scheme such as ASCII or Unicode. For example,
the decimal integer 199 is stored as a sequence of three characters 1, 9, 9 in a text file,
and the same integer is stored as a byte-type value C7 in a binary file, because decimal 199
equals hex C7 (199 = 12 × 16 + 7, and the hex digit C stands for 12). The advantage of binary
files is that they are more efficient to process than text files.
❖ Java offers many classes for performing file input and output. These can be categorized
as text I/O classes and binary I/O classes. In a previous lecture, you learned how to read
and write strings and numeric values from/to a text file using Scanner and PrintWriter.
This lecture introduces the classes for performing binary I/O.
❖ Recall that a File object encapsulates the properties of a file or a path but does not
contain the methods for reading/writing data from/to a file. In order to perform I/O, you
need to create objects using appropriate Java I/O classes. The objects contain the
methods for reading/writing data from/to a file. For example, to write text to a file
named temp.txt, you can create an object using the PrintWriter class as follows:
PrintWriter output = new PrintWriter("temp.txt");
❖ You can now invoke the print method on the object to write a string to the file. For
example, the following statement writes Java 101 to the file.
output.print("Java 101");
❖ The next statement closes the file.
output.close();
❖ There are many I/O classes for various purposes. In general, these can be classified as
input classes and output classes. An input class contains the methods to read data, and an
output class contains the methods to write data. PrintWriter is an example of an output
class, and Scanner is an example of an input class.
❖ The figure below illustrates Java I/O programming. An input object reads a stream of data
from a file, and an output object writes a stream of data to a file. An input object is also
called an input stream and an output object an output stream.
So, there are two types of Java I/O. What makes them different?

Text I/O vs. Binary I/O
❖ Computers do not differentiate between binary files and text files. All files are stored in
binary format, and thus all files are essentially binary files. Text I/O is built upon binary
I/O to provide a level of abstraction for character encoding and decoding, as shown in
Figure below. Encoding and decoding are automatically performed for text I/O. The
JVM converts Unicode to a file-specific encoding when writing a character, and it
converts a file-specific encoding to Unicode when reading a character.

Proposition
❑ Suppose you write the string "199" to a file using text I/O. Each character is written to the file in turn. The Unicode for
character 1 is 0x0031 (the prefix 0x denotes a hex number), and it is converted to a code that depends on the encoding scheme
for the file. If the file uses ASCII or an ASCII-compatible encoding, the code for character 1 is 49 (0x31 in hex) and for
character 9 is 57 (0x39 in hex). Thus, to write the characters 199, three bytes - 0x31, 0x39, and 0x39 - are sent to the output.
❖ Binary I/O does not require conversions. If you write a numeric value to a file using
binary I/O, the exact value in the memory is copied into the file. For example, a byte-type
value 199 is represented as 0xC7 (199 = 12 × 16 + 7) in the memory and appears exactly as
0xC7 in the file, as shown in the figure below. When you read a byte using binary I/O, one
byte value is read from the input.
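To make the contrast concrete, here is a minimal sketch that writes the value 199 once as text and once as a single binary byte. It writes into in-memory buffers instead of files so the bytes are easy to inspect. The class name TextVsBinaryDemo and its helper methods are illustrative, and US-ASCII is chosen explicitly as an assumption so the result does not depend on the platform's default encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class TextVsBinaryDemo {
    // Text I/O: each character of the decimal representation is encoded to one ASCII byte.
    public static byte[] asText(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(buffer, StandardCharsets.US_ASCII)) {
            writer.write(Integer.toString(value));
        }
        return buffer.toByteArray();
    }

    // Binary I/O: write(int) stores the low-order 8 bits of the value as one byte.
    public static byte[] asBinary(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        buffer.write(value);
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated two-digit hex values.
    public static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Text:   " + toHex(asText(199)));   // 31 39 39
        System.out.println("Binary: " + toHex(asBinary(199))); // C7
    }
}
```

The text version needs three bytes and an encoding step, while the binary version needs one byte and no conversion.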
❖ In general, you should use text input to read a file created by a text editor or a text output
program, and use binary input to read a file created by a Java binary output program.
❖ On the other hand, binary I/O is more efficient than text I/O, because binary I/O does not
require encoding and decoding. Binary files are independent of the encoding scheme on
the host machine and thus are portable. Java programs on any machine can read a binary
file created by a Java program. This is why Java class files are binary files. Java class
files can run on a JVM on any machine.
Binary I/O Classes
❖ The binary I/O classes have two “root” classes: InputStream and OutputStream. The
abstract InputStream is the root class for reading binary data, and the abstract
OutputStream is the root class for writing binary data. The design of the Java I/O
classes is a good example of applying inheritance, where common operations are
generalized in superclasses, and subclasses provide specialized operations.

Don't worry if you're not familiar with inheritance, superclasses, or subclasses yet - you'll
dive deeper into those concepts in Programming 02 (PR2). For now, just focus on understanding
how to use these types of classes in your code. The key is to practice working with them, and
the theory will follow as you progress.
❖ The classes for performing binary I/O covered in this lecture are
FileInputStream/FileOutputStream, DataInputStream/DataOutputStream,
BufferedInputStream/BufferedOutputStream, and ObjectInputStream/ObjectOutputStream.
❖ The abstract InputStream class defines the methods for the input stream of bytes.
❖ The abstract OutputStream class defines the methods for the output stream of bytes.

Note that almost all the methods in the binary I/O classes are declared to throw
java.io.IOException or a subclass of java.io.IOException.
Need more operations? Let’s see what you can do with the following subclasses...
Binary I/O Classes - FileInputStream/FileOutputStream
❖ FileInputStream/FileOutputStream is for reading/writing bytes from/to files. All the
methods in these classes are inherited from InputStream and OutputStream.
❖ FileInputStream/FileOutputStream does not introduce new methods, so you can use all
methods defined in InputStream and OutputStream.
❖ To construct a FileInputStream, use FileInputStream(String filename) or
FileInputStream(File file). A FileNotFoundException will occur if you attempt to create a
FileInputStream with a nonexistent file.
❖ To construct a FileOutputStream, use FileOutputStream(String filename) or
FileOutputStream(File file), optionally followed by a boolean append parameter. If the file
does not exist, a new file will be created. If the file already exists, the constructors
without the append parameter delete the current content of the file. To retain the current
content and append new data into the file, use the constructors with the append parameter
and pass true to it.
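As a small illustration of the append parameter, the sketch below writes three bytes with the default (overwriting) constructor, reopens the same file with append set to true, and writes two more. The class name AppendDemo, its method names, and the use of a temporary file are assumptions made for this example.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class AppendDemo {
    // Writes three bytes, reopens the file in append mode, writes two more,
    // and returns the resulting file length in bytes.
    public static long writeThenAppend(File file) throws IOException {
        // No append parameter: any existing content is discarded.
        try (FileOutputStream output = new FileOutputStream(file)) {
            output.write(1);
            output.write(2);
            output.write(3);
        }
        // append = true: the existing three bytes are kept.
        try (FileOutputStream output = new FileOutputStream(file, true)) {
            output.write(4);
            output.write(5);
        }
        return file.length();
    }

    // Runs the demo against a temporary file that is removed when the JVM exits.
    public static long runWithTempFile() throws IOException {
        File file = File.createTempFile("append-demo", ".dat");
        file.deleteOnExit();
        return writeThenAppend(file);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Length after appending: " + runWithTempFile()); // 5
    }
}
```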
❖ Note that almost all the methods in the I/O classes throw java.io.IOException. Therefore,
you have to declare to throw java.io.IOException in the method or place the code in a
try-catch block.
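The two options can be sketched side by side. In this example the class name HandleIOExceptionDemo, the method names, the file name, and the sentinel value -2 are all hypothetical choices made for illustration.

```java
import java.io.FileInputStream;
import java.io.IOException;

public class HandleIOExceptionDemo {
    // Option 1: declare the exception and let the caller deal with it.
    public static int readFirstByte(String fileName) throws IOException {
        try (FileInputStream input = new FileInputStream(fileName)) {
            return input.read();
        }
    }

    // Option 2: handle the exception inside the method with try-catch.
    // FileNotFoundException is a subclass of IOException, so it is caught here too.
    public static int readFirstByteOrDefault(String fileName) {
        try (FileInputStream input = new FileInputStream(fileName)) {
            return input.read();
        } catch (IOException ex) {
            System.out.println("Could not read " + fileName + ": " + ex.getMessage());
            return -2; // hypothetical sentinel meaning "the read failed"
        }
    }

    public static void main(String[] args) {
        // Assumes no file with this name exists in the working directory.
        System.out.println(readFirstByteOrDefault("no-such-file-for-demo.dat"));
    }
}
```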
❖ Here is an example:
import java.io.*;

public class TestFileStream {
    public static void main(String[] args) throws IOException {
        // Create an output stream to the file
        try (FileOutputStream output = new FileOutputStream("temp.dat")) {
            // Output the values 1 through 10 to the file, one byte each
            for (int i = 1; i <= 10; i++)
                output.write(i);
        }

        // Create an input stream for the file
        try (FileInputStream input = new FileInputStream("temp.dat")) {
            // Read values from the file until read() returns -1 at the end
            int value;
            while ((value = input.read()) != -1)
                System.out.print(value + " ");
        }
    }
}
Nobita: I’ve been using FileInputStream and FileOutputStream to read and write
files. They work fine, but I noticed that I have to handle everything as
raw bytes. Isn’t there a better way to handle data like numbers or text
directly without converting them?

Lecturer: Good question! While they are great for raw data, they’re not
ideal when you need to work with higher-level data types like
integers, floating-point numbers, or even strings. In this case,
we can use DataInputStream and DataOutputStream.

Nobita: Oh, interesting! So, how do they make things easier?

Lecturer: Well, instead of manually converting data into bytes,
DataInputStream and DataOutputStream allow you to read and write
primitive data types directly.
Binary I/O Classes - DataInputStream/DataOutputStream
❖ Filter streams are streams that filter bytes for some purpose. The basic byte input stream provides a read
method that can be used only for reading bytes. If you want to read integers, doubles, or strings, you need
a filter class to wrap the byte input stream. Using a filter class enables you to read integers, doubles, and
strings instead of bytes and characters.
❖ FilterInputStream and FilterOutputStream are the base classes for filtering data. When you need to
process primitive numeric types, use DataInputStream and DataOutputStream to filter bytes.
DataInputStream reads bytes from the stream and converts them into appropriate primitive-type values or
strings. DataOutputStream converts primitive-type values or strings into bytes and outputs the bytes to
the stream.
❖ DataInputStream implements the methods defined in the DataInput interface to read
primitive data-type values and strings. DataOutputStream implements the methods
defined in the DataOutput interface to write primitive data-type values and strings.
Primitive values are copied from memory to the output without any conversions.
❖ Characters in a string may be written in several ways, since a Java char is a 16-bit
Unicode value that occupies two bytes.
❑ The writeChar(char c) method writes the Unicode of character c to the output.
❑ The writeChars(String s) method writes the Unicode for each character in the string s
to the output.
❑ The writeBytes(String s) method writes the lower byte of the Unicode for each
character in the string s to the output. The high byte of the Unicode is discarded. The
writeBytes method is suitable for strings that consist of ASCII characters, since an
ASCII code is stored only in the lower byte of a Unicode. If a string consists of non-
ASCII characters, you have to use the writeChars method to write the string.
❑ The writeUTF(String s) method writes two bytes of length information to the output
stream, followed by the modified UTF-8 representation of every character in the
string s.
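The size difference between these methods is easy to check. The following sketch writes the same string with writeChars, writeBytes, and writeUTF into separate in-memory buffers and reports how many bytes each produced. The class name StringWriteSizes and the method sizes are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class StringWriteSizes {
    // Returns the number of bytes produced by writeChars, writeBytes, and writeUTF, in that order.
    public static int[] sizes(String s) throws IOException {
        ByteArrayOutputStream chars = new ByteArrayOutputStream();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ByteArrayOutputStream utf = new ByteArrayOutputStream();
        try (DataOutputStream c = new DataOutputStream(chars);
             DataOutputStream b = new DataOutputStream(bytes);
             DataOutputStream u = new DataOutputStream(utf)) {
            c.writeChars(s); // two bytes per character
            b.writeBytes(s); // low-order byte of each character only
            u.writeUTF(s);   // two length bytes, then modified UTF-8
        }
        return new int[] {chars.size(), bytes.size(), utf.size()};
    }

    public static void main(String[] args) throws IOException {
        int[] result = sizes("Java");
        System.out.println("writeChars: " + result[0]); // 8
        System.out.println("writeBytes: " + result[1]); // 4
        System.out.println("writeUTF:   " + result[2]); // 6
    }
}
```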
❖ For example,
import java.io.*;

public class TestDataStream {
    public static void main(String[] args) throws IOException {
        // Create an output stream for file temp.dat
        try (DataOutputStream output =
                 new DataOutputStream(new FileOutputStream("temp.dat"))) {
            // Write student names and test scores to the file
            output.writeUTF("Liam");
            output.writeDouble(85.5);
            output.writeUTF("Susan");
            output.writeDouble(185.5);
            output.writeUTF("Chandra");
            output.writeDouble(105.25);
        }

        // Create an input stream for file temp.dat
        try (DataInputStream input =
                 new DataInputStream(new FileInputStream("temp.dat"))) {
            // Read student names and test scores in the order they were written
            System.out.println(input.readUTF() + " " + input.readDouble());
            System.out.println(input.readUTF() + " " + input.readDouble());
            System.out.println(input.readUTF() + " " + input.readDouble());
        }
    }
}
❖ Note that DataInputStream and DataOutputStream read and write Java primitive-type
values and strings in a machine-independent fashion, thereby enabling you to write a
data file on one machine and read it on another machine that has a different operating
system or file structure. An application uses a data output stream to write data that can
later be read by a program using a data input stream. You can view
DataInputStream/FileInputStream and DataOutputStream/FileOutputStream as working
in a pipeline: the data stream converts values to or from bytes, and the file stream
moves those bytes to or from the file.
Binary I/O Classes - BufferedInputStream/BufferedOutputStream
❖ You can also speed up input and output by reducing the number of disk reads and writes
by using BufferedInputStream/BufferedOutputStream. Using BufferedInputStream,
the whole block of data on the disk is read into the buffer in the memory once. The
individual data are then delivered to your program from the buffer. Using
BufferedOutputStream, the individual data are first written to the buffer in the memory.
When the buffer is full, all data in the buffer are written to the disk once.

Proposition
❑ BufferedInputStream/BufferedOutputStream does not contain new methods. All the methods in
BufferedInputStream/BufferedOutputStream are inherited from the InputStream/OutputStream classes.
BufferedInputStream/BufferedOutputStream manages a buffer behind the scenes and automatically
reads/writes data from/to disk on demand.
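Because the buffered classes add no new methods, using them is only a matter of wrapping. The sketch below places a buffered stream between a data stream and a file stream, writes a run of int values, and reads them back. The class name BufferedDemo, its method names, and the temporary file are assumptions for illustration.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class BufferedDemo {
    // Writes the ints 0..count-1 through a buffered stream, reads them back, and returns their sum.
    public static long writeAndSum(File file, int count) throws IOException {
        try (DataOutputStream output = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (int i = 0; i < count; i++) {
                output.writeInt(i); // goes to the in-memory buffer first
            }
        } // closing the outer stream flushes the buffer to the file

        long sum = 0;
        try (DataInputStream input = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            for (int i = 0; i < count; i++) {
                sum += input.readInt(); // served from the buffer when possible
            }
        }
        return sum;
    }

    // Runs the demo against a temporary file that is removed when the JVM exits.
    public static long runWithTempFile(int count) throws IOException {
        File file = File.createTempFile("buffered-demo", ".dat");
        file.deleteOnExit();
        return writeAndSum(file, count);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Sum: " + runWithTempFile(1000)); // 499500
    }
}
```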
That’s almost everything about Binary I/O.
Do you have any questions?

Uhm… Good! Now, let's take a step back and see the bigger picture.
We've explored many essential classes for handling different types
of data. But there's one important element we haven't touched on
yet: Object I/O. While we've mastered reading and writing
primitive data types and bytes, what about entire objects?
Object I/O
❖ DataInputStream/DataOutputStream enables you to perform I/O for primitive-type values
and strings. ObjectInputStream/ObjectOutputStream enables you to perform I/O for objects
in addition to primitive-type values and strings. Since Object I/O Stream contains all the
functions of Data I/O Stream, you can replace Data I/O Stream completely with Object I/O
Stream.

ObjectInputStream extends InputStream and implements ObjectInput and
ObjectStreamConstants. ObjectInput is a subinterface of DataInput.
ObjectStreamConstants contains the constants to support Object I/O Stream.
❖ Similarly, ObjectOutputStream extends OutputStream and implements ObjectOutput
and ObjectStreamConstants. ObjectOutput is a subinterface of DataOutput.

You can wrap an Object I/O Stream on any InputStream/OutputStream using their own
constructors.
Object I/O - Serializable
❖ One more important thing is that not every object can be written to an output stream. Objects
that can be so written are said to be serializable. A serializable object is an instance of the
java.io.Serializable interface, so the object’s class must implement Serializable.
❖ The Serializable interface is a marker interface. Since it has no methods, you don’t need to add
additional code in your class that implements Serializable. Implementing this interface enables
the Java serialization mechanism to automate the process of storing objects and arrays.
❖ Suppose you wish to store an ArrayList object. To do this you need to store all the elements in
the list. Each element is an object that may contain other objects. As you can see, this would be a
very tedious process. Fortunately, you don’t have to go through it manually. Java provides a
built-in mechanism to automate the process of writing objects. This process is referred to as object
serialization, which is implemented in ObjectOutputStream. In contrast, the process of reading
objects is referred to as object deserialization, which is implemented in ObjectInputStream.
❖ Arrays can be serialized as well. Note that an array is serializable if all its elements
are serializable. An entire array can be saved into a file using writeObject and later can
be restored using readObject.
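A minimal sketch of serializing an array might look like the following. It saves an int array to a file with writeObject and restores it with readObject. The class name ArraySerializationDemo, the method names, and the temporary file are assumptions for this example, not the original slide's listing.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class ArraySerializationDemo {
    // Saves the whole array with one writeObject call and restores it with one readObject call.
    public static int[] saveAndRestore(File file, int[] numbers)
            throws IOException, ClassNotFoundException {
        try (ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(file))) {
            output.writeObject(numbers);
        }
        try (ObjectInputStream input = new ObjectInputStream(new FileInputStream(file))) {
            // readObject returns Object, so cast back to the array type
            return (int[]) input.readObject();
        }
    }

    // Runs the demo against a temporary file that is removed when the JVM exits.
    public static int[] roundTrip(int[] numbers) throws IOException, ClassNotFoundException {
        File file = File.createTempFile("array-demo", ".dat");
        file.deleteOnExit();
        return saveAndRestore(file, numbers);
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        int[] restored = roundTrip(new int[] {1, 2, 3, 4, 5});
        System.out.println(Arrays.toString(restored)); // [1, 2, 3, 4, 5]
    }
}
```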

You know, all of the streams you have used so far are known as read-only or
write-only streams. These streams are called sequential streams. A file that is
opened using a sequential stream is called a sequential-access file. The contents of
a sequential-access file cannot be updated. However, it is often necessary to
modify files. Java provides the RandomAccessFile class to allow data to be read
from and written to any location in a file. A file that is opened using the
RandomAccessFile class is known as a random-access file.
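Random-Access Files
❖ Java provides the RandomAccessFile class to allow data to be read from and written to
any location in the file. The RandomAccessFile class implements the DataInput and
DataOutput interfaces, so it offers the same readXxx and writeXxx methods as
DataInputStream and DataOutputStream, along with methods such as seek, getFilePointer,
length, and setLength for working with positions in the file.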

When creating a RandomAccessFile, you specify a mode. The two modes used in this lecture
are r and rw: mode r means that the stream is read-only, and mode rw indicates that the
stream allows both read and write.
❖ A random-access file consists of a sequence of bytes. A special marker called a file
pointer is positioned at one of these bytes. A read or write operation takes place at the
location of the file pointer. When a file is opened, the file pointer is set at the beginning
of the file. When you read or write data to the file, the file pointer moves forward to the
next data item. For example, if you read an int value using readInt(), the JVM reads 4
bytes from the file pointer, and now the file pointer is 4 bytes ahead of the previous
location, as shown below:
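The movement of the file pointer can be observed with getFilePointer(). This sketch writes two int values, moves back to the start, and records the pointer position before and after each readInt(). The class name FilePointerDemo, the method name, and the temporary file are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class FilePointerDemo {
    // Returns the file pointer at the start, after one readInt(), and after two readInt() calls.
    public static long[] pointerPositions() throws IOException {
        File file = File.createTempFile("pointer-demo", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.writeInt(10);
            raf.writeInt(20);

            raf.seek(0); // move the file pointer back to the beginning
            long atStart = raf.getFilePointer();
            raf.readInt(); // consumes 4 bytes
            long afterOneInt = raf.getFilePointer();
            raf.readInt(); // consumes 4 more bytes
            long afterTwoInts = raf.getFilePointer();
            return new long[] {atStart, afterOneInt, afterTwoInts};
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(pointerPositions())); // [0, 4, 8]
    }
}
```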
Please read the Final Touches carefully to see what you can do with Random-Access Files.
Now that you have learned Binary I/O, do you have any questions? If you don’t, in the next
section, I will show the summary of this lecture...
Summary
1. I/O can be classified into text I/O and binary I/O. Text I/O interprets data in sequences of
characters. Binary I/O interprets data as raw binary values. How text is stored in a file depends on
the encoding scheme for the file. Java automatically performs encoding and decoding for text I/O.
2. The InputStream and OutputStream classes are the roots of all binary I/O classes. File I/O
Stream associates a file for input/output. Buffered I/O Stream can be used to wrap any binary I/O
stream to improve performance. Data I/O Stream can be used to read/write primitive values and
strings.
3. Object I/O Stream can be used to read/write objects in addition to primitive values and strings.
To enable object serialization, the object’s defining class must implement the java.io.Serializable
marker interface.
4. The RandomAccessFile class enables you to read and write data to a file. You can open a file
with the r mode to indicate that it is read-only or with the rw mode to indicate that it is updateable.
Since the RandomAccessFile class implements DataInput and DataOutput interfaces, many
methods in RandomAccessFile are the same as those in DataInputStream and DataOutputStream.
This brings us to the end of this lecture! It’s time for Final Touches…
Final Touches
❖UTF-8
❑ UTF-8 is a coding scheme that allows systems to operate with both ASCII and Unicode. Most operating systems
use ASCII. Java uses Unicode. The ASCII character set is a subset of the Unicode character set. Since most
applications need only the ASCII character set, it is a waste to represent an 8-bit ASCII character as a 16-bit
Unicode character. The modified UTF-8 scheme stores a character using one, two, or three bytes. Characters are
coded in one byte if their code is less than or equal to 0x7F, in two bytes if their code is greater than 0x7F and
less than or equal to 0x7FF, or in three bytes if their code is greater than 0x7FF.
❑ The initial bits of a UTF-8 character indicate whether a character is stored in one byte, two bytes, or three bytes.
If the first bit is 0, it is a one-byte character. If the first bits are 110, it is the first byte of a two-byte sequence. If
the first bits are 1110, it is the first byte of a three-byte sequence. The number of encoded bytes that follow (not the
number of characters) is stored in the first two bytes preceding the UTF-8 characters. For example,
writeUTF("ABCDEF") actually writes eight bytes (i.e., 00 06 41 42 43 44 45 46) to the file, because the first two
bytes store the length 6 and each of the six ASCII characters takes one byte.
❑ The writeUTF(String s) method converts a string into a series of bytes in the UTF-8 format and writes them into
an output stream. The readUTF() method reads a string that has been written using the writeUTF method.
❑ The UTF-8 format has the advantage of saving a byte for each ASCII character, because a Unicode character
takes up two bytes and an ASCII character in UTF-8 only one byte. If most of the characters in a long string are
regular ASCII characters, using UTF-8 is more efficient.
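The byte layout produced by writeUTF can be inspected directly. The sketch below writes a string into an in-memory buffer and prints the bytes in hex. The class name WriteUtfBytes and the method writeUtfHex are illustrative names. The second call uses a single non-ASCII character (U+00E9) to show that the two-byte prefix counts encoded bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteUtfBytes {
    // Returns the bytes produced by writeUTF(s) as space-separated two-digit hex values.
    public static String writeUtfHex(String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream output = new DataOutputStream(buffer)) {
            output.writeUTF(s);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        // Six ASCII characters: length prefix 00 06, then one byte per character
        System.out.println(writeUtfHex("ABCDEF")); // 00 06 41 42 43 44 45 46
        // One character above 0x7F: length prefix 00 02, then a two-byte sequence
        System.out.println(writeUtfHex("\u00E9")); // 00 02 C3 A9
    }
}
```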
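❖ Detecting the End of a File
❑ If you keep reading with the readXxx methods of a DataInputStream (readInt, readDouble, readUTF,
and so on) at the end of the stream, an EOFException will occur. This exception can be used to
detect the end of a file. The plain read() method of an InputStream returns -1 at the end instead.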
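One way to apply this is to read in a loop and treat EOFException as the signal to stop. The following sketch counts how many double values a stream contains without knowing the count in advance. The class name DetectEndOfFile and its method names are assumptions for illustration, and an in-memory byte array stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class DetectEndOfFile {
    // Produces the bytes that writeDouble would store for the given values.
    public static byte[] encode(double... values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream output = new DataOutputStream(buffer)) {
            for (double v : values) {
                output.writeDouble(v);
            }
        }
        return buffer.toByteArray();
    }

    // Reads doubles until readDouble() throws EOFException, then returns how many were read.
    public static int countDoubles(byte[] data) throws IOException {
        int count = 0;
        try (DataInputStream input = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                input.readDouble();
                count++;
            }
        } catch (EOFException ex) {
            // Expected: all data were read
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countDoubles(encode(4.5, 43.25, 3.2))); // 3
    }
}
```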
❖Serialization
❑ Many classes in the Java API implement Serializable. All the wrapper classes for
primitive type values, java.math.BigInteger, java.math.BigDecimal,
java.lang.String, java.lang.StringBuilder, java.lang.StringBuffer, java.util.Date, and
java.util.ArrayList implement java.io.Serializable. Attempting to store an object that
does not support the Serializable interface would cause a NotSerializableException.
❑ When a serializable object is stored, the class of the object is encoded; this includes
the class name and the signature of the class, the values of the object’s instance
variables, and the closure of any other objects referenced by the object. The values
of the object’s static variables are not stored.
❑ If an object is an instance of Serializable but contains nonserializable instance data
fields, can it be serialized? The answer is no. To enable the object to be serialized,
mark these data fields with the transient keyword to tell the JVM to ignore them
when writing the object to an object stream.
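The effect of transient can be sketched as follows. Connection is a hypothetical class that does not implement Serializable, and Student, TransientDemo, and roundTrip are likewise illustrative names. After deserialization the transient field is null, because it was never written to the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    // Hypothetical helper class that does NOT implement Serializable.
    static class Connection {
    }

    static class Student implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        // Marked transient so serialization skips this nonserializable field
        transient Connection connection = new Connection();

        Student(String name) {
            this.name = name;
        }
    }

    // Serializes the student into memory and deserializes a copy.
    public static Student roundTrip(Student original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            output.writeObject(original);
        }
        try (ObjectInputStream input =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Student) input.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Student restored = roundTrip(new Student("Liam"));
        System.out.println(restored.name);               // Liam
        System.out.println(restored.connection == null); // true
    }
}
```

Removing the transient keyword from the connection field would make writeObject fail with a NotSerializableException.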
❑ If an object is written to an object stream more than once, will it be stored in
multiple copies? No, it will not. When an object is written for the first time, a serial
number is created for it. The JVM writes the complete contents of the object along
with the serial number into the object stream. After the first time, only the serial
number is stored if the same object is written again. When the objects are read
back, their references are the same since only one object is actually created in the
memory.
❑ For consistency, this course uses the extension .txt to name text files and .dat
to name binary files.
❑ When a stream is no longer needed, always close it using the close() method or
automatically close it using a try-with-resources statement. Not closing streams may
cause data corruption in the output file, or other programming errors.
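The point above about writing the same object more than once can be checked with a short experiment. This sketch writes one ArrayList to an object stream twice, reads two objects back, and compares the references. The class name SharedReferenceDemo and its method name are illustrative, and an in-memory buffer stands in for a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

public class SharedReferenceDemo {
    // Writes the same list twice and reports whether both reads yield the same object.
    public static boolean sameObjectAfterReading() throws IOException, ClassNotFoundException {
        ArrayList<String> list = new ArrayList<>();
        list.add("Java");

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream output = new ObjectOutputStream(buffer)) {
            output.writeObject(list); // full contents are written
            output.writeObject(list); // only a back-reference is written
        }

        try (ObjectInputStream input =
                 new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            Object first = input.readObject();
            Object second = input.readObject();
            return first == second; // reference comparison, not equals()
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        System.out.println(sameObjectAfterReading()); // true
    }
}
```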
❖Binary I/O Classes
❑ An instance of FileInputStream can be used as an argument to construct a Scanner,
and an instance of FileOutputStream can be used as an argument to construct a
PrintWriter. You can create a PrintWriter to append text into a file using
new PrintWriter(new FileOutputStream("temp.txt", true));
If temp.txt does not exist, it is created. If temp.txt already exists, new data are
appended to the file.
❑ You have to read data in the same order and format in which they are stored. For
example, since names are written in UTF-8 using writeUTF, you must read names
using readUTF.
❑ You should always use buffered I/O to speed up input and output. For small files,
you may not notice performance improvements. However, for large files—over 100
MB— you will see substantial improvements using buffered I/O.
❖Object I/O
❑ Multiple objects or primitives can be written to the stream. The objects must be read
back from the corresponding ObjectInputStream with the same types and in the
same order as they were written. Java’s safe casting should be used to get the desired
type.
❑ The readObject() method may throw java.lang.ClassNotFoundException, because
when the JVM restores an object, it first loads the class for the object if the class has
not been loaded. Since ClassNotFoundException is a checked exception, a method that calls
readObject() must catch it or declare to throw it. You have to read the data from the file in
the same order and format as they were written to the file. For example, if a string, a
double value, and an object were written to object.dat, an ObjectInputStream must read a
string, a double value, and an object, in that order. Since readObject() returns an Object,
the result is cast and assigned to a variable of the desired type.
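The points above can be sketched in one small program that writes a string, a double value, and an object, then reads them back in the same order. The class name TestObjectStream, the method writeThenRead, and the sample values are assumptions for illustration, not the original slide's listing.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Date;

public class TestObjectStream {
    // Writes a string, a double, and a Date, then reads them back in the same order.
    public static Object[] writeThenRead(File file) throws IOException, ClassNotFoundException {
        try (ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream(file))) {
            output.writeUTF("Liam");
            output.writeDouble(85.5);
            output.writeObject(new Date());
        }

        try (ObjectInputStream input = new ObjectInputStream(new FileInputStream(file))) {
            String name = input.readUTF();
            double score = input.readDouble();
            // readObject returns Object, so cast to the type that was written
            Date date = (Date) input.readObject();
            return new Object[] {name, score, date};
        }
    }

    // main declares ClassNotFoundException because readObject() may throw it
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Object[] values = writeThenRead(new File("object.dat"));
        System.out.println(values[0] + " " + values[1] + " " + values[2]);
    }
}
```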
❖Random-Access Files
import java.io.*;

public class TestRandomAccessFile {
    public static void main(String[] args) throws IOException {
        // Create a random-access file opened for both reading and writing
        try (RandomAccessFile inout = new RandomAccessFile("inout.dat", "rw")) {
            // Clear the file to destroy the old contents, if any
            inout.setLength(0);

            // Write the integers 0 through 199 to the file (4 bytes each)
            for (int i = 0; i < 200; i++)
                inout.writeInt(i);

            // Display the current length of the file
            System.out.println("Current file length is " + inout.length());

            // Retrieve the first number
            inout.seek(0); // Move the file pointer to the beginning
            System.out.println("The first number is " + inout.readInt());

            // Retrieve the second number
            inout.seek(1 * 4); // Move the file pointer to the second number
            System.out.println("The second number is " + inout.readInt());
        }
    }
}
❖Random-Access Files (continued)
import java.io.*;

// Assumes inout.dat already holds the 200 integers written by the previous program
public class TestRandomAccessFileUpdate {
    public static void main(String[] args) throws IOException {
        // Open the existing random-access file for both reading and writing
        try (RandomAccessFile inout = new RandomAccessFile("inout.dat", "rw")) {
            // Retrieve the tenth number
            inout.seek(9 * 4); // Move the file pointer to the tenth number
            System.out.println("The tenth number is " + inout.readInt());

            // Modify the eleventh number (the pointer now sits right after the tenth)
            inout.writeInt(555);

            // Append a new number
            inout.seek(inout.length()); // Move the file pointer to the end
            inout.writeInt(999);

            // Display the new length
            System.out.println("The new length is " + inout.length());

            // Retrieve the new eleventh number
            inout.seek(10 * 4); // Move the file pointer to the eleventh number
            System.out.println("The eleventh number is " + inout.readInt());
        }
    }
}

Thanks!
Any questions?
For an in-depth understanding of Java, I highly recommend
referring to the textbooks. This slide provides a brief overview
and may not cover all the details you're eager to explore!
