Section 11.4 Reading Files
Subsection 11.4.1 The Nature of a File
You are no doubt familiar with the concept of a file. We use files to store documents, spreadsheets, pictures, music, games, compressed archives, and more. Files are stored in a file system and organized in a hierarchy, where folders (also called directories) are containers that hold files and other folders.
A more fundamental way to think of a file is as an ordered sequence of bytes that is assigned a unique name plus location (a path) in the file system’s storage. A file system also stores properties about a file, such as who has access to the file’s contents and whether it should be interpreted as data or as an executable program. How to interpret these bytes varies widely. Even within a single category of file type, algorithms for extracting file contents can be drastically different. For example, a JPEG stores compressed image data, whereas a BMP typically stores pixel values uncompressed, so the two must be decoded in very different ways.
The file extension of a file name is a hint to the operating system about the file’s type, but not a guarantee. A file extension is made up of the characters that follow the final "." character in a file name. Examples are ".txt", ".docx", ".html", and ".exe". Operating systems often use the extension to identify the program that knows how to open the file and interpret its contents. Never open a file unless you trust the author, especially a file that is executable. On the Windows operating system, an executable file often has a ".exe" extension. On other operating systems, a file is executable if one of its file system properties identifies it as such.
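As a small illustration of this definition, the following sketch extracts the extension from a file name by locating the final "." character. The class and method names (ExtensionDemo, extensionOf) are hypothetical and chosen only for this example. Remember that the result is just a hint about the file’s type.

```java
// ExtensionDemo.java (hypothetical example, not one of this section's listings)
public class ExtensionDemo {

    // Return the extension including the ".", or "" if the name has no "."
    public static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');    // Index of the final "."
        if (dot < 0) {                          // No "." in the name
            return "";
        }
        return fileName.substring(dot);         // From the final "." to the end
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("report.docx"));       // .docx
        System.out.println(extensionOf("archive.tar.gz"));    // .gz
        System.out.println(extensionOf("README").isEmpty());  // true
    }
}
```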
The format that we’ll work with here is often referred to as plain text. The way to interpret the sequence of bytes in a plain text file is that each byte encodes one character in the file. The most common standard for translating between a numerical byte value and its character equivalent is ASCII (see Appendix A). To interpret the contents of a plain text file, read the byte sequence and substitute each value with the equivalent character from the ASCII lookup table. Writing a plain text file is the inverse process. Convert your file contents to a String, and then write to a file the ASCII bytes that correspond to the character sequence of that String.
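The translation in both directions can be sketched in a few lines of Java. This is a minimal illustration that assumes the content is pure ASCII. The class name AsciiDemo and its two helper methods are hypothetical, while StandardCharsets.US_ASCII is part of the standard library.

```java
// AsciiDemo.java (hypothetical example)
import java.nio.charset.StandardCharsets;

public class AsciiDemo {

    // Reading: substitute each byte value with its ASCII character
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Writing: convert each character of a String to its ASCII byte value
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = {72, 105, 33};           // ASCII codes for 'H', 'i', '!'
        System.out.println(decode(bytes));      // Hi!

        byte[] back = encode("Hi!");
        System.out.println(back[0]);            // 72
        System.out.println(back.length);        // 3
    }
}
```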
One shortcoming of plain text files is that they can be very large. No compression is applied, so repeated content is stored in full each time it occurs. Another shortcoming is that the content is limited to characters. There is no way to store native numerical values, which can be very inefficient. For example, the number 3.1415926 is stored in memory as a Java float using four bytes. In a plain text file each of the nine characters must be stored separately, and reading the number involves the execution of a parsing algorithm to reinterpret the characters as the number. The advantage of plain text files is that the content is easily interpreted, with ASCII character code translation built right into most programming languages, including Java. Another advantage is that many file types are actually plain text files. For example, our Java programs are plain text, although our compiled ".class" files are not. Also, HTML files and CSV (comma-separated values) files are stored as plain text.
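The size difference described above can be checked directly. The sketch below compares the number of bytes Java uses for a float with the number of ASCII bytes needed for the text "3.1415926", and then parses the text back into a number. The class and method names are hypothetical.

```java
// NumberSizeDemo.java (hypothetical example)
import java.nio.charset.StandardCharsets;

public class NumberSizeDemo {

    // Number of bytes needed to store the text as ASCII characters
    public static int textBytes(String text) {
        return text.getBytes(StandardCharsets.US_ASCII).length;
    }

    // Reinterpret the characters as a floating point number
    public static float parse(String text) {
        return Float.parseFloat(text);
    }

    public static void main(String[] args) {
        String text = "3.1415926";
        System.out.println(Float.BYTES);        // 4 (bytes in a Java float)
        System.out.println(textBytes(text));    // 9 (one byte per character)
        System.out.println(parse(text));        // The parsed float value
    }
}
```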
Let’s explore how to read and write plain text files using Java.
Subsection 11.4.2 Reading with Scanner and File
In Section 3.4 we learned how to prompt the user for input from the terminal and to use the Scanner class to read the response. Recall that the Scanner read and interpreted user-entered characters from System.in. In a similar manner, we can use a Scanner to read and interpret bytes from a file. The same Scanner methods apply; only the source of byte data is different. Refer to Table 3.4.1 for a refresher.
A Scanner may be instantiated with a java.io.File object as its constructor argument, representing a file on the local file system. One File constructor takes a String containing the file location. A file location can be a complete path to the file from the file system root, or a relative path, which Java resolves against the folder from which the program is run. We’ll assume that the file we want to read is in the same folder as our Java program and that we run the program from that folder, so we can pass only the file name as a String to the File constructor.
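To see how a relative file location is resolved, a File object can report its complete path and whether the file exists. The following is a small sketch. The class name PathDemo and the helper relativeFile() are hypothetical, and the printed complete path depends on your machine and on the folder from which you run the program.

```java
// PathDemo.java (hypothetical example)
import java.io.File;

public class PathDemo {

    // Build a File from a relative location: just the file name
    public static File relativeFile() {
        return new File("data.txt");
    }

    public static void main(String[] args) {
        File fp = relativeFile();
        System.out.println(fp.getName());           // data.txt
        System.out.println(fp.isAbsolute());        // false
        System.out.println(fp.getAbsolutePath());   // Complete path; varies by machine
        System.out.println(fp.exists());            // true only if data.txt is present
    }
}
```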
For the examples that follow, we assume that you have a plain text file named "data.txt" available to read. You can create your own text file using your programming editor or a program like Notepad or TextEdit. You can also enter data into a spreadsheet like Microsoft Excel and save as a ".csv" or ".txt" file. The same goes for a program like Microsoft Word. You must save your file as a ".txt" file. Otherwise, it may not be in a plain text format.
The first step to read from a file is to instantiate a File object representing the file stored in your file system. Assuming the data file is named "data.txt" and is saved to the same file system location as your Java program, you can accomplish this with the following statement, where fileName is a String variable with the name of the file to be opened. Before you can instantiate a File object, you must import the java.io.File class. Refer to Listing 11.4.1 for the completed program.
File fp = new File(fileName);
With the File object instantiated, the next step is to pass it to the Scanner class constructor to create a new Scanner configured to read from the File object. The Scanner constructor that takes a File object may throw a FileNotFoundException. Because this is a checked exception, the Scanner must be created in a try-block, unless handling is deferred as described in Subsection 11.4.4. In the following code snippet, we declare the Scanner variable in the outer method scope because we will use it later, but the instantiation of the Scanner object must be done in the try-block.
import java.util.Scanner; // Required imports
import java.io.File;
import java.io.FileNotFoundException;
// …
Scanner scn = null;                     // Scanner variable in method scope
File fp = new File(fileName);           // Create a new File object
try {                                   // Catch exceptions
    scn = new Scanner(fp);              // Wrap File in a Scanner
    // …
}
catch( FileNotFoundException e ) {      // Handle checked exception
    // …
}
finally {                               // Always execute
    if (scn != null) {                  // Still null if the file was not found
        scn.close();                    // Close the Scanner
    }
}
// …
In the above code snippet, note that we used a finally-block to close the Scanner. If the Scanner is not closed properly, we risk leaking system resources, such as the open file it reads from.
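Java also provides the try-with-resources statement, which closes the Scanner automatically when the try-block exits, whether or not an exception occurred. The following is a minimal sketch of that alternative and is not the approach used in this section’s listings. The class name, the firstLine(…) method, and the writeSample(…) helper (which creates a temporary file so the sketch runs on its own) are all hypothetical.

```java
// TryWithResourcesDemo.java (hypothetical example)
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class TryWithResourcesDemo {

    // Return the first line of a file, or null if it is missing or empty
    public static String firstLine(String fileName) {
        File fp = new File(fileName);
        try (Scanner scn = new Scanner(fp)) {       // Closed automatically on exit
            return scn.hasNextLine() ? scn.nextLine() : null;
        }
        catch (FileNotFoundException e) {           // Handle checked exception
            System.out.println(e);
            return null;
        }
    }

    // Hypothetical helper: write text to a temporary file and return its path
    public static String writeSample(String text) {
        try {
            File tmp = File.createTempFile("data", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text.getBytes(StandardCharsets.US_ASCII));
            return tmp.getPath();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = writeSample("0.493201369\n0.197927567\n");
        System.out.println(firstLine(path));        // 0.493201369
    }
}
```

No finally-block is needed in this form because the Scanner is declared in the parentheses after the try keyword.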
For our file reading examples we will read files one line at a time, rather than reading data item by data item. We use the Scanner’s hasNextLine() method to check if the file has another line of text to be read, and its nextLine() method to read the next line as a String. Listing 11.4.1 includes the complete Java program with a utility method named printFile(…) to read all lines from a file and print them to the terminal. Following this listing is a shell session demonstrating compilation and execution of the program. Our example "data.txt" file contains random floating point numbers.
// ReadFile1.java
import java.util.Scanner; // Required imports
import java.io.File;
import java.io.FileNotFoundException;
public class ReadFile1 {
    public static void main(String[] args) {
        printFile("data.txt");                      // Print file given path
    }
    // Utility to print a file's contents
    public static void printFile(String fileName) {
        String line;                                // Helper to hold read line String
        Scanner scn = null;                         // Scanner variable in method scope
        File fp = new File(fileName);               // Create a new File object
        try {                                       // Catch exceptions
            scn = new Scanner(fp);                  // Wrap File in a Scanner
            while (scn.hasNextLine()) {             // Continue while more to read
                line = scn.nextLine();              // Read line from file
                System.out.println( line );         // Print line
            }
        }
        catch( FileNotFoundException e ) {          // Handle checked exception
            System.out.println(e);
        }
        finally {                                   // Always execute
            if (scn != null) {                      // Still null if the file was not found
                scn.close();                        // Close the Scanner
            }
        }
    }
}
Listing 11.4.1. ReadFile1.java
javac ReadFile1.java
java ReadFile1
0.493201369
0.197927567
0.213527982
0.888503267
0.509179639
…
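Because each line of the example file holds one number, a line read as a String can be parsed into a double and used in a calculation. The sketch below sums the values in a file using the same try-catch-finally pattern. It assumes that every non-blank line contains exactly one valid number; otherwise Double.parseDouble(…) throws an unchecked NumberFormatException. The class name SumFile, the sumFile(…) method, and the writeSample(…) helper are hypothetical.

```java
// SumFile.java (hypothetical example)
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;

public class SumFile {

    // Sum the numbers in a file that holds one number per line
    public static double sumFile(String fileName) {
        double sum = 0.0;
        Scanner scn = null;                         // Scanner variable in method scope
        try {
            scn = new Scanner(new File(fileName));  // Wrap File in a Scanner
            while (scn.hasNextLine()) {
                String line = scn.nextLine().trim();
                if (!line.isEmpty()) {              // Skip blank lines
                    sum += Double.parseDouble(line);
                }
            }
        }
        catch (FileNotFoundException e) {           // Handle checked exception
            System.out.println(e);
        }
        finally {
            if (scn != null) {                      // null if the file was not found
                scn.close();
            }
        }
        return sum;                                 // 0.0 if the file was not found
    }

    // Hypothetical helper: write text to a temporary file and return its path
    public static String writeSample(String text) {
        try {
            File tmp = File.createTempFile("data", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text.getBytes(StandardCharsets.US_ASCII));
            return tmp.getPath();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = writeSample("0.5\n0.25\n1.0\n");
        System.out.println(sumFile(path));          // 1.75
    }
}
```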
Subsection 11.4.3 Reading with FileReader and BufferedReader
An alternative way to read plain text files in Java is to use the FileReader and BufferedReader classes. The FileReader class reads a file character by character, which is not convenient if we are interested in reading a file line by line. The BufferedReader class wraps a FileReader object, uses it to buffer file contents, and then returns these contents in a more convenient manner. For our purposes, we are most interested in the BufferedReader’s readLine() method, which returns the next line from the buffered file contents.
Find a complete example of reading a plain text file line by line in Listing 11.4.2. A BufferedReader returns null from its readLine() method when there are no more lines to read. With this in mind, pay close attention to the way the while-statement in Listing 11.4.2 is constructed. We invoke readLine() once before the while-statement, and enter the statement only if the line value is not null. This prevents any action from being taken on an empty file. Also, we read the next line at the end of the while-statement body, and continue the while-statement only if there is another line to be processed.
Listing 11.4.2 works in a manner very similar to Listing 11.4.1. We must catch the IOException that may be thrown by the FileReader constructor (as a FileNotFoundException, which is a kind of IOException) or by the readLine() method, and we must close the opened BufferedReader, ideally in a finally-block, to prevent a resource leak. Output generated from running this program is identical to running Listing 11.4.1.
// ReadFile2.java
import java.io.*; // Required imports
public class ReadFile2 {
    public static void main(String[] args) {
        printFile("data.txt");                      // Print file given path
    }
    // Utility to print a file's contents
    public static void printFile(String fileName) {
        String line;                                // Helper variables
        BufferedReader br = null;
        FileReader fr = null;
        try {                                       // Catch exceptions
            fr = new FileReader(fileName);          // May throw FileNotFoundException
            br = new BufferedReader(fr);            // Wrap FileReader
            line = br.readLine();                   // Read next line
            while (line != null) {                  // null when no more to read
                System.out.println( line );         // Print read line
                line = br.readLine();               // Try again
            }
        }
        catch (IOException e) {                     // Handle checked exception
            System.out.println(e);
        }
        finally {                                   // Always execute
            try {
                if (br != null) {                   // Still null if the file was not opened
                    br.close();                     // Close BufferedReader and its FileReader
                }
            }
            catch (IOException e) {                 // close() may also throw
                System.out.println(e);
            }
        }
    }
}
Listing 11.4.2. ReadFile2.java
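A common alternative to invoking readLine() both before the while-statement and at the end of its body is to fold the call into the while-condition. The sketch below uses that form to count the lines in a file. It is one possible variation rather than the pattern used in Listing 11.4.2, and the class name CountLines, the countLines(…) method, and the writeSample(…) helper are hypothetical.

```java
// CountLines.java (hypothetical example)
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CountLines {

    // Count the lines in a file; returns -1 if the file cannot be read
    public static int countLines(String fileName) {
        int count = 0;
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(fileName));
            String line;
            // Assign the next line, then compare it to null, in one condition
            while ((line = br.readLine()) != null) {
                count++;
            }
        }
        catch (IOException e) {                     // Handle checked exception
            System.out.println(e);
            count = -1;
        }
        finally {
            try {
                if (br != null) {                   // null if the file was not opened
                    br.close();
                }
            }
            catch (IOException e) {                 // close() may also throw
                System.out.println(e);
            }
        }
        return count;
    }

    // Hypothetical helper: write text to a temporary file and return its path
    public static String writeSample(String text) {
        try {
            File tmp = File.createTempFile("data", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text.getBytes(StandardCharsets.US_ASCII));
            return tmp.getPath();
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(writeSample("a\nb\nc\n")));  // 3
        System.out.println(countLines(writeSample("")));           // 0
    }
}
```

An empty file is still handled correctly, because the first readLine() call returns null and the loop body never runs.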
Subsection 11.4.4 Deferring Exception Handling
Each of the above versions of the printFile(…) method was designed to handle any IOException that might occur right in the method itself. Rather than handling the exceptions in the method, exception handling can be deferred and handled in the invoking method instead. To satisfy the compiler when a checked exception is not handled in the place it may be thrown, we must declare that the method throws the exception. This has the effect of deferring the requirement to the invoking method. Eventually the checked exception must be handled in your program.
Listing 11.4.3 is an alternative version of Listing 11.4.1. The definition of printFile(…) in Listing 11.4.3 has no try-catch. Instead, it declares that it throws the checked FileNotFoundException by adding throws FileNotFoundException to the method declaration. This requires us to move the try-catch to main(…) where printFile(…) is invoked. The invocation of this method must be placed in the try-block of the try-catch; otherwise, the compiler will print an error and fail.
// ReadFile1B.java
import java.util.Scanner; // Required imports
import java.io.File;
import java.io.FileNotFoundException;
public class ReadFile1B {
    public static void main(String[] args) {
        try {                                       // Catch exceptions
            printFile("data.txt");                  // Print file given path
        }
        catch( FileNotFoundException e ) {          // Handle checked exception
            System.out.println(e);
        }
    }
    // Utility to print a file's contents. Defer handling FileNotFoundException
    public static void printFile(String fileName) throws FileNotFoundException {
        String line;                                // Helper to hold read line String
        File fp = new File(fileName);               // Create a new File object
        Scanner scn = new Scanner(fp);              // Wrap File in a Scanner
        while (scn.hasNextLine()) {                 // Continue while more to read
            line = scn.nextLine();                  // Read line from file
            System.out.println( line );             // Print line
        }
        scn.close();                                // Close the Scanner
    }
}
Listing 11.4.3. ReadFile1B.java
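The same deferral can be applied to the BufferedReader version of printFile(…). In the sketch below, printFile(…) declares throws IOException and keeps only a try-finally to close the reader, while the invoking method catches the exception. The class name ReadFile2B and the tryPrint(…) method are hypothetical and are not among this section’s listings.

```java
// ReadFile2B.java (hypothetical example)
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile2B {

    public static void main(String[] args) {
        boolean ok = tryPrint("data.txt");          // Print file given path
        System.out.println(ok ? "Done" : "Could not read file");
    }

    // Invoking method: the deferred exception is handled here
    public static boolean tryPrint(String fileName) {
        try {
            printFile(fileName);
            return true;
        }
        catch (IOException e) {                     // Handle checked exception
            System.out.println(e);
            return false;
        }
    }

    // Utility to print a file's contents. Defer handling IOException
    public static void printFile(String fileName) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        try {
            String line = br.readLine();            // Read next line
            while (line != null) {                  // null when no more to read
                System.out.println(line);
                line = br.readLine();               // Try again
            }
        }
        finally {                                   // No catch: exception propagates
            br.close();                             // Runs even if readLine() throws
        }
    }
}
```

If "data.txt" is missing, the FileNotFoundException raised inside printFile(…) travels up to tryPrint(…), which prints it and returns false.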