Advertisement
Advertisement


Reading a plain text file in Java


Question

It seems there are different ways to read and write data of files in Java.

I want to read ASCII data from a file. What are the possible ways and their differences?

2016/01/15
1
940
1/15/2016 1:12:57 PM

Accepted Answer

ASCII is a TEXT file so you would use Readers for reading. Java also supports reading from a binary file using InputStreams. If the files being read are huge then you would want to use a BufferedReader on top of a FileReader to improve read performance.

Go through this article on how to use a Reader

I'd also recommend you download and read this wonderful (yet free) book called Thinking In Java

In Java 7:

new String(Files.readAllBytes(...))

(docs) or

Files.readAllLines(...)

(docs)

In Java 8:

Files.lines(..).forEach(...)

(docs)

2019/02/05
571
2/5/2019 11:21:29 PM


The easiest way is to use the Scanner class in Java and the FileReader object. Simple example:

Scanner in = new Scanner(new FileReader("filename.txt"));

Scanner has several methods for reading in strings, numbers, etc... You can look for more information on this on the Java documentation page.

For example reading the whole content into a String:

StringBuilder sb = new StringBuilder();
while(in.hasNext()) {
    sb.append(in.next());
}
in.close();
outString = sb.toString();

Also if you need a specific encoding you can use this instead of FileReader:

new InputStreamReader(new FileInputStream(fileUtf8), StandardCharsets.UTF_8)
2017/05/24

Here is a simple solution:

String content;

content = new String(Files.readAllBytes(Paths.get("sample.txt")));
2015/01/29

Here's another way to do it without using external libraries:

import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public String readFile(String filename)
{
    String content = null;
    File file = new File(filename); // For example, foo.txt
    FileReader reader = null;
    try {
        reader = new FileReader(file);
        char[] chars = new char[(int) file.length()];
        reader.read(chars);
        content = new String(chars);
        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if(reader != null){
            reader.close();
        }
    }
    return content;
}
2018/02/28

I had to benchmark the different ways. I shall comment on my findings but, in short, the fastest way is to use a plain old BufferedInputStream over a FileInputStream. If many files must be read then three threads will reduce the total execution time to roughly half, but adding more threads will progressively degrade performance until making it take three times longer to complete with twenty threads than with just one thread.

The assumption is that you must read a file and do something meaningful with its contents. In the examples here is reading lines from a log and count the ones which contain values that exceed a certain threshold. So I am assuming that the one-liner Java 8 Files.lines(Paths.get("/path/to/file.txt")).map(line -> line.split(";")) is not an option.

I tested on Java 1.8, Windows 7 and both SSD and HDD drives.

I wrote six different implementations:

rawParse: Use BufferedInputStream over a FileInputStream and then cut lines reading byte by byte. This outperformed any other single-thread approach, but it may be very inconvenient for non-ASCII files.

lineReaderParse: Use a BufferedReader over a FileReader, read line by line, split lines by calling String.split(). This is approximatedly 20% slower that rawParse.

lineReaderParseParallel: This is the same as lineReaderParse, but it uses several threads. This is the fastest option overall in all cases.

nioFilesParse: Use java.nio.files.Files.lines()

nioAsyncParse: Use an AsynchronousFileChannel with a completion handler and a thread pool.

nioMemoryMappedParse: Use a memory-mapped file. This is really a bad idea yielding execution times at least three times longer than any other implementation.

These are the average times for reading 204 files of 4 MB each on an quad-core i7 and SSD drive. The files are generated on the fly to avoid disk caching.

rawParse                11.10 sec
lineReaderParse         13.86 sec
lineReaderParseParallel  6.00 sec
nioFilesParse           13.52 sec
nioAsyncParse           16.06 sec
nioMemoryMappedParse    37.68 sec

I found a difference smaller than I expected between running on an SSD or an HDD drive being the SSD approximately 15% faster. This may be because the files are generated on an unfragmented HDD and they are read sequentially, therefore the spinning drive can perform nearly as an SSD.

I was surprised by the low performance of the nioAsyncParse implementation. Either I have implemented something in the wrong way or the multi-thread implementation using NIO and a completion handler performs the same (or even worse) than a single-thread implementation with the java.io API. Moreover the asynchronous parse with a CompletionHandler is much longer in lines of code and tricky to implement correctly than a straight implementation on old streams.

Now the six implementations followed by a class containing them all plus a parametrizable main() method that allows to play with the number of files, file size and concurrency degree. Note that the size of the files varies plus minus 20%. This is to avoid any effect due to all the files being of exactly the same size.

rawParse

public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
    overrunCount = 0;
    final int dl = (int) ';';
    StringBuffer lineBuffer = new StringBuffer(1024);
    for (int f=0; f<numberOfFiles; f++) {
        File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
        FileInputStream fin = new FileInputStream(fl);
        BufferedInputStream bin = new BufferedInputStream(fin);
        int character;
        while((character=bin.read())!=-1) {
            if (character==dl) {

                // Here is where something is done with each line
                doSomethingWithRawLine(lineBuffer.toString());
                lineBuffer.setLength(0);
            }
            else {
                lineBuffer.append((char) character);
            }
        }
        bin.close();
        fin.close();
    }
}

public final void doSomethingWithRawLine(String line) throws ParseException {
    // What to do for each line
    int fieldNumber = 0;
    final int len = line.length();
    StringBuffer fieldBuffer = new StringBuffer(256);
    for (int charPos=0; charPos<len; charPos++) {
        char c = line.charAt(charPos);
        if (c==DL0) {
            String fieldValue = fieldBuffer.toString();
            if (fieldValue.length()>0) {
                switch (fieldNumber) {
                    case 0:
                        Date dt = fmt.parse(fieldValue);
                        fieldNumber++;
                        break;
                    case 1:
                        double d = Double.parseDouble(fieldValue);
                        fieldNumber++;
                        break;
                    case 2:
                        int t = Integer.parseInt(fieldValue);
                        fieldNumber++;
                        break;
                    case 3:
                        if (fieldValue.equals("overrun"))
                            overrunCount++;
                        break;
                }
            }
            fieldBuffer.setLength(0);
        }
        else {
            fieldBuffer.append(c);
        }
    }
}

lineReaderParse

public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
    String line;
    for (int f=0; f<numberOfFiles; f++) {
        File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
        FileReader frd = new FileReader(fl);
        BufferedReader brd = new BufferedReader(frd);

        while ((line=brd.readLine())!=null)
            doSomethingWithLine(line);
        brd.close();
        frd.close();
    }
}

public final void doSomethingWithLine(String line) throws ParseException {
    // Example of what to do for each line
    String[] fields = line.split(";");
    Date dt = fmt.parse(fields[0]);
    double d = Double.parseDouble(fields[1]);
    int t = Integer.parseInt(fields[2]);
    if (fields[3].equals("overrun"))
        overrunCount++;
}

lineReaderParseParallel

public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParalelism) throws IOException, ParseException, InterruptedException {
    Thread[] pool = new Thread[degreeOfParalelism];
    int batchSize = numberOfFiles / degreeOfParalelism;
    for (int b=0; b<degreeOfParalelism; b++) {
        pool[b] = new LineReaderParseThread(targetDir, b*batchSize, b*batchSize+b*batchSize);
        pool[b].start();
    }
    for (int b=0; b<degreeOfParalelism; b++)
        pool[b].join();
}

class LineReaderParseThread extends Thread {

    private String targetDir;
    private int fileFrom;
    private int fileTo;
    private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private int overrunCounter = 0;

    public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
        this.targetDir = targetDir;
        this.fileFrom = fileFrom;
        this.fileTo = fileTo;
    }

    private void doSomethingWithTheLine(String line) throws ParseException {
        String[] fields = line.split(DL);
        Date dt = fmt.parse(fields[0]);
        double d = Double.parseDouble(fields[1]);
        int t = Integer.parseInt(fields[2]);
        if (fields[3].equals("overrun"))
            overrunCounter++;
    }

    @Override
    public void run() {
        String line;
        for (int f=fileFrom; f<fileTo; f++) {
            File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
            try {
            FileReader frd = new FileReader(fl);
            BufferedReader brd = new BufferedReader(frd);
            while ((line=brd.readLine())!=null) {
                doSomethingWithTheLine(line);
            }
            brd.close();
            frd.close();
            } catch (IOException | ParseException ioe) { }
        }
    }
}

nioFilesParse

public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
    for (int f=0; f<numberOfFiles; f++) {
        Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
        Consumer<String> action = new LineConsumer();
        Stream<String> lines = Files.lines(ph);
        lines.forEach(action);
        lines.close();
    }
}


class LineConsumer implements Consumer<String> {

    @Override
    public void accept(String line) {

        // What to do for each line
        String[] fields = line.split(DL);
        if (fields.length>1) {
            try {
                Date dt = fmt.parse(fields[0]);
            }
            catch (ParseException e) {
            }
            double d = Double.parseDouble(fields[1]);
            int t = Integer.parseInt(fields[2]);
            if (fields[3].equals("overrun"))
                overrunCount++;
        }
    }
}

nioAsyncParse

public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
    ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
    ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();

    for (int b=0; b<numberOfThreads; b++)
        byteBuffers.add(ByteBuffer.allocate(bufferSize));

    for (int f=0; f<numberOfFiles; f++) {
        consumerThreads.acquire();
        String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
        AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
        BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
        channel.read(consumer.buffer(), 0l, channel, consumer);
    }
    consumerThreads.acquire(numberOfThreads);
}


class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {

        private ConcurrentLinkedQueue<ByteBuffer> buffers;
        private ByteBuffer bytes;
        private String file;
        private StringBuffer chars;
        private int limit;
        private long position;
        private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

        public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
            buffers = byteBuffers;
            bytes = buffers.poll();
            if (bytes==null)
                bytes = ByteBuffer.allocate(bufferSize);

            file = fileName;
            chars = new StringBuffer(bufferSize);
            frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            limit = bufferSize;
            position = 0l;
        }

        public ByteBuffer buffer() {
            return bytes;
        }

        @Override
        public synchronized void completed(Integer result, AsynchronousFileChannel channel) {

            if (result!=-1) {
                bytes.flip();
                final int len = bytes.limit();
                int i = 0;
                try {
                    for (i = 0; i < len; i++) {
                        byte by = bytes.get();
                        if (by=='\n') {
                            // ***
                            // The code used to process the line goes here
                            chars.setLength(0);
                        }
                        else {
                                chars.append((char) by);
                        }
                    }
                }
                catch (Exception x) {
                    System.out.println(
                        "Caught exception " + x.getClass().getName() + " " + x.getMessage() +
                        " i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
                        ", position="+String.valueOf(position));
                }

                if (len==limit) {
                    bytes.clear();
                    position += len;
                    channel.read(bytes, position, channel, this);
                }
                else {
                    try {
                        channel.close();
                    }
                    catch (IOException e) {
                    }
                    consumerThreads.release();
                    bytes.clear();
                    buffers.add(bytes);
                }
            }
            else {
                try {
                    channel.close();
                }
                catch (IOException e) {
                }
                consumerThreads.release();
                bytes.clear();
                buffers.add(bytes);
            }
        }

        @Override
        public void failed(Throwable e, AsynchronousFileChannel channel) {
        }
};

FULL RUNNABLE IMPLEMENTATION OF ALL CASES

https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java

2017/01/14

Here are the three working and tested methods:

Using BufferedReader

package io;
import java.io.*;
public class ReadFromFile2 {
    public static void main(String[] args)throws Exception {
        File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
        BufferedReader br = new BufferedReader(new FileReader(file));
        String st;
        while((st=br.readLine()) != null){
            System.out.println(st);
        }
    }
}

Using Scanner

package io;

import java.io.File;
import java.util.Scanner;

public class ReadFromFileUsingScanner {
    public static void main(String[] args) throws Exception {
        File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
        Scanner sc = new Scanner(file);
        while(sc.hasNextLine()){
            System.out.println(sc.nextLine());
        }
    }
}

Using FileReader

package io;
import java.io.*;
public class ReadingFromFile {

    public static void main(String[] args) throws Exception {
        FileReader fr = new FileReader("C:\\Users\\pankaj\\Desktop\\test.java");
        int i;
        while ((i=fr.read()) != -1){
            System.out.print((char) i);
        }
    }
}

Read the entire file without a loop using the Scanner class

package io;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadingEntireFileWithoutLoop {

    public static void main(String[] args) throws FileNotFoundException {
        File file = new File("C:\\Users\\pankaj\\Desktop\\test.java");
        Scanner sc = new Scanner(file);
        sc.useDelimiter("\\Z");
        System.out.println(sc.next());
    }
}
2017/01/14

Source: https://stackoverflow.com/questions/4716503
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow
Email: [email protected]