How to read a file using multiple threads in Java when a high throughput(3GB/s) file system is available

情深已故 2020-12-06 15:09

I understand that on a conventional spinning-disk (spindle) drive, reading a file with multiple threads is inefficient.

This is a different case: I have a high-throughput

2 answers
  • 2020-12-06 15:53

    You should first try the Java 7 Files.readAllLines method:

    List<String> lines = Files.readAllLines(Paths.get(path), encoding);
    

    A multi-threaded approach is probably not a good option, because concurrent readers force the file system to serve scattered (random) reads instead of one sequential read, which usually hurts throughput.

  • 2020-12-06 16:09

    Here is a solution that reads a single file with multiple threads.

    Divide the file into N chunks, read each chunk in its own thread, then merge the chunks in order. Beware of lines that cross chunk boundaries. This is the basic idea suggested by user slaks.
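
    The warning about lines that cross chunk boundaries can be handled by moving each boundary forward to the next line start before the chunks are handed to threads. A minimal sketch, assuming '\n' line endings and an ASCII-compatible encoding such as UTF-8 (where the byte 0x0A never occurs inside a multi-byte character). LineAlignedChunks, boundaries and nextLineStart are hypothetical names, not part of the code in this answer:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: computes chunk offsets that always fall on a line start.
final class LineAlignedChunks
{
    // Returns offsets where chunk k covers [offsets[k], offsets[k + 1]).
    // Fewer chunks than requested are returned when lines are very long.
    static long[] boundaries(FileChannel channel, int chunks) throws IOException
    {
        long size = channel.size();
        List<Long> offsets = new ArrayList<>();
        offsets.add(0L);
        for (int k = 1; k < chunks; k++)
        {
            long guess = (size / chunks) * k; // naive, evenly spaced boundary
            long aligned = nextLineStart(channel, guess, size);
            // Drop boundaries that collapse onto the previous one or onto end of file
            if (aligned > offsets.get(offsets.size() - 1) && aligned < size)
            {
                offsets.add(aligned);
            }
        }
        offsets.add(size);
        return offsets.stream().mapToLong(Long::longValue).toArray();
    }

    // Scans forward from 'from' and returns the position just after the next '\n',
    // or 'size' when no newline follows.
    static long nextLineStart(FileChannel channel, long from, long size) throws IOException
    {
        ByteBuffer buf = ByteBuffer.allocate(8192);
        long pos = from;
        while (pos < size)
        {
            buf.clear();
            int n = channel.read(buf, pos); // positional read, does not move the channel position
            if (n <= 0)
            {
                break;
            }
            for (int i = 0; i < n; i++)
            {
                if (buf.get(i) == '\n')
                {
                    return pos + i + 1;
                }
            }
            pos += n;
        }
        return size;
    }
}
```

    Each thread would then read the byte range between two consecutive offsets, so no line is split between two chunks.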

    Benchmarks of the implementation below, reading a single 20 GB file with multiple threads:

    1 Thread : 50 seconds : 400 MB/s

    2 Threads: 30 seconds : 666 MB/s

    4 Threads: 20 seconds : 1 GB/s

    8 Threads: 60 seconds : 333 MB/s

    Equivalent Java 7 readAllLines() : 400 seconds : 50 MB/s
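
    The throughput column is simply file size divided by elapsed time. A small sketch of that arithmetic, assuming decimal units (1 MB = 1,000,000 bytes, so 20 GB = 20,000,000,000 bytes); the Throughput class is a hypothetical helper, not part of the benchmarked program:

```java
// Hypothetical helper that reproduces the size / time arithmetic of the table.
final class Throughput
{
    // Assumes decimal megabytes: 1 MB = 1,000,000 bytes.
    static double megabytesPerSecond(long bytes, long elapsedNanos)
    {
        return (bytes / 1e6) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args)
    {
        long fileBytes = 20_000_000_000L;       // 20 GB, as in the benchmark
        long[] seconds = {50, 30, 20, 60, 400}; // timings reported in the table
        for (long s : seconds)
        {
            double rate = megabytesPerSecond(fileBytes, s * 1_000_000_000L);
            System.out.printf("%d s -> %.0f MB/s%n", s, rate);
        }
    }
}
```

    In a real measurement, the elapsed time would come from two System.nanoTime() calls taken around the read.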

    Note: This may only work on systems designed to support high-throughput I/O, not on typical personal computers.

    package filereadtests;
    
    import java.io.*;
    import static java.lang.Math.toIntExact;
    import java.nio.*;
    import java.nio.channels.*;
    import java.nio.charset.Charset;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    
    public class FileRead implements Runnable
    {
    
    private FileChannel _channel;
    private long _startLocation;
    private int _size;
    int _sequence_number;
    
    public FileRead(long loc, int size, FileChannel chnl, int sequence)
    {
        _startLocation = loc;
        _size = size;
        _channel = chnl;
        _sequence_number = sequence;
    }
    
    @Override
    public void run()
    {
        try
        {
            System.out.println("Reading the channel: " + _startLocation + ":" + _size);
    
            //allocate memory
            ByteBuffer buff = ByteBuffer.allocate(_size);
    
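            //Read file chunk to RAM; a single read() may return fewer bytes than requested
            long position = _startLocation;
            while (buff.hasRemaining())
            {
                int bytes_read = _channel.read(buff, position);
                if (bytes_read < 0)
                {
                    break; //end of file
                }
                position += bytes_read;
            }
    
            //chunk to String; a multi-byte character split across a chunk boundary will not decode correctly
            String string_chunk = new String(buff.array(), 0, buff.position(), Charset.forName("UTF-8"));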
    
            System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);
    
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    
    //args[0] is the path of the file to read
    //args[1] is the size of the thread pool; try different values to find the sweet spot
    public static void main(String[] args) throws Exception
    {
        FileInputStream fileInputStream = new FileInputStream(args[0]);
        FileChannel channel = fileInputStream.getChannel();
        long remaining_size = channel.size(); //get the total number of bytes in the file
        //file_size/threads; at least 1 byte so the loop below always advances, even for tiny files
        long chunk_size = Math.max(1, remaining_size / Integer.parseInt(args[1]));
    
        //Max allocation size allowed is ~2GB
        if (chunk_size > (Integer.MAX_VALUE - 5))
        {
            chunk_size = (Integer.MAX_VALUE - 5);
        }
    
        //thread pool
        ExecutorService executor = Executors.newFixedThreadPool(Integer.parseInt(args[1]));
    
        long start_loc = 0;//file pointer
        int i = 0; //loop counter
        while (remaining_size >= chunk_size)
        {
            //launches a new thread
            executor.execute(new FileRead(start_loc, toIntExact(chunk_size), channel, i));
            remaining_size = remaining_size - chunk_size;
            start_loc = start_loc + chunk_size;
            i++;
        }
    
        //load the last remaining piece
        executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i));
    
        //Tear Down
        executor.shutdown();
    
        //Wait for all threads to finish; blocks instead of spinning the CPU in a busy loop
        executor.awaitTermination(Long.MAX_VALUE, java.util.concurrent.TimeUnit.NANOSECONDS);
        System.out.println("Finished all threads");
        fileInputStream.close();
    }
    
    }
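
    The program above reads each chunk but does not keep it. One possible way to get the chunks back in file order is to submit Callable tasks and collect their Futures in submission order. This is a sketch under assumptions, not the original benchmarked code: OrderedChunkReader, readChunk and readChunksInOrder are hypothetical names, and all chunks are assumed to fit in heap memory at the same time.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: reads chunks in parallel and returns them in file order.
final class OrderedChunkReader
{
    // Reads the byte range [start, start + size), looping until it is complete or EOF.
    static byte[] readChunk(FileChannel channel, long start, int size) throws IOException
    {
        ByteBuffer buff = ByteBuffer.allocate(size);
        long position = start;
        while (buff.hasRemaining())
        {
            int n = channel.read(buff, position);
            if (n < 0)
            {
                break; // end of file
            }
            position += n;
        }
        return Arrays.copyOf(buff.array(), buff.position());
    }

    static List<byte[]> readChunksInOrder(Path file, int threads) throws Exception
    {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ))
        {
            long size = channel.size();
            // At least 1 byte per chunk, and small enough for a single byte array
            long chunkSize = Math.min(Math.max(1, size / threads), Integer.MAX_VALUE - 8);

            // Futures are stored in submission order, which is also file order
            List<Future<byte[]>> futures = new ArrayList<>();
            for (long start = 0; start < size; start += chunkSize)
            {
                long from = start;
                int length = (int) Math.min(chunkSize, size - start);
                Callable<byte[]> task = () -> readChunk(channel, from, length);
                futures.add(executor.submit(task));
            }

            // get() blocks until each chunk is ready, so the list ends up ordered
            List<byte[]> chunks = new ArrayList<>();
            for (Future<byte[]> future : futures)
            {
                chunks.add(future.get());
            }
            return chunks;
        }
        finally
        {
            executor.shutdown();
        }
    }
}
```

    Because the tasks complete in any order but are collected in submission order, the caller can process the returned list front to back as if the file had been read sequentially.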
    