Multithreading a massive file read

后悔当初 2020-12-28 17:55

I'm still in the process of wrapping my brain around how concurrency works in Java. I understand that (if you're subscribing to the OO Java 5 concurrency model) you implem

4 answers
  •  灰色年华
    2020-12-28 18:29

    If your system supports high-throughput I/O, here is how you can do it:

    How to read a file using multiple threads in Java when a high throughput(3GB/s) file system is available

    Here is a solution for reading a single file with multiple threads.

    Divide the file into N chunks, read each chunk in its own thread, then merge the results in order. Beware of lines that cross chunk boundaries. This is the basic idea suggested by user slaks.
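
    One way to deal with lines that cross chunk boundaries is to compute the nominal split points and then move each one forward to just after the next newline. The sketch below assumes newline-delimited (`\n`) text; the class `LineAlignedChunks` and method `boundaries` are hypothetical names, not part of the linked post. Because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, chunks aligned this way can also be decoded independently. Reading one byte at a time is simple but slow; it only scans one partial line per boundary, so it is usually acceptable for a sketch.

    ```java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    // Hypothetical helper: splits a file into byte ranges that never cut a line in two.
    public class LineAlignedChunks {

        // Returns chunks + 1 offsets: chunk k is the range [b[k], b[k + 1]).
        // Some chunks may be empty when lines are long compared with the chunk size.
        public static long[] boundaries(FileChannel ch, int chunks) throws IOException {
            long size = ch.size();
            long[] b = new long[chunks + 1];
            b[chunks] = size;
            ByteBuffer one = ByteBuffer.allocate(1);
            for (int k = 1; k < chunks; k++) {
                // Start one byte before the nominal split point (so a split that already
                // falls right after '\n' stays put), but never before the previous boundary.
                long pos = Math.max(size * k / chunks - 1, b[k - 1]);
                // Scan forward until we have stepped past a '\n' or reached end of file.
                while (pos < size) {
                    one.clear();
                    if (ch.read(one, pos) <= 0) {
                        pos = size; // no more data: treat as end of file
                        break;
                    }
                    pos++;
                    if (one.get(0) == '\n') {
                        break;
                    }
                }
                b[k] = pos; // first byte of the next line (or the file size)
            }
            return b;
        }
    }
    ```

    Each range can then be handed to one reader task, passing `b[k]` as the start location and `b[k + 1] - b[k]` as the size.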

    Benchmarks of the implementation below (multiple threads reading a single 20 GB file):

    1 thread:  50 seconds, 400 MB/s
    2 threads: 30 seconds, 666 MB/s
    4 threads: 20 seconds, 1 GB/s
    8 threads: 60 seconds, 333 MB/s
    Equivalent Java 7 Files.readAllLines(): 400 seconds, 50 MB/s
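
    Each throughput figure is the file size divided by the wall-clock time, with 20 GB counted as 20,000 MB (decimal units, which is what the numbers imply). For the 1-thread and 4-thread runs:

    ```latex
    \text{throughput} = \frac{\text{file size}}{\text{time}}:\qquad
    \frac{20\,000\ \text{MB}}{50\ \text{s}} = 400\ \text{MB/s},\qquad
    \frac{20\,000\ \text{MB}}{20\ \text{s}} = 1\,000\ \text{MB/s} = 1\ \text{GB/s}
    ```

    The other rows check out the same way: 20,000 / 30 ≈ 666.7, 20,000 / 60 ≈ 333.3 and 20,000 / 400 = 50 MB/s.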

    Note: this speedup may only be achievable on systems designed for high-throughput I/O, not on typical personal computers.

    Here are the essential bits of the code; for complete details, follow the link.

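    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import static java.lang.Math.toIntExact;

    public class FileRead implements Runnable
    {
        private final FileChannel _channel;
        private final long _startLocation;
        private final int _size;
        final int _sequence_number;

        public FileRead(long loc, int size, FileChannel chnl, int sequence)
        {
            _startLocation = loc;
            _size = size;
            _channel = chnl;
            _sequence_number = sequence;
        }

        @Override
        public void run()
        {
            System.out.println("Reading the channel: " + _startLocation + ":" + _size);

            // allocate memory for this chunk
            ByteBuffer buff = ByteBuffer.allocate(_size);

            // Read file chunk to RAM. The positional read(buffer, position) does not
            // change the channel's own position, so all threads can share one channel.
            // A single call may return fewer bytes than requested, hence the loop.
            try
            {
                while (buff.hasRemaining())
                {
                    int n = _channel.read(buff, _startLocation + buff.position());
                    if (n < 0)
                        break; // end of file
                }
            }
            catch (IOException e)
            {
                // run() cannot throw checked exceptions
                throw new UncheckedIOException(e);
            }

            // chunk to String (only the bytes actually read)
            String string_chunk = new String(buff.array(), 0, buff.position(), StandardCharsets.UTF_8);

            System.out.println("Done Reading the channel: " + _startLocation + ":" + _size);
        }

        // args[0] is the path of the file to read
        // args[1] is the size of the thread pool; try different values to find the sweet spot
        public static void main(String[] args) throws Exception
        {
            int threads = Integer.parseInt(args[1]);

            try (FileInputStream fileInputStream = new FileInputStream(args[0]);
                 FileChannel channel = fileInputStream.getChannel())
            {
                long remaining_size = channel.size(); // total number of bytes in the file
                // file_size / threads; at least 1 so a tiny file cannot cause an endless loop
                long chunk_size = Math.max(1, remaining_size / threads);
                // each chunk is read into one ByteBuffer, whose capacity is an int:
                // toIntExact throws ArithmeticException (before any thread starts) above ~2 GB
                int chunk = toIntExact(chunk_size);

                // thread pool
                ExecutorService executor = Executors.newFixedThreadPool(threads);

                long start_loc = 0; // file pointer
                int i = 0;          // chunk sequence number
                while (remaining_size >= chunk_size)
                {
                    // queue one chunk for a pool thread
                    executor.execute(new FileRead(start_loc, chunk, channel, i));
                    remaining_size -= chunk_size;
                    start_loc += chunk_size;
                    i++;
                }

                // load the last remaining piece, if any
                if (remaining_size > 0)
                {
                    executor.execute(new FileRead(start_loc, toIntExact(remaining_size), channel, i));
                }

                // Tear down: wait for every read to finish before the channel is closed
                executor.shutdown();
                executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
            }
        }
    }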
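
    The answer says to merge the chunks in order, but the excerpt above only prints progress. A minimal sketch of one way to do the merge, assuming the chunk boundaries are already known (for example, split points moved to line breaks as recommended above): submit each chunk as a `Callable`, keep the `Future`s in a list, and call `get()` in list order. `OrderedChunkReader` and `readInOrder` are hypothetical names, not part of the linked post, and this version keeps every chunk in memory, so it only suits files that fit in the heap.

    ```java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical helper: reads chunk k = [bounds[k], bounds[k + 1]) on a pool thread
    // and returns the decoded chunks in file order, whichever thread finishes first.
    public class OrderedChunkReader {

        public static List<String> readInOrder(Path file, long[] bounds, int threads)
                throws IOException, InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                List<Future<String>> futures = new ArrayList<>();
                for (int k = 0; k + 1 < bounds.length; k++) {
                    final long start = bounds[k];
                    final int size = Math.toIntExact(bounds[k + 1] - start);
                    // submit() returns one Future per chunk; the list preserves file order
                    futures.add(pool.submit(() -> {
                        ByteBuffer buf = ByteBuffer.allocate(size);
                        while (buf.hasRemaining()) {
                            // positional read: safe to share one channel across threads
                            if (ch.read(buf, start + buf.position()) < 0) {
                                break; // end of file
                            }
                        }
                        return new String(buf.array(), 0, buf.position(), StandardCharsets.UTF_8);
                    }));
                }
                List<String> chunks = new ArrayList<>();
                for (Future<String> f : futures) {
                    chunks.add(f.get()); // waits for this chunk; later chunks keep reading meanwhile
                }
                return chunks;
            } finally {
                pool.shutdown();
            }
        }
    }
    ```

    `String.join("", chunks)` reassembles the text, or each chunk can be processed as it is collected. Calling `get()` in submission order gives ordering without any extra locking; a slow early chunk only delays collection, not the reads of later chunks.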
    
