Using threadpools/threading for reading large txt files?

Submitted by こ雲淡風輕ζ on 2019-12-01 20:22:07

Ok, bear with me on this, because I need to explain a few things.

First off, unless you have multiple disks, or a single disk that is an SSD, using more than one thread to read from the disk is not recommended. Many questions on this topic have been posted, and the conclusion is always the same: using multiple threads to read from a single mechanical disk hurts performance instead of improving it.

This happens because the disk's mechanical head has to seek to the next position before it can read. With multiple threads, each thread that gets a chance to run directs the head to a different section of the disk, so the head bounces between disk areas inefficiently.

The accepted solution for processing multiple files is a single-producer, multiple-consumer system: one reader thread produces work and several processing threads consume it. A thread pool is the ideal mechanism in this case, with one thread acting as the producer and putting tasks in the pool's queue for the workers to process.

Something like this:

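import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

int numFiles = 20;
int threads = 4;

// A fixed pool of worker threads that process file contents
ExecutorService exec = Executors.newFixedThreadPool(threads);

// The submitting thread is the only one that reads from the disk
for (int i = 0; i < numFiles; i++) {
    // readFile(i) is a hypothetical placeholder for "read current file"
    String[] fileContents = readFile(i);
    exec.submit(new ThreadTask(fileContents));
}

// Stop accepting new tasks, then wait for the queued ones to finish
exec.shutdown();
exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);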
...

class ThreadTask implements Runnable {

    private final String[] fileContents;

    public ThreadTask(String[] fileContents) {
        this.fileContents = fileContents;
    }

    @Override
    public void run() {
        // process the contents of the text file
    }
}
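
For something you can compile and run as-is, here is a minimal sketch of the same single-reader, pooled-workers idea. The names SingleReaderDemo, LineCountTask and processAll are made up for illustration, and "count the non-blank lines" is only a stand-in for whatever processing you actually need. It creates a few small temporary files so it runs without any setup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SingleReaderDemo {

    // Hypothetical processing step: counts the non-blank lines of one file.
    static class LineCountTask implements Runnable {
        private final List<String> lines;
        private final AtomicLong total;

        LineCountTask(List<String> lines, AtomicLong total) {
            this.lines = lines;
            this.total = total;
        }

        @Override
        public void run() {
            long count = lines.stream().filter(l -> !l.trim().isEmpty()).count();
            total.addAndGet(count); // thread-safe accumulation across workers
        }
    }

    // Reads every file on the calling thread and hands the contents to the pool.
    static long processAll(List<Path> files, int threads)
            throws IOException, InterruptedException {
        AtomicLong total = new AtomicLong();
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            for (Path file : files) {
                // Only this thread touches the disk, so reads happen one file at a time.
                List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
                exec.submit(new LineCountTask(lines, total));
            }
        } finally {
            exec.shutdown(); // stop accepting tasks; queued ones still run
        }
        exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        return total.get();
    }

    public static void main(String[] args) throws Exception {
        // Create a few small temporary files so the sketch runs anywhere.
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Path p = Files.createTempFile("demo" + i, ".txt");
            Files.write(p, List.of("alpha", "", "beta"), StandardCharsets.UTF_8);
            files.add(p);
        }
        System.out.println("Non-blank lines: " + processAll(files, 4));
        for (Path p : files) {
            Files.delete(p);
        }
    }
}
```

One thing to keep in mind: the pool returned by newFixedThreadPool() queues tasks without a bound, so if the reader is much faster than the workers, the contents of many large files can pile up in memory. If that turns out to be a problem, limiting how far the reader may run ahead of the workers is probably worth looking into.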

I would start by reading the Java tutorial on high-level concurrency. I recommend reading the whole concurrency tutorial, because it sounds like you are new to multithreading.

So, the newFixedThreadPool() call will return an instance of ExecutorService. You can consult its Javadoc, which is pretty comprehensive and contains a workable example. You will want to pass a number of Callables implementing your file-processing tasks to either submit() or invokeAll(), which gives you a number of Futures in return. Their get() methods block until the task completes and then give you the result of its execution (you have to write that part yourself :))
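
To make the Callable/Future part concrete, here is a rough sketch using invokeAll(). CallableDemo, wordCountTask and countWords are hypothetical names, and counting words is just a placeholder for your real processing. The file contents are assumed to have been read already, for example by a single reader thread as discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CallableDemo {

    // Hypothetical task: returns the number of words in one file's contents.
    static Callable<Integer> wordCountTask(String contents) {
        return () -> {
            String trimmed = contents.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        };
    }

    static List<Integer> countWords(List<String> fileContents, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String contents : fileContents) {
                tasks.add(wordCountTask(contents));
            }
            // invokeAll blocks until every task has finished and returns
            // the Futures in the same order as the submitted tasks.
            List<Future<Integer>> futures = exec.invokeAll(tasks);
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                // get() rethrows a failure inside the task as ExecutionException
                results.add(f.get());
            }
            return results;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> contents = List.of("one two three", "four five", "");
        System.out.println(countWords(contents, 2)); // prints [3, 2, 0]
    }
}
```

Compared with Runnable, a Callable can return a value and throw checked exceptions, so each file's result (or failure) comes back through its own Future instead of through shared state.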
