Reading from disk and processing in parallel

Submitted on 2019-12-12 23:21:50

Question


This is going to be the most basic, maybe even a stupid, question here. When we talk about using multithreading for better resource utilization, consider an application that reads and processes files from the local file system. Let's say reading a file from disk takes 5 seconds and processing it takes 2 seconds.

In the above scenario, we say that using two threads, one to read and one to process, will save time, because even while one thread is processing the first file, the other thread can start reading the second file in parallel.

Question: Is this because of the way CPUs are designed? As in, is there a separate processing unit and a separate read/write unit, so that these two threads can work in parallel even on a single-core machine because they are actually handled by different modules? Or does this need multiple cores?

Sorry for being stupid. :)


Answer 1:


On a single processor, multithreading is achieved through time slicing. One thread will do some work, then the CPU will switch to the other thread.

When a thread is waiting on some I/O, such as a file read, it gives up its CPU time slice prematurely, allowing another thread to make use of the CPU.

The result is improved overall throughput compared to a single thread, even on a single core.

Key for the diagrams below:

  • = doing work on the CPU
  • - waiting on I/O
  • _ idle

Single thread:

====--====--====--====--

Two threads:

====--__====--__====--__
____====--__====--__====

So you can see how more gets done in the same time, because the CPU is kept busy where it would previously have been waiting. The storage device is also being used more.
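The overlap in the diagram can be demonstrated with a small sketch. This is a minimal illustration, not part of the original answer: `time.sleep` stands in for the blocking disk read, and the 0.2 s duration is arbitrary.

```python
import threading
import time

def fake_read():
    # Stand-in for a blocking disk read: the OS parks the sleeping
    # thread, so the CPU (and, in CPython, the GIL) is free for others.
    time.sleep(0.2)

# Sequential: two reads one after the other take ~0.4 s.
start = time.perf_counter()
fake_read()
fake_read()
sequential = time.perf_counter() - start

# Two threads: the two I/O waits overlap, so both finish in ~0.2 s.
start = time.perf_counter()
threads = [threading.Thread(target=fake_read) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
overlapped = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, overlapped: {overlapped:.2f}s")
```

Even on a single core, the overlapped version finishes in roughly half the time, because neither "read" needs the CPU while it waits.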




Answer 2:


In theory, yes. A single core gives you the same kind of overlap. One thread waits for a file read to complete (an I/O wait) while another thread processes a file that was read earlier. The first thread cannot be in the running state until its I/O operation completes, and it uses roughly no CPU resources in that state; the second thread consumes the CPU and completes its task. That said, a multi-core CPU does have better performance.




Answer 3:


To start with, there is a difference between concurrency and parallelism. Theoretically, a single core machine does not support parallelism.

As for whether concurrency (using threads) improves performance, it is very implementation-dependent. Take, for instance, Android or Swing. Both of them have a main thread (the UI thread). Doing a large calculation on the main thread will block the UI and make it unresponsive, so from a layman's perspective that would be bad performance.

In your case (I am assuming there is no UI thread), whether you benefit from delegating your processing to another thread depends on a lot of factors, especially the implementation of your threads; e.g., synchronized threads would not perform as well as unsynchronized ones. Your problem statement reminds me of the classic producer-consumer problem. Since you need synchronized threads, threads may not really be the best fit for your work; IMO it's better to do all the reading and processing in a single thread.

Multithreading also has a context-switching cost. It is not as big as a process context switch, but it's still there. See this link.

[EDIT] For such a producer-consumer scenario, you should preferably use a BlockingQueue.
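As a sketch of that producer-consumer setup: `BlockingQueue` is Java's class, but Python's `queue.Queue` offers the same blocking `put`/`get` semantics, so the pattern can be illustrated with it. The file names and the "read"/"process" steps below are made up for illustration.

```python
import queue
import threading

SENTINEL = object()  # signals "no more files" to the consumer

def producer(paths, q):
    # Reader thread: "reads" each file and hands the data to the queue.
    for p in paths:
        data = f"contents of {p}"  # stand-in for a real disk read
        q.put(data)                # blocks if the queue is full
    q.put(SENTINEL)

def consumer(q, results):
    # Processing thread: q.get() blocks until data is available.
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item.upper())  # stand-in for real processing

paths = ["a.txt", "b.txt", "c.txt"]  # hypothetical file names
q = queue.Queue(maxsize=2)  # bounded, so the reader can't run far ahead
results = []

t1 = threading.Thread(target=producer, args=(paths, q))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()

print(results)
```

The bounded queue does the synchronization for you: the reader blocks when it gets too far ahead, and the processor blocks when there is nothing to do, with no manual locking in either thread.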



Source: https://stackoverflow.com/questions/28087393/reading-from-disk-and-processing-in-parallel
