Considering a really huge file(maybe more than 4GB) on disk,I want to scan through this file and calculate the times of a specific binary pattern occurs.
My thought
Creating 20 threads, each supposing to handle some 100 MB of the file is likely to only worsen performance since The HD will have to read from several unrelated places at the same time.
HD performance is at its peak when it reads sequential data. So assuming your huge file is not fragmented, the best thing to do would probably be to use just one thread and read from start to end in chunks of a few (say 4) MB.
But what do I know. File systems and caches are complex. Do some testing and see what works best.