Is there a way to determine the ideal number of threads? [duplicate]

Submitted by 独自空忆成欢 on 2019-12-19 07:28:20

Question


I am writing a web crawler and using threads to download pages.

The first limiting factor on my program's performance is bandwidth: I can never download more pages than the connection can carry.

The second factor is the one I am interested in. I use threads to download many pages at the same time, but as I create more threads, they increasingly compete for the processor. Is there some metric, method, or class of tests to determine the ideal number of threads, or to find the point after which performance stops improving or starts to decrease?


Answer 1:


We've developed a multithreaded, parallel web crawler. Benchmarking throughput is the best way to get an idea of how the beast will handle its job. For a dedicated Java server, one thread per core is a reasonable starting point; then I/O comes into play and changes the picture.

Performance does decrease after a certain number of threads, but the threshold also depends on the sites you crawl, the OS you use, and so on. For your first benchmarks, try to find a site with a nearly constant response time (like Google, but spread the requests across different services).

With slow websites, a higher number of threads tends to compensate for I/O blocking.
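The benchmarking approach described above can be sketched roughly as follows. This is a minimal sketch and not the crawler from the answer: `simulatedFetch` is a hypothetical stand-in that sleeps for a fixed latency instead of downloading a page, and the class name, method names, and numbers are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadCountBenchmark {

    // Hypothetical stand-in for one page download: blocks for latencyMillis.
    static void simulatedFetch(long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `tasks` simulated fetches on a fixed-size pool and returns pages per second.
    static double pagesPerSecond(int threads, int tasks, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Callable<Void>> work = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            work.add(() -> {
                simulatedFetch(latencyMillis);
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(work); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return tasks / seconds;
    }

    public static void main(String[] args) {
        // Sweep the pool size and watch where throughput stops improving.
        for (int threads : new int[] {1, 2, 4, 8, 16, 32, 64}) {
            System.out.printf("%d threads: %.1f pages/s%n",
                    threads, pagesPerSecond(threads, 64, 50));
        }
    }
}
```

With a pure sleep the curve keeps rising until the pool is as large as the task count, because nothing competes for the CPU or the network. Against real sites, parsing cost and bandwidth should make it flatten or fall earlier, and that knee is the number worth looking for.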




Answer 2:


Have a look at my answer in this thread: How to find out the optimal amount of threads?

Your example will likely be CPU bound, so you need a way to measure contention in order to work out how many threads your box can keep busy. Profiling will help there, but remember that the result depends on the number of cores (as well as the network latency already mentioned, etc.), so use the runtime to get the number of cores when wiring up your thread pool size.

No quick answer, I'm afraid: there will be an element of test, measure, adjust, repeat.
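Reading the core count from the runtime and turning it into a pool size might look like the sketch below. The sizing formula, threads = cores * (1 + wait time / compute time), is a commonly cited rule of thumb and is an assumption here, not something the answer prescribes. `PoolSizing`, `suggestedThreads`, and the 90 ms / 10 ms figures are hypothetical; the real wait and compute times would come from profiling.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PoolSizing {

    // Rule-of-thumb starting point (an assumption, to be refined by measurement):
    // threads = cores * (1 + waitMillis / computeMillis).
    static int suggestedThreads(int cores, double waitMillis, double computeMillis) {
        if (cores < 1 || waitMillis < 0 || computeMillis <= 0) {
            throw new IllegalArgumentException("cores >= 1, wait >= 0, compute > 0 required");
        }
        return Math.max(1, (int) Math.round(cores * (1.0 + waitMillis / computeMillis)));
    }

    public static void main(String[] args) {
        // Ask the runtime for the core count instead of hard-coding it.
        int cores = Runtime.getRuntime().availableProcessors();

        // Hypothetical profile: each page waits ~90 ms on the network, ~10 ms on CPU.
        int threads = suggestedThreads(cores, 90, 10);

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        System.out.println(cores + " cores, starting with " + threads + " threads");
        pool.shutdown();
    }
}
```

The result is only a first guess for the test, measure, adjust, repeat loop. For work that never waits (wait time of zero), the formula falls back to one thread per core.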




Answer 3:


The ideal number of threads should be close to the number of cores (virtual cores) your hardware provides. This minimizes the overhead of thread context switching and scheduling. If you're doing heavy I/O with many blocking reads (your thread blocks on a socket read), I suggest you redesign your code to use non-blocking I/O APIs. Typically this involves one "selector" thread that monitors the activity of thousands of sockets, plus a small number of worker threads that do the processing. If your code is in Java, the API is NIO. The only blocking call is selector.select(), and it blocks only when there is nothing to process on any of the thousands of sockets. Event-driven frameworks such as Netty (netty.io) use this model and have proven to be very scalable and to make good use of the system's hardware resources.
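The selector model can be illustrated with a small loopback sketch. It makes several assumptions: the "remote site" is a local `ServerSocketChannel` that sends a fixed body and closes, the server and all client sockets share one selector on one thread, and the worker pool mentioned above is left out for brevity. `SelectorSketch` and `fetchAll` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class SelectorSketch {

    // Opens `clients` non-blocking connections to a local stand-in server and
    // returns how many responses were read to end-of-stream, all on one thread.
    static int fetchAll(int clients) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            for (int i = 0; i < clients; i++) {
                SocketChannel client = SocketChannel.open();
                client.configureBlocking(false);
                // A loopback connect may complete immediately; register accordingly.
                boolean connected = client.connect(address);
                client.register(selector,
                        connected ? SelectionKey.OP_READ : SelectionKey.OP_CONNECT);
            }

            int completed = 0;
            ByteBuffer buffer = ByteBuffer.allocate(256);
            while (completed < clients) {
                selector.select(); // the only blocking call in the loop
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        // Stand-in remote site: send a fixed body, then close.
                        SocketChannel peer = server.accept();
                        if (peer != null) {
                            peer.write(ByteBuffer.wrap(
                                    "page".getBytes(StandardCharsets.US_ASCII)));
                            peer.close();
                        }
                    } else if (key.isConnectable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        if (client.finishConnect()) {
                            key.interestOps(SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) { // end of stream: response complete
                            client.close();
                            completed++;
                        }
                    }
                }
            }
            return completed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + fetchAll(20));
    }
}
```

In a crawler the readable branch would hand the received bytes to a small worker pool for parsing, so the selector thread stays free to service other sockets.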




Answer 4:


I'd say use something like Akka to manage the threads for you. Use the Jersey HTTP client library with non-blocking I/O, which works with callbacks if I remember correctly. That is possibly the ideal setup for this type of task.



Source: https://stackoverflow.com/questions/6065374/is-there-a-way-to-determine-the-ideal-number-of-threads
