What does it mean to say a web crawler is I/O bound and not CPU bound?

Anonymous (unverified), submitted on 2019-12-03 08:52:47

Question:

I've seen this in some answers on Stack Overflow, where the point is made that the programming language doesn't matter much for a crawler, so C++ is overkill compared with, say, Python. Can someone please explain this in layman's terms, so that there's no ambiguity about what is implied? Clarification of the underlying assumption would also be appreciated.

Thanks

Answer 1:

It means that I/O is the bottleneck here. The act of going out to the net to retrieve a page (I/O) is slower than analysing the page (CPU).

So, making the CPU bit ten times faster will have little effect on the overall time taken. On the other hand, doubling the I/O speed will have a very beneficial effect, right up to the point where CPU starts being the bottleneck.
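To put rough numbers on this, here is a minimal sketch in Python. The figures (200 ms to fetch a page, 10 ms to analyse it) are made up purely for illustration, `page_time` is a hypothetical helper, and the model assumes fetching and analysing happen one after the other with no overlap.

```python
def page_time(io_ms, cpu_ms, io_speedup=1.0, cpu_speedup=1.0):
    """Time to handle one page when fetching and analysing run back to back."""
    return io_ms / io_speedup + cpu_ms / cpu_speedup


# Assumed, illustrative costs per page (not measurements).
IO_MS = 200.0   # waiting on the network
CPU_MS = 10.0   # analysing the downloaded page

baseline = page_time(IO_MS, CPU_MS)
faster_cpu = page_time(IO_MS, CPU_MS, cpu_speedup=10.0)  # CPU part ten times faster
faster_io = page_time(IO_MS, CPU_MS, io_speedup=2.0)     # I/O part twice as fast

print(f"baseline:   {baseline:.0f} ms per page")
print(f"10x CPU:    {faster_cpu:.0f} ms per page")
print(f"2x I/O:     {faster_io:.0f} ms per page")
```

With these assumed numbers, the ten-times-faster CPU saves 9 ms out of 210 ms, while doubling the I/O speed saves 100 ms. That gap is why the choice between C++ and Python tends to matter little for this kind of program.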



Answer 2:

It means that the program spends more time reading and writing (via disk or network) than it does actually running the algorithms in the code. I/O operations are vastly slower than CPU operations, so a program that relies on them usually spends most of its time waiting.



Answer 3:

One thing to add is that during input/output operations your program (unless poorly written) isn't actively using the CPU; it sits in an inactive (sleeping) state until the operation completes.
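You can see this for yourself with a small Python sketch. It uses `time.sleep` as a stand-in for a blocking network read (an assumption, since a real crawler would wait on a socket instead), and `measure_wait` is a hypothetical helper name. It compares the wall-clock time that passes with the CPU time the process actually consumes.

```python
import time


def measure_wait(seconds):
    """Return (wall-clock seconds, CPU seconds) spent across a blocking wait."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process; excludes sleep
    time.sleep(seconds)                # stand-in for waiting on the network
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


wall, cpu = measure_wait(0.2)
print(f"wall-clock time: {wall:.3f} s")
print(f"CPU time:        {cpu:.3f} s")
```

The wall-clock figure should come out at roughly the length of the wait, while the CPU figure stays close to zero, because the process is asleep rather than computing.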


