Optimal Block Size for a Hadoop Cluster


Question


I am working on a four-node Hadoop cluster. I have run a series of experiments with the following block sizes and measured the run times.

All runs used a 20GB input file: 64MB - 32 min, 128MB - 19 min, 256MB - 15 min, 1GB - 12.5 min.

Should I go further and try a 2GB block size? Also, kindly suggest an optimal block size if similar operations are performed on a 90GB file. Thanks!
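
For context, here is a back-of-the-envelope sketch of the map-task counts these block sizes imply, assuming the common default of one map task per HDFS block (plain Java, no Hadoop dependencies; the sizes are the ones from the question):

```java
// Back-of-the-envelope: expected map tasks = ceil(fileSize / blockSize),
// assuming the default of one map task per HDFS block.
public class SplitCount {
    public static void main(String[] args) {
        final long MB = 1024L * 1024, GB = 1024L * MB;
        long[] inputs = {20 * GB, 90 * GB};                  // the 20GB runs and the 90GB question
        long[] blocks = {64 * MB, 128 * MB, 256 * MB, GB, 2 * GB};
        for (long input : inputs) {
            for (long block : blocks) {
                long maps = (input + block - 1) / block;      // ceiling division
                System.out.printf("input=%2dGB block=%4dMB -> %4d map tasks%n",
                        input / GB, block / MB, maps);
            }
        }
    }
}
```

At 64MB the 20GB file spawns 320 map tasks; at 1GB only 20, which tracks the falling run times reported above.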


Answer 1:


You should test with 2GB and compare the results.

Just consider the trade-off: a larger block size reduces the overhead of creating map tasks, but for non-local tasks Hadoop must transfer the entire block to the remote node (network bandwidth is the limit here), so a smaller block size performs better in that case.
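
To make the bandwidth side of that trade-off concrete, here is a minimal sketch assuming a hypothetical 1Gbit/s link between nodes (roughly 119MiB/s effective throughput; the figure is an assumption, not a measurement from your cluster):

```java
// Rough cost of a non-local map task: the whole block must cross the
// network before processing starts. The bandwidth figure is an assumption.
public class RemoteReadPenalty {
    public static void main(String[] args) {
        double linkMiBPerSec = 119.0;                    // hypothetical 1 Gbit/s effective throughput
        long[] blockSizesMiB = {64, 128, 256, 1024, 2048};
        for (long size : blockSizesMiB) {
            System.out.printf("block=%4d MiB -> ~%4.1f s transfer per non-local task%n",
                    size, size / linkMiBPerSec);
        }
    }
}
```

A misplaced 2GB block costs roughly 17 seconds of pure transfer versus about half a second for a 64MB block; the penalty grows linearly with block size.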

In your case, with 4 nodes (which I assume are connected by a local switch or router on a LAN), 2GB isn't a problem. But the same answer doesn't hold in other environments with higher error rates.
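
If you do run the 2GB test, one way to produce the input without changing the cluster-wide default is to set the block size per file at write time. Below is a minimal sketch using the HDFS client API; it assumes hadoop-common and hadoop-hdfs on the classpath, fs.defaultFS pointing at your NameNode, and hypothetical file paths:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class UploadWithBlockSize {
    public static void main(String[] args) throws Exception {
        // 2 GB; HDFS requires the block size to be a multiple of the
        // checksum chunk size (512 bytes by default), which this is.
        long blockSize = 2L * 1024 * 1024 * 1024;

        Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path dst = new Path("/bench/input-2g.dat");      // hypothetical destination path
        try (InputStream in = new BufferedInputStream(
                     new FileInputStream("input-20g.dat"));  // hypothetical local source
             // create(path, overwrite, bufferSize, replication, blockSize)
             OutputStream out = fs.create(dst, true, 4096, (short) 3, blockSize)) {
            IOUtils.copyBytes(in, out, 4096);
        }
        System.out.println("Wrote " + dst + " with block size " + blockSize);
    }
}
```

For a one-off upload, passing the property on the command line (e.g. hdfs dfs -D dfs.blocksize=2147483648 -put ...) should achieve the same thing without any code; dfs.blocksize is the Hadoop 2.x property name (dfs.block.size in 1.x).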



Source: https://stackoverflow.com/questions/28145178/optimal-block-size-for-a-hadoop-cluster
