Why does submitting a job to MapReduce take so much time in general?


南方客 2020-12-19 07:19

On a 20-node cluster, submitting a job to process 3 GB of data (200 splits) usually takes about 30 seconds, while the actual execution takes about 1 minute. I want to understand what the bottleneck is.
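For reference, here is a rough sketch of what the figures above imply per split. It assumes the Hadoop 2.x default `dfs.blocksize` of 128 MiB, which the question does not state; the variable names are illustrative only.

```python
# Figures taken from the question: about 3 GB of input in 200 splits.
TOTAL_INPUT_MIB = 3 * 1024
NUM_SPLITS = 200

# Assumption: the Hadoop 2.x default dfs.blocksize of 128 MiB.
BLOCK_SIZE_MIB = 128

# Average amount of data each split (and so each map task) handles.
avg_split_mib = TOTAL_INPUT_MIB / NUM_SPLITS

# Number of splits if every split filled one whole block (ceiling division).
splits_at_block_size = -(-TOTAL_INPUT_MIB // BLOCK_SIZE_MIB)

print(f"average split: {avg_split_mib:.2f} MiB")
print(f"splits if each filled one block: {splits_at_block_size}")
```

An average split of roughly 15 MiB is well below the assumed block size, so a noticeable share of the elapsed time may go to per-task setup rather than to processing data.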

3 answers

暖寄归人 (OP)

2020-12-19 07:49
I have seen a similar issue, and the solution can be broken into the following steps:
1. When HDFS stores too many small files relative to its fixed block size, it becomes inefficient. The best fix is to remove all unnecessary files and small data files, then try again.
2. Reset the data nodes and name nodes:
  - Stop all the services using stop-all.sh.
  - Format the name node (note that this erases the existing HDFS metadata).
  - Reboot the machine.
  - Start all services using start-all.sh.
  - Check the data nodes and name nodes.
3. Try installing a lower version of Hadoop (Hadoop 2.5.2). This worked in two cases, found by trial and error.
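To check whether step 1 applies, one option is to scan a recursive HDFS listing for files smaller than the block size. The sketch below parses text in the format printed by `hdfs dfs -ls -R` (permissions, replication, owner, group, size, date, time, path). The function name `find_small_files`, the sample listing, and the 128 MiB threshold are assumptions for illustration, not part of the original answer.

```python
def find_small_files(ls_output, threshold_bytes):
    """Return (path, size) pairs for files smaller than threshold_bytes.

    ls_output is text in the format printed by `hdfs dfs -ls -R`.
    """
    small = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so paths containing spaces stay whole.
        fields = line.split(None, 7)
        # Skip summary lines such as "Found N items" and directory entries.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        # Field 4 is the file size in bytes; ignore lines that do not fit.
        if not fields[4].isdigit():
            continue
        size = int(fields[4])
        if size < threshold_bytes:
            small.append((fields[7], size))
    return small


# Hypothetical listing, made up for this example.
SAMPLE_LISTING = """\
drwxr-xr-x   - hadoop supergroup          0 2020-12-19 07:00 /data/input
-rw-r--r--   3 hadoop supergroup   15728640 2020-12-19 07:01 /data/input/part-00000
-rw-r--r--   3 hadoop supergroup  134217728 2020-12-19 07:02 /data/input/part-00001
-rw-r--r--   3 hadoop supergroup       2048 2020-12-19 07:03 /data/input/part-00002
"""

# Assumption: 128 MiB, the Hadoop 2.x default block size.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024

small_files = find_small_files(SAMPLE_LISTING, BLOCK_SIZE_BYTES)
for path, size in small_files:
    print(f"{path}\t{size} bytes")
```

On a real cluster the listing text could come from running `hdfs dfs -ls -R` on the job's input directory and passing the captured output to the function. A large number of hits suggests the input is fragmented into many small files.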