Hadoop input split size vs block size

Backend · Unresolved


爱一瞬间的悲伤

I am going through Hadoop: The Definitive Guide, which clearly explains input splits. It goes like:

Input splits doesn’t contain actual data, rath

7 answers

[愿得一人]

2020-12-01 02:24

Input splits are a logical division of your records, whereas HDFS blocks are a physical division of the input data. Processing is most efficient when the two coincide, but in practice they are never perfectly aligned, because records may cross block boundaries. Hadoop guarantees that every record is processed. A machine processing a particular split may fetch a fragment of a record from a block other than its “main” block, and that block may reside on a remote node. The communication cost of fetching a record fragment is inconsequential because it happens relatively rarely.
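To make the boundary handling concrete, here is a minimal Python sketch of how a line-oriented reader can deal with records that straddle split boundaries. It only simulates the idea on an in-memory byte string. As far as I understand it, it follows the same convention as Hadoop's LineRecordReader (skip the first line unless the split starts at offset 0, and read on past the end of the split to finish the last line), but the function names `read_split` and `read_all` are hypothetical and not part of any Hadoop API.

```python
def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-delimited records owned by the split [start, start + length)."""
    end = start + length
    pos = start
    if start != 0:
        # The first (possibly partial) line belongs to the previous split,
        # so discard everything up to and including the next newline.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    # Keep reading while a line *starts* at or before the split end, even if
    # that line's bytes run into the next split (the "remote fragment").
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        line_end = len(data) if nl == -1 else nl
        records.append(data[pos:line_end])
        pos = line_end + 1
    return records


def read_all(data: bytes, split_size: int) -> list[bytes]:
    """Cut data into fixed-size splits and collect the records from each one."""
    records = []
    for start in range(0, len(data), split_size):
        length = min(split_size, len(data) - start)
        records.extend(read_split(data, start, length))
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start in range(0, len(data), 10):
        length = min(10, len(data) - start)
        print(start, read_split(data, start, length))
```

With 10-byte splits, the record "bravo" straddles the first boundary: the first split reads past its own end to finish it, and the second split discards the leftover fragment. Every record is therefore processed exactly once, whatever the split size.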
