Hadoop input split size vs block size

爱一瞬间的悲伤 2020-12-01 01:27

I am going through Hadoop: The Definitive Guide, which clearly explains input splits. It goes like this:

Input splits don't contain actual data, rather …

7 answers
  • 2020-12-01 02:24

    Input splits are a logical division of your records, whereas HDFS blocks are a physical division of the input data. Processing is most efficient when the two coincide, but in practice record boundaries rarely line up exactly with block boundaries, so records may cross from one block into the next. Hadoop still guarantees that every record is processed. A machine processing a particular split may fetch a fragment of a record from a block other than its “main” block, and that block may reside on a remote node. The communication cost of fetching such a fragment is inconsequential because it happens relatively rarely: at most once at each end of a split.
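To make this concrete, here is a small Python simulation (not Hadoop code) of the two rules involved, as I understand them from Hadoop's FileInputFormat and LineRecordReader: the split size is max(minSize, min(maxSize, blockSize)), and a line reader skips a partial first line (the previous split owns it) but reads past its own end offset to finish its last line. The function names, the SPLIT_SLOP handling, the 8-byte "blocks" and the sample data are assumptions for illustration only; compression, custom delimiters and block locations are ignored.

```python
# Simplified simulation of how Hadoop turns a file into splits and splits into
# line records. Function names are hypothetical; this is NOT Hadoop's API.

MIB = 1024 * 1024
SPLIT_SLOP = 1.1  # assumed tolerance: the last split may be up to 1.1x split_size


def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """Split size = max(min_size, min(max_size, block_size)); defaults give block_size."""
    return max(min_size, min(max_size, block_size))


def make_splits(file_length: int, split_size: int) -> list[tuple[int, int]]:
    """Cut a file into (start, length) splits; a small tail is folded into the last split."""
    splits = []
    remaining = file_length
    while remaining / split_size > SPLIT_SLOP:
        splits.append((file_length - remaining, split_size))
        remaining -= split_size
    if remaining > 0:
        splits.append((file_length - remaining, remaining))
    return splits


def read_split_records(data: bytes, start: int, length: int) -> list[bytes]:
    """Return the newline-terminated records owned by one split.

    A split discards its (possibly partial) first line unless it starts at
    byte 0, because the previous split finishes that line. It keeps reading
    while a line starts at or before its end offset, even if that line runs
    into the next block.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip up to and including the first newline at or after `start`.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    records = []
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        records.append(data[pos:stop])  # may extend beyond `end`
        pos = stop + 1
    return records


if __name__ == "__main__":
    # Roughly default settings: min 1 byte, max unbounded, 128 MiB blocks.
    print(compute_split_size(128 * MIB, 1, 2**63 - 1) // MIB)  # 128

    # Tiny 8-byte "blocks" so that records visibly cross block boundaries.
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, length in make_splits(len(data), 8):
        print(start, length, read_split_records(data, start, length))
```

In the sample data, "bravo" occupies bytes 6 to 11, so it starts in the first split but ends in the second. The first split's reader reads past byte 8 to finish it (the "fragment from another block" in the answer), and the second split's reader discards those same bytes, so every record is emitted exactly once.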
