I am going through Hadoop: The Definitive Guide, which clearly explains input splits. It goes like this:
Input splits don’t contain actual data, rath
Input splits are a logical division of your records, whereas HDFS blocks are a physical division of the input data. Processing is most efficient when the two coincide, but in practice they are never perfectly aligned, and records may cross block boundaries. Hadoop guarantees that all records are processed. A machine processing a particular split may fetch a fragment of a record from a block other than its “main” block, and that block may reside remotely. The communication cost of fetching a record fragment is inconsequential because it happens relatively rarely.
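
To make the boundary handling concrete, here is a minimal Python sketch of how I understand a line-oriented record reader deals with records that straddle splits. It is a toy model over an in-memory byte string, not Hadoop code: the function names `split_ranges` and `read_split` and the 8-byte "block size" are my own assumptions for illustration. The rule it assumes is that a split not starting at offset 0 skips its first (possibly partial) line, and every split keeps reading past its end to finish the last line it started.

```python
def split_ranges(total_len: int, split_size: int) -> list[tuple[int, int]]:
    """Return (start, length) pairs covering the input (hypothetical helper)."""
    return [
        (start, min(split_size, total_len - start))
        for start in range(0, total_len, split_size)
    ]


def read_split(data: bytes, start: int, length: int) -> list[bytes]:
    """Read the newline-delimited records that belong to one split.

    Assumed rule: a record belongs to the split in which it starts, except
    that a record starting exactly at a split's end offset is read by the
    earlier split, because the later split always discards its first line.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip the first line: the previous split reads it in full.
        newline = data.find(b"\n", pos)
        if newline == -1:
            return []
        pos = newline + 1
    records = []
    # Keep reading while the record *starts* at or before the split end,
    # even if its bytes run into the next split (the "remote" fragment).
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


data = b"alpha\nbravo\ncharlie\ndelta\n"
per_split = [read_split(data, s, n) for s, n in split_ranges(len(data), 8)]
all_records = [record for records in per_split for record in records]
```

With 8-byte splits, "bravo" starts in the first split but ends in the second, so the first reader fetches the tail of that record from the neighbouring range. Each record still ends up in exactly one split, which is the guarantee the quoted passage describes.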