I know a little about Hadoop
and am curious to know how it works.
To be precise, I want to know how exactly it divides/splits the input file.
Does it
This is dependent on the InputFormat, which for most file-based formats is defined in the FileInputFormat base class.
There are a number of configurable options that determine whether Hadoop processes a single file as one split or divides it into multiple splits:
- See the isSplitable() implementation for your input format for more information (note that the method name is spelled with a single "t" in the Hadoop source).
- The properties mapred.min.split.size and mapred.max.split.size help the input format when breaking up blocks into splits. Note that the minimum size may be overridden by the input format (which may have a fixed minimum input size).

If you want to know more, and are comfortable looking through the source, check out the getSplits() method in FileInputFormat (both the new and old API have the same method, but they may have some subtle differences).
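To make the split-size logic more concrete, here is a rough, standalone sketch loosely modelled on what getSplits() does in the new-API FileInputFormat. It does not use any Hadoop classes: the class name SplitSketch and the getSplits() signature below are made up for illustration, and the formula max(minSize, min(maxSize, blockSize)) and the 10% slack on the last split are my reading of the source, so check the version you actually run.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not part of Hadoop.
public class SplitSketch {
    // The final split is allowed to be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Block size clamped between the configured minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs describing how one file would be split.
    static List<long[]> getSplits(long fileLength, boolean splittable,
                                  long blockSize, long minSize, long maxSize) {
        List<long[]> splits = new ArrayList<>();
        if (fileLength == 0 || !splittable) {
            // Empty or non-splittable (e.g. gzip) files become a single split.
            splits.add(new long[] {0, fileLength});
            return splits;
        }
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileLength;
        // Carve off full-size splits while more than 1.1 split sizes remain.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (possibly slightly over one split size) is the last split.
        if (remaining != 0) {
            splits.add(new long[] {fileLength - remaining, remaining});
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with 128 MB blocks and default-like min/max sizes.
        for (long[] s : getSplits(300 * mb, true, 128 * mb, 1L, Long.MAX_VALUE)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With these assumptions, a 300 MB file on 128 MB blocks ends up as three splits (128 MB, 128 MB, 44 MB), while a 130 MB file stays as one split because it is within the 10% slack. Raising the minimum split size above the block size would produce fewer, larger splits.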