I am going through Hadoop: The Definitive Guide, which explains input splits clearly. It says something along the lines of:

Input splits don't contain actual data; rather, they hold the storage locations of the data on the cluster.
To 1) and 2): I'm not 100% sure, but if a task cannot complete, for whatever reason (including something being wrong with its input split), it is terminated and another one is started in its place. So each map task gets exactly one split holding file information. You can quickly verify this by debugging against a local cluster and inspecting what the InputSplit object holds; as I recall, it is just the location metadata, not any actual data.
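To make "metadata, not data" concrete, here is a toy stand-in (plain Java, not the real org.apache.hadoop.mapreduce.lib.input.FileSplit, and the class and field names are my own) mirroring what a file split typically carries: a path, a byte offset, a length, and the hosts holding replicas. There are no file contents anywhere in it.

```java
// Toy model of a file-based input split: metadata only, no bytes of data.
// Illustrative sketch; not Hadoop's actual FileSplit class.
public class ToySplitDemo {
    static final class ToyFileSplit {
        final String path;     // which file this split refers to
        final long start;      // byte offset where the split begins
        final long length;     // how many bytes belong to this split
        final String[] hosts;  // nodes holding replicas of these bytes

        ToyFileSplit(String path, long start, long length, String[] hosts) {
            this.path = path;
            this.start = start;
            this.length = length;
            this.hosts = hosts;
        }
    }

    // Render the split the way you might see it while debugging.
    static String describe(ToyFileSplit split) {
        return split.path + " @" + split.start
                + " len=" + split.length
                + " hosts=" + String.join(",", split.hosts);
    }

    public static void main(String[] args) {
        ToyFileSplit split = new ToyFileSplit(
                "/data/logs/part-0000", 0L, 128L * 1024 * 1024,
                new String[] { "node1", "node2", "node3" });
        // The split tells the scheduler WHERE the data lives; the record
        // reader later opens the file and reads [start, start + length).
        System.out.println(describe(split));
    }
}
```

The scheduler uses the host list for locality (preferring to run the map task on a node that already has the block); only the record reader actually opens the file.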
To 3): if the file format is splittable, Hadoop will try to cut the file into input-split-sized chunks; if not, it's one task per file, regardless of the file size. If you raise the minimum split size (mapreduce.input.fileinputformat.split.minsize in the current API, mapred.min.split.size in the old one), you can prevent too many map tasks from being spawned when each of your input files would otherwise be divided at the block size. To go the other way and pack several small files into one split, you need CombineFileInputFormat (not the combiner class, which is a different, reduce-side thing).
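The arithmetic behind this is simple: the split size is max(minSize, min(maxSize, blockSize)). Below is a standalone sketch of that formula (the name computeSplitSize matches the method in Hadoop's FileInputFormat, but this is my own re-implementation, not the library code), showing how raising the minimum split size shrinks the number of map tasks for a splittable file.

```java
public class SplitSizeDemo {
    // Same formula FileInputFormat uses to pick a split size:
    // max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Approximate number of splits (hence map tasks) for a splittable file.
    static long splitCount(long fileLen, long splitSize) {
        return (fileLen + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long block = 128L << 20;   // 128 MB HDFS block size
        long file  = 1024L << 20;  // a 1 GB splittable input file

        // Defaults: tiny min, huge max, so the block size wins.
        long defaultSplit = computeSplitSize(block, 1L, Long.MAX_VALUE);
        // Raise the minimum split size to 512 MB.
        long bigMinSplit  = computeSplitSize(block, 512L << 20, Long.MAX_VALUE);

        System.out.println("default:    " + splitCount(file, defaultSplit) + " map tasks"); // 8
        System.out.println("min=512MB:  " + splitCount(file, bigMinSplit) + " map tasks");  // 2
    }
}
```

With the defaults the 1 GB file yields eight 128 MB splits; bumping the minimum to 512 MB collapses that to two map tasks, each spanning four blocks (at some cost in data locality, since one task now reads blocks that may live on different nodes).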