why it is used? -> to enhance the speed while fetching the data from rdbms to hadoop
How it works? -> By default there are 4 mappers in sqoop , so the import works parallely. The entire data is divided into equal partitions. Sqoop considers primary key column for splitting the data and then finds out the maximum and minimum range from it and then makes the 4 ranges for 4 mappers to work.
Eg. 1000 records in primary key column and max value =1000 and min value -0 so sqoop will create 4 ranges - (0-250) , (250-500),(500-750),(750-1000) and depending on values of column the data will be partitioned and given to 4 mappers to store it on HDFS.
so if in case the primary key column is not evenly distributed so with split-by you can change the column-name for evenly partitioning.
In short:Used for partitioning of data to support parallelism and improve performance