What are the following commands in Sqoop?

礼貌的吻别 2020-12-23 23:10

Can anyone tell me what --split-by and the boundary query are used for in Sqoop?

sqoop import --connect jdbc:mysql://localhost/my --username user --passw

5 answers
  •  庸人自扰
    2020-12-24 00:06

    Split by:

    1. Why is it used? -> To speed up fetching data from an RDBMS into Hadoop.
    2. How does it work? -> By default Sqoop uses 4 mappers, so the import runs in parallel. Sqoop takes the primary key column as the split column, finds its minimum and maximum values, and divides that range into 4 equal-width ranges, one per mapper. E.g. with about 1000 records whose primary key has min value = 0 and max value = 1000, Sqoop creates 4 ranges: (0-250), (250-500), (500-750), (750-1000). Each mapper imports the rows whose key falls in its range and stores them on HDFS. The ranges are equal in width, not in row count, so if the primary key values are not evenly distributed, the mappers get unequal amounts of work. In that case you can use --split-by to name a different, more evenly distributed column to split on.

    In short: it is used to partition the data to support parallelism and improve performance.
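
The range arithmetic described in this answer can be sketched in a few lines of Python. This is only an illustration, not Sqoop's own code: the function names `split_ranges` and `where_clauses` and the column name `id` are hypothetical, and the choice to make every range half-open except the last is an assumption of the sketch.

```python
def split_ranges(min_val, max_val, num_mappers=4):
    """Divide [min_val, max_val] into num_mappers contiguous, equal-width ranges.

    Illustrative only: mirrors the 0-1000 example in the answer above.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must not be smaller than min_val")
    span = max_val - min_val
    # Integer arithmetic keeps the boundaries exact for integer keys.
    bounds = [min_val + span * i // num_mappers for i in range(num_mappers + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_mappers)]


def where_clauses(column, ranges):
    """Build one filter per mapper (assumed shape: half-open, last range closed)."""
    clauses = []
    for i, (lo, hi) in enumerate(ranges):
        upper_op = "<=" if i == len(ranges) - 1 else "<"
        clauses.append(f"{column} >= {lo} AND {column} {upper_op} {hi}")
    return clauses


ranges = split_ranges(0, 1000, 4)
for clause in where_clauses("id", ranges):
    print(clause)
# id >= 0 AND id < 250
# id >= 250 AND id < 500
# id >= 500 AND id < 750
# id >= 750 AND id <= 1000
```

If most rows had `id` values below 250, the first mapper in this sketch would do nearly all the work, which is the skew that choosing another column with --split-by is meant to avoid.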
