what are the following commands in sqoop?

礼貌的吻别 2020-12-23 23:10

Can anyone tell me what --split-by and the boundary query are used for in Sqoop?

sqoop import --connect jdbc:mysql://localhost/my --username user --passw

5 answers

庸人自扰 (OP)

2020-12-24 00:06
Split by:
1. Why is it used? -> To speed up importing data from an RDBMS into Hadoop.
2. How does it work? -> By default Sqoop runs 4 mappers (configurable with -m / --num-mappers), so the import runs in parallel. By default Sqoop splits the data on the primary key column: it first queries the minimum and maximum values of that column (this is the boundary query, which you can replace with your own via --boundary-query) and then divides that value range into 4 equal-width ranges, one per mapper. E.g. with 1000 records whose primary key runs from min = 0 to max = 1000, Sqoop creates the ranges 0-250, 250-500, 500-750 and 750-1000, and each mapper imports the rows whose key falls in its range and stores them on HDFS. The ranges are equal in width, not necessarily in row count, so if the primary key values are unevenly distributed, some mappers get far more rows than others. With --split-by you can pick a different column whose values are more evenly distributed.
In short: it is used to partition the data so the import can run in parallel and perform better.
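The splitting described in this answer can be simulated with Python's built-in sqlite3 module. This is a simplified sketch, not Sqoop's source code: compute_splits, rows_per_mapper, build_demo_table and the events table are hypothetical names, and Sqoop's real splitter handles edge cases in its own way. The sketch runs a MIN/MAX boundary query, cuts the range into equal-width slices (each slice includes its lower bound and excludes its upper bound, except the last), and counts how many rows each of 4 mappers would read for a skewed column (id) and an evenly distributed one (seq).

```python
import sqlite3
from typing import List, Optional, Tuple


def compute_splits(lo: int, hi: int, num_mappers: int = 4) -> List[Tuple[int, int]]:
    """Divide the closed interval [lo, hi] into up to num_mappers equal-width ranges."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if lo == hi:
        return [(lo, hi)]
    span = hi - lo
    # Integer split points; duplicate points (when span < num_mappers) are merged.
    points = sorted({lo + (i * span) // num_mappers for i in range(num_mappers)} | {hi})
    return list(zip(points[:-1], points[1:]))


def rows_per_mapper(conn: sqlite3.Connection, table: str, col: str,
                    num_mappers: int = 4,
                    boundary_sql: Optional[str] = None) -> List[int]:
    """Simulate a split-by import and count the rows each mapper would read.

    table and col are interpolated into SQL, so they must be trusted identifiers.
    """
    # Step 1: boundary query. The default is SELECT MIN(col), MAX(col);
    # boundary_sql stands in for a user-supplied --boundary-query (hypothetical hook).
    sql = boundary_sql or f"SELECT MIN({col}), MAX({col}) FROM {table}"
    lo, hi = conn.execute(sql).fetchone()
    splits = compute_splits(lo, hi, num_mappers)
    counts = []
    for i, (start, end) in enumerate(splits):
        # Step 2: each mapper reads one range; only the last range includes its upper bound.
        upper_op = "<=" if i == len(splits) - 1 else "<"
        query = f"SELECT COUNT(*) FROM {table} WHERE {col} >= ? AND {col} {upper_op} ?"
        counts.append(conn.execute(query, (start, end)).fetchone()[0])
    return counts


def build_demo_table() -> sqlite3.Connection:
    """1001 rows: seq is spread evenly over 0..1000, id has a huge gap near the end."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, seq INTEGER)")
    rows = [(i if i < 991 else 1_000_000 + i, i) for i in range(1001)]
    conn.executemany("INSERT INTO events (id, seq) VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo_table()
    print(compute_splits(0, 1000))                # -> [(0, 250), (250, 500), (500, 750), (750, 1000)]
    print(rows_per_mapper(conn, "events", "id"))  # -> [991, 0, 0, 10]  (skewed key)
    print(rows_per_mapper(conn, "events", "seq")) # -> [250, 250, 250, 251]  (even column)
    print(rows_per_mapper(conn, "events", "id",
                          boundary_sql="SELECT 0, 990"))  # -> [247, 248, 247, 249]
```

Splitting on the skewed id leaves two mappers idle and one doing almost all the work, while splitting on seq balances the load, which is the reason to reach for --split-by. In the last call the custom boundary query gives an even split, but in this simulation the 10 rows with id above 990 are never read, so a hand-written boundary query should still cover the full range of the split column.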