What do the following options do in Sqoop?

礼貌的吻别 2020-12-23 23:10

Can anyone tell me what --split-by and --boundary-query are used for in Sqoop?

sqoop import --connect jdbc:mysql://localhost/my --username user --passw

5 answers
  •  北荒 (original poster)
    2020-12-23 23:54

    Sqoop imports data in parallel, and --split-by and --boundary-query give you more control over how that work is divided. If you're just importing a table, Sqoop splits on the table's PRIMARY KEY by default. If you're importing the result of a free-form --query, you need to name the column to split on yourself with --split-by.

    For example:

      sqoop import \
        --connect 'jdbc:mysql://.../...' \
        --direct \
        --username uname --password pword \
        --hive-import \
        --hive-table query_import \
        --boundary-query 'SELECT 0, MAX(id) FROM a' \
        --query 'SELECT a.id, a.name, b.id, b.name FROM a, b WHERE a.id = b.id AND $CONDITIONS' \
        --num-mappers 3 \
        --split-by a.id \
        --target-dir /data/import \
        --verbose
    

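    --boundary-query lets you supply an optimized query that returns the minimum and maximum values of the split column. Without it, Sqoop will try to compute MIN(a.id) and MAX(a.id) over your --query statement, which can be slow when the query is complex.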
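To make the fallback concrete, here is a minimal Python sketch of the kind of boundary query Sqoop would have to run when --boundary-query is not given. It is an illustration, not Sqoop's actual code: the function name `default_boundary_query`, the `(1 = 1)` placeholder substitution, and the `t1` alias are assumptions, and the exact SQL Sqoop generates may differ between versions. The sample query is a simplified, hypothetical single-table one.

```python
def default_boundary_query(query: str, split_column: str) -> str:
    """Wrap a free-form import query in a MIN/MAX lookup on the split column.

    Assumption: $CONDITIONS is replaced by an always-true predicate so the
    whole result set is scanned once to find the bounds.
    """
    if "$CONDITIONS" not in query:
        raise ValueError("a free-form query must contain $CONDITIONS")
    unfiltered = query.replace("$CONDITIONS", "(1 = 1)")
    return (
        f"SELECT MIN({split_column}), MAX({split_column}) "
        f"FROM ({unfiltered}) AS t1"
    )


# Hypothetical single-table query, used only for illustration.
sample_query = "SELECT a.id, a.name FROM a WHERE $CONDITIONS"
print(default_boundary_query(sample_query, "id"))
```

Because the wrapped query runs the full SELECT (joins included) just to find two numbers, a hand-written query such as 'SELECT 0, MAX(id) FROM a' is usually much cheaper.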

    With min=0 and max=30, for example, the result is 3 queries that run in parallel, along these lines:

    SELECT a.id, a.name, b.id, b.name FROM a, b WHERE a.id = b.id AND a.id BETWEEN 0 AND 10;
    SELECT a.id, a.name, b.id, b.name FROM a, b WHERE a.id = b.id AND a.id BETWEEN 11 AND 20;
    SELECT a.id, a.name, b.id, b.name FROM a, b WHERE a.id = b.id AND a.id BETWEEN 21 AND 30;
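The way the range gets divided can be sketched in a few lines of Python. This is a simplified approximation under stated assumptions, not Sqoop's implementation: `split_points` and `split_conditions` are hypothetical helper names, the split column is assumed to be an integer, and the ranges are written as half-open comparisons (>= and <, with the last range closed), which covers the same rows as the BETWEEN ranges shown above.

```python
def split_points(lo: int, hi: int, num_mappers: int) -> list[int]:
    """Return evenly spaced boundaries covering the closed range [lo, hi]."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    span = hi - lo
    # Never create more pieces than there are distinct steps in the range.
    pieces = max(1, min(num_mappers, span))
    return [lo + (span * i) // pieces for i in range(pieces + 1)]


def split_conditions(column: str, lo: int, hi: int, num_mappers: int) -> list[str]:
    """Build one WHERE fragment per mapper; each would replace $CONDITIONS."""
    points = split_points(lo, hi, num_mappers)
    last = len(points) - 2
    conditions = []
    for i, (start, end) in enumerate(zip(points, points[1:])):
        # Only the final range includes its upper bound, so no row is read twice.
        upper_op = "<=" if i == last else "<"
        conditions.append(f"{column} >= {start} AND {column} {upper_op} {end}")
    return conditions


query = "SELECT a.id, a.name, b.id, b.name FROM a, b WHERE a.id = b.id AND $CONDITIONS"
for condition in split_conditions("a.id", 0, 30, 3):
    print(query.replace("$CONDITIONS", condition))
```

Each mapper gets the same query with $CONDITIONS replaced by its own range, which is why the placeholder has to appear in the --query text.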
    
