Sqoop Incremental Import


别那么骄傲 2021-01-30 15:27

I need advice on Sqoop incremental imports. Say I have a customer with Policy 1 on Day 1. I imported those records into HDFS on Day 1, and I can see them in the part files.
On Day 2,

8 answers

没有蜡笔的小新 (original poster)

2021-01-30 15:51

Let's take an example: you have a customer table with two columns, custid and policy, where custid is the primary key, and you only want to import rows with a custid greater than 100.

Scenario 1: append new data based on the custid field

Phase 1:

The 3 records below were recently inserted into the customer table, and we want to import them into HDFS.

| custid | Policy |
| 101 | 1 |
| 102 | 2 | 
| 103 | 3 |

Here is the Sqoop command for that:

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--append \
--check-column custid \
--incremental append \
--last-value 100
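
To make the role of --last-value concrete, here is a rough sketch of the selection that append mode performs: only rows whose check column is greater than the last value are picked up, and the highest value seen becomes the --last-value for the next run. This is a toy simulation with awk, not Sqoop itself. The helper names are made up for illustration, and the 100,0 row is an invented older row that is assumed to be in HDFS already.

```shell
# custid,policy rows: 101-103 are the phase 1 rows from the answer,
# 100,0 is a made-up older row that was imported on an earlier run.
rows='100,0
101,1
102,2
103,3'

# Hypothetical helper: print the rows whose custid (field 1) is greater than $1.
select_new_rows() {
  printf '%s\n' "$rows" | awk -F, -v last="$1" '$1 + 0 > last + 0'
}

# Hypothetical helper: the highest custid among the selected rows,
# which is the value to pass as --last-value on the next run.
next_last_value() {
  select_new_rows "$1" | awk -F, '$1 + 0 > max + 0 { max = $1 } END { print max }'
}

select_new_rows 100   # prints the rows for custid 101, 102 and 103
next_last_value 100   # prints 103
```

The value 103 printed at the end is why the next run has to start from --last-value 103.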

Phase 2:

The 4 records below were recently inserted into the customer table, and we want to import them into HDFS.
| custid | Policy |
| 104 | 4 |
| 105 | 5 | 
| 106 | 6 | 
| 107 | 7 | 

Here is the Sqoop command for that:

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--append \
--check-column custid \
--incremental append \
--last-value 103

So these are the four options to consider when importing newly inserted records:

--append
--check-column
--incremental append
--last-value
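
Tracking --last-value by hand between runs is easy to get wrong. One common alternative is a Sqoop saved job, which records the last imported value after each successful run. The sketch below only defines two wrapper functions and does not run anything by itself. The job name and function names are made up for illustration, and how the password is prompted for or stored depends on how your Sqoop metastore is configured.

```shell
# Hypothetical job name, not from the original answer.
JOB_NAME=customer_incremental

# Create the saved job once; 100 is the starting point from phase 1.
# Note the space between "--" and "import".
create_customer_job() {
  sqoop job --create "$JOB_NAME" -- import \
    --connect jdbc:mysql://localhost:3306/db \
    --username root -P \
    --table customer \
    --target-dir /user/hive/warehouse// \
    --append \
    --check-column custid \
    --incremental append \
    --last-value 100
}

# Run the job on each later import; Sqoop updates the stored last value itself.
run_customer_job() {
  sqoop job --exec "$JOB_NAME"
}
```

With this in place you would call create_customer_job once and run_customer_job on Day 2, Day 3 and so on, instead of editing --last-value each time.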

Scenario 2: append new data and update existing data based on the custid field

One new record with custid 108 was inserted, and the records with custid 101 and 102 were updated recently in the customer table. We want to import these changes into HDFS.
| custid | Policy |
| 108 | 8 |
| 101 | 11 | 
| 102 | 12 | 

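Note that lastmodified mode compares a date or timestamp column against --last-value, so custid cannot serve as the check column here, and --append would write the updated rows next to the old ones instead of replacing them. The command below therefore assumes the table has a timestamp column that is set on every insert and update (updated_at is a hypothetical name; the two-column example table does not have one), and it uses --merge-key so that rows with the same custid are merged:

sqoop import \
--connect jdbc:mysql://localhost:3306/db \
--username root -P \
--table customer \
--target-dir /user/hive/warehouse// \
--merge-key custid \
--check-column updated_at \
--incremental lastmodified \
--last-value "<timestamp of the previous import>"

So these are the four options to consider when handling inserts and updates in the same command:

--merge-key
--check-column
--incremental lastmodified
--last-value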
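
To see what "update existing data" should look like in HDFS, here is a small sketch of a merge on custid: the newly imported rows are laid over the existing ones, and for each custid the newer row wins. Sqoop does this kind of reconciliation when --merge-key is given on a lastmodified import. The awk version below is only a toy simulation of that outcome, and merge_by_key is a made-up helper name.

```shell
# Rows already in HDFS (custid,policy), as imported in scenario 1.
existing='101,1
102,2
103,3
104,4
105,5
106,6
107,7'

# Rows picked up by the scenario 2 import: one insert and two updates.
delta='108,8
101,11
102,12'

# Hypothetical helper: keep one row per custid, letting a later row replace
# an earlier one with the same key, then sort the result by custid.
merge_by_key() {
  printf '%s\n%s\n' "$existing" "$delta" |
    awk -F, '{ latest[$1] = $0 } END { for (k in latest) print latest[k] }' |
    sort -t, -k1,1n
}

merge_by_key
```

After the merge there are 8 rows, with custid 101 and 102 carrying policies 11 and 12. A plain append would instead leave 10 rows, with the old and new versions of 101 and 102 side by side.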

I specifically mentioned the primary key because, if the table does not have one, a few more options need to be considered:
by default Sqoop runs the import with multiple mappers, and the mappers need the data to be split on some key, so
either we have to pass the -m 1 option so that only one mapper performs the import,
or we have to specify another key (using the Sqoop option --split-by) through which you can uniquely identify the data, then you can use
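
The choice between the two options can be sketched as a small dry-run helper that prints the import command instead of running it. The function names are made up for illustration; the only Sqoop flags used are -m and --split-by, plus the connection options from the commands above.

```shell
# Hypothetical helper: choose the parallelism flags for a table.
# $1 is a column to split on; pass an empty string when no suitable column exists.
mapper_options() {
  if [ -n "$1" ]; then
    printf '%s\n' "--split-by $1"
  else
    printf '%s\n' "-m 1"
  fi
}

# Hypothetical helper: print (rather than run) the resulting import command.
print_import_command() {
  printf 'sqoop import --connect jdbc:mysql://localhost:3306/db --username root -P --table customer --target-dir /user/hive/warehouse// %s\n' "$(mapper_options "$1")"
}

print_import_command custid   # command ends with: --split-by custid
print_import_command ""       # command ends with: -m 1
```

A single mapper is the simpler choice for small tables. A split column lets several mappers share the work, and it works best when its values are spread fairly evenly.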

                                                        

              
                