Using Spark to write a parquet file to s3 over s3a is very slow

闹比i 2020-12-04 18:16

I'm trying to write a Parquet file out to Amazon S3 using Spark 1.6.1. The small Parquet file that I'm generating is …

4 Answers
  •  情书的邮戳
    2020-12-04 18:49

    The direct output committer is gone from the Spark codebase; you would have to write your own or resurrect the deleted code in your own JAR. If you do so, turn speculation off in your job, and be aware that other failures can cause problems too, where "problems" means invalid data.
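
    As a rough illustration of the speculation advice above, here is a minimal Spark 1.6.x (Scala) sketch; the app name, bucket and paths are placeholders, not anything taken from the question:

        import org.apache.spark.{SparkConf, SparkContext}
        import org.apache.spark.sql.SQLContext

        val conf = new SparkConf()
          .setAppName("parquet-to-s3a")        // placeholder app name
          // No atomic S3 committer exists here, so disable speculation to
          // avoid duplicate speculative tasks leaving partial output behind.
          .set("spark.speculation", "false")

        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        // Placeholder input; any DataFrame source works.
        val df = sqlContext.read.json("hdfs:///staging/events.json")

        // Writing straight to an s3a:// path still goes through the slow
        // rename-based commit at the end of the job.
        df.write.parquet("s3a://your-bucket/warehouse/events/")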

    On a brighter note, Hadoop 2.8 is going to add some S3A speedups specifically for reading optimised binary formats (ORC, Parquet) off S3; see HADOOP-11694 for details. Some people are also working on using Amazon DynamoDB as a consistent metadata store, which should eventually allow a robust O(1) commit at the end of a job.
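
    If you do end up on Hadoop 2.8+, the read-side speedups are mostly a matter of configuration. A sketch of the kind of setting involved, reusing the sc from the snippet above; treat the value as illustrative rather than a complete tuning guide:

        // Hadoop 2.8+ only: hint the S3A input stream that access is random,
        // which suits columnar formats such as ORC and Parquet.
        sc.hadoopConfiguration.set("fs.s3a.experimental.input.fadvise", "random")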
