How to read a file from HDFS in map() quickly with Spark

Asked by 无人及你 on 2020-12-18 11:37

I need to read a different file in every map(); the files are in HDFS:

  val rdd = sc.parallelize(1 to 10000)
  val rdd2 = rdd.map { x =>
    val hdfs = org.apache.
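
Roughly, this is what I am trying to do (a sketch; the hdfs://ITS-Hadoop10:9000 URI and the per-element file names are placeholders). Opening the HDFS client once per partition with mapPartitions, instead of once per element inside map, keeps it from being slow:

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}
  import scala.io.Source

  val rdd = sc.parallelize(1 to 10000)
  val rdd2 = rdd.mapPartitions { iter =>
    // Create the HDFS client once per partition, since FileSystem.get is expensive
    val hdfs = FileSystem.get(new URI("hdfs://ITS-Hadoop10:9000/"), new Configuration())
    iter.map { x =>
      val in = hdfs.open(new Path(s"/data/file-$x.txt")) // placeholder path
      try Source.fromInputStream(in).mkString
      finally in.close()
    }
  }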


        
1 Answer
  • Answered 2020-12-18 12:04

    In your case, I recommend the wholeTextFiles method, which returns a pair RDD where the key is each file's full path and the value is the file's contents as a string.

    val filesPairRDD = sc.wholeTextFiles("hdfs://ITS-Hadoop10:9000/")
    // Pair each file's full path with its number of lines; any other
    // function could be applied to the file contents instead.
    val filesLineCount = filesPairRDD.map( x => (x._1, x._2.split("\n").length) )
    filesLineCount.collect()
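
    Since you need a different file per element of your original RDD, one option (just a sketch, assuming all the files together fit in memory, and using a hypothetical element-to-path mapping) is to collect the pair RDD as a map and broadcast it, so each map() task can look up the file it needs:

    // Assumes the combined file contents fit in driver/executor memory
    val byPath = sc.broadcast(filesPairRDD.collectAsMap())
    val rdd2 = rdd.map { x =>
      // hypothetical mapping from element x to a file path
      byPath.value(s"hdfs://ITS-Hadoop10:9000/data/file-$x.txt")
    }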
    

    Edit

    If your files are in subdirectories of the same parent directory (as mentioned in the comments), you can use a glob pattern:

    val filesPairRDD = sc.wholeTextFiles("hdfs://ITS-Hadoop10:9000/*/")
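
    The glob can also be more specific. For example (hypothetical layout), to read only .txt files one level down and hint at the number of partitions:

    // minPartitions is an optional hint for how many partitions to create
    val txtFiles = sc.wholeTextFiles("hdfs://ITS-Hadoop10:9000/*/*.txt", minPartitions = 8)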
    

    Hope this is clear and helpful.
