pyspark error: java.io.IOException: No FileSystem for scheme: gs

前端未结


I am trying to read a JSON file from a Google Cloud Storage bucket into a PySpark DataFrame on a local Spark machine. Here's the code:

import pandas as pd
import numpy a


                      
1 Answer

温柔的废话
2020-12-21 17:23

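Spark does not recognize the "gs" scheme out of the box. The Google Cloud Storage connector must be on the classpath, and a few Hadoop config params must be set so that "gs" resolves to a filesystem implementation.
First, add the connector jar, gcs-connector-hadoop2-latest.jar, when building the session:
from pyspark.sql import SparkSession

spark = SparkSession \
        .builder \
        .config("spark.jars", "/path/to/gcs-connector-hadoop2-latest.jar") \
        .getOrCreate()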

Other configs that can be set from pyspark:
# Map the "gs" scheme to the connector's filesystem class
spark._jsc.hadoopConfiguration().set('fs.gs.impl', 'com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem')
# Required if you are using a service account; the value must be the string 'true'
spark._jsc.hadoopConfiguration().set('fs.gs.auth.service.account.enable', 'true')
spark._jsc.hadoopConfiguration().set('google.cloud.auth.service.account.json.keyfile', "/path/to/keyfile")
# The following are required if you are using OAuth
spark._jsc.hadoopConfiguration().set('fs.gs.auth.client.id', 'YOUR_OAUTH_CLIENT_ID')
spark._jsc.hadoopConfiguration().set('fs.gs.auth.client.secret', 'OAUTH_SECRET')

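Alternatively, you can set these configs in core-site.xml or spark-defaults.conf. In spark-defaults.conf, Hadoop properties need the spark.hadoop. prefix (for example, spark.hadoop.fs.gs.impl).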
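
As a rough sketch of that alternative, the same settings can be collected as Spark properties using the spark.hadoop. prefix, which Spark copies into the Hadoop configuration. They can then be passed to the session builder or written out as spark-defaults.conf lines. The helper names gcs_spark_conf, build_session and to_spark_defaults are made up for this illustration, and the property keys are simply the ones from the answer above; check them against the connector version you actually use.

```python
GCS_FS_CLASS = "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem"


def gcs_spark_conf(jar_path, keyfile=None):
    """Collect the settings from the answer as Spark properties (hypothetical helper).

    Hadoop keys get the 'spark.hadoop.' prefix so Spark forwards them
    to the Hadoop configuration when the session starts.
    """
    conf = {
        "spark.jars": jar_path,
        "spark.hadoop.fs.gs.impl": GCS_FS_CLASS,
    }
    if keyfile is not None:
        # Service-account authentication, using the keys quoted in the answer
        conf["spark.hadoop.fs.gs.auth.service.account.enable"] = "true"
        conf["spark.hadoop.google.cloud.auth.service.account.json.keyfile"] = keyfile
    return conf


def to_spark_defaults(conf):
    """Render the properties as 'key value' lines for spark-defaults.conf."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def build_session(conf):
    """Create a SparkSession with the given properties (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported here so the helpers above work without pyspark

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With pyspark installed and the jar downloaded, usage would look something like spark = build_session(gcs_spark_conf("/path/to/gcs-connector-hadoop2-latest.jar", "/path/to/keyfile")) followed by df = spark.read.json("gs://your-bucket/path/to/file.json"), where the bucket and file path are placeholders.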
                                                                        
                                                        
            
            
              
                