When submitting a job with PySpark, how can I access static files uploaded with the --files argument?

Backend · Unresolved · 3 answers · 1056 views
佛祖请我去吃肉 2021-02-08 00:53

For example, I have a folder:

/
  - test.py
  - test.yml

and the job is submitted to the Spark cluster with:

gcloud beta dataproc jobs
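The command above is truncated; for context, a full Dataproc submit might look like the following sketch. The cluster name and region here are placeholders, not values from the question:

```shell
# Hedged sketch: submit test.py as a PySpark job to Dataproc,
# distributing test.yml to the job's working directory via --files.
# --cluster and --region values are assumptions for illustration.
gcloud dataproc jobs submit pyspark test.py \
    --cluster=my-cluster \
    --region=us-central1 \
    --files=test.yml
```

`--files` takes a comma-separated list; each listed file is shipped with the job and can then be resolved at runtime, as the answer below shows.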

3 Answers
  •  我寻月下人不归
    2021-02-08 01:30

    Files distributed using SparkContext.addFile (and --files) can be accessed via SparkFiles. It provides two methods:

    • getRootDirectory() - returns root directory for distributed files
    • get(filename) - returns absolute path to the file

    I am not sure if there are any Dataproc specific limitations but something like this should work just fine:

    import logging

    from pyspark import SparkFiles

    # --files ships test.yml to each node; SparkFiles.get resolves
    # the file's absolute local path on the current node.
    with open(SparkFiles.get('test.yml')) as test_file:
        logging.info(test_file.read())
    
