Submit a Python project to Dataproc job

别那么骄傲 2021-01-20 18:16

I have a Python project whose folder has the following structure:

main_directory - lib - lib.py
               - run - script.py

script.py

2 answers
  •  旧巷少年郎
    2021-01-20 18:51

    If you want to preserve the project structure when submitting a Dataproc job, package your project into a .zip file and pass it via the --py-files parameter when submitting the job:

    gcloud dataproc jobs submit pyspark --cluster=$CLUSTER_NAME --region=$REGION \
      --py-files libs.zip \
      run/script.py
    

    To create the zip archive, run:

    cd main_directory/
    zip -x run/script.py -r libs.zip .
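
If the `zip` utility is not available, roughly the same archive can be built from Python. This is a minimal sketch, not part of the original answer: the helper name `build_libs_zip` and its parameters are hypothetical, and it assumes the layout from the question (a `lib/` package next to `run/script.py`). It stores only files, not directory entries, which should be enough for importing modules from the archive.

```python
import zipfile
from pathlib import Path


def build_libs_zip(project_dir, zip_name="libs.zip", exclude=("run/script.py",)):
    """Zip every file under project_dir except the excluded paths.

    Hypothetical helper: roughly mirrors `zip -x run/script.py -r libs.zip .`
    run from inside project_dir.
    """
    root = Path(project_dir).resolve()
    archive = root / zip_name
    # Skip the driver script and the archive itself (it lives in the same folder).
    skipped = set(exclude) | {zip_name}
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            rel = path.relative_to(root).as_posix()
            if path.is_file() and rel not in skipped:
                # Store paths relative to the project root, e.g. "lib/lib.py".
                zf.write(path, arcname=rel)
    return archive
```

Calling `build_libs_zip("main_directory")` would write `main_directory/libs.zip`, which can then be passed to `--py-files` as in the `gcloud` command above.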
    

    Refer to this blog post for more details on how to package dependencies in a zip archive for PySpark jobs.
