How to bootstrap installation of Python modules on Amazon EMR?

孤独总比滥情好
2020-12-01 07:11

I want to do something really basic: fire up a Spark cluster through the EMR console and run a Spark script that depends on a Python package (for example, Arrow). Wha…

4 answers
  •  孤城傲影
    2020-12-01 07:51

    Depending on whether you are using Python 2 (the default on EMR at the time of writing) or Python 3, the pip install command differs. As recommended in noli's answer, create a shell script, upload it to an S3 bucket, and use it as a bootstrap action.
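
    The upload-and-register steps can be sketched with the AWS CLI. This is a minimal sketch, not taken from noli's answer: the helper name launch_with_bootstrap, the bucket, the cluster name, the instance settings and the file name install_packages.sh (one of the scripts in this answer, saved locally) are all hypothetical placeholders. It assumes the AWS CLI is configured and that the default EMR roles exist (aws emr create-default-roles creates them). In the EMR console you would instead enter the same S3 path as a custom bootstrap action when creating the cluster.

    ```shell
    #!/bin/bash
    # Minimal sketch (hypothetical helper): upload a bootstrap script to S3,
    # then create a Spark cluster that runs it on every node at start-up.
    # Bucket, cluster name and instance settings are placeholders.

    # Usage: launch_with_bootstrap SCRIPT BUCKET RELEASE_LABEL
    launch_with_bootstrap() {
      local script="$1"    # local bootstrap script, e.g. install_packages.sh
      local bucket="$2"    # an S3 bucket you own (placeholder)
      local release="$3"   # EMR release label, e.g. emr-5.20.0
      local s3_path
      s3_path="s3://${bucket}/bootstrap/$(basename "$script")"

      # Step 1: copy the script to S3; stop if the upload fails.
      aws s3 cp "$script" "$s3_path" || return 1

      # Step 2: launch the cluster with the script as a bootstrap action.
      aws emr create-cluster \
        --name "spark-with-python-packages" \
        --release-label "$release" \
        --applications Name=Spark \
        --use-default-roles \
        --instance-type m5.xlarge \
        --instance-count 3 \
        --bootstrap-actions Path="$s3_path",Name=InstallPythonPackages
    }
    ```

    Usage, assuming the function is saved in a file named launch.sh (also a placeholder): source launch.sh && launch_with_bootstrap install_packages.sh my-bucket emr-5.20.0. Choose a release label whose Python setup matches the pip command used in the bootstrap script.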

    For Python 2 (in Jupyter, the default for the pyspark kernel):

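    #!/bin/bash -xe
    # Bootstrap actions run on every node when the cluster starts.
    # Replace your_package with the PyPI name of the module you need.
    sudo pip install your_package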
    

    For Python 3 (in Jupyter, the default for the Python 3 and pyspark3 kernels):

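    #!/bin/bash -xe
    # pip-3.4 installs into Python 3.4; use the pip-3.x matching the
    # Python 3 version installed on your EMR release.
    sudo pip-3.4 install your_package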
    
