How to bootstrap installation of Python modules on Amazon EMR?

后端 未结 4 1097
孤独总比滥情好
孤独总比滥情好 2020-12-01 07:11

I want to do something really basic, simply fire up a Spark cluster through the EMR console and run a Spark script that depends on a Python package (for example, Arrow). Wha

4条回答
  •  长情又很酷
    2020-12-01 08:02

    The most straightforward way would be to create a bash script containing your installation commands, copy it to S3, and set a bootstrap action from the console to point to your script.

    Here's an example I'm using in production:

    s3://mybucket/bootstrap/install_python_modules.sh

    #!/bin/bash -xe
    
    # Non-standard and non-Amazon Machine Image Python modules:
    sudo pip install -U \
      awscli            \
      boto              \
      ciso8601          \
      ujson             \
      workalendar
    
    sudo yum install -y python-psycopg2
    

提交回复
热议问题