How to install custom packages on Amazon EMR with a bootstrap action, in code?

做~自己de王妃 submitted on 2019-12-05 08:33:55

boto provides the class boto.emr.bootstrap_action.BootstrapAction for defining a bootstrap action.

Define it as shown below. Most of the code comes from the boto example page.

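import boto.emr
from boto.emr.bootstrap_action import BootstrapAction
from boto.emr.step import StreamingStep

# The script at `path` runs on every node before any step starts.
# boto 2's BootstrapAction takes bootstrap_action_args; pass [] if the script takes none.
action = BootstrapAction(name="Bootstrap to add SimpleCV",
                         path="s3n://<my bucket uri>/bootstrap-simplecv.sh",
                         bootstrap_action_args=[])

# Example step from the boto tutorial; replace it with your own job.
step = StreamingStep(name='My wordcount example',
                     mapper='s3n://elasticmapreduce/samples/wordcount/wordSplitter.py',
                     reducer='aggregate',
                     input='s3n://elasticmapreduce/samples/wordcount/input',
                     output='s3n://<my output bucket>/output/wordcount_output')

conn = boto.emr.connect_to_region('us-west-2')
jobid = conn.run_jobflow(name='My jobflow',
                         log_uri='s3://<my log uri>/jobflow_logs',
                         steps=[step],
                         bootstrap_actions=[action])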
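run_jobflow returns the job flow ID right away, while the nodes are still starting and running the bootstrap script. Below is a minimal sketch for blocking until bootstrapping is over, assuming boto 2's `EmrConnection.describe_jobflow(jobid)` returns an object with a `state` attribute. The helper name `wait_for_jobflow` and its default timings are hypothetical; it only relies on `describe_jobflow`, so any object with that method works.

```python
import time

# EMR job flow states during which the bootstrap phase is not yet finished.
PENDING_STATES = {"STARTING", "BOOTSTRAPPING"}


def wait_for_jobflow(conn, jobid, poll_seconds=30, timeout_seconds=3600):
    """Poll an EMR job flow until it leaves STARTING/BOOTSTRAPPING.

    Hypothetical helper: `conn` is assumed to be a boto 2 EmrConnection
    (or anything with a compatible describe_jobflow method).
    Returns the first non-pending state seen; raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = conn.describe_jobflow(jobid).state
        if state not in PENDING_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job flow {jobid} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

A returned state of WAITING or RUNNING means the bootstrap succeeded; SHUTTING_DOWN, FAILED or TERMINATED usually means it did not, and the bootstrap logs under log_uri are the place to look.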

You also need to write the bootstrap script itself. If you need a different version of Python, then yes, it saves time to precompile it on a machine identical to the EMR nodes (same AMI and architecture), tar it, upload it to an S3 bucket, and untar it during the bootstrap.
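The tar-and-untar part of that workflow can be sketched with the standard library's tarfile module. This is a minimal sketch, assuming the precompiled Python lives in a single directory and that the tarball is uploaded to S3 separately; `make_tarball` and `untar_commands` are hypothetical helper names, and using `hadoop fs -copyToLocal` to fetch from S3 on the node is an assumption about the EMR AMI, not something the answer specifies.

```python
import os
import shlex
import tarfile


def make_tarball(src_dir, out_path):
    """Pack src_dir into a gzipped tarball whose members start with src_dir's name."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return out_path


def untar_commands(s3_uri, dest_dir):
    """Shell lines for the bootstrap script that fetch and unpack the tarball.

    Assumption: `hadoop fs -copyToLocal` can read s3:// URIs on the EMR node.
    """
    local = "/tmp/" + os.path.basename(s3_uri)
    return [
        f"hadoop fs -copyToLocal {shlex.quote(s3_uri)} {shlex.quote(local)}",
        f"sudo tar -xzf {shlex.quote(local)} -C {shlex.quote(dest_dir)}",
    ]
```

For example, untar_commands("s3://bucket/python27.tar.gz", "/usr/local") yields the two lines to paste into the bootstrap script after `set -e -x`.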

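#!/bin/sh
# filename: bootstrap-simplecv.sh  (save it in an S3 bucket)
# Assumes a Debian-based EMR AMI (apt-get). Bootstrap actions run non-interactively,
# so -y avoids a confirmation prompt that would abort the script under set -e.
set -e -x

sudo apt-get -y install python-setuptools
sudo easy_install pip
sudo pip install -U SimpleCV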
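Since the question is about custom packages in general, the same script works for any list of pip packages. A minimal sketch that renders it from Python; `render_bootstrap_script` is a hypothetical helper, and the apt-get line keeps the original script's assumption of a Debian-based AMI. The resulting text would then be uploaded to S3 and referenced by the BootstrapAction `path`.

```python
import shlex


def render_bootstrap_script(packages):
    """Return the text of a bootstrap shell script that pip-installs `packages`.

    Mirrors the hand-written script: apt-get for setuptools (Debian AMI assumed),
    easy_install to get pip, then a single `pip install -U` for all packages.
    Package specs are shell-quoted so version pins like 'numpy>=1.6' are safe.
    """
    if not packages:
        raise ValueError("need at least one package")
    pkgs = " ".join(shlex.quote(p) for p in packages)
    lines = [
        "#!/bin/sh",
        "set -e -x",
        "",
        "sudo apt-get -y install python-setuptools",
        "sudo easy_install pip",
        f"sudo pip install -U {pkgs}",
    ]
    return "\n".join(lines) + "\n"
```

Calling render_bootstrap_script(["SimpleCV"]) reproduces the script above minus its comments.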

I think you can keep the EMR cluster running from within boto (run_jobflow accepts keep_alive=True) so that the bootstrap runs only once per session. Just be careful to terminate it before you log out, so you don't get a surprise on your bill.
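One way to make the shutdown hard to forget is a context manager. A minimal sketch, assuming boto 2's `EmrConnection.run_jobflow(..., keep_alive=True)` and `EmrConnection.terminate_jobflow(jobflow_id)`; the name `persistent_jobflow` is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def persistent_jobflow(conn, **run_kwargs):
    """Start a job flow that stays up between steps, and always terminate it.

    Assumes `conn` is a boto 2 EmrConnection: keep_alive=True leaves the
    cluster WAITING after its steps finish, so bootstrap actions run once;
    terminate_jobflow() shuts it down on exit, even if the body raises.
    """
    jobid = conn.run_jobflow(keep_alive=True, **run_kwargs)
    try:
        yield jobid
    finally:
        conn.terminate_jobflow(jobid)
```

Inside the with block, further work can be submitted to the already-bootstrapped cluster with conn.add_jobflow_steps(jobid, [step]).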
