Re-use Amazon Elastic MapReduce instance

Submitted by 孤街浪徒 on 2019-11-27 21:29:35
Matthew Rathbone

The answer is yes.

Here's how you do it using the command line client:

When you create an instance, pass the --alive flag. This tells EMR to keep the cluster around after your job has run.
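As a minimal sketch of that first step, the snippet below builds the create command with --alive. It assumes the legacy Ruby elastic-mapreduce client is installed and configured. The helper name emr_create_cmd and the job flow name "persistent-cluster" are hypothetical and chosen only for illustration. The helper prints the command rather than running it, so you can inspect it first.

```shell
# Hypothetical helper (not part of the client): prints the create command
# instead of running it, so the sketch is safe to try without AWS credentials.
emr_create_cmd() {
    # $1: a job flow name of your choosing (illustrative only)
    printf 'elastic-mapreduce --create --alive --name "%s"\n' "$1"
}

emr_create_cmd "persistent-cluster"
```

Running the printed command should start a job flow that stays up until you terminate it. The client reports the new job flow ID, which is the value to pass as the job ID in later commands.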

Then you can submit more tasks to the cluster:

elastic-mapreduce --jobflow <job-id> --stream --input <s3dir> --output <s3dir> --mapper <script1> --reducer <script2>

To terminate the cluster later, simply run:

elastic-mapreduce <job-id> --terminate
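The submit-then-terminate cycle could be wrapped in a small script along these lines. This is only a sketch: the function names, the DRY_RUN switch, the job flow ID j-EXAMPLE and the s3n://my-bucket paths are all hypothetical. Passing the ID through --jobflow on the terminate call is also an assumption of this sketch, mirroring the step-submission command. By default nothing is executed; the commands are only printed.

```shell
# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really call the client (assumed to be on PATH).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# submit_streaming_step JOBFLOW INPUT OUTPUT MAPPER REDUCER
submit_streaming_step() {
    run elastic-mapreduce --jobflow "$1" --stream \
        --input "$2" --output "$3" --mapper "$4" --reducer "$5"
}

# terminate_jobflow JOBFLOW
terminate_jobflow() {
    run elastic-mapreduce --jobflow "$1" --terminate
}

# Hypothetical values, for illustration only.
JOBFLOW="j-EXAMPLE"
BUCKET="s3n://my-bucket"

# Two steps reuse the same running cluster, then it is shut down.
submit_streaming_step "$JOBFLOW" "$BUCKET/in" "$BUCKET/out-1" "$BUCKET/map.py" "$BUCKET/reduce.py"
submit_streaming_step "$JOBFLOW" "$BUCKET/out-1" "$BUCKET/out-2" "$BUCKET/map2.py" "$BUCKET/reduce2.py"
terminate_jobflow "$JOBFLOW"
```

Because the cluster was started with --alive, each step runs on the already-provisioned instances, and the terminate call at the end is what stops the billing.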

Try running elastic-mapreduce --help to see all the commands you can run.

If you don't have the command line client, get it here.

Using:

elastic-mapreduce --jobflow <job-id> \
    --jar s3n://some-path/x.jar \
    --step-name "New step name" \
    --args ...

you can also add non-streaming steps to your cluster (just so you don't have to try it yourself ;-) ).

http://aws.amazon.com/elasticmapreduce/faqs/#dev-6

Q: Can I run a persistent job flow?

Yes. Amazon Elastic MapReduce job flows that are started with the --alive flag will continue until explicitly terminated. This allows customers to add steps to a job flow on demand. You may want to use this to debug your job flow logic without having to repeatedly wait for job flow startup. You may also use a persistent job flow to run a long-running data warehouse cluster. This can be combined with data warehouse and analytics packages that run on top of Hadoop, such as Hive and Pig.
