Question
I ran a simple Map/Reduce task on Amazon Elastic MapReduce, and it took just 3 minutes to complete. Is it possible to reuse the same instance to run another task?
Even though I used the instance for only 3 minutes, Amazon will charge me for a full hour, so I want to use the remaining 57 minutes to run several other tasks.
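Under hourly billing, the question's arithmetic works like this: 3 minutes of use is billed as a full hour, which leaves 57 paid minutes. Below is a minimal sketch. It assumes each started instance-hour is billed in full, with a one-hour minimum. That is the model the question describes, and it may not match current EMR pricing. The function name `remaining_paid_minutes` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the question's billing arithmetic. ASSUMPTION: every started
# instance-hour is billed in full (rounded up), with a one-hour minimum.
# remaining_paid_minutes is a hypothetical helper, not part of any AWS tool.

# Print the minutes still left in the already-billed hour(s) after $1 minutes of use.
remaining_paid_minutes() {
  local used="$1" billed_hours
  billed_hours=$(( (used + 59) / 60 ))   # round up to whole hours
  if [ "$billed_hours" -lt 1 ]; then     # assumed one-hour minimum
    billed_hours=1
  fi
  echo $(( billed_hours * 60 - used ))
}

remaining_paid_minutes 3    # prints 57 (one billed hour minus 3 minutes used)
```

Under the same assumption, 61 minutes of use is billed as two hours, so 59 paid minutes would remain.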
Answer 1:
The answer is yes.
Here's how to do it with the command-line client:
When you create the cluster (job flow), pass the --alive flag. This tells EMR to keep the cluster running after your job has finished.
Then you can submit more tasks to the cluster:
elastic-mapreduce --jobflow <job-id> --stream --input <s3dir> --output <s3dir> --mapper <script1> --reducer <script2>
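To run several tasks on the same persistent cluster, you can repeat the streaming command above once per task. Below is a minimal sketch that wraps it in a function. It assumes the legacy `elastic-mapreduce` Ruby CLI is on your PATH. The helper name `add_streaming_step`, the `EMR` override variable, and every job flow id and S3 path are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: add streaming steps to one persistent (--alive) job flow.
# ASSUMPTIONS: the legacy "elastic-mapreduce" CLI is on PATH; the helper name,
# job flow ids and S3 paths used with it are hypothetical placeholders.
set -eu

EMR="${EMR:-elastic-mapreduce}"   # set EMR=echo to print the command instead of running it

# Add one streaming step to an existing job flow (same flags as the command above).
add_streaming_step() {
  local jobflow="$1" input="$2" output="$3" mapper="$4" reducer="$5"
  "$EMR" --jobflow "$jobflow" --stream \
    --input "$input" --output "$output" \
    --mapper "$mapper" --reducer "$reducer"
}
```

For example, `add_streaming_step j-XXXX s3n://my-bucket/in1 s3n://my-bucket/out1 s3n://my-bucket/mapper.py s3n://my-bucket/reducer.py` adds one step. Calling it again with different paths adds another step to the same cluster, so you don't pay for a new cluster startup each time.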
To terminate the cluster later, simply run:
elastic-mapreduce <jobid> --terminate
Run elastic-mapreduce --help to see all the commands you can run.
If you don't have the command-line client, get it here.
Answer 2:
Using:
elastic-mapreduce --jobflow <job-id> \
--jar s3n://some-path/x.jar \
--step-name "New step name" \
--args ...
you can also add non-streaming (custom JAR) steps to your cluster (just so you don't have to try it yourself ;-) ).
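Below is a minimal sketch that wraps the JAR-step command above so it can be reused for several steps on the same cluster. It assumes the legacy `elastic-mapreduce` CLI is on your PATH and that `--args` takes a comma-separated list of step arguments. The helper name `add_jar_step`, the `EMR` override variable, the JAR path, and the arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: add a custom-JAR (non-streaming) step to an existing job flow.
# ASSUMPTIONS: legacy "elastic-mapreduce" CLI on PATH; --args is assumed to take
# a comma-separated list; helper name, JAR path and arguments are hypothetical.
set -eu

EMR="${EMR:-elastic-mapreduce}"   # set EMR=echo to print the command instead of running it

# Add one JAR step (same flags as the command above) to the given job flow.
add_jar_step() {
  local jobflow="$1" jar="$2" step_name="$3" args="$4"
  "$EMR" --jobflow "$jobflow" \
    --jar "$jar" \
    --step-name "$step_name" \
    --args "$args"
}
```

For example, `add_jar_step j-XXXX s3n://some-path/x.jar "New step name" "s3n://my-bucket/in,s3n://my-bucket/out"` would add one JAR step. The input and output paths in this example are placeholders.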
Answer 3:
http://aws.amazon.com/elasticmapreduce/faqs/#dev-6
Q: Can I run a persistent job flow? Yes. Amazon Elastic MapReduce job flows that are started with the --alive flag will continue until explicitly terminated. This allows customers to add steps to a job flow on demand. You may want to use this to debug your job flow logic without having to repeatedly wait for job flow startup. You may also use a persistent job flow to run a long-running data warehouse cluster. This can be combined with data warehouse and analytics packages that run on top of Hadoop, such as Hive and Pig.
Source: https://stackoverflow.com/questions/6880283/re-use-amazon-elastic-mapreduce-instance