elastic-map-reduce

Re-use Amazon Elastic MapReduce instance

Submitted by 孤街浪徒 on 2019-11-27 21:29:35
Question: I have tried a simple Map/Reduce task using Amazon Elastic MapReduce, and it took just 3 minutes to complete. Is it possible to re-use the same instance to run another task? Even though I used the instance for only 3 minutes, Amazon will charge for a full hour, so I want to use the remaining 57 minutes to run several other tasks. Answer 1 (Matthew Rathbone): The answer is yes. Here's how you do it using the command-line client: when you create an instance, pass the --alive flag; this tells EMR to keep the cluster around after your job has run. Then you can submit more tasks to the cluster: elastic-mapreduce -
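The answer above uses the legacy elastic-mapreduce Ruby CLI. As an illustration only (not part of the original answer), the same keep-alive idea can be sketched with the modern boto3 EMR API, where KeepJobFlowAliveWhenNoSteps plays the role of the --alive flag. The cluster name, bucket, and jar paths below are hypothetical:

```python
def keep_alive_cluster_params(name, log_uri):
    """Build run_job_flow parameters for a cluster that survives its first job."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-4.1.0",
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": 3,
            # Equivalent of the CLI's --alive flag: keep the cluster running
            # after its steps finish so further steps can be submitted.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
    }

def jar_step(name, jar, args):
    """Build one additional step to submit to the already-running cluster."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }

# With real AWS credentials these would be passed to boto3:
#   emr = boto3.client("emr")
#   cluster = emr.run_job_flow(**keep_alive_cluster_params("reusable", "s3://my-bucket/logs/"))
#   emr.add_job_flow_steps(JobFlowId=cluster["JobFlowId"], Steps=[jar_step("job2", "s3://my-bucket/job.jar", ["arg1"])])
```

Each add_job_flow_steps call reuses the same billed instances, which is exactly the hour-rounding saving the question asks about.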

Why does Yarn on EMR not allocate all nodes to running Spark jobs?

Submitted by 回眸只為那壹抹淺笑 on 2019-11-27 20:02:22
Question: I'm running a job on Apache Spark on Amazon Elastic MapReduce (EMR). Currently I'm on emr-4.1.0, which includes Amazon Hadoop 2.6.0 and Spark 1.5.0. When I start the job, YARN correctly allocates all the worker nodes to the Spark job (with one for the driver, of course). I have the magic "maximizeResourceAllocation" property set to "true", and the Spark property "spark.dynamicAllocation.enabled" also set to "true". However, if I resize the EMR cluster by adding nodes to the CORE
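For reference, the two settings named in the question live in different EMR configuration classifications: "maximizeResourceAllocation" is an EMR-specific property in the "spark" classification, while "spark.dynamicAllocation.enabled" belongs in "spark-defaults". A minimal sketch that builds the Configurations list you would pass when creating the cluster (note, as a likely caveat rather than a certainty, that maximizeResourceAllocation computes executor sizes from the cluster's size at creation time, which is one common reason nodes added later sit idle):

```python
import json

def emr_spark_configurations():
    """EMR Configurations enabling both settings mentioned in the question."""
    return [
        {
            # EMR-specific knob: size executors to use the whole cluster,
            # computed once from the initial node count.
            "Classification": "spark",
            "Properties": {"maximizeResourceAllocation": "true"},
        },
        {
            # Standard Spark property, applied via spark-defaults.conf.
            "Classification": "spark-defaults",
            "Properties": {"spark.dynamicAllocation.enabled": "true"},
        },
    ]

print(json.dumps(emr_spark_configurations(), indent=2))
```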

Backup AWS Dynamodb to S3

Submitted by 天大地大妈咪最大 on 2019-11-27 18:00:46
Question: It has been suggested in the Amazon docs (http://aws.amazon.com/dynamodb/), among other places, that you can back up your DynamoDB tables using Elastic Map Reduce. I have a general understanding of how this could work, but I couldn't find any guides or tutorials on it. So my question is: how can I automate DynamoDB backups (using EMR)? So far, I think I need to create a "streaming" job with a map function that reads the data from DynamoDB and a reduce that writes it to S3, and I believe these could be written in Python (or Java, or a few other languages). Any comments, clarifications, code samples,
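Rather than a hand-written streaming job, the approach Amazon documents for this uses Hive on EMR with the DynamoDB storage handler: an external Hive table over the DynamoDB table, another over an S3 location, and an INSERT to copy between them. A sketch that builds such a Hive script (the column names and mapping are hypothetical placeholders and must match the real table's attributes):

```python
def dynamodb_backup_hive_script(table_name, s3_path):
    """Build a Hive script that copies a DynamoDB table to S3 via EMR."""
    return f"""
-- External table backed by the live DynamoDB table.
CREATE EXTERNAL TABLE ddb_source (id string, payload string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
  "dynamodb.table.name" = "{table_name}",
  "dynamodb.column.mapping" = "id:id,payload:payload"
);

-- External table whose data lives in S3.
CREATE EXTERNAL TABLE s3_backup (id string, payload string)
LOCATION '{s3_path}';

-- The actual backup: scan DynamoDB, write to S3.
INSERT OVERWRITE TABLE s3_backup SELECT * FROM ddb_source;
""".strip()

print(dynamodb_backup_hive_script("Users", "s3://my-bucket/backups/users/"))
```

Automating the backup then amounts to scheduling an EMR step that runs this script.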

How do you use Python UDFs with Pig in Elastic MapReduce?

Submitted by 允我心安 on 2019-11-27 07:05:44
Question: I really want to take advantage of Python UDFs in Pig on our AWS Elastic MapReduce cluster, but I can't quite get things to work properly. No matter what I try, my Pig job fails with the following exception being logged: ERROR 2998: Unhandled internal error. org/python/core/PyException java.lang.NoClassDefFoundError: org/python/core/PyException at org.apache.pig.scripting.jython.JythonScriptEngine.registerFunctions(JythonScriptEngine.java:127) at org.apache.pig.PigServer.registerCode
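The NoClassDefFoundError for org/python/core/PyException typically indicates that the Jython jar is missing from Pig's classpath when the script is registered with REGISTER 'udf.py' USING jython. For context, a minimal Pig UDF file looks like the sketch below; the try/except fallback is an illustration device so the function can also be exercised outside Pig, since pig_util only exists inside Pig's Jython runtime:

```python
try:
    # Available when Pig runs this file under Jython.
    from pig_util import outputSchema
except ImportError:
    # Fallback no-op decorator so the UDF is importable outside Pig.
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema('word:chararray')
def reverse_text(s):
    """Reverse a chararray field; passes through None/empty input."""
    return s[::-1] if s else s
```

In the Pig script this would be registered and called as, e.g., REGISTER 'udf.py' USING jython AS myfuncs; followed by FOREACH data GENERATE myfuncs.reverse_text(word);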
