Question
I'm trying to learn how to run a Java Map/Reduce (M/R) job on Amazon's EMR. The documentation I am following is here: http://aws.amazon.com/articles/3938. I am on a Windows 7 computer.
When I try to run this command, I am shown the help information.
./elasticmapreduce-client.rb RunJobFlow streaming_jobflow.json
Of course, since I am on a Windows machine, I actually type in the command below. I am not sure why, but for this particular command there was no Windows version (all other commands were shown in pairs, one for *nix and one for Windows).
ruby elastic-mapreduce RunJobFlow my_job.json
My question is: how do we submit/run a job from Windows to Amazon's EMR using the command line interface? I've tried searching online, but I get taken to wild places. Any help is appreciated.
Thanks.
Answer 1:
Hmmm. I'm not sure how old the example with RunJobFlow is... I'd personally ignore it.
Are you able to run this?
localhost$ elastic-mapreduce --describe
Once you can, you should play directly on a cluster to shake out the exact steps you need... It's worth doing this so you don't have to start and stop a cluster a bazillion times.
localhost$ elastic-mapreduce --create --alive --num-instances 1
localhost$ elastic-mapreduce -j j-YOUR_ID_HERE --ssh
cluster$ hadoop jar my.jar -D some=1 -D args=1 blah blah
cluster$ hadoop jar some_other_jar.jar -D foo -D bar
cluster$ ^D
localhost$ elastic-mapreduce -j j-YOUR_ID_HERE --terminate
Then, when you're happy with the steps and need them to run headless (say, from cron), you can have EMR orchestrate the steps (including the cluster terminating itself at the end):
localhost$ elastic-mapreduce --create --num-instances 1
localhost$ elastic-mapreduce --jar my_jar.jar --args "-D,some=1,-D,args=1,blah,blah"
localhost$ elastic-mapreduce --jar some_other_jar.jar --args "-D,foo,-D,bar"
I'd only explore the --json stuff if you need more complex steps, it's a bit cryptic and hard to get right first time...
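The --args value in the commands above is just the individual hadoop arguments joined with commas, which is easy to get wrong by hand. A minimal sketch in Python of building that string from a list; the helper name to_emr_args is hypothetical and not part of the EMR CLI, and the comma check is my own assumption that an argument containing a comma cannot be passed safely this way:

```python
def to_emr_args(args):
    """Join individual arguments into the comma-separated form used with --args.

    Hypothetical helper, not part of the elastic-mapreduce CLI.
    """
    for arg in args:
        # Assumption: a literal comma inside an argument would be read as a separator.
        if "," in arg:
            raise ValueError(f"argument contains a comma: {arg!r}")
    return ",".join(args)


# The same arguments that were passed to `hadoop jar my.jar` on the cluster.
hadoop_args = ["-D", "some=1", "-D", "args=1", "blah", "blah"]
print(to_emr_args(hadoop_args))  # prints -D,some=1,-D,args=1,blah,blah
```

Keeping the arguments as a list means the interactive `hadoop jar` invocation and the headless --args string can come from the same place.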
Answer 2:
To run a streaming job on EMR, you first need to create a cluster with a command like:
ruby elastic-mapreduce --create --alive --plain-output --master-instance-type m1.xlarge
--slave-instance-type m1.xlarge --num-instances 6 --name "Some Job Cluster" --bootstrap-action s3://<path-to-a-bootstrap-script>
This returns a job flow ID, which looks something like: j-ABCD7EF763
Now you can submit your job step with the following command:
ruby elastic-mapreduce -j j-ABCD7EF763 --stream --step-name "my step name" --mapper
s3://<some-path>/mapper-script.rb --reducer s3://<some-path>/reducer-script.rb --input
s3://<input-path> --output s3://<output-path>
You can also run a job directly instead of running a streaming job, in which case the cluster will terminate itself when the job ends.
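The step-submission command above is shown wrapped over several lines, but on Windows it has to go in as a single command. One way to avoid retyping it is to assemble it as an argument list, sketched below in Python. The function name build_streaming_step_command is hypothetical; the flags are only the ones that appear in the command above, and the s3:// values are the same placeholders, which you would replace with real paths before running anything.

```python
import subprocess


def build_streaming_step_command(job_id, step_name, mapper, reducer,
                                 input_path, output_path):
    """Return the argument list for adding a streaming step to a job flow.

    Hypothetical helper; it only rearranges the flags shown in the answer.
    """
    return [
        "ruby", "elastic-mapreduce",
        "-j", job_id,
        "--stream",
        "--step-name", step_name,
        "--mapper", mapper,
        "--reducer", reducer,
        "--input", input_path,
        "--output", output_path,
    ]


command = build_streaming_step_command(
    job_id="j-ABCD7EF763",                       # ID returned by --create
    step_name="my step name",
    mapper="s3://<some-path>/mapper-script.rb",  # placeholder path
    reducer="s3://<some-path>/reducer-script.rb",
    input_path="s3://<input-path>",
    output_path="s3://<output-path>",
)

# Show the command on one line; arguments with spaces get double quotes.
print(subprocess.list2cmdline(command))

# To actually submit the step (requires ruby and the EMR CLI on this machine):
# subprocess.run(command, check=True)
```

Passing the list to subprocess.run, rather than pasting the printed line, also sidesteps the quoting of "my step name" in the Windows shell.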
Answer 3:
Try using the --json option.
e.g. ./elastic-mapreduce --create --name Multisteps --json wordcount_jobflow.json
You will need to trim your JSON file down to only the Steps (removing everything outside the []). There is a thread discussing that: https://forums.aws.amazon.com/thread.jspa?threadID=35093
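If you would rather not trim the file by hand, a rough sketch in Python of the same idea follows. It assumes the original file is a JSON object whose step list sits under a top-level "Steps" key (as in a RunJobFlow request); the function names and the sample data are hypothetical, so check them against your own file.

```python
import json


def extract_steps(jobflow):
    """Return only the list of steps from a parsed job flow description."""
    if isinstance(jobflow, list):
        return jobflow  # already trimmed to the bare [] list
    # Assumption: the steps live under a top-level "Steps" key.
    return jobflow["Steps"]


def trim_jobflow_file(src_path, dst_path):
    """Read a full job flow JSON file and write a file holding only the steps."""
    with open(src_path, encoding="utf-8") as src:
        jobflow = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(extract_steps(jobflow), dst, indent=2)


# Small in-memory example with made-up content.
full = {"Name": "Multisteps", "Steps": [{"Name": "step one"}, {"Name": "step two"}]}
print(json.dumps(extract_steps(full)))
```

You would then pass the trimmed file to --json, for example after calling trim_jobflow_file("wordcount_jobflow.json", "wordcount_steps.json") (the second file name is made up).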
Source: https://stackoverflow.com/questions/9621579/how-to-run-a-mapreduce-job-on-amazons-elastic-mapreduce-emr-cluster-from-wind