Question
I created an AWS EMR cluster through the regular EMR cluster wizard in the AWS Management Console, where I was able to select a security configuration. When you export the equivalent CLI command, it shows up as --security-configuration 'mySecurityConfigurationValue'.
I now need to create a similar EMR cluster through AWS Data Pipeline, but I don't see any option for specifying this security-configuration field.
The only similar fields I see are EmrManagedSlaveSecurityGroup, EmrManagedMasterSecurityGroup, AdditionalSlaveSecurityGroups, AdditionalMasterSecurityGroups, and SubnetId. I already have all of those filled out in my Pipeline configuration but I just need to also specify the security-configuration. Any thoughts?
Answer 1:
Unfortunately, Data Pipeline does not support the Security Configurations feature, nor other features introduced in the EMR 5.x versions, such as using a custom AMI.
One solution for this is to:
1. Replace the EmrCluster in your pipeline with an EC2 resource
2. Use a ShellCommandActivity on the EC2 resource to run the aws emr create-cluster CLI command
3. Use a bootstrap step to install TaskRunner on the cluster
4. Replace all the runsOn properties in your pipeline with workerGroup so the tasks run on the EMR cluster you created in step 2
5. Add a final ShellCommandActivity at the end of the pipeline to terminate the cluster using the CLI
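As a rough sketch of the create-cluster and terminate steps above, the Python helper below builds the two CLI command strings that the ShellCommandActivity objects could run. The cluster name, release label, subnet ID, instance settings, and cluster ID are placeholder assumptions, not values from the question. Installing TaskRunner through a bootstrap action is not covered here.

```python
import shlex


def build_create_cluster_command(name, release_label, security_configuration,
                                 subnet_id, instance_type="m5.xlarge",
                                 instance_count=3):
    """Return the argv list for `aws emr create-cluster`.

    All argument values are illustrative; adjust them to your environment.
    """
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release_label,
        "--applications", "Name=Spark",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
        "--ec2-attributes", f"SubnetId={subnet_id}",
        # The option that the EmrCluster pipeline object does not expose.
        "--security-configuration", security_configuration,
        # Print only the new cluster ID so a later step can reuse it.
        "--query", "ClusterId",
        "--output", "text",
    ]


def build_terminate_command(cluster_id):
    """Return the argv list that terminates the cluster at the end of the pipeline."""
    return ["aws", "emr", "terminate-clusters", "--cluster-ids", cluster_id]


def to_shell(argv):
    """Quote an argv list into a single string for a ShellCommandActivity command."""
    return shlex.join(argv)


if __name__ == "__main__":
    create = build_create_cluster_command(
        name="pipeline-cluster",                # hypothetical name
        release_label="emr-5.30.0",             # assumed EMR release
        security_configuration="mySecurityConfigurationValue",
        subnet_id="subnet-0example",            # placeholder subnet ID
    )
    print(to_shell(create))
    print(to_shell(build_terminate_command("j-EXAMPLE")))  # placeholder cluster ID
```

Running the script only prints the commands; it makes no AWS calls. The printed strings can then be pasted into the command field of the corresponding ShellCommandActivity.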
Because you are now spinning up the cluster with the CLI, you have access to features such as security configurations, custom AMIs, and instance fleets, and you can still orchestrate the tasks using Data Pipeline.
Source: https://stackoverflow.com/questions/50353136/security-configuration-field-for-aws-data-pipeline-emrcluster