Question
I need to set a custom environment variable in EMR so that it is available when running a Spark application.
I have tried adding this:
...
--configurations '[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Configurations": [],
        "Properties": { "SOME-ENV-VAR": "qa1" }
      }
    ],
    "Properties": {}
  }
]'
...
I also tried replacing "spark-env" with "hadoop-env", but nothing seems to work.
There is this answer from the AWS forums, but I can't figure out how to apply it.
I'm running EMR 5.3.1 and launching the cluster with a preconfigured step from the CLI: aws emr create-cluster...
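For reference, on EMR the spark-env classification is written into /etc/spark/conf/spark-env.sh on the cluster nodes, so one way to see whether the export was applied is to SSH in and inspect that file. A minimal diagnostic sketch; the key file and master hostname are placeholders:
# SSH to the master node; "hadoop" is the default EMR login user,
# the key path and host are placeholders for your own values
ssh -i ~/my-key.pem hadoop@<master-public-dns>
# the spark-env/export classification should have landed here
grep SOME-ENV-VAR /etc/spark/conf/spark-env.sh
# note: POSIX shell variable names cannot contain hyphens, so even if
# "SOME-ENV-VAR" appears in the file, the export itself will fail when
# the file is sourced; a name like SOME_ENV_VAR is needed to be usable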
Answer 1:
Add the custom configuration, like the JSON below, to a file, say custom_config.json:
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
Then, when creating the EMR cluster, pass the file reference to the --configurations option:
aws emr create-cluster --configurations file://custom_config.json --other-options...
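Once the cluster is up, a variable exported through spark-env is visible to the Spark driver, since EMR sources spark-env.sh when launching Spark. A minimal PySpark sketch to read it back; VARIABLE_NAME is the name used in custom_config.json above, and on executors the variable may not be set unless it is also propagated there (e.g. via spark.executorEnv.* or the yarn-env route in Answer 2):
import os
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("env-var-check").getOrCreate()

# This runs on the driver, where spark-env.sh has been sourced,
# so the exported variable should be present in the environment.
print("VARIABLE_NAME =", os.environ.get("VARIABLE_NAME"))

spark.stop()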
Answer 2:
For me, replacing spark-env with yarn-env fixed the issue.
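A sketch of what that looks like, reusing the custom_config.json shape from Answer 1 with only the classification swapped; plausibly this works because YARN containers, and hence Spark executors, inherit the NodeManager environment:
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]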
Source: https://stackoverflow.com/questions/42395020/how-to-set-a-custom-environment-variable-in-emr-to-be-available-for-a-spark-appl