Question
I am creating clusters on EMR and configuring Zeppelin to read its notebooks from S3. To do that, I am using a JSON object that looks like this:
[
  {
    "Classification": "zeppelin-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "ZEPPELIN_NOTEBOOK_STORAGE": "org.apache.zeppelin.notebook.repo.S3NotebookRepo",
          "ZEPPELIN_NOTEBOOK_S3_BUCKET": "hs-zeppelin-notebooks",
          "ZEPPELIN_NOTEBOOK_USER": "user"
        },
        "Configurations": []
      }
    ]
  }
]
I am pasting this object into the Software configuration page of EMR. My question is: how and where can I configure the Spark interpreter directly, without having to configure it manually from Zeppelin each time I start a cluster?
Answer 1:
This is a bit involved. You will need to do two things:
- Edit Zeppelin's interpreter.json
- Restart the interpreter
So you need to write a shell script and then add an extra step to the EMR cluster configuration that runs it.
The Zeppelin configuration is JSON, so you can use jq (a command-line JSON processor) to manipulate it. I don't know exactly what you want to change, but here is an example that adds the (mysteriously missing) DepInterpreter:
#!/bin/bash
set -e
# 1. Edit the Spark interpreter.
# Write to a temporary file first: piping jq straight back into
# interpreter.json can truncate the file before it has been read.
tmp=$(mktemp)
jq '.interpreterSettings."2ANGGHHMQ".interpreterGroup |= . + [{"class":"org.apache.zeppelin.spark.DepInterpreter", "name":"dep"}]' /etc/zeppelin/conf/interpreter.json > "$tmp"
sudo -u zeppelin tee /etc/zeppelin/conf/interpreter.json < "$tmp" > /dev/null
rm -f "$tmp"
# 2. Trigger a restart of the Spark interpreter
curl -X PUT http://localhost:8890/api/interpreter/setting/restart/2ANGGHHMQ
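As an aside, the setting ID 2ANGGHHMQ is hardcoded in the script and may differ on your cluster. A minimal sketch of looking it up by name instead, assuming each entry under .interpreterSettings carries a "name" field such as "spark" (check your own interpreter.json). The function name find_interpreter_id is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the ID of the interpreter setting whose
# "name" field matches the second argument.
# Assumes interpreter.json has the shape
#   {"interpreterSettings": {"<id>": {"name": "...", ...}, ...}}
find_interpreter_id() {
  local conf_file="$1" name="$2"
  jq -r --arg name "$name" \
    '.interpreterSettings | to_entries[] | select(.value.name == $name) | .key' \
    "$conf_file"
}

# Example (path taken from the script above):
# id=$(find_interpreter_id /etc/zeppelin/conf/interpreter.json spark)
```

The returned ID could then replace the literal 2ANGGHHMQ in both the jq filter and the restart URL.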
Save the script as script.sh and upload it to an S3 bucket. Then start your EMR cluster with
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://eu-west-1.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[s3://mybucket/script.sh]
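Note that the script-runner JAR lives in a region-specific bucket (eu-west-1 in the line above), so the path should match the region your cluster runs in. A small sketch that builds the same --steps value for any region. The helper name build_step_arg is hypothetical, and the bucket name is a placeholder:

```shell
#!/bin/bash
# Hypothetical helper: print the --steps value for a given region and
# script location, following the pattern of the step shown above.
build_step_arg() {
  local region="$1" script_uri="$2"
  printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]\n' \
    "$region" "$script_uri"
}

# Example (placeholder bucket):
# build_step_arg eu-west-1 s3://mybucket/script.sh
```

The output can be passed as the value of --steps when creating the cluster.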
Answer 2:
I suggest using Terraform to create your cluster. The aws_emr_cluster resource has an argument:
configurations_json = "${file("config.json")}"
that lets you inject a JSON file as the configuration for your EMR cluster.
https://www.terraform.io/docs/providers/aws/r/emr_cluster.html
regards
Source: https://stackoverflow.com/questions/45328671/configure-zeppelins-spark-interpreter-on-emr-when-starting-a-cluster