On Amazon EMR 4.0.0, setting /etc/spark/conf/spark-env.conf is ineffective

梦如初夏 · asked 2021-01-23 03:56

I'm launching my Spark-based HiveServer2 on Amazon EMR, which has an extra classpath dependency. Due to this bug in Amazon EMR:

https://petz2000.wordpress.com/2015/08/1

2 Answers
  • 2021-01-23 04:07

    Have you tried setting spark.driver.extraClassPath in spark-defaults? Something like this:

    [
      {
        "Classification": "spark-defaults",
        "Properties": {
          "spark.driver.extraClassPath": "${SPARK_CLASSPATH}:${HADOOP_HOME}/*:${HADOOP_HOME}/../hadoop-hdfs/*:${HADOOP_HOME}/../hadoop-mapreduce/*:${HADOOP_HOME}/../hadoop-yarn/*:/home/hadoop/git/datapassport/*"
        }
      }
    ]
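
    If it helps, one way to apply that classification is at cluster creation time with the AWS CLI; a minimal sketch, assuming the JSON above is saved as myConfig.json (the file name, instance settings, and key name are placeholders):

    # launch a cluster and apply the spark-defaults classification above
    aws emr create-cluster --release-label emr-4.0.0 \
      --applications Name=Spark Name=Hive \
      --configurations file://./myConfig.json \
      --instance-type m3.xlarge --instance-count 3 \
      --use-default-roles --ec2-attributes KeyName=myKey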
    
  • 2021-01-23 04:22

You can use the --driver-class-path option of spark-submit.

    Start a spark-shell on the master node from a fresh EMR cluster.

    spark-shell --master yarn-client
    scala> sc.getConf.get("spark.driver.extraClassPath")
    res0: String = /etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
    

    Add your JAR files to the EMR cluster using a --bootstrap-action.
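
    A bootstrap action is just a shell script in S3 that every node runs while the cluster is provisioning, so it can stage the JAR locally before Spark starts. A minimal sketch, where the bucket and JAR names are placeholders:

    #!/bin/bash
    # copy-jars.sh: stage the custom JAR where the driver classpath expects it
    aws s3 cp s3://my-bucket/my-custom-jar.jar /home/hadoop/my-custom-jar.jar

    Register it at cluster creation with --bootstrap-actions Path=s3://my-bucket/copy-jars.sh.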

    When you call spark-submit, prepend (or append) your JAR files to the value of extraClassPath you got from spark-shell:

    spark-submit --master yarn-cluster --driver-class-path /home/hadoop/my-custom-jar.jar:/etc/hadoop/conf:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-yarn/*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*
    

    This worked for me using EMR release builds 4.1 and 4.2.

    The process for building spark.driver.extraClassPath may change between releases, which may be the reason why SPARK_CLASSPATH doesn't work anymore.
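
    Because that value can drift, one option is to read it from the cluster's generated config at submit time rather than hardcoding it. A minimal sketch, assuming the default appears on a single line in /etc/spark/conf/spark-defaults.conf on the master node (the application JAR path is a placeholder):

    # pull the stock extraClassPath that EMR generated for this release
    EMR_CP=$(awk '$1 == "spark.driver.extraClassPath" {print $2}' /etc/spark/conf/spark-defaults.conf)

    # prepend the custom JAR so its classes take precedence
    spark-submit --master yarn-cluster \
      --driver-class-path "/home/hadoop/my-custom-jar.jar:${EMR_CP}" \
      /home/hadoop/my-app.jar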
