How to submit a Spark job whose jar is hosted in an S3 object store


Question


I have a Spark cluster running on YARN, and I want to host my job's jar in a 100% S3-compatible object store. From what I found searching, submitting the job should be as simple as:

spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file

However, the object store requires credentials (a user name and password) for access. How can I configure those credentials so that Spark can download the jar from S3? Many thanks!


Answer 1:


You can use the Default Credential Provider Chain (see the AWS docs) by exporting the keys as environment variables before calling spark-submit:

# credentials exported as environment variables (AWS default credential provider chain)
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
./bin/spark-submit \
    --master local[2] \
    --class org.apache.spark.examples.SparkPi \
    s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar
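
Since the question is about an S3-compatible store rather than AWS itself, the S3A connector also needs to be told which endpoint to talk to. An alternative to environment variables is to pass the credentials and endpoint as Hadoop properties on the spark-submit command line. This is only a minimal sketch: the endpoint https://s3.example.internal, the keys, and the bucket are placeholders to adapt to your store:

# credentials and endpoint passed as S3A (Hadoop) properties; values are placeholders
./bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.hadoop.fs.s3a.endpoint=https://s3.example.internal \
    --conf spark.hadoop.fs.s3a.access.key=your_access_key \
    --conf spark.hadoop.fs.s3a.secret.key=your_secret_key \
    --conf spark.hadoop.fs.s3a.path.style.access=true \
    --class org.apache.spark.examples.SparkPi \
    s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar

Note that in cluster mode the YARN nodes that localize the jar also need the S3A connector on their classpath, and they may need the same settings in core-site.xml, so exporting variables on the submitting machine alone may not be enough.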

I needed to download the following jars from Maven and put them into Spark's jars directory in order to use the s3a:// scheme with spark-submit (note: you can use the --packages directive to pull these dependencies in for code inside your jar, but that does not help spark-submit itself):

# build the Spark `assembly` project so the jars/ directory exists
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
# S3A connector plus the AWS SDK version it was built against
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
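
As a side note, hadoop-aws 2.7.7 is paired with aws-java-sdk 1.7.4 because that is the SDK version it depends on; in general, pick the hadoop-aws version that matches the Hadoop version your Spark build ships with. A quick way to check, assuming you are still inside the jars/ directory from the previous step:

# the hadoop-common jar name shows which Hadoop version this Spark build uses
ls hadoop-common-*.jar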


Source: https://stackoverflow.com/questions/60900601/how-to-submit-a-spark-job-of-which-the-jar-is-hosted-in-s3-object-store
