Hadoop 2.9.2, Spark 2.4.0 access AWS s3a bucket

Asked by 佛祖请我去吃肉, 2020-12-31 18:25

It's been a couple of days, but I still cannot download from a public Amazon bucket using Spark :(

Here is the spark-shell command:

spark-shell
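The invocation above appears truncated. For s3a access, spark-shell generally needs the hadoop-aws module matching the cluster's Hadoop version on the classpath. A minimal sketch, assuming Hadoop 2.9.2 and an anonymously readable public bucket (the package coordinates and provider choice are assumptions, not the asker's actual command):

```shell
# Sketch: launch spark-shell with the hadoop-aws module for Hadoop 2.9.2.
# For a public bucket, the anonymous credentials provider avoids needing keys.
spark-shell \
  --packages org.apache.hadoop:hadoop-aws:2.9.2 \
  --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
```

A common failure mode is mixing hadoop-aws from one Hadoop version with Spark's bundled Hadoop jars from another; keeping the versions aligned avoids `NoSuchMethodError`-style classpath clashes.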
3 Answers
  •  没有蜡笔的小新
    2020-12-31 19:09

    I use Spark 2.4.5, and this is what I did; it works for me. I am able to connect to AWS S3 from Spark on my local machine.

    (1) Download Spark 2.4.5 from https://archive.apache.org/dist/spark/spark-2.4.5/spark-2.4.5-bin-without-hadoop-scala-2.12.tgz. This build does not bundle Hadoop.
    (2) Download Hadoop 3.2.1 from https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
    (3) Update .bash_profile:
    export SPARK_HOME=/home/spark-2.4.5/spark-2.4.5-bin-without-hadoop-scala-2.12  # example; adjust to your install path
    export PATH=$SPARK_HOME/bin:$PATH
    (4) Add Hadoop to the Spark environment:
    Copy spark-env.sh.template to spark-env.sh and add:
    export SPARK_DIST_CLASSPATH=$(/home/hadoop-3.2.1/bin/hadoop classpath)
    where /home/hadoop-3.2.1/bin/hadoop is the path to your Hadoop binary.
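    With the steps above in place, reading a public bucket from spark-shell can be sketched like this (the bucket name and object key are made-up placeholders, and the hadoop-aws version is assumed to match the Hadoop 3.2.1 download):

    ```shell
    # Feed a short Scala snippet to spark-shell via stdin; you can also
    # paste the lines between EOF markers into an interactive session.
    spark-shell --packages org.apache.hadoop:hadoop-aws:3.2.1 <<'EOF'
    // Use anonymous credentials so a public bucket can be read without keys.
    spark.sparkContext.hadoopConfiguration.set(
      "fs.s3a.aws.credentials.provider",
      "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
    val df = spark.read.option("header", "true").csv("s3a://some-public-bucket/path/data.csv")
    df.show(5)
    EOF
    ```

    Setting the provider on `sparkContext.hadoopConfiguration` (rather than `spark.conf.set`) ensures the option reaches the Hadoop s3a filesystem layer.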
    
