Spark + s3 - error - java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found

抹茶落季 2020-12-20 14:35

I have a Spark EC2 cluster to which I am submitting a PySpark program from a Zeppelin notebook. I have loaded hadoop-aws-2.7.3.jar and aws-java-sdk-1.11.179.jar and place

4 answers
  •  -上瘾入骨i
    2020-12-20 15:01

    I resolved this by making sure the hadoop-aws jar matched the Hadoop version my Spark build was running against, downloading the matching version of aws-java-sdk, and finally downloading the jets3t library that it depends on.

    In /opt/spark/jars:

    sudo wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.11.30/aws-java-sdk-1.11.30.jar
    sudo wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar
    sudo wget https://repo1.maven.org/maven2/net/java/dev/jets3t/jets3t/0.9.4/jets3t-0.9.4.jar
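
    Picking the hadoop-aws version by hand is easy to get wrong, so here is a minimal sketch of deriving it from the jars Spark already ships with. It assumes the jars directory contains a file named hadoop-common-<version>.jar (some Spark builds bundle Hadoop under different jar names, in which case nothing is found). The helper names hadoop_version_from_jar and hadoop_aws_url and the SPARK_JARS_DIR variable are hypothetical, introduced only for this illustration; the URL layout follows the Maven Central links above.

```shell
#!/bin/sh
# Sketch: derive the Hadoop version from the hadoop-common jar bundled with
# Spark, then print the matching hadoop-aws download URL.
# SPARK_JARS_DIR is an assumed default; adjust it for your install.
SPARK_JARS_DIR="${SPARK_JARS_DIR:-/opt/spark/jars}"

# Print the version embedded in a jar filename such as hadoop-common-2.7.3.jar.
hadoop_version_from_jar() {
    basename "$1" .jar | sed 's/^hadoop-common-//'
}

# Print the Maven Central URL for the hadoop-aws jar of the given version.
hadoop_aws_url() {
    printf 'https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/%s/hadoop-aws-%s.jar\n' "$1" "$1"
}

# Look for the first bundled hadoop-common jar, if there is one.
jar=""
for f in "$SPARK_JARS_DIR"/hadoop-common-*.jar; do
    if [ -e "$f" ]; then
        jar="$f"
        break
    fi
done

if [ -n "$jar" ]; then
    # Print the URL to pass to wget for the matching hadoop-aws jar.
    hadoop_aws_url "$(hadoop_version_from_jar "$jar")"
else
    echo "no hadoop-common jar found in $SPARK_JARS_DIR" >&2
fi
```

    The printed URL can then be passed to sudo wget in the same directory, in place of the hard-coded 2.7.3 link.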
    

    Testing it out in spark-shell:

    scala> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "[ACCESS KEY ID]")
    scala> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "[SECRET ACCESS KEY]")
    scala> val myRDD = sc.textFile("s3n://adp-px/baby-names.csv")
    scala> myRDD.count()
    res2: Long = 49
    
