Spark History Server on S3A FileSystem: ClassNotFoundException

Asked by 隐瞒了意图╮ on 2021-02-06 12:03

Spark can use the Hadoop S3A file system, org.apache.hadoop.fs.s3a.S3AFileSystem. By adding the necessary settings to conf/spark-defaults.conf, I can get Spark to write event logs to S3, but the History Server then fails with a ClassNotFoundException.
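For reference, the kind of settings involved on the application side look something like this (the bucket name is only illustrative, not taken from the original question):

```properties
# Write each application's event log to an S3 bucket via the S3A connector
spark.eventLog.enabled    true
spark.eventLog.dir        s3a://spark-logs-test/
```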

3 Answers
  •  太阳男子 (answered 2021-02-06 12:30)

    Did some more digging and figured it out. Here's what was wrong:

    1. The JARs necessary for S3A can be added to $SPARK_HOME/jars (as described in SPARK-15965)

    2. The line

      spark.history.provider     org.apache.hadoop.fs.s3a.S3AFileSystem
      

      in $SPARK_HOME/conf/spark-defaults.conf will cause the following exception:

      Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.fs.s3a.S3AFileSystem.<init>(org.apache.spark.SparkConf)
      

      That line can be safely removed, as suggested in this answer.
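For context, spark.history.provider expects a class with a constructor taking a SparkConf, which S3AFileSystem does not have (it is a FileSystem, not a history provider). Omitting the line falls back to Spark's filesystem-based provider; if you prefer to set it explicitly, the equivalent would be:

```properties
# The default provider; setting it explicitly is optional
spark.history.provider    org.apache.spark.deploy.history.FsHistoryProvider
```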

    To summarize:

    I added the following JARs to $SPARK_HOME/jars:

    • jets3t-0.9.3.jar (may already be present in your pre-built Spark binaries; any 0.9.x version seems to work)
    • guava-14.0.1.jar (may already be present in your pre-built Spark binaries; any 14.0.x version seems to work)
    • aws-java-sdk-1.7.4.jar (must be 1.7.4)
    • hadoop-aws-2.7.3.jar (should probably match the Hadoop version of your Spark build)
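One way to stage the two JARs that are usually missing is to fetch the pinned versions from Maven Central (the coordinates below are the standard Maven Central layout; jets3t and guava are omitted since they typically ship with pre-built Spark). As a precaution, this sketch only prints the curl commands so you can review them before running:

```shell
# Assumption: SPARK_HOME points at your Spark installation (default /opt/spark here)
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
MAVEN="https://repo1.maven.org/maven2"

# Pinned versions per the list above; adjust hadoop-aws to your Spark's Hadoop build
for coord in \
  "com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar" \
  "org/apache/hadoop/hadoop-aws/2.7.3/hadoop-aws-2.7.3.jar"
do
  jar=$(basename "$coord")
  # Print the command rather than executing it, so the download can be reviewed first
  echo "curl -fL -o $SPARK_HOME/jars/$jar $MAVEN/$coord"
done
```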

    and added this line to $SPARK_HOME/conf/spark-defaults.conf:

    spark.history.fs.logDirectory     s3a://spark-logs-test/
    

    You'll need some other configuration to enable logging in the first place, but once the S3 bucket has the logs, this is the only configuration that is needed for the History Server.
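Putting both sides together, a minimal spark-defaults.conf sketch might look like this (bucket name from above; the static credential keys are an illustrative assumption, and instance profiles or environment variables work as well):

```properties
# Application side: write event logs to the bucket
spark.eventLog.enabled            true
spark.eventLog.dir                s3a://spark-logs-test/

# History Server side: read event logs from the same bucket
spark.history.fs.logDirectory     s3a://spark-logs-test/

# S3A credentials (assumption: static keys; IAM roles also work)
spark.hadoop.fs.s3a.access.key    <your-access-key>
spark.hadoop.fs.s3a.secret.key    <your-secret-key>
```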
