Zeppelin + Spark: Reading Parquet from S3 throws NoSuchMethodError: com.fasterxml.jackson

Question

Using the Zeppelin 0.7.2 binaries from the main download and Spark 2.1.0 with Hadoop 2.6, the following paragraph:

val df = spark.read.parquet(DATA_URL).filter(FILTER_STRING).na.fill("")

Produces the following:

java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<init>(ScalaNumberDeserializersModule.scala:49)
  at com.fasterxml.jackson.module.scala.deser.NumberDeserializers$.<clinit>(ScalaNumberDeserializersModule.scala)
  at com.fasterxml.jackson.module.scala.deser.ScalaNumberDeserializersModule$class.$init$(ScalaNumberDeserializersModule.scala:61)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule.<init>(DefaultScalaModule.scala:20)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<init>(DefaultScalaModule.scala:37)
  at com.fasterxml.jackson.module.scala.DefaultScalaModule$.<clinit>(DefaultScalaModule.scala)
  at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
  at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
  at org.apache.spark.SparkContext.withScope(SparkContext.scala:701)
  at org.apache.spark.SparkContext.parallelize(SparkContext.scala:715)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.mergeSchemasInParallel(ParquetFileFormat.scala:594)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.inferSchema(ParquetFileFormat.scala:235)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:184)
  at scala.Option.orElse(Option.scala:289)
  at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$getOrInferFileFormatSchema(DataSource.scala:183)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:387)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
  ... 47 elided

This error does not happen in the normal spark-shell, only in Zeppelin. I have tried the following fixes, none of which made any difference:

Downloading the Jackson 2.6.2 jars to the Zeppelin lib folder and restarting
Adding the Jackson 2.9 dependencies from the Maven repositories to the interpreter settings
Deleting the Jackson jars from the Zeppelin lib folder

Googling is turning up no similar situations. Please don't hesitate to ask for more information, or make suggestions. Thanks!

Answer 1:

I had the same problem. I added com.amazonaws:aws-java-sdk and org.apache.hadoop:hadoop-aws as dependencies for the Spark interpreter. These dependencies bring in their own versions of com.fasterxml.jackson.core:* and conflict with Spark's.

You must also exclude com.fasterxml.jackson.core:* from those dependencies. Here is an example of the Spark interpreter dependency section in ${ZEPPELIN_HOME}/conf/interpreter.json:

"dependencies": [
  {
    "groupArtifactVersion": "com.amazonaws:aws-java-sdk:1.7.4",
    "local": false,
    "exclusions": ["com.fasterxml.jackson.core:*"]
  },
  {
    "groupArtifactVersion": "org.apache.hadoop:hadoop-aws:2.7.1",
    "local": false,
    "exclusions": ["com.fasterxml.jackson.core:*"]
  }
]

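To check whether a second copy of Jackson really is on the interpreter classpath, it can help to print where the relevant classes are loaded from. The following is a minimal sketch to run in a Spark paragraph. The helper name jarOf is hypothetical, and it assumes the thread's context class loader is the one the interpreter uses to resolve Spark's classes, which may not hold in every setup.

```scala
import scala.util.Try

// Hypothetical helper: returns the location (usually a jar URL) that a class
// was loaded from, or None if the class is missing or has no code source.
def jarOf(className: String): Option[String] = {
  // Assumption: the context class loader sees the same jars as the interpreter.
  val loader = Thread.currentThread.getContextClassLoader
  // initialize = false: look the class up without running its static initializers
  Try(Class.forName(className, false, loader)).toOption
    .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)
}

// Classes from the Jackson group named in the answer and from the stack trace.
Seq(
  "com.fasterxml.jackson.core.JsonFactory",
  "com.fasterxml.jackson.databind.ObjectMapper",
  "com.fasterxml.jackson.module.scala.DefaultScalaModule"
).foreach { name =>
  println(s"$name -> ${jarOf(name).getOrElse("not found or no code source")}")
}
```

If the core and databind classes resolve to a jar pulled in by aws-java-sdk or hadoop-aws rather than to Spark's own jars directory, that points to the conflict described in this answer.
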
Answer 2:

Another way is to load the Jackson dependency directly in a notebook cell:

%dep
z.load("com.fasterxml.jackson.core:jackson-core:2.6.2")
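
The exclusion approach from the first answer can probably be expressed through the same %dep mechanism, since the dependency loader also accepts exclusion patterns. This is a sketch only, not something either answer confirms: it assumes the artifact versions quoted in the first answer, and that the paragraph runs before the Spark interpreter has started (otherwise the interpreter needs a restart first).

```scala
%dep
// Load the S3-related artifacts but skip their transitive Jackson core jars,
// so they are less likely to shadow the Jackson version that Spark ships with.
// Versions are copied from the first answer; adjust them to your Hadoop build.
z.load("com.amazonaws:aws-java-sdk:1.7.4").exclude("com.fasterxml.jackson.core:*")
z.load("org.apache.hadoop:hadoop-aws:2.7.1").exclude("com.fasterxml.jackson.core:*")
```
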

Source: https://stackoverflow.com/questions/45511804/zeppelin-spark-reading-parquet-from-s3-throws-nosuchmethoderror-com-fasterxm

Tags

apache-spark

apache-zeppelin