I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for the transfer.
Reading from Redshift is OK, but when writing I'm getting an Avro-related NoSuchMethodError.
A runtime conflict in EMR related to Avro is very common. Avro is widely used, and many JARs pull it in as a dependency. I have seen several variations of this question, each with a different method in the NoSuchMethodError or a different Avro version.
I failed to solve it with the spark.executor.userClassPathFirst flag, because that led to a LinkageError instead.
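For reference, this is roughly how that flag gets passed on spark-submit (a sketch only; the job JAR name and the local Avro path are placeholders, not my actual setup):

# Attempted (and failed) approach: let user-supplied JARs win over the cluster's Avro.
# In my case this ended in a LinkageError instead of fixing the NoSuchMethodError.
spark-submit \
  --conf spark.executor.userClassPathFirst=true \
  --jars /home/hadoop/lib/avro-1.8.0.jar \
  my-spark-job.jar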
Here is the solution that resolved the conflict for me:
When setting up the EMR cluster, add a bootstrap action that calls a bash script which downloads the specific Avro JAR:
#!/bin/bash
# Download Avro 1.8.0 into a local directory so it can be put first on the classpath
mkdir -p /home/hadoop/lib/
cd /home/hadoop/lib/
wget http://apache.spd.co.il/avro/avro-1.8.0/java/avro-1.8.0.jar
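The script has to live in S3 so EMR can run it at bootstrap time. Something like the following, where the bucket, key, and script file name are placeholders:

# Stage the bootstrap script in S3; EMR will fetch and run it on every node.
aws s3 cp download-avro.sh s3://my-bucket/bootstrap/download-avro.sh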
When setting up the EMR cluster, also add the following configuration:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.driver.extraLibraryPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.executor.extraClassPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.driver.extraClassPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"
    },
    "configurations": []
  }
]
As you can see in the classification above, I had to add my new JAR alongside the existing default classpath entries rather than replace them; it didn't work otherwise.
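Putting it together, if you create the cluster from the AWS CLI, the bootstrap action and the classification JSON can be attached like this (a sketch; the cluster name, release label, instance settings, and S3/file paths are placeholders):

# spark-avro-classpath.json is the JSON shown above, saved locally.
aws emr create-cluster \
  --name "spark-redshift-avro-fix" \
  --release-label emr-4.7.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --bootstrap-actions Path=s3://my-bucket/bootstrap/download-avro.sh,Name="Download Avro 1.8.0" \
  --configurations file://./spark-avro-classpath.json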