I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for the transfer.
Reading from Redshift is OK, but when writing I'm getting an Avro-related NoSuchMethodError.
A runtime conflict in EMR related to Avro is very common. Avro is widely used, and many JARs pull it in as a dependency. I have seen several variations of this question, each with a different method in the NoSuchMethodError or a different Avro version.
I failed to solve it with the spark.executor.userClassPathFirst flag, because that led to a LinkageError instead.
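For reference, this is roughly how that flag gets passed on spark-submit (a sketch only; the job JAR name and the local Avro path are placeholders, not my actual setup):

# Attempted (and failed) approach: let user-supplied JARs win over the cluster's Avro.
# In my case this ended in a LinkageError instead of fixing the NoSuchMethodError.
spark-submit \
  --conf spark.executor.userClassPathFirst=true \
  --jars /home/hadoop/lib/avro-1.8.0.jar \
  my-spark-job.jar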
Here is the solution that resolved the conflict for me:
When setting up the EMR cluster, add a bootstrap action that calls a bash script which downloads the specific Avro JAR:
#!/bin/bash
# Download Avro 1.8.0 into a local directory so it can be put first on the classpath
mkdir -p /home/hadoop/lib/
cd /home/hadoop/lib/
wget http://apache.spd.co.il/avro/avro-1.8.0/java/avro-1.8.0.jar
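The script has to live in S3 so EMR can run it at bootstrap time. Something like the following, where the bucket, key, and script file name are placeholders:

# Stage the bootstrap script in S3; EMR will fetch and run it on every node.
aws s3 cp download-avro.sh s3://my-bucket/bootstrap/download-avro.sh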
When setting up the EMR cluster, also add the following configuration:
[
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.driver.extraLibraryPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.executor.extraClassPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.driver.extraClassPath": "/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"
    },
    "configurations": []
  }
]
As you can see in the classification above, I had to add my new JAR alongside the existing default classpath entries rather than replace them; it didn't work otherwise.
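Putting it together, if you create the cluster from the AWS CLI, the bootstrap action and the classification JSON can be attached like this (a sketch; the cluster name, release label, instance settings, and S3/file paths are placeholders):

# spark-avro-classpath.json is the JSON shown above, saved locally.
aws emr create-cluster \
  --name "spark-redshift-avro-fix" \
  --release-label emr-4.7.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --bootstrap-actions Path=s3://my-bucket/bootstrap/download-avro.sh,Name="Download Avro 1.8.0" \
  --configurations file://./spark-avro-classpath.json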