Avro error on AWS EMR

轻奢々 2021-01-27 01:52

I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for data transfer.

Reading from Redshift works, but when writing I'm getting



        
4 answers
  •  甜味超标
    2021-01-27 02:14

    Runtime conflicts related to Avro are very common on EMR. Avro is widely used, and many jars pull it in as a dependency. I have seen several variations of this question, with a different method named in the 'NoSuchMethodError' or with different Avro versions.
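Before changing anything, it can help to see which Avro jars are actually present on a node. The following is only a rough sketch: the helper name `find_avro_jars` is hypothetical, and the directories you pass to it depend on your EMR release.

```shell
#!/bin/bash
# Hypothetical helper: list Avro core jars (avro-<version>.jar) under the
# given directories. Symlinks are included because distributions often
# link jars into several lib directories.
find_avro_jars() {
  find "$@" \( -type f -o -type l \) -name 'avro-[0-9]*.jar' 2>/dev/null | sort
}
```

For example, `find_avro_jars /usr/lib/hadoop /home/hadoop/lib` would print one path per line. Seeing more than one version in the output is a hint that a classpath conflict is possible.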

    I could not solve it with the 'spark.executor.userClassPathFirst' flag, because that produced a LinkageError.

    Here is what resolved the conflict for me:

    1. Use IntelliJ's Dependency Analyzer (Maven plugin) to exclude Avro from every dependency that causes the conflict.
    2. When setting up the EMR cluster, add a bootstrap action that calls a bash script which downloads the specific Avro JAR:

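      #!/bin/bash
      # Bootstrap action: place the required Avro jar in a fixed directory on each node.

      mkdir -p /home/hadoop/lib/
      cd /home/hadoop/lib/
      # Download the specific Avro version the job needs.
      wget http://apache.spd.co.il/avro/avro-1.8.0/java/avro-1.8.0.jar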
      
    3. When setting up the EMR cluster, add the following configuration:

      [
      {"classification":"spark-defaults", "properties":{
      "spark.driver.extraLibraryPath":"/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*", 
      "spark.executor.extraClassPath":"/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*", 
      "spark.driver.extraClassPath":"/home/hadoop/lib/avro-1.8.0.jar:/usr/lib/hadoop/*:/usr/lib/hadoop/../hadoop-hdfs/*:/usr/lib/hadoop/../hadoop-mapreduce/*:/usr/lib/hadoop/../hadoop-yarn/*:/etc/hive/conf:/usr/lib/hadoop/../hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"}, 
      "configurations":[]}
      ]
      

    As you can see, I had to list my new jar together with the existing library paths, with the new jar first. It didn't work otherwise.
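The long classpath strings above do not have to be typed by hand. This is a minimal sketch of putting the new jar in front of the existing entries. Both function names are hypothetical, and it assumes the current defaults can be read from a spark-defaults.conf style file with whitespace-separated keys and values.

```shell
#!/bin/bash
# Hypothetical helper: put a jar in front of an existing classpath string.
prepend_jar() {
  local jar="$1" existing="$2"
  if [ -z "$existing" ]; then
    printf '%s\n' "$jar"
  else
    printf '%s:%s\n' "$jar" "$existing"
  fi
}

# Hypothetical helper: print the value of one property from a
# spark-defaults.conf style file (first match only).
read_spark_property() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == k { print $2; exit }' "$file"
}
```

Assuming the defaults live in /etc/spark/conf/spark-defaults.conf on your cluster, you could run `existing=$(read_spark_property /etc/spark/conf/spark-defaults.conf spark.driver.extraClassPath)` and then `prepend_jar /home/hadoop/lib/avro-1.8.0.jar "$existing"`, and paste the result into the configuration JSON. Quote the variables so that the `*` entries are not expanded by the shell.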
