avro error on AWS EMR

Asked by 轻奢々 on 2021-01-27 01:52

I'm using spark-redshift (https://github.com/databricks/spark-redshift), which uses Avro for the data transfer.

Reading from Redshift works fine, but when writing I'm getting an error.
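For context, a write through spark-redshift goes out as Avro files in S3 before the Redshift COPY. A minimal PySpark sketch of that kind of write (the JDBC URL, table name, and tempdir below are placeholders, not values from the question):

```python
# Hypothetical spark-redshift write; connection details are placeholders.
# The save() step serializes the DataFrame to Avro in the S3 tempdir,
# which is where an Avro version mismatch on the cluster would surface.
df.write \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://host:5439/db?user=user&password=pass") \
    .option("dbtable", "my_table") \
    .option("tempdir", "s3n://my-bucket/tmp") \
    .mode("error") \
    .save()
```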



        
4 Answers
  •  死守一世寂寞
    2021-01-27 02:13

    Jonathan from EMR here. Part of the problem is that Hadoop depends on Avro 1.7.4, and the full Hadoop classpath is included in the Spark classpath on EMR. It might help for us to upgrade Hadoop's Avro dependency to 1.7.7 so that it matches Spark's Avro dependency. I'm a little afraid that this might break something else, but I can try it out anyway.

    BTW, one problem I noticed with your example EMR cluster config is that you're using the "spark-env" config classification, whereas the "spark-defaults" classification would be the appropriate one for setting spark.{driver,executor}.userClassPathFirst. I'm not sure this by itself would solve your problem though.
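Following the answer's pointer, a sketch of what the corrected cluster configuration might look like, using the "spark-defaults" classification instead of "spark-env" (the property values are assumptions for illustration; whether userClassPathFirst resolves the Avro conflict is exactly what the answer leaves open):

```json
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.userClassPathFirst": "true",
      "spark.executor.userClassPathFirst": "true"
    }
  }
]
```

This JSON would be passed as the cluster's configuration (e.g. via the EMR console or the `--configurations` option when creating the cluster), which is how EMR classifications are normally supplied.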
