Use Spark fileoutputcommitter.algorithm.version=2 with AWS Glue

Posted by  ̄綄美尐妖づ on 2020-01-15 10:37:10

Question


I haven't been able to figure this out, but I'm trying to use a direct output committer with AWS Glue:

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2

Is it possible to use this configuration with AWS Glue?


Answer 1:


Option 1:

Glue uses a Spark context, so you can set Hadoop configuration in AWS Glue as well, since a DynamicFrame is internally a kind of DataFrame.

sc._jsc.hadoopConfiguration().set("mykey","myvalue")

I think you need to add the corresponding class as well, like this:

sc._jsc.hadoopConfiguration().set("mapred.output.committer.class", "org.apache.hadoop.mapred.FileOutputCommitter")

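Example snippet:

from pyspark.context import SparkContext
from awsglue.context import GlueContext

sc = SparkContext()
# Select version 2 of the FileOutputCommitter commit algorithm
sc._jsc.hadoopConfiguration().set("mapreduce.fileoutputcommitter.algorithm.version", "2")

glueContext = GlueContext(sc)
spark = glueContext.spark_session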

To verify that the configuration is set:

Debug in Python:

print(sc._conf.getAll())

Debug in Scala:

sc.getConf.getAll.foreach(println)
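
A more targeted check is to read the key back from the Hadoop configuration itself, since values set through hadoopConfiguration().set(...) are stored on that object and may not appear in the SparkConf listing. The following is a minimal sketch in Python; the helper name committer_algorithm_version is hypothetical, and it assumes sc is the SparkContext created in the example snippet above.

```python
# Key that selects the FileOutputCommitter commit algorithm.
COMMITTER_KEY = "mapreduce.fileoutputcommitter.algorithm.version"


def committer_algorithm_version(sc):
    """Return the value stored under COMMITTER_KEY, or None if it is unset.

    `sc` is assumed to be a pyspark SparkContext; `_jsc` is its private
    handle to the underlying Java context, as used elsewhere in this answer.
    """
    return sc._jsc.hadoopConfiguration().get(COMMITTER_KEY)
```

Inside the job, calling print(committer_algorithm_version(sc)) after the set(...) call should show whether the value was picked up.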

Option 2:

Alternatively, you can try using Glue job parameters:

See https://docs.aws.amazon.com/glue/latest/dg/add-job.html, which describes key-value properties such as:

'--myKey' : 'value-for-myKey'

You can edit the job and specify the parameters with --conf.

Option 3:
If you are using the AWS CLI, you can try the approach described at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html

The funny thing is that the docs carry a "do not set" message for this parameter, but I don't know why it is exposed then.

To sum up: I personally prefer Option 1, since it gives you programmatic control.




Answer 2:


Go to the Glue job console and edit your job as follows:

Glue > Jobs > Edit your job > Script libraries and job parameters (optional) > Job parameters

Set the following:

Key: --conf

Value:

spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2
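
For jobs that are started from code rather than the console, the same key/value pair can be expressed as a job-arguments mapping. The sketch below is an illustration under assumptions, not a confirmed Glue recipe: glue_conf_arguments is a hypothetical helper, and the resulting mapping is meant for something like the Arguments parameter of boto3's start_job_run or the --arguments option of aws glue start-job-run.

```python
import json


def glue_conf_arguments(spark_conf):
    """Wrap one 'key=value' Spark setting in a Glue job-arguments mapping.

    Hypothetical helper: it only builds the mapping and does not call AWS.
    """
    if "=" not in spark_conf:
        raise ValueError("expected 'key=value', got %r" % (spark_conf,))
    # Same key and value as entered in the console's Job parameters section.
    return {"--conf": spark_conf}


arguments = glue_conf_arguments(
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2"
)

# JSON form of the mapping, e.g. for a command-line --arguments option.
arguments_json = json.dumps(arguments)
print(arguments_json)
```

Keeping the mapping in one place makes it easier to pass the same setting from a script and from the console without the two drifting apart.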



Source: https://stackoverflow.com/questions/56432696/use-spark-fileoutputcommitter-algorithm-version-2-with-aws-glue
