How to set hadoop configuration values from pyspark

3 answers · 1218 views

Asked by 生来不讨喜 on 2020-12-08 06:58

The Scala version of SparkContext has the property

sc.hadoopConfiguration

which I have successfully used to set Hadoop properties (in Scala). How can I do the same from PySpark?

3 Answers
  •  悲哀的现实
     answered 2020-12-08 07:30

    I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, certain methods accept a map of (key, value) pairs via a conf argument:

    fileLines = sc.newAPIHadoopFile(
        'dev/*',
        'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
        'org.apache.hadoop.io.LongWritable',
        'org.apache.hadoop.io.Text',
        conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
    ).count()
    
