How to set hadoop configuration values from pyspark

3 answers · 1218 views

Asked by 生来不讨喜 on 2020-12-08 06:58

The Scala version of SparkContext has the property

sc.hadoopConfiguration

which I have successfully used to set Hadoop properties (in Scala). How can I do the same from PySpark?

3 Answers
  •  悲哀的现实
     answered 2020-12-08 07:30

    I looked into the PySpark source code (context.py) and there is no direct equivalent. Instead, certain methods accept a map of (key, value) pairs via a conf argument:

    fileLines = sc.newAPIHadoopFile(
        'dev/*',
        'org.apache.hadoop.mapreduce.lib.input.TextInputFormat',
        'org.apache.hadoop.io.LongWritable',
        'org.apache.hadoop.io.Text',
        conf={'mapreduce.input.fileinputformat.input.dir.recursive': 'true'}
    ).count()
    
