I need to process multiple files scattered across various directories. I would like to load them all into a single RDD and then perform map/reduce on it. I see that SparkContext's textFile reads only one path at a time, so how can I combine several files into one RDD?
How about this phrasing instead?
sc.union([sc.textFile(basepath + "/" + f) for f in files])
In Scala, SparkContext.union() has two variants: one that takes vararg arguments and one that takes a list. Only the second one exists in Python, since Python does not have method overloading.
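For reference, a minimal, self-contained sketch of the union approach; the basepath, files values, and app name below are made up for illustration:

from pyspark import SparkContext

sc = SparkContext(appName="union-example")

# Hypothetical inputs: a base directory plus file paths scattered under it.
basepath = "/data/logs"
files = ["2015/01/events.txt", "2015/02/events.txt"]

# Each textFile call yields an RDD of lines; union merges them into one RDD.
rdd = sc.union([sc.textFile(basepath + "/" + f) for f in files])
print(rdd.count())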
UPDATE
You can use a single textFile call to read multiple files.
sc.textFile(','.join(files))
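A sketch using the same hypothetical paths; the string passed to textFile is handed to Hadoop's input format, so comma-separated paths and glob patterns like dir/*.txt both work:

from pyspark import SparkContext

sc = SparkContext(appName="textfile-example")

# Hypothetical paths; any mix of files, directories, and globs is accepted.
files = ["/data/logs/2015/01/events.txt", "/data/logs/2015/02/events.txt"]

# One textFile call over a comma-joined path string produces a single RDD.
rdd = sc.textFile(','.join(files))

# Equivalently, a glob can pick up the same files without listing them.
rdd_glob = sc.textFile("/data/logs/*/events.txt")

print(rdd.count(), rdd_glob.count())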