SparkContext textFile: load multiple files

南旧 2020-12-23 17:40

I need to process multiple files scattered across various directories. I would like to load them all into a single RDD and then perform map/reduce on it. I see that SparkContext.textFile can read a single file, but how can I load files from multiple locations into one RDD?

4 Answers
  •  無奈伤痛
    2020-12-23 18:10

    How about this phrasing instead?

    sc.union([sc.textFile(basepath + "/" + f) for f in files])
    

    In Scala, SparkContext.union() has two variants: one that takes varargs and one that takes a list. Only the second exists in Python, since Python does not support method overloading.
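
    A minimal runnable sketch of the union approach (the base path, file names, and app name below are placeholders, not from the question):

    from pyspark import SparkContext

    sc = SparkContext(appName="union-example")

    # Hypothetical inputs: a base directory and file names under it.
    basepath = "/data/logs"
    files = ["a.txt", "b.txt", "c.txt"]

    # Build one RDD per file, then merge them into a single RDD with union().
    rdds = [sc.textFile(basepath + "/" + f) for f in files]
    combined = sc.union(rdds)

    print(combined.count())  # total number of lines across all files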

    UPDATE

    You can also read multiple files with a single textFile call by passing a comma-separated list of paths:

    sc.textFile(','.join(files))
    
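    The same pipeline as a sketch, with the map/reduce step the question asks about; the file paths below are placeholders. textFile accepts one string containing several paths joined by commas:

    from pyspark import SparkContext

    sc = SparkContext(appName="textfile-multi")

    # Hypothetical files scattered across different directories.
    files = ["/data/2020/a.txt", "/backup/b.txt", "/tmp/c.txt"]

    # A single textFile call reads all the paths into one RDD.
    rdd = sc.textFile(",".join(files))

    # Map/reduce over the combined RDD: a simple word count.
    counts = (rdd.flatMap(lambda line: line.split())
                 .map(lambda w: (w, 1))
                 .reduceByKey(lambda a, b: a + b))
    print(counts.take(5))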
