Read all files in a nested folder in Spark

Asked by 无人共我, 2020-12-30 03:15

If we have a folder folder containing all .txt files, we can read them all using sc.textFile("folder/*.txt"). But what if I have a folder with nested subfolders?

4 Answers
  •  轮回少年
    2020-12-30 03:48

    sc.wholeTextFiles("/directory/201910*/part-*.lzo") returns (filename, content) pairs keyed by file path, not a plain RDD of lines.

    If you want to load the contents of all matched files in a directory as lines, use

    sc.textFile("/directory/201910*/part-*.lzo")
    

    and set the Hadoop input format to read directories recursively:

    sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
    

    Tip: the Scala API differs from Python; in Scala, use:

    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
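    As an aside, the wildcard patterns Spark accepts in paths (e.g. "201910*/part-*.lzo") are glob-style, much like Python's glob module. This small local sketch (no Spark required, directory names are made up for illustration) shows why a single "*" only reaches one directory level, which is why the recursive setting above matters for deeper nesting:

```python
import glob
import os
import tempfile

# Build a tiny nested layout:
#   <root>/201910/part-0.txt
#   <root>/201910/a/part-1.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "201910", "a"))
    for rel in ("201910/part-0.txt", "201910/a/part-1.txt"):
        with open(os.path.join(root, rel), "w") as f:
            f.write("data")

    # "*" matches exactly one path component: finds only part-0.txt
    one_level = glob.glob(os.path.join(root, "*", "part-*.txt"))

    # "**" with recursive=True matches any depth: finds both files
    any_depth = glob.glob(os.path.join(root, "**", "part-*.txt"),
                          recursive=True)

    print(len(one_level))  # 1
    print(len(any_depth))  # 2
```

    In Spark, the hypothetical equivalent of the second case is a single-level pattern combined with mapreduce.input.fileinputformat.input.dir.recursive=true, so files in subdirectories of the matched directories are picked up as well.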
    
