If we have a folder named folder containing only .txt files, we can read them all using sc.textFile("folder/*.txt"). But what if I have a folde
sc.wholeTextFiles("/directory/201910*/part-*.lzo")
returns one record per matched file, a (file path, file content) pair keyed by the file name, rather than the individual lines of those files.
If you want to load the contents of all matched files in a directory, use
sc.textFile("/directory/201910*/part-*.lzo")
and enable recursive directory reading:
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
Tip: the syntax differs between Scala and Python. The line above is for Python; in Scala, use:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
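
For the Python side, the two steps can be wrapped in one small helper. This is only a sketch: the name read_matched_lines and its signature are hypothetical (not part of Spark), and sc is assumed to be an already created SparkContext.

```python
# Hadoop setting from the answer above; it makes input paths recurse
# into subdirectories.
RECURSIVE_KEY = "mapreduce.input.fileinputformat.input.dir.recursive"


def read_matched_lines(sc, pattern):
    """Return an RDD of lines from every file matching the glob pattern.

    `sc` is assumed to be an existing pyspark SparkContext. This helper
    is a hypothetical convenience wrapper, not a Spark API.
    """
    # _jsc is PySpark's handle to the underlying JavaSparkContext.
    sc._jsc.hadoopConfiguration().set(RECURSIVE_KEY, "true")
    # textFile accepts glob patterns and yields one record per line.
    return sc.textFile(pattern)
```

With a live context, read_matched_lines(sc, "/directory/201910*/part-*.lzo").take(5) should return the first few lines. Reading .lzo files presumably also requires an LZO codec to be available on the cluster, which this sketch does not check.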