I have a directory of directories on HDFS, and I want to iterate over the directories. Is there any easy way to do this with Spark using the SparkContext object?
I had some issues with other answers (like "'JavaObject' object is not iterable"), but this code works for me:
# Get a handle on the HDFS FileSystem through the JVM gateway of the SparkContext.
fs = spark_context._jvm.org.apache.hadoop.fs.FileSystem.get(spark_context._jsc.hadoopConfiguration())

# listFiles returns a RemoteIterator of LocatedFileStatus; the second argument controls recursion.
i = fs.listFiles(spark_context._jvm.org.apache.hadoop.fs.Path(path), False)
while i.hasNext():
    f = i.next()
    print(f.getPath())
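Since the question asks about iterating over sub-directories rather than files, a minimal variant along the same lines could use listStatus and filter on isDirectory(). This is only a sketch, assuming an existing SparkContext named spark_context and an HDFS path string in path:

# Hedged sketch: list only the sub-directories of `path` via the Hadoop FileSystem API.
hadoop = spark_context._jvm.org.apache.hadoop.fs
fs = hadoop.FileSystem.get(spark_context._jsc.hadoopConfiguration())
for status in fs.listStatus(hadoop.Path(path)):
    # FileStatus.isDirectory() distinguishes directories from plain files.
    if status.isDirectory():
        print(status.getPath())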