Spark iterate HDFS directory

耶瑟儿~ 2020-12-01 01:48

I have a directory of directories on HDFS, and I want to iterate over the directories. Is there any easy way to do this with Spark using the SparkContext object?

8 Answers
  •  心在旅途
    2020-12-01 02:40

    I had some issues with the other answers (like `'JavaObject' object is not iterable`), but this code works for me:

    # Access the Hadoop FileSystem API through the JVM gateway exposed by the SparkContext
    hadoop = spark_context._jvm.org.apache.hadoop.fs
    fs = hadoop.FileSystem.get(spark_context._jsc.hadoopConfiguration())
    # listFiles returns a RemoteIterator of LocatedFileStatus; False = non-recursive
    it = fs.listFiles(hadoop.Path(path), False)
    while it.hasNext():
        f = it.next()
        print(f.getPath())
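
    Since the question asks about iterating over directories, note that listFiles only yields files. A minimal sketch using listStatus from the same Hadoop FileSystem API, which also returns subdirectories (fs, hadoop, and path are assumed from the snippet above):

    # listStatus returns FileStatus objects for both files and directories;
    # py4j exposes the returned Java array as a Python sequence
    statuses = fs.listStatus(hadoop.Path(path))
    for status in statuses:
        if status.isDirectory():
            print(status.getPath())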
    
