Spark Scala list folders in directory

北恋 2020-12-05 09:41

I want to list all the folders within an HDFS directory using Scala/Spark. In Hadoop I can do this with the command: hadoop fs -ls hdfs://sandbox.hortonworks.com/demo/
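A minimal sketch of what this could look like from Spark, assuming a live SparkContext named sc (the path is the one from the command above):

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    val path = "hdfs://sandbox.hortonworks.com/demo/"
    val fs = FileSystem.get(new URI(path), sc.hadoopConfiguration)

    // Keep only directories, mirroring `hadoop fs -ls` restricted to folders
    val folders = fs.listStatus(new Path(path))
      .filter(_.isDirectory)
      .map(_.getPath.toString)

    folders.foreach(println)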

9 Answers
  •  天命终不由人
    2020-12-05 10:15

    I was looking for the same thing, but for S3 instead of HDFS.

    I solved it by creating the FileSystem with my S3 path, as below:

      import java.net.URI
      import org.apache.hadoop.fs.{FileSystem, Path}
      import org.apache.spark.SparkContext

      def getSubFolders(path: String)(implicit sparkContext: SparkContext): Seq[String] = {
        val hadoopConf = sparkContext.hadoopConfiguration
        val uri = new URI(path)

        // Resolve the FileSystem from the path's URI (S3 here), not the default HDFS one
        FileSystem.get(uri, hadoopConf).listStatus(new Path(path)).map {
          _.getPath.toString
        }
      }
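
    For example, with a hypothetical bucket name (made up for illustration) and an existing SparkContext in scope:

      // Hypothetical call site; `sc` is an existing SparkContext and the bucket name is made up
      implicit val sc: SparkContext = SparkContext.getOrCreate()
      getSubFolders("s3://my-bucket/demo/").foreach(println)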
    

    I know this question was about HDFS, but maybe others like me will come here looking for an S3 solution. Without specifying the URI when getting the FileSystem, it resolves to the default HDFS one and fails with:

    java.lang.IllegalArgumentException: Wrong FS: s3:///dummy_path
    expected: hdfs://.eu-west-1.compute.internal:8020
    
