Spark Scala list folders in directory

北恋 2020-12-05 09:41

I want to list all folders within an HDFS directory using Scala/Spark. In Hadoop I can do this with the command: hadoop fs -ls hdfs://sandbox.hortonworks.com/demo/

9 Answers
  •  南笙 (OP)
     2020-12-05 10:21

    In Ajay Ahuja's answer, isDir is deprecated; use isDirectory instead. Please see the complete example and output below.

        package examples

        import org.apache.hadoop.fs.{FileSystem, Path}
        import org.apache.log4j.{Level, Logger}
        import org.apache.spark.sql.SparkSession

        object ListHDFSDirectories extends App {
          // Quiet Spark's console logging
          Logger.getLogger("org").setLevel(Level.WARN)

          val spark = SparkSession.builder()
            .appName(this.getClass.getName)
            .config("spark.master", "local[*]")
            .getOrCreate()

          val hdfspath = "." // your path here

          // Get the FileSystem backing the Hadoop configuration, then
          // keep only the entries that are directories
          val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
          fs.listStatus(new Path(hdfspath))
            .filter(_.isDirectory)
            .map(_.getPath)
            .foreach(println)
        }
    

    Result:

    file:/Users/user/codebase/myproject/target
    file:/Users/user/codebase/myproject/Rel
    file:/Users/user/codebase/myproject/spark-warehouse
    file:/Users/user/codebase/myproject/metastore_db
    file:/Users/user/codebase/myproject/.idea
    file:/Users/user/codebase/myproject/src
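
    Note that with spark.master set to local[*] and no fs.defaultFS configured, the path "." resolves against the local filesystem, which is why the results above carry file:/ URIs rather than hdfs:/ ones. Below is a minimal sketch of pointing the same approach at HDFS itself and collecting directories at every depth; it assumes the sandbox NameNode URI from the question, and listDirs is a hypothetical helper name:

        import org.apache.hadoop.conf.Configuration
        import org.apache.hadoop.fs.{FileSystem, Path}

        object ListHDFSDirsRecursive extends App {
          // Assumption: NameNode URI taken from the question; adjust for your cluster
          val conf = new Configuration()
          conf.set("fs.defaultFS", "hdfs://sandbox.hortonworks.com")
          val fs = FileSystem.get(conf)

          // listDirs is a hypothetical helper: it collects every directory
          // below `root`, walking one level with listStatus and recursing
          def listDirs(root: Path): Seq[Path] = {
            val dirs = fs.listStatus(root).filter(_.isDirectory).map(_.getPath)
            dirs ++ dirs.flatMap(listDirs)
          }

          listDirs(new Path("/demo")).foreach(println)
        }

    The recursion uses listStatus because fs.listFiles(path, true) is not a substitute here: it enumerates files, not directories.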
    
