How to list all files in a directory and its subdirectories in hadoop hdfs

故里飘歌 2020-12-01 05:50

I have a folder in HDFS which has two subfolders, each with about 30 subfolders which, finally, each contain XML files. I want to list all the XML files giving only the main folder's path.

9 Answers
  •  粉色の甜心
    2020-12-01 06:27

    Now one can use Spark to do the same, and it's much faster than other approaches (such as Hadoop MR). Here is the code snippet.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.collection.mutable.ListBuffer

    // Recursively collect the full paths of all .xml files under filePath.
    def traverseDirectory(filePath: String, recursiveTraverse: Boolean, filePaths: ListBuffer[String]): Unit = {
        val fs = FileSystem.get(sparkContext.hadoopConfiguration)
        val files = fs.listStatus(new Path(filePath))
        files.foreach { fileStatus =>
            if (!fileStatus.isDirectory && fileStatus.getPath.getName.endsWith(".xml")) {
                filePaths += fileStatus.getPath.toString
            } else if (fileStatus.isDirectory) {
                traverseDirectory(fileStatus.getPath.toString, recursiveTraverse, filePaths)
            }
        }
    }

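    As an alternative, manual recursion can be avoided entirely: Hadoop's `FileSystem.listFiles(path, recursive)` walks the tree for you and yields only files (directories are skipped). A minimal sketch, assuming a live `sparkContext` is in scope and using a placeholder path `/path/to/main/folder`:

    ```scala
    import org.apache.hadoop.fs.{FileSystem, Path}
    import scala.collection.mutable.ListBuffer

    val fs = FileSystem.get(sparkContext.hadoopConfiguration)
    val xmlFiles = ListBuffer[String]()

    // listFiles(..., true) returns a RemoteIterator[LocatedFileStatus]
    // over every file in the subtree, regardless of nesting depth.
    val it = fs.listFiles(new Path("/path/to/main/folder"), true)
    while (it.hasNext) {
        val status = it.next()
        if (status.getPath.getName.endsWith(".xml")) {
            xmlFiles += status.getPath.toString
        }
    }
    ```

    The iterator also fetches block locations lazily, so this tends to be cheaper than calling `listStatus` once per directory.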
