I have a folder in HDFS which has two subfolders; each one has about 30 subfolders, each of which, finally, contains XML files. I want to list all the XML files given only the main folder's path.
Now, one can use Spark to do the same, and it's much faster than other approaches (such as Hadoop MR). Here is the code snippet:
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ListBuffer

def traverseDirectory(filePath: String, recursiveTraverse: Boolean, filePaths: ListBuffer[String]): Unit = {
  val fs = FileSystem.get(sparkContext.hadoopConfiguration)
  fs.listStatus(new Path(filePath)).foreach { fileStatus =>
    if (!fileStatus.isDirectory && fileStatus.getPath.getName.endsWith(".xml")) {
      // Collect the full path of every XML file found
      filePaths += fileStatus.getPath.toString
    } else if (fileStatus.isDirectory && recursiveTraverse) {
      // Descend into subdirectories only when recursive traversal is requested
      traverseDirectory(fileStatus.getPath.toString, recursiveTraverse, filePaths)
    }
  }
}
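
For completeness, here is a minimal usage sketch. It assumes a SparkContext named sparkContext is already in scope (as the snippet above does), and the HDFS path shown is hypothetical; SparkContext.textFile accepts a comma-separated list of paths, so the collected files can be read in parallel afterwards.

// Hypothetical root folder; replace with your main folder's path
val xmlPaths = ListBuffer[String]()
traverseDirectory("hdfs:///user/me/main-folder", recursiveTraverse = true, xmlPaths)

// Read all discovered XML files in parallel as one RDD of lines
val xmlLines = sparkContext.textFile(xmlPaths.mkString(","))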