How to list all files in a directory and its subdirectories in hadoop hdfs

故里飘歌 2020-12-01 05:50

I have a folder in HDFS that has two subfolders, each of which has about 30 subfolders, each of which in turn contains XML files. I want to list all XML files giving only the mai

9 answers
  •  时光说笑
    2020-12-01 06:18

    If you are using the Hadoop 2.* API, there is a more elegant solution:

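        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.LocatedFileStatus;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.fs.RemoteIterator;
        import org.apache.hadoop.mapreduce.Job;

        // getConf() assumes the enclosing class extends Configured (e.g. a Tool)
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        FileSystem fs = FileSystem.get(conf);

        // The second (boolean) argument enables recursive listing
        RemoteIterator<LocatedFileStatus> fileStatusListIterator = fs.listFiles(
                new Path("path/to/lib"), true);
        while (fileStatusListIterator.hasNext()) {
            LocatedFileStatus fileStatus = fileStatusListIterator.next();
            // Do something with each file, for example:
            job.addFileToClassPath(fileStatus.getPath());
        }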
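
    Since the question asks only for the XML files, the same recursive listing can be narrowed by file name. The following is a minimal sketch, not a definitive implementation: it assumes the Hadoop 2.x client libraries are on the classpath, and the class name `XmlLister`, the helper `isXml`, and taking the root folder from `args[0]` are hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical class name, used only for this illustration.
public class XmlLister {

    // Hypothetical helper: true when the file name ends in ".xml" (case-insensitive).
    public static boolean isXml(String fileName) {
        return fileName.toLowerCase(Locale.ROOT).endsWith(".xml");
    }

    // Walks root recursively and keeps only the paths whose name passes isXml.
    public static List<Path> listXmlFiles(FileSystem fs, Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        // listFiles(path, true) returns files only (no directories), recursing into subfolders.
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(root, true);
        while (it.hasNext()) {
            Path p = it.next().getPath();
            if (isXml(p.getName())) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumption: the main folder is passed as the first command-line argument.
        Path root = new Path(args[0]);
        FileSystem fs = root.getFileSystem(conf);
        for (Path p : listXmlFiles(fs, root)) {
            System.out.println(p);
        }
    }
}
```

    Filtering on the client side keeps the sketch simple. Because `listFiles` returns an iterator rather than a full array, the listing does not have to be loaded into memory all at once, although this sketch still collects the matching paths in a list.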
    
