How do I count the number of files in HDFS from an MR job?

Submitted by 纵饮孤独 on 2019-12-08 08:05:54

Question


I'm new to Hadoop, and to Java for that matter. I'm trying to count the number of files in a folder on HDFS from the MapReduce driver I'm writing. I'd like to do this without calling the HDFS shell, because I want to be able to pass in the directory when I run the MapReduce job. I've tried a number of methods, but due to my inexperience with Java I haven't been able to get any of them working.

Any help would be greatly appreciated.

Thanks,

Nomad.


Answer 1:


You can just use the FileSystem API and iterate over the files inside the path. Here is some example code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

int count = 0;
// getConf() is available when your driver extends Configured (or implements Tool)
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // set to true to also count files in subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
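
Since the asker wants to pass the directory in when launching the job, here is a minimal sketch (not from the original answer) of how the snippet above could sit inside a Tool-based driver that reads the directory from the command line. The class name FileCountDriver and the argument handling are just assumptions for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class FileCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // The input directory is passed on the command line, e.g.
        //   hadoop jar myjob.jar FileCountDriver /user/me/input
        Path inputDir = new Path(args[0]);

        FileSystem fs = FileSystem.get(getConf());
        int count = 0;
        RemoteIterator<LocatedFileStatus> ri = fs.listFiles(inputDir, false);
        while (ri.hasNext()) {
            ri.next();
            count++;
        }
        System.out.println("Number of files in " + inputDir + ": " + count);

        // ... configure and submit the MapReduce Job here, using inputDir as its input path ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new FileCountDriver(), args));
    }
}

Running the driver through ToolRunner also lets you pass generic Hadoop options (-D, -files, and so on) alongside your own arguments, which is why the directory can simply be read from args[0].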


Source: https://stackoverflow.com/questions/16344441/how-do-i-count-the-number-of-files-in-hdfs-from-an-mr-job
