Question
I'm new to Hadoop, and to Java for that matter. I'm trying to count the number of files in a folder on HDFS from the MapReduce driver I'm writing. I'd like to do this without calling the HDFS shell, because I want to be able to pass in the directory when I run the MapReduce job. I've tried a number of approaches but haven't managed to get any of them working, due to my inexperience with Java.
Any help would be greatly appreciated.
Thanks,
Nomad.
Answer 1:
You can just use the FileSystem API and iterate over the files inside the path. Here is some example code:
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Count the files directly under the given path
// (set recursive to true to descend into subdirectories).
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false;
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();
    count++;
}
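Since you want to pass the directory in when you run the job, here is a minimal sketch of how that snippet could sit inside a Tool-based driver that takes the input directory as its first command-line argument. The class name FileCountDriver is just a placeholder, and where you put the count relative to the actual job setup is up to you.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class: the directory to count arrives as args[0].
public class FileCountDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Path inputDir = new Path(args[0]);           // directory passed on the command line
        FileSystem fs = FileSystem.get(getConf());   // getConf() is provided by Configured

        // Count the files directly under inputDir (pass true to recurse).
        int count = 0;
        RemoteIterator<LocatedFileStatus> ri = fs.listFiles(inputDir, false);
        while (ri.hasNext()) {
            ri.next();
            count++;
        }
        System.out.println("Files in " + inputDir + ": " + count);

        // ...set up and submit the actual MapReduce job here...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new FileCountDriver(), args));
    }
}

You would then launch it with something like hadoop jar myjob.jar FileCountDriver /path/in/hdfs, so the directory never has to be hard-coded.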
Source: https://stackoverflow.com/questions/16344441/how-do-i-count-the-number-of-files-in-hdfs-from-an-mr-job