Hadoop : Provide directory as input to MapReduce job

自闭症患者 2020-12-16 04:11

I'm using Cloudera Hadoop. I'm able to run a simple MapReduce program where I provide a file as input to the MapReduce program.

This file contains all the other files to

4 Answers
  •  南方客 (OP)
     2020-12-16 04:26

    You could use FileSystem.listStatus to get the list of files in a given directory; the code could look like this:

    // get the FileSystem; you will need to initialize conf properly
    // (with the old mapred API, the job's JobConf is also a Configuration,
    // so the same object works for both calls below)
    FileSystem fs = FileSystem.get(conf);
    // get the FileStatus list for the given input directory
    FileStatus[] statusList = fs.listStatus(new Path(args[0]));
    if (statusList != null) {
        for (FileStatus status : statusList) {
            // add each file as an input path for the MapReduce job
            FileInputFormat.addInputPath(conf, status.getPath());
        }
    }
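    Two notes on this pattern. First, FileInputFormat.addInputPath also accepts a directory path directly, in which case the files immediately inside it are used as inputs; and newer Hadoop versions offer FileInputFormat.setInputDirRecursive(job, true) to descend into subdirectories. Second, listStatus itself is non-recursive, so for nested directories you would walk the tree yourself. The traversal pattern is sketched below in plain Java with java.nio (no Hadoop dependency, so it runs anywhere); the class and method names are illustrative, and in a real driver each collected path would be handed to FileInputFormat.addInputPath:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ListInputs {
    // Recursively collect regular files under dir, mirroring what repeated
    // FileSystem.listStatus calls on each subdirectory would produce in HDFS.
    static List<Path> collectFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.filter(Files::isRegularFile).forEach(files::add);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : collectFiles(root)) {
            // in the Hadoop driver this is where you would call
            // FileInputFormat.addInputPath(conf, new Path(p.toString()))
            System.out.println(p);
        }
    }
}
```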
    
