How to get the input file name in the mapper in a Hadoop program?

粉色の甜心 2020-11-29 18:48

How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to kno

10 answers
  • 2020-11-29 19:20

    This helped me:

    String fileName = ((org.apache.hadoop.mapreduce.lib.input.FileSplit) context.getInputSplit()).getPath().getName();
    
  • 2020-11-29 19:24

    Use this inside your mapper:

    FileSplit fileSplit = (FileSplit)context.getInputSplit();
    String filename = fileSplit.getPath().getName();
    

    Edit:

    To do this inside configure() with the old API (org.apache.hadoop.mapred), try this:

    private String fileName;

    public void configure(JobConf job)
    {
       // "map.input.file" holds the path of the file this map task reads
       fileName = job.get("map.input.file");
    }
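
    The value stored under map.input.file is normally the full path string of the input file rather than the bare name. With Hadoop on the classpath, new Path(value).getName() would give the name. The sketch below shows the same idea in plain Java so it runs without Hadoop; the class name, the helper baseName, and the sample path are hypothetical and only for illustration.

```java
// Minimal sketch: reduce a full input path (as read from "map.input.file")
// to its last component. Class, method, and sample value are illustrative only.
final class MapInputFileName {

    // Returns everything after the last '/' in the given path string,
    // or the string itself when it contains no '/'. Null maps to "".
    static String baseName(String inputFile) {
        if (inputFile == null) {
            return "";
        }
        int slash = inputFile.lastIndexOf('/');
        return slash < 0 ? inputFile : inputFile.substring(slash + 1);
    }

    public static void main(String[] args) {
        // Hypothetical value of job.get("map.input.file")
        String value = "hdfs://namenode:8020/user/demo/input/file1.txt";
        System.out.println(baseName(value)); // prints "file1.txt"
    }
}
```

    Inside configure() this would be used as fileName = baseName(job.get("map.input.file")).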
    
  • 2020-11-29 19:24

    context.getInputSplit() already returns an InputSplit, so you only need to cast it to FileSplit to reach the file path.

    Example:

    InputSplit inputSplit = context.getInputSplit();
    Path filePath = ((FileSplit) inputSplit).getPath();
    String filePathString = filePath.toString();
    
  • 2020-11-29 19:31

    First you need to get the input split. With the newer mapreduce API, this is done as follows:

    context.getInputSplit();
    

    But in order to get the file path and the file name you will need to first typecast the result into FileSplit.

    So, in order to get the input file path you may do the following:

    Path filePath = ((FileSplit) context.getInputSplit()).getPath();
    String filePathString = ((FileSplit) context.getInputSplit()).getPath().toString();
    

    Similarly, to get the file name, call getName() on the path:

    String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
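
    The cast assumes the split really is a FileSplit. Some input setups (MultipleInputs, for example) hand the mapper a different split type, and the cast then fails with a ClassCastException. A minimal sketch of a guarded version follows. To keep it runnable without Hadoop on the classpath, it uses small stand-in classes named after Hadoop's InputSplit, FileSplit and Path; these stand-ins, OtherSplit, and fileNameOf are assumptions for illustration, not the real Hadoop classes.

```java
// Sketch of a guarded cast. The nested classes are simplified stand-ins that
// only mimic the shape of Hadoop's InputSplit, FileSplit, and Path.
final class InputFileNameSketch {

    static class Path {
        private final String path;
        Path(String path) { this.path = path; }
        // Last path component, like Hadoop's Path.getName()
        String getName() { return path.substring(path.lastIndexOf('/') + 1); }
        @Override public String toString() { return path; }
    }

    static abstract class InputSplit { }

    static class FileSplit extends InputSplit {
        private final Path path;
        FileSplit(Path path) { this.path = path; }
        Path getPath() { return path; }
    }

    // Hypothetical split type that is not backed by a single file
    static class OtherSplit extends InputSplit { }

    // Returns the file name for file-based splits, or null for anything else,
    // instead of letting a blind cast throw ClassCastException.
    static String fileNameOf(InputSplit split) {
        if (split instanceof FileSplit) {
            return ((FileSplit) split).getPath().getName();
        }
        return null;
    }

    public static void main(String[] args) {
        InputSplit split = new FileSplit(new Path("/user/demo/input/file1.txt"));
        System.out.println(fileNameOf(split));            // prints "file1.txt"
        System.out.println(fileNameOf(new OtherSplit())); // prints "null"
    }
}
```

    In a real mapper you would import the org.apache.hadoop classes instead and pass context.getInputSplit() to the helper, for example once in setup().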
    