How to get the input file name in the mapper in a Hadoop program?

粉色の甜心 2020-11-29 18:48

How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file the mapper is currently reading.

10 Answers
  •  一整个雨季
    2020-11-29 19:13

    If you are using Hadoop Streaming, you can use the JobConf variables in a streaming job's mapper/reducer.

    As for the mapper's input file name, see the Configured Parameters section: the map.input.file variable (the filename that the map task is reading from) is the one that gets the job done. But note that:

    Note: During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ). For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. To get the values in a streaming job's mapper/reducer use the parameter names with the underscores.
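
    If you want to see exactly which names your script receives, one option (just a debugging sketch, not part of the original answer) is to dump the matching environment variables to stderr; in a streaming job, stderr ends up in the task logs rather than in the job output:

    import os
    import sys

    # Debugging sketch: list every job-configuration value that Hadoop
    # Streaming has exported to this task as an environment variable.
    # stderr goes to the task logs, not the mapper's output stream.
    for key in sorted(os.environ):
        if key.startswith('mapred_') or key.startswith('mapreduce_'):
            sys.stderr.write('%s=%s\n' % (key, os.environ[key]))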


    For example, if you are using Python, you can put these lines in your mapper file:

    import os

    # map.input.file is exposed to a streaming mapper as the
    # environment variable map_input_file
    file_name = os.getenv('map_input_file')
    print(file_name)
    
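    A slightly fuller sketch of how this might be used, assuming a word-count-style job: depending on the Hadoop version, the property is exposed as map_input_file (older releases, from map.input.file) or mapreduce_map_input_file (newer releases, from mapreduce.map.input.file), so the mapper below checks both and prefixes each emitted key with the basename of the input file:

    #!/usr/bin/env python
    import os
    import sys

    # Which variable is set depends on the Hadoop version; fall back
    # from the newer name to the older one.
    input_file = (os.getenv('mapreduce_map_input_file')
                  or os.getenv('map_input_file')
                  or 'unknown')
    base_name = os.path.basename(input_file)

    # Emit (filename:word, 1) pairs so the reducer can count words per file.
    for line in sys.stdin:
        for word in line.split():
            print('%s:%s\t1' % (base_name, word))

    Plug this script in as the -mapper of your streaming job as usual; the filename prefix then shows up in the keys that reach the reducer.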
