How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file a given mapper is reading.
If you are using Hadoop Streaming, you can read the JobConf variables in a streaming job's mapper/reducer.
As for the input file name of the mapper, see the Configured Parameters section: the map.input.file
variable (the filename that the map is reading from) is the one that gets the job done. But note that:
Note: During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ). For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. To get the values in a streaming job's mapper/reducer use the parameter names with the underscores.
For example, if you are using Python, then you can put this line in your mapper file:
import os

# Hadoop exposes map.input.file to streaming tasks as the environment
# variable map_input_file (dots are transformed into underscores).
file_name = os.getenv('map_input_file')
print(file_name)