How to get the input file name in the mapper in a Hadoop program?

粉色の甜心 2020-11-29 18:48

How can I get the name of the input file within a mapper? I have multiple input files stored in the input directory, each mapper may read a different file, and I need to know which file the mapper is currently reading.

10 Answers
  •  一整个雨季
    2020-11-29 19:13

    If you are using Hadoop Streaming, you can use the JobConf variables in a streaming job's mapper/reducer.

    As for the mapper's input file name, see the Configured Parameters section: the map.input.file variable (the filename that the map task is reading from) is the one that gets the job done. But note that:

    Note: During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ). For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. To get the values in a streaming job's mapper/reducer use the parameter names with the underscores.
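
    If you want to see exactly which names your script receives, one option (just a debugging sketch, not part of the original answer) is to dump the matching environment variables to stderr; in a streaming job, stderr ends up in the task logs rather than in the job output:

    import os
    import sys

    # Debugging sketch: list every job-configuration value that Hadoop
    # Streaming has exported to this task as an environment variable.
    # stderr goes to the task logs, not the mapper's output stream.
    for key in sorted(os.environ):
        if key.startswith('mapred_') or key.startswith('mapreduce_'):
            sys.stderr.write('%s=%s\n' % (key, os.environ[key]))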


    For example, if you are using Python, you can put these lines in your mapper file:

    import os

    # map.input.file is exposed to a streaming mapper as the
    # environment variable map_input_file
    file_name = os.getenv('map_input_file')
    print(file_name)
    
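    A slightly fuller sketch of how this might be used, assuming a word-count-style job: depending on the Hadoop version, the property is exposed as map_input_file (older releases, from map.input.file) or mapreduce_map_input_file (newer releases, from mapreduce.map.input.file), so the mapper below checks both and prefixes each emitted key with the basename of the input file:

    #!/usr/bin/env python
    import os
    import sys

    # Which variable is set depends on the Hadoop version; fall back
    # from the newer name to the older one.
    input_file = (os.getenv('mapreduce_map_input_file')
                  or os.getenv('map_input_file')
                  or 'unknown')
    base_name = os.path.basename(input_file)

    # Emit (filename:word, 1) pairs so the reducer can count words per file.
    for line in sys.stdin:
        for word in line.split():
            print('%s:%s\t1' % (base_name, word))

    Plug this script in as the -mapper of your streaming job as usual; the filename prefix then shows up in the keys that reach the reducer.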
