Get input file name in streaming hadoop program

面向向阳花 2020-12-28 11:59

I am able to find the name of the input file in a mapper class using FileSplit when writing the program in Java.

Is there a corresponding way to do this when I write a Hadoop Streaming program?

3 answers
  •  南方客 (original poster)
    2020-12-28 12:10

    According to "Hadoop: The Definitive Guide":

    Hadoop sets job configuration parameters as environment variables for Streaming programs. However, it replaces non-alphanumeric characters with underscores to make sure they are valid names. The following Python expression illustrates how you can retrieve the value of the mapred.job.id property from within a Python Streaming script:

    os.environ["mapred_job_id"]
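Applied to the original question, the same mechanism should expose the input file name, since it is also a job configuration property. The sketch below assumes the property is called mapreduce.map.input.file on newer Hadoop releases and map.input.file on older ones; neither name appears in the quoted passage, so check them against your Hadoop version. The helper names conf_env_name and get_input_file are hypothetical.

```python
import os
import re


def conf_env_name(prop):
    """Turn a job property name into the environment variable name
    described above: every non-alphanumeric character becomes "_"."""
    return re.sub(r"[^0-9A-Za-z]", "_", prop)


def get_input_file(environ=os.environ):
    """Return the current split's input file, or None if it is not set.

    Assumption: the property is mapreduce.map.input.file (newer name)
    or map.input.file (older name).
    """
    for prop in ("mapreduce.map.input.file", "map.input.file"):
        value = environ.get(conf_env_name(prop))
        if value:
            return value
    return None
```

Inside a mapper script you would call get_input_file() once at startup and, for example, emit it as part of each output key. Returning None instead of raising lets the same script run locally, where Hadoop has not set these variables.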

    You can also set environment variables for the Streaming process launched by MapReduce by applying the -cmdenv option to the Streaming launcher program (once for each variable you wish to set). For example, the following sets the MAGIC_PARAMETER environment variable:

    -cmdenv MAGIC_PARAMETER=abracadabra
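On the script side, a variable passed with -cmdenv can be read like any other environment variable. The following is a minimal sketch rather than a full mapper: the function name tag_lines and the "unset" fallback are assumptions made for illustration.

```python
import os


def tag_lines(lines, environ=os.environ):
    """Prefix each input line with the value of MAGIC_PARAMETER.

    Falls back to the string "unset" (an arbitrary choice) when the
    variable was not passed with -cmdenv.
    """
    magic = environ.get("MAGIC_PARAMETER", "unset")
    for line in lines:
        # Strip the trailing newline and emit a tab-separated pair,
        # the usual key/value layout for Streaming output.
        yield "%s\t%s" % (magic, line.rstrip("\n"))
```

In a real mapper you would iterate over tag_lines(sys.stdin) and print each result.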
