hadoop streaming: where are application logs?


Question


My question is similar to: hadoop streaming: how to see application logs? (The link in the answer there no longer works, so I am posting it again with an additional question.)

I can see all the Hadoop logs under my /usr/local/hadoop/logs path,

but where can I see application-level logs? For example:

reducer.py:

import logging
...
logging.basicConfig(level=logging.ERROR, format='MAP %(asctime)s %(levelname)s %(message)s')
logging.error('Test!')
...

I am not able to see any of these log messages (WARNING, ERROR) in stderr.

Where can I find my application's log statements? I am using Python with Hadoop streaming.

Additional question:

What if I want to use a file to store/aggregate my application logs, like this:

reducer.py:

import logging
import os
...
logger = logging.getLogger('test')
hdlr = logging.FileHandler(os.environ['HOME'] + '/test.log')
formatter = logging.Formatter('MAP %(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.ERROR)
logger.error('please work!!')
...

(assuming that test.log exists in the $HOME location on the master and all slaves in my Hadoop cluster). Can I achieve this in a distributed environment like Hadoop? If so, how?

I tried this and ran a sample streaming job, only to see the error below:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Please help me understand how logging can be achieved in Hadoop streaming jobs.

Thank you


Answer 1:


Try this HDFS path: /yarn/apps/${user_name}/logs/application_${appid}/

In general:

Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.

If you print to stderr, you'll find your output in files under the directory mentioned above. There should be one file per node.
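
If log aggregation is enabled (yarn.log-aggregation-enable set to true), a convenient alternative is the yarn logs CLI, which prints all container logs for a finished application in one place. The application id below is just a placeholder for the one printed when your streaming job is submitted:

yarn logs -applicationId application_1438092699399_0001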




Answer 2:


Be aware that Hadoop streaming uses stdout to pipe data from mappers to reducers. So if your logging writes to stdout, you will be in trouble, since it will very likely break your logic and your job. One way to log is to write to stderr; your messages will then show up in the error logs (see the sketch below).
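
As an illustration, here is a minimal sketch of a streaming reducer that keeps the two streams separate. The word-count-style logic is only an example; the point is that print writes job data to stdout while logging writes to stderr, which ends up in the container's stderr file:

#!/usr/bin/env python
import sys
import logging

# basicConfig writes to stderr by default; being explicit guarantees that
# nothing lands on stdout, which Hadoop streaming treats as job data.
logging.basicConfig(stream=sys.stderr, level=logging.ERROR,
                    format='REDUCE %(asctime)s %(levelname)s %(message)s')

current_key, total = None, 0
for line in sys.stdin:
    # Streaming reducer input arrives as sorted "key<TAB>value" lines.
    key, _, value = line.rstrip('\n').partition('\t')
    try:
        count = int(value)
    except ValueError:
        logging.error('bad record: %r', line)  # goes to the stderr log
        continue
    if key != current_key and current_key is not None:
        print('%s\t%d' % (current_key, total))  # stdout = job output
        total = 0
    current_key = key
    total += count
if current_key is not None:
    print('%s\t%d' % (current_key, total))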



Source: https://stackoverflow.com/questions/30586619/hadoop-streaming-where-are-application-logs
