Why do my application-level logs disappear when executed in Oozie?

Posted by 北城以北 on 2019-11-28 14:27:46

Oozie runs each Action in a different "launcher" job -- actually a YARN job with a single mapper (see exceptions below).

Whenever you see an "external ID" of the form job_000000000_0000, you can reach the YARN logs under application_000000000_0000 (yeah, "job" is the legacy naming convention from Hadoop 1, still used by the JobHistory service, while YARN has its own naming convention).
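
The ID mapping is just a prefix swap, so it is easy to script. A minimal sketch, assuming the external ID always looks like job_<digits>_<digits>; the function name and the sample ID are made up for illustration:

```python
import re

# Legacy Hadoop 1 style ID: job_<cluster timestamp>_<sequence number>
_JOB_ID = re.compile(r"^job_(\d+)_(\d+)$")


def job_to_application_id(job_id: str) -> str:
    """Map a legacy job_* ID to the matching YARN application_* ID."""
    match = _JOB_ID.match(job_id.strip())
    if match is None:
        raise ValueError(f"not a legacy job ID: {job_id!r}")
    # Same two numeric parts, only the prefix changes.
    return f"application_{match.group(1)}_{match.group(2)}"


# Hypothetical ID, for illustration only
print(job_to_application_id("job_1400000000000_0042"))
```

Anything that does not match the pattern raises a ValueError instead of being silently passed on to YARN.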

Your application output is actually dumped into the YARN logs for that Oozie "launcher":

  • your StdErr is dumped as-is and can be retrieved in the "stderr" section
  • your StdOut is dumped with a prefix on each line (Oozie uses that prefix to manage its <capture-output/> trick for Shell and Pig actions), at the end of the atrociously verbose "stdout" section
  • and nothing gets into the "syslog" section AFAIK

Bottom line:

  1. run oozie job -info ****** to get the list of Actions and the corresponding "external IDs" for your Oozie workflow execution
  2. for each job_*****_** legacy ID, run yarn logs -applicationId application_*****_** | more to skim the global YARN logs, then zoom in on your specific app logs
  3. now you can try to automate that thing... have fun B-)
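
For step 3, here is one way the automation could look. It is a rough sketch, not a tested tool: it assumes the oozie and yarn CLIs are on the PATH, that the Oozie server URL is already configured (for example via the OOZIE_URL environment variable), and it does not rely on the exact layout of the oozie job -info output; it simply scans the text for anything shaped like job_<digits>_<digits>. All function names are hypothetical.

```python
import re
import subprocess
from typing import Dict, List

# Matches legacy IDs such as job_1400000000000_0042 anywhere in the text.
JOB_ID = re.compile(r"\bjob_\d+_\d+\b")


def extract_job_ids(info_text: str) -> List[str]:
    """Return the distinct job_* IDs found in the text, in first-seen order."""
    return list(dict.fromkeys(JOB_ID.findall(info_text)))


def yarn_logs_command(job_id: str) -> List[str]:
    """Build the 'yarn logs' command line for one legacy job ID."""
    app_id = job_id.replace("job_", "application_", 1)
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_launcher_logs(workflow_id: str) -> Dict[str, str]:
    """Run 'oozie job -info', then 'yarn logs' for each external ID found."""
    info = subprocess.run(
        ["oozie", "job", "-info", workflow_id],
        capture_output=True, text=True, check=True,
    ).stdout
    logs = {}
    for job_id in extract_job_ids(info):
        # No check=True here: one failing fetch should not abort the others.
        result = subprocess.run(
            yarn_logs_command(job_id), capture_output=True, text=True
        )
        logs[job_id] = result.stdout
    return logs
```

Calling fetch_launcher_logs with your workflow ID returns a dict keyed by legacy job ID; from there you can grep each value for the "stderr" and "stdout" sections.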


Exceptions to the Oozie "launcher" job principle -- the E-mail and Filesystem Actions are just API calls executed directly from the Oozie server process, and the MapReduce Action spawns a regular YARN job with multiple Mappers and Reducers.