Exporting jobs listed in Oozie Web Console

Submitted by こ雲淡風輕ζ on 2019-12-08 13:31:19

Question


Apologies if this question sounds basic; I'm totally new to the Hadoop environment.

What am I looking for?

In my case, there are jobs scheduled to run every day, and I want to export the list of failed jobs to an Excel sheet each day.

How do I view the workflow jobs?

Currently I use the Oozie web console to view the jobs, and I don't see an option to export them. I was also unable to find this information in the Oozie documentation.

However, I found that jobs can be listed using commands like

$ oozie jobs -oozie http://localhost:8080/oozie -localtime -len 2 -filter status=RUNNING

Where am I stuck?

I want to filter the failed jobs for a given date and export them as CSV/Excel data.


Answer 1:


@YoungHobbit was right to point at that post, which is very similar to this one; his answer was dead on target when it comes to extracting the entire list of jobs that have run on a specific day with the Oozie CLI (command-line interface).
Just don't forget to specify an "unbounded" reply, e.g. -len 999999999, to avoid side effects (the default is to show only the first 100 matches, which may be way too low if you run a lot of frequent jobs).

The trick is that you can make a more complex filter such as
  "startcreatedtime=2016-06-28T00:00Z;endcreatedtime=2016-06-28T10:00Z;status=FAILED"
... but you cannot request jobs that have FAILED, or have been KILLED, or have been SUSPENDED (which may result from a temporary YARN or HDFS outage), or are still suspiciously RUNNING (because a sub-workflow is SUSPENDED, for instance).
So your best choice is to get the whole list, then filter out all jobs that have SUCCEEDED with a plain old grep, as suggested in another answer.
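
As a rough illustration of that approach, here is a minimal Python sketch that builds the time-window filter for one day, assembles the CLI arguments, and drops the SUCCEEDED lines (the equivalent of grep -v SUCCEEDED). The function names are hypothetical, the server URL is the one from the question, and the lowercase filter names startcreatedtime / endcreatedtime are an assumption to check against your Oozie version.

```python
from datetime import date, datetime, time, timedelta


def day_filter(day):
    """Build an Oozie filter string covering one whole day (00:00Z to 00:00Z)."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%MZ"
    return "startcreatedtime={};endcreatedtime={}".format(
        start.strftime(fmt), end.strftime(fmt)
    )


def oozie_command(day, oozie_url="http://localhost:8080/oozie"):
    """Argument list for the CLI call; the URL is an assumption, adjust to your server."""
    return [
        "oozie", "jobs",
        "-oozie", oozie_url,
        "-len", "999999999",          # "unbounded" reply, as advised above
        "-filter", day_filter(day),
    ]


def drop_succeeded(cli_output):
    """Keep every line of the CLI listing that does not mention SUCCEEDED."""
    return [line for line in cli_output.splitlines() if "SUCCEEDED" not in line]
```

The argument list can be passed to subprocess.run(..., capture_output=True, text=True) and the captured stdout handed to drop_succeeded. Note that, like grep, this is a plain substring match: a job whose application name happens to contain "SUCCEEDED" would be dropped too.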

Then you will also need a complex sed or awk script to break down the ugly CLI output into a well-formed CSV. Ouch!


Now, you have an alternative to the Oozie CLI: the Oozie REST API (old Cloudera tutorial here, reference for Oozie V4.2 here) lets you query the Oozie server with any programming language that provides...
  • an HTTP client
  • and a way to parse JSON messages (using plain old regular expressions, if nothing else is available)

The logic would be basically the same -- fetch the list of all jobs in the desired time window, ignore SUCCEEDED jobs, parse the others to generate a CSV record, dump into a CSV file.
But your program would be more robust, since it would be based on structured JSON input.
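
The logic above could be sketched in Python roughly as follows, using only the standard library. This is a sketch under assumptions, not a tested client: it assumes the jobs endpoint is GET <base>/v1/jobs with jobtype, filter, offset and len parameters, and that the JSON reply carries a "workflows" array whose entries have id, appName, status, user, createdTime, startTime and endTime fields. Check both against the REST API reference for your Oozie version. The function names and the default len value are hypothetical.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Columns copied from each workflow entry (field names are assumptions, see above).
FIELDS = ["id", "appName", "status", "user", "createdTime", "startTime", "endTime"]


def jobs_url(base_url, job_filter, length=1000):
    """Build the query URL; the filter value is URL-encoded by urlencode."""
    query = urllib.parse.urlencode(
        {"jobtype": "wf", "filter": job_filter, "offset": 1, "len": length}
    )
    return "{}/v1/jobs?{}".format(base_url, query)


def fetch_jobs(url):
    """Call the Oozie server and return the decoded JSON reply as a dict."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def to_csv(payload):
    """Turn a decoded reply into CSV text, ignoring SUCCEEDED jobs."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for job in payload.get("workflows", []):
        if job.get("status") == "SUCCEEDED":
            continue
        # Missing or null values (e.g. no endTime yet) become empty cells.
        writer.writerow([job.get(field) or "" for field in FIELDS])
    return out.getvalue()
```

A daily report would then be something like to_csv(fetch_jobs(jobs_url("http://localhost:8080/oozie", "startcreatedtime=2016-06-28T00:00Z;endcreatedtime=2016-06-29T00:00Z"))), written to a .csv file that Excel can open directly.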

One more thing: if you are familiar with Microsoft VBA, you can even use an Excel macro to build the report dynamically, in a self-service way. No need to bother with an intermediate CSV file.



Source: https://stackoverflow.com/questions/38503520/exporting-jobs-listed-in-oozie-web-console
