How to configure Oozie coordinator dataset for previous day

Posted by 落花浮王杯 on 2021-02-10 05:07:04

Question


I want to run a workflow based on the availability of control files for the previous date. The directory layout is ${basePath}/YYYYMMdd/00/_Complete, and I want to check for the _Complete file inside the 00 directory. My job will run daily on the previous day's data. I tried the options suggested in similar questions, but it is still not working. When I test it against same-day data with the instance value below, it works, but it does not work with the (-1) option. Is there any restriction on uri-template formats, meaning does the path need to follow a fixed format such as path/${YEAR}${MONTH}${DAY}/Complete? Please help.

<instance>${coord:current(0)}</instance>

Here is the dryrun output for my Coordinator job.

    ***coordJob after parsing: ***
<coordinator-app xmlns="uri:oozie:coordinator:0.1" name="my_Scheduler_5f" frequency="1" start="2016-08-17T23:40Z" end="2016-08-19T23:45Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
  <controls>
    <timeout>30</timeout>
  </controls>
  <input-events>
    <data-in name="coordInput_1" dataset="input1">
      <dataset name="input1" frequency="1" initial-instance="2016-08-17T00:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
        <uri-template>${nameNode}/myHdfsPath/Finalpath1/${YEAR}${MONTH}${DAY}/00/</uri-template>
        <done-flag>_Complete</done-flag>
      </dataset>
      <instance>${coord:current(-1)}</instance>
    </data-in>
    <data-in name="coordInput_2" dataset="input2">
      <dataset name="input2" frequency="1" initial-instance="2016-08-17T23:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
        <uri-template>${nameNode}/myHdfsPath/Finalpath2/${YEAR}${MONTH}${DAY}/00/</uri-template>
        <done-flag>_Complete</done-flag>
      </dataset>
      <instance>${coord:current(-1)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/myHdfsPath/My_POC/wf-app-dir</app-path>
      <configuration>
        <property>
          <name>date</name>
          <value>${coord:formatTime(coord:dateOffset(coord:actualTime(),-1,'DAY'), "yyyyMMdd")}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
***actions for instance***

Answer 1:


I was able to get my job to look for the right _Complete flag by declaring the datasets in a separate <datasets> block and referencing them by name from <input-events>.

<datasets>
  <dataset name="input1" frequency="1" initial-instance="2016-08-17T00:00Z" timezone="America/Los_Angeles" freq_timeunit="DAY" end_of_duration="NONE">
    <uri-template>${nameNode}/myHdfsPath/Finalpath1/${YEAR}${MONTH}${DAY}/00/</uri-template>
    <done-flag>_Complete</done-flag>
  </dataset>
  ... input2 ...
</datasets>

<input-events>
  <data-in name="coordInput_1" dataset="input1">
    <instance>${coord:current(-1)}</instance>
  </data-in>
  ... coordInput_2 ...
</input-events>

current(-1) is the part that selects yesterday's instance (for a daily dataset). In my case, the problem was that I'd copied an example that used current(0).
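
Once the input event resolves to the previous day's instance, the workflow can receive that directory directly instead of rebuilding the date itself. The fragment below is a minimal sketch of that idea using the coord:dataIn EL function, which resolves to the URI(s) of the instances declared in the named data-in. The property name inputDir1 is hypothetical and not part of the original job; the app-path and data-in name are taken from the question.

```xml
<action>
  <workflow>
    <app-path>${nameNode}/myHdfsPath/My_POC/wf-app-dir</app-path>
    <configuration>
      <property>
        <!-- Hypothetical property name, chosen for illustration only -->
        <name>inputDir1</name>
        <!-- Resolves to the URI(s) of the instance(s) listed in the
             data-in named "coordInput_1", i.e. the current(-1) directory -->
        <value>${coord:dataIn('coordInput_1')}</value>
      </property>
    </configuration>
  </workflow>
</action>
```

Inside the workflow the value would then be available as ${inputDir1}, assuming the workflow definition reads that property.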



Source: https://stackoverflow.com/questions/39008770/how-to-configure-oozie-coordinator-dataset-for-previous-day
