Copying files from one HDFS directory to another with the Oozie distcp-action

Asked by 温柔的废话, 2020-12-11 13:23

My actions

start_fair_usage ends with status OK, but test_copy fails with

Main class [org.apache.oozie.action.hadoop.DistcpM         
1 Answer
  • Answered 2020-12-11 14:18

    Here is what I did in the end.

      <start to="start_copy"/>
    
      <fork name="start_copy">
        <path start="copy_mta"/>
        <path start="copy_rcr"/>
        <path start="copy_sub"/>
      </fork>
    
      <action name="copy_mta">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
          <prepare>
            <delete path="${NAME_NODE}${dstFolder}mta/*"/>
          </prepare>
          <arg>${NAME_NODE}${srcFolder}/*mta.gz</arg>
          <arg>${NAME_NODE}${dstFolder}mta/</arg>
        </distcp>
        <ok to="end_copy"/>
        <error to="KILL"/>
      </action>
    
      <action name="copy_rcr">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
          <prepare>
            <delete path="${NAME_NODE}${dstFolder}rcr/*"/>
          </prepare>
          <arg>${NAME_NODE}${srcFolder}/*rcr.gz</arg>
          <arg>${NAME_NODE}${dstFolder}rcr/</arg>
        </distcp>
        <ok to="end_copy"/>
        <error to="KILL"/>
      </action>
    
      <action name="copy_sub">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
          <prepare>
            <delete path="${NAME_NODE}${dstFolder}sub/*"/>
          </prepare>
          <arg>${NAME_NODE}${srcFolder}/*sub.gz</arg>
          <arg>${NAME_NODE}${dstFolder}sub/</arg>
        </distcp>
        <ok to="end_copy"/>
        <error to="KILL"/>
      </action>
    
      <join name="end_copy" to="END"/>
    
      <kill name="KILL">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="END"/>
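
    If the workflow dies before DistCp even starts, with the truncated DistcpMain class error from the question, the usual cause is that the distcp sharelib is not on the action's classpath. A minimal job.properties sketch, assuming a standard Oozie sharelib installation:

      oozie.use.system.libpath=true
      # make the distcp sharelib available to distcp actions
      oozie.action.sharelib.for.distcp=distcp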
    

    It turned out that wildcards work in distcp, so I didn't need bash at all.

    Also, some people advised me to write it in Scala.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
    
    val conf = new Configuration()
    val fs = FileSystem.get(conf)
    
    val listOfFileTypes = List("mta", "rcr", "sub")
    val listOfPlatforms = List("B", "C", "H", "M", "Y")
    
    for (fileType <- listOfFileTypes) {
      val dstPath = new Path("/apps/hive/warehouse/arstel.db/fair_usage/fct_evkuzmin/file_" + fileType)
    
      // Clear the destination directory on HDFS first. Note that the
      // java.io.File-based FileUtil.fullyDeleteContents only touches the
      // local filesystem, so it cannot be used for HDFS paths.
      fs.globStatus(new Path(dstPath, "*")).foreach(status => fs.delete(status.getPath, true))
    
      for (platform <- listOfPlatforms) {
        // Glob the gzipped source files for this platform and file type.
        val srcPaths = fs.globStatus(new Path("/user/comverse/data/" + "20170404" + "_" + platform + "/*" + fileType + ".gz"))
    
        for (srcPath <- srcPaths) {
          println("copying " + srcPath.getPath.toString)
          FileUtil.copy(fs, srcPath.getPath, fs, dstPath, false, conf)
        }
      }
    }
    

    Both approaches work, though I haven't tried to run the Scala script from Oozie.
