Submit Oozie Job from another job's java action with Kerberos

允我心安 submitted on 2020-07-20 03:56:04

Question


I am trying to submit an Oozie job with the Oozie Java client API from another job's Java action. The cluster uses Kerberos.

Here is my code:

    import java.util.Properties;
    import org.apache.oozie.client.AuthOozieClient;
    import org.apache.oozie.client.OozieClient;

    // get an OozieClient for the local Oozie server
    String oozieUrl = "http://hadooputl02.northamerica.xyz.net:11000/oozie/";
    AuthOozieClient wc = new AuthOozieClient(oozieUrl);
    wc.setDebugMode(1);

    // create a workflow job configuration and set the workflow application path
    Properties conf = wc.createConfiguration();
    conf.setProperty(OozieClient.APP_PATH, wfAppPath);
    conf.setProperty("jobTracker", "yarnRM");
    conf.setProperty("nameNode", "hdfs://ingestiondev");

    // submit and start the workflow job
    String jobId = wc.run(conf);
    System.out.println("Workflow job submitted: " + jobId);

But I am getting the following error:

    org.apache.oozie.action.hadoop.JavaMainException: IO_ERROR : java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
          ...
    Caused by: AUTHENTICATION : Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
          ...
    Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
          ...
    Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

I believe the code needs something more to give the node/user access to the Oozie server through Kerberos.

Can someone point me to the correct way to use the Oozie Java API on a Kerberized cluster?

Thanks!


Answer 1:


The error message is explicit: Failed to find any Kerberos tgt. Your job runs in a YARN container, on a random node, and has no Kerberos ticket available there.

Did you ever wonder how Oozie can start a job with your Kerberos credentials, even though it does not know your password? That's because it uses a backdoor built into Hadoop: impersonation through a proxy user. But then your job has no proper Kerberos credentials, hence the message you see when you try to do something that backdoor does not cover.


How Oozie manages authentication without credentials
  • you connect to an Edge Node, create a Kerberos ticket with kinit, run an Oozie command line to submit a Coordinator (which will fire a Workflow at specific dates and times)
  • the Oozie CLI authenticates against the Oozie server with the local Kerberos ticket, so the Coordinator (and Workflow) "belong to you"
  • when the Coordinator triggers the Workflow, and the Workflow starts an Action, and the Action starts a YARN job... it's the Oozie server that authenticates against YARN ResourceManager (typically as oozie) -- your Kerberos ticket has probably expired long ago
  • but since oozie is defined as a privileged proxy user in the Hadoop/YARN config, the RM agrees to start the job under your account, even though you did not properly authenticate via Kerberos
  • how is it possible?? because internally YARN and HDFS use a delegation token -- usually, you authenticate once with Kerberos, then you get a token, and you are good for all core services on all nodes; with Oozie in the mix, you don't even have to authenticate...
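As a rough illustration of the proxy-user rule described above, here is a simplified model of the decision a Hadoop service makes before letting `oozie` act on your behalf. It is a minimal sketch, assuming the standard `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups` properties (normally set in `core-site.xml`); the class `ProxyUserCheck`, the host names and the group names are hypothetical, and the real Hadoop implementation covers more cases, such as CIDR host ranges.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical, simplified model of Hadoop's proxy-user (impersonation) check. */
public class ProxyUserCheck {

    /** Parses a comma-separated property value; a missing value means "nothing allowed". */
    private static Set<String> parseList(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> result = new HashSet<>();
        for (String item : value.split(",")) {
            result.add(item.trim());
        }
        return result;
    }

    /**
     * Returns true if superUser (e.g. "oozie"), connecting from host,
     * may impersonate an end user who belongs to one of userGroups.
     */
    public static boolean mayImpersonate(Properties conf, String superUser,
                                         String host, List<String> userGroups) {
        Set<String> hosts = parseList(conf.getProperty("hadoop.proxyuser." + superUser + ".hosts"));
        Set<String> groups = parseList(conf.getProperty("hadoop.proxyuser." + superUser + ".groups"));

        boolean hostOk = hosts.contains("*") || hosts.contains(host);
        boolean groupOk = groups.contains("*");
        for (String g : userGroups) {
            groupOk = groupOk || groups.contains(g);
        }
        return hostOk && groupOk;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hadoop.proxyuser.oozie.hosts", "oozie-server.example.com");
        conf.setProperty("hadoop.proxyuser.oozie.groups", "analysts,etl");

        // "oozie" authenticated with its own Kerberos identity; the job would run as the end user.
        System.out.println(mayImpersonate(conf, "oozie", "oozie-server.example.com", Arrays.asList("etl")));
        System.out.println(mayImpersonate(conf, "oozie", "random-node.example.com", Arrays.asList("etl")));
    }
}
```

With this configuration the first call prints `true` and the second `false`: `oozie` may impersonate members of `etl`, but only when it connects from the configured host.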

But there's a catch: the delegation token does not work for services that use pure Kerberos authentication, e.g. Hive Metastore, Hive JDBC, HBase, ZooKeeper, Oozie.
That's why Oozie has a workaround: explicit <credential> requests for Hive actions, Hive2 actions, HBase actions, etc. [disclaimer: I don't really know how it actually works]

I doubt that any of these "credentials" would work against Oozie itself...!


How you can manage your own custom authentication
  1. build a keytab file with your password inside (cf. Linux command ktutil)
  2. upload that file to HDFS with restricted access -- because anyone who can get access to that file could then login as you!!!
  3. tell Oozie to download the file in the container that runs your Java action, with <file> -- it will be available in the Current Working Dir so you won't have to care about the actual path
  4. create a JAAS config file that explains to Java that "whenever the Oozie REST server requests authentication via SPNEGO, create a Kerberos ticket on-the-fly using this principal, whose password is in that keytab file" (instead of the default which is "look for the ticket cache and get an existing ticket there")
  5. upload that JAAS config file to HDFS and ship it to the container with another <file>, as in step 3
  6. activate that JAAS config with a Java system property
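Steps 3, 4 and 6 might look roughly like this inside the Java action. This is a minimal sketch under several assumptions: the keytab was shipped with `<file>` as `myname.keytab`, `myname@REALM` stands for your principal, and `com.sun.security.jgss.krb5.initiate` is used as the JAAS entry name, which the disclaimer about JAAS "subjects" leaves unconfirmed for Oozie. As a variation on step 5, the sketch writes the JAAS file into the working directory at runtime instead of shipping it from HDFS. The class `JaasSetup` and the file name `jaas-oozie.conf` are hypothetical; the `Krb5LoginModule` options and the two system properties are standard JDK ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: generate a JAAS config that logs in from a keytab, then activate it. */
public class JaasSetup {

    /** Builds a JAAS entry telling Krb5LoginModule to use the keytab instead of the ticket cache. */
    public static String buildConfig(String entryName, String keytabPath, String principal) {
        return entryName + " {\n"
            + "  com.sun.security.auth.module.Krb5LoginModule required\n"
            + "  useKeyTab=true\n"
            + "  keyTab=\"" + keytabPath + "\"\n"
            + "  principal=\"" + principal + "\"\n"
            + "  storeKey=true\n"
            + "  doNotPrompt=true\n"
            + "  useTicketCache=false;\n"
            + "};\n";
    }

    /** Writes the config into the current working dir and points the JVM at it. */
    public static Path activate(String entryName, String keytabFile, String principal) throws IOException {
        // Files shipped with <file> land in the container's current working directory.
        String keytab = Paths.get(keytabFile).toAbsolutePath().toString();
        Path jaasFile = Paths.get("jaas-oozie.conf").toAbsolutePath();
        Files.write(jaasFile, buildConfig(entryName, keytab, principal).getBytes(StandardCharsets.UTF_8));

        // Set before the first JAAS/Kerberos call: the login configuration is loaded once and cached.
        System.setProperty("java.security.auth.login.config", jaasFile.toString());
        // Let JGSS perform its own JAAS login when no Subject already carries credentials.
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
        return jaasFile;
    }

    public static void main(String[] args) throws IOException {
        Path p = activate("com.sun.security.jgss.krb5.initiate", "myname.keytab", "myname@REALM");
        System.out.println("JAAS config written to " + p);
        // ...then create the AuthOozieClient and call run(conf) as in the question.
    }
}
```

After `activate` returns, the question's client code runs unchanged. Whether the client's SPNEGO code actually consults this entry rather than only the ticket cache is the open point in the disclaimer, so treat this as something to try rather than a guaranteed fix.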

You will find more details in that post of mine: Error when connect to impala with JDBC under kerberos authrication

Disclaimer: I don't know which JAAS "subject" is expected by Oozie (for instance, ZooKeeper expects Client, Hive expects com.sun.security.jgss.krb5.initiate)


Alternative: forget about JAAS and use the cache.
  • set env variable KRB5CCNAME to a temp file in the CWD of the container (which will be destroyed automatically when the job stops)
  • spawn a Linux command kinit -kt myname.keytab myname@REALM which will obtain a Kerberos ticket in the cache defined by KRB5CCNAME
  • and let JAAS follow the default process
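The kinit part of this alternative could be driven from the Java action as in the minimal sketch below, assuming `kinit` is on the container's PATH and the keytab `myname.keytab` sits in the working directory; the class name `KinitLauncher` is hypothetical. One caveat shapes the design: setting `KRB5CCNAME` on a child process does not change the JVM's own environment, and the JVM's Kerberos code looks up `KRB5CCNAME` in its own environment. The sketch therefore reuses the value the JVM was started with, so `KRB5CCNAME` must already point at a file in the container's CWD when the JVM launches (for example, set by a wrapper script).

```java
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical helper: obtain a ticket from a keytab into a known cache before using the Oozie client. */
public class KinitLauncher {

    /** Builds (but does not start) a kinit process that stores a ticket in the given cache. */
    public static ProcessBuilder buildKinit(String keytab, String principal, String ccache) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("kinit", "-kt", keytab, principal));
        // kinit writes the new ticket to the cache named by KRB5CCNAME.
        pb.environment().put("KRB5CCNAME", ccache);
        pb.inheritIO(); // show kinit's own messages in the container log
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Reuse the cache location the JVM itself sees, so the Oozie client later finds the ticket.
        String ccache = System.getenv("KRB5CCNAME");
        if (ccache == null || ccache.isEmpty()) {
            throw new IllegalStateException("KRB5CCNAME must be set before the JVM starts");
        }
        int rc = buildKinit("myname.keytab", "myname@REALM", ccache).start().waitFor();
        if (rc != 0) {
            throw new IllegalStateException("kinit failed with exit code " + rc);
        }
        // The ticket is now in the cache; create the AuthOozieClient and call run(conf).
    }
}
```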



Answer 2:


It seems that Oozie does not support using a keytab to authenticate itself, so you must generate the Kerberos ticket cache outside your program. For example, you can execute the following command before you execute your program:

kinit -kt kerberos.keytab examplePrinciple/domain@example.com
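To fail fast with a clearer message than the `GSSException` in the question, the program could check that a ticket cache exists before calling the Oozie client. This is a minimal sketch, assuming a file-based cache at the location the JDK falls back to on a Unix host (`KRB5CCNAME` if set, otherwise `/tmp/krb5cc_<uid>`); the class `TicketCacheCheck` is hypothetical. Some distributions set `default_ccache_name` in `krb5.conf` to a `KEYRING:` cache, which neither this check nor the JDK's Kerberos code reads, and a readable file does not prove the ticket is still valid.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical pre-flight check: is there a ticket cache where the JVM will look for one? */
public class TicketCacheCheck {

    /** Resolves a file-based cache location: KRB5CCNAME if set, else /tmp/krb5cc_<uid>. */
    public static Path resolveCache(Map<String, String> env, long uid) {
        String name = env.get("KRB5CCNAME");
        if (name == null || name.isEmpty()) {
            return Paths.get("/tmp/krb5cc_" + uid);
        }
        // Only FILE: caches (or bare paths) are handled by this sketch.
        return Paths.get(name.startsWith("FILE:") ? name.substring("FILE:".length()) : name);
    }

    public static void main(String[] args) {
        // UnixSystem is a JDK class (jdk.security.auth module) that exposes the current Unix uid.
        long uid = new com.sun.security.auth.module.UnixSystem().getUid();
        Path cache = resolveCache(System.getenv(), uid);
        if (!Files.isReadable(cache)) {
            System.err.println("No ticket cache at " + cache + "; run kinit -kt <keytab> <principal> first");
            System.exit(1);
        }
        System.out.println("Using ticket cache " + cache);
        // ...submit the workflow with AuthOozieClient as in the question.
    }
}
```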


Source: https://stackoverflow.com/questions/43938580/submit-oozie-job-from-another-jobs-java-action-with-kerberos
