Question
I am trying to submit an Oozie job using the Oozie Java Client API from another job's Java action. The cluster uses Kerberos.
Here is my code:
// get a OozieClient for local Oozie
String oozieUrl = "http://hadooputl02.northamerica.xyz.net:11000/oozie/";
AuthOozieClient wc = new AuthOozieClient(oozieUrl);
wc.setDebugMode(1);
// create a workflow job configuration and set the workflow application path
Properties conf = wc.createConfiguration();
conf.setProperty(OozieClient.APP_PATH, wfAppPath);
conf.setProperty("jobTracker", "yarnRM");
conf.setProperty("nameNode", "hdfs://ingestiondev");
// submit and start the workflow job
String jobId = wc.run(conf);
System.out.println("Workflow job submitted");
But I am getting the following error:
org.apache.oozie.action.hadoop.JavaMainException: IO_ERROR :
java.io.IOException: Error while connecting Oozie server. No of retries = 1. Exception = Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: AUTHENTICATION : Could not authenticate, GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
...
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
I believe something more is required in the code to give the node/user access to the Oozie server through Kerberos.
Can someone point me to the correct way to use the Oozie Java API on a Kerberized cluster?
thanks!
Answer 1:
The error message is explicit: Failed to find any Kerberos tgt. Your job runs in a YARN container, on a random node, and has no Kerberos ticket available there.
Did you ever wonder how Oozie could start a job with your Kerberos credentials, even though it does not know your password? That's because it uses a backdoor built inside Hadoop. But then your job has no proper Kerberos credentials, hence the message you see when you try to do something not covered.
How Oozie manages authentication without credentials
- you connect to an Edge Node, create a Kerberos ticket with kinit, and run an Oozie command line to submit a Coordinator (which will fire a Workflow at specific dates and times)
- the Oozie CLI authenticates against the Oozie server with the local Kerberos ticket, so the Coordinator (and Workflow) "belong to you"
- when the Coordinator triggers the Workflow, and the Workflow starts an Action, and the Action starts a YARN job... it's the Oozie server that authenticates against the YARN ResourceManager (typically as oozie) -- your Kerberos ticket has probably expired long ago
- but since oozie is defined as a privileged proxy account in the YARN config, the RM agrees to start the job under your account, even though you did not properly authenticate via Kerberos
- how is that possible? Because internally YARN and HDFS use a delegation token -- usually, you authenticate once with Kerberos, then you get a token, and you are good for all core services on all nodes; with Oozie in the mix, you don't even have to authenticate...
But there's a catch: the delegation token does not work for any service that uses pure Kerberos authentication -- i.e. Hive Metastore, Hive JDBC, HBase, ZooKeeper, Oozie, etc.
That's why Oozie has a workaround: explicit <credential> requests for Hive actions, Hive2 actions, HBase actions, etc. [disclaimer: I don't really know how it actually works]
I doubt that any of these "credentials" would work against Oozie itself...!
How you can manage your own custom authentication
- build a keytab file with your password inside (cf. Linux command ktutil)
- upload that file to HDFS with restricted access -- because anyone who can get access to that file could then log in as you!!!
- tell Oozie to download the file in the container that runs your Java action, with <file> -- it will be available in the Current Working Dir so you won't have to care about the actual path
- create a JAAS config file that explains to Java that "whenever the Oozie REST server requests authentication via SPNEGO, create a Kerberos ticket on-the-fly using this principal, whose password is in that keytab file" (instead of the default which is "look for the ticket cache and get an existing ticket there")
- upload that JAAS config file to HDFS, use another <file> etc.
- activate that JAAS config with a Java system property
You will find more details in that post of mine: Error when connect to impala with JDBC under kerberos authrication
Disclaimer: I don't know which JAAS "subject" is expected by Oozie (for instance, ZooKeeper expects Client, Hive expects com.sun.security.jgss.krb5.initiate)
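The JAAS steps above can be sketched roughly as follows. This is only an illustration, not a tested recipe for Oozie: the class name JaasKeytabConfig and its methods are hypothetical, the entry name is left as a parameter because (as noted) it is unclear which one the Oozie client looks up, and the file is written from Java only to keep the example self-contained, whereas the steps above ship it through <file>. The module name com.sun.security.auth.module.Krb5LoginModule and the system property java.security.auth.login.config are the standard JDK ones.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: render a JAAS entry that logs in from a keytab, then activate it. */
class JaasKeytabConfig {

    /** Renders one JAAS entry that uses the JDK's Krb5LoginModule with a keytab. */
    static String render(String entryName, String keytab, String principal) {
        return entryName + " {\n"
            + "  com.sun.security.auth.module.Krb5LoginModule required\n"
            + "  useKeyTab=true\n"
            + "  keyTab=\"" + keytab + "\"\n"      // assumed to sit in the container's CWD
            + "  principal=\"" + principal + "\"\n"
            + "  storeKey=true\n"
            + "  doNotPrompt=true;\n"
            + "};\n";
    }

    /** Writes the entry to a file and points the JVM at it via the standard property. */
    static Path activate(Path target, String entryName, String keytab, String principal)
            throws IOException {
        Files.write(target, render(entryName, keytab, principal).getBytes(StandardCharsets.UTF_8));
        // Has to happen before the JVM first loads its JAAS configuration.
        System.setProperty("java.security.auth.login.config", target.toAbsolutePath().toString());
        return target;
    }
}
```

The same property can be passed on the command line as -Djava.security.auth.login.config=... instead of being set in code. Whether the Oozie client actually consults this file, and under which entry name, is exactly the open question in the disclaimer above.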
Alternative: forget about JAAS and use the cache.
- set env variable KRB5CCNAME to a temp file in the CWD of the container (which will be destroyed automatically when the job stops)
- spawn a Linux command kinit -kt myname.keytab myname@REALM which will obtain a Kerberos ticket in the cache defined by KRB5CCNAME
- and let JAAS follow the default process
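A minimal sketch of the kinit step, assuming kinit is on the PATH of the container and the keytab has been shipped into its working directory. The class KinitTicketCache, its method names and the cache file name are hypothetical; only ProcessBuilder and the KRB5CCNAME variable are standard.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

/** Sketch: spawn kinit so that it fills a ticket cache in the working directory. */
class KinitTicketCache {

    /** Builds (but does not start) a kinit process that targets the given cache file. */
    static ProcessBuilder kinitCommand(File workDir, String cacheFile,
                                       String keytab, String principal) {
        ProcessBuilder pb = new ProcessBuilder(Arrays.asList("kinit", "-kt", keytab, principal));
        pb.directory(workDir);
        // Only the child process sees this value; see the note below.
        pb.environment().put("KRB5CCNAME", new File(workDir, cacheFile).getAbsolutePath());
        pb.inheritIO();
        return pb;
    }

    /** Runs kinit and fails loudly if it does not exit with status 0. */
    static void obtainTicket(ProcessBuilder pb) throws IOException, InterruptedException {
        int exit = pb.start().waitFor();
        if (exit != 0) {
            throw new IOException("kinit failed with exit code " + exit);
        }
    }
}
```

Usage would look like KinitTicketCache.obtainTicket(KinitTicketCache.kinitCommand(new File("."), "krb5cc_tmp", "myname.keytab", "myname@REALM")). Note that a JVM cannot change its own environment, so for the default lookup to find the same cache, KRB5CCNAME also has to be set for the Java process itself when the container is launched; how to do that depends on the launcher configuration and is not shown here.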
Answer 2:
It seems that Oozie does not support using a keytab to authenticate itself, so you must generate the Kerberos ticket cache outside your program. For example, you can execute the following command before you execute your program:
kinit -kt kerberos.keytab examplePrinciple/domain@example.com
Source: https://stackoverflow.com/questions/43938580/submit-oozie-job-from-another-jobs-java-action-with-kerberos