Question
I'm trying to connect RStudio to a Hive instance that uses Kerberos authentication. If I run the code below in an R script called from the command line, it works.
library("DBI")
library("rJava")
library("RJDBC")
cp = c("/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar"
, "/u01/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpclient-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpcore-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
drv <- JDBC("org.apache.hive.jdbc.HiveDriver" , "hive-jdbc.jar" )
conn <- dbConnect(drv , "jdbc:hive2://XXXX:10000/default;principal=hive/XXXX@XXXXX;auth=kerberos")
If I run the exact same script in RStudio, I get an error:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
If I run system('klist') in RStudio, it shows I have a valid ticket. It seems RStudio isn't able to identify the ticket but R is. Any ideas?
Answer 1:
Some boring stuff first, to put things into context, then the solution.
- Kerberos: it's complicated by nature (think cryptography plus networking), even without considering that Microsoft has its own implementation and extensions
- Java and Kerberos: it's even more complicated (only partial support, subtle changes in Java versions, etc.)
- Hadoop and Java and Kerberos: it's complicated and ugly (read the GitBook "Hadoop and Kerberos, the Madness beyond the Gate" if you really want to lose your sanity) and it's even worse on Windows cf. lack of an official build for the required Hadoop "native libs"
- Hive and JDBC and Kerberos: the good news is that you don't need the Hadoop "ugly" part unless you are using the Apache JDBC driver on Windows (hint: ditch it and opt for the Cloudera JDBC driver!); the bad news is that you may need raw JAAS configuration and specific Java system properties
- R and Java/JDBC: it works quite well, except that sometimes you want to pass specific Java system properties to the JVM -- either at launch time or at run time -- but .jinit does not support that AFAIK, so you must resort to a workaround
There is one Java system property that must be set for Kerberos auth to work in JDBC, and it's not always set by default.
But you can't set that Java property from R directly; you have to set an environment variable (either before starting R, or from R code but before calling .jinit).
Option 1: from a Linux shell script, before starting R...
export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"
Option 2: from your R code...
Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false")
.jinit(...)
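To check that Option 2 actually took effect, one possible sanity check is to ask the running JVM for the property right after .jinit. This is only a sketch: it assumes rJava is installed and that no JVM has been started yet in the current R session (if one has, restart the session first, since JAVA_TOOL_OPTIONS is only read when the JVM starts). The bare .jinit() call is a simplification; in practice you would pass the classpath from the question.

```r
# Must be set before the JVM starts; rJava starts the JVM in .jinit().
Sys.setenv(JAVA_TOOL_OPTIONS = "-Djavax.security.auth.useSubjectCredsOnly=false")

library("rJava")
.jinit()  # assumption: add classpath = cp here, as in the question

# Ask the JVM itself for the property via java.lang.System.getProperty().
# If the option was picked up, the value should be the string "false";
# if it was not, the call returns NULL.
use_subject <- .jcall("java/lang/System", "S", "getProperty",
                      "javax.security.auth.useSubjectCredsOnly")
print(use_subject)
```

If the property comes back unset in RStudio but set in command-line R, the JVM was most likely already running before Sys.setenv was called.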
Now, that may not be sufficient in all cases. Maybe you need to use a specific Kerberos config because your Hadoop cluster uses its own KDC. Maybe you don't want to use the default Kerberos ticket, but instead authenticate as a service account, using a password stored in a keytab file.
And maybe you need some debugging information because, well, shit happens (and security libraries are quite secretive by default, not to make things too easy for hackers, I suppose...)
Please refer to that post for more information about advanced Java configuration for Hive/Impala JDBC with Kerberos.
And be careful when setting the environment variable: simulate a Java command line, i.e. -Dsome.key=value -Dsome.other.key=blahblah; in a shell script, use quotes (because of the separating space); in R code, use a single string, not a character vector.
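As an illustration of the "single string" advice, here is a minimal sketch that assembles several -D options into one value for JAVA_TOOL_OPTIONS. The property names are standard JVM/Kerberos system properties, but the values (in particular the krb5.conf path) are placeholders, and java_props and java_tool_options are names made up for this example.

```r
# Placeholder values for illustration only; adapt them to your cluster.
java_props <- c(
  "javax.security.auth.useSubjectCredsOnly" = "false",
  "java.security.krb5.conf"                 = "/etc/krb5.conf",
  "sun.security.krb5.debug"                 = "true"
)

# Build "-Dkey=value -Dkey=value ..." as ONE string, separated by spaces.
# collapse = " " is what turns the character vector into a single string.
java_tool_options <- paste0("-D", names(java_props), "=", java_props,
                            collapse = " ")

# Set it before .jinit() so the JVM sees it at startup.
Sys.setenv(JAVA_TOOL_OPTIONS = java_tool_options)
print(Sys.getenv("JAVA_TOOL_OPTIONS"))
```

Keeping the properties in a named vector makes them easy to edit, while the final paste0 step guarantees the environment variable receives a single space-separated string.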
Source: https://stackoverflow.com/questions/43778821/issue-connecting-rstudio-but-not-r-to-hive-with-kerberos