Issue connecting RStudio (but not R) to Hive with Kerberos


Question


I'm trying to connect RStudio to a Hive instance that uses Kerberos authentication. If I run the following in an R script called from the command line, it works.

library("DBI")
library("rJava")
library("RJDBC")

cp = c("/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc.jar"
, "/u01/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/libthrift-0.9.2.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-service.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpclient-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/httpcore-4.2.5.jar"
, "/u01/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)

drv <- JDBC("org.apache.hive.jdbc.HiveDriver" , "hive-jdbc.jar" )

conn <- dbConnect(drv, "jdbc:hive2://XXXX:10000/default;principal=hive/XXXX@XXXXX;auth=kerberos")

If I run the exact same script in RStudio, I get an error:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

If I run system('klist') in RStudio, it shows I have a valid ticket. It seems RStudio isn't able to identify the ticket but R is. Any ideas?


Answer 1:


Some boring stuff first, to put things into context, then the solution.

  • Kerberos: it's complicated by nature (think cryptography plus network protocols), even before you consider that Microsoft has its own implementation and extensions
  • Java and Kerberos: it's even more complicated (only partial support, subtle changes between Java versions, etc.)
  • Hadoop and Java and Kerberos: it's complicated and ugly (read the GitBook "Hadoop and Kerberos, the Madness beyond the Gate" if you really want to lose your sanity), and it's even worse on Windows because there is no official build of the required Hadoop "native libs"
  • Hive and JDBC and Kerberos: the good news is that you don't need the ugly Hadoop part unless you are using the Apache JDBC driver on Windows (hint: ditch it and opt for the Cloudera JDBC driver!); the bad news is that you may need a raw JAAS configuration and specific Java system properties
  • R and Java/JDBC: it works quite well, except that sometimes you want to pass specific Java system properties to the JVM, either at launch time or at run time, but .jinit does not support that AFAIK, so you must resort to a workaround (see below)


There is one Java system property, javax.security.auth.useSubjectCredsOnly, that must be set to false for Kerberos auth to work over JDBC, and it's not always set by default.
But you can't set that Java property from R directly; you have to set an environment variable (either before starting R, or from R code but before calling .jinit).

Option 1: from a Linux shell script, before starting R...

export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false"

Option 2: from your R code...

Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false")
.jinit(...)
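
To check that the JVM actually picked up the option, you can query the system property through rJava once the JVM is up; a minimal sanity check, assuming rJava is already loaded as in the question:

# should print "false" if JAVA_TOOL_OPTIONS was honored by the JVM
J("java.lang.System")$getProperty("javax.security.auth.useSubjectCredsOnly")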


Now, that may not be sufficient in all cases. Maybe you need to point to a specific Kerberos config file because your Hadoop cluster uses its own KDC. Maybe you don't want to use the default Kerberos ticket, but instead authenticate as a service account, using keys stored in a keytab file.
And maybe you need some debugging information because, well, shit happens (and security libraries are quite secretive by default, not to make things too easy for hackers, I suppose...)
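
As a rough illustration only (the file paths, principal and JAAS entry name below are placeholders, not values from the original question), those extra needs translate into additional -D flags, plus a small JAAS configuration file when you authenticate from a keytab:

# hypothetical paths, adjust to your environment
Sys.setenv(JAVA_TOOL_OPTIONS=paste(
  "-Djavax.security.auth.useSubjectCredsOnly=false",
  "-Djava.security.krb5.conf=/etc/krb5-cluster.conf",       # cluster-specific Kerberos config
  "-Djava.security.auth.login.config=/home/me/jaas.conf",   # JAAS config for keytab login
  "-Dsun.security.krb5.debug=true"))                        # verbose Kerberos troubleshooting
.jinit(classpath=cp)

# /home/me/jaas.conf would contain a Krb5LoginModule entry along these lines
# (the entry name expected depends on the JDBC driver):
# Client {
#   com.sun.security.auth.module.Krb5LoginModule required
#   useKeyTab=true
#   keyTab="/home/me/svc_account.keytab"
#   principal="svc_account@XXXXX"
#   doNotPrompt=true;
# };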

Please refer to that post for more information about advanced Java configuration for Hive/Impala JDBC with Kerberos.

And be careful when setting the environment variable: its value must look like a Java command line, i.e. -Dsome.key=value -Dsome.other.key=blahblah; in a shell script, wrap it in quotes (because of the separating spaces); in R code, pass it as a single string, not a character vector.
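
For instance, a hypothetical two-property setting looks like this, as one quoted string in the shell and one plain string in R (never a character vector):

export JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true"

Sys.setenv(JAVA_TOOL_OPTIONS="-Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true")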



Source: https://stackoverflow.com/questions/43778821/issue-connecting-rstudio-but-not-r-to-hive-with-kerberos
