问题
I'm using RJDBC 0.2-5 to connect to Hive in Rstudio. My server has hadoop-2.4.1 and hive-0.14. I follow the below mention steps to connect to Hive.
library(DBI)
library(rJava)
library(RJDBC)
.jinit(parameters="-DrJava.debug=true")
drv <- JDBC("org.apache.hadoop.hive.jdbc.HiveDriver", 
            c("/home/packages/hive/New folder3/commons-logging-1.1.3.jar",
              "/home/packages/hive/New folder3/hive-jdbc-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-metastore-0.14.0.jar",
              "/home/packages/hive/New folder3/hive-service-0.14.0.jar",
              "/home/packages/hive/New folder3/libfb303-0.9.0.jar",
              "/home/packages/hive/New folder3/libthrift-0.9.0.jar",
              "/home/packages/hive/New folder3/log4j-1.2.16.jar",
              "/home/packages/hive/New folder3/slf4j-api-1.7.5.jar",
              "/home/packages/hive/New folder3/slf4j-log4j12-1.7.5.jar",
              "/home/packages/hive/New folder3/hive-common-0.14.0.jar",
            "/home/packages/hive/New folder3/hadoop-core-0.20.2.jar",
            "/home/packages/hive/New folder3/hive-serde-0.14.0.jar",
             "/home/packages/hive/New folder3/hadoop-common-2.4.1.jar"),
            identifier.quote="`")
conHive <- dbConnect(drv, "jdbc:hive://myserver:10000/default",
                  "usr",
                  "pwd")
But I am always getting the following error:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.conf.HiveConf$ConfVars
Even I tried with different version of Hive jar, Hive-jdbc-standalone.jar but nothing seems to work.. I also use RHive to connect to Hive but there was also no success.
Can anyone help me?.. I kind of stuck :(
回答1:
I didn't try rHive because it seems to need a complex installation on all the nodes of the cluster.
I successfully connect to Hive using RJDBC, here are a code snipet that works on my Hadoop 2.6 CDH5.4 cluster :
#loading libraries
library("DBI")
library("rJava")
library("RJDBC")
#init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
cp = c("/usr/lib/hive/lib/hive-jdbc.jar", "/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/libthrift-0.9.2.jar", "/usr/lib/hive/lib/hive-service.jar", "/usr/lib/hive/lib/httpclient-4.2.5.jar", "/usr/lib/hive/lib/httpcore-4.2.5.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
.jinit(classpath=cp)
#initialisation de la connexion
drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc.jar", identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
#working with the connexion
show_databases <- dbGetQuery(conn, "show databases")
show_databases
The harder is to find all the needs jars and where to find them ...
UPDATE The hive standalone JAR contains all that was needed to use Hive, using this standalone JAR with the hadoop-common jar is enough to use Hive.
So this is a simplified version, no need to worry to other jars that the hadoop-common and the hive-standalone jars.
 #loading libraries
 library("DBI")
 library("rJava")
 library("RJDBC")
 #init of the classpath (works with hadoop 2.6 on CDH 5.4 installation)
 cp = c("/usr/lib/hadoop/client/hadoop-common.jar", "/usr/lib/hive/lib/hive-jdbc-standalone.jar")
 .jinit(classpath=cp)
 #initialisation de la connexion
 drv <- JDBC("org.apache.hive.jdbc.HiveDriver", "/usr/lib/hive/lib/hive-jdbc-standalone.jar", identifier.quote="`")
 conn <- dbConnect(drv, "jdbc:hive2://localhost:10000/mydb", "myuser", "")
 #working with the connexion
 show_databases <- dbGetQuery(conn, "show databases")
 show_databases
回答2:
Ioicmathieu's answer works for me now after I have switched to an older hive jar for example from 3.1.1 to 2.0.0.
Unfortunately I can't comment on his answer that's why I have written another one.
If you run into the following error try an older version:
Error in .jcall(drv@jdrv, "Ljava/sql/Connection;", "connect", as.character(url)[1], : java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://host_name: Could not establish connection to jdbc:hive2://host_name:10000: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=default})
来源:https://stackoverflow.com/questions/33007353/connect-to-remote-hive-server-from-r-using-rjdbc-rhive