Question
I know about reproducible examples and minimal code, of course, but for this question I have to be somewhat vague (I have no alternative).
I am trying to connect R and Impala. One constraint up front: officially I cannot install software on this PC, so I am using portable versions of R and RStudio.
I've tried the RImpala package:
rimpala.connect(IP = myip,
port = the port where Impala sees,
principal = maybe this is not clear)
I am pretty sure that the cause of my problem is the principal argument; the documentation is not clear to me. I've tried several combinations of what the documentation says should be placed there.
In any case I get the same error:
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: 10.60.10.22:8888/impala/@tempuser
I've searched online for this error and it seems to be Java-related, but I have zero knowledge of that language.
It may be useful to know that I have no administrator access to my PC: I cannot install any software or do anything else that requires administrator rights.
I know the question is not well written, but as I've said, a reproducible example is impossible this time.
More details
Now that I think about it, I filled the IP argument with the address I see in my browser's address bar when I connect to Hue. I guessed it was the same address, but I may be wrong on this point too. Still, as I've said, I am pretty sure the error is not due to that.
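For reference, the error message suggests that the driver splits the principal on "/" and "@" and expects exactly three pieces, which matches the usual Kerberos form service/host@REALM. The sketch below is only an illustration of that check under that assumption; the function names and the example principal are hypothetical and are not part of RImpala.

```r
# Split a Kerberos principal on "/" and "@" (assumed to mirror the driver's check).
split_principal <- function(principal) {
  strsplit(principal, "[/@]")[[1]]
}

# TRUE when the principal has exactly three parts: service, host, realm.
has_three_parts <- function(principal) {
  length(split_principal(principal)) == 3
}

# The value from the error message splits into four parts
# (one of them empty), so it is rejected.
bad <- "10.60.10.22:8888/impala/@tempuser"
print(split_principal(bad))
print(has_three_parts(bad))

# A hypothetical principal in service/host@REALM form.
good <- "impala/impala-host.example.com@EXAMPLE.COM"
print(has_three_parts(good))
```

If this reading is right, the host and port belong in the IP and port arguments, and the principal argument should carry only the service/host@REALM string.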
Answer 1:
The R package implyr (on CRAN and GitHub) provides a dplyr backend for Impala, using either the ODBC or JDBC driver to connect. See the README for instructions.
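A minimal sketch of what that could look like over ODBC, assuming implyr, odbc and dplyr are installed and that an ODBC data source is already configured. The DSN name "Impala DSN" and the table name are placeholders, and the wrapper function is hypothetical; the README remains the authoritative reference.

```r
# Hypothetical wrapper: open an implyr connection through an existing ODBC DSN.
connect_impala_dplyr <- function(dsn = "Impala DSN") {
  # Fail early with a clear message if the packages are missing.
  stopifnot(requireNamespace("implyr", quietly = TRUE),
            requireNamespace("odbc", quietly = TRUE))
  implyr::src_impala(drv = odbc::odbc(), dsn = dsn)
}

# Usage (needs a reachable Impala instance, so it is left commented out):
# impala <- connect_impala_dplyr()
# my_tbl <- dplyr::tbl(impala, "my_awesome_table")  # placeholder table name
# head(my_tbl)
```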
Answer 2:
I've had success using the ODBC connector and the odbc package in R. This method doesn't appear to have any Java dependencies and is recommended by the author of the implyr package. From my limited experience, this connector does a better job of correctly matching R data types to Impala data types, resulting in smaller object sizes within R.
For Macs, the process goes something like this:
- Install the Cloudera ODBC connector
- Install unixodbc: brew install unixodbc
- Follow the Cloudera ODBC connector installation guide:
  - Add the driver libraries to DYLD_LIBRARY_PATH:
      echo export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/opt/cloudera/impalaodbc/lib/universal >> ~/.bash_profile
  - Create a ~/.odbcinst.ini file with:
      [ODBC Drivers]
      Cloudera ODBC Driver for Impala=Installed

      [Cloudera ODBC Driver for Impala]
      Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib
      Description=Cloudera ODBC Driver for Impala
  - Optionally, create a ~/.odbc.ini file with your connection details. Here, I'm using Kerberos:
      [impala]
      Driver = Cloudera ODBC Driver for Impala
      Database =
      Host =
      Port =
      KrbHostFQDN =
      KrbServiceName =
      KrbRealm =
      AuthMech = 1
  - Run source ~/.bash_profile to ensure that DYLD_LIBRARY_PATH is updated
- In R, ensure you have DBI and odbc installed: install.packages(c("DBI", "odbc"))
Finally, to make a connection in R,
library(DBI)
library(odbc)
conn <- dbConnect(odbc::odbc(),
                  driver = "Cloudera ODBC Driver for Impala",
                  #database = "",
                  host = "",
                  port = ,
                  KrbHostFQDN = "",
                  KrbServiceName = "",
                  KrbRealm = "",
                  AuthMech = 1)
Then, to retrieve something,
dd <- dbGetQuery(conn, "select * from my_awesome_db.my_awesome_table limit 10;")
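If the optional ~/.odbc.ini file was created, the connection details can live there and the R side only needs the data source name. This is a sketch under that assumption: "impala" is the section name from the example file above, and the wrapper functions are hypothetical helpers, not part of DBI or odbc.

```r
# Hypothetical helper: connect through the [impala] entry in ~/.odbc.ini.
connect_impala_dsn <- function(dsn = "impala") {
  stopifnot(requireNamespace("DBI", quietly = TRUE),
            requireNamespace("odbc", quietly = TRUE))
  DBI::dbConnect(odbc::odbc(), dsn = dsn)
}

# Hypothetical helper: run one query and always close the connection afterwards.
query_impala <- function(sql, dsn = "impala") {
  conn <- connect_impala_dsn(dsn)
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbGetQuery(conn, sql)
}

# Usage (needs a configured DSN and a reachable cluster):
# dd <- query_impala("select * from my_awesome_db.my_awesome_table limit 10")
```

Closing the connection with on.exit() means it is released even if the query fails.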
Answer 3:
Instead of using the RImpala package, how about using RJDBC to connect? You can download the latest Impala JDBC driver jar files from the Cloudera website: http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-5.html
Then put the jar files from the download on the Java classpath in R and use them to connect.
install.packages("rJava")
install.packages("DBI")
install.packages("RJDBC")
library(DBI)
library(rJava)
library(RJDBC)
cp <- c(
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/commons-codec-1.3.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/commons-logging-1.1.1.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/hive_metastore.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/hive_service.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/httpclient-4.1.3.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/httpcore-4.1.3.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/libfb303-0.9.0.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/libthrift-0.9.0.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/log4j-1.2.14.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/ql.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/slf4j-api-1.5.11.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/slf4j-log4j12-1.5.11.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/TCLIServiceClient.jar",
"C:/Users/Cloudera_ImpalaJDBC4_2.5.31/zookeeper-3.4.6.jar"
)
.jinit(classpath = cp)  # start the JVM with all driver jars on the classpath
drv <- JDBC("com.cloudera.impala.jdbc4.Driver", "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/ImpalaJDBC4.jar")
con <- dbConnect(drv, "jdbc:impala://your_impala_host_address:21050;AuthMech= your authmech number if applicable", "username", "pwd")
data <- dbGetQuery(con, "SELECT * FROM mydb limit 25")
summary(data)
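The connection string above is a host and port followed by semicolon-separated properties such as AuthMech. As a sketch, the string can be assembled with a small helper; build_impala_url is a hypothetical function, the host name is a placeholder, and the default port 21050 is simply the one used in the example above.

```r
# Hypothetical helper: build "jdbc:impala://host:port;Key=Value;..." from named properties.
build_impala_url <- function(host, port = 21050, ...) {
  props <- list(...)
  url <- paste0("jdbc:impala://", host, ":", port)
  if (length(props) > 0) {
    values <- vapply(props, as.character, character(1))
    pairs <- paste0(names(props), "=", values, collapse = ";")
    url <- paste0(url, ";", pairs)
  }
  url
}

# Example with a placeholder host and a username/password mechanism.
url <- build_impala_url("your_impala_host_address", AuthMech = 3)
print(url)

# Usage with the driver object from above (needs a reachable cluster):
# con <- dbConnect(drv, url, "username", "pwd")
```

Which AuthMech value and which extra properties apply depends on how the cluster is secured, so check the driver's installation guide before relying on any particular combination.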
Answer 4:
Just wanted to add another way to build the classpath, rather than writing out all the jars by hand:
drv <- JDBC(driverClass = "com.cloudera.impala.jdbc3.Driver",
            classPath = list.files("C:/Users/Impala",
                                   pattern = "jar$", full.names = TRUE),
            identifier.quote = "'")
Answer 5:
Use the RODBC package. I use it successfully in production. Here is a short tutorial, taken from a blog post:
- Download ClouderaImpalaODBC32.msi and install it.
- Open it and fill in the required connection fields.
- In R, install and load the RODBC package.
- Then type:
library(RODBC)
impala <- odbcConnect("Impala")
sqlQuery(impala,"select * from xxx")
By the way, if you are on Windows 10, you have to pass your username and password to the odbcConnect function (this is based only on a colleague's report).
I hope you manage to use Impala from R.
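A minimal sketch of passing credentials explicitly, assuming a DSN named "Impala" as in the example above. The wrapper function and the credential values are placeholders; uid and pwd are the argument names that odbcConnect accepts.

```r
# Hypothetical wrapper around RODBC::odbcConnect with explicit credentials.
connect_impala_rodbc <- function(dsn = "Impala", uid = "", pwd = "") {
  stopifnot(requireNamespace("RODBC", quietly = TRUE))
  RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
}

# Usage (needs the DSN configured in the ODBC administrator):
# impala <- connect_impala_rodbc(uid = "your_username", pwd = "your_password")
# res <- RODBC::sqlQuery(impala, "select * from xxx")
# RODBC::odbcClose(impala)
```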
Source: https://stackoverflow.com/questions/33551542/connect-r-and-impala