Connect R and Impala

邮差的信 提交于 2019-12-06 13:51:41

问题


I know of course about reproducible example and piece of code but for this question I have to be (I can't be otherwise) obscure.

I am trying to connect R and Impala. Putting aside the problems ("officially", I cannot install software on this PC... but I have used portable versions of R and RStudio)

I've tried the RImpala package.

rimpala.connect(IP = myip,
              port = the port where Impala sees,
              principal = maybe this is not clear)

I am pretty sure that the causes of my problems is the principal argument, the documentation is not clear to me. Anyway, I've tried several combinations of what the documentations says it should be placed there.

In any case I get the same error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, : java.lang.IllegalArgumentException: Kerberos principal should have 3 parts: 10.60.10.22:8888/impala/@tempuser

I've searched online for this error and it seems to be related to some java things, but I have zero knowledge of that language.

It can be useful to know that I have no access to my PC, say, I cannot install any software or do a thing that only an administrator can do.

I know the question is not well written but as I've said for this time a reproducible example is impossible.

More details

Now that I am thinking about it, I filled the IP argument with the address I saw in the navigation bar of my browser to connect to Hue. I guessed it was the same but maybe I did wrong on this point too anyway as I've said I am pretty sure the error is not due to that.


回答1:


The R package implyr (on CRAN and GitHub) provides a dplyr backend for Impala, using either the ODBC or JDBC driver to connect. See the README for instructions.




回答2:


I've had success using the ODBC connector and the odbc package in R. This method doesn't appear to have any Java dependencies and is recommended by the author of the implyr package. From my limited experience, this connector does a better job of correctly matching R data types to Impala data types, resulting in smaller object sizes within R.

For Macs, the process goes something like,

  • Install the Cloudera ODBC connector
  • Install unixodbc: brew install unixodbc
  • Follow the Cloudera ODBC connector installation guide

    • echo export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/opt/cloudera/impalaodbc/lib/universal >> ~/.bash_profile
    • create a ~/.odbcinst.ini file with

      [ODBC Drivers] 
      Cloudera ODBC Driver for Impala=Installed
      [Cloudera ODBC Driver for Impala]
      Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib
      Description=Cloudera ODBC Driver for Impala
      
    • optionally, create a ~/.odbc.ini file with your connection details. Here, I'm using Kerberos:

      [impala]
      Driver = Cloudera ODBC Driver for Impala
      Database = 
      Host =
      Port =
      KrbHostFQDN =
      KrbServiceName =
      KrbRealm =
      AuthMech = 1
      
    • source ~/.bash_profile to ensure that DYLD_LIBRARY_PATH is updated
  • in R, ensure you have DBI and odbc installed: install.packages(c("DBI", "odbc"))
  • Finally, to make a connection in R,

    library(DBI)
    library(odbc)
    conn <- dbConnect(odbc::odbc(),
                   driver = "Cloudera ODBC Driver for Impala",
                   #database = "",
                   host = "",
                   port = ,
                   KrbHostFQDN = "",
                   KrbServiceName = "",
                   KrbRealm = "",
                   AuthMech=1)
    
  • Then, to retrieve something,

    dd <- dbGetQuery(conn, "select * from my_awesome_db.my_awesome_table limit 10;")
    



回答3:


Instead of using RImpala package, how about using RJDBC to connect. You can download latest impala JDBC driver jar file from cloudera website: http://www.cloudera.com/downloads/connectors/impala/jdbc/2-5-5.html

then import these files in the jar to R and using them to connect.

install.packages("rJava")

install.packages("DBI")

install.packages("RJDBC")

library(DBI)

library(rJava)

library(RJDBC)

cp <- c(        
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/commons-codec-1.3.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/commons-logging-1.1.1.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/hive_metastore.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/hive_service.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/httpclient-4.1.3.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/httpcore-4.1.3.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/libfb303-0.9.0.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/libthrift-0.9.0.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/log4j-1.2.14.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/ql.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/slf4j-api-1.5.11.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/slf4j-log4j12-1.5.11.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/TCLIServiceClient.jar",
           "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/zookeeper-3.4.6.jar"
)

.jinit(classpath<-cp)

drv <- JDBC("com.cloudera.impala.jdbc4.Driver", "C:/Users/Cloudera_ImpalaJDBC4_2.5.31/ImpalaJDBC4.jar")

con <- dbConnect(drv, "jdbc:impala://your_impala_host_address:21050;AuthMech= your authmech number if applicable", "username", "pwd")

data <- dbGetQuery(con, "SELECT * FROM mydb limit 25")

summary(data)



回答4:


Just wanted to put in another way to access classpaths rather than writing all the jars:

drv <- JDBC(driverClass = "com.cloudera.impala.jdbc3.Driver", 
        classPath = list.files("C:/Users/Impala",
                               pattern="jar$",full.names=T),
        identifier.quote="'")



回答5:


Use the RODBC package. I successfully use it in production. Here, I write down a tutorial for it. From this Blog

  1. Download ClouderaImpalaODBC32.msi and install it.
  2. Open it and type the required information you need in the argument fields, here are some screenshots to helps you do it more quickly.
  3. In the R environment, install and library RODBC package.
  4. type,
library(RODBC)
impala <- odbcConnect("Impala")
sqlQuery(impala,"select * from xxx")

By the way, if your environment is under Win 10, in the function odbcConnect, you have to give your username and password (Just from my colleague report).

I hope you successfully use impala by R.



来源:https://stackoverflow.com/questions/33551542/connect-r-and-impala

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!