RImpala: Query Fails with Larger Data

感情迁移 submitted on 2019-12-25 01:55:52

Question


check1<-rimpala.query("select * from sum2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.sql.SQLException: Method not supported

dim(sum2) is 49501 rows and 18 columns.

check1<-rimpala.query("select * from sum3")

dim(sum3) is 102 rows and 6 columns.

It works with the smaller table.

Sorry that I can't provide a reproducible example for this. Has anyone encountered the same problem with larger data? Any ideas on how to solve it? Thanks.


Answer 1:


As noted elsewhere on StackOverflow, RImpala does not implement executeUpdate and so cannot run any query that modifies state. I suspect you hit your error not by running a larger SELECT query but rather because you tried to insert, update, or delete some data.

If you'd like to use Impala from R, I'd recommend using dplyrimpaladb.




Answer 2:


The RImpala build (v0.1.6) has been updated to support executing DDL queries through executeUpdate.

The latest build contains the following fixes / additions:

  1. Support for DDL query execution.
  2. A fetchSize parameter in the query function that sets the number of records retrieved from Impala in one round trip.
  3. A fix for queries failing when NULL values are returned.
  4. Compatibility with CDH 5.x.x.

You can run DDL queries using the query function as illustrated below:

rimpala.query(Q="drop table sample_table",isDDL="true")

You can also specify the fetchSize in the query function to aid reading large data efficiently.

rimpala.query(Q="select * from sample_table",fetchSize="10000")
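Because the right call depends on whether a statement returns rows, a small dispatcher can choose between isDDL and fetchSize. This is a minimal sketch under stated assumptions: `first_keyword`, `is_state_changing` and `run_impala` are hypothetical names, the keyword list is an assumption, it assumes RImpala v0.1.6 or later with a connection already opened through the package, and it assumes isDDL = "true" also covers DML such as INSERT (the answer above only mentions DDL).

```r
# Hypothetical helpers, not part of RImpala. Assumes RImpala >= 0.1.6 is
# installed and a connection has already been opened through the package.

# Return the first SQL keyword of a statement in lower case ("" if none).
first_keyword <- function(sql) {
  m <- regmatches(sql, regexpr("[A-Za-z]+", sql))
  if (length(m) == 0) "" else tolower(m)
}

# Assumed list of statements that change state and return no result set.
state_changing <- c("insert", "update", "delete", "create", "drop",
                    "alter", "truncate", "load", "refresh",
                    "invalidate", "compute")

is_state_changing <- function(sql) {
  first_keyword(sql) %in% state_changing
}

# State-changing statements go through isDDL = "true" (executeUpdate);
# read queries use fetchSize to control rows fetched per round trip.
run_impala <- function(sql, fetch_size = "10000") {
  if (is_state_changing(sql)) {
    RImpala::rimpala.query(Q = sql, isDDL = "true")
  } else {
    RImpala::rimpala.query(Q = sql, fetchSize = fetch_size)
  }
}
```

For example, run_impala("select * from sum2") reads rows in batches of 10000, while run_impala("drop table sample_table") is sent as an update.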

The latest build is on CRAN: http://cran.r-project.org/web/packages/RImpala/index.html

Source code: https://github.com/Mu-Sigma/RImpala




Answer 3:


I had the same problem with the RImpala package and recommend using the RJDBC package instead:

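library(RJDBC)
# "path_to_jars" is a placeholder for the folder holding the Impala JDBC jars
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
            classPath = list.files("path_to_jars", pattern = "jar$", full.names = TRUE),
            identifier.quote = "`")
# 21050 is Impala's HiveServer2-compatible port; auth=noSasl is for an unsecured cluster
conn <- dbConnect(drv, "jdbc:hive2://localhost:21050/;auth=noSasl")
check1 <- dbGetQuery(conn, "select * from sum3")
dbDisconnect(conn)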

I used these jar files and everything works as expected: https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip

For more information and a speed comparison, see this blog post: http://datascience.la/r-and-impala-its-better-to-kiss-than-using-java/
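RJDBC keeps the two JDBC paths apart: DBI's dbGetQuery for statements that return rows, and RJDBC's dbSendUpdate for statements that return no result set. That second path is what Answer 1 says RImpala was missing. A minimal sketch, assuming `conn` was opened with dbConnect() as above; `is_read_query` and `impala_run` are hypothetical names and the keyword list is an assumption.

```r
# Sketch, assuming RJDBC and DBI are installed and `conn` came from
# dbConnect() as in the answer above. Helper names are hypothetical.

# TRUE if the statement starts with a keyword that returns a result set
# (assumed list; adjust for your workload).
is_read_query <- function(sql) {
  grepl("^\\s*(select|with|show|describe|explain|values)\\b", sql,
        ignore.case = TRUE, perl = TRUE)
}

# Read queries go through dbGetQuery; DDL/DML goes through dbSendUpdate.
impala_run <- function(conn, sql) {
  if (is_read_query(sql)) {
    DBI::dbGetQuery(conn, sql)
  } else {
    RJDBC::dbSendUpdate(conn, sql)
    invisible(NULL)
  }
}
```

Call dbDisconnect(conn) once you are done so the Impala session is released.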



Source: https://stackoverflow.com/questions/28213022/rimpala-query-failed-when-larger-data
