问题
check1<-rimpala.query("select * from sum2")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
java.sql.SQLException: Method not supported
dim(sum2) is 49501 rows and 18 columns.
check1<-rimpala.query("select *from sum3")
dim(sum3) is 102 rows and 6 columns.
It worked with smaller sample size.
sorry that I cant reproduce example to this. Is anyone encounter the same problem with larger data size? Any idea to solve this? Thanks.
回答1:
As noted elsewhere on StackOverflow, RImpala does not implement executeUpdate
and so cannot run any query that modifies state. I suspect you hit your error not by running a larger SELECT query but rather because you tried to insert, update, or delete some data.
If you'd like to use Impala from R, I'd recommend using dplyrimpaladb.
回答2:
RImpala (v0.1.6) build is updated with the support to execute DDL queries using executeUpdate.
The latest build contains the following fixes / additions:
- Support for DDL query execution.
- fetchSize parameter in query function to state the number of records that can be retrieved in one round trip read from Impala.
- Fix for query failing when NULL values are being returned.
- Compatiblity with CDH 5.x.x
You can run DDL queries using the query function as illustrated below:
rimpala.query(Q="drop table sample_table",isDDL="true")
You can also specify the fetchSize in the query function to aid reading large data efficiently.
rimpala.query(Q="select * from sample_table",fetchSize="10000")
Please find the latest build in Cran : http://cran.r-project.org/web/packages/RImpala/index.html
Source Code : https://github.com/Mu-Sigma/RImpala
回答3:
I have the same problem with the RImpala package and recommend to use the RJDBC package:
library(RJDBC)
drv <- JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
classPath = list.files("path_to_jars",pattern="jar$",full.names=T),
identifier.quote="`")
conn <- dbConnect(drv, "jdbc:hive2://localhost:21050/;auth=noSasl")
check1 <- dbGetQuery(conn, "select *from sum3")
I used these jar files an evenything works as expected: https://downloads.cloudera.com/impala-jdbc/impala-jdbc-0.5-2.zip
For more information and a speed comparison look at this blog post: http://datascience.la/r-and-impala-its-better-to-kiss-than-using-java/
来源:https://stackoverflow.com/questions/28213022/rimpala-query-failed-when-larger-data