I am using RPostgreSQL to connect to a local database. The setup works just fine on my Linux machine. R 2.11.1, Postgres 8.4.
I was playing with the \'foreach\' with
It's more efficient to create the database connection once per worker, rather than once per task. Unfortunately, mclapply doesn't provide a mechanism for initializing the workers before executing tasks, so it's not easy to do this using the doMC backend, but if you use the doParallel backend, you can initialize the workers using clusterEvalQ. Here's an example of how to restructure the code:
library(doParallel)
cl <- makePSOCKcluster(detectCores())
registerDoParallel(cl)
clusterEvalQ(cl, {
library(DBI)
library(RPostgreSQL)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname="nsdq")
NULL
})
id.qed.foreach <- foreach(i=1588:3638, .inorder=FALSE,
.noexport="con",
.packages=c("DBI", "RPostgreSQL")) %dopar% {
lst <- eval(expr.01) #contains the SQL query which depends on 'i'
qry <- dbSendQuery(con, lst)
tmp <- fetch(qry, n=-1)
dt <- dates.qed2[i]
list(date=dt, idreuters=tmp$idreuters)
}
clusterEvalQ(cl, {
dbDisconnect(con)
})
Since doParallel and clusterEvalQ are using the same cluster object cl
, the foreach loop will have access to the database connection object con
when executing the tasks.
The following works and speeds up by ~ 1.5x over a sequential form. As a next step, I am wondering whether it is possible to attach a connection object to each of the workers spawned by registerDoMC. If so, then there would be no need to create/destroy the connection objects, which prevents from overwhelming the PostgreSQL server with connections.
pgparquery <- function(i) {
drv <- dbDriver("PostgreSQL");
con <- dbConnect(drv, dbname='nsdq');
lst <- eval(expr.01); #contains the SQL query which depends on 'i'
qry <- dbSendQuery(con,lst);
tmp <- fetch(qry,n=-1);
dt <- dates.qed2[i]
dbDisconnect(con);
result <- list(date=dt, idreuters=tmp$idreuters)
return(result)}
id.qed.foreach <- foreach(i = 1588:3638, .inorder=FALSE, .packages=c("DBI", "RPostgreSQL")) %dopar% {pgparquery(i)}
--
Vishal Belsare