Question
Is there a way to have dplyr, when it is connected to a database, pipe data into a new table within that database without ever downloading the data locally?
I'd like to do something along the lines of:
tbl(con, "mytable") %>%
  group_by(dt) %>%
  tally() %>%
  write_to(name = "mytable_2", schema = "transformed")
Answer 1:
While I wholeheartedly agree with the suggestion to learn SQL, you can take advantage of the fact that dplyr doesn't pull data until it absolutely has to: build the query using dplyr, turn the generated SQL into a statement that writes its result to a table, and then execute that statement in the database, as in:
# CREATE A DATABASE WITH A 'FLIGHTS' TABLE
library(RSQLite)
library(dplyr)
library(nycflights13)
# Open the SQLite file (it is created if it does not exist yet)
my_db <- DBI::dbConnect(RSQLite::SQLite(), "~/my_db.sqlite3")
flights_sqlite <- copy_to(my_db, flights, temporary = FALSE, indexes = list(
  c("year", "month", "day"), "carrier", "tailnum"))
# BUILD A QUERY
QUERY <- filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
  select(year, month, day, carrier, dep_delay, air_time, distance) %>%
  mutate(speed = distance / air_time * 60) %>%
  arrange(year, month, day, carrier)
# WRAP THE GENERATED SQL IN "CREATE TABLE ... AS" AND EXECUTE IT IN THE DATABASE
DBI::dbExecute(dbplyr::remote_con(QUERY),
               paste("CREATE TABLE foo AS", dbplyr::sql_render(QUERY)))
You could even write a little function that does this:
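to_table <- function(qry, tbl) {
  # Create table `tbl` in the query's own database from the SQL that dplyr generated
  DBI::dbExecute(dbplyr::remote_con(qry), paste("CREATE TABLE", tbl, "AS", dbplyr::sql_render(qry)))
}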
and pipe the query into that function like so:
filter(flights_sqlite, year == 2013, month == 1, day == 1) %>%
  select(year, month, day, carrier, dep_delay, air_time, distance) %>%
  mutate(speed = distance / air_time * 60) %>%
  arrange(year, month, day, carrier) %>%
  to_table('foo')
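As a possible alternative to assembling the SQL by hand, dplyr's compute() verb can materialise a lazy query as a table inside the database. The sketch below is not part of the original answer; it assumes a throwaway in-memory SQLite database, the built-in mtcars data, and a hypothetical table name "cyl_counts", and it mirrors the group_by/tally pipeline from the question.

```r
library(dplyr)

# In-memory SQLite database, used only for this illustration
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

# Build the query lazily, then store its result as a permanent table
# named "cyl_counts" (hypothetical name) without collecting rows into R
counts <- tbl(con, "mtcars") %>%
  group_by(cyl) %>%
  tally() %>%
  compute(name = "cyl_counts", temporary = FALSE)

# Check on the database side that the table was created and filled
table_exists <- DBI::dbExistsTable(con, "cyl_counts")
n_rows <- DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM cyl_counts")$n

DBI::dbDisconnect(con)
```

With temporary = FALSE the table persists beyond the current connection on a file-backed database; the default, temporary = TRUE, creates a temporary table instead.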
Source: https://stackoverflow.com/questions/29878227/write-table-in-database-with-dplyr