How to give dplyr a SQL query and have it return a remote tbl object?

爷,独闯天下 提交于 2020-01-15 06:50:09

问题


Say I have a remote tbl open using dbplyr, and I want to use a SQL query on it (maybe because there's not dbplyr translation for what I want to do), how do I give it such that it returns a remote tbl object?

The DBI::dbGetQuery() function allows you to give a query to db, but it returns a data frame on memory, and not an remote tbl object.

For example, say you already have a connection con open to a db, you can create a table like this:

library(tidyverse)

x_df <- expand.grid(A = c('a','b','c'), B = c('d','e','f', 'g','h')) %>% 
  mutate(C = round(rnorm(15), 2))

DBI::dbWriteTable(conn = con,
                  name = "x_tbl",
                  value = x_df,
                  overwrite = TRUE)

x_tbl = tbl(con, 'x_tbl')

sql_query <- build_sql('SELECT a, b, c, avg(c) OVER (PARTITION BY a) AS mean_c FROM x_tbl')

y_df <- DBI::dbGetQuery(con, sql_query) # This returns a data frame on memory

y_tbl <- x_tbl %>% 
  group_by(a) %>% 
  mutate(mean_c = mean(c))

show_query(y_tbl) # This is a remote tbl object

In this case, I could just use y_tbl. But there are cases in which the function has not been translated on dbplyr (for example, quantile doesn't work), and I need to use SQL code. But I don't want to collect the result, I want it to create a remote tbl object. Is there a way I can give a SQL query (like with dbGetQuery()) but have it return a remote tbl?

Thank you


回答1:


Well, playing with how it works, I think I found a way. You can give a sql query inside the mutate function:

y_tbl <- x_tbl %>% 
  group_by(a) %>% 
  mutate(mean_c = sql("avg(c) OVER (PARTITION BY a)"))

show_query(y_tbl) # This is a remote tbl object

This will let you give a SQL definition of a variable without having to compute the table too.




回答2:


As I understand it, there are a collection of standard translations that dbplyr makes from dplyr to SQL. Anything that falls outside this translation is left as is.

For example, DATEFROMPARTS is an SQL function but not an R function. I commonly use the following mutate:

y_tbl <- x_tbl %>% 
    mutate(new_date = DATEFROMPARTS(year_col, month_col, day_col)

And because there is no defined translation from an R function DATEFROMPARTS to an SQL function (because the R function does not exist in dplyr) it is left as is.



来源:https://stackoverflow.com/questions/55348039/how-to-give-dplyr-a-sql-query-and-have-it-return-a-remote-tbl-object

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!