Function that composes functions with existing sql translations in dbplyr

Posted by 纵然是瞬间 on 2020-05-23 06:30:07

Question


This question arises because I wish to make a function for my convenience:

as.numeric_psql <- function(x) {

   return(as.numeric(as.integer(x)))
}

to convert boolean values in a remote postgres table into numeric. The step to convert to integer is needed as:

There is no direct cast defined between numeric and boolean. You can use integer as middle-ground. (https://stackoverflow.com/a/19290671/2109289)

Of course, the inline composition works on the remote table, and the function itself works as expected on a local data frame:

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <-
    tbl(con_psql, 'tmp_cars')


tmp_cars_sdf %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric(as.integer(low_dist)))

# # Source:   lazy query [?? x 3]
# # Database: postgres 9.5.3
#     speed  dist low_dist
#     <dbl> <dbl>    <dbl>
# 1     4     2        1
# 2     4    10        0
# 3     7     4        1
# 4     7    22        0
# 5     8    16        0

cars %>%
    mutate(low_dist = dist < 5) %>%
    mutate(low_dist = as.numeric_psql(low_dist)) %>%
    head(5)

#   speed dist low_dist
# 1     4    2        1
# 2     4   10        0
# 3     7    4        1
# 4     7   22        0
# 5     8   16        0

However, it doesn't work when used on the remote data frame: as.numeric_psql is not in the list of SQL translations, so it is passed to the query verbatim:

> tmp_cars_sdf %>%
+     mutate(low_dist = dist < 5) %>%
+     mutate(low_dist = as.numeric_psql(low_dist))
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  syntax error at or near "as"
LINE 1: SELECT "speed", "dist", as.numeric_psql("low_dist") AS "low_...
                                ^
)
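
The pass-through behaviour can be reproduced without a live database. The following is a minimal sketch, assuming dbplyr is installed and using its simulated Postgres connection (`simulate_postgres()`) in place of `con_psql`; the variable names `known_sql` and `unknown_sql` are only illustrative.

```r
library(dbplyr)

# A simulated connection: no real Postgres server is needed to see the SQL.
con_sim <- simulate_postgres()

# Both functions have SQL translations, so the nested call becomes nested casts.
known_sql <- translate_sql(as.numeric(as.integer(x)), con = con_sim)
print(known_sql)

# as.numeric_psql has no translation, so its name is emitted as-is,
# which is what produces the syntax error on the server.
unknown_sql <- translate_sql(as.numeric_psql(x), con = con_sim)
print(unknown_sql)
```

dbplyr does not look inside the body of a local R function when translating, which is why the composition has to be made visible to it in some other way.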

My question is whether there exists an easy way (i.e. not defining a custom SQL translation) of getting dplyr to understand that the function as.numeric_psql is a composition of functions that have existing SQL translations, and to use those translations instead.


Answer 1:


One way to avoid the error is to set up the function to operate on a data frame, rather than inside mutate. For example:

copy_to(con_psql, cars, 'tmp_cars')

tmp_cars_sdf <- tbl(con_psql, 'tmp_cars')

as.numeric_psql <- function(data, x) {
  return(data %>% mutate({{x}} := as.numeric(as.integer({{x}}))))
}

tmp_cars_sdf %>%
  mutate(low_dist = dist < 5) %>%
  as.numeric_psql(low_dist)

#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#>    speed  dist low_dist
#>    <dbl> <dbl>    <dbl>
#>  1     4     2        1
#>  2     4    10        0
#>  3     7     4        1
#>  4     7    22        0
#>  5     8    16        0
#>  6     9    10        0
#>  7    10    18        0
#>  8    10    26        0
#>  9    10    34        0
#> 10    11    17        0
#> # … with more rows
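
To check which SQL such a data-frame-level helper generates, the query can be rendered instead of executed. This is a sketch under assumptions: it uses `lazy_frame()` with `simulate_postgres()` rather than a real connection, and the helper is renamed `cast_lgl_to_numeric` here purely for illustration.

```r
library(dplyr)
library(dbplyr)

# Same idea as the helper above: the casts are written inside mutate(),
# so dbplyr sees as.numeric() and as.integer() directly and can translate them.
cast_lgl_to_numeric <- function(data, x) {
  data %>% mutate({{ x }} := as.numeric(as.integer({{ x }})))
}

# A lazy table backed by a simulated Postgres connection (no server needed).
remote_cars <- lazy_frame(speed = 4, dist = 2, con = simulate_postgres())

query <- remote_cars %>%
  mutate(low_dist = dist < 5) %>%
  cast_lgl_to_numeric(low_dist) %>%
  sql_render()

# The helper's own name should not appear in the SQL, only the casts.
print(query)
```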

Note that in this example (run against SQLite), low_dist in the database version is already coded as integer when it's created, rather than as logical as it would be in a standard R data frame:

tmp_cars_sdf %>%
  mutate(low_dist = dist < 5) 
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.30.1 [:memory:]
#>    speed  dist low_dist
#>    <dbl> <dbl>    <int>
#>  1     4     2        1
#>  2     4    10        0
#>  3     7     4        1
#>  4     7    22        0
#>  5     8    16        0
#>  6     9    10        0
#>  7    10    18        0
#>  8    10    26        0
#>  9    10    34        0
#> 10    11    17        0
#> # … with more rows

cars %>%
  mutate(low_dist = dist < 5) %>% head
#>   speed dist low_dist
#> 1     4    2     TRUE
#> 2     4   10    FALSE
#> 3     7    4     TRUE
#> 4     7   22    FALSE
#> 5     8   16    FALSE
#> 6     9   10    FALSE
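
If the helper should still be called inside mutate, one possible alternative is to have it return an unevaluated expression and splice that in with `!!`. This is only a sketch, not something from the question or the answer above: the name `as_numeric_psql_expr` is hypothetical, and it assumes rlang is available and that a simulated Postgres connection is an acceptable stand-in for `con_psql`.

```r
library(dplyr)
library(dbplyr)

# Hypothetical helper: instead of computing a value, it builds the call
# as.numeric(as.integer(<column>)) from the column name it is given.
as_numeric_psql_expr <- function(x) {
  x <- rlang::enexpr(x)
  rlang::expr(as.numeric(as.integer(!!x)))
}

# A lazy table backed by a simulated Postgres connection (no server needed).
remote_cars <- lazy_frame(speed = 4, dist = 2, con = simulate_postgres())

# `!!` inlines the built expression before dbplyr translates the mutate() call,
# so dbplyr only ever sees functions it already knows how to translate.
remote_cars %>%
  mutate(low_dist = dist < 5) %>%
  mutate(low_dist = !!as_numeric_psql_expr(low_dist)) %>%
  show_query()
```

The trade-off is that callers have to remember the `!!`; without it, the helper's name would again be sent to the database verbatim.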


Source: https://stackoverflow.com/questions/58211123/function-that-composes-functions-with-existing-sql-translations-in-dbplyr
