dbplyr | 易学教程

How to use EXTRACT through dbplyr when connecting to an Oracle DB

Question: Take this query:

SELECT EXTRACT(month FROM order_date) "Month" FROM orders

(a simplified example from the official Oracle documentation). How would you go about integrating an EXTRACT operation like this into a dbplyr chain? I'm open to any other workaround (even an ugly or costly one) that extracts the month on the server side.

Answer 1: More elegant:

tbl(con, "orders") %>% mutate(Month = extract(NULL %month from% order_date))

This results in the following SQL (ANSI SQL):

EXTRACT( MONTH FROM "order_date")

This trick works because the …
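A different workaround, which avoids the infix-operator trick, is to pass the expression through verbatim with dbplyr's sql(). The sketch below assumes dbplyr 2.x and uses lazy_frame() with simulate_oracle(), so the generated SQL can be inspected without a live Oracle connection; the lazy table's default name ("df") stands in for orders, and exact whitespace and quoting in the rendered query may vary between dbplyr versions.

```r
library(dplyr)
library(dbplyr)

# A lazy table that only renders SQL; no database connection is opened.
# Its default table name "df" stands in for the real "orders" table.
orders <- lazy_frame(order_date = Sys.Date(), con = simulate_oracle())

# sql() is inserted into the query unchanged, so EXTRACT runs on the server.
with_month <- orders %>%
  mutate(Month = sql("EXTRACT(MONTH FROM order_date)"))

# Print the SELECT statement dbplyr would send to Oracle.
show_query(with_month)
```

Against a real connection the same verb reads tbl(con, "orders") %>% mutate(Month = sql("EXTRACT(MONTH FROM order_date)")). Because the column name inside sql() is not quoted by dbplyr, it must be written the way the database expects it.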

How to pass data.frame into SQL “IN” condition using R?

Question: I am reading a list of values from a CSV file in R and trying to pass them into the IN condition of a SQL query run with dbGetQuery(). Can someone help me with this?

library(rJava)
library(RJDBC)
library(dbplyr)
library(tibble)
library(DBI)
library(RODBC)
library(data.table)
jdbcDriver <- JDBC("oracle.jdbc.OracleDriver", classPath = "C://Users/********/Oracle_JDBC/ojdbc6.jar")
jdbcConnection <- dbConnect(jdbcDriver, "jdbc:oracle:thin:Rahul@//Host/DB", "User_name", "Password")
## Setting working directory for …
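A minimal sketch of two common ways to do this. It uses an in-memory RSQLite database in place of the Oracle JDBC connection so that it runs anywhere; the table orders, the column id and the values in ids are hypothetical stand-ins for the CSV data. With dbplyr, %in% on a local vector becomes an IN (...) list; with plain DBI, each value is quoted with dbQuoteLiteral() before being pasted into the query string, rather than pasted in raw.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "orders", data.frame(id = 1:5, amount = c(10, 20, 30, 40, 50)))

# Stand-in for the data frame read from the CSV file (hypothetical values)
ids <- data.frame(id = c(2L, 4L))

# Option 1: dbplyr builds the IN (...) clause; !! inlines the local vector
via_dbplyr <- tbl(con, "orders") %>%
  filter(id %in% !!ids$id) %>%
  collect()

# Option 2: plain DBI; quote each value, then join them into one list
in_list <- paste(dbQuoteLiteral(con, ids$id), collapse = ", ")
via_dbi <- dbGetQuery(con, paste0("SELECT * FROM orders WHERE id IN (", in_list, ")"))

dbDisconnect(con)
```

Option 1 keeps the query in the dbplyr pipeline; option 2 fits an existing dbGetQuery() workflow such as the one in the question.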

Apply a ranking window function in dbplyr backend

Question: I want to seamlessly identify new orders (acquisitions) and returns in my transactional database table. This sounds like the perfect job for a window function, and I would like to perform this operation in dbplyr. My current process is to:
1. Create a query object that I then pass to dbGetQuery(); this query contains a standard rank() window function, as usually seen in PostgreSQL.
2. Ingest the query result into my R environment.
3. Then, using an ifelse() function inside the mutate() verb, identify the first orders …
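The ranking can stay on the server if it is written with dplyr verbs: dbplyr translates min_rank() inside a grouped mutate() to RANK() OVER (PARTITION BY ... ORDER BY ...), and ifelse() to a CASE WHEN expression. This sketch assumes an in-memory SQLite database (window functions need SQLite 3.25 or later) and hypothetical columns customer_id and order_date; the labels "acquisition" and "repeat" are illustrative too.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical transactions, one row per order; ISO date strings sort correctly as text
tx <- data.frame(
  customer_id = c(1L, 1L, 2L, 2L, 2L),
  order_date  = c("2020-01-05", "2020-02-10", "2020-01-07", "2020-03-01", "2020-03-15")
)
tx_db <- copy_to(con, tx, "transactions")

labelled <- tx_db %>%
  group_by(customer_id) %>%
  # Translated to RANK() OVER (PARTITION BY customer_id ORDER BY order_date)
  mutate(order_rank = min_rank(order_date)) %>%
  # Translated to CASE WHEN; each customer's first-ranked order is the acquisition
  mutate(order_type = ifelse(order_rank == 1, "acquisition", "repeat")) %>%
  ungroup()

show_query(labelled)          # inspect the generated SQL
result <- collect(labelled)   # rows are pulled into R only here
dbDisconnect(con)
```

This replaces the dbGetQuery()/ifelse() round trip with a single lazy query, so only the labelled rows come back to R.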

Issue with dbplyr::spread() on tbl_sql

Question: This is a specific issue with the following development version of dbplyr:

devtools::install_github("tidyverse/dbplyr", ref = devtools::github_pull(72))

developed by @edgararuiz. It seems to me that the spread function doesn't work properly:

df_sample <- tribble(~group1, ~group2, ~group3, ~identifier, ~value,
                     8, 24, 6, 'mt_0', 12,
                     18, 24, 6, 'mt_1', 4)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
df_db <- copy_to(con, df_sample, 'df_sample')

I obtained an incorrect result with the following …
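One workaround that does not depend on the development branch is to write the pivot by hand as conditional aggregation, which dbplyr renders as MAX(CASE WHEN ...) with GROUP BY. This sketch assumes the identifier values (mt_0, mt_1) are known in advance, and it builds df_sample with data.frame() instead of tribble() so tibble is not needed.

```r
library(DBI)
library(dplyr)
library(dbplyr)

df_sample <- data.frame(
  group1 = c(8, 18), group2 = c(24, 24), group3 = c(6, 6),
  identifier = c("mt_0", "mt_1"), value = c(12, 4)
)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
df_db <- copy_to(con, df_sample, "df_sample")

# One output column per known identifier; rows with any other identifier
# yield NULL inside the CASE expression, and MAX() ignores NULLs.
wide <- df_db %>%
  group_by(group1, group2, group3) %>%
  summarise(
    mt_0 = max(ifelse(identifier == "mt_0", value, NA), na.rm = TRUE),
    mt_1 = max(ifelse(identifier == "mt_1", value, NA), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  collect()

dbDisconnect(con)
```

The cost of this approach is that every output column is spelled out; in exchange, the SQL is explicit and easy to check with show_query().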

Joining across databases with dbplyr

Question: I am working with database tables through dbplyr. I have a local table and want to join it with a large (150m-row) table in the database. The database PRODUCTION is read-only.

# Set up the connection and point to the table
library(odbc); library(dbplyr)
my_conn_string <- paste("Driver={Teradata};DBCName=teradata2690;DATABASE=PRODUCTION;UID=", t2690_username, ";PWD=", t2690_password, sep = "")
t2690 <- dbConnect(odbc::odbc(), .connection_string = my_conn_string)
order_line <- tbl(t2690, "order_line")
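Because PRODUCTION is read-only, one approach that needs no write access is to push the local keys to the server as an IN (...) filter, collect only the matching rows, and finish the join in R. This sketch uses an in-memory SQLite table as a stand-in for order_line, and the columns order_id, qty and channel are hypothetical. It suits a small local table; a very long key list produces a very long query. If the connection may create temporary tables, inner_join(order_line, local_df, copy = TRUE) keeps the whole join on the server instead.

```r
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Stand-in for the large, read-only remote table (hypothetical columns)
dbWriteTable(con, "order_line", data.frame(order_id = 1:6, qty = c(3, 1, 4, 1, 5, 9)))
order_line <- tbl(con, "order_line")

# Small local table to join against
local_orders <- data.frame(order_id = c(2L, 5L), channel = c("web", "store"))

# Filter on the server with the local keys, download the matches, join locally
joined <- order_line %>%
  filter(order_id %in% !!local_orders$order_id) %>%
  collect() %>%
  inner_join(local_orders, by = "order_id")

dbDisconnect(con)
```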

How to use a window function to determine when to perform different tasks?

Question: Note: I have asked a similar question for SQL: How to use a window function to determine when to perform different tasks in Hive or Postgres?

Data: I have some data showing the start day and end day for different pre-prioritised tasks per person:

input_df <- data.frame(person = c(rep("Kate", 2), rep("Adam", 2), rep("Eve", 2), rep("Jason", 5)),
                       task_key = c(c("A","B"), c("A","B"), c("A","B"), c("A","B","C","D","E")),
                       start_day = c(c(1L,1L), c(1L,2L), c(2L,1L), c(1L,4L,3L,5L,4L)),
                       end_day = 5L)

How to spread tbl_dbi and tbl_sql data without downloading to local memory

Question: I am working with large datasets, and tidyr's spread() usually gives me error messages suggesting that it failed to obtain enough memory to perform the operation. Therefore, I have been exploring dbplyr. However, as it says here, and as also shown below, dbplyr::spread() does not work. My question is whether there is another way to accomplish what tidyr::spread() does while working with tbl_dbi and tbl_sql data, without downloading the data to local memory. Using sample data from here, below I present what I get and …
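With current packages (dbplyr 2.0.0 or later is assumed here), tidyr::pivot_wider(), the successor to spread(), has a method for lazy tables: it runs one small query to find the distinct names_from values and then builds the wide result in SQL, so the full data are not downloaded until collect(). The sketch reuses the df_sample values from the spread() question above, built with data.frame() and held in an in-memory SQLite database.

```r
library(DBI)
library(dplyr)
library(dbplyr)
library(tidyr)

df_sample <- data.frame(
  group1 = c(8, 18), group2 = c(24, 24), group3 = c(6, 6),
  identifier = c("mt_0", "mt_1"), value = c(12, 4)
)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
df_db <- copy_to(con, df_sample, "df_sample")

# Equivalent of spread(identifier, value), evaluated inside the database
wide_db <- df_db %>%
  pivot_wider(names_from = identifier, values_from = value)

show_query(wide_db)        # inspect the generated SQL
wide <- collect(wide_db)   # only the wide result is downloaded
dbDisconnect(con)
```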

Adding column to sqlite database

Question: I am trying to add a vector that I generated in R to a SQLite table as a new column. For this I wanted to use dplyr (I installed the most recent dev version along with the dbplyr package, following this post here). What I tried:

library(dplyr)
library(DBI)
# creating initial database and table
dbcon <- dbConnect(RSQLite::SQLite(), "cars.db")
dbWriteTable(dbcon, name = "cars", value = cars)
cars_tbl <- dplyr::tbl(dbcon, "cars")
# new values which I want to add as a new column
new_values <- …
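mutate() on a remote tbl only builds a SELECT query and does not change the stored table, so one way to persist a new column is to go through DBI: add the column with ALTER TABLE, then fill it with a parameterised UPDATE. This sketch assumes the entries of new_values are in the same order as the table's rows and matches them on SQLite's rowid, which holds for a freshly written table with no deleted rows. It uses an in-memory database instead of cars.db, and the contents of new_values are hypothetical because the question's definition is cut off.

```r
library(DBI)

dbcon <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(dbcon, name = "cars", value = cars)

# Hypothetical values, one per row of cars, in table order
new_values <- seq_len(nrow(cars)) / 10

# 1. Add an empty column to the stored table
dbExecute(dbcon, "ALTER TABLE cars ADD COLUMN new_col REAL")

# 2. Fill it; the statement is executed once per element of the params vectors
dbExecute(dbcon,
          "UPDATE cars SET new_col = ? WHERE rowid = ?",
          params = list(new_values, seq_along(new_values)))

# The new column is now also visible through dplyr::tbl(dbcon, "cars")
check <- dbGetQuery(dbcon, "SELECT new_col FROM cars ORDER BY rowid")
dbDisconnect(dbcon)
```

For a small table, reading it with dbReadTable(), adding the column in R and writing it back with dbWriteTable(..., overwrite = TRUE) is a simpler alternative that avoids relying on rowid.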

How to use a window function to determine when to perform different tasks?

Note: I have asked a similar question for SQL: How to use a window function to determine when to perform different tasks in Hive or Postgres?

Data: I have some data showing the start day and end day for different pre-prioritised tasks per person:

input_df <- data.frame(person = c(rep("Kate", 2), rep("Adam", 2), rep("Eve", 2), rep("Jason", 5)),
                       task_key = c(c("A","B"), c("A","B"), c("A","B"), c("A","B","C","D","E")),
                       start_day = c(c(1L,1L), c(1L,2L), c(2L,1L), c(1L,4L,3L,5L,4L)),
                       end_day = 5L)

  person task_key start_day end_day
1   Kate        A         1       5
2   Kate        B         1       5
3   Adam        A         1       5
4   Adam        B         2       5
5    Eve        A         2       5
6 …

How to spread tbl_dbi and tbl_sql data without downloading to local memory

I am working with large datasets, and tidyr's spread() usually gives me error messages suggesting that it failed to obtain enough memory to perform the operation. Therefore, I have been exploring dbplyr. However, as it says here, and as also shown below, dbplyr::spread() does not work. My question is whether there is another way to accomplish what tidyr::spread() does while working with tbl_dbi and tbl_sql data, without downloading the data to local memory. Using sample data from here, below I present what I get and what I would like to do and get.

# sample tbl_dbi and tbl_sql data
df_sample <- tribble(~group1, …