sqldf

Cumulative sum by group in sqldf?

自闭症网瘾萝莉.ら submitted on 2019-11-29 11:52:46
I have a data frame with 3 variables: place, time, and value (P, T, X). I want to create a fourth variable that holds the cumulative sum of X. Normally I like to do grouped calculations with sqldf, but I can't find an equivalent of cumsum. That is, the following doesn't work:

    sqldf("select P,T,X, cumsum(X) as X_CUM from df group by P,T")

Is this even possible with sqldf? I tried doBy, but that doesn't allow cumsum either.

Set up some test data:

    DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

and now we have three solutions. First we use sqldf, as requested, using the default
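One way a cumulative sum can be written in plain SQL is a self-join: each row is joined to all rows of the same group with an earlier or equal time, and the joined values are summed. The following is a minimal sketch under that assumption, using the test data from the excerpt; the output column name value_cum is made up for illustration, and this is not necessarily the truncated answer's exact query.

```r
library(sqldf)

# Test data from the excerpt: 3 groups (p), 4 time points each (t)
DF <- data.frame(t = 1:4, p = rep(1:3, each = 4), value = 1:12)

# Self-join: for each row a, sum the values of all rows b in the same
# group whose time is not later than a's time
cum <- sqldf("
  select a.p, a.t, a.value, sum(b.value) as value_cum
  from DF a
  join DF b on a.p = b.p and b.t <= a.t
  group by a.p, a.t, a.value
  order by a.p, a.t")

print(cum)
```

The self-join does work quadratic in the group size, so for large groups base R's ave(DF$value, DF$p, FUN = cumsum) is likely the cheaper route.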

Using sqldf and RPostgreSQL together

被刻印的时光 ゝ submitted on 2019-11-29 11:51:06
Question: When using RPostgreSQL I find that I cannot use sqldf in the same way. For example, if I load the library and read data into a data frame using the following code:

    library(RPostgreSQL)
    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, host="localhost", user="postgres", password="xxx", dbname="yyy", port="5436")
    rs <- dbSendQuery(con, "select * from table")
    df <- fetch(rs, n = -1)
    dbClearResult(rs)
    dbDisconnect(con)

I now have the contents of this table in the data frame df. However if I

How can I pass R variable into sqldf?

泪湿孤枕 submitted on 2019-11-29 09:49:15
I have a query like this:

    sqldf("select TenScore from data where State_P = 'AndhraPradesh'")

But I have "AndhraPradesh" in a variable stateValue. How can I use this variable in a select query in R to get the same result as above? Please show me the syntax.

You can use sprintf:

    sqldf(sprintf("select TenScore from data where State_P = '%s'", stateValue))

G. Grothendieck: See Example 5 on the sqldf GitHub page.

Example 5. Insert Variables. Here is an example of inserting evaluated variables into a query using gsubfn quasi-perl-style string interpolation. gsubfn is used by sqldf so it's already
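The two approaches mentioned above, sprintf and gsubfn-style interpolation through the fn$ prefix, can be compared side by side. This is a small sketch with made-up data; the data frame name scores and its contents are assumptions for illustration (the question calls its table data). Both forms paste the value into the SQL text, so neither is safe for untrusted input.

```r
library(sqldf)  # also loads gsubfn, which provides the fn$ prefix

# Hypothetical data standing in for the asker's table
scores <- data.frame(
  State_P  = c("AndhraPradesh", "Kerala", "AndhraPradesh"),
  TenScore = c(70, 80, 90)
)
stateValue <- "AndhraPradesh"

# Option 1: build the SQL string with sprintf
res1 <- sqldf(sprintf(
  "select TenScore from scores where State_P = '%s'", stateValue))

# Option 2: let gsubfn substitute $stateValue inside the string
res2 <- fn$sqldf(
  "select TenScore from scores where State_P = '$stateValue'")

print(res1)
print(res2)
```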

Using read.csv.sql to select multiple values from a single column

巧了我就是萌 submitted on 2019-11-29 08:42:28
I am using read.csv.sql from the sqldf package to read in a subset of rows, where the subset is defined by multiple values stored in another vector. I have hacked my way to a form that works, but I would like to see the correct way to pass the SQL statement. The code below gives a minimal example.

    library(sqldf)
    # some data
    write.csv(mtcars, "mtcars.csv", quote = FALSE, row.names = FALSE)
    # values to select from variable 'carb'
    cc <- c(1, 2)
    # This only selects last value from 'cc' vector
    read.csv.sql("mtcars.csv", sql = paste("select * from file where carb = ", cc ))
    # So
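The paste call in the question produces one SQL string per element of cc, which is why only one value ends up being used. A common alternative, sketched here as one possible approach rather than the excerpt's own answer, is to collapse the vector into a single IN (...) list. The temporary file path is an assumption made to keep the example self-contained.

```r
library(sqldf)

# Write the example data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, quote = FALSE, row.names = FALSE)

# Values to select from column 'carb'
cc <- c(1, 2)

# Collapse the vector into one comma-separated list: "1, 2"
sql <- sprintf("select * from file where carb in (%s)",
               paste(cc, collapse = ", "))

# read.csv.sql refers to the CSV contents as the table 'file'
sub <- read.csv.sql(csv_path, sql = sql)

print(table(sub$carb))
```

For character values each element would also need quoting, for example with sprintf("'%s'", x) before collapsing.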

Regarding sqldf package/regexp function [duplicate]

▼魔方 西西 submitted on 2019-11-28 14:35:37
This question already has an answer here: How do I use regex in a SQLite query? (13 answers)

I am using the sqldf package and SQL to analyze a table generated by a classification model. But when I use the code:

    table<-sqldf("
    SELECT a, b, c, d, e, f,
    CASE WHEN (REGEXP_LIKE(t, '\b(2nd time|3rd time|4th time)\b')) = TRUE THEN 1 ELSE 0 END AS UPSET_NOT_LIKE,
    regexp_extract(t, '\b(2nd time|3rd time|4th time)\b')) as Word
    FROM cls
    ")

it looks like the sqldf package doesn't have the regexp_like and regexp_extract functions. Are there any more advanced SQL packages that I can use to run this query?

G. Grothendieck

R: Date function in sqldf giving unusual answer (wrong date format?)

南笙酒味 submitted on 2019-11-28 10:28:42
Question: I am trying to add to a date using sqldf. I know it should be simple, but I can't figure out what is wrong with my date format. Using:

    sqldf("select date(model_date, '+1 day') from lapse_test")

gives answers like '-4666-01-23'. The model_date values are in the Date format and look like 2015-01-01. I previously made them from a character string ('12/1/2015') using

    lapse_test$model_date <- as.Date(lapse_test$date1,format = "%m/%d/%Y")

or

    lapse_test$model_date <- as.POSIXct(lapse_test$date1,format = "
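A likely explanation, stated here as an assumption since the excerpt is cut off before any answer: with the default SQLite backend, an R Date column is passed to the database as its underlying number (days since 1970-01-01), and SQLite's date() interprets a bare number as a Julian day, which lands thousands of years in the past. One workaround, sketched below with made-up rows, is to hand SQLite an ISO-formatted character column instead. The names lapse_chr and next_day are hypothetical.

```r
library(sqldf)

# Hypothetical rows standing in for the asker's lapse_test table
lapse_test <- data.frame(
  model_date = as.Date(c("2015-01-01", "2015-12-31"))
)

# Convert the Date column to 'YYYY-MM-DD' text, which SQLite's
# date() function parses as a calendar date
lapse_chr <- lapse_test
lapse_chr$model_date <- format(lapse_test$model_date)

out <- sqldf(
  "select date(model_date, '+1 day') as next_day from lapse_chr")

print(out)
```

The result column comes back as character; as.Date(out$next_day) turns it into a Date again on the R side.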

sqldf can't find the data with error “no such table”

混江龙づ霸主 submitted on 2019-11-28 08:00:14
Question: I've been using sqldf in my R scripts without trouble until now, when I got the following error:

    library(sqldf)
    data(mtcars)
    out <- sqldf("SELECT * FROM mtcars")
    > Error in rsqlite_send_query(conn@ptr, statement) : no such table: mtcars

This hasn't been a problem before. Does anyone know what the issue is?

Answer 1: I had this problem with 0.4-10 from CRAN (Windows 10).

    > out <- sqldf("SELECT * FROM mtcars")
    Loading required package: tcltk
    Error in rsqlite_send_query(conn@ptr, statement) : no such table: mtcars

Then

R: how to rbind two huge data-frames without running out of memory

☆樱花仙子☆ submitted on 2019-11-27 06:34:38
I have two data frames, df1 and df2, that each have around 10 million rows and 4 columns. I read them into R using RODBC/sqlQuery with no problems, but when I try to rbind them, I get that most dreaded of R error messages: cannot allocate memory. There have got to be more efficient ways to do an rbind. Does anyone have favorite tricks for this that they want to share? For instance, I found this example in the doc for sqldf:

    # rbind
    a7r <- rbind(a5r, a6r)
    a7s <- sqldf("select * from a5s union all select * from a6s")

Is that the best/recommended way to do it?

UPDATE I got it to work

Summarize with conditions in dplyr

走远了吗. submitted on 2019-11-26 18:41:51
I'll illustrate my question with an example. Sample data:

    df <- data.frame(ID = c(1, 1, 2, 2, 3, 5), A = c("foo", "bar", "foo", "foo", "bar", "bar"), B = c(1, 5, 7, 23, 54, 202))
    df
      ID   A   B
    1  1 foo   1
    2  1 bar   5
    3  2 foo   7
    4  2 foo  23
    5  3 bar  54
    6  5 bar 202

What I want to do is to summarize, by ID, the sum of B and the sum of B when A is "foo". I can do this in a couple of steps like:

    require(magrittr)
    require(dplyr)
    df1 <- df %>% group_by(ID) %>% summarize(sumB = sum(B))
    df2 <- df %>% filter(A == "foo") %>% group_by(ID) %>% summarize(sumBfoo = sum(B))
    left_join(df1, df2)
      ID sumB sumBfoo
    1  1    6       1
    2  2
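The two-step approach above can usually be folded into a single summarize call by subsetting B inside the aggregate. This is a minimal sketch of that idea, not necessarily the answer the truncated excerpt goes on to give. One difference to be aware of: IDs with no "foo" rows get 0 here, whereas the left_join version yields NA for them.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  ID = c(1, 1, 2, 2, 3, 5),
  A  = c("foo", "bar", "foo", "foo", "bar", "bar"),
  B  = c(1, 5, 7, 23, 54, 202)
)

# One pass: total of B, and total of B restricted to rows where A is "foo"
res <- df %>%
  group_by(ID) %>%
  summarize(sumB = sum(B), sumBfoo = sum(B[A == "foo"]))

print(res)
```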

R: how to rbind two huge data-frames without running out of memory

久未见 submitted on 2019-11-26 12:04:33
Question: I have two data frames, df1 and df2, that each have around 10 million rows and 4 columns. I read them into R using RODBC/sqlQuery with no problems, but when I try to rbind them, I get that most dreaded of R error messages: cannot allocate memory. There have got to be more efficient ways to do an rbind. Does anyone have favorite tricks for this that they want to share? For instance, I found this example in the doc for sqldf:

    # rbind
    a7r <- rbind(a5r, a6r)
    a7s <- sqldf("select *