dplyr

write table in database with dplyr

Submitted by 旧时模样 on 2021-02-07 12:45:28
Question: Is there a way to make dplyr, hooked up to a database, pipe data to a new table within that database without ever downloading the data locally? I'd like to do something along the lines of: tbl(con, "mytable") %>% group_by(dt) %>% tally() %>% write_to(name = "mytable_2", schema = "transformed") Answer 1: While I wholeheartedly agree with the suggestion to learn SQL, you can take advantage of the fact that dplyr doesn't pull data until it absolutely has to and build the query using dplyr, add the TO TABLE
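The truncated answer builds on dplyr's lazy evaluation; a minimal sketch of that idea (assuming the DBI, dbplyr, and RSQLite packages, with an illustrative in-memory database standing in for the real connection) is dbplyr's compute(), which materializes the lazy query into a new table server-side without collecting rows into R:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Illustrative in-memory database; in practice `con` is the
# question's existing connection.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mytable", data.frame(dt = c("a", "a", "b")))

# compute() runs CREATE TABLE ... AS inside the database,
# so the grouped tally never travels to the R session.
tbl(con, "mytable") %>%
  group_by(dt) %>%
  tally() %>%
  compute(name = "mytable_2", temporary = FALSE)

result <- dbReadTable(con, "mytable_2")
dbDisconnect(con)
```

On backends with schema support, passing `name = in_schema("transformed", "mytable_2")` should target the schema from the question.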

How to mimic geom_boxplot() with outliers using geom_boxplot(stat = “identity”)

Submitted by 霸气de小男生 on 2021-02-07 12:38:21
Question: I would like to pre-compute by-variable summaries of data (with plyr, passing a quantile function) and then plot with geom_boxplot(stat = "identity"). This works great except that it (a) does not plot outliers as points and (b) extends the whiskers to the max and min of the data being plotted. Example: library(plyr) library(ggplot2) set.seed(4) df <- data.frame(fact = sample(letters[1:2], 12, replace = TRUE), val = c(1:10, 100, 101)) df # fact val # 1 b 1 # 2 a 2 # 3 a 3 # 4 a 4 # 5 b 5 # 6
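One way to recover both missing pieces, sketched below under the assumption that the standard 1.5 * IQR whisker rule is wanted (dplyr shown in place of plyr): compute the five boxplot statistics per group, cap the whiskers at the most extreme observations inside the fences, and layer the outliers on with a separate geom_point():

```r
library(dplyr)
library(ggplot2)

set.seed(4)
df <- data.frame(fact = sample(letters[1:2], 12, replace = TRUE),
                 val  = c(1:10, 100, 101))

stats <- df %>%
  group_by(fact) %>%
  summarise(lower  = quantile(val, 0.25),
            middle = quantile(val, 0.50),
            upper  = quantile(val, 0.75),
            iqr    = upper - lower,
            # whiskers end at the extreme points inside the fences,
            # mimicking geom_boxplot()'s default behaviour
            ymin   = min(val[val >= lower - 1.5 * iqr]),
            ymax   = max(val[val <= upper + 1.5 * iqr]),
            .groups = "drop")

# points beyond the whiskers, drawn as a separate layer
outliers <- df %>%
  left_join(stats, by = "fact") %>%
  filter(val < ymin | val > ymax)

p <- ggplot(stats, aes(x = fact)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity") +
  geom_point(data = outliers, aes(x = fact, y = val))
```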

In dplyr, what are the intrinsic differences between setdiff and anti_join?

Submitted by 放肆的年华 on 2021-02-07 12:17:07
Question: I'm still working through the lessons on DataCamp for R, so please forgive me if this question seems naïve. Consider the following (very contrived) sample: library(dplyr) library(tibble) type <- c("Dog", "Cat", "Cat", "Cat") name <- c("Ella", "Arrow", "Gabby", "Eddie") pets = tibble(name, type) name <- c("Ella", "Arrow", "Dog") type <- c("Dog", "Cat", "Calvin") favorites = tibble(name, type) anti_join(favorites, pets, by = "name") setdiff(favorites, pets, by = "name") Both of these return
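The heart of the difference, sketched on the question's own data: anti_join() filters on the columns named in `by` only, whereas setdiff() compares complete rows (and quietly ignores a `by` argument):

```r
library(dplyr)
library(tibble)

pets <- tibble(name = c("Ella", "Arrow", "Gabby", "Eddie"),
               type = c("Dog", "Cat", "Cat", "Cat"))
favorites <- tibble(name = c("Ella", "Arrow", "Dog"),
                    type = c("Dog", "Cat", "Calvin"))

# anti_join: drop favorites whose *name* appears in pets
aj <- anti_join(favorites, pets, by = "name")

# setdiff: drop favorites whose *entire row* appears in pets
sd <- setdiff(favorites, pets)

# Both happen to return the single row ("Dog", "Calvin") here,
# but a favorites row sharing a pet's name with a different type
# would survive setdiff while anti_join would drop it.
```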

Using cummean with group_by and ignoring NAs

Submitted by 送分小仙女□ on 2021-02-07 10:28:10
Question: df <- data.frame(category=c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"), value=c(NA,2,3,4,5,NA,7,8)) I'd like to add a new column to the above dataframe that takes the cumulative mean of the value column, not taking NAs into account. Is it possible to do this with dplyr? I've tried df <- df %>% group_by(category) %>% mutate(new_col=cummean(value)) but cummean just doesn't know what to do with NAs. EDIT: I do not want to count NAs as 0. Answer 1: You could use ifelse to treat NAs as
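A sketch of one way to honour the EDIT and skip NAs rather than zero them out: divide an NA-free running sum by the running count of non-missing values. Rows before a group's first observed value come out NaN, since there is nothing to average yet:

```r
library(dplyr)

df <- data.frame(category = c("cat1","cat1","cat2","cat1","cat2","cat2","cat1","cat2"),
                 value    = c(NA, 2, 3, 4, 5, NA, 7, 8))

df <- df %>%
  group_by(category) %>%
  # coalesce() zeroes NAs in the numerator only; the denominator
  # counts non-missing values, so NAs are genuinely skipped
  mutate(new_col = cumsum(coalesce(value, 0)) / cumsum(!is.na(value))) %>%
  ungroup()
```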

Count values less than x and find nearest values to x by multiple groups

Submitted by 痞子三分冷 on 2021-02-07 08:53:43
Question: Sample data frame:

          uid      bas_id  dist2mouth  type
2020     2019  W3A9101601    2.413629     1
2021     2020  W3A9101601    2.413629     1
2022     2021  W3A9101602    2.413629     1
2023     2022  W3A9101602    3.313893     1
2032     2031  W3A9101602    3.313893     1
2033     2032  W3A9101602    3.313893     1
2034     2033  W3A9101602    3.313893     1
15023   15022  W3A9101601    1.349000     2
15025   15024  W3A9101601    3.880000     2
15026   15025  W3A9101602    3.880000     2
15027   15026  W3A9101602    0.541101     2
16106   17097  W3A9101602    1.349000     2

For each row I'd like to calculate how many rows of
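The question is cut off mid-sentence, so the exact condition is an assumption; as a hedged sketch, one plausible reading (for each row, count how many rows in the same bas_id group have a smaller dist2mouth) can be done with sapply() inside a grouped mutate:

```r
library(dplyr)

# abbreviated version of the sample data
data <- data.frame(
  uid        = c(2019, 2020, 2021, 2022, 15022, 15024),
  bas_id     = c("W3A9101601", "W3A9101601", "W3A9101602",
                 "W3A9101602", "W3A9101601", "W3A9101601"),
  dist2mouth = c(2.413629, 2.413629, 2.413629, 3.313893, 1.349, 3.88),
  type       = c(1, 1, 1, 1, 2, 2)
)

data <- data %>%
  group_by(bas_id) %>%
  # per group: compare each value against the whole group vector
  mutate(n_less = sapply(dist2mouth, function(x) sum(dist2mouth < x))) %>%
  ungroup()
```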

remove everything after the last underscore of a column in R [duplicate]

Submitted by 时光怂恿深爱的人放手 on 2021-02-07 06:55:47
Question: This question already has answers here: R regex find last occurrence of delimiter (4 answers). Closed 4 years ago. I have a dataframe, and for a particular column I want to strip out everything after the last underscore. So: test <- data.frame(label=c('test_test_test', 'test_tom_cat', 'tset_eat_food', 'tisk - tisk'), stuff=c('blah', 'blag', 'gah', 'nah'), numbers=c(1,2,3,4)) should become result <- data.frame(label=c('test_test', 'test_tom', 'tset_eat', 'tisk - tisk'), stuff=c('blah', 'blag
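One standard way (a sketch, not necessarily the linked answers' exact code) is a regex anchored at the end of the string:

```r
test <- data.frame(label = c("test_test_test", "test_tom_cat",
                             "tset_eat_food", "tisk - tisk"),
                   stuff = c("blah", "blag", "gah", "nah"),
                   numbers = c(1, 2, 3, 4))

# "_[^_]*$" matches the last underscore plus everything after it;
# strings with no underscore are returned unchanged.
test$label <- sub("_[^_]*$", "", test$label)
test$label
# "test_test" "test_tom" "tset_eat" "tisk - tisk"
```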

tidyverse: row wise calculations by group

Submitted by 纵然是瞬间 on 2021-02-07 06:25:40
Question: I am trying to do an inventory calculation in R that requires a row-wise calculation for each Mat-Plant combination. Here's a test data set: df <- structure(list(Mat = c("A", "A", "A", "A", "A", "A", "B", "B"), Plant = c("P1", "P1", "P1", "P2", "P2", "P2", "P1", "P1"), Day = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L), UU = c(0L, 10L, 0L, 0L, 0L, 120L, 10L, 0L), CumDailyFcst = c(11L, 22L, 33L, 0L, 5L, 10L, 20L, 50L)), .Names = c("Mat", "Plant", "Day", "UU", "CumDailyFcst"), class = "data.frame", row
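The question body is truncated, so the asker's exact inventory rule is unknown; as an illustrative sketch of the general pattern (a per-group running balance where each day depends on the previous day's result), purrr::accumulate() inside a grouped mutate handles the row-wise carry-over. The floor-at-zero rule below is a hypothetical stand-in, not the asker's formula:

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Mat   = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Plant = c("P1", "P1", "P1", "P2", "P2", "P2", "P1", "P1"),
  Day   = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L),
  UU    = c(0L, 10L, 0L, 0L, 0L, 120L, 10L, 0L),
  CumDailyFcst = c(11L, 22L, 33L, 0L, 5L, 10L, 20L, 50L)
)

df <- df %>%
  group_by(Mat, Plant) %>%
  arrange(Day, .by_group = TRUE) %>%
  # recover the daily forecast from the cumulative column, then
  # carry yesterday's balance forward, never dropping below zero
  mutate(daily_fcst = CumDailyFcst - lag(CumDailyFcst, default = 0L),
         balance = accumulate(UU - daily_fcst,
                              ~ max(.x + .y, 0),
                              .init = 0)[-1]) %>%
  ungroup()
```

The `.init = 0` seeds each group's balance and `[-1]` drops that seed so the result lines up one value per row.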