tidyverse | 易学教程

Define sequences based on a variable run with additional condition from another variable

Question: structure(list(group = c(NA, "A", "B", NA, "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", NA, NA, "B", "B", "A", "A", NA, NA, "B", "B", "B", NA, "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", NA, NA, "B", "B", NA, "A"), seq_break = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,

R - Pivot like data

Question: Using R and the tidyverse library, I am trying to achieve a pivot-like result. Here is a sample data set: zz <- " Date ParAB ParCD 1 2017-05-27 A C 2 2017-05-27 B D 3 2017-05-27 A D 4 2017-05-27 B C 5 2017-05-27 B C 6 2017-05-28 A D 7 2017-05-28 A C 8 2017-05-28 A C 9 2017-05-28 A D" Data <- read.table(text=zz, header = TRUE) I would like to transform the data to look like this, with the number of occurrences per day: Date A B C D 2017-05-27 2 3 3 2 2017-05-28 2 0 1 1 I tried the spread function, which works great
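One possible approach, sketched here with tidyr's `pivot_longer()` and `pivot_wider()` in place of `spread()`: stack the two parameter columns into one, count occurrences per date and value, then widen the counts. This assumes tidyr 1.1.0 or later (for the scalar `values_fill`). The names `param`, `value`, and `counts` are illustrative and do not come from the question.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; the unnamed first field becomes row names.
zz <- "
Date ParAB ParCD
1 2017-05-27 A C
2 2017-05-27 B D
3 2017-05-27 A D
4 2017-05-27 B C
5 2017-05-27 B C
6 2017-05-28 A D
7 2017-05-28 A C
8 2017-05-28 A C
9 2017-05-28 A D"
Data <- read.table(text = zz, header = TRUE)

counts <- Data %>%
  # Stack ParAB and ParCD into a single column of letters.
  pivot_longer(c(ParAB, ParCD), names_to = "param", values_to = "value") %>%
  # Count how often each letter occurs on each date.
  count(Date, value) %>%
  # One column per letter; combinations that never occur become 0.
  pivot_wider(names_from = value, values_from = n, values_fill = 0)

print(counts)
```

Counting before widening keeps the reshaping step free of duplicate keys, which is a common stumbling block with `spread()`.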

purrr::pmap with dplyr::mutate

Question: I have a function which takes multiple inputs and creates multiple outputs. For example: example_fun = function(a,b){ x = a+b y = a-b return(list(x=x, y=y)) } How can I use dplyr::mutate to evaluate this function on each row of a dataframe? Turn df = expand.grid(a=c(7,8), b=c(9,10)) df a b 1 7 9 2 8 9 3 7 10 4 8 10 into a b x y 1 7 9 16 -2 2 8 9 17 -1 3 7 10 17 -3 4 8 10 18 -2 The following code almost accomplishes it: df = df %>% mutate(outputs = pmap(list(a,b), example_fun)) %>% unnest()
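A minimal sketch of one way to get the `x` and `y` columns, assuming tidyr 1.0.0 or later: keep the `pmap()` call from the question, but spread each returned named list into columns with `unnest_wider()`. The name `result` is illustrative.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Function from the question: returns a named list with two elements.
example_fun <- function(a, b) {
  list(x = a + b, y = a - b)
}

df <- expand.grid(a = c(7, 8), b = c(9, 10))

result <- df %>%
  # One list(x = ..., y = ...) per row, stored in a list-column.
  mutate(outputs = pmap(list(a, b), example_fun)) %>%
  # Turn the names of each list element into separate columns.
  unnest_wider(outputs)

print(result)
```

Because each element of `outputs` is a named list, `unnest_wider()` can use those names as column names, whereas a plain `unnest()` stacks the elements into rows.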

Force date as new line on reading non-delimited text file

Question: I am trying to read in and work with a horribly formatted debug log. There are no consistent delimiters, and line breaks do not appear to be encoded either. What I'd like to do is read in and parse the data to have a new line for each date (YYYY-MM-DD format). I am trying to work within the tidyverse but cannot seem to get something that will parse the file correctly. Is there a way to force lines to be delimited by a date pattern? None of these work: library(tidyverse) Log_File <- read
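The idea of delimiting on a date pattern can be sketched in base R: insert a newline before every `YYYY-MM-DD` match, then split on that newline. The log text below is made up for illustration, since the original file is not shown, and the sketch assumes the log contains no newlines of its own and that every date marks the start of an entry.

```r
# Hypothetical log content standing in for the real file.
raw <- paste(
  "2019-03-01 10:00 service started",
  "2019-03-02 11:30 error code 5",
  "2019-03-03 09:15 service stopped"
)

# Put a newline in front of every date, then split on it.
marked  <- gsub("(\\d{4}-\\d{2}-\\d{2})", "\n\\1", raw, perl = TRUE)
entries <- strsplit(marked, "\n", fixed = TRUE)[[1]]

# Drop the empty piece before the first date and trim stray spaces.
entries <- trimws(entries[nzchar(entries)])

log_tbl <- data.frame(
  date = as.Date(substr(entries, 1, 10)),
  text = entries,
  stringsAsFactors = FALSE
)

print(log_tbl)
```

For a real file, the whole content could first be collapsed into a single string, for example with `paste(readLines(path), collapse = " ")`, where `path` is a placeholder for the log's location.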

looping over a list of filter expressions: problem with NSE in map2 call within mutate

Question: I have defined a list of expressions containing arguments I want to pass to a dplyr::filter call. library(tidyverse) # using tidyr 1.0.0 cond_filter <- list(expr(1 > 0), # condition to select all rows expr(Species == "setosa"), expr(Species != "virginica")) I further have a data frame that I put into a list-column and which I then expand by the number of filter expressions in said list. iris_nest <- iris %>% nest(data = everything()) %>% expand_grid(., filters = cond_filter) In a last step I

how to merge data across data.frame rows based on identical column values

Question: How would you merge values between rows that have identical values in id_3? I'm sure there's a better name for the question title but I'm struggling to find the appropriate operation/function name(s) for this procedure. library(tidyverse) id_1 <- c("x12", NA, "a_bc", NA) id_2 <- c(NA, "gye", NA, "ab_c") id_3 <- c("qwe", "ert", "abc", "abc") param_1 <- c(0.21, 1.5, 0.23, NA) param_12 <- c(0.05, 4.4, NA, 6.3) df <- data.frame(id_1, id_2, id_3, param_1, param_12) as_tibble(df) # id_1 id_2 id_3

{if…else..} statement after group_by in dplyr chain

Question: To illustrate what I'm trying to do, I'm using the diamonds dataset as an example. After group_by(cut), I want to run lm on each group, with the formula depending on the mean depth of that group, and then save the model in the dataframe. diamonds %>% group_by(cut) %>% mutate(mean.depth=mean(depth)) %>% {if (.$mean.depth>60) do(model=lm(price~x, data=.)) else do(model=lm(price~y, data=.))} This is what I got: Error in function_list[[k]](value) : object 'mean.depth' not found I spent an hour trying to fix it but failed.
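A sketch of one way to make the per-group decision, assuming dplyr 0.8.1 or later for `group_modify()`: `if` needs a single TRUE or FALSE, so the mean depth is computed inside each group and the formula is chosen there. This uses a different technique from the `do()` call in the question; the names `models` and `f` are illustrative.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

models <- diamonds %>%
  group_by(cut) %>%
  group_modify(~ {
    # .x holds the rows of the current group only, so the mean is a scalar.
    f <- if (mean(.x$depth) > 60) price ~ x else price ~ y
    # Return a one-row tibble with the fitted model in a list-column.
    tibble(model = list(lm(f, data = .x)))
  }) %>%
  ungroup()

print(models)
```

Each row of `models` pairs a `cut` level with its fitted `lm` object, which can be inspected with, for example, `coef(models$model[[1]])`.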

Grouping a data frame by dates: resolve missing time periods' bug

Question: I've identified, if not myself created, a difficult bug to resolve in some nice code received from a generous respondent here on StackOverflow a few weeks ago, and I could use some new assistance today. Sample data (called object eh below): ID 2013-03-20 2013-04-09 2013-04-11 2013-04-17 2013-04-25 2013-05-15 2013-05-24 2013-05-25 2013-05-26 5167f 0 0 0 0 0 0 0 0 0 1214m 0 0 0 0 0 0 0 0 0 1844f 0 0 0 0 0 0 0 0 0 2113m 0 0 0 0 0 0 0 0 0 2254m 0 0 0 0 0 0 0 0 0 2721f 0 0 0 0 0 0 0 0 0 3121f 0 0

`gather` can't handle rownames

Question: allcsvs = list.files(pattern = "*.csv$", recursive = TRUE) library(tidyverse) ##LOOP to redact the snow data csvs## for(x in 1:length(allcsvs)) { df = read.csv(allcsvs[x], check.names = FALSE) newdf = df %>% gather(COL_DATE, SNOW_DEPTH, -PT_ID, -DATE) %>% mutate( DATE = as.Date(DATE,format = "%m/%d/%Y"), COL_DATE = as.Date(COL_DATE, format = "%Y.%m.%d") ) %>% filter(DATE == COL_DATE) %>% select(-COL_DATE) ####TURN DATES UNAMBIGUOUS HERE#### df$DATE = lubridate::mdy(df$DATE) finaldf = merge

How to copy grouped rows into column by dplyr/tidyverse in R?

Question: I am trying to copy sets of rows into columns using dplyr. The following is my data frame. df <- data.frame( hid=c(1,1,1,1,2,2,2,2,2,3,3,3,3), mid=c(1,2,3,4,1,2,3,4,5,1,2,3,4), tmid=c("010","01010","010","01020", "010","0120","010","010","020", "010","01010","010","01020"), thid=c("010","02020","010","02020", "000","0120","010","010","010", "010","02020","010","02020") ) It is printed in the following format: > df hid mid tmid thid 1 1 1 010 010 2 1 2 01010 02020 3 1 3 010 010 4 1 4 01020 02020