tidyr | 易学教程

Is it possible to use spread on multiple columns in tidyr similar to dcast?

Question: I have the following dummy data:

    library(dplyr)
    library(tidyr)
    library(reshape2)
    dt <- expand.grid(Year = 1990:2014, Product = LETTERS[1:8], Country = paste0(LETTERS, "I")) %>%
      select(Product, Country, Year)
    dt$value <- rnorm(nrow(dt))

I pick two product-country combinations:

    sdt <- dt %>%
      filter((Product == "A" & Country == "AI") | (Product == "B" & Country == "EI"))

and I want to see the values side by side for each combination. I can do this with dcast:

    sdt %>% dcast(Year ~ Product

dplyr summarise: Equivalent of “.drop=FALSE” to keep groups with zero length in output

When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result? Here's an example with fake data.

    library(dplyr)
    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)
    # Summarise with plyr, keeping categories with a count of zero
    plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)
      b count
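One possible approach, sketched here under the assumption that dplyr 0.8.0 or later is installed (a version that may postdate the question), is the `.drop` argument of `group_by()`. The example reuses the fake data from the question and counts rows with `n()` rather than `length(a)`; it is an illustration, not the accepted answer from the original thread.

```r
library(dplyr)

# Fake data from the question: level 3 of b has no rows
df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)

# Assumption: dplyr >= 0.8.0, where group_by() accepts .drop.
# .drop = FALSE keeps factor levels that have no matching rows.
res <- df %>%
  group_by(b, .drop = FALSE) %>%
  summarise(count_a = n())

print(res)
```

The empty level 3 should then appear in the result with a count of zero instead of being dropped.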

How to tidy this dataset?

Question: I have the dataset below that I want to tidy up.

      user_id topic may june july august september october
    1  192775  talk   2    0    0      2         2       1
    2  192775  walk 165  123  128    146       113     105
    3  192775  bark   0    0    0      0         0       0
    4  192775  harp   0    0    0      0         0       1

I want to use tidyr to shape it into the format below.

    user_id month talk walk bark harp
     192775   may    2  165    0    0
     192775  june    0  123    0    0

Any help is appreciated.

Answer 1: With:

    library(tidyr)
    df %>% gather(month, val, may:october) %>% spread(topic, val)

you get:

    user_id month bark harp talk
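For readers on tidyr 1.0.0 or later, the same reshaping can be sketched with `pivot_longer()` and `pivot_wider()`, which supersede `gather()` and `spread()`. The data frame below is rebuilt by hand from the table in the question; the column name `val` follows the answer above.

```r
library(tidyr)

# Data rebuilt from the table in the question
df <- data.frame(
  user_id   = 192775,
  topic     = c("talk", "walk", "bark", "harp"),
  may       = c(2, 165, 0, 0),
  june      = c(0, 123, 0, 0),
  july      = c(0, 128, 0, 0),
  august    = c(2, 146, 0, 0),
  september = c(2, 113, 0, 0),
  october   = c(1, 105, 0, 1)
)

# Assumption: tidyr >= 1.0.0.
# Step 1: stack the month columns into (month, val) pairs.
# Step 2: spread the topics back out into one column each.
tidy <- df %>%
  pivot_longer(may:october, names_to = "month", values_to = "val") %>%
  pivot_wider(names_from = topic, values_from = val)

print(tidy)
```

This should give one row per user and month, with one column per topic.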

Spread with duplicate identifiers (using tidyverse and %>%) [duplicate]

Question: This question already has answers here: Reshaping data in R with “login” “logout” times (6 answers). Closed 2 years ago.

My data looks like this: I am trying to make it look like this: I would like to do this in tidyverse using %>%-chaining.

    df <- structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L), start_end = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"), date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03", "1979-06-21"

Add NAs to make all list elements equal length

Question: I'm doing a series of things in dplyr and tidyr, so I would like to keep to a piped solution if possible. I have a list with uneven numbers of elements in each component:

    lolz <- list(a = c(2,4,5,2,3), b = c(3,3,2), c=c(1,1,2,4,5,3,3), d=c(1,2,3,1), e=c(5,4,2,2))
    lolz
    $a
    [1] 2 4 5 2 3
    $b
    [1] 3 3 2
    $c
    [1] 1 1 2 4 5 3 3
    $d
    [1] 1 2 3 1
    $e
    [1] 5 4 2 2

I am wondering if there's a neat one liner to fill up each element with NAs such that they all are of the same length as the element with the
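A minimal base R sketch of one way to pad the list, assuming the target length is that of the longest element (the excerpt is cut off before it says so). It relies on the replacement function `length<-`, which extends a vector with NAs.

```r
lolz <- list(a = c(2, 4, 5, 2, 3), b = c(3, 3, 2), c = c(1, 1, 2, 4, 5, 3, 3),
             d = c(1, 2, 3, 1), e = c(5, 4, 2, 2))

# Assumption: pad every element to the length of the longest one
n <- max(lengths(lolz))

# `length<-`(x, n) lengthens x to n elements, filling with NA
padded <- lapply(lolz, `length<-`, n)

print(padded$b)
```

The `lapply()` call also works at the end of a pipe, for example `lolz %>% lapply(`length<-`, max(lengths(.)))` with magrittr loaded.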

Reshape multiple values at once

I have a long data set I would like to make wide and I'm curious if there is a way to do this all in one step using the reshape2 or tidyr packages in R. The data frame df looks like this:

    id type    transactions amount
    20 income  20           100
    20 expense 25           95
    30 income  50           300
    30 expense 45           250

I'd like to get to this:

    id income_transactions expense_transactions income_amount expense_amount
    20 20                  25                   100           95
    30 50                  45                   300           250

I know I can get part of the way there with reshape2 via for example:

    dcast(df, id ~ type, value.var="transactions")

But is there a way to reshape the entire df in one shot addressing
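One way this might be done in a single step is `pivot_wider()` with several `values_from` columns. This is a sketch that assumes a recent tidyr (the `names_glue` argument arrived, to my knowledge, in tidyr 1.1.0), so it may not have been available when the question was asked. The data frame is rebuilt from the table in the question.

```r
library(tidyr)

# Data rebuilt from the table in the question
df <- data.frame(
  id           = c(20, 20, 30, 30),
  type         = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount       = c(100, 95, 300, 250)
)

# Assumption: tidyr >= 1.1.0 for names_glue.
# Both value columns are widened at once; names_glue builds
# names such as income_transactions and expense_amount.
wide <- pivot_wider(
  df,
  names_from  = type,
  values_from = c(transactions, amount),
  names_glue  = "{type}_{.value}"
)

print(wide)
```

Without `names_glue`, the default names would put the value column first (`transactions_income` and so on).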

Complete dataframe with missing combinations of values

Question: I have a simple question, which I can't figure out. I have a dataframe with two factors (distance) and years (years). I would like to complete all years values for every factor, filling with 0, i.e. from this:

      distance years area
    1      NPR     3   10
    2      NPR     4   20
    3      NPR     7   30
    4      100     1   40
    5      100     5   50
    6      100     6   60

get this:

       distance years area
    1       NPR     1    0
    2       NPR     2    0
    3       NPR     3   10
    4       NPR     4   20
    5       NPR     5    0
    6       NPR     6    0
    7       NPR     7   30
    8       100     1   40
    9       100     2    0
    10      100     3    0
    11      100     4    0
    12      100     5   50
    13      100     6   60
    14      100     7    0

I tried to apply expand()
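A hedged sketch of how `tidyr::complete()` could produce the desired table. The data frame is rebuilt from the question; treating `distance` as a factor with levels in the order NPR, 100 and taking the year range as 1 to 7 are assumptions read off the expected output.

```r
library(tidyr)

# Data rebuilt from the question; level order and integer years are assumptions
df <- data.frame(
  distance = factor(c("NPR", "NPR", "NPR", "100", "100", "100"),
                    levels = c("NPR", "100")),
  years    = c(3L, 4L, 7L, 1L, 5L, 6L),
  area     = c(10, 20, 30, 40, 50, 60)
)

# complete() adds every missing distance/years combination;
# fill sets area to 0 in the rows it creates.
full <- complete(df, distance, years = 1:7, fill = list(area = 0))

print(full)
```

If the year range should come from the data rather than be hard-coded, `years = full_seq(years, 1)` is an alternative worth trying.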

How to spread columns with duplicate identifiers?

Question: I have the following tibble:

    structure(list(age = c("21", "17", "32", "29", "15"), gender = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("age", "gender"))

        age gender
      <chr> <fctr>
    1    21   Male
    2    17 Female
    3    32 Female
    4    29   Male
    5    15   Male

And I am trying to use tidyr::spread to achieve this:

      Female Male
    1     NA   21
    2     17   NA
    3     32   NA
    4     NA   29
    5     NA   15

I thought spread
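A common workaround for duplicate identifiers, sketched here as an assumption rather than as the answer from the original thread, is to add a temporary row index so that every row is uniquely identified before widening. The helper column name `row` is hypothetical, and `pivot_wider()` assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Tibble rebuilt from the question
df <- tibble(
  age    = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"))
)

# "row" is a hypothetical helper column that makes each row unique,
# so widening no longer has to combine several ages per gender.
wide <- df %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = gender, values_from = age) %>%
  select(Female, Male)

print(wide)
```

The same idea should work with `spread(gender, age)` in place of `pivot_wider()` on older tidyr versions.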

Spread with data.frame/tibble with duplicate identifiers

Question: The documentation for tidyr suggests that gather and spread are transitive, but the following example with the "iris" data shows they are not, and it is not clear why. Any clarification would be greatly appreciated.

    iris.df = as.data.frame(iris)
    long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
    w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df" but received the
