tidyr

Is it possible to use spread on multiple columns in tidyr similar to dcast?

久未见 submitted on 2019-11-26 12:21:47
Question: I have the following dummy data:

    library(dplyr)
    library(tidyr)
    library(reshape2)
    dt <- expand.grid(Year = 1990:2014, Product = LETTERS[1:8], Country = paste0(LETTERS, "I")) %>%
      select(Product, Country, Year)
    dt$value <- rnorm(nrow(dt))

I pick two product-country combinations:

    sdt <- dt %>%
      filter((Product == "A" & Country == "AI") | (Product == "B" & Country == "EI"))

and I want to see the values side by side for each combination. I can do this with dcast:

    sdt %>% dcast(Year ~ Product
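A possible tidyr-only route, sketched under the assumption that the goal is one value column per Product-Country pair (the dcast call above is cut off, so the exact target is a guess): merge the two key columns with unite(), then spread() on the merged key. The column name prod_country and the set.seed() call are additions for illustration and do not come from the question.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # added only so the dummy values are reproducible
dt <- expand.grid(Year = 1990:2014, Product = LETTERS[1:8],
                  Country = paste0(LETTERS, "I")) %>%
  select(Product, Country, Year)
dt$value <- rnorm(nrow(dt))

sdt <- dt %>%
  filter((Product == "A" & Country == "AI") | (Product == "B" & Country == "EI"))

# prod_country is a hypothetical helper column holding "A_AI" and "B_EI"
wide <- sdt %>%
  unite(prod_country, Product, Country) %>%
  spread(prod_country, value)
```

The result has one row per Year and one column per combination, which is the shape a two-variable dcast formula would normally produce.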

dplyr summarise: Equivalent of “.drop=FALSE” to keep groups with zero length in output

匆匆过客 submitted on 2019-11-26 11:40:53
When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result? Here's an example with fake data.

    library(dplyr)
    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)
    # Summarise with plyr, keeping categories with a count of zero
    plyr::ddply(df, "b", summarise, count_a=length(a), .drop=FALSE)
    b count
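One way to recover the empty category, shown as a sketch and not necessarily the answer the original thread settled on: summarise as usual, then let tidyr::complete() add a row for every factor level that is missing from the result. The fill value of 0L is an assumption about how an empty group should be reported.

```r
library(dplyr)
library(tidyr)

df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)  # level 3 has no rows

res <- df %>%
  group_by(b) %>%
  summarise(count_a = length(a)) %>%
  # complete() expands b to all of its factor levels; new rows get count_a = 0
  complete(b, fill = list(count_a = 0L))
```

Because b is a factor, complete() uses its levels and not only the values present, which is what brings level 3 back.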

How to tidy this dataset?

倾然丶 夕夏残阳落幕 submitted on 2019-11-26 11:36:42
Question: I have the dataset below that I want to tidy up.

      user_id topic may june july august september october
    1  192775  talk   2    0    0      2         2       1
    2  192775  walk 165  123  128    146       113     105
    3  192775  bark   0    0    0      0         0       0
    4  192775  harp   0    0    0      0         0       1

I want to use tidyr to shape it into the format below.

    user_id month talk walk bark harp
     192775   may    2  165    0    0
     192775  june    0  123    0    0

Any help is appreciated.

Answer 1: With:

    library(tidyr)
    df %>% gather(month, val, may:october) %>% spread(topic, val)

you get:

    user_id month bark harp talk

Spread with duplicate identifiers (using tidyverse and %>%) [duplicate]

ぐ巨炮叔叔 submitted on 2019-11-26 11:18:27
Question: This question already has answers here: Reshaping data in R with “login” “logout” times (6 answers). Closed 2 years ago.

My data looks like this: I am trying to make it look like this: I would like to do this in tidyverse using %>%-chaining.

    df <- structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L), start_end = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"), date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03", "1979-06-21"

Add NAs to make all list elements equal length

狂风中的少年 submitted on 2019-11-26 11:17:59
Question: I'm doing a series of things in dplyr, tidyr, so would like to keep with a piped solution if possible. I have a list with uneven numbers of elements in each component:

    lolz <- list(a = c(2,4,5,2,3), b = c(3,3,2), c=c(1,1,2,4,5,3,3), d=c(1,2,3,1), e=c(5,4,2,2))
    lolz
    $a
    [1] 2 4 5 2 3
    $b
    [1] 3 3 2
    $c
    [1] 1 1 2 4 5 3 3
    $d
    [1] 1 2 3 1
    $e
    [1] 5 4 2 2

I am wondering if there's a neat one liner to fill up each element with NAs such that they all are of the same length as the element with the
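A base-R sketch of the padding step, assuming the cut-off sentence asks for every element to match the longest one: assigning a larger length to a vector with `length<-` pads it with NA, so applying that replacement function over the list does the job. The names max_len and padded are introduced here for illustration.

```r
lolz <- list(a = c(2, 4, 5, 2, 3), b = c(3, 3, 2), c = c(1, 1, 2, 4, 5, 3, 3),
             d = c(1, 2, 3, 1), e = c(5, 4, 2, 2))

# Length of the longest element (7 here)
max_len <- max(lengths(lolz))

# `length<-` extends each vector to max_len, filling the tail with NA
padded <- lapply(lolz, `length<-`, max_len)
```

The same call can sit at the end of a pipe, for example lolz %>% lapply(`length<-`, max(lengths(lolz))), if magrittr is loaded.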

Reshape multiple values at once

女生的网名这么多〃 submitted on 2019-11-26 10:31:56
I have a long data set I would like to make wide and I'm curious if there is a way to do this all in one step using the reshape2 or tidyr packages in R. The data frame df looks like this:

    id type    transactions amount
    20 income  20           100
    20 expense 25           95
    30 income  50           300
    30 expense 45           250

I'd like to get to this:

    id income_transactions expense_transactions income_amount expense_amount
    20 20                  25                   100           95
    30 50                  45                   300           250

I know I can get part of the way there with reshape2 via for example:

    dcast(df, id ~ type, value.var="transactions")

But is there a way to reshape the entire df in one shot addressing
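One tidyr approach, given as a sketch and not as the thread's accepted answer: stack both value columns with gather(), build the combined column name with unite(), then spread() back to wide. The helper column names variable, value and type_var are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c(20, 20, 30, 30),
  type = c("income", "expense", "income", "expense"),
  transactions = c(20, 25, 50, 45),
  amount = c(100, 95, 300, 250)
)

wide <- df %>%
  gather(variable, value, transactions, amount) %>%  # one row per id/type/variable
  unite(type_var, type, variable) %>%                # e.g. "income_transactions"
  spread(type_var, value)
```

The columns come out in alphabetical order (expense_amount first), so a final select() would be needed to match the column order shown in the question.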

Complete dataframe with missing combinations of values

馋奶兔 submitted on 2019-11-26 07:47:52
Question: I have a simple question, which I can't figure out. I have a dataframe with two factors (distance) and years (years). I would like to fill in all missing years values for every factor with 0, i.e. from this:

      distance years area
    1      NPR     3   10
    2      NPR     4   20
    3      NPR     7   30
    4      100     1   40
    5      100     5   50
    6      100     6   60

get this:

       distance years area
    1       NPR     1    0
    2       NPR     2    0
    3       NPR     3   10
    4       NPR     4   20
    5       NPR     5    0
    6       NPR     6    0
    7       NPR     7   30
    8       100     1   40
    9       100     2    0
    10      100     3    0
    11      100     4    0
    12      100     5   50
    13      100     6   60
    14      100     7    0

I tried to apply expand()
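A sketch of how tidyr::complete() can produce the filled table, assuming years is numeric and the full range should run from the smallest to the largest observed year: complete() builds every distance/years combination and fills the new rows' area with 0. The object name filled is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  distance = c("NPR", "NPR", "NPR", "100", "100", "100"),
  years = c(3, 4, 7, 1, 5, 6),
  area = c(10, 20, 30, 40, 50, 60)
)

filled <- df %>%
  # full_seq() spans min(years)..max(years) in steps of 1, i.e. 1 to 7
  complete(distance, years = full_seq(years, period = 1),
           fill = list(area = 0))
```

With distance stored as character, the rows come back sorted with "100" before "NPR"; converting distance to a factor with the desired level order would reproduce the ordering shown above.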

How to spread columns with duplicate identifiers?

人走茶凉 submitted on 2019-11-26 06:38:22
Question: I have the following tibble:

    structure(list(age = c("21", "17", "32", "29", "15"), gender = structure(c(2L, 1L, 1L, 2L, 2L), .Label = c("Female", "Male"), class = "factor")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"), .Names = c("age", "gender"))

        age gender
      <chr> <fctr>
    1    21   Male
    2    17 Female
    3    32 Female
    4    29   Male
    5    15   Male

And I am trying to use tidyr::spread to achieve this:

      Female Male
    1     NA   21
    2     17   NA
    3     32   NA
    4     NA   29
    5     NA   15

I thought spread
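A sketch of one common workaround for this situation: spread() needs the remaining columns to identify each output row uniquely, and here nothing does, so adding a row index first gives it that identifier. The column name row is an illustrative choice, and the tibble is rebuilt with tibble() for readability.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  age = c("21", "17", "32", "29", "15"),
  gender = factor(c("Male", "Female", "Female", "Male", "Male"),
                  levels = c("Female", "Male"))
)

wide <- df %>%
  mutate(row = row_number()) %>%  # unique identifier for each observation
  spread(gender, age)
```

The result keeps the helper column row next to Female and Male; select(-row) drops it if only the two gender columns are wanted.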

Spread with data.frame/tibble with duplicate identifiers

∥☆過路亽.° submitted on 2019-11-26 03:58:02
Question: The documentation for tidyr suggests that gather and spread are transitive, but the following example with the "iris" data shows they are not, and it is not clear why. Any clarification would be greatly appreciated.

    iris.df = as.data.frame(iris)
    long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
    w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df" but received the
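A sketch of why the round trip fails and one way to make it work: after gather(), Species and feature.measure together no longer identify a single flower (each species has 50 of them), so spread() cannot tell which values belong on the same row. Adding a row index before gathering keeps that information. The column name row and the object names long and wide are introduced here for illustration, and spread() is called without the extra -Species argument.

```r
library(dplyr)
library(tidyr)

iris.df <- as.data.frame(iris)

long <- iris.df %>%
  mutate(row = row_number()) %>%  # remember which flower each value came from
  gather(key = feature.measure, value = size, -Species, -row)

wide <- long %>%
  spread(key = feature.measure, value = size)
```

The measurements in wide match iris.df, although the column order differs and the helper column row remains.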

dplyr summarise: Equivalent of “.drop=FALSE” to keep groups with zero length in output

梦想与她 submitted on 2019-11-26 02:30:08
Question: When using summarise with plyr's ddply function, empty categories are dropped by default. You can change this behavior by adding .drop = FALSE. However, this doesn't work when using summarise with dplyr. Is there another way to keep empty categories in the result? Here's an example with fake data.

    library(dplyr)
    df = data.frame(a=rep(1:3,4), b=rep(1:2,6))
    # Now add an extra level to df$b that has no corresponding value in df$a
    df$b = factor(df$b, levels=1:3)
    # Summarise with plyr,
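For readers on a newer dplyr, a sketch that assumes version 0.8.0 or later, where group_by() accepts a .drop argument: setting .drop = FALSE keeps groups for unused factor levels, so summarise() reports them with a count of zero. This postdates the question as asked and is offered only as a possibility.

```r
library(dplyr)

df <- data.frame(a = rep(1:3, 4), b = rep(1:2, 6))
df$b <- factor(df$b, levels = 1:3)  # level 3 has no rows

res <- df %>%
  group_by(b, .drop = FALSE) %>%   # assumption: dplyr >= 0.8.0
  summarise(count_a = length(a))   # empty group yields length 0
```

This only works when the grouping column is a factor, because the empty groups are derived from its levels.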