tidyr

Spread with duplicate identifiers for rows [duplicate]

独自空忆成欢 posted on 2019-11-29 10:25:38
This question already has an answer here: Using spread with duplicate identifiers for rows (3 answers)

There have been questions on this topic before, but I am still struggling with spreading this data. I would like each state to have its own column of temperature values. Here is a dput() of my data, which I'll call df:

    structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20", "2018-01-20",
    "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18", "2018-01-17", "2018-01-17",
    "2018-01-16", "2018-01-16", "2018-01-15", "2018-01-15", "2018-01-14", "2018-01-14",
    "2018-01-12", "2018-01-12",
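A common way around the duplicate-identifier error is to number the observations within each repeated key before spreading. The sketch below uses a small made-up data frame, because the dput() above is cut off; the column names state and temp and all values are assumptions, not the asker's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for df: two readings per state on the same date.
toy <- data.frame(
  date  = rep("2018-01-21", 4),
  state = c("A", "B", "A", "B"),
  temp  = c(10, 20, 11, 21),
  stringsAsFactors = FALSE
)

# spread(toy, state, temp) fails here because (date, state) pairs repeat.
# Numbering the observations within each pair makes every row identifiable.
wide <- toy %>%
  group_by(date, state) %>%
  mutate(obs = row_number()) %>%
  ungroup() %>%
  spread(state, temp)
```

The helper column obs can be dropped afterwards if it is not needed.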

tidyr spread function generates sparse matrix when compact vector expected

不打扰是莪最后的温柔 posted on 2019-11-29 09:34:05
I'm learning dplyr, having come from plyr, and I want to generate (per group) columns (per interaction) from the output of xtabs. Short summary: I'm getting

    A  B
    1  NA
    NA 2

when I wanted

    A B
    1 2

The xtabs data looks like this:

    > xtabs(data=data.frame(P=c(F,T,F,T,F),A=c(F,F,T,T,T)))
           A
    P       FALSE TRUE
      FALSE     1    2
      TRUE      1    1

Now do() wants its data in data frames, like this:

    > xtabs(data=data.frame(P=c(F,T,F,T,F),A=c(F,F,T,T,T))) %>% as.data.frame
          P     A Freq
    1 FALSE FALSE    1
    2  TRUE FALSE    1
    3 FALSE  TRUE    2
    4  TRUE  TRUE    1

Now I want a single-row output with columns being the interaction of levels. Here's what I'm

tidyr: multiple unnesting with varying NA counts

戏子无情 posted on 2019-11-29 07:59:54
I'm confused about some tidyr behavior. I can unnest a single response like this:

    library(tidyr)
    resp1 <- c("A", "B; A", "B", NA, "B")
    resp2 <- c("C; D; F", NA, "C; F", "D", "E")
    resp3 <- c(NA, NA, "G; H; I", "H; I", "I")
    data <- data.frame(resp1, resp2, resp3, stringsAsFactors = F)
    tidy <- data %>%
      transform(resp1 = strsplit(resp1, "; ")) %>%
      unnest()
    # Source: local data frame [6 x 3]
    #
    #     resp2   resp3 resp1
    #     (chr)   (chr) (chr)
    # 1 C; D; F      NA     A
    # 2      NA      NA     B
    # 3      NA      NA     A
    # 4    C; F G; H; I     B
    # 5       D    H; I    NA
    # 6       E       I     B

But I need to unnest multiple columns in my dataset, and the columns have varying

tidyr separate only first n instances [duplicate]

风流意气都作罢 posted on 2019-11-29 07:33:22
This question already has an answer here: How to strsplit different number of strings in certain column by do function (1 answer)

I have a data.frame in R which, for simplicity, has one column that I want to separate. It looks like this:

    V1
    Value_is_the_best_one
    This_is_the_prettiest_thing_I've_ever_seen
    Here_is_the_next_example_of_what_I_want

My real data is very large (millions of rows), so I'd like to use tidyr's separate function (because it's amazingly fast) to separate out JUST the first few instances. I'd like the result to be the following:

    V1    V2 V3  V4
    Value is the best_one
    This  is the
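One way to split only the first few underscores is the extra argument of tidyr::separate. The sketch below rebuilds the three example rows from the question; the choice of four output columns follows the desired result shown there, and the rest is an illustration rather than the accepted answer.

```r
library(tidyr)

df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# extra = "merge" stops splitting once `into` is filled, so everything
# after the third underscore stays together in the last column.
out <- separate(df, V1, into = c("V1", "V2", "V3", "V4"),
                sep = "_", extra = "merge")
```

Without extra = "merge", the default behaviour warns and drops the surplus pieces instead of keeping them.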

Unnesting a list of lists in a data frame column

眉间皱痕 posted on 2019-11-28 22:55:01
Question

To unnest a data frame I can use:

    df <- data_frame( x = 1, y = list(a = 1, b = 2) )
    tidyr::unnest(df)

But how can I unnest a list inside of a list inside of a data frame column?

    df <- data_frame( x = 1, y = list(list(a = 1, b = 2)) )
    tidyr::unnest(df)
    Error: Each column must either be a list of vectors or a list of data frames [y]

Answer 1

With purrr, which is nice for lists,

    library(purrr)
    df %>% dmap(unlist)
    ## # A tibble: 2 x 2
    ##       x     y
    ##   <dbl> <dbl>
    ## 1     1     1
    ## 2     1     2

which is more or less

Comparison between dplyr::do / purrr::map, what advantages? [closed]

霸气de小男生 posted on 2019-11-28 16:43:15
When using broom, I used to combine dplyr::group_by and dplyr::do to perform actions on grouped data, thanks to @drob. For example, fitting a linear model to cars depending on their gear system:

    library("dplyr")
    library("tidyr")
    library("broom")
    # using do()
    mtcars %>%
      group_by(am) %>%
      do(tidy(lm(mpg ~ wt, data = .)))
    # Source: local data frame [4 x 6]
    # Groups: am [2]
    #      am        term  estimate std.error statistic      p.value
    #   (dbl)       (chr)     (dbl)     (dbl)     (dbl)        (dbl)
    # 1     0 (Intercept) 31.416055 2.9467213 10.661360 6.007748e-09
    # 2     0          wt -3.785908 0.7665567 -4.938848 1.245595e-04
    # 3     1 (Intercept) 46.294478
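For comparison, the same per-group fit can be written with a nested data frame and purrr::map. This is a sketch of one possible equivalent, not necessarily the version the question goes on to show; it assumes tidyr 1.0.0 or later for the nest(data = -am) syntax, and the column names model and coefs are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

fits <- mtcars %>%
  nest(data = -am) %>%                                   # one row per value of am
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .x)),  # fit a model per group
         coefs = map(model, tidy)) %>%                  # tidy each fitted model
  select(am, coefs) %>%
  unnest(coefs)                                          # one row per coefficient
```

Unlike do(), this keeps the fitted models available in a list-column until they are explicitly dropped, which can be convenient when several summaries are needed from the same fit.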

Using tidyr spread function to create columns with binary value

爷,独闯天下 posted on 2019-11-28 13:29:19
I am aware of the spread function in the tidyr package, but this is something I am unable to achieve. I have a data.frame with 2 columns as defined below. I need to transpose the column Subject into binary columns with 1 and 0. Below is the data.frame:

    studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3), Subject = c("Maths", "Science", "English", "Maths", "History", "History"))
    > studentInfo
      StudentID Subject
    1         1   Maths
    2         1 Science
    3         1 English
    4         2   Maths
    5         3 History
    6         3 History

And the output I am expecting is:

      StudentID Maths Science English History
    1         1     1       1       1       0
    2         2     1       0       0       0
    3         3     0       0       0       1

Please assist how
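One possible approach, sketched below, is to drop the repeated rows, add a constant indicator column, and spread it with fill = 0. The helper column name present is an assumption introduced for illustration.

```r
library(dplyr)
library(tidyr)

studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject   = c("Maths", "Science", "English", "Maths", "History", "History")
)

wide <- studentInfo %>%
  distinct() %>%                       # the (3, History) row appears twice
  mutate(present = 1) %>%              # indicator to spread into the new columns
  spread(Subject, present, fill = 0)   # absent subjects become 0 instead of NA
```

The subject columns come out in alphabetical order rather than the order in the expected output, so a final select() may be needed if the column order matters.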

Separate a column into multiple columns using tidyr::separate with sep=""

不问归期 posted on 2019-11-28 12:26:54
    df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)
    df
      category sequence
    1        X    AAT.G
    2        Y    CCG-T

I want to separate the column sequence into 5 columns (one for each character). I tried to do that with tidyr::separate, but it internally uses stringi::stri_split_regex, which doesn't accept an empty string as a separator (although the sep argument should take a regex).

    library(tidyr)
    separate(df, sequence, into = paste0("V", 1:5), sep="")
    Error: Values not split into 5 pieces at 1, 2
    In addition: Warning messages:
    1: In stringi::stri_split_regex(value, sep
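Since every sequence here has the same length, one way to sidestep the empty-regex problem is to pass numeric positions to sep instead of a pattern. The following is a minimal sketch under that fixed-width assumption.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# A numeric sep is interpreted as character positions to split at,
# so positions 1 to 4 cut each 5-character string into single characters.
out <- separate(df, sequence, into = paste0("V", 1:5), sep = 1:4)
```

This only fits fixed-width strings; sequences of varying length would need a different strategy.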

How to use the spread function properly in tidyr

ⅰ亾dé卋堺 posted on 2019-11-28 12:22:34
How do I change the following table from:

    Type  Name   Answer n
    TypeA Apple  Yes    5
    TypeA Apple  No     10
    TypeA Apple  DK     8
    TypeA Apple  NA     20
    TypeA Orange Yes    6
    TypeA Orange No     11
    TypeA Orange DK     8
    TypeA Orange NA     23

to:

    Type  Name   Yes No DK NA
    TypeA Apple  5   10 8  20
    TypeA Orange 6   11 8  23

I used the following code to get the first table:

    df_1 <- df %>% group_by(Type, Name, Answer) %>% tally()

Then I tried to use the spread command to get to the second table, but I got the following error message: "Error: All columns must be named"

    df_2 <- spread(df_1, Answer)

Following on the comment from ayk, I'm
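spread() takes both a key column and a value column, and the call in the question supplies only the key. The sketch below rebuilds the tallied table by hand and passes n as the value; storing the missing answer as the string "NA" is an assumption made to keep the example simple, and the original data may hold a real NA instead.

```r
library(tidyr)

# Hand-built copy of the tallied table from the question.
df_1 <- data.frame(
  Type   = "TypeA",
  Name   = rep(c("Apple", "Orange"), each = 4),
  Answer = rep(c("Yes", "No", "DK", "NA"), times = 2),  # "NA" as a string (assumption)
  n      = c(5, 10, 8, 20, 6, 11, 8, 23),
  stringsAsFactors = FALSE
)

# Key = Answer (becomes the new column names), value = n (fills the cells).
df_2 <- spread(df_1, Answer, n)
```

If df_1 comes straight from group_by() and tally(), calling ungroup() first avoids carrying the grouping into the wide result.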

From long to wide data with multiple columns

做~自己de王妃 posted on 2019-11-28 12:19:41
Suggestions for how to smoothly get from foo to foo2 (preferably with the tidyr or reshape2 packages)? This is kind of like this question, but not exactly, I think, because I don't want to auto-number columns, just widen multiple columns. It's also kind of like this question, but again, I don't think I want the columns to vary with a row value as in that answer. Or, a valid answer to this question is to convince me it's exactly like one of the others. The solution in the second question of "two dcasts plus a merge" is the most attractive right now, because it is comprehensible to me.

foo:

    foo =