tidyr

Unlisting columns by groups

Submitted by 十年热恋 on 2019-11-27 14:51:33
I have a data frame in the following format:

    id | name               | logs
    ---+--------------------+-----------------------------------------
    84 | "zibaroo"          | "C47931038"
    12 | "fabien kelyarsky" | c("C47331040", "B19412225", "B18511449")
    96 | "mitra lutsko"     | c("F19712226", "A18311450")
    34 | "PaulSandoz"       | "A47431044"
    65 | "BeamVision"       | "D47531045"

As you can see, the column "logs" contains a vector of strings in each cell. Is there an efficient way to convert the data frame to long format (one observation per row) without the intermediate step of separating "logs" into several columns? This is important
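A minimal sketch, assuming "logs" is stored as a list-column (which the c(...) cells suggest) and a tidyr version in which unnest() accepts the column name directly. unnest() turns each element of each vector into its own row and repeats id and name alongside it, so no wide intermediate is needed.

```r
library(tibble)
library(tidyr)

# Rebuild the example with "logs" as a list-column: each cell holds a
# character vector of one or more log codes
df <- tibble(
  id   = c(84L, 12L, 96L, 34L, 65L),
  name = c("zibaroo", "fabien kelyarsky", "mitra lutsko", "PaulSandoz", "BeamVision"),
  logs = list(
    "C47931038",
    c("C47331040", "B19412225", "B18511449"),
    c("F19712226", "A18311450"),
    "A47431044",
    "D47531045"
  )
)

# One row per log code; id and name are repeated for multi-code cells
long <- unnest(df, logs)
long
```

The five input rows become eight output rows, one per log code.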

Spreading a two column data frame with tidyr

Submitted by *爱你&永不变心* on 2019-11-27 09:41:15
I have a data frame that looks like this:

      a b
    1 x 8
    2 x 6
    3 y 3
    4 y 4
    5 z 5
    6 z 6

and I want to turn it into this:

      x y z
    1 8 3 5
    2 6 4 6

But calling

    library(tidyr)
    df <- data.frame(
      a = c("x", "x", "y", "y", "z", "z"),
      b = c(8, 6, 3, 4, 5, 6)
    )
    df %>% spread(a, b)

returns

       x  y  z
    1  8 NA NA
    2  6 NA NA
    3 NA  3 NA
    4 NA  4 NA
    5 NA NA  5
    6 NA NA  6

What am I doing wrong?

While I'm aware you're after tidyr, base has a solution in this case:

    unstack(df, b ~ a)

It's also a little bit faster:

    Unit: microseconds
                    expr     min      lq     mean  median       uq max neval
     df %>% spread(a, b) 657.699 679.508 717.7725 690.484 724.9795
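spread() uses the remaining columns to decide which output row each value lands in. Here there are none, so every input row becomes its own output row, which produces the diagonal of NAs. A sketch of the common tidyr fix, assuming dplyr is also available: number the rows within each level of a using a helper column (row_id, a name chosen for illustration), spread, then drop the helper.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c("x", "x", "y", "y", "z", "z"),
  b = c(8, 6, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(a) %>%
  mutate(row_id = row_number()) %>%  # 1, 2 within each of x, y, z
  ungroup() %>%
  spread(a, b) %>%                   # row_id now identifies the output rows
  select(-row_id)                    # drop the helper column
wide

# The base alternative from the answer, for comparison
unstack(df, b ~ a)
```

unstack() only returns a data frame when every group has the same length; the row-number approach also works for unequal groups, padding the shorter columns with NA.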

Comparing gather (tidyr) to melt (reshape2)

Submitted by 廉价感情. on 2019-11-27 09:27:58
I love the reshape2 package because it made life so doggone easy. Hadley has typically made improvements over his previous packages that enable more streamlined, faster-running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

    teacher yr1.baseline pd yr1.lesson1 yr1
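The post's data is cut off, so this is a sketch on a small made-up frame that borrows two of the column names shown (the teacher names and values are invented). The usual stumbling point is that melt(df, id.vars = "teacher") names the columns to keep, while gather() first takes the names of the new key and value columns and then the columns to stack; excluding the id column with -teacher gives the melt-like result.

```r
library(tidyr)

# Made-up wide data: one row per teacher, one column per measurement
# (column names borrowed from the post, values invented)
wide <- data.frame(
  teacher      = c("Long", "Smith"),
  yr1.baseline = c(1, 2),
  yr1.lesson1  = c(3, 4),
  stringsAsFactors = FALSE
)

# reshape2 equivalent: melt(wide, id.vars = "teacher")
# gather() takes the new key/value column names first, then the columns
# to stack; "-teacher" means "every column except teacher"
long <- gather(wide, key = "variable", value = "value", -teacher)
long
```

One visible difference: melt() returns the variable column as a factor, whereas gather() returns a character column unless factor_key = TRUE.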

R: Pivoting using 'spread' function

Submitted by 牧云@^-^@ on 2019-11-27 07:30:00
Continuing from my previous post, I now have one more column of ID values that I need to use to pivot rows into columns.

    NUM <- c(1,2,3,1,2,3,1,2,3,1)
    ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
    Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
    Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
    df1 <- data.frame(ID,NUM,Type,Points)

df1:

    +------+-----+------+--------+
    | ID   | Num | Type | Points |
    +------+-----+------+--------+
    | DJ45
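The post is cut off before the desired output, so this sketch assumes the goal is one row per ID and one column per Type, with Points in the cells. NUM has to be dropped before spreading: every (ID, NUM) pair is unique, so keeping it would give one output row per input row.

```r
library(dplyr)
library(tidyr)

NUM <- c(1,2,3,1,2,3,1,2,3,1)
ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
df1 <- data.frame(ID, NUM, Type, Points, stringsAsFactors = FALSE)

# Assumed goal: one row per ID, one column per Type, cells hold Points
wide <- df1 %>%
  select(-NUM) %>%          # NUM would otherwise make every row its own key
  spread(Type, Points)      # Types an ID never has become NA
wide
```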

Separate a column into multiple columns using tidyr::separate with sep=""

Submitted by 那年仲夏 on 2019-11-27 07:00:22
    df <- data.frame(category = c("X", "Y"),
                     sequence = c("AAT.G", "CCG-T"),
                     stringsAsFactors = FALSE)
    df
      category sequence
    1        X    AAT.G
    2        Y    CCG-T

I want to separate the column sequence into 5 columns (one for each character). I tried to do that with tidyr::separate, but it internally uses stringi::stri_split_regex, which doesn't accept an empty string as a separator (although the sep argument should take a regex).

    library(tidyr)
    separate(df, sequence, into = paste0("V", 1:5), sep = "")
    Error: Values
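separate() also accepts integer positions for sep, which avoids the empty-regex problem entirely: sep = 1:4 cuts after characters 1, 2, 3 and 4, leaving five one-character pieces. This sketch assumes every sequence is exactly five characters long, as in the example; because the split is purely positional, "." and "-" simply become their own pieces.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# Integer sep = split positions: cut after characters 1, 2, 3 and 4
out <- separate(df, sequence, into = paste0("V", 1:5), sep = 1:4)
out
```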

How to use the spread function properly in tidyr

Submitted by ≯℡__Kan透↙ on 2019-11-27 06:11:09
How do I change the following table from:

    Type  Name   Answer  n
    TypeA Apple  Yes     5
    TypeA Apple  No     10
    TypeA Apple  DK      8
    TypeA Apple  NA     20
    TypeA Orange Yes     6
    TypeA Orange No     11
    TypeA Orange DK      8
    TypeA Orange NA     23

to:

    Type  Name   Yes No DK NA
    TypeA Apple    5 10  8 20
    TypeA Orange   6 11  8 23

I used the following code to get the first table:

    df_1 <- df %>% group_by(Type, Name, Answer) %>% tally()

Then I tried to use the spread command to get to the second table, but I got the following error message:
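The error message is cut off, so its cause can't be confirmed here. This sketch reproduces the target table from the first table typed in by hand, assuming the last Answer level is a genuine missing value rather than the string "NA". Relabelling it as the string "NA" before spreading avoids depending on how spread() names a column for a missing key, and ungroup() drops the grouping that tally() leaves on its output.

```r
library(tibble)
library(dplyr)
library(tidyr)

# First table from the question, typed in; assumes the last answer level
# is a real missing value (NA), not the string "NA"
df_1 <- tibble(
  Type   = "TypeA",
  Name   = rep(c("Apple", "Orange"), each = 4),
  Answer = rep(c("Yes", "No", "DK", NA), times = 2),
  n      = c(5L, 10L, 8L, 20L, 6L, 11L, 8L, 23L)
)

wide <- df_1 %>%
  ungroup() %>%                                              # no-op here; tally() output is grouped
  mutate(Answer = ifelse(is.na(Answer), "NA", Answer)) %>%   # label missing answers
  spread(Answer, n) %>%
  select(Type, Name, Yes, No, DK, "NA")                      # order columns as in the target table
wide
```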

Spread with duplicate identifiers (using tidyverse and %>%) [duplicate]

Submitted by 淺唱寂寞╮ on 2019-11-27 04:53:34
This question already has an answer here: Reshaping data in R with “login” “logout” times

My data looks like this: (image not available)

I am trying to make it look like this: (image not available)

I would like to do this in tidyverse using %>%-chaining.

    df <- structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
        start_end = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"),
        date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03", "1979-06-21", "1979-07-18", "1989-09-12", "1991-01-04", "1994-05-01", "1996-11-04", "2005-02-01", "2009-09-17", "2010-10-01", "2012-10-06"), class
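A sketch using only the columns visible in the truncated dput, with the factor dates kept as plain strings. id 5 has two start/end pairs, so id alone cannot identify an output row and spread() reports duplicate identifiers. Numbering the pairs within each id and start_end (in a helper column called episode, a name chosen here) gives each row a unique key; this assumes rows are already in chronological order within each id.

```r
library(dplyr)
library(tidyr)

# Columns rebuilt from the visible part of the dput; dates kept as strings
df <- data.frame(
  id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
  start_end = c("start", "end", "start", "start", "end", "start", "end"),
  date = c("1994-05-01", "1996-11-04", "1979-07-18", "2005-02-01",
           "2009-09-17", "2010-10-01", "2012-10-06"),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id, start_end) %>%
  mutate(episode = row_number()) %>%  # 1st, 2nd, ... start (or end) per id
  ungroup() %>%
  spread(start_end, date)             # id 4 has no end date, so it gets NA
wide
```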

Add NAs to make all list elements equal length

Submitted by 跟風遠走 on 2019-11-27 04:52:27
I'm doing a series of things in dplyr and tidyr, so I would like to keep to a piped solution if possible. I have a list with an uneven number of elements in each component:

    lolz <- list(a = c(2,4,5,2,3), b = c(3,3,2), c = c(1,1,2,4,5,3,3), d = c(1,2,3,1), e = c(5,4,2,2))
    lolz
    $a
    [1] 2 4 5 2 3
    $b
    [1] 3 3 2
    $c
    [1] 1 1 2 4 5 3 3
    $d
    [1] 1 2 3 1
    $e
    [1] 5 4 2 2

I am wondering if there's a neat one-liner to pad each element with NAs so that they are all the same length as the element with the most items. I have a two-liner:

    lolz %>% lapply(length) %>% unlist %>% max -> mymax
    lolz %>% lapply(function
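Assigning a longer length with `length<-` pads an atomic vector with NA, and lengths() (available since R 3.2.0) returns every element's length at once, so the maximum and the padding fit on one line. In the piped variant, magrittr still passes lolz as the first argument to lapply() because the `.` appears only nested inside max(lengths(.)).

```r
library(magrittr)

lolz <- list(a = c(2,4,5,2,3), b = c(3,3,2), c = c(1,1,2,4,5,3,3),
             d = c(1,2,3,1), e = c(5,4,2,2))

# Base one-liner: extend every element to the longest element's length
padded <- lapply(lolz, `length<-`, max(lengths(lolz)))

# The same inside a pipe
padded_pipe <- lolz %>% lapply(`length<-`, max(lengths(.)))
padded_pipe
```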

Proper idiom for adding zero count rows in tidyr/dplyr

Submitted by 倖福魔咒の on 2019-11-27 03:44:33
Suppose I have some count data that looks like this:

    library(tidyr)
    library(dplyr)
    X.raw <- data.frame(
      x = as.factor(c("A", "A", "A", "B", "B", "B")),
      y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
      z = 1:6
    )
    X.raw
    #   x  y z
    # 1 A  i 1
    # 2 A ii 2
    # 3 A ii 3
    # 4 B  i 4
    # 5 B  i 5
    # 6 B  i 6

I'd like to tidy and summarise like this:

    X.tidy <- X.raw %>% group_by(x, y) %>% summarise(count = sum(z))
    X.tidy
    # Source: local data frame [3 x 3]
    # Groups: x
    #
    #   x  y count
    # 1 A  i     1
    # 2 A ii     5
    # 3 B  i    15

I know that for x == "B" and y == "ii" we have an observed count of zero, rather than a missing value, i.e. the
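A sketch with tidyr::complete(), which adds the missing x/y combinations (taken here from the factor levels) and fills the new count cells with 0 instead of NA. ungroup() comes first because summarise() leaves the result grouped by x.

```r
library(dplyr)
library(tidyr)

X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6
)

X.complete <- X.raw %>%
  group_by(x, y) %>%
  summarise(count = sum(z)) %>%
  ungroup() %>%                            # summarise() leaves it grouped by x
  complete(x, y, fill = list(count = 0))   # adds the B/ii row with count 0
X.complete
```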

Complete dataframe with missing combinations of values

Submitted by 时光毁灭记忆、已成空白 on 2019-11-27 02:07:14
I have a simple question, which I can't figure out. I have a data frame with two factors, distance and years. For every distance, I would like to complete the years values 1 to 7, filling area with 0, i.e. from this:

      distance years area
    1      NPR     3   10
    2      NPR     4   20
    3      NPR     7   30
    4      100     1   40
    5      100     5   50
    6      100     6   60

get this:

       distance years area
    1       NPR     1    0
    2       NPR     2    0
    3       NPR     3   10
    4       NPR     4   20
    5       NPR     5    0
    6       NPR     6    0
    7       NPR     7   30
    8       100     1   40
    9       100     2    0
    10      100     3    0
    11      100     4    0
    12      100     5   50
    13      100     6   60
    14      100     7    0

I tried to apply the expand() function:

    library(tidyr)
    library(dplyr, warn.conflicts = FALSE)
    expand(df, years = 1:7)

but this just
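expand() only builds the combinations; complete() builds them and joins them back onto the data. A sketch, assuming years should run from 1 to 7 as in the desired output: naming the full range as years = 1:7 inside complete() also covers years that never appear in the data (year 2 here), and fill sets area to 0 in the added rows. The row order may differ from the desired output shown above.

```r
library(tidyr)

df <- data.frame(
  distance = c("NPR", "NPR", "NPR", "100", "100", "100"),
  years    = c(3L, 4L, 7L, 1L, 5L, 6L),
  area     = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Every distance x every year 1..7; rows that were missing get area = 0
full <- complete(df, distance, years = 1:7, fill = list(area = 0))
full
```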