tidyr | 易学教程

Unlisting columns by groups


I have a data frame in the following format:

    id | name               | logs
    ---+--------------------+-----------------------------------------
    84 | "zibaroo"          | "C47931038"
    12 | "fabien kelyarsky" | c("C47331040", "B19412225", "B18511449")
    96 | "mitra lutsko"     | c("F19712226", "A18311450")
    34 | "PaulSandoz"       | "A47431044"
    65 | "BeamVision"       | "D47531045"

As you can see, the "logs" column holds a vector of strings in each cell. Is there an efficient way to convert the data frame to long format (one observation per row) without the intermediate step of separating "logs" into several columns? This is important
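One possible approach, sketched here under assumptions: if "logs" is stored as a list-column, tidyr's unnest() expands it directly into one row per element. The data below is a hypothetical reconstruction of the first three rows of the table, and the `cols =` argument assumes tidyr 1.0 or later.

```r
library(tidyr)

# Hypothetical reconstruction of the first three rows of the question's data.
df <- data.frame(
  id = c(84, 12, 96),
  name = c("zibaroo", "fabien kelyarsky", "mitra lutsko"),
  stringsAsFactors = FALSE
)

# Attach "logs" as a list-column: one character vector per row.
df$logs <- list(
  "C47931038",
  c("C47331040", "B19412225", "B18511449"),
  c("F19712226", "A18311450")
)

# unnest() repeats id and name once for every element of logs.
long <- unnest(df, cols = logs)
```

Here `long` has one row per log entry, so no intermediate wide step is needed.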

Spreading a two column data frame with tidyr


I have a data frame that looks like this:

      a b
    1 x 8
    2 x 6
    3 y 3
    4 y 4
    5 z 5
    6 z 6

and I want to turn it into this:

      x y z
    1 8 3 5
    2 6 4 6

But calling

    library(tidyr)
    df <- data.frame(
      a = c("x", "x", "y", "y", "z", "z"),
      b = c(8, 6, 3, 4, 5, 6)
    )
    df %>% spread(a, b)

returns

       x  y  z
    1  8 NA NA
    2  6 NA NA
    3 NA  3 NA
    4 NA  4 NA
    5 NA NA  5
    6 NA NA  6

What am I doing wrong?

While I'm aware you're after tidyr, base has a solution in this case:

    unstack(df, b~a)

It's also a little bit faster:

    Unit: microseconds
                   expr     min      lq     mean  median       uq max neval
    df %>% spread(a, b) 657.699 679.508 717.7725 690.484 724.9795
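A likely explanation for the NA-filled result is that the data has no column telling spread() which rows belong together, so every input row stays on its own output row. A minimal sketch of a tidyr-based workaround, assuming dplyr is available: number the rows within each value of `a`, spread, then drop the helper column. The name `row` is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  a = c("x", "x", "y", "y", "z", "z"),
  b = c(8, 6, 3, 4, 5, 6)
)

wide <- df %>%
  group_by(a) %>%
  mutate(row = row_number()) %>%  # position of each value within its group
  ungroup() %>%
  spread(a, b) %>%                # rows that share a position line up
  select(-row)                    # drop the helper column
```

The helper column gives the first x, first y, and first z a shared identifier, which is what lets them land on the same output row.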

Comparing gather (tidyr) to melt (reshape2)


Question: I love the reshape2 package because it made life so doggone easy. Typically Hadley has made improvements in his previous packages that enable streamlined, faster-running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

    teacher yr1.baseline pd yr1.lesson1 yr1

R: Pivoting using 'spread' function


Question: Continuing from my previous post, I now have one more column of ID values that I need to use to pivot rows into columns.

    NUM <- c(1,2,3,1,2,3,1,2,3,1)
    ID <- c("DJ45","DJ45","DJ45","DJ46","DJ46","DJ46","DJ47","DJ47","DJ47","DJ48")
    Type <- c("A", "F", "C", "B", "D", "A", "E", "C", "F", "D")
    Points <- c(9.2,60.8,22.9,1012.7,18.7,11.1,67.2,63.1,16.7,58.4)
    df1 <- data.frame(ID,NUM,Type,Points)

df1:

    +------+-----+------+--------+
    | ID   | Num | Type | Points |
    +------+-----+------+--------+
    | DJ45

Separate a column into multiple columns using tidyr::separate with sep=""


Question:

    df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)
    df
      category sequence
    1        X    AAT.G
    2        Y    CCG-T

I want to separate the column sequence into 5 columns (one for each character). I tried to do that with tidyr::separate, but it internally uses stringi::stri_split_regex, which doesn't accept an empty string as a separator (although the sep argument should take a regex).

    library(tidyr)
    separate(df, sequence, into = paste0("V", 1:5), sep="")
    Error: Values
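One workaround that avoids the regex path entirely, offered as a sketch and not necessarily the answer given in the original thread: separate() also accepts a numeric `sep`, which it interprets as character positions to split at. Splitting after positions 1 through 4 yields five single-character columns. The name `chars` is introduced here for illustration.

```r
library(tidyr)

df <- data.frame(
  category = c("X", "Y"),
  sequence = c("AAT.G", "CCG-T"),
  stringsAsFactors = FALSE
)

# A numeric sep gives split positions; it needs one fewer value than `into`.
chars <- separate(df, sequence, into = paste0("V", 1:5), sep = 1:4)
```

This relies on every sequence having exactly five characters; strings of other lengths would need separate handling.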

How to use the spread function properly in tidyr


Question: How do I change the following table from:

    Type   Name    Answer  n
    TypeA  Apple   Yes     5
    TypeA  Apple   No      10
    TypeA  Apple   DK      8
    TypeA  Apple   NA      20
    TypeA  Orange  Yes     6
    TypeA  Orange  No      11
    TypeA  Orange  DK      8
    TypeA  Orange  NA      23

to:

    Type   Name    Yes  No  DK  NA
    TypeA  Apple   5    10  8   20
    TypeA  Orange  6    11  8   23

I used the following code to get the first table:

    df_1 <- df %>% group_by(Type, Name, Answer) %>% tally()

Then I tried to use the spread command to get the second table, but I got the following error message:

Spread with duplicate identifiers (using tidyverse and %>%) [duplicate]


This question already has an answer here: Reshaping data in R with “login” “logout” times (5 answers)

My data looks like this:

I am trying to make it look like this:

I would like to do this in tidyverse using %>%-chaining.

    df <- structure(list(id = c(2L, 2L, 4L, 5L, 5L, 5L, 5L),
      start_end = structure(c(2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("end", "start"), class = "factor"),
      date = structure(c(6L, 7L, 3L, 8L, 9L, 10L, 11L), .Label = c("1979-01-03", "1979-06-21", "1979-07-18", "1989-09-12", "1991-01-04", "1994-05-01", "1996-11-04", "2005-02-01", "2009-09-17", "2010-10-01", "2012-10-06" ), class

Add NAs to make all list elements equal length


I'm doing a series of things in dplyr and tidyr, so I would like to keep to a piped solution if possible. I have a list with uneven numbers of elements in each component:

    lolz <- list(a = c(2,4,5,2,3), b = c(3,3,2), c=c(1,1,2,4,5,3,3), d=c(1,2,3,1), e=c(5,4,2,2))
    lolz
    $a
    [1] 2 4 5 2 3
    $b
    [1] 3 3 2
    $c
    [1] 1 1 2 4 5 3 3
    $d
    [1] 1 2 3 1
    $e
    [1] 5 4 2 2

I am wondering if there's a neat one-liner to fill up each element with NAs so that they are all the same length as the element with the most items. I have a two-liner:

    lolz %>% lapply(length) %>% unlist %>% max -> mymax
    lolz %>% lapply(function
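A minimal base R sketch of one way to get a one-liner, not taken from the original thread: the replacement function `length<-` extends a vector and pads it with NA, so it can be applied to every element with the maximum length as the target. The name `padded` is introduced here for illustration, and lengths() assumes R 3.2 or later.

```r
lolz <- list(a = c(2, 4, 5, 2, 3), b = c(3, 3, 2), c = c(1, 1, 2, 4, 5, 3, 3),
             d = c(1, 2, 3, 1), e = c(5, 4, 2, 2))

# `length<-`(x, n) returns x extended to length n, padded with NA.
padded <- lapply(lolz, `length<-`, max(lengths(lolz)))
```

Because every element now has the same length, `padded` can be passed on to as.data.frame() or similar if a rectangular result is wanted.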

Proper idiom for adding zero count rows in tidyr/dplyr


Suppose I have some count data that looks like this:

    library(tidyr)
    library(dplyr)
    X.raw <- data.frame(
      x = as.factor(c("A", "A", "A", "B", "B", "B")),
      y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
      z = 1:6)
    X.raw
    #   x  y z
    # 1 A  i 1
    # 2 A ii 2
    # 3 A ii 3
    # 4 B  i 4
    # 5 B  i 5
    # 6 B  i 6

I'd like to tidy and summarise like this:

    X.tidy <- X.raw %>% group_by(x,y) %>% summarise(count=sum(z))
    X.tidy
    # Source: local data frame [3 x 3]
    # Groups: x
    #
    #   x  y count
    # 1 A  i     1
    # 2 A ii     5
    # 3 B  i    15

I know that for x=="B" and y=="ii" we have observed count of zero, rather than a missing value. i.e. the
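One idiom that fits this situation, sketched here without knowing which answer the original thread settled on, is tidyr's complete() with its `fill` argument. The ungroup() step matters because complete() works within groups on grouped data and cannot complete a grouping column. The name `X.full` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

X.raw <- data.frame(
  x = as.factor(c("A", "A", "A", "B", "B", "B")),
  y = as.factor(c("i", "ii", "ii", "i", "i", "i")),
  z = 1:6
)

X.full <- X.raw %>%
  group_by(x, y) %>%
  summarise(count = sum(z)) %>%
  ungroup() %>%                            # complete() cannot fill a grouping column
  complete(x, y, fill = list(count = 0L))  # add missing x/y pairs with a count of 0
```

The result has a row for every combination of `x` and `y`, with the unobserved B/ii pair recorded as zero.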

Complete dataframe with missing combinations of values


I have a simple question that I can't figure out. I have a data frame with two factors (distance) and years (years). I would like to fill in every missing years value for each factor level with 0, i.e. from this:

      distance years area
    1      NPR     3   10
    2      NPR     4   20
    3      NPR     7   30
    4      100     1   40
    5      100     5   50
    6      100     6   60

get this:

       distance years area
    1       NPR     1    0
    2       NPR     2    0
    3       NPR     3   10
    4       NPR     4   20
    5       NPR     5    0
    6       NPR     6    0
    7       NPR     7   30
    8       100     1   40
    9       100     2    0
    10      100     3    0
    11      100     4    0
    12      100     5   50
    13      100     6   60
    14      100     7    0

I tried to apply the expand() function:

    library(tidyr)
    library(dplyr, warn.conflicts = FALSE)
    expand(df, years = 1:7)

but this just
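A sketch of how complete() could produce the desired table, under assumptions: the question does not show how `df` was built, so the data frame below is a hypothetical reconstruction of the printed input, with `years` stored as integers. Unlike expand(), complete() keeps the original columns and fills the newly created rows. The name `filled` is introduced here for illustration.

```r
library(tidyr)

# Hypothetical reconstruction of the input table shown in the question.
df <- data.frame(
  distance = c("NPR", "NPR", "NPR", "100", "100", "100"),
  years = c(3L, 4L, 7L, 1L, 5L, 6L),
  area = c(10, 20, 30, 40, 50, 60),
  stringsAsFactors = FALSE
)

# Build every distance x year (1 to 7) pair; new rows get area = 0.
filled <- complete(df, distance, years = 1:7, fill = list(area = 0))
```

Row order in `filled` follows the sorted key columns, so it may differ from the order printed in the question.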