tidyr | 易学教程

data.table equivalent of tidyr::complete()

Question: tidyr::complete() adds rows to a data.frame for combinations of column values that are missing from the data. Example:

    library(dplyr)
    library(tidyr)

    df <- data.frame(person = c(1,2,2), observation_id = c(1,1,2), value = c(1,1,1))
    df %>% tidyr::complete(person, observation_id, fill = list(value=0))

yields

    # A tibble: 4 × 3
      person observation_id value
       <dbl>          <dbl> <dbl>
    1      1              1     1
    2      1              2     0
    3      2              1     1
    4      2              2     1

where the value of the combination person == 1 and observation_id == 2 that is missing in df
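One way this is often approached in data.table, offered here as a minimal sketch rather than as the answer from the original thread, is to build every combination of the key columns with CJ() and join the data onto that grid. The names `dt`, `grid`, and `res` are illustrative assumptions.

```r
library(data.table)

# Same data as in the question, as a data.table
dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# CJ() builds the cross join (all combinations) of the supplied values
grid <- CJ(person = unique(dt$person),
           observation_id = unique(dt$observation_id))

# Joining dt onto the grid keeps every combination; unmatched rows get NA
res <- dt[grid, on = .(person, observation_id)]

# Replace the NA introduced by the join with the default value 0
res[is.na(value), value := 0]
print(res)
```

The join does the work of complete(), and the final assignment plays the role of its fill argument.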

How to use tidyr::separate when the number of needed variables is unknown [duplicate]

Question: This question already has an answer here: Splitting a dataframe string column into multiple different columns [duplicate] (4 answers)

I've got a dataset that consists of email communication. An example:

    library(dplyr)
    library(tidyr)

    dat <- data_frame('date' = Sys.time(),
                      'from' = c("person1@gmail.com", "person2@yahoo.com",
                                 "person3@hotmail.com", "person4@msn.com"),
                      'to' = c("person2@yahoo.com,person3@hotmail.com",
                               "person3@hotmail.com",
                               "person4@msn.com,person1@gmail.com,person2@yahoo.com",
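When the number of recipients per row is not known in advance, one option is to split into rows instead of columns. The sketch below uses tidyr::separate_rows() on a shortened, hypothetical two-row version of the data (the question's own example is cut off above), so the values are assumptions for illustration only.

```r
library(tidyr)

# Hypothetical, shortened stand-in for the email data in the question
dat <- data.frame(
  from = c("person1@gmail.com", "person2@yahoo.com"),
  to   = c("person2@yahoo.com,person3@hotmail.com", "person3@hotmail.com"),
  stringsAsFactors = FALSE
)

# One output row per recipient, however many addresses a cell contains
long <- separate_rows(dat, to, sep = ",")
print(long)
```

Because each recipient becomes its own row, no column count has to be chosen up front.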

Tidy data.frame with repeated column names

Question: I have a program that gives me data in this format

    toy
      file_path               Condition Trial.Num A B C  ID  A B C  ID   A  B C  ID
    1 root/some.extension     Baseline  1         2 3 5  car 2 1 7  bike 4  9 0  plane
    2 root/thing.extension    Baseline  2         3 6 45 car 5 4 4  bike 9  5 4  plane
    3 root/else.extension     Baseline  3         4 4 6  car 7 5 4  bike 68 7 56 plane
    4 root/uniquely.extension Treatment 1         5 3 7  car 1 7 37 bike 9  8 7  plane
    5 root/defined.extension  Treatment 2         6 7 3  car 4 6 8  bike 9  0 8  plane

My goal is to tidy the format into

Transposing data frames

Question: Happy Weekends. I've been trying to replicate the results from this blog post in R. I am looking for a method of transposing the data without using t, preferably using tidyr or reshape. In the example below, metadata is obtained by transposing data.

    metadata <- data.frame(colnames(data), t(data[1:4, ]))
    colnames(metadata) <- t(metadata[1,])
    metadata <- metadata[-1,]
    metadata$Multiplier <- as.numeric(metadata$Multiplier)

Though it achieves what I want, I find it a little clumsy. Is there any
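A transpose without t() can be sketched with tidyr by going long and then wide again. The data frame below is a hypothetical stand-in, since the question's `data` object is not shown; the column names `name`, `x`, and `y` are assumptions.

```r
library(tidyr)

# Hypothetical input: one row per name, one column per variable
data <- data.frame(name = c("a", "b"), x = c(1, 2), y = c(3, 4),
                   stringsAsFactors = FALSE)

# Step 1: stack the variable columns into (name, variable, value) rows
long <- pivot_longer(data, cols = -name,
                     names_to = "variable", values_to = "value")

# Step 2: spread the former row labels out into columns
transposed <- pivot_wider(long, names_from = name, values_from = value)
print(transposed)
```

The former column names end up in the `variable` column and the former row labels become column names, which is the transposed layout.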

adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)

Question:

Short version

How to do the operation

    df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)

in a more natural way?

Long version

Given a data frame

    df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
                      value = c(3,2,1,2,3,1,22),
                      itemid = c(1:6, 6))

for many itemid and groupid pairs we have a value, for some itemids there are groupids where there is no value. I want to add a default value for those cases. E.g. for the itemid 1 and groupid "two" there
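The spread-then-gather round trip appears to be what tidyr::complete() is designed for. A minimal sketch using the question's own df1 follows; the result name `filled` is an assumption.

```r
library(tidyr)

df1 <- data.frame(groupid = c("one", "one", "one", "two", "two", "two", "one"),
                  value = c(3, 2, 1, 2, 3, 1, 22),
                  itemid = c(1:6, 6))

# Add every missing itemid x groupid pair and give it the default value 0
filled <- complete(df1, itemid, groupid, fill = list(value = 0))
print(filled)
```

With six item ids and two group ids this yields twelve rows, five of which carry the default value.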

Split or separate uneven/unequal strings with no delimiter

Question: Given the dataframe df:

    x <- c("X1", "X2", "X3", "X4", "X5")
    y <- c("00L0", "0", "00012L", "0123L0", "0D0")
    df <- data.frame(x, y)

How can I leverage tidyr::separate to put each character of the y strings into a separate column (one column per string position)? Desired output:

    x <- c("X1", "X2", "X3", "X4", "X5")
    m1 <- c(0, 0, 0, 0, 0)
    m2 <- c(0, NA, 0, 1, "D")
    m3 <- c("L", NA, 0, 2, 0)
    mN <- c(NA, NA, NA, NA, NA)
    df <- data.frame(x, m1, m2, m3, mN)

Where mN could theoretically go up to m100
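As a point of comparison, the per-character split can be sketched in base R rather than with tidyr::separate, which is what the question asks about. The approach pads every string to the longest length with NA; the helper names `chars`, `n`, `mat`, and `out` are assumptions.

```r
x <- c("X1", "X2", "X3", "X4", "X5")
y <- c("00L0", "0", "00012L", "0123L0", "0D0")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Split each string into its individual characters
chars <- strsplit(df$y, "")

# Width of the result is the length of the longest string
n <- max(lengths(chars))

# Indexing past the end of a vector yields NA, which pads short strings
mat <- do.call(rbind, lapply(chars, function(v) v[seq_len(n)]))
colnames(mat) <- paste0("m", seq_len(n))

out <- data.frame(x = df$x, mat, stringsAsFactors = FALSE)
print(out)
```

The number of columns is derived from the data, so it adapts to strings of up to 100 characters or more without hard-coding the names.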

Using tidyr spread function to create columns with binary value

Question: I am aware of the spread function in the tidyr package, but this is something I am unable to achieve. I have a data.frame with 2 columns as defined below. I need to transpose the column Subject into binary columns with 1 and 0. Below is the data.frame:

    studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
                              Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

    > studentInfo
      StudentID Subject
    1         1   Maths
    2         1 Science
    3         1 English
    4         2   Maths
    5         3 History
    6         3 History

And the output I am
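The desired output is cut off above, so the sketch below assumes the goal is one 0/1 indicator column per subject and one row per student. It uses pivot_wider(), the newer counterpart of spread(); the helper column `present` and the result name `wide` are assumptions.

```r
library(dplyr)
library(tidyr)

studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History"),
  stringsAsFactors = FALSE
)

wide <- studentInfo %>%
  distinct() %>%                  # drop the repeated (3, History) row
  mutate(present = 1) %>%         # marker that becomes the cell value
  pivot_wider(names_from = Subject,
              values_from = present,
              values_fill = list(present = 0))  # absent subjects become 0
print(wide)
```

Removing duplicates first matters here: without distinct(), the repeated History row for student 3 would no longer identify a single cell.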

Removing NA observations with dplyr::filter()

Question: My data looks like this:

    library(tidyverse)
    df <- tribble(
      ~a, ~b, ~c,
      1, 2, 3,
      1, NA, 3,
      NA, 2, 3
    )

I can remove all NA observations with drop_na():

    df %>% drop_na()

Or remove all NA observations in a single column (a for example):

    df %>% drop_na(a)

Why can't I just use a regular != filter pipe?

    df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?

Answer 1: For example, you can use:

    df %>% filter(!is.na(a))

to remove the NA in column a.

Answer 2: From @Ben Bolker
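The reason the `!=` filter fails can be seen in a short sketch: any comparison with NA evaluates to NA, and filter() keeps only rows where the condition is TRUE. The variable names `cmp`, `n_neq`, and `n_isna` are assumptions.

```r
library(dplyr)

df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# Comparing anything with NA gives NA, never TRUE or FALSE
cmp <- df$a != NA
print(cmp)

# filter() drops rows whose condition is NA, so nothing survives
n_neq <- nrow(filter(df, a != NA))

# is.na() returns a proper TRUE/FALSE vector, so this keeps two rows
n_isna <- nrow(filter(df, !is.na(a)))
print(c(n_neq, n_isna))
```

drop_na(a) and filter(!is.na(a)) select the same rows here; the tidyr helper is a convenience rather than a requirement.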

Spread with data.frame/tibble with duplicate identifiers

The documentation for tidyr suggests that gather and spread are transitive, but the following example with the "iris" data shows they are not, and it is not clear why. Any clarification would be greatly appreciated.

    iris.df = as.data.frame(iris)
    long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
    w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df" but received the following error instead: "Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9..." My general
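The error arises because, after gathering, Species alone no longer identifies which original row a measurement came from. A common workaround, sketched here under the assumption that a temporary row id is acceptable, is to add that id before gather() so spread() can match values back up. The column name `row` is an assumption.

```r
library(dplyr)
library(tidyr)

iris.df <- as.data.frame(iris)

# Add a per-row id so each measurement stays tied to its original row
long.iris.df <- iris.df %>%
  mutate(row = row_number()) %>%
  gather(key = feature.measure, value = size, -Species, -row)

# With (Species, row) as identifiers, every cell in the wide table is unique
w.iris.df <- long.iris.df %>%
  spread(key = feature.measure, value = size) %>%
  select(-row)

print(head(w.iris.df))
```

The round trip then recovers all 150 rows, although the measurement columns come back in alphabetical order rather than in their original order.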

Using spread with duplicate identifiers for rows

I have a long form dataframe that has multiple entries for the same date and person.

    jj <- data.frame(month=rep(1:3,4),
                     student=rep(c("Amy", "Bob"), each=6),
                     A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                     B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it like this:

    month Amy.A Bob.A Amy.B Bob.B 1 2 3 1 2 3 1 2 3 1 2 3

My question is very similar to this. I have used the given code in the answer:

    kk <- jj %>%
      gather(variable, value, -(month:student)) %>%
      unite(temp, student, variable) %>%
      spread(temp, value)

but it gives following error: Error: Duplicate
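Each (month, student) pair occurs twice in jj, so spread() cannot tell the two entries apart. One possible fix, sketched here as an assumption about what the asker wants, is to number the repeats within each pair and keep that number as an extra identifier. The column name `occurrence` is hypothetical.

```r
library(dplyr)
library(tidyr)

jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5),
                 stringsAsFactors = FALSE)

kk <- jj %>%
  group_by(month, student) %>%
  mutate(occurrence = row_number()) %>%   # 1 for the first entry, 2 for the repeat
  ungroup() %>%
  gather(variable, value, A, B) %>%
  unite(temp, student, variable) %>%
  spread(temp, value)                     # identifiers are now month + occurrence
print(kk)
```

This gives one row per month and occurrence, with columns named Amy_A, Amy_B, Bob_A, and Bob_B (unite() joins with an underscore by default).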