tidyr

data.table equivalent of tidyr::complete()

僤鯓⒐⒋嵵緔 posted on 2019-11-26 23:07:58
Question: tidyr::complete() adds rows to a data.frame for combinations of column values that are missing from the data. Example:

    library(dplyr)
    library(tidyr)
    df <- data.frame(person = c(1,2,2), observation_id = c(1,1,2), value = c(1,1,1))
    df %>% tidyr::complete(person, observation_id, fill = list(value=0))

yields

    # A tibble: 4 × 3
      person observation_id value
       <dbl>          <dbl> <dbl>
    1      1              1     1
    2      1              2     0
    3      2              1     1
    4      2              2     1

where the value of the combination person == 1 and observation_id == 2 that is missing in df
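One possible data.table approach, offered as a sketch rather than a definitive answer: build the full grid of combinations with CJ(), join the original table onto it, and fill the gaps. The names dt, all_pairs and completed are illustrative and do not come from the question.

```r
library(data.table)

# Same data as in the question, as a data.table
dt <- data.table(person = c(1, 2, 2),
                 observation_id = c(1, 1, 2),
                 value = c(1, 1, 1))

# CJ() builds the cross join of the unique values of each column
all_pairs <- CJ(person = dt$person,
                observation_id = dt$observation_id,
                unique = TRUE)

# Join dt onto the full grid; combinations absent from dt get NA in `value`
completed <- dt[all_pairs, on = c("person", "observation_id")]

# Replace the NA values with the default of 0
completed[is.na(value), value := 0]

print(completed)
```

The join keeps one row per combination in all_pairs, so the result has the same four rows as the tidyr::complete() output above.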

How to use tidyr::separate when the number of needed variables is unknown [duplicate]

本小妞迷上赌 posted on 2019-11-26 22:18:24
Question: This question already has an answer here: Splitting a dataframe string column into multiple different columns [duplicate] (4 answers)

I've got a dataset that consists of email communication. An example:

    library(dplyr)
    library(tidyr)
    dat <- data_frame('date' = Sys.time(),
                      'from' = c("person1@gmail.com", "person2@yahoo.com", "person3@hotmail.com", "person4@msn.com"),
                      'to' = c("person2@yahoo.com,person3@hotmail.com", "person3@hotmail.com", "person4@msn.com,person1@gmail.com,person2@yahoo.com",
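A minimal sketch of one way to handle an unknown number of pieces: count the maximum number of comma-separated entries first, then pass that many column names to separate(). The data frame emails below is a small hypothetical stand-in (the question's own data is cut off), and the to_1, to_2, ... column names are assumptions.

```r
library(tidyr)

# Hypothetical stand-in data; not the data from the question
emails <- data.frame(
  from = c("a@x.com", "b@x.com"),
  to   = c("b@x.com,c@x.com,d@x.com", "a@x.com"),
  stringsAsFactors = FALSE
)

# Largest number of recipients found in any row
n_to <- max(lengths(strsplit(emails$to, ",", fixed = TRUE)))

# One column per recipient position; shorter rows are padded with NA on the right
wide <- separate(emails, to,
                 into = paste0("to_", seq_len(n_to)),
                 sep = ",",
                 fill = "right")

print(wide)
```

If one row per recipient is acceptable instead of one column per recipient, tidyr::separate_rows() avoids having to know the count at all.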

Tidy data.frame with repeated column names

倾然丶 夕夏残阳落幕 posted on 2019-11-26 22:01:22
Question: I have a program that gives me data in this format:

    toy
                    file_path Condition Trial.Num A B  C  ID A B  C   ID  A B  C    ID
    1     root/some.extension  Baseline         1 2 3  5 car 2 1  7 bike  4 9  0 plane
    2    root/thing.extension  Baseline         2 3 6 45 car 5 4  4 bike  9 5  4 plane
    3     root/else.extension  Baseline         3 4 4  6 car 7 5  4 bike 68 7 56 plane
    4 root/uniquely.extension Treatment         1 5 3  7 car 1 7 37 bike  9 8  7 plane
    5  root/defined.extension Treatment         2 6 7  3 car 4 6  8 bike  9 0  8 plane

My goal is to tidy the format into

Transposing data frames

痴心易碎 posted on 2019-11-26 20:42:42
Question: Happy Weekends. I've been trying to replicate the results from this blog post in R. I am looking for a method of transposing the data without using t, preferably using tidyr or reshape. In the example below, metadata is obtained by transposing data.

    metadata <- data.frame(colnames(data), t(data[1:4, ]) )
    colnames(metadata) <- t(metadata[1,])
    metadata <- metadata[-1,]
    metadata$Multiplier <- as.numeric(metadata$Multiplier)

Though it achieves what I want, I find it a little unskillful. Is there any
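As a rough illustration of transposing without t(), a gather() followed by a spread() swaps rows and columns. The data frame toy_wide and its columns are hypothetical, since the question's data object is not shown, and this assumes the first column holds the names that should become the new column headers.

```r
library(tidyr)

# Hypothetical input: one row per name, one column per measured variable
toy_wide <- data.frame(name = c("a", "b"),
                       x = c(1, 2),
                       y = c(3, 4),
                       stringsAsFactors = FALSE)

# gather() stacks the variable columns into key/value pairs;
# spread() then turns the former row labels into columns
transposed <- toy_wide %>%
  gather(key = variable, value = value, -name) %>%
  spread(key = name, value = value)

print(transposed)
```

This only works cleanly when all gathered columns share one type, because gather() puts them into a single value column.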

adding default values to item x group pairs that don't have a value (df %>% spread %>% gather seems strange)

做~自己de王妃 posted on 2019-11-26 18:36:26
Question:

Short version

How to do the operation

    df1 %>% spread(groupid, value, fill = 0) %>% gather(groupid, value, one, two)

in a more natural way?

Long version

Given a data frame

    df1 <- data.frame(groupid = c("one","one","one","two","two","two", "one"),
                      value = c(3,2,1,2,3,1,22),
                      itemid = c(1:6, 6))

for many itemid and groupid pairs we have a value, but for some itemids there are groupids where there is no value. I want to add a default value for those cases. E.g. for the itemid 1 and groupid "two" there
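One candidate for a more direct route is tidyr::complete(), which adds the missing itemid and groupid pairs and fills them in a single step. This is a sketch under the assumption that 0 is the desired default, as in the spread() call above; the name filled is illustrative.

```r
library(tidyr)

df1 <- data.frame(groupid = c("one", "one", "one", "two", "two", "two", "one"),
                  value = c(3, 2, 1, 2, 3, 1, 22),
                  itemid = c(1:6, 6))

# complete() expands to every itemid x groupid combination and
# uses `fill` to supply the default for combinations that had no row
filled <- complete(df1, itemid, groupid, fill = list(value = 0))

print(filled)
```

With six distinct itemid values and two groupid values, the result has twelve rows, five of which carry the default.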

Split or separate uneven/unequal strings with no delimiter

旧时模样 posted on 2019-11-26 18:35:49
Question: Given the dataframe df:

    x <- c("X1", "X2", "X3", "X4", "X5")
    y <- c("00L0", "0", "00012L", "0123L0", "0D0")
    df <- data.frame(x, y)

How can I leverage tidyr::separate to put each character of the y strings into a separate column (one column per string position)? Desired output:

    x <- c("X1", "X2", "X3", "X4", "X5")
    m1 <- c(0, 0, 0, 0, 0)
    m2 <- c(0, NA, 0, 1, "D")
    m3 <- c("L", NA, 0, 2, 0)
    mN <- c(NA, NA, NA, NA, NA)
    df <- data.frame(x, m1, m2, m3, mN)

Where mN could theoretically go up to m100
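For comparison, here is a base R sketch that does not use tidyr::separate: it splits each string into single characters and pads the shorter ones with NA. This is a swapped-in technique, not the approach the question asks about, and the names chars, n_max, mat and out are illustrative.

```r
x <- c("X1", "X2", "X3", "X4", "X5")
y <- c("00L0", "0", "00012L", "0123L0", "0D0")
df <- data.frame(x, y, stringsAsFactors = FALSE)

# Split every string into a vector of single characters
chars <- strsplit(as.character(df$y), "")

# Longest string determines how many position columns are needed
n_max <- max(lengths(chars))

# Pad each character vector with NA up to n_max, one row per original string
mat <- t(sapply(chars, function(ch) c(ch, rep(NA, n_max - length(ch)))))
colnames(mat) <- paste0("m", seq_len(n_max))

out <- data.frame(x = df$x, mat, stringsAsFactors = FALSE)
print(out)
```

Because the column count comes from the data, the same code would cover strings up to any length, including the m100 case mentioned above.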

Using tidyr spread function to create columns with binary value

馋奶兔 posted on 2019-11-26 17:24:30
Question: I am aware of the spread function in the tidyr package, but this is something I am unable to achieve. I have a data.frame with 2 columns as defined below. I need to transpose the column Subject into binary columns with 1 and 0. Below is the data.frame:

    studentInfo <- data.frame(StudentID = c(1,1,1,2,3,3),
                              Subject = c("Maths", "Science", "English", "Maths", "History", "History"))

    > studentInfo
      StudentID Subject
    1         1   Maths
    2         1 Science
    3         1 English
    4         2   Maths
    5         3 History
    6         3 History

And the output I am
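A sketch of one way to get 0/1 indicator columns with spread(), assuming the goal is one row per StudentID and one column per Subject (the desired output is cut off above, so this is a guess at its shape). The helper column present and the result name binary are illustrative.

```r
library(dplyr)
library(tidyr)

studentInfo <- data.frame(
  StudentID = c(1, 1, 1, 2, 3, 3),
  Subject = c("Maths", "Science", "English", "Maths", "History", "History")
)

binary <- studentInfo %>%
  distinct() %>%                      # drop the repeated (3, History) row so keys are unique
  mutate(present = 1) %>%             # indicator value to spread
  spread(Subject, present, fill = 0)  # absent subjects become 0

print(binary)
```

The distinct() step matters here: without it, spread() would stop on the duplicated StudentID and Subject pair.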

Removing NA observations with dplyr::filter()

我怕爱的太早我们不能终老 posted on 2019-11-26 16:58:31
Question: My data looks like this:

    library(tidyverse)
    df <- tribble(
      ~a, ~b, ~c,
      1, 2, 3,
      1, NA, 3,
      NA, 2, 3
    )

I can remove all NA observations with drop_na():

    df %>% drop_na()

Or remove all NA observations in a single column (a for example):

    df %>% drop_na(a)

Why can't I just use a regular != filter pipe?

    df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?

Answer 1: For example, you can use:

    df %>% filter(!is.na(a))

to remove the NA in column a.

Answer 2: From @Ben Bolker
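The behaviour in the question can be checked directly: any comparison with NA evaluates to NA, and filter() keeps only rows where the condition is TRUE, so a != NA drops every row. A small sketch using the question's data (the names with_neq and with_is_na are illustrative):

```r
library(dplyr)

df <- tibble::tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# Comparing anything with NA gives NA, never TRUE or FALSE
print(1 != NA)
print(NA != NA)

# filter() treats NA conditions like FALSE, so no rows survive
with_neq <- df %>% filter(a != NA)

# is.na() returns a proper TRUE/FALSE vector, so this keeps the two non-missing rows
with_is_na <- df %>% filter(!is.na(a))

print(nrow(with_neq))
print(nrow(with_is_na))
```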

Spread with data.frame/tibble with duplicate identifiers

落花浮王杯 posted on 2019-11-26 14:37:10
The documentation for tidyr suggests that gather and spread are transitive, but the following example with the "iris" data shows they are not, and it is not clear why. Any clarification would be greatly appreciated.

    iris.df = as.data.frame(iris)
    long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
    w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df" but received the following error instead: "Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9..." My general
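The error arises because, after gather(), Species is the only column left to identify a row, and each species covers 50 flowers. A sketch of one common workaround is to add an explicit row identifier before gathering and drop it after spreading; the column name row_id is an assumption introduced for illustration.

```r
library(dplyr)
library(tidyr)

iris.df <- as.data.frame(iris)

# Keep a per-flower identifier so each long-format row maps back to exactly one wide row
long.iris.df <- iris.df %>%
  mutate(row_id = row_number()) %>%
  gather(key = feature.measure, value = size, -Species, -row_id)

# Species + row_id now uniquely identify each output row, so spread() succeeds
w.iris.df <- long.iris.df %>%
  spread(key = feature.measure, value = size) %>%
  select(-row_id)

print(head(w.iris.df))
```

The round trip recovers the values, although the column order differs from iris because spread() orders the new columns by key.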

Using spread with duplicate identifiers for rows

こ雲淡風輕ζ posted on 2019-11-26 13:44:40
I have a long-form dataframe that has multiple entries for the same date and person.

    jj <- data.frame(month=rep(1:3,4),
                     student=rep(c("Amy", "Bob"), each=6),
                     A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                     B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it like this:

    month Amy.A Bob.A Amy.B Bob.B 1 2 3 1 2 3 1 2 3 1 2 3

My question is very similar to this. I have used the given code in the answer:

    kk <- jj %>%
      gather(variable, value, -(month:student)) %>%
      unite(temp, student, variable) %>%
      spread(temp, value)

but it gives the following error: Error: Duplicate