tidyr

Using gather from tidyr changes my regression results

Submitted on 2019-12-04 18:37:07
When I run the code below, everything works as expected:

```r
# install.packages("dynlm")
# install.packages("tidyr")
require(dynlm)
require(tidyr)

Time <- 1950:1993
Y <- c(5820, 5843, 5917, 6054, 6099, 6365, 6440, 6465, 6449, 6658, 6698,
       6740, 6931, 7089, 7384, 7703, 8005, 8163, 8506, 8737, 8842, 9022,
       9425, 9752, 9602, 9711, 10121, 10425, 10744, 10876, 10746, 10770,
       10782, 11179, 11617, 12015, 12336, 12568, 12903, 13029, 13093,
       12899, 13110, 13391)
X <- c(6284, 6390, 6476, 6640, 6628, 6879, 7080, 7114, 7113, 7256, 7264,
       7382, 7583, 7718, 8140, 8508, 8822, 9114, 9399, 9606, 9875, 10111,
       10414,
```
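
The excerpt is cut off above, but a common cause of this symptom is worth sketching: a gather()/spread() round trip can change row order, and dynlm() computes lags from observation order. The data below is made up (the post's full series is truncated); the point is only that the order must be restored before refitting.

```r
# Hedged sketch with hypothetical data: restore row order after reshaping
# before fitting a lagged time-series model.
library(dplyr)
library(tidyr)
library(dynlm)

set.seed(42)
dat <- data.frame(Time = 1950:1969,
                  Y = cumsum(rnorm(20, mean = 100)),
                  X = cumsum(rnorm(20, mean = 100)))

long <- gather(dat, series, value, Y, X)               # long format
wide <- spread(long, series, value) %>% arrange(Time)  # re-sort by Time

coef(dynlm(Y ~ L(X, 1), data = ts(dat[, c("Y", "X")], start = 1950)))
coef(dynlm(Y ~ L(X, 1), data = ts(wide[, c("Y", "X")], start = 1950)))
# The two fits agree once the reshaped rows are back in Time order.
```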

How to spread tbl_dbi and tbl_sql data without downloading to local memory

Submitted on 2019-12-04 12:59:36
I am working with large datasets, and tidyr's spread() usually gives me error messages suggesting a failure to obtain enough memory for the operation. I have therefore been exploring dbplyr. However, as noted here, and as shown below, dbplyr::spread() does not work. My question is whether there is another way to accomplish what tidyr::spread() does while working with tbl_dbi and tbl_sql data, without downloading to local memory. Using sample data from here, below I present what I get and what I would like to get:

```r
# sample tbl_dbi and tbl_sql data
df_sample <- tribble(~group1,
```
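
The sample data is truncated above, so the columns group1/group2/value below are hypothetical stand-ins. One approach, as a sketch: phrase the pivot as grouped conditional aggregation, which dbplyr can translate to SQL, so nothing is collected into local memory.

```r
library(dplyr)

# Hypothetical local stand-in; the same pipeline works on a tbl_dbi/tbl_sql,
# where if_else() becomes CASE WHEN and max() runs in the database.
df_sample <- tibble(group1 = c(1, 1, 2, 2),
                    group2 = c("a", "b", "a", "b"),
                    value  = c(10, 20, 30, 40))

df_sample %>%
  group_by(group1) %>%
  summarise(a = max(if_else(group2 == "a", value, NA_real_), na.rm = TRUE),
            b = max(if_else(group2 == "b", value, NA_real_), na.rm = TRUE))
```

If memory serves, newer dbplyr (2.0.0 and later) also translates tidyr::pivot_wider() for lazy tables, which avoids hand-writing the conditional aggregation.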

tidyr spread does not aggregate data

Submitted on 2019-12-04 11:28:23
I have data like the following:

```r
> data <- data.frame(unique=1:9, grouping=rep(c('a', 'b', 'c'), each=3), value=sample(1:30, 9))
> data
  unique grouping value
1      1        a    15
2      2        a    21
3      3        a    26
4      4        b     8
5      5        b     6
6      6        b     4
7      7        c    17
8      8        c     1
9      9        c     3
```

I would like to create a table that looks like this:

```
   a b  c
1 15 8 17
2 21 6  1
3 26 6  3
```

I am using tidyr::spread and not getting the correct result:

```r
> data %>% spread(grouping, value)
  unique  a  b  c
1      1 15 NA NA
2      2 21 NA NA
3      3 26 NA NA
4      4 NA  8 NA
5      5 NA  6 NA
6      6 NA  4 NA
7      7 NA NA 17
8      8 NA NA  1
9      9 NA NA  3
```

Or:

```r
> data %>% select(grouping, value) %>% spread(grouping,
```
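
A sketch of the usual fix: spread() keeps one row per distinct `unique` value, so drop that column and build a within-group row index instead; the groups then collapse side by side under a, b, and c.

```r
library(dplyr)
library(tidyr)

data %>%
  group_by(grouping) %>%
  mutate(row = row_number()) %>%  # 1, 2, 3 within each group
  ungroup() %>%
  select(-unique) %>%
  spread(grouping, value)
```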

Long to wide data with tidyr?

Submitted on 2019-12-04 09:13:28
Question: I have data that looks something like this:

```r
df = data.frame(name=c("A","A","B","B"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7))
```

I want to reshape it to look like this:

```r
df = data.frame(name=c("A", "B"),
                V1.g1=c(10,20), V1.g2=c(40,30),
                V2.g1=c(6,1), V2.g2=c(3,7))
```

Is it possible to do this with tidyr? I can do it with reshape:

```r
reshape(df, idvar='name', timevar='group', direction='wide')
```

but it is always good to learn something new.

Answer 1: Since tidyr 1.0.0 you can do the following:
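
The rest of the answer is cut off in this excerpt; a hedged reconstruction using pivot_wider(), the long-to-wide verb introduced in tidyr 1.0.0:

```r
library(tidyr)

# names_sep = "." is an assumption, chosen to match the V1.g1-style
# column names requested above (the default separator is "_").
pivot_wider(df, names_from = group, values_from = c(V1, V2), names_sep = ".")
```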

How can tidyr's spread function take a variable as a select column?

Submitted on 2019-12-04 08:08:50
tidyr's spread function only takes column names without quotes. Is there a way I can pass in a variable that contains the column name? For example:

```r
# example using gather()
library("tidyr")
dummy.data <- data.frame("a" = letters[1:25], "B" = LETTERS[1:5], "x" = c(1:25))
dummy.data

var = "x"
dummy.data %>% gather(key, value, var)
```

This gives an error:

```
Error: All select() inputs must resolve to integer column positions.
The following do not: * var
```

which is solved using the match() function, which gives the required column position:

```r
dummy.data %>% gather(key, value, match(var, names(.)))
```

But this same approach
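
The excerpt cuts off mid-sentence above. A sketch of the common tidy-evaluation workarounds, which avoid the match() positional trick:

```r
library(dplyr)   # for the pipe and the one_of()/all_of() helpers
library(tidyr)
library(rlang)

dummy.data %>% gather(key, value, !!sym(var))    # unquote the string as a symbol
dummy.data %>% gather(key, value, one_of(var))   # select helper that accepts strings

# Since tidyr 1.0.0, pivot_longer() with all_of() is the current idiom:
dummy.data %>% pivot_longer(all_of(var), names_to = "key", values_to = "value")
```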

Using spread with duplicate row identifiers gives an error

Submitted on 2019-12-04 05:40:17
Question: My data looks like this:

```r
df <- read.table(header = T, text = "
  GeneID Gene_Name Species  Paralogues Domains Functional_Diversity
  1234   DDR1      hsapiens 14         2       8.597482
  5678   CSNK1E    celegans 70         4       8.154788
  9104   FGF1      Chicken  3          0       5.455874
  4575   FGF1      hsapiens 4          6       6.745845")
```

I need it to look like:

```
Gene_Name hsapiens celegans   ggalus
DDR1      8.597482       NA       NA
CSNK1E          NA 8.154788       NA
FGF1      6.745845       NA 5.455874
```

I've tried using:

```r
library(tidyverse)
df %>%
  select(Gene_Name, Species, Functional_Diversity) %>%
  spread
```
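
The attempt above is cut off; a hedged completion of it: once the columns that vary per row (GeneID, Paralogues, Domains) are dropped, each Gene_Name/Species pair identifies a single value, so spread() has no duplicate identifiers to complain about.

```r
library(tidyverse)

df %>%
  select(Gene_Name, Species, Functional_Diversity) %>%
  spread(Species, Functional_Diversity)
```

If genuine duplicates remain in the full data, the usual extra step is a per-group row index (group_by() followed by mutate(row = row_number())) before spreading.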

Matching values between data frames based on overlapping dates

Submitted on 2019-12-04 05:29:00
Question: I am currently dealing with the following data structures.

Attributes df:

```
  ID    Begin_A      End_A                       Interval          Value
1  5 1990-03-01 2017-03-10 1990-03-01 UTC--2017-03-10 UTC           Cat1
2 10 1993-12-01 2017-12-02 1993-12-01 UTC--2017-12-02 UTC           Cat2
3  5 1991-03-01 2017-03-03 1991-03-01 UTC--2017-03-03 UTC           Cat3
4 10 1995-12-05 2017-12-10 1995-12-05 UTC--2017-12-10 UTC           Cat4
```

Bookings df:

```
  ID    Begin_A      End_A                       Interval
1  5 2017-03-03 2017-03-05 2017-03-03 UTC--2017-03-05 UTC
2  6 2017-05-03 2017-05-05 2017-05-03 UTC-
```
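
The question is cut off here, so the tables below are hypothetical fragments of the data shown. A sketch of one standard approach: cross-join the two tables, then keep the pairs whose date ranges overlap.

```r
library(dplyr)
library(lubridate)

attrs <- tibble(ID = c(5, 10),
                Begin_A = ymd(c("1990-03-01", "1993-12-01")),
                End_A   = ymd(c("2017-03-10", "2017-12-02")),
                Value   = c("Cat1", "Cat2"))

bookings <- tibble(ID = c(5, 6),
                   Begin_B = ymd(c("2017-03-03", "2017-05-03")),
                   End_B   = ymd(c("2017-03-05", "2017-05-05")))

bookings %>%
  mutate(.k = 1) %>%                                  # dummy key for a cross join
  inner_join(mutate(attrs, .k = 1), by = ".k", suffix = c("_bk", "_at")) %>%
  filter(Begin_B <= End_A, End_B >= Begin_A) %>%      # standard interval-overlap test
  select(-.k)
```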

How do I remove NAs with the tidyr::unite function?

Submitted on 2019-12-04 03:59:47
Question: After combining several columns with tidyr::unite(), NAs from missing data remain in my character vector, which I do not want. I have a series of medical diagnoses per row (one per column) and would like to benchmark searching for a series of codes via %in% and grepl(). There is an open issue on GitHub about this problem; has there been any movement, or are there workarounds? I would like to keep the vector comma-separated. Here is a representative example:

```r
library(dplyr)
library(tidyr)
df <- data_frame(a =
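```

The example is truncated above, so the data below is a hypothetical stand-in. The GitHub issue was eventually addressed: tidyr 1.0.0 added an na.rm argument to unite(), which drops the NAs directly.

```r
library(dplyr)
library(tidyr)

df <- tibble(a = c("I25", NA),
             b = c("E11", "I50"),
             c = c("J45", NA))

df %>% unite("codes", a:c, sep = ",", na.rm = TRUE)
#> codes: "I25,E11,J45" and "I50"

# On older tidyr, one workaround is stripping the "NA" tokens afterwards
# (a simple cleanup sketch; assumes no real code contains the letters "NA"):
df %>%
  unite("codes", a:c, sep = ",") %>%
  mutate(codes = gsub("^NA,|,NA$|,NA(?=,)", "", codes, perl = TRUE))
```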

Grouping linked unique ID pairs using R [duplicate]

Submitted on 2019-12-04 03:43:59
Question: This question already has an answer here: Make a group_indices based on several columns (1 answer). Closed 10 months ago.

I'm trying to link together pairs of unique IDs using R. Given the example below, I have two IDs (here ID1 and ID2) that indicate linkage. I'm trying to create groups of rows that are linked. In this example, A is linked to B, which is linked to D, which is linked to E. Because these are all connected, I want to group them together. Next, there is also X, which is linked to
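
The example data is cut off above, so the pairs below are hypothetical, reconstructed from the description. A sketch of the standard approach: treat the ID pairs as edges of an undirected graph and label its connected components.

```r
library(igraph)

links <- data.frame(ID1 = c("A", "B", "D", "X"),
                    ID2 = c("B", "D", "E", "Y"))

g <- graph_from_data_frame(links, directed = FALSE)
comp <- components(g)$membership          # component id per vertex
data.frame(ID = names(comp), group = as.integer(comp))
#> A, B, D, E fall in one group; X and Y in another
```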

Separate contents of field

Submitted on 2019-12-04 02:32:22
Question: I'm sure this is very simple, and I think it's a case of using separate and gather. I have a single field in a dataframe, authorlist, from an edited export of a PubMed search. It contains the authors of the publications; it can, obviously, contain either a single author or a collaboration of authors. For example, this is just a selection of the options available:

```
Author
Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.
```

What I'd like to do is create a single list of ALL authors, so that I'd
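
The question is cut off here. A sketch of the usual one-step answer: tidyr::separate_rows() splits a delimited field into one row per element, which yields a single long author list directly.

```r
library(tidyr)

pubs <- data.frame(
  Author = c("Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P.",
             "Smith J.")   # hypothetical second publication
)

separate_rows(pubs, Author, sep = ",\\s*")
```

A dplyr::distinct() on the result then gives each author exactly once.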