tidyr | 易学教程

dplyr pivot table

I want to obtain a pivot table with descending value.

    library(dplyr)
    library(tidyr)
    h <- mtcars %>% group_by(cyl, gear) %>% tally() %>% spread(gear, n, fill = 0)
    h <- h %>% add_rownames("index")
    i <- mtcars %>% group_by(cyl, gear) %>% tally() %>% spread(cyl, n, fill = 0)

To obtain the sum of the values:

    j <- i %>% select(-1) %>% summarise_each(funs(sum))
    k <- t(j)
    k <- as.data.frame(k)
    k <- tbl_df(k)
    k <- k %>% add_rownames("index")
    l <- left_join(h, k, by = "index")
    l <- l %>% select(-1) %>% arrange(desc(V1))

Is there another way to do the same in dplyr?

We group by 'cyl', 'gear', get the frequency count ( tally() ),
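One possible shorter route, sketched here under the assumption that the goal is a cyl-by-gear count table ordered by each row's total: compute the per-cyl total before widening, then sort on it. The column name `total` is an illustrative choice, and `pivot_wider()` is swapped in for `spread()`; neither comes from the original question.

```r
library(dplyr)
library(tidyr)

pivot <- mtcars %>%
  count(cyl, gear) %>%                 # frequency of each cyl/gear pair
  group_by(cyl) %>%
  mutate(total = sum(n)) %>%           # row total per cyl (hypothetical column name)
  ungroup() %>%
  pivot_wider(names_from = gear, values_from = n, values_fill = 0) %>%
  arrange(desc(total))                 # largest total first

print(pivot)
```

Keeping the total as a regular column avoids the transpose-and-join step, at the cost of carrying one extra column through the reshape.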

How to spread tbl_dbi and tbl_sql data without downloading to local memory

Question: I am working with large datasets, and tidyr's spread usually fails with error messages indicating that it could not obtain enough memory to perform the operation. Therefore, I have been exploring dbplyr. However, as it says here, and also shown below, dbplyr::spread() does not work. My question is whether there is another way to accomplish what tidyr::spread does while working with tbl_dbi and tbl_sql data, without downloading to local memory. Using sample data from here, below I present what I get and

tidyr spread does not aggregate data

Question: I have data of the following form:

    > data <- data.frame(unique=1:9, grouping=rep(c('a', 'b', 'c'), each=3), value=sample(1:30, 9))
    > data
      unique grouping value
    1      1        a    15
    2      2        a    21
    3      3        a    26
    4      4        b     8
    5      5        b     6
    6      6        b     4
    7      7        c    17
    8      8        c     1
    9      9        c     3

I would like to create a table that looks like this:

       a  b  c
    1 15  8 17
    2 21  6  1
    3 26  6  3

I am using tidyr::spread and not getting the correct result:

    > data %>% spread(grouping, value)
      unique  a  b  c
    1      1 15 NA NA
    2      2 21 NA NA
    3      3 26 NA NA
    4      4 NA  8 NA
    5      5 NA  6 NA
    6      6 NA
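A common way around this, sketched here as an assumption about what the asker wants: the `unique` column makes every row its own identifier, so nothing collapses. Replacing it with a within-group row index gives the reshape something to align on. The helper column `row` is hypothetical, the values are hard-coded from the printed data rather than sampled, and `pivot_wider()` stands in for `spread()`.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  unique   = 1:9,
  grouping = rep(c("a", "b", "c"), each = 3),
  value    = c(15, 21, 26, 8, 6, 4, 17, 1, 3)  # taken from the printed data
)

wide <- data %>%
  group_by(grouping) %>%
  mutate(row = row_number()) %>%   # position within each group (hypothetical name)
  ungroup() %>%
  select(-unique) %>%              # drop the column that keeps rows apart
  pivot_wider(names_from = grouping, values_from = value)

print(wide)
```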

Separate variable in field by character

I recently asked the question "Separate contents of field" and got a very quick and very simple answer. Something I can do simply in Excel is look in a cell, find the first instance of a character, and then return all the characters to the left of it. For example, given Author: Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P. I can extract Drijgers RL and Aalten P into separate columns in Excel. This lets me count the number of times someone is a first author and also the last author. How can I replicate this in R? I can count the total number of times an author has a publication from the
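A minimal base R sketch of the first-author and last-author extraction described above, assuming authors are separated by commas and the field may end with a period. The variable names are illustrative only.

```r
author <- "Drijgers RL, Verhey FR, Leentjens AF, Kahler S, Aalten P."

# Everything before the first comma.
first_author <- sub(",.*$", "", author)

# Everything after the last comma, with a trailing period removed.
last_author <- sub("\\.$", "", sub("^.*,\\s*", "", author))

print(first_author)
print(last_author)
```

Because `sub()` is vectorised, the same two calls work on a whole column of author strings, and `table()` on the results would give the first-author and last-author counts.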

How can tidyr spread function take variable as a select column

Question: tidyr's spread function only takes column names without quotes. Is there a way I can pass in a variable that contains the column name? For example:

    # example using gather()
    library("tidyr")
    dummy.data <- data.frame("a" = letters[1:25], "B" = LETTERS[1:5], "x" = c(1:25))
    dummy.data
    var = "x"
    dummy.data %>% gather(key, value, var)

This gives an error:

    Error: All select() inputs must resolve to integer column positions.
    The following do not:
    * var

Which is solved using match function which gives the
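In current tidyr, one way to pass a column name held in a character variable is the tidyselect helper `all_of()`. The sketch below uses `pivot_longer()` in place of `gather()`, which is a substitution rather than the approach the excerpt goes on to describe.

```r
library(tidyr)

dummy.data <- data.frame(a = letters[1:25], B = LETTERS[1:5], x = 1:25)
var <- "x"

# all_of() tells tidyselect to look up the column named by the string in `var`.
long <- pivot_longer(dummy.data, cols = all_of(var),
                     names_to = "key", values_to = "value")

print(head(long))
```

The same helper works for the widening direction, for example `pivot_wider(names_from = all_of(key_var))`, where `key_var` would be another hypothetical string variable.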

Concatenating all rows within a group using dplyr

Suppose I have a dataframe like this:

    hand_id card_id card_name card_class
    A       1       p         alpha
    A       2       q         beta
    A       3       r         theta
    B       2       q         beta
    B       3       r         theta
    B       4       s         gamma
    C       1       p         alpha
    C       2       q         beta

I would like to concatenate the card_id, card_name, and card_class into one single row per hand level A, B, C. So the result would look something like this:

    hand_id combo_1 combo_2 combo_3
    A       1-2-3   p-q-r   alpha-beta-theta
    B       2-3-4   q-r-s   beta-theta-gamma
    ....

I attempted to do this using group_by and mutate, but I can't seem to get it to work:

    data <- read_csv('data.csv')
    byHand <- group_by(data, hand_id) %>% mutate(combo_1 = paste
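A likely fix, sketched with the table above typed in by hand instead of read from data.csv: `mutate()` keeps one row per card, whereas `summarise()` with `paste(..., collapse = "-")` reduces each hand to a single row. The object names `hands` and `combos` are illustrative.

```r
library(dplyr)

hands <- data.frame(
  hand_id    = c("A", "A", "A", "B", "B", "B", "C", "C"),
  card_id    = c(1, 2, 3, 2, 3, 4, 1, 2),
  card_name  = c("p", "q", "r", "q", "r", "s", "p", "q"),
  card_class = c("alpha", "beta", "theta", "beta", "theta", "gamma", "alpha", "beta")
)

combos <- hands %>%
  group_by(hand_id) %>%
  summarise(
    combo_1 = paste(card_id, collapse = "-"),     # collapse = one string per group
    combo_2 = paste(card_name, collapse = "-"),
    combo_3 = paste(card_class, collapse = "-")
  )

print(combos)
```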

How to ungroup list columns in data.table?

tidyr provides the unnest function, which helps expand list columns. This is similar to the much (20x) faster ungroup function in kdb. I am looking for a similar (but much faster) function that, assuming a data.table that contains several list columns, each with the same number of elements on each row, would expand the data.table. This is an extension of this post.

    library(data.table)
    library(tidyr)
    t = Sys.time()
    DT = data.table(a=c(1,2,3), b=c('q','w','e'), c=list(rep(t,2),rep(t+1,3),rep(t,0)), d=list(rep(1,2),rep(20,3),rep(1,0)))
    print(DT)
       a b c d
    1: 1 q 2016-01-09 09:55:14,2016-01-09 09:55:14

Separate string after last underscore

This is indeed a duplicate of the question r-split-string-using-tidyrseparate, but I cannot use the MWE for my purpose, because I do not know how to adjust the regular expression. I basically want the same thing, but split the variable after the last underscore. Reason: I have data where some columns show up several times for the same factor/type. I figured I can melt the data, separate the value variable before the type string, and spread it out again to a wide format with fewer columns. My problem is that my variable names have varying numbers of underscores and I would like to learn how to
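One way to split at the last underscore only, sketched with made-up variable names since the excerpt does not show the real ones: a lookahead in the `sep` pattern matches an underscore that is followed by no further underscore before the end of the string.

```r
library(tidyr)

# Hypothetical example data; the real column names are not shown in the excerpt.
df <- data.frame(variable = c("a_b_pre", "c_post", "d_e_f_pre"), value = 1:3)

# "_(?=[^_]+$)" matches only the final underscore in each string.
out <- separate(df, variable, into = c("name", "type"), sep = "_(?=[^_]+$)")

print(out)
```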

conditional string splitting in R (using tidyr)

I have a data frame like this:

    X <- data.frame(value = c(1,2,3,4), variable = c("cost", "cost", "reed_cost", "reed_cost"))

I'd like to split the variable column into two: one column to indicate if the variable is a 'cost' and another column to indicate whether or not the variable is "reed". I cannot seem to figure out the right regex for the split (e.g. using tidyr). If my data were something nicer, say:

    Y <- data.frame(value = c(1,2,3,4), variable = c("adjusted_cost", "adjusted_cost", "reed_cost", "reed_cost"))

Then this is trivial with tidyr:

    separate(Y, variable, c("Type", "Model"), "_")

and
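For the harder case X, one option that avoids a custom regex is the `fill` argument of `separate()`. With `fill = "left"`, a value that has only one piece gets its missing piece padded with NA on the left. This is a sketch of that idea, not necessarily the answer the original thread settled on.

```r
library(tidyr)

X <- data.frame(value = c(1, 2, 3, 4),
                variable = c("cost", "cost", "reed_cost", "reed_cost"))

# "cost" has a single piece, so Type becomes NA and Model receives "cost".
out <- separate(X, variable, into = c("Type", "Model"), sep = "_", fill = "left")

print(out)
```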

tidyr - unique way to get combinations (using tidyverse only)

I wanted to get all unique pairwise combinations of a unique string column of a dataframe using the tidyverse (ideally). Here is a dummy example:

    library(tidyverse)
    a <- letters[1:3] %>% tibble::as_tibble()
    a
    #> # A tibble: 3 x 1
    #>   value
    #>   <chr>
    #> 1 a
    #> 2 b
    #> 3 c
    tidyr::crossing(a, a) %>% magrittr::set_colnames(c("words1", "words2"))
    #> # A tibble: 9 x 2
    #>   words1 words2
    #>   <chr>  <chr>
    #> 1 a      a
    #> 2 a      b
    #> 3 a      c
    #> 4 b      a
    #> 5 b      b
    #> 6 b      c
    #> 7 c      a
    #> 8 c      b
    #> 9 c      c

Is there a way to remove 'duplicate' combinations here? That is, have the output be the following in this example:

    # A tibble:
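The excerpt cuts off before the desired output, so the sketch below rests on an assumption: that both self-pairs (a, a) and reversed pairs (b, a) count as duplicates. Under that reading, keeping only rows where the first word sorts before the second leaves one row per unordered pair.

```r
library(dplyr)
library(tidyr)

a <- tibble(value = letters[1:3])

pairs <- crossing(words1 = a$value, words2 = a$value) %>%
  filter(words1 < words2)   # drops self-pairs and reversed duplicates

print(pairs)
```

If self-pairs should be kept, `words1 <= words2` would be the corresponding filter.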