tidyr

How to efficiently mutate multiple columns of a large dataframe

烈酒焚心 posted on 2019-12-23 02:36:39
Question: I would appreciate any help to efficiently apply my function to multiple columns of my large data frame DT_large. My function works well and efficiently when I apply it with dplyr::mutate_at() to a small data frame DT_small. However, when applied to a relatively large dataset DT_large, available here, it takes dplyr::mutate_at() several hours to deliver the desired output. It might be that there is some mistake in my code that is making dplyr::mutate_at() less efficient with my relatively

dplyr pivot table

北城余情 posted on 2019-12-22 18:07:49
Question: I want to obtain a pivot table sorted by descending value.
    library(dplyr)
    library(tidyr)
    h <- mtcars %>% group_by(cyl, gear) %>% tally() %>% spread(gear, n, fill = 0)
    h <- h %>% add_rownames("index")
    i <- mtcars %>% group_by(cyl, gear) %>% tally() %>% spread(cyl, n, fill = 0)
To obtain the sum of the values:
    j <- i %>% select(-1) %>% summarise_each(funs(sum))
    k <- t(j)
    k <- as.data.frame(k)
    k <- tbl_df(k)
    k <- k %>% add_rownames("index")
    l <- left_join(h, k, by = "index")
    l <- l %>% select(-1) %>% arrange(desc(V1))
Is there
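One possible reading of the question above is "a cyl-by-gear count table with rows ordered by their totals, largest first". The following is a minimal sketch under that assumption; the column name `total` is hypothetical and not from the original post.

```r
library(dplyr)
library(tidyr)

# Count cars per (cyl, gear), attach the per-cyl total (assumed sort key),
# spread gear into columns, then sort rows by the total in descending order.
pivot <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(total = sum(n)) %>%   # `total` is a hypothetical helper column
  ungroup() %>%
  spread(gear, n, fill = 0) %>%
  arrange(desc(total))

print(pivot)
```

Computing the total before reshaping avoids the transpose and join steps, though whether this matches the asker's intended ordering is a guess.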

Collapse rows from 0 to 0

痴心易碎 posted on 2019-12-22 11:29:19
Question: For a dataset like this
    Incident.ID..   date                product
    INCFI0000029582 2014-09-25 08:39:45 foo
    INCFI0000029582 2014-09-25 08:39:48 bar
    INCFI0000029582 2014-09-25 08:40:44 foo
    INCFI0000029582 2014-10-10 23:04:00 foo
    INCFI0000029587 2014-09-25 08:33:32 bar
    INCFI0000029587 2014-09-25 08:34:41 bar
    INCFI0000029587 2014-09-25 08:35:24 bar
    INCFI0000029587 2014-10-10 23:04:00 foo
    df <- structure(list(Incident.ID.. = c("INCFI0000029582", "INCFI0000029582", "INCFI0000029582", "INCFI0000029582",

tidyr - unique way to get combinations (using tidyverse only)

半世苍凉 posted on 2019-12-22 11:20:38
Question: I wanted to get all unique pairwise combinations of a unique string column of a dataframe using the tidyverse (ideally). Here is a dummy example:
    library(tidyverse)
    a <- letters[1:3] %>% tibble::as_tibble()
    a
    #> # A tibble: 3 x 1
    #>   value
    #>   <chr>
    #> 1 a
    #> 2 b
    #> 3 c
    tidyr::crossing(a, a) %>% magrittr::set_colnames(c("words1", "words2"))
    #> # A tibble: 9 x 2
    #>   words1 words2
    #>   <chr>  <chr>
    #> 1 a      a
    #> 2 a      b
    #> 3 a      c
    #> 4 b      a
    #> 5 b      b
    #> 6 b      c
    #> 7 c      a
    #> 8 c      b
    #> 9 c      c
Is there a way to
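If "unique pairwise combinations" means unordered pairs without self-pairs (an assumption, since the excerpt is cut off), one tidyverse-only sketch is to cross the column with itself and keep only rows where the first value sorts before the second. The names `words1` and `words2` come from the question; `pairs` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

a <- tibble(value = letters[1:3])

# Build every ordered pair, then keep one representative per unordered pair
# by requiring words1 to sort strictly before words2.
pairs <- crossing(words1 = a$value, words2 = a$value) %>%
  filter(words1 < words2)

print(pairs)
```

The comparison relies on string ordering, so it only deduplicates correctly when the values in the column are themselves unique, as the question states.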

R: How to spread, group_by, summarise and mutate at the same time

泄露秘密 posted on 2019-12-22 06:56:00
Question: I want to spread the data below (only the first 12 rows are shown here) by the column 'Year', returning the sum of 'Orders' grouped by 'CountryName'. Then I want to calculate the % change in 'Orders' for each 'CountryName' from 2014 to 2015.
    CountryName     Days        pCountry   Revenue   Orders  Year
    United Kingdom  0-1 days    India      2604.799  13      2014
    Norway          8-14 days   Australia  5631.123  9       2015
    US              31-45 days  UAE        970.8324  2       2014
    United Kingdom  4-7 days    Austria    94.3814   1       2015
    Norway          8-14 days   Slovenia   939.8392  3       2014
    South Korea
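The steps described in the question (sum per country and year, spread by year, then percent change) can be sketched as below. The data frame `orders` is made-up toy data using only the column names from the question, and `pct_change` is a hypothetical column name; this is an illustration, not the asker's data.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical values) with the three columns the steps need.
orders <- tibble(
  CountryName = c("United Kingdom", "United Kingdom", "United Kingdom",
                  "Norway", "Norway", "Norway"),
  Year        = c(2014, 2014, 2015, 2014, 2015, 2015),
  Orders      = c(10, 10, 30, 4, 3, 5)
)

result <- orders %>%
  group_by(CountryName, Year) %>%
  summarise(Orders = sum(Orders)) %>%   # total orders per country and year
  ungroup() %>%
  spread(Year, Orders) %>%              # one column per year: `2014`, `2015`
  mutate(pct_change = (`2015` - `2014`) / `2014` * 100)

print(result)
```

Because the year columns have non-syntactic names after spreading, they need backticks inside `mutate()`.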

How to control new variables' names after tidyr's spread?

寵の児 posted on 2019-12-22 01:39:50
Question: I have a dataframe with panel structure: 2 observations for each unit from two years:
    library(tidyr)
    mydf <- data.frame(
      id = rep(1:3, rep(2,3)),
      year = rep(c(2012, 2013), 3),
      value = runif(6)
    )
    mydf
    #  id year      value
    #1  1 2012 0.09668064
    #2  1 2013 0.62739399
    #3  2 2012 0.45618433
    #4  2 2013 0.60347152
    #5  3 2012 0.84537624
    #6  3 2013 0.33466030
I would like to reshape this data to wide format, which can be done easily with tidyr::spread. However, as the values of the year variable are numbers, the
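Assuming the concern is that the new columns end up named with bare numbers, one option worth sketching is the `sep` argument of `tidyr::spread()`, which prefixes each new column with the key column's name. The values below are copied from the printed output in the question so the example is deterministic; `wide` is a hypothetical name.

```r
library(tidyr)

mydf <- data.frame(
  id    = rep(1:3, each = 2),
  year  = rep(c(2012, 2013), 3),
  value = c(0.09668064, 0.62739399, 0.45618433,
            0.60347152, 0.84537624, 0.33466030)
)

# With a non-NULL `sep`, new columns are named <key name><sep><key value>.
wide <- spread(mydf, year, value, sep = "_")

names(wide)
```

In tidyr 1.0.0 and later, `pivot_wider()` with `names_prefix` is an alternative route to the same kind of naming.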

How to add metadata to a tibble

余生颓废 posted on 2019-12-21 19:24:40
Question: How does one add metadata to a tibble? I would like a sentence describing each of my variable names, such that I could print out the tibble with the associated metadata and, if I handed it to someone who hadn't seen the data before, they could make some sense of it.
    as_tibble(iris)
    # A tibble: 150 × 5
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
             <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0           3
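One low-tech sketch, not necessarily what any answer to this question recommended, is to store a description on each column with base R's `attr()` and build a small data dictionary from those attributes. The attribute name `"label"`, the description wording, and the names `iris_tbl`, `descriptions`, and `dictionary` are all assumptions made for illustration.

```r
library(tibble)

iris_tbl <- as_tibble(iris)

# Hypothetical one-line descriptions, keyed by column name.
descriptions <- c(
  Sepal.Length = "Sepal length in cm",
  Sepal.Width  = "Sepal width in cm",
  Petal.Length = "Petal length in cm",
  Petal.Width  = "Petal width in cm",
  Species      = "Iris species"
)

# Attach each description to its column as a "label" attribute.
for (nm in names(descriptions)) {
  attr(iris_tbl[[nm]], "label") <- descriptions[[nm]]
}

# Collect the labels into a printable data dictionary.
dictionary <- tibble(
  variable    = names(iris_tbl),
  description = vapply(iris_tbl, function(col) {
    label <- attr(col, "label")
    if (is.null(label)) NA_character_ else label
  }, character(1), USE.NAMES = FALSE)
)

print(dictionary)
```

Column attributes are not shown when the tibble itself is printed, and some operations may drop them, so printing the dictionary alongside the data is the safer habit.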

Double nesting in the tidyverse

China☆狼群 posted on 2019-12-21 05:17:13
Question: Using the examples from Wickham's introduction to purrr in R for Data Science, I am trying to create a double nested list.
    library(gapminder)
    library(purrr)
    library(tidyr)
    gapminder
    nest_data <- gapminder %>% group_by(continent) %>% nest(.key = by_continent)
How can I further nest the countries so that nest_data contains by_continent and a new level of nesting by_contry that ultimately includes the tibble by_year? Furthermore, after creating this data structure for the gapminder data - how

Why doesn't gather() use the key variable name?

喜你入骨 posted on 2019-12-20 06:39:59
Question: It's shameful, but I still can't wrap my mind fully around tidyr, specifically gather(). I feel like I'm missing something fundamental. If I run this tiny snippet of code
    library(tidyr)
    x <- data.frame(var1=letters[1:3], var2=LETTERS[7:9], var3=21:23)
    gather(x, foo, value)
I get
    > x
      var1 var2 var3
    1    a    G   21
    2    b    H   22
    3    c    I   23
    > gather(x, foo, value)
      variable value
    1     var1     a
    2     var1     b
    3     var1     c
    4     var2     G
    5     var2     H
    6     var2     I
    7     var3    21
    8     var3    22
    9     var3    23
Where does foo get used? Is this completely

How to get group-level statistics while preserving the original dataframe?

给你一囗甜甜゛ posted on 2019-12-20 06:37:06
Question: I have the following dataframe
    one <- c('one',NA,NA,NA,NA,'two',NA,NA)
    group1 <- c('A','A','A','A','B','B','B','B')
    group2 <- c('C','C','C','D','E','E','F','F')
    df = data.frame(one, group1,group2)
    > df
       one group1 group2
    1  one      A      C
    2 <NA>      A      C
    3 <NA>      A      C
    4 <NA>      A      D
    5 <NA>      B      E
    6  two      B      E
    7 <NA>      B      F
    8 <NA>      B      F
I want to get the count of non-missing observations of one for each combination of group1 and group2. In Pandas, I would use groupby(['group1','group2']).transform, but how can I do that in
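Assuming the truncated question asks for a dplyr equivalent, a grouped `mutate()` is the usual counterpart of a pandas `transform`: it computes the statistic per group but keeps every original row. The column name `n_non_missing` and the name `result` are hypothetical.

```r
library(dplyr)

one    <- c("one", NA, NA, NA, NA, "two", NA, NA)
group1 <- c("A", "A", "A", "A", "B", "B", "B", "B")
group2 <- c("C", "C", "C", "D", "E", "E", "F", "F")
df <- data.frame(one, group1, group2)

# mutate() on a grouped data frame broadcasts the per-group result
# back to every row of that group, so no rows are lost.
result <- df %>%
  group_by(group1, group2) %>%
  mutate(n_non_missing = sum(!is.na(one))) %>%
  ungroup()

print(result)
```

Using `summarise()` instead would collapse the data to one row per group, which is the behavior the question is trying to avoid.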