tidyr | 易学教程

Error using tidyeval quo() with gather()

Question: I am trying to run gather() on data frames and programmatically assign the .key column name using !!quo(), but I keep getting 'Error: Invalid column specification'. I even found a closed ticket showing that it should work (https://github.com/tidyverse/tidyr/issues/293). I'm going to go back to using rename_() as a workaround, but it would be nice to use the more elegant NSE.

library('tidyverse')
data(mtcars)
my_var <- 'my_col_name'

The following works, but is a one-trick pony:

> mtcars

tidyr gather: simultaneously gather and rename key?

Question: Suppose I have the following data frame:

> a <- data_frame(my_type_1_num_widgets = c(1, 2, 3), my_type_2_num_widgets = c(4, 5, 6))
> a
Source: local data frame [3 x 2]

  my_type_1_num_widgets my_type_2_num_widgets
1                     1                     4
2                     2                     5
3                     3                     6

I want to do two things:

1. Gather the "num_widgets" columns.
2. Rename the resulting keys to remove the "num_widgets" suffix.

This is how I'm doing it currently, along with the correct, desired output that I get:

> a %>% rename(my_type_1 = my_type_1_num_widgets, my_type_2 =

how to use gather_ in tidyr with variables

Question: I'm using tidyr together with shiny and hence need to use dynamic values in tidyr operations. However, I'm having trouble using gather_(), which I think was designed for such cases. Minimal example below:

library(tidyr)
df <- data.frame(name=letters[1:5],v1=1:5,v2=10:14,v3=7:11,stringsAsFactors=FALSE)
#works fine
df %>% gather(Measure,Qty,v1:v3)
dyn_1 <- 'Measure'
dyn_2 <- 'Qty'
dyn_err <- 'v1:v3'
dyn_err_1 <- 'v1'
dyn_err_2 <- 'v2'
#error
df %>% gather_(dyn_1,dyn_2,dyn_err)
#error
df %

How to expand.grid on vectors sets rather than single elements

Question: I have the following four vectors:

a1 <- c(11, 12)
a2 <- c(21, 22)
b1 <- c(31, 32)
b2 <- c(41, 42)

What I want to have in the end is a data frame that looks like this:

  p1 p2 p3 p4
1 11 12 31 32
2 21 22 31 32
3 11 12 41 42
4 21 22 41 42

i.e. every possible combination of the two a vectors with the two b vectors. What I did is:

ab <- expand.grid(list(a1, a2), list(b1,b2))
ab.new <- ab %>% separate(Var1, c("p1", "p2"), sep = ",") %>% separate(Var2, c("p3", "p4"), sep = ",")

and what I end up

Reformat data from long to wide

Question: How can I reformat this data to a wide format?

species val price
setosa  5.1 3
setosa  4.9 3
setosa  4.7 3
setosa  4.6 2

Desired output:

species val1 val2 val3 val4 price1 price2 price3 price4
setosa  5.1  4.9  4.7  4.6  3      3      3      2

I have tried spread from tidyr, but without success.

Answer 1: data.table v1.9.6+ allows you to pass more than one value.var, so you can simply do:

library(data.table)
dcast(setDT(df), species ~ val + price, value.var = c("val", "price"))
# species val.1_4.6_2 val.1_4.7_3 val.1_4

Conditional Cross tabulation in R

Question: I'm looking for the quickest way to achieve the task below using the "expss" package. With "expss" we can easily do cross-tabulation (the package also has other advantages and useful functions for cross-tabulation), and we can cross-tabulate multiple variables easily, like this:

#install.packages("expss")
library("expss")
data(mtcars)
var1 <- "vs, am, gear, carb"
var_names = trimws(unlist(strsplit(var1, split = ",")))
mtcars %>% tab_prepend_values %>% tab_cols(total(), ..[(var_names)]) %>% tab

Filling “implied missing values” in a data frame that has varying observations per time unit

Question: I have a large dataset with spatiotemporal data. Each set of coordinates is associated with an id (a player id in a computer game). Unfortunately, the coordinates for each id aren't logged at every time unit. If a reading is not available for a specific id at a given time stamp, that row is omitted from the dataset entirely rather than logged as NA. I would like to have exactly as many observations per time unit as there are unique ids (i.e. inserting "implied missing NAs"). On time

exclusions with '-' when using string versions (underscore suffix such as gather_()) of dplyr/tidyr functions

Question: Normally with dplyr/tidyr, I can achieve exclusions with negation:

... %>% gather(x, -y)

However, I currently want some variables to be specified programmatically and to be an exclusion, so ideally:

... %>% gather_(xVar, -yVar)

where xVar and yVar are character variables (say, with values 'x' and 'y'). Are exclusions simply disallowed with the string versions of functions, or is there a way to do them? Both of the obvious candidates, -yVar and paste0('-', yVar), seem to produce errors.

Answer 1: I had the

How to compute the mean in different categories using left_join and nest in R?

Question: I'm trying to compute the mean values for binned data using left_join and nest.

bin.size = 100

First dataframe:

df = data.frame(x =c(300,400), y = c("sca1","sca2"))

    x    y
1 300 sca1
2 400 sca2

Second dataframe:

df2 = data.frame(snp = c(1,2,10,100,1,2,14,16,399), r2 = c(0.70,0.80,0.70,0.10,0.90,0.98,0.80,0.80,0.01), sca = c("sca1","sca1","sca1","sca1","sca2","sca2","sca2","sca2","sca2"))

  snp   r2  sca
1   1 0.70 sca1
2   2 0.80 sca1
3  10 0.70 sca1
4 100 0.10 sca1
5   1 0.90 sca2
6   2 0.98 sca2
7  14 0.80 sca2
8  16 0.80 sca2
9 399 0.01 sca2

Code from @r2evans

Spread multiple columns [tidyr]

Question: I would like to spread data over multiple columns using tidyr.

dat <- data.frame(ID = rep(1,10), col1 = LETTERS[seq(1,10)], col2 = c(letters[seq(1,8)],NA,NA), col3 = c(rep(NA,8),"5",NA), col4 = c(rep(NA,8),NA,"value"))

The expected outcome is:

Out <- data.frame(t(c(1,letters[seq(1,8)],"5","value")),row.names=NULL)
colnames(Out) <- c("ID",LETTERS[seq(1,10)])

I came up with:

a <- dat %>% gather(variable, value, -(ID:col1)) %>% unite(temp, col1, variable) %>% spread(temp, value)
a[,-which(is.na