tidyr | 易学教程

Dividing column values into ranges and aggregating dates by month to count the frequency of each range that falls in that month

Question: I have a data frame that contains a date column stored as an integer type. I want to divide price into ranges of 10,000 and then count the frequency of each range that falls in each month.

    > df
    date      values  price
    11/25/18  a       10000
    11/30/18  b       30500
    12/4/18   a       20000
    12/5/18   b       65000
    12/5/18   a       50000
    12/6/18   b       35000
    12/6/18   c       40000
    12/6/18   a       45000
    12/6/18   a       30000
    12/7/18   b       80000
    12/7/18   c       85000
    12/7/18   a       90000
    12/9/18   b       20000
    12/12/18  a       32500
    12/12/18  c       40200
    12/13/18  b       56000
    1/9/19    a       82000
    1/9/19    c       63000
    1/9/19    b

Can you list an exception to tidyselect `everything()`?

Question:

    library(tidyverse)
    iris %>% as_tibble() %>% select(everything())
    #> # A tibble: 150 x 5
    #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #>           <dbl>       <dbl>        <dbl>       <dbl> <fct>
    #>  1          5.1         3.5          1.4         0.2 setosa
    #>  2          4.9         3            1.4         0.2 setosa
    #>  3          4.7         3.2          1.3         0.2 setosa
    #>  4          4.6         3.1          1.5         0.2 setosa
    #>  5          5           3.6          1.4         0.2 setosa
    #>  6          5.4         3.9          1.7         0.4 setosa
    #>  7          4.6         3.4          1.4         0.3 setosa
    #>  8          5           3.4          1.5         0.2 setosa
    #>  9          4.4         2.9          1.4         0.2 setosa
    #> 10          4.9         3.1          1.5         0.1 setosa
    #> # ... with 140 more rows

Say I want

Using spread with duplicate identifiers for rows

Question: I have a long-form data frame that has multiple entries for the same date and person.

    jj <- data.frame(month=rep(1:3,4),
                     student=rep(c("Amy", "Bob"), each=6),
                     A=c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                     B=c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form so that it looks like this:

    month Amy.A Bob.A Amy.B Bob.B
    1
    2
    3
    1
    2
    3
    1
    2
    3
    1
    2
    3

My question is very similar to this. I have used the code given in the answer:

    kk <- jj %>%
      gather(variable, value, -(month:student)) %>%
      unite(temp

How can I speed up a function in tidyr?

Question: I have data like this:

    n <- 1e5
    set.seed(24)
    df1 <- data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace=TRUE),
                      id.x = sample(1:n),
                      s_val = sample(paste0("F", 400:700), n, replace=TRUE),
                      id.y = sample(100:3000, n, replace=TRUE),
                      ID_col_n = sample(100:1e6, n, replace=TRUE),
                      total_id = 1:n)

I use the spread function to assign common strings, as follows:

    library(tidyr)
    res <- spread(resNik,s_val,value=query_string,fill=NA)

This works perfectly but when the data

R / tidyr::complete - filling missing values dynamically

Question: I'm using tidyr::complete() to include missing rows in a data frame with many columns, which leads to NA values. How can I instruct the fill option to replace the NA values with 0 if I don't have an explicit list of column names? Example:

    df <- data.frame(year = c(2010, 2013:2015),
                     age.21 = runif(4, 0, 10),
                     age.22 = runif(4, 0, 10),
                     age.23 = runif(4, 0, 10),
                     age.24 = runif(4, 0, 10),
                     age.25 = runif(4, 0, 10))
    # replaces missing values with NA - not what I want
    df.complete <- complete(df, year =

How to renumber result of intersection/group_indices in R?

Question: I have been struggling for a few days with renumbering the result of intersection/group_indices in R. A sample data frame is shown below:

    t <- data.frame(mid=c(102,102,102,102,102,102,102,103,103,103,103,103,103,103),
                    aid=c(10201,10202,10203,10204,10205,10206,10207,
                          10301,10302,10303,10304,10305,10306,10307),
                    dummy=c(0,1,0,1,0,1,0,0,1,0,1,0,1,0),
                    location=c(0,2,0,4,0,1,0,0,2,0,2,0,3,0)
    )

I need to update the numbers stored in the "location" field to sequential numbers by group of "mid" without changing its

R tidyr regex: extract ordered numbers from character column

Question: Suppose I have a data frame like this:

    df <- data.frame(x=c("This script outputs 10 visualizations.",
                         "This script outputs 1 visualization.",
                         "This script outputs 5 data files.",
                         "This script outputs 1 data file.",
                         "This script doesn't output any visualizations or data files",
                         "This script outputs 9 visualizations and 28 data files.",
                         "This script outputs 1 visualization and 1 data file."))

It looks like this:

      x
    1 This script outputs 10 visualizations.
    2 This script outputs 1 visualization.
    3

Are there more elegant ways to transform ragged data into a tidy dataframe?

Question: I have a dataframe that contains a column of ragged data, "topics", where each topic is a string of characters and adjacent topics are separated by a delimiter ("|" in this case):

    library(lubridate)
    events <- data.frame(
      date = dmy(c("12/6/2012", "13/7/2012", "4/8/2012")),
      days = c(1, 6, 0.5),
      name = c("Intro to stats", "Stats Winter school", "TidyR tools"),
      topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
      stringsAsFactors = FALSE
    )

The events dataframe looks

How do I fill data until last non-missing value?

Question: I have some data grouped by let, like so:

    events <- structure(list(let = c("A", "A", "A", "B", "B", "B"),
                             age = c(0L, 4L, 16L, 0L, 8L, 7L),
                             value = c(61L, 60L, 13L, 29L, 56L, 99L)),
                        class = "data.frame",
                        row.names = c("1", "2", "3", "4", "5", "6"))

      let age value
    1   A   0    61
    2   A   4    60
    3   A  16    13
    4   B   0    29
    5   B   8    56
    6   B   7    99

How can I cast the data frame so that: Age is multiple columns grouped into weeks. So for each column, take the value of the largest age that is less than or equal to 0, 7, 14, etc

Conditional string splitting in R (using tidyr)

Question: I have a data frame like this:

    X <- data.frame(value = c(1,2,3,4),
                    variable = c("cost", "cost", "reed_cost", "reed_cost"))

I'd like to split the variable column into two: one column to indicate whether the variable is a "cost" and another to indicate whether or not the variable is "reed". I cannot figure out the right regex for the split (e.g. using tidyr). If my data were something nicer, say:

    Y <- data.frame(value = c(1,2,3,4),
                    variable = c("adjusted_cost", "adjusted_cost", "reed