tidyr

Dividing column values into ranges and aggregating dates by month to count how often each range falls in that month

Submitted by 我的未来我决定 on 2020-01-06 05:44:06
Question: I have a data frame that contains a date column stored as an integer type. I also want to divide price into ranges of 10,000 and then count how many prices in each range fall in each month:

    > df
         date values price
     11/25/18      a 10000
     11/30/18      b 30500
      12/4/18      a 20000
      12/5/18      b 65000
      12/5/18      a 50000
      12/6/18      b 35000
      12/6/18      c 40000
      12/6/18      a 45000
      12/6/18      a 30000
      12/7/18      b 80000
      12/7/18      c 85000
      12/7/18      a 90000
      12/9/18      b 20000
     12/12/18      a 32500
     12/12/18      c 40200
     12/13/18      b 56000
       1/9/19      a 82000
       1/9/19      c 63000
       1/9/19      b
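A minimal sketch with dplyr, assuming the date column actually holds month/day/two-digit-year strings as printed above (not a true integer encoding) and that each range is labelled by its lower bound (for example, 32500 falls in the 30000 range). The truncated last row is left out, and the names month, range_start and counts are illustrative.

```r
library(dplyr)

# Data copied from the question (the final, truncated row is omitted).
# Assumption: `date` holds strings such as "11/25/18" (month/day/2-digit year).
df <- data.frame(
  date = c("11/25/18", "11/30/18", "12/4/18", "12/5/18", "12/5/18", "12/6/18",
           "12/6/18", "12/6/18", "12/6/18", "12/7/18", "12/7/18", "12/7/18",
           "12/9/18", "12/12/18", "12/12/18", "12/13/18", "1/9/19", "1/9/19"),
  values = c("a", "b", "a", "b", "a", "b", "c", "a", "a", "b", "c", "a",
             "b", "a", "c", "b", "a", "c"),
  price = c(10000, 30500, 20000, 65000, 50000, 35000, 40000, 45000, 30000,
            80000, 85000, 90000, 20000, 32500, 40200, 56000, 82000, 63000),
  stringsAsFactors = FALSE
)

counts <- df %>%
  mutate(
    # Parse the date, then keep only year and month, e.g. "2018-12"
    month = format(as.Date(date, format = "%m/%d/%y"), "%Y-%m"),
    # Lower bound of the 10,000-wide bucket, e.g. 32500 -> 30000
    range_start = (price %/% 10000) * 10000
  ) %>%
  # One row per (month, range) with the number of prices in it, column `n`
  count(month, range_start)

counts
```

If a wide table with one column per month is preferred, tidyr::spread(counts, month, n, fill = 0) reshapes this long count table accordingly.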

Can you list an exception to tidyselect `everything()`?

Submitted by 末鹿安然 on 2020-01-06 05:17:09
Question:

    library(tidyverse)
    iris %>%
      as_tibble() %>%
      select(everything())
    #> # A tibble: 150 x 5
    #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #>           <dbl>       <dbl>        <dbl>       <dbl> <fct>
    #>  1          5.1         3.5          1.4         0.2 setosa
    #>  2          4.9         3            1.4         0.2 setosa
    #>  3          4.7         3.2          1.3         0.2 setosa
    #>  4          4.6         3.1          1.5         0.2 setosa
    #>  5          5           3.6          1.4         0.2 setosa
    #>  6          5.4         3.9          1.7         0.4 setosa
    #>  7          4.6         3.4          1.4         0.3 setosa
    #>  8          5           3.4          1.5         0.2 setosa
    #>  9          4.4         2.9          1.4         0.2 setosa
    #> 10          4.9         3.1          1.5         0.1 setosa
    #> # ... with 140 more rows

Say I want
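The question is cut off, so this sketch assumes the goal is to keep every column except one. With dplyr's select(), a negative selection placed after everything() removes that column; negating the column on its own does the same.

```r
library(dplyr)

iris_tbl <- as_tibble(iris)

# everything() picks all columns; the trailing -Species then drops that one
no_species <- iris_tbl %>% select(everything(), -Species)

# Negating the column by itself gives the same result
no_species2 <- iris_tbl %>% select(-Species)

names(no_species)
```

If the intent is instead to move one column to the front while keeping the rest, select(Species, everything()) is the usual idiom.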

Using spread with duplicate identifiers for rows

Submitted by 假装没事ソ on 2020-01-05 08:08:10
Question: I have a long-form data frame that has multiple entries for the same date and person.

    jj <- data.frame(month = rep(1:3, 4),
                     student = rep(c("Amy", "Bob"), each = 6),
                     A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                     B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

I want to convert it to wide form and make it look like this:

    month Amy.A Bob.A Amy.B Bob.B 1 2 3 1 2 3 1 2 3 1 2 3

My question is very similar to this. I have used the code given in the answer:

    kk <- jj %>%
      gather(variable, value, -(month:student)) %>%
      unite(temp
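spread() refuses duplicate identifiers because several input rows map to the same output cell. A common workaround, sketched here under the assumption that the repeated entries should become separate output rows, is to number the repeats within each month/student pair before spreading. The helper column name row is an arbitrary choice.

```r
library(dplyr)
library(tidyr)

jj <- data.frame(month = rep(1:3, 4),
                 student = rep(c("Amy", "Bob"), each = 6),
                 A = c(9, 7, 6, 8, 6, 9, 3, 2, 1, 5, 6, 5),
                 B = c(6, 7, 8, 5, 6, 7, 5, 4, 6, 3, 1, 5))

kk <- jj %>%
  group_by(month, student) %>%
  # Number the repeated entries so each (month, student, row) is unique
  mutate(row = row_number()) %>%
  ungroup() %>%
  # Gather only the measurement columns (row must stay an identifier)
  gather(variable, value, A, B) %>%
  unite(temp, student, variable, sep = ".") %>%
  spread(temp, value) %>%
  select(-row)

kk
```

The result has one row per month and repeat, with columns Amy.A, Amy.B, Bob.A and Bob.B.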

How can I speed up a function in tidyr?

Submitted by 狂风中的少年 on 2020-01-04 09:28:21
Question: I have data like this:

    n <- 1e5
    set.seed(24)
    df1 <- data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace = TRUE),
                      id.x = sample(1:n),
                      s_val = sample(paste0("F", 400:700), n, replace = TRUE),
                      id.y = sample(100:3000, n, replace = TRUE),
                      ID_col_n = sample(100:1e6, n, replace = TRUE),
                      total_id = 1:n)

I use the spread function to assign common strings with the following call:

    library(tidyr)
    res <- spread(resNik, s_val, value = query_string, fill = NA)

This works perfectly, but when the data
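The question is truncated before it says what goes wrong at scale, so this only sketches a commonly suggested alternative: data.table's dcast(), which performs the same long-to-wide reshape. It rebuilds the question's df1 with a smaller n to keep the example light and assumes the data.table package is installed; no timing results are claimed.

```r
library(data.table)

n <- 1e4  # smaller than the question's 1e5, just to keep the sketch quick
set.seed(24)
df1 <- data.frame(query_string = sample(sprintf("%06d", 100:1000), n, replace = TRUE),
                  id.x = sample(1:n),
                  s_val = sample(paste0("F", 400:700), n, replace = TRUE),
                  id.y = sample(100:3000, n, replace = TRUE),
                  ID_col_n = sample(100:1e6, n, replace = TRUE),
                  total_id = 1:n,
                  stringsAsFactors = FALSE)

dt <- as.data.table(df1)

# Same reshape as spread(df1, s_val, query_string): the remaining columns
# identify rows, each distinct s_val becomes a column, missing cells are NA
res_dt <- dcast(dt, id.x + id.y + ID_col_n + total_id ~ s_val,
                value.var = "query_string")

dim(res_dt)
```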

R / tidyr::complete - filling missing values dynamically

Submitted by 倾然丶 夕夏残阳落幕 on 2020-01-04 08:17:07
Question: I'm using tidyr::complete() to include missing rows in a data frame with many columns, which produces NA values. How can I instruct the fill option to replace the NA values with 0 if I don't have an explicit list of column names? Example:

    df <- data.frame(year = c(2010, 2013:2015),
                     age.21 = runif(4, 0, 10),
                     age.22 = runif(4, 0, 10),
                     age.23 = runif(4, 0, 10),
                     age.24 = runif(4, 0, 10),
                     age.25 = runif(4, 0, 10))
    # replaces missing values with NA - not what I want
    df.complete <- complete(df, year =
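A sketch assuming the goal is every year from the first to the last observed one: build the fill list from the column names instead of typing them out, then pass it to complete(). full_seq() and the fill argument are tidyr's own; value_cols and fill_zero are hypothetical helper names.

```r
library(tidyr)

set.seed(1)
df <- data.frame(year = c(2010, 2013:2015),
                 age.21 = runif(4, 0, 10),
                 age.22 = runif(4, 0, 10),
                 age.23 = runif(4, 0, 10),
                 age.24 = runif(4, 0, 10),
                 age.25 = runif(4, 0, 10))

# One named list entry per non-key column, each with replacement value 0
value_cols <- setdiff(names(df), "year")
fill_zero <- setNames(as.list(rep(0, length(value_cols))), value_cols)

# full_seq(year, 1) expands 2010, 2013:2015 to 2010:2015
df.complete <- complete(df, year = full_seq(year, 1), fill = fill_zero)

df.complete
```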

How to renumber result of intersection/group_indices in R?

Submitted by 十年热恋 on 2020-01-04 07:36:10
Question: For a few days I have been struggling to renumber the result of intersection/group_indices in R. A sample data frame is shown below:

    t <- data.frame(mid = c(102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103),
                    aid = c(10201, 10202, 10203, 10204, 10205, 10206, 10207,
                            10301, 10302, 10303, 10304, 10305, 10306, 10307),
                    dummy = c(0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0),
                    location = c(0, 2, 0, 4, 0, 1, 0, 0, 2, 0, 2, 0, 3, 0))

I need to update the numbers stored in the "location" field to sequential numbers within each group of "mid" without changing its
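The requirement is cut off, so this sketch assumes one interpretation: within each mid, zeros stay 0 and the non-zero locations are renumbered 1, 2, 3 and so on in order of first appearance, with repeated values sharing a number. match() against unique() does the renumbering; new_location is a hypothetical column name.

```r
library(dplyr)

t <- data.frame(mid = c(102, 102, 102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103),
                aid = c(10201, 10202, 10203, 10204, 10205, 10206, 10207,
                        10301, 10302, 10303, 10304, 10305, 10306, 10307),
                dummy = c(0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0),
                location = c(0, 2, 0, 4, 0, 1, 0, 0, 2, 0, 2, 0, 3, 0))

t2 <- t %>%
  group_by(mid) %>%
  # Position of each non-zero location among the group's distinct
  # non-zero values, in order of first appearance; zeros are kept as 0
  mutate(new_location = ifelse(location == 0, 0L,
                               match(location, unique(location[location != 0])))) %>%
  ungroup()

t2$new_location
```

For mid 102 the non-zero values 2, 4, 1 become 1, 2, 3; for mid 103 the values 2, 2, 3 become 1, 1, 2.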

R tidyr regex: extract ordered numbers from character column

Submitted by 瘦欲@ on 2020-01-04 05:18:07
Question: Suppose I have a data frame like this:

    df <- data.frame(x = c("This script outputs 10 visualizations.",
                           "This script outputs 1 visualization.",
                           "This script outputs 5 data files.",
                           "This script outputs 1 data file.",
                           "This script doesn't output any visualizations or data files",
                           "This script outputs 9 visualizations and 28 data files.",
                           "This script outputs 1 visualization and 1 data file."))

It looks like this:

      x
    1 This script outputs 10 visualizations.
    2 This script outputs 1 visualization.
    3
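A sketch that swaps in stringr rather than tidyr: str_extract() with a lookahead pulls the number immediately preceding "visualization" or "data file", leaving NA when a sentence has no such count. The output column names visualizations and data_files are illustrative.

```r
library(dplyr)
library(stringr)

df <- data.frame(x = c("This script outputs 10 visualizations.",
                       "This script outputs 1 visualization.",
                       "This script outputs 5 data files.",
                       "This script outputs 1 data file.",
                       "This script doesn't output any visualizations or data files",
                       "This script outputs 9 visualizations and 28 data files.",
                       "This script outputs 1 visualization and 1 data file."),
                 stringsAsFactors = FALSE)

df2 <- df %>%
  mutate(
    # Digits followed by " visualization" (singular or plural)
    visualizations = as.integer(str_extract(x, "\\d+(?= visualization)")),
    # Digits followed by " data file" (singular or plural)
    data_files = as.integer(str_extract(x, "\\d+(?= data file)"))
  )

df2[, c("visualizations", "data_files")]
```

If a missing count should mean zero, tidyr::replace_na(df2, list(visualizations = 0L, data_files = 0L)) converts those NAs.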

Are there more elegant ways to transform ragged data into a tidy dataframe?

Submitted by 巧了我就是萌 on 2020-01-04 03:45:08
Question: I have a data frame that contains a column of ragged data, "topics", where each topic is a string of characters and adjacent topics are separated by a delimiter ("|" in this case):

    library(lubridate)
    events <- data.frame(
      date = dmy(c("12/6/2012", "13/7/2012", "4/8/2012")),
      days = c(1, 6, 0.5),
      name = c("Intro to stats", "Stats Winter school", "TidyR tools"),
      topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
      stringsAsFactors = FALSE
    )

The events dataframe looks
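tidyr's separate_rows() turns each delimited topic into its own row while repeating the other columns, which is one tidy long form of this data. In this sketch as.Date() stands in for lubridate::dmy() purely to avoid the extra dependency; the dates are the same.

```r
library(tidyr)

events <- data.frame(
  # as.Date() replaces lubridate::dmy() here; same dates as in the question
  date = as.Date(c("2012-06-12", "2012-07-13", "2012-08-04")),
  days = c(1, 6, 0.5),
  name = c("Intro to stats", "Stats Winter school", "TidyR tools"),
  topics = c("probability|R", "R|regression|ggplot", "tidyR|dplyr"),
  stringsAsFactors = FALSE
)

# sep is a regular expression, so the pipe must be escaped
tidy_events <- separate_rows(events, topics, sep = "\\|")

tidy_events
```

The three events expand to 2 + 3 + 2 = 7 rows, one per (event, topic) pair.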

How do I fill data until last non-missing value?

Submitted by 梦想与她 on 2020-01-02 20:42:08
Question: I have some data grouped by let, like so:

    events <- structure(list(let = c("A", "A", "A", "B", "B", "B"),
                             age = c(0L, 4L, 16L, 0L, 8L, 7L),
                             value = c(61L, 60L, 13L, 29L, 56L, 99L)),
                        class = "data.frame",
                        row.names = c("1", "2", "3", "4", "5", "6"))

      let age value
    1   A   0    61
    2   A   4    60
    3   A  16    13
    4   B   0    29
    5   B   8    56
    6   B   7    99

How can I cast the data frame so that age becomes multiple columns grouped into weeks? That is, for each column, take the value of the largest age that is less than or equal to 0, 7, 14, etc
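One reading of the question, sketched with dplyr and tidyr: pair every row with each weekly cutoff (a plain cross join via base merge()), keep ages at or below the cutoff, take the value at the largest remaining age per let and cutoff, then spread the cutoffs into columns. Stopping the cutoffs at the largest observed age (16, so 0, 7 and 14) is an assumption about what "until last non-missing value" means; cutoffs and weekly are illustrative names.

```r
library(dplyr)
library(tidyr)

events <- structure(list(let = c("A", "A", "A", "B", "B", "B"),
                         age = c(0L, 4L, 16L, 0L, 8L, 7L),
                         value = c(61L, 60L, 13L, 29L, 56L, 99L)),
                    class = "data.frame",
                    row.names = c("1", "2", "3", "4", "5", "6"))

# Weekly cutoffs up to the largest observed age: 0, 7, 14 for this data
cutoffs <- seq(0, max(events$age), by = 7)

weekly <- merge(events, data.frame(cutoff = cutoffs), by = NULL) %>%  # cross join
  filter(age <= cutoff) %>%
  group_by(let, cutoff) %>%
  # Keep the row with the largest age that does not exceed the cutoff
  slice(which.max(age)) %>%
  ungroup() %>%
  select(let, cutoff, value) %>%
  spread(cutoff, value)

weekly
```

For let A the columns 0, 7 and 14 hold 61, 60 and 60; for let B they hold 29, 99 and 56.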

conditional string splitting in R (using tidyr)

Submitted by 廉价感情. on 2020-01-02 07:21:15
Question: I have a data frame like this:

    X <- data.frame(value = c(1, 2, 3, 4),
                    variable = c("cost", "cost", "reed_cost", "reed_cost"))

I'd like to split the variable column into two: one column to indicate whether the variable is a "cost" and another to indicate whether it is "reed". I cannot figure out the right regex for the split (e.g. using tidyr). If my data were nicer, say:

    Y <- data.frame(value = c(1, 2, 3, 4),
                    variable = c("adjusted_cost", "adjusted_cost", "reed
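A sketch that sidesteps the regex: split on the underscore and let tidyr's separate() pad short values on the left (fill = "left"), so "cost" gets a missing prefix while "reed_cost" yields "reed" and "cost". The column names prefix, type and is_reed are illustrative.

```r
library(dplyr)
library(tidyr)

X <- data.frame(value = c(1, 2, 3, 4),
                variable = c("cost", "cost", "reed_cost", "reed_cost"),
                stringsAsFactors = FALSE)

X2 <- X %>%
  # Values with only one piece get NA in the first (left) column
  separate(variable, into = c("prefix", "type"), sep = "_", fill = "left") %>%
  # %in% returns FALSE (not NA) for the missing prefixes
  mutate(is_reed = prefix %in% "reed")

X2
```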