dplyr

conditional conversion from character to date for a dataframe column in r

时光总嘲笑我的痴心妄想 submitted on 2021-01-28 11:20:44
Question: I have a dataframe that I read from an Excel file, like the one below. The Date column came out as either a 5-digit number or a date string.

    df = data.frame(Date = c('42195', '3/31/2016', '42198'), Value = c(123, 445, 222))

    Date       Value
    42195      123
    3/31/2016  445
    42198      222

I want to clean up the column and convert everything into date format. I did the following:

    df %>% mutate(Date = ifelse(length(Date)==5,as.Date(Date, origin = '1899-12-30'), as.Date(Date) ))

I got an error like this: Error in charToDate(x) :
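A minimal sketch of one way to approach the question above, not necessarily the accepted answer. Three things in the original attempt look problematic: `length(Date)` returns the number of elements in the column rather than the number of characters in each string, `ifelse()` drops the Date class, and `as.Date(Date)` without a `format` fails on strings such as '42195'. The sketch assumes every non-numeric entry is in month/day/year form; the helper column `serial` is a name introduced here for illustration.

```r
library(dplyr)

df <- data.frame(Date = c("42195", "3/31/2016", "42198"),
                 Value = c(123, 445, 222),
                 stringsAsFactors = FALSE)

df_clean <- df %>%
  mutate(
    # Excel serial numbers parse as numeric; date strings become NA
    # (the coercion warning is expected, so it is suppressed).
    serial = suppressWarnings(as.numeric(Date)),
    # if_else() keeps the Date class, unlike base ifelse().
    Date = if_else(
      !is.na(serial),
      as.Date(serial, origin = "1899-12-30"),   # Excel (Windows) day origin
      as.Date(Date, format = "%m/%d/%Y")        # assumed m/d/Y string format
    )
  ) %>%
  select(-serial)

print(df_clean)
```

Both branches of `if_else()` are evaluated for every row, which is why the string branch needs an explicit `format`: with it, unparseable entries become NA instead of raising an error.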

multidplyr: trial custom function

早过忘川 submitted on 2021-01-28 09:51:09
Question: I'm trying to learn to run a custom function through multidplyr::do() on a cluster. Consider this simple, self-contained example. For example's sake, I'm trying to apply my custom function myWxTest to each common_dest (destinations with more than 50 flights) in the flight dataset:

    library(dplyr)
    library(multidplyr)
    library(nycflights13)
    library(quantreg)
    myWxTest <- function(x){
      stopifnot(!is.null(x$dep_time))
      stopifnot(!is.null(x$dep_delay))
      stopifnot(!is.null(x$sched_dep_time))
      stopifnot(!is

Calculate differences based on categorical column with tidyverse

狂风中的少年 submitted on 2021-01-28 08:25:16
Question: I have the following data frame:

    library(tidyverse)
    df <- data.frame(
      vars = rep(letters[1:2], 3),
      value = c(10,12,15,19,22,23),
      phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
    ) %>% arrange(vars,phase)

I would like to calculate the following differences in value for each var (i.e., a and b):

    post1 - pre
    post2 - post1
    post2 - pre

What would be the most efficient way of achieving this using tidyverse? Expected outcome:

    vars  x            diffs
    a     post1 - pre
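One possible approach to the question above, offered as a sketch rather than the definitive answer: spread the phases into columns, compute the three differences, then reshape back to the long `vars` / `x` / `diffs` layout the question asks for. It assumes each `vars` value has exactly one row per phase.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  vars  = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(factor(c("pre", "post1", "post2"),
                     levels = c("pre", "post1", "post2")), 2)
) %>%
  arrange(vars, phase)

diffs <- df %>%
  # One row per var, one column per phase.
  pivot_wider(names_from = phase, values_from = value) %>%
  # Compute the three requested contrasts.
  transmute(
    vars,
    `post1 - pre`   = post1 - pre,
    `post2 - post1` = post2 - post1,
    `post2 - pre`   = post2 - pre
  ) %>%
  # Back to long form: one row per var and contrast.
  pivot_longer(-vars, names_to = "x", values_to = "diffs")

print(diffs)
```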

How can I mutate multiple variables using dplyr?

旧城冷巷雨未停 submitted on 2021-01-28 08:15:52
Question: Given a tbl_df object df containing multiple variables (i.e. Var.50, Var.100, Var.150 and Var.200), each measured twice (i.e. P1 and P2), I want to mutate a new set of the same variables from the repeated measurements (for example, average P1 and P2, creating P3 for each corresponding variable). Similar questions have been asked before, but there do not seem to be clear answers using dplyr. Example data:

    df <- structure(list(P1.Var.50 = c(134.242050170898, 52.375, 177.126017252604 ), P1.Var.100 =

categorize based on date ranges in R

这一生的挚爱 submitted on 2021-01-28 07:55:26
Question: How do I categorize each row in a large R dataframe (>2 million rows) based on date range definitions in a separate, much smaller R dataframe (12 rows)? My large dataframe, captures, looks similar to this when called via head(captures):

           id       date sex
    1  160520 2016-11-22   1
    2 1029735 2016-11-12   1
    3 1885200 2016-11-05   1
    4 2058366 2015-09-26   2
    5 2058367 2015-09-26   1
    6 2058368 2015-09-26   1

My small dataframe, seasons, looks similar to this in its entirety:

    Season Opening.Date Closing.Date
    2016   2016
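A hedged sketch of one way to tackle the question above. Because the seasons table is tiny, looping over its rows and doing one vectorised comparison per season stays cheap even for millions of capture rows. The `captures` rows are taken from the question; the `seasons` values are made up for illustration, since the original table is cut off, and the sketch assumes the date ranges do not overlap. This is plain base R rather than dplyr.

```r
captures <- data.frame(
  id   = c(160520, 1029735, 1885200, 2058366, 2058367, 2058368),
  date = as.Date(c("2016-11-22", "2016-11-12", "2016-11-05",
                   "2015-09-26", "2015-09-26", "2015-09-26")),
  sex  = c(1, 1, 1, 2, 1, 1)
)

# Hypothetical season definitions (NOT from the original post).
seasons <- data.frame(
  Season       = c(2015L, 2016L),
  Opening.Date = as.Date(c("2015-09-01", "2016-09-01")),
  Closing.Date = as.Date(c("2016-01-31", "2017-01-31"))
)

# Rows outside every range keep NA.
captures$Season <- NA_integer_
for (i in seq_len(nrow(seasons))) {
  # One vectorised comparison over all capture rows per season.
  in_range <- captures$date >= seasons$Opening.Date[i] &
              captures$date <= seasons$Closing.Date[i]
  captures$Season[in_range] <- seasons$Season[i]
}

print(captures)
```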

Compute relative frequencies with group totals using dplyr

心不动则不痛 submitted on 2021-01-28 06:43:51
Question: I have the following toy data:

    data <- structure(list(value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor")), .Names = c("value", "class"), class = "data.frame", row.names = c(NA, -16L))

Using the commands:

    data <- table(data$class, data$value)
    data <- as.data.frame(data)
    data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)
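A sketch of a dplyr route for the question above, assuming the goal is each value's share within its class (the question text is cut off, so this reading is a guess). The toy data is rebuilt with `data.frame()` but holds the same values as the `structure()` call in the question.

```r
library(dplyr)

data <- data.frame(
  value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  class = factor(c("A", "A", "A", "B", "B", "B", "A", "A",
                   "A", "A", "B", "B", "A", "A", "A", "A"))
)

freq <- data %>%
  # Count rows per class/value combination.
  count(class, value, name = "Freq") %>%
  # Divide each count by its class total.
  group_by(class) %>%
  mutate(rel_freq = Freq / sum(Freq)) %>%
  ungroup()

print(freq)
```

Unlike `table()`, `count()` here leaves out combinations that never occur (class B with value 3), so the result has five rows rather than six.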

Cumulative aggregates within tidyverse

◇◆丶佛笑我妖孽 submitted on 2021-01-28 05:51:51
Question: Say I have a tibble (or data.table) which consists of two columns:

    a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1 ,0,1,0,0,0,1,1,1))

Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation. Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val cumulated and grouped
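A minimal sketch for the question above, assuming "cumulated and grouped" means applying myfun to the first 1, 2, ..., n values of val within each id. Here myfun is set to `sd` purely as a stand-in, following the question's own suggestion.

```r
library(dplyr)

a <- tibble(id  = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Stand-in for the user's function: any numeric vector -> single number.
myfun <- sd

res <- a %>%
  group_by(id) %>%
  mutate(
    # For row i of a group, apply myfun to val[1..i] of that group.
    result = vapply(seq_along(val),
                    function(i) myfun(val[seq_len(i)]),
                    numeric(1))
  ) %>%
  ungroup()

print(res)
```

This recomputes myfun on a growing prefix for every row, so the cost grows quadratically with group size. That is fine for small groups; for long ones a specialised cumulative function would be faster.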

Running multiple simple linear regressions from a nested dataframe/tibble

不问归期 submitted on 2021-01-28 04:26:50
Question: I am trying to run multiple simple linear regressions based on data from a nested data frame and store the regression fit coefficients in a dataframe using tidy(). My code block is as follows:

    library(tidyverse)
    library(broom)
    library(reshape2)
    library(dplyr)
    Factors <- as.factor(c("A","B","C","D"))
    set.seed(5)
    DF <- data.frame(Factors, X = rnorm(4), Y = rnorm(4), Z= rnorm(4))
    MDF <- melt(DF, id.vars=c("Factors","X"))
    DFF <- MDF %>% nest(-Factors)

If it is a single dataframe with many columns,

Grouped pivot_longer dplyr

孤人 submitted on 2021-01-28 04:13:12
Question: This is an example dataframe. My real dataframe is larger. I strongly prefer a tidyverse solution.

    #my data
    age <- c(18,18,19)
    A1 <- c(3,5,3)
    A2 <- c(4,4,3)
    B1 <- c(1,5,2)
    B2 <- c(2,2,5)
    df <- data.frame(age, A1, A2, B1, B2)

I want my data to look like this:

    #what i want
    new_age <- c(18,18,18,18,19,19)
    A <- c(3,5,4,4,3,3)
    B <- c(1,5,2,2,2,5)
    new_df <- data.frame(new_age, A, B)

I want to pivot longer and stack columns A1:A2 into column A, and B1:B2 into B. I also want to have the responses to
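One way to get the layout requested above, sketched with tidyr's `.value` sentinel: the letter part of each column name becomes the output column (A or B), and the digit part becomes a helper column. The names `row` and `item` are introduced here for illustration; they only reproduce the row order shown in the question and are dropped at the end.

```r
library(dplyr)
library(tidyr)

df <- data.frame(age = c(18, 18, 19),
                 A1 = c(3, 5, 3), A2 = c(4, 4, 3),
                 B1 = c(1, 5, 2), B2 = c(2, 2, 5))

new_df <- df %>%
  # Remember the original row so ordering is deterministic.
  mutate(row = row_number()) %>%
  pivot_longer(
    cols = c(A1, A2, B1, B2),
    # ".value" sends the letter to the output column name,
    # the digit goes into the helper column "item".
    names_to = c(".value", "item"),
    names_pattern = "([A-Z])([0-9])"
  ) %>%
  # Within each age: all item-1 responses first, then item-2.
  arrange(age, item, row) %>%
  select(new_age = age, A, B)

print(new_df)
```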

conditionally duplicating rows in a data frame

…衆ロ難τιáo~ submitted on 2021-01-28 03:12:23
Question: This is a sample of my data set:

      day city count
    1   1    A    50
    2   2    A   100
    3   2    B   110
    4   2    C    90

Here is the code for reproducing it:

    df <- data.frame(
      day = c(1,2,2,2),
      city = c("A","A","B","C"),
      count = c(50,100,110,90)
    )

As you can see, the count data is missing for cities B and C on day 1. What I want to do is use city A's count as an estimate for the other two cities. So the desired output would be:

      day city count
    1   1    A    50
    2   1    B    50
    3   1    C    50
    4   2    A   100
    5   2    B   110
    6   2    C    90

I could come up with a
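A sketch of one approach to the question above: use `tidyr::complete()` to create the missing day/city rows, then fill each missing count with city A's count from the same day. It assumes city A has exactly one row for every day; the result name `filled` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  day   = c(1, 2, 2, 2),
  city  = c("A", "A", "B", "C"),
  count = c(50, 100, 110, 90),
  stringsAsFactors = FALSE
)

filled <- df %>%
  # Add a row for every day/city combination; new rows get count = NA.
  complete(day, city) %>%
  group_by(day) %>%
  # Replace missing counts with city A's count for that day.
  mutate(count = if_else(is.na(count), count[city == "A"], count)) %>%
  ungroup()

print(filled)
```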