dplyr | 易学教程

conditional conversion from character to date for a dataframe column in r

Question: I have a dataframe I read from an Excel file, like below. The Date column turned out to be in either a 5-digit format or a date string format.

    df = data.frame(Date = c('42195', '3/31/2016', '42198'), Value = c(123, 445, 222))

    Date      Value
    42195     123
    3/31/2016 445
    42198     222

I want to clean up the column and convert everything into date format. I did the following.

    df %>% mutate(Date = ifelse(length(Date)==5,as.Date(Date, origin = '1899-12-30'), as.Date(Date) ))

I got an error like this: Error in charToDate(x) :
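
Three things in the attempt above are likely to cause trouble: `length(Date)` returns the number of elements in the column (3 here) rather than the number of characters in each value, `as.Date()` without a `format` cannot parse strings such as '42195', and `ifelse()` drops the Date class from its result. The following is a minimal sketch of one possible fix, not the accepted answer from the original thread. It assumes the date strings are always month/day/year and that the 5-digit values are Excel serial numbers; the names `is_serial` and `cleaned` are illustrative.

```r
library(dplyr)

df <- data.frame(
  Date = c("42195", "3/31/2016", "42198"),
  Value = c(123, 445, 222),
  stringsAsFactors = FALSE  # keep Date as character on R versions before 4.0
)

# TRUE for values made of exactly five digits (assumed to be Excel serials)
is_serial <- grepl("^[0-9]{5}$", df$Date)

cleaned <- df %>%
  mutate(Date = if_else(
    is_serial,
    # Serial numbers: days since 1899-12-30; non-numeric strings become NA here
    as.Date(suppressWarnings(as.numeric(Date)), origin = "1899-12-30"),
    # Date strings: assumed month/day/year; serial strings become NA here
    as.Date(Date, format = "%m/%d/%Y")
  ))

print(cleaned)
```

Both branches are evaluated for every row, so each one has to return NA rather than raise an error on the values it does not handle; `if_else()` then picks the matching branch per row and keeps the Date class.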

multidplyr: trial custom function

Question: I'm trying to learn to run a custom function through multidplyr::do() on a cluster. Consider this simple, self-contained example. For example's sake, I'm trying to apply my custom function myWxTest to each common_dest (destinations with more than 50 flights) in the flights dataset:

    library(dplyr)
    library(multidplyr)
    library(nycflights13)
    library(quantreg)

    myWxTest <- function(x){
      stopifnot(!is.null(x$dep_time))
      stopifnot(!is.null(x$dep_delay))
      stopifnot(!is.null(x$sched_dep_time))
      stopifnot(!is

Calculate differences based on categorical column with tidyverse

Question: I have the following data frame:

    library(tidyverse)
    df <- data.frame(
      vars = rep(letters[1:2], 3),
      value = c(10,12,15,19,22,23),
      phase = rep(factor(c("pre","post1","post2"), levels = c("pre","post1","post2")),2)
    ) %>% arrange(vars,phase)

I would like to calculate the following differences in value:

    post1 - pre
    post2 - post1
    post2 - pre

for each var (i.e., a and b). What would be the most efficient way of achieving this using tidyverse? Expected outcome:

    vars x diffs
    a post1 - pre
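
One way to approach the three differences is to spread the phases into columns, subtract, and reshape back to long form. The sketch below is only an illustration under that assumption, not the answer from the original thread. The output column names `x` and `diffs` follow the visible part of the expected outcome, and the result name `diffs_long` is made up.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  vars = rep(letters[1:2], 3),
  value = c(10, 12, 15, 19, 22, 23),
  phase = rep(factor(c("pre", "post1", "post2"),
                     levels = c("pre", "post1", "post2")), 2)
) %>% arrange(vars, phase)

diffs_long <- df %>%
  # One row per var, one column per phase
  pivot_wider(names_from = phase, values_from = value) %>%
  # Compute the three requested differences as named columns
  transmute(
    vars,
    `post1 - pre` = post1 - pre,
    `post2 - post1` = post2 - post1,
    `post2 - pre` = post2 - pre
  ) %>%
  # Back to long form: one row per var and comparison
  pivot_longer(-vars, names_to = "x", values_to = "diffs")

print(diffs_long)
```

Writing the comparisons out by hand is manageable for three phases; with many phases, generating the pairs programmatically (for example with `combn()`) would scale better.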

How can I mutate multiple variables using dplyr?

Question: Given a tbl_df object df containing multiple variables (i.e. Var.50, Var.100, Var.150 and Var.200), each measured twice (i.e. P1 and P2), I want to mutate a new set of the same variables from the repeated measurements (for example, average P1 and P2, creating P3 for each corresponding variable). Similar questions have been asked before, but there do not seem to be clear answers using dplyr. Example data:

    df <- structure(list(P1.Var.50 = c(134.242050170898, 52.375, 177.126017252604 ), P1.Var.100 =

categorize based on date ranges in R

Question: How do I categorize each row in a large R dataframe (>2 million rows) based on date range definitions in a separate, much smaller R dataframe (12 rows)? My large dataframe, captures, looks similar to this when called via head(captures):

           id       date sex
    1  160520 2016-11-22   1
    2 1029735 2016-11-12   1
    3 1885200 2016-11-05   1
    4 2058366 2015-09-26   2
    5 2058367 2015-09-26   1
    6 2058368 2015-09-26   1

My small dataframe, seasons, looks similar to this in its entirety:

    Season Opening.Date Closing.Date
    2016 2016

Compute relative frequencies with group totals using dplyr

Question: I have the following toy data:

    data <- structure(list(
      value = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
      class = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L),
                        .Label = c("A", "B"), class = "factor")),
      .Names = c("value", "class"), class = "data.frame", row.names = c(NA, -16L))

Using the commands:

    data <- table(data$class, data$value)
    data <- as.data.frame(data)
    data$rel_freq <- data$Freq / aggregate(Freq ~ Var1, FUN = sum, data = data)

Cumulative aggregates within tidyverse

Question: Say I have a tibble (or data.table) which consists of two columns:

    a <- tibble(id = rep(c("A", "B"), each = 6), val = c(1, 0, 0, 1 ,0,1,0,0,0,1,1,1))

Furthermore, I have a function called myfun which takes a numeric vector of arbitrary length as input and returns a single number. For example, you can think of myfun as being the standard deviation. Now I would like to add a third column to my tibble (called result) which contains the outputs of myfun applied to val cumulated and grouped
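
The excerpt is cut off before it names the grouping column, so the sketch below assumes the grouping is by `id` and uses `sd` as a stand-in for `myfun`, as the question suggests. It applies the function to each growing prefix of `val` within a group. This is one plain dplyr approach, not necessarily the one given in the original thread, and the result name `res` is illustrative.

```r
library(dplyr)

a <- tibble(id = rep(c("A", "B"), each = 6),
            val = c(1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1))

# Stand-in for the user's function: any numeric vector -> single number
myfun <- sd

res <- a %>%
  group_by(id) %>%  # assumption: the cumulation restarts for each id
  mutate(result = vapply(
    seq_along(val),
    # Apply myfun to the first i values of val within the current group
    function(i) myfun(val[seq_len(i)]),
    numeric(1)
  )) %>%
  ungroup()

print(res)
```

With `sd`, the first row of each group is NA because the standard deviation of a single value is undefined. The prefix is recomputed on every row, so the cost grows quadratically with group size, which may matter for long groups.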

Running multiple simple linear regressions from a nested dataframe/tibble

Question: I am trying to run multiple simple linear regressions based on data from a nested data frame and store the regression fit coefficients in a dataframe using tidy(). My code block is as follows:

    library(tidyverse)
    library(broom)
    library(reshape2)
    library(dplyr)
    Factors <- as.factor(c("A","B","C","D"))
    set.seed(5)
    DF <- data.frame(Factors, X = rnorm(4), Y = rnorm(4), Z= rnorm(4))
    MDF <- melt(DF, id.vars=c("Factors","X"))
    DFF <- MDF %>% nest(-Factors)

If it is a single dataframe with many columns,

Grouped pivot_longer dplyr

Question: This is an example dataframe. My real dataframe is larger. I strongly prefer a tidyverse solution.

    #my data
    age <- c(18,18,19)
    A1 <- c(3,5,3)
    A2 <- c(4,4,3)
    B1 <- c(1,5,2)
    B2 <- c(2,2,5)
    df <- data.frame(age, A1, A2, B1, B2)

I want my data to look like this:

    #what i want
    new_age <- c(18,18,18,18,19,19)
    A <- c(3,5,4,4,3,3)
    B <- c(1,5,2,2,2,5)
    new_df <- data.frame(new_age, A, B)

I want to pivot longer and stack columns A1:A2 into column A, and B1:B2 into B. I also want to have the responses to
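
The stacking described above can be sketched with the `.value` sentinel of `tidyr::pivot_longer()`, which splits each column name into a target column (A or B) and a remainder. The example below reproduces only the target layout shown in `new_df`; the final requirement is cut off in the excerpt and is not addressed. The helper columns `row` and `set` and the result name `long` are illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  age = c(18, 18, 19),
  A1 = c(3, 5, 3), A2 = c(4, 4, 3),
  B1 = c(1, 5, 2), B2 = c(2, 2, 5)
)

long <- df %>%
  mutate(row = row_number()) %>%      # remember the original row order
  pivot_longer(
    cols = c(A1, A2, B1, B2),
    # The letter becomes the output column, the digit goes into `set`
    names_to = c(".value", "set"),
    names_pattern = "([A-Z])(\\d)"
  ) %>%
  # Within each age: all set-1 values first, then set-2, in original row order
  arrange(age, set, row) %>%
  select(age, A, B)

print(long)
```

The explicit `row` column makes the ordering within each age deterministic instead of relying on how ties are handled during sorting.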

conditionally duplicating rows in a data frame

Question: This is a sample of my data set:

      day city count
    1   1    A    50
    2   2    A   100
    3   2    B   110
    4   2    C    90

Here is the code for reproducing it:

    df <- data.frame(
      day = c(1,2,2,2),
      city = c("A","A","B","C"),
      count = c(50,100,110,90)
    )

As you can see, the count data is missing for cities B and C on day 1. What I want to do is use city A's count as an estimate for the other two cities. So the desired output would be:

      day city count
    1   1    A    50
    2   1    B    50
    3   1    C    50
    4   2    A   100
    5   2    B   110
    6   2    C    90

I could come up with a
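
One possible way to get the desired output is to create the missing day/city combinations with `tidyr::complete()` and then fill each gap with city A's count for the same day. This is a sketch under the assumption that city A is present on every day. It is not the asker's own attempt, which is cut off above, and the result name `filled` is illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  day = c(1, 2, 2, 2),
  city = c("A", "A", "B", "C"),
  count = c(50, 100, 110, 90),
  stringsAsFactors = FALSE
)

filled <- df %>%
  # Add a row (with NA count) for every day/city pair that is missing
  complete(day, city) %>%
  group_by(day) %>%
  # Where count is NA, fall back to city A's count on the same day
  mutate(count = coalesce(count, count[city == "A"])) %>%
  ungroup()

print(filled)
```

If city A could itself be missing on some day, the fallback would stay NA for that day and a different estimate would be needed.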