dplyr | 易学教程

r: coding dummy variables based-on max value for each month

Question: I want to code a new variable called df$dummy based on the max value in df$var1 for each df$month, where the value will be 1 for the max value and 0 for every other value. See the reproducible data set:

    df <- data.frame(date = seq.Date(from = as.Date('2017-01-01'), by = 7, length.out = 20), var1 = rnorm(20, 5, 3))
    df$month <- as.numeric(strftime(df$date, "%m"))

I'm having trouble conceptualizing the conditions for the function. In Excel I would just use the maxif function and specify my criteria. My …
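One possible approach, offered as a minimal sketch rather than the accepted answer: group by month and compare each value with its group maximum. The `set.seed()` call is an assumption added only to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, not part of the original question

# Reproducible data from the question
df <- data.frame(date = seq.Date(from = as.Date("2017-01-01"), by = 7, length.out = 20),
                 var1 = rnorm(20, 5, 3))
df$month <- as.numeric(strftime(df$date, "%m"))

# Within each month, flag the row whose var1 equals that month's maximum
df <- df %>%
  group_by(month) %>%
  mutate(dummy = as.integer(var1 == max(var1))) %>%
  ungroup()
```

If several rows tie for the maximum within a month, every tied row gets a 1 under this sketch.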

R: Plotting Multiple Confidence Intervals on the Same Graph

Question: I am using the R programming language. I am trying to learn how to plot multiple time series on the same graph, including their confidence intervals (in this case, their maximum and minimum values). Suppose I have two time series like this:

    library(xts)
    library(ggplot2)
    library(dplyr)
    library(plotly)
    library(lubridate)

    # time series 1
    date_decision_made = seq(as.Date("2014/1/1"), as.Date("2016/1/1"), by = "day")
    property_damages_in_dollars <- rnorm(731, 100, 10)
    final_data <- data.frame …

dplyr `pivot_longer()` object not found but it's right there?

Question:

    library(tidyverse)
    df <- tibble(Date = as.Date(c("2020-01-01", "2020-01-02")),
                 Shop = c("Store A", "Store B"),
                 Employees = c(5, 10),
                 Sales = c(1000, 3000))
    #> # A tibble: 2 x 4
    #>   Date       Shop    Employees Sales
    #>   <date>     <chr>       <dbl> <dbl>
    #> 1 2020-01-01 Store A         5  1000
    #> 2 2020-01-02 Store B        10  3000

I'm switching from dplyr spread/gather to pivot_* following the dplyr reference guide. I want to gather the "Employees" and "Sales" columns in the following manner:

    df %>% pivot_longer(-Date, -Shop, …
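A likely cause, judging only from the visible part of the call: `pivot_longer()` (from tidyr) takes its whole column selection in the single `cols` argument, so a second unnamed selection such as `-Shop` is not treated as part of that selection. The sketch below combines both exclusions into one `cols` expression. The output column names `variable` and `value` are assumptions chosen for illustration, since the rest of the original call is cut off.

```r
library(dplyr)
library(tidyr)

df <- tibble(
  Date = as.Date(c("2020-01-01", "2020-01-02")),
  Shop = c("Store A", "Store B"),
  Employees = c(5, 10),
  Sales = c(1000, 3000)
)

# Put every exclusion inside one selection passed to `cols`.
# "variable" and "value" are hypothetical output names.
long <- df %>%
  pivot_longer(cols = -c(Date, Shop), names_to = "variable", values_to = "value")
```

Selecting the columns to gather positively, for example `cols = c(Employees, Sales)`, should give the same result here.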

Using formulas with aliases to perform multi-column operations

Question: This question is related to a previous one I asked, but tries to be more generic. I want to use formulas to perform operations on multiple "groups" of data (i.e. a_data1, a_data2, b_data1, b_data2, and then perform operations using the *_data1 columns). Based on @akrun's answer to that question, I created the following function. It takes a one-sided formula and applies it to all the "groups of data":

    suppressPackageStartupMessages({
      library(dplyr)
      library(tidyr)
    })
    polymutate <- function(df …

Extracting dates following a specific word from a column of strings using dplyr

Question: I am trying to extract the most recent date that a report was added in an R data frame of reports. The text always looks like Date Ordered: M/DD/YYYY and may occur zero to many times in a given report. If it repeats, I want the most recent (usually the last) instance, and I'm trying to convert it to a date in a mutated dplyr column. Using the code below on my actual data frame, I get the error:

    Error in if (nchar(s) > 0 && substring(s, 1, 1) == "\002") { : missing value where TRUE/FALSE needed
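The asker's own code is not shown in this excerpt, so the following is only a sketch of one way to get the latest "Date Ordered" value per report. The sample data, the column name `text`, and the helper `extract_latest()` are all hypothetical. Reports with no match return `NA` instead of raising an error.

```r
library(dplyr)

# Hypothetical sample data; the real reports are not shown in the question
reports <- data.frame(
  text = c("Date Ordered: 1/05/2020 follow-up Date Ordered: 3/12/2021",
           "no dates here",
           "Date Ordered: 12/01/2019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most recent "Date Ordered" date in one string, or NA
extract_latest <- function(txt) {
  hits <- regmatches(
    txt,
    gregexpr("(?<=Date Ordered: )\\d{1,2}/\\d{1,2}/\\d{4}", txt, perl = TRUE)
  )[[1]]
  if (length(hits) == 0) return(as.Date(NA))
  max(as.Date(hits, format = "%m/%d/%Y"))
}

# lapply() + c() keeps the Date class, which sapply() would drop
reports <- reports %>%
  mutate(latest = do.call(c, lapply(text, extract_latest)))
```

Taking `max()` over all matches returns the most recent date even when it is not the last one in the text.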

Build rowSums in dplyr based on columns containing pattern in their names [duplicate]

Question: This question already has answers here: Sum across multiple columns with dplyr (6 answers); R, create a new column in a data frame that applies a function of all the columns with similar names (3 answers). Closed 2 years ago.

My data frame looks something like this:

    USER OBSERVATION COUNT.1 COUNT.2 COUNT.3
    A    1           0       1       1
    A    2           1       1       2
    A    3           3       0       0

With dplyr I want to build a column that sums the values of the count variables for each row, selecting the count variables based on their names. USER …
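A minimal sketch of one way to do this, assuming dplyr 1.0 or later for `across()`. The new column name `SUM` is an assumption, since the expected output in the question is cut off.

```r
library(dplyr)

# Data as shown in the question
df <- data.frame(
  USER = c("A", "A", "A"),
  OBSERVATION = 1:3,
  COUNT.1 = c(0, 1, 3),
  COUNT.2 = c(1, 1, 0),
  COUNT.3 = c(1, 2, 0)
)

# Select the count columns by name prefix and sum them row by row.
# "SUM" is a hypothetical name for the new column.
df <- df %>%
  mutate(SUM = rowSums(across(starts_with("COUNT."))))
```

For the three rows shown, the row totals are 2, 4, and 3.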

Mutate column as input to sample

Question: I want to create a distribution using the sample() function with the probability of each value defined by a data.frame() column. When I try the code below, however, it produces the error:

    Error in sample.int(length(x), size, replace, prob) : incorrect number of probabilities

I'm guessing this is because mutate is passing the entire column and not just one integer. Can anyone help me with this?

    library(dplyr)
    input = data.frame(input = 1:100)
    output = input %>% mutate(output = sum( sample(0:1, 10, …
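The asker's guess matches how `mutate()` works: expressions see the whole column unless the data is grouped or processed row by row. The sketch below uses `rowwise()` so that each call to `sample()` sees a single value. The original `prob` argument is cut off, so the weights `c(100 - input, input)` are purely an assumption for illustration, as is the `set.seed()` call.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed for reproducibility

input <- data.frame(input = 1:100)

# rowwise() makes `input` a single number inside mutate(),
# so `prob` has exactly two entries, one per value in 0:1.
# The weights below are hypothetical; sample() rescales them internally.
output <- input %>%
  rowwise() %>%
  mutate(output = sum(sample(0:1, 10, replace = TRUE,
                             prob = c(100 - input, input)))) %>%
  ungroup()
```

Under these assumed weights, the last row (input = 100) always draws ten 1s, so its total is 10.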

Subsetting data by levels of granularity and applying a function to each data frame in R

Question: Okay, this question is fairly long and complex (at least for me), and I have done my best to make it as clear, organized, and detailed as possible, so please bear with me...

I currently have an overly manual process for applying a function to subsets of my data, and I would like to figure out how to make the code more efficient. It is easiest to describe the issue with an example: The variables in my data (myData): GDP …

proportion of factors and dummies

Question: I have a data set full of factors and dummies. I want to see the proportion of each value after dplyr::group_by(cyl).

    mtcars; rownames(mtcars) <- NULL
    df <- mtcars[,c(2,8,9)]
    head(df)
      cyl vs am
    1   6  0  1
    2   6  0  1
    3   4  1  1
    4   6  1  0
    5   8  0  0
    6   6  1  0

Expected answer: in these rows cyl is 6 four times (6 6 6 6); for the vs column, two of them are 1 and two of them are 0.

        1    0
    6   50%  50%
    4   100% 0%
    8   0%   100%

The same applies to the am column.

Answer 1: Here's a first crack:

    (df %>% pivot_longer(-cyl) ## spread out variables (vs, am)
        %>% group_by(cyl,name …

How to summarise all columns using group_by and summarise?

Question: I'm trying to tidy my daily activity data (accelerometer data). I would like to sum the repeated rows of each day for all columns. I have 32 rows (some are repeated) and 90 columns (data of one subject).

    # Example of my data with 32 rows and 14 columns
    df <- data.frame(LbNr = c(22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002,22002), Type = c("A2. Working" , …