mutate | 易学教程

Normalizing selection of dataframe columns with dplyr

Question: I have a data.frame with variables var1 and var2 (both strings) and variables x, y, and z. I would like to normalize x, y, and z by dividing each of them by its first element. I tried: df_ %>% mutate_at(c("x", "y", "z"), funs(./.[1])) %>% head() But this sets the whole column to 1. How can I make it divide by the first element? Secondly, what is the best way to add the normalized values to the dataframe as variables x_norm, y_norm, z_norm? Many thanks, and please let
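
A minimal sketch of one way to approach the normalization described above, assuming dplyr 1.0.0 or later, where across() supersedes mutate_at() and funs(). The sample data are hypothetical. The excerpt does not say why every value became 1; one possible cause is a grouped or row-wise data frame, so the sketch calls ungroup() first as a precaution.

```r
library(dplyr)

# Hypothetical sample data; the original df_ is not shown in the excerpt.
df_ <- data.frame(
  var1 = c("a", "b", "c"),
  var2 = c("u", "v", "w"),
  x = c(2, 4, 6),
  y = c(10, 20, 30),
  z = c(5, 10, 20),
  stringsAsFactors = FALSE
)

df_norm <- df_ %>%
  ungroup() %>%  # precaution: drop any grouping so first() sees the whole column
  mutate(across(c(x, y, z), ~ .x / first(.x), .names = "{.col}_norm"))

head(df_norm)
```

The .names argument keeps the original x, y, and z columns and appends x_norm, y_norm, and z_norm next to them.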

Can you make dplyr::mutate and dplyr::lag default = its own input value?

Question: This is similar to this dplyr lag post and this dplyr mutate lag post, but neither of those asks about defaulting to the input value. I am using dplyr to mutate a new field that is a lagged offset of another field (which I have converted to POSIXct). The goal is, for a given ip, to get some summary statistics on the delta between all the times it shows up in my list. I also have about 12 million rows. The data look like this (prior to mutation): ip hour snap 192.168.1.2
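
A hedged sketch of one way to make the lagged value fall back to the row's own value, assuming the intent is that the first observation of each ip gets a delta of 0. The sample data are hypothetical, and snap is numeric here rather than POSIXct to keep the example small.

```r
library(dplyr)

# Hypothetical data; the real table has ip, hour, and snap columns.
visits <- data.frame(
  ip = c("a", "a", "b", "a"),
  snap = c(10, 15, 20, 30),
  stringsAsFactors = FALSE
)

deltas <- visits %>%
  group_by(ip) %>%
  mutate(
    prev = coalesce(lag(snap), snap),  # NA from lag() is replaced by the row's own value
    delta = snap - prev
  ) %>%
  ungroup()

deltas
```

coalesce() fills the NA that lag() produces at the start of each group, which avoids having to pass a per-row default to lag() itself.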

R: row-wise dplyr::mutate using function that takes a data frame row and returns an integer

Question: I am trying to use a piped mutate statement with a custom function. I looked at this somewhat similar SO post, but in vain. Say I have a data frame like this (where blob is some variable not related to the specific task but part of the full data): df <- data.frame(exclude=c('B','B','D'), B=c(1,0,0), C=c(3,4,9), D=c(1,1,0), blob=c('fd', 'fs', 'sa'), stringsAsFactors = F) I have a function that uses the variable names to select some columns based on the value in the exclude column and e.g. calculates

Update cell values with the column name using mutate_at

Question: I am processing survey data. Some of the questions ask participants to check all of the options that apply to them. In the dataframe I currently have, there is a column for each of the possible responses, with a value of 1 recorded if the participant selected that option. For example, for the question "Which of the following emotions have you experienced at work?", with the options "Boredom", "Stress", "Contentment", my dataframe would look like this: df <- data.frame( id = seq(1,3,1),

Transpose dplyr::tbl object

Question: I am using src_postgres to connect and the dplyr::tbl function to fetch data from a Redshift database. I have applied some filters and a top function to it using dplyr itself. Now my data looks as below:

   riid   day          hour
   <dbl>  <chr>        <chr>
1  5542.  "THURSDAY "  12
2  5862.  "FRIDAY "    15
3  5982.  "TUESDAY "   15
4  6022.  WEDNESDAY    16

My final output should be as below: riid MON TUES WED THUR FRI SAT SUN 5542 12 5862 15 5988 15 6022 16 I have tried spread. It throws the below error because of the class type:

Ordering problems when using mutate with ifelse condition to date

Question: I'm trying to use mutate to create a column that takes the value of one column up to a point and then uses cumprod to fill the rest of the observations based on the values of another column. I tried combining mutate with ifelse, but the order of the values is not correct and I can't figure out why. Below I reproduce a more basic example that replicates my problem: foo1 <- data.frame(date=seq(2005,2018,1)) foo1 %>% mutate(h=ifelse(date>2008, seq(1,11,1), 99)) The output is: date h 1 2005 99
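
A short sketch of why the order looks wrong in the basic example, and one possible fix. ifelse() recycles its yes argument to the length of the test and then picks values by position, so the first TRUE row (the fifth row) receives the fifth element of the sequence rather than the first. The cumsum() counter below is an assumption about the intended result, namely a counter that starts at 1 on the first row after 2008.

```r
library(dplyr)

foo1 <- data.frame(date = seq(2005, 2018, 1))

# Original attempt: values are taken by row position, so the count starts at 5.
broken <- foo1 %>%
  mutate(h = ifelse(date > 2008, seq(1, 11, 1), 99))

# One possible fix: build a counter that only advances on rows where the test is TRUE.
fixed <- foo1 %>%
  mutate(h = ifelse(date > 2008, cumsum(date > 2008), 99))

fixed
```

Because cumsum(date > 2008) has the same length as the column, no recycling takes place and each TRUE row gets its own running count.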

Group/Mutate only returns NA and not an average

Question: Using R 3.5, RStudio 1.1.419. I have a dataset that contains geographic data and city-based measures.

    zip   state  city         statefips  finmei14  finmei15
1   501   NY     Holtsville   36         NA        NA
2   544   NY     Holtsville   36         NA        NA
3   1001  MA     Agawam       25         NA        NA
4   1002  MA     Amherst      25         69        64
5   1003  MA     Amherst      25         69        64
6   1004  MA     Amherst      25         69        64
7   1005  MA     Barre        25         NA        NA
8   1007  MA     Belchertown  25         NA        NA
9   1008  MA     Blandford    25         NA        NA
10  1009  MA     Bondsville   25         NA        NA

finmei14 and finmei15 are city-based measures that I want to

dplyr mutate multiple columns based on names in vectors

Question: I want to multiply two columns with each other using dplyr's mutate function. But instead of writing a new line for each mutate condition, I would like to use the names of the columns stored in the vectors var1 and var2. For example, in the end I want an additional column in my existing bankdata named result1, which contains the result of multiplying the columns cash and loans with each other. This should continue until 3 new columns have been created. Reproducible code:

dplyr concat columns stored in variable (mutate and non standard evaluation)

Question: I would like to concatenate an arbitrary number of columns in a dataframe based on a variable cols_to_concat: df <- dplyr::data_frame(a = letters[1:3], b = letters[4:6], c = letters[7:9]) cols_to_concat = c("a", "b", "c") To achieve the desired result with this specific value of cols_to_concat I could do this: df %>% dplyr::mutate(concat = paste0(a, b, c)) But I need to generalise this, using syntax a bit like this: # (DOES NOT WORK) df %>% dplyr::mutate(concat = paste0(cols)) I'd like to use
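
A minimal sketch of one way to generalise the concatenation, assuming the rlang package is installed alongside dplyr. It turns the column names into symbols with rlang::syms() and splices them into paste0() with the !!! operator. tibble() is used in place of data_frame(), which is deprecated in current releases.

```r
library(dplyr)

df <- tibble(a = letters[1:3], b = letters[4:6], c = letters[7:9])
cols_to_concat <- c("a", "b", "c")

# syms() converts the strings to column symbols; !!! splices them in as separate arguments,
# so the call is evaluated as paste0(a, b, c).
df_out <- df %>%
  mutate(concat = paste0(!!!rlang::syms(cols_to_concat)))

df_out
```

Changing cols_to_concat to any other subset of column names changes the arguments passed to paste0() without touching the mutate() call.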