tidyverse

Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

会有一股神秘感。 submitted on 2019-12-09 10:01:47
Question: Suppose we have a character vector cols_to_select containing some columns we want to select from a dataframe df, e.g.

    df <- tibble::data_frame(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3)
    cols_to_select <- c("b", "d")

Suppose also we want to use dplyr::select because it's part of an operation that uses %>%, so using select makes the code easy to read. There seem to be a number of ways in which this can be achieved, but some are more robust than others. Please could you let me know which is the 'correct'
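
A minimal sketch of the selection described above, not a definitive answer. It assumes a tidyselect version that provides all_of() (1.0.0 or later, which is newer than the dplyr 0.7.0 named in the title); on older installations one_of() plays a similar role. tibble::tibble() is used here in place of data_frame(), which has since been deprecated.

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# all_of() treats the character vector strictly as column names and
# raises an error if any of them is missing from df.
selected <- df %>% select(all_of(cols_to_select))

# Unquoting with !! forces the external vector to be used, even if df
# happened to contain a column that is itself named cols_to_select.
selected_bang <- df %>% select(!!cols_to_select)
```

Both forms avoid the ambiguity of passing a bare variable name that could also be a column name in df.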

how to compute rowsums using tidyverse

南笙酒味 submitted on 2019-12-09 06:31:51
Question: I did

    mtcars %>% by_row(sum)

but got the message:

    by_row() is deprecated; please use a combination of: tidyr::nest(); dplyr::mutate(); purrr::map()

My naive approach is this:

    mtcars %>%
      group_by(id = row_number()) %>%
      nest(-id) %>%
      mutate(hi = map_dbl(data, sum))

Is there a way to do it without creating an "id" column?

Answer 1: Is this what you are looking for?

    mtcars %>% mutate(rowsum = rowSums(.))

Output:

       mpg cyl  disp  hp drat    wt  qsec vs am gear carb rowsum
    1 21.0   6 160.0 110 3.90 2.620 16.46  0  1
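
A variant of the same idea, sketched under the assumption that dplyr 1.0.0 or later is installed (across() and where() are newer than this post). It restricts the sum to numeric columns, which matters for data frames that, unlike mtcars, also hold character or factor columns.

```r
library(dplyr)

# rowSums() is applied only to the numeric columns picked by across(),
# so no id column or nesting is required.
with_sums <- mtcars %>%
  mutate(rowsum = rowSums(across(where(is.numeric))))
```

For mtcars every column is numeric, so this gives the same totals as rowSums(.).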

replace NA with 0 using starts_with()

给你一囗甜甜゛ submitted on 2019-12-08 18:27:14
Question: I am trying to replace NA values for a specific set of columns in my tibble. The columns all start with the same prefix, so I want to know if there is a concise way to use the starts_with() function from the dplyr package to do this. I have seen several other questions on SO; however, they all require the use of specific column names or locations. I'm really trying to be lazy and not define ALL columns, just the prefix. I've tried the replace_na()
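
One way this could look, as a sketch only. The data, the "score_" prefix, and all column names below are hypothetical, since the excerpt does not show the asker's tibble. It also assumes dplyr 1.0.0 or later for across(); with older versions, mutate_at(vars(starts_with(...)), ...) fills the same role.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the "score_" prefix and column names are illustrative only.
df <- tibble(
  id      = c(1, 2, 3),
  score_a = c(1, NA, 3),
  score_b = c(NA, 5, NA),
  other   = c(NA, 1, 2)
)

# Replace NA with 0 only in the columns whose names start with the prefix.
filled <- df %>%
  mutate(across(starts_with("score_"), ~ replace_na(.x, 0)))
```

Columns that do not match the prefix (here, other) keep their NA values.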

How do I prevent interpolation between values where there are more than 2 missing rows of data?

怎甘沉沦 submitted on 2019-12-08 15:20:38
I would like to write a conditional statement inside mutate_at() so that approx() does not interpolate between values where there are more than 2 missing rows of data. Here are the data:

    dat <- data.frame(
      time = 1:10,
      var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
      var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
      var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
    )

This is the chunk of code I would like to adapt such that it does NOT interpolate where there are more than 2 NAs between values (i.e., rows 5-7 in the var2 column should remain NA and all other NAs should be interpolated values).
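
A possible sketch of the behaviour asked for, using the data from the question. The helper name interp_max_gap and its max_gap parameter are hypothetical, not part of any package. The approach interpolates everything with approx() and then restores NA wherever a run of missing values is longer than the limit; plain mutate() calls are used instead of mutate_at() to keep the example simple.

```r
library(dplyr)

dat <- data.frame(
  time = 1:10,
  var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
  var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
  var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
)

# Hypothetical helper: interpolate linearly, then put NA back into every
# run of consecutive missing values that is longer than max_gap.
interp_max_gap <- function(x, t, max_gap = 2) {
  filled <- approx(t, x, xout = t)$y   # incomplete (t, x) pairs are dropped first
  runs <- rle(is.na(x))                # runs of NA / non-NA in the original data
  too_long <- rep(runs$values & runs$lengths > max_gap, runs$lengths)
  filled[too_long] <- NA
  filled
}

result <- dat %>%
  mutate(
    var2 = interp_max_gap(var2, time),
    var3 = interp_max_gap(var3, time)
  )
```

With max_gap = 2, rows 5-7 of var2 stay NA while the shorter gaps in var2 and var3 are filled. If the zoo package is available, zoo::na.approx() offers a maxgap argument that serves a similar purpose.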

Aggregating if each observation can belong to multiple groups with multiple grouping variables

佐手、 submitted on 2019-12-08 11:58:27
Question: This question is a follow-up to: Aggregating if each observation can belong to multiple groups. As in the linked question, my observations can belong to several groups. But now I have 2 grouping variables, which makes the problem much harder (at least to me). In the example below an observation can belong to one or more of the groups A, B, C. But I also want to distinguish according to another factor, i.e. is x < 1, x < .5 or y < 0. Since all x smaller than 0 are also smaller than 1 each observation can

`dplyr::case_when` Evaluation error: object 'x' not found

孤街浪徒 submitted on 2019-12-08 09:07:48
Question: Does anyone know why dplyr::case_when() produces the error in the following code?

    tibble(tmp1 = sample(c(T, F), size = 32, replace = T),
           tmp2 = sample(c(T, F), size = 32, replace = T),
           tmp3 = sample(c(T, F), size = 32, replace = T)) %>%
      mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) {
        case_when(
          all(x == F) ~ "N",
          any(x == T) ~ "Y"
        )
      }))

    Error in mutate_impl(.data, dots) :
      Evaluation error: object 'x' not found.

I am using R 3.4.3 with dplyr 0.7.4 on Ubuntu 16.04. The error
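
The excerpt is cut off before any explanation, so the following does not diagnose the error. It is only a sketch of a vectorised alternative that produces the same "Y"/"N" labels without apply() or an anonymous function, which sidesteps the construct that triggers the message. The small fixed data set is illustrative and replaces the random sample() columns.

```r
library(dplyr)

# Small fixed example in place of the random sample() data.
flags <- tibble(
  tmp1 = c(TRUE, FALSE, FALSE),
  tmp2 = c(FALSE, FALSE, TRUE),
  tmp3 = c(FALSE, FALSE, FALSE)
)

# "Y" when any flag in the row is TRUE, "N" when all of them are FALSE.
flagged <- flags %>%
  mutate(tmp = case_when(
    tmp1 | tmp2 | tmp3 ~ "Y",
    TRUE               ~ "N"
  ))
```

Because the conditions are evaluated column-wise, case_when() sees whole vectors rather than one row at a time.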

How do I prevent interpolation between values where there are more than 2 missing rows of data?

江枫思渺然 submitted on 2019-12-08 07:54:55
Question: I would like to write a conditional statement inside mutate_at() so that approx() does not interpolate between values where there are more than 2 missing rows of data. Here are the data:

    dat <- data.frame(
      time = 1:10,
      var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
      var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
      var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
    )

This is the chunk of code I would like to adapt such that it does NOT interpolate where there are more than 2 NAs between values (i

Summing a dataframe based on another dataframe

会有一股神秘感。 submitted on 2019-12-08 07:21:28
Question: I have daily rainfall data from 10 locations across 10 years:

    set.seed(123)
    df <- data.frame(loc.id = rep(1:10, each = 10*365),
                     years = rep(rep(2001:2010, each = 365), times = 10),
                     day = rep(rep(1:365, times = 10), times = 10),
                     rain = runif(min = 0, max = 35, 10*10*365))

I have a separate data frame that holds certain days, which I want to use to sum the rainfall in df:

    df.ref <- data.frame(loc.id = rep(1:10, each = 10),
                         years = rep(2001:2010, times = 10),
                         index1 = rep(250, times = 10*10),
                         index2 =

R - lubridate: split durations into “sub-durations”

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-08 05:47:23
Question: I have a tidy R dataset my_durations where each case in the data frame corresponds to a sample taken over a duration of time, like so:

    > glimpse(my_durations)
    Observations: 300
    Variables: 5
    $ sample_id  <int> 2, 8, 25, 41, 59, 70, 98, 100, 105, 106, 108, 114, 119, 126,...
    $ site_id    <int> 2, 13, 12, 23, 47, 23, 66, 72, 72, 50, 50, 54, 45, 73, 48, 7...
    $ start_date <dttm> 2015-04-12, 2015-06-10, 2015-07-02, 2015-07-22, 2015-07-29,...
    $ end_date   <dttm> 2015-05-14, 2015-06-18, 2015-07-08, 2015-07

Conditionally replace the values in columns to value in another column using dplyr

情到浓时终转凉″ submitted on 2019-12-08 05:04:24
Question: I tried really hard to find an answer to this and I apologize if it's a duplicate. I'll make some dummy data to explain my question.

    tibble(a=c(0.1, 0.2, 0.3), sample1 = c(0, 1, 1), sample2 = c(1, 1, 0))

    # A tibble: 3 x 3
          a sample1 sample2
      <dbl>   <dbl>   <dbl>
    1   0.1       0       1
    2   0.2       1       1
    3   0.3       1       0

How do I conditionally change the values in columns sample1 and sample2 so that if they are equal to one, they take on the value of a? The resulting tibble should look like this:

    # A tibble: 3 x 3
          a sample1
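
A minimal sketch of the replacement described in the question, using its dummy data. It assumes dplyr 1.0.0 or later for across(); with the dplyr versions current when this was posted, mutate_at(vars(starts_with("sample")), ...) would be the closer equivalent.

```r
library(dplyr)

df <- tibble(a = c(0.1, 0.2, 0.3), sample1 = c(0, 1, 1), sample2 = c(1, 1, 0))

# In every column whose name starts with "sample", replace values equal
# to 1 with the value of a from the same row; leave other values as is.
replaced <- df %>%
  mutate(across(starts_with("sample"), ~ if_else(.x == 1, a, .x)))
```

if_else() requires both branches to share a type, which holds here because a and the sample columns are all doubles.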