tidyverse | 易学教程

Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector

Question: Suppose we have a character vector cols_to_select containing some columns we want to select from a data frame df, e.g. df <- tibble::data_frame(a=1:3, b=1:3, c=1:3, d=1:3, e=1:3) cols_to_select <- c("b", "d") Suppose also we want to use dplyr::select because it's part of an operation that uses %>%, so using select makes the code easy to read. There seem to be a number of ways in which this can be achieved, but some are more robust than others. Please could you let me know which is the 'correct'...
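One option, sketched here under the assumption of a current dplyr (1.0.0 or later, where tidyselect provides all_of()), is to wrap the character vector in all_of(). That helper did not exist in dplyr 0.7.x, where one_of(cols_to_select) played a similar role, so treat this as an illustration for newer versions rather than the answer given to the original question.

```r
library(dplyr)

# tibble() is used here in place of the deprecated data_frame() from the question
df <- tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# all_of() marks the vector as column names stored in an external variable
# and raises an error if any of those names is missing from df
result <- df %>% select(all_of(cols_to_select))
```

Wrapping the vector makes the intent explicit: the names come from a variable, not from a column that happens to be called cols_to_select.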

how to compute rowsums using tidyverse

Question: I ran mtcars %>% by_row(sum) but got the message: by_row() is deprecated; please use a combination of: tidyr::nest(); dplyr::mutate(); purrr::map() My naive approach is this: mtcars %>% group_by(id = row_number()) %>% nest(-id) %>% mutate(hi = map_dbl(data, sum)) Is there a way to do it without creating an "id" column? Answer 1: Is this what you are looking for? mtcars %>% mutate(rowsum = rowSums(.)) Output: mpg cyl disp hp drat wt qsec vs am gear carb rowsum 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1...
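The answer above relies on the magrittr dot placeholder. A variant that avoids the dot, assuming dplyr 1.0.0 or later for across(), might look like the sketch below; it is an alternative spelling of the same idea, not part of the original answer.

```r
library(dplyr)

# Row-wise totals over every existing column, with no helper "id" column.
# across(everything()) hands all columns to rowSums() as one data frame.
result <- mtcars %>%
  mutate(rowsum = rowSums(across(everything())))
```

Replacing everything() with a narrower selection, such as c(mpg, cyl), would sum only those columns.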

replace NA with 0 using starts_with()

Question: I am trying to replace NA values for a specific set of columns in my tibble. The columns all start with the same prefix, so I want to know if there is a concise way to use the starts_with() function from the dplyr package to do this. I have seen several other questions on SO, but they all require specific column names or locations. I'm really trying to be lazy and don't want to define ALL the columns, just the prefix. I've tried the replace_na()...
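A minimal sketch of one way to do this, assuming dplyr 1.0.0 or later (for across()) together with tidyr. The data and the "score_" prefix are made up for illustration, since the excerpt does not show the real column names.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the "score_" prefix and all column names are assumptions
df <- tibble(
  id      = 1:3,
  score_a = c(1, NA, 3),
  score_b = c(NA, 5, 6),
  other   = c(NA, "x", "y")
)

# Replace NA with 0 only in columns whose names start with the prefix;
# the "other" column keeps its NA
result <- df %>%
  mutate(across(starts_with("score_"), ~ replace_na(.x, 0)))
```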

How do I prevent interpolation between values where there are more than 2 missing rows of data?

Question: I would like to write a conditional statement inside mutate_at() so that approx() does not interpolate between values where there are more than 2 missing rows of data. Here are the data: dat <- data.frame( time = 1:10, var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15), var2 = c( 1, NA, 3, 6, NA, NA, NA, 10, 9, 8), var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20) ) This is the chunk of code I would like to adapt so that it does NOT interpolate where there are more than 2 NAs between values (i.e., rows 5-7 in the var2 column should remain NA and all other NAs should be interpolated values).
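The code the asker wanted to adapt is not included in the excerpt, so the following is only a sketch of one possible approach: interpolate everything with approx(), then use rle() to put NA back wherever a run of missing values is longer than the allowed gap. The helper name interp_short_gaps is hypothetical, and across() (dplyr 1.0.0 or later) stands in for mutate_at().

```r
library(dplyr)

dat <- data.frame(
  time = 1:10,
  var1 = c(10, 10, 10, 12, 12, 12, 15, 15, 15, 15),
  var2 = c( 1, NA,  3,  6, NA, NA, NA, 10,  9,  8),
  var3 = c(10, NA, NA, 13, 14, 16, NA, 18, 19, 20)
)

# Hypothetical helper (not from any package): linear interpolation that
# leaves runs of more than max_gap consecutive NAs untouched
interp_short_gaps <- function(x, t, max_gap = 2) {
  ok <- !is.na(x)
  filled <- approx(t[ok], x[ok], xout = t)$y  # interpolate at every time point
  runs <- rle(is.na(x))                       # consecutive runs of NA / non-NA
  too_long <- rep(runs$values & runs$lengths > max_gap, runs$lengths)
  filled[too_long] <- NA                      # restore NA inside over-long gaps
  filled
}

result <- dat %>%
  mutate(across(c(var2, var3), ~ interp_short_gaps(.x, time)))
```

With these data, rows 5-7 of var2 stay NA, while the single gap in var2 and both gaps in var3 are filled.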

Aggregating if each observation can belong to multiple groups with multiple grouping variables

Question: This question is a follow-up to: Aggregating if each observation can belong to multiple groups. As in the linked question, my observations can belong to several groups. But now I have 2 grouping variables, which makes the problem much harder (at least for me). In the example below an observation can belong to one or more of the groups A, B, C. But I also want to distinguish according to another factor, i.e. is x < 1, x < .5 or y < 0. Since all x smaller than 0 are also smaller than 1, each observation can...

`dplyr::case_when` Evaluation error: object 'x' not found

Question: Does anyone know why dplyr::case_when() produces the error in the following code? tibble(tmp1 = sample(c(T, F), size = 32, replace = T), tmp2 = sample(c(T, F), size = 32, replace = T), tmp3 = sample(c(T, F), size = 32, replace = T)) %>% mutate(tmp = apply(cbind(tmp1, tmp2, tmp3), 1, function(x) { case_when( all(x == F) ~ "N", any(x == T) ~ "Y" ) })) Error in mutate_impl(.data, dots) : Evaluation error: object 'x' not found. I am using R 3.4.3 with dplyr 0.7.4 on Ubuntu 16.04. The error...
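The excerpt ends before any explanation of the error, so the sketch below does not diagnose it. It only shows a way to get the same "Y"/"N" labels without apply(), by letting case_when() work on whole columns. The seed is an addition to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: added only so the sketch is reproducible
df <- tibble(
  tmp1 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
  tmp2 = sample(c(TRUE, FALSE), size = 32, replace = TRUE),
  tmp3 = sample(c(TRUE, FALSE), size = 32, replace = TRUE)
)

# case_when() is vectorised, so no row-wise function is needed
result <- df %>%
  mutate(tmp = case_when(
    tmp1 | tmp2 | tmp3 ~ "Y",  # at least one of the three is TRUE
    TRUE               ~ "N"   # all three are FALSE
  ))
```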

Summing a dataframe based on another dataframe

Question: I have daily rainfall data from 10 locations across 10 years: set.seed(123) df <- data.frame(loc.id = rep(1:10, each = 10*365),years = rep(rep(2001:2010,each = 365),times = 10), day = rep(rep(1:365,times = 10),times = 10), rain = runif(min = 0 , max = 35, 10*10*365)) I have a separate data frame that specifies the days over which I want to sum the rainfall in df: df.ref <- data.frame(loc.id = rep(1:10, each = 10), years = rep(2001:2010,times = 10), index1 = rep(250,times = 10*10), index2 =

R - lubridate: split durations into “sub-durations”

Question: I have a tidy R dataset my_durations where each case in the data frame corresponds to a sample taken over a duration of time, like so: > glimpse(my_durations) Observations: 300 Variables: 5 $ sample_id <int> 2, 8, 25, 41, 59, 70, 98, 100, 105, 106, 108, 114, 119, 126,... $ site_id <int> 2, 13, 12, 23, 47, 23, 66, 72, 72, 50, 50, 54, 45, 73, 48, 7... $ start_date <dttm> 2015-04-12, 2015-06-10, 2015-07-02, 2015-07-22, 2015-07-29,... $ end_date <dttm> 2015-05-14, 2015-06-18, 2015-07-08, 2015-07

Conditionally replace the values in columns to value in another column using dplyr

Question: I tried really hard to find an answer to this and I apologize if it's a duplicate. I'll make some dummy data to explain my question. tibble(a=c(0.1, 0.2, 0.3), sample1 = c(0, 1, 1), sample2 = c(1, 1, 0)) # A tibble: 3 x 3 a sample1 sample2 <dbl> <dbl> <dbl> 1 0.1 0 1 2 0.2 1 1 3 0.3 1 0 How do I conditionally change the values in columns sample1 and sample2 so that if they are equal to one, they take on the value of a? The resulting tibble should look like this: # A tibble: 3 x 3 a sample1
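The rule stated in the question (a sample value of 1 becomes the value of a in the same row) can be sketched as follows, assuming dplyr 1.0.0 or later for across(). The excerpt does not include an answer, so this is one possible approach and not the accepted one.

```r
library(dplyr)

df <- tibble(a = c(0.1, 0.2, 0.3), sample1 = c(0, 1, 1), sample2 = c(1, 1, 0))

# For every column whose name starts with "sample", swap each 1 for the
# value of a in the same row and leave all other values unchanged
result <- df %>%
  mutate(across(starts_with("sample"), ~ if_else(.x == 1, a, .x)))
```

if_else() is used instead of ifelse() because it checks that both branches have the same type, which holds here since a and the sample columns are all doubles.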