tidyverse

Take difference between first and last observations in a row, where each row is different

倾然丶 夕夏残阳落幕 submitted on 2021-02-20 17:56:41
Question: I have data that looks like the following:

      Region X2012 X2013 X2014 X2015 X2016 X2017
    1      1    10  11      12  13      14    15
    2      2    NA  17      14  NA      23    NA
    3      3    12  18      18  NA      23    NA
    4      4    NA  NA      15  28      NA    38
    5      5    14  18.5    16  27      25    39
    6      6    15  NA      17  27.5    NA    39

The numbers are irrelevant here, but what I am trying to do is take the difference between the earliest and latest observed points in each row to make a new column for the difference, where:

    Region Diff
    1      (15 - 10) = 5
    2      (23 - 17) = 6

and so on, not actually showing the
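One possible approach, sketched in base R: drop the NA values in each row and subtract the first remaining value from the last. The helper name first_last_diff is hypothetical, and the sketch assumes the year columns are already ordered from earliest to latest.

```r
# Sample data copied from the question.
df <- data.frame(
  Region = 1:6,
  X2012 = c(10, NA, 12, NA, 14, 15),
  X2013 = c(11, 17, 18, NA, 18.5, NA),
  X2014 = c(12, 14, 18, 15, 16, 17),
  X2015 = c(13, NA, NA, 28, 27, 27.5),
  X2016 = c(14, 23, 23, NA, 25, NA),
  X2017 = c(15, NA, NA, 38, 39, 39)
)

# Hypothetical helper: last observed (non-NA) value minus first observed value.
first_last_diff <- function(v) {
  obs <- v[!is.na(v)]
  if (length(obs) == 0) return(NA_real_)  # nothing observed in this row
  obs[length(obs)] - obs[1]
}

# Assumption: the year columns are named X followed by four digits.
year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)

# Apply the helper row by row over the year columns only.
df$Diff <- unname(apply(df[year_cols], 1, first_last_diff))

df[c("Region", "Diff")]
```

For the first two regions this reproduces the values given in the question (5 and 6).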

Parsing Interview Text

社会主义新天地 submitted on 2021-02-20 04:14:07
Question: I have a text file of a presidential debate. Eventually, I want to parse the text into a dataframe where each row is a statement, with one column for the speaker's name and another column for the statement. For example:

    "Bob Smith: Hi Steve. How are you doing? Steve Brown: Hi Bob. I'm doing well!"

Would become:

      name        text
    1 Bob Smith   Hi Steve. How are you doing?
    2 Steve Brown Hi Bob. I'm doing well!

Question: How do I split the statements from the names? I tried splitting on the colon: data
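A minimal base R sketch of one way to split names from statements. It assumes every speaker label has the form "Firstname Lastname:" (two capitalized words followed by a colon); real debate transcripts may need a looser pattern or a list of known speaker names.

```r
txt <- "Bob Smith: Hi Steve. How are you doing? Steve Brown: Hi Bob. I'm doing well!"

# Assumed label shape: two capitalized words followed by a colon.
pattern <- "[A-Z][a-z]+ [A-Z][a-z]+:"
m <- gregexpr(pattern, txt)

# The matched labels, e.g. "Bob Smith:".
labels <- regmatches(txt, m)[[1]]

# The text between labels; the first piece is the (empty) text before the first label.
pieces <- regmatches(txt, m, invert = TRUE)[[1]]

statements <- data.frame(
  name = sub(":$", "", labels),
  text = trimws(pieces[-1]),
  stringsAsFactors = FALSE
)

statements
```

Matching the whole label, rather than splitting on the colon alone, keeps each name attached to the statement that follows it.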

filter one dataframe via conditions in another

生来就可爱ヽ(ⅴ<●) submitted on 2021-02-19 08:55:24
Question: I want to recursively filter a dataframe, d, by an arbitrary number of conditions (represented as rows in another dataframe, z). I begin with a dataframe d:

    d <- data.frame(x = 1:10, y = letters[1:10])

The second dataframe, z, has columns x1 and x2, which are lower and upper limits to filter d$x. This dataframe z may grow to be an arbitrary number of rows long.

    z <- data.frame(x1 = c(1,3,8), x2 = c(1,4,10))

I want to return all rows of d for which d$x <= z$x1[i] and d$x >= z$x2[i] for all i
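The question is cut off before the desired output is shown, so the intended condition is not certain. The sketch below rests on one assumption: a row of d is kept when d$x falls inside at least one [x1, x2] interval from z. If the intent is the opposite (dropping those rows), negate in_any.

```r
d <- data.frame(x = 1:10, y = letters[1:10])
z <- data.frame(x1 = c(1, 3, 8), x2 = c(1, 4, 10))

# Assumption: keep a row when its x lies within ANY interval [x1, x2] in z.
# in_any[j] is TRUE when d$x[j] falls inside at least one interval.
in_any <- vapply(
  d$x,
  function(v) any(v >= z$x1 & v <= z$x2),
  logical(1)
)

kept <- d[in_any, ]
kept
```

Because the check is vectorised over the rows of z, it works unchanged when z grows to more rows.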

R nested map through columns

前提是你 submitted on 2021-02-19 06:18:08
Question: I have a function, which was solved here. This function takes a column filled with annotations and another grouping column, and propagates the annotation to rows with missing values.

    f1 <- function(data, group_col, expand_col){
      data %>%
        dplyr::group_by({{group_col}}) %>%
        dplyr::mutate(
          {{expand_col}} := dplyr::case_when(
            !is.na({{expand_col}}) ~ {{expand_col}},
            any(!is.na({{expand_col}})) & is.na({{expand_col}}) ~
              paste(unique(unlist(str_split(na.omit({{expand_col}}), " "))), collapse = " ")

How do I remove rows based on a range of dates given by values in 2 columns?

匆匆过客 submitted on 2021-02-19 05:39:26
Question: I have a data set that includes a range of dates and need to fill in the missing dates in new rows. df1 is an example of the data I am working with and df2 is an example of what I've managed to achieve (where I'm stuck). df3 is where I would like to end up!

df1

    ID Date      DateStart DateEnd
    1  2/11/2021 2/11/2021 2/17/2021
    1  2/19/2021 2/19/2021 2/21/2021
    2  1/15/2021 1/15/2021 1/20/2021
    2  1/22/2021 1/22/2021 1/23/2021

This is where I am with this. The NAs aren't an issue because I intend to drop

How do I build an object with the R vctrs package that can combine with c()

廉价感情. submitted on 2021-02-19 04:30:10
Question: I'm trying to understand how to build objects with vctrs. I thought this was straightforward, but then had trouble when I used c() on my object. Our object has two attributes, x and descriptor, both strings in this case (my object will have attributes with differing types). We've built a constructor, new_toy_vector. I haven't built a convenience function in this example yet.

    new_toy_vector <- function(
      x = character(),
      descriptor = character()) {
      vctrs::vec_assert(x, character())
      vctrs::vec

Is there more efficient or concise way to use tidyr::gather to make my data look 'tidy'?

前提是你 submitted on 2021-02-18 16:59:43
Question: I am new to using tidyverse. I want to see if I am being as efficient/concise as possible using the functions in this package. I suspect I am not. My original data has the key sym as part of each column name.

      day         a_x       b_x        a_y         b_y
    1   1 -0.56047565 1.2240818 -1.0678237  0.42646422
    2   2 -0.23017749 0.3598138 -0.2179749 -0.29507148
    ...

I would like to make the data look tidy, like so:

       day sym      x      y
     1   1 a    0.118  0.702
     2   2 a   -0.947 -0.262
    ...
    11   1 b    1.44   0.788
    12   2 b    0.452  0.769

Here is my code that does
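The asker's own code is cut off, so the following is not their method. It is a sketch of an alternative using tidyr::pivot_longer (available in tidyr 1.0.0 and later) in place of gather. It assumes every non-day column is named sym_measure with a single underscore; the two rows are the ones shown in the question.

```r
library(tidyr)

# First two rows of the wide data, as shown in the question.
wide <- data.frame(
  day = 1:2,
  a_x = c(-0.56047565, -0.23017749),
  b_x = c(1.2240818, 0.3598138),
  a_y = c(-1.0678237, -0.2179749),
  b_y = c(0.42646422, -0.29507148)
)

# Split each column name at "_": the first part becomes the sym column,
# and the second part (".value") names the output columns x and y.
long <- pivot_longer(
  wide,
  cols = -day,
  names_to = c("sym", ".value"),
  names_sep = "_"
)

long
```

The special ".value" entry in names_to is what lets a single call both lengthen by sym and keep x and y as separate columns, so no follow-up separate and spread step is needed.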

Converting `dttm` to `date` formatting with as.Date and as_date give different results in R

*爱你&永不变心* submitted on 2021-02-18 12:51:14
Question: I have a large data set with individual columns for event times and dates. I ended up creating a master dttm object with both the times and dates together, but have had trouble when I try to filter based on the date. Here is a sample data set that reflects my own:

    library(tidyverse)
    d <- structure(list(date = structure(c(1530921600, 1531008000, 1530403200, 1530489600, 1530576000, 1530489600, 1530576000, 1531008000, 1530921600, 1530662400, 1530748800, 1531180800, 1530748800, 1531526400,
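The snippet is cut off before the cause of the discrepancy appears, so the following is only a guess at one frequent source of such differences: time zone handling. For a POSIXct value, as.Date takes a tz argument that decides which calendar day the instant falls on, while lubridate::as_date is documented as defaulting to the object's own time zone. The timestamp below is hypothetical and is not taken from the asker's data.

```r
# Hypothetical late-evening timestamp in a zone behind UTC.
x <- as.POSIXct("2018-07-07 22:30:00", tz = "America/New_York")

# The same instant is already the next calendar day in UTC...
d_utc <- as.Date(x, tz = "UTC")

# ...but still the 7th in the local zone of the object.
d_local <- as.Date(x, tz = "America/New_York")

format(d_utc)
format(d_local)
```

Passing tz explicitly, as above, makes the conversion independent of any default and is one way to make the two functions agree when filtering by date.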
