tidyr

Remove duplicates by multiple conditions

只谈情不闲聊 submitted on 2019-12-25 13:43:27
Question: I have data in which an individual (Name) appears multiple times within an Eggphase category. I would like only one sample per individual, but I don't just want to keep the first one that R finds. I would like to keep the one whose Group appears most often across all the other categories. Hopefully my example helps make this clear.

    library(tidyverse)
    myDF <- read.table(text="Tissue Food Eggphase Name Group
    wb fl after Kia a
    wb fl after Kia c
    wb wf before Kia b
    wb fl before Lucy c
    wb fl after Lucy
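
One possible reading of the question above, sketched with dplyr: count how often each Name/Group pair occurs in the whole table, then keep the row with the most frequent Group within each Name/Eggphase combination. The data frame `toy` and the helper column `group_n` are hypothetical, since the original table is cut off, and ties are broken by row order, which may not be what the asker wants.

```r
library(dplyr)

# Hypothetical toy data; the original table in the question is truncated.
toy <- tibble::tribble(
  ~Eggphase, ~Name,  ~Group,
  "after",   "Kia",  "a",
  "after",   "Kia",  "c",
  "before",  "Kia",  "c",
  "before",  "Lucy", "c",
  "after",   "Lucy", "c"
)

deduped <- toy %>%
  # How often does each Name/Group pair occur across all categories?
  add_count(Name, Group, name = "group_n") %>%
  group_by(Name, Eggphase) %>%
  # Keep exactly one row per Name/Eggphase: the one with the largest count.
  slice_max(group_n, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(-group_n)
```

Here Kia's "after" sample with Group "c" is kept because "c" occurs twice for Kia while "a" occurs once.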

tidyr; %>% group_by() mutate(foo = fill() )

冷暖自知 submitted on 2019-12-25 10:16:35
Question: I'm struggling to create a new variable that indicates which letter (LET) each group (grp) within an id (id) begins with. I'll illustrate my question below. I have data like this:

    library(dplyr); library(tidyr)
    df <- tibble(id = rep(0:1, c(7, 10)),
                 grp = rep(c(0,1,0,1,2), c(3,4,2,5,3)),
                 LET = rep(c('A', 'B', 'A', 'B', 'A', 'B'), c(1,4, 3, 3, 4, 2)))
    #> # A tibble: 17 x 3
    #>       id   grp LET
    #>    <int> <dbl> <chr>
    #>  1     0     0 A
    #>  2     0     0 B
    #>  3     0     0 B
    #>  4     0     1 B
    #>  5     0     1 B
    #>  6     0     1 A
    #>  7     0     1 A
    #>  8
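
A minimal sketch of one way to get such a variable, assuming "begins with" means the value of LET in the first row of each id/grp combination (the desired output is not visible in the excerpt). The column name `first_LET` is made up for illustration, and this uses `first()` inside a grouped `mutate()` rather than `fill()`.

```r
library(dplyr)

# Same data as in the question.
df <- tibble(
  id  = rep(0:1, c(7, 10)),
  grp = rep(c(0, 1, 0, 1, 2), c(3, 4, 2, 5, 3)),
  LET = rep(c("A", "B", "A", "B", "A", "B"), c(1, 4, 3, 3, 4, 2))
)

df_first <- df %>%
  group_by(id, grp) %>%
  # Repeat the first LET of each id/grp group on every row of that group.
  mutate(first_LET = first(LET)) %>%
  ungroup()
```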

Applying functions to nested dataframes with map

旧时模样 submitted on 2019-12-25 08:13:08
Question: I am having an issue with nesting and mapping that I am not sure how to get around. I have a tibble with nested data frames, as follows:

    > x
    # A tibble: 18 × 3
       event.no data               dr.dur
          <dbl> <list>              <int>
     1        1 <tibble [7 × 4]>        7
     2        4 <tibble [123 × 4]>    123
     3        5 <tibble [9 × 4]>        9
     4        7 <tibble [14 × 4]>      14
     5       10 <tibble [19 × 4]>      19
     6       11 <tibble [220 × 4]>    220
     7       12 <tibble [253 × 4]>    253
     8       14 <tibble [153 × 4]>    153
     9       15 <tibble [28 × 4]>      28
    10       17 <tibble [169 × 4]>    169
    11       18 <tibble [7 × 4]>        7
    12       19
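
For reference, the usual pattern for applying a function to each nested data frame is `mutate()` combined with one of the purrr `map*()` functions. The sketch below uses a hypothetical stand-in `x_toy` with an invented inner column `value`, because the columns of the asker's nested tibbles are not shown.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for `x`; the inner column `value` is invented.
x_toy <- tibble(
  event.no = c(1, 1, 4, 4, 4),
  value    = c(2, 4, 1, 3, 5)
) %>%
  nest(data = -event.no)

x_out <- x_toy %>%
  mutate(
    dr.dur     = map_int(data, nrow),              # rows in each nested frame
    mean_value = map_dbl(data, ~ mean(.x$value))   # any summary of the nested frame
  )
```

`map_int()` and `map_dbl()` return plain vectors; `map()` would keep the result as a list column instead.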

Gathering wide columns into multiple long columns using pivot_longer

最后都变了- submitted on 2019-12-25 01:49:24
Question: I have code that converts from wide to long with gather, but I have to do this column by column. I want to use pivot_longer to gather multiple wide columns into multiple long columns in one step rather than column by column. For example, the columns hf_1, hf_2, hf_3, hf_4, hf_5, hf_6 need to be pivoted into 2 columns (hf_com - this column with values 1,2,3,4,5,6 from wide hf columns) and (hf_com_freq - this column with value 1). The same needs to occur for the columns ac_1, ac_2, ac_3, ac_4, ac_5,
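
A hedged sketch of how several column families can be pivoted at once with the `.value` sentinel in `names_to`. The toy data frame `wide` and the index column name `com` are assumptions, and the result has one value column per prefix (`hf`, `ac`) rather than the exact `hf_com` / `hf_com_freq` layout the question describes, so further renaming would be needed.

```r
library(tidyr)

# Hypothetical wide data with two column families, hf_* and ac_*.
wide <- tibble::tibble(
  id   = 1:2,
  hf_1 = c(1, 0),
  hf_2 = c(0, 1),
  ac_1 = c(1, 1),
  ac_2 = c(0, 0)
)

long <- pivot_longer(
  wide,
  cols      = -id,
  # The part before "_" becomes the output column name (".value"),
  # the part after "_" goes into the new index column `com`.
  names_to  = c(".value", "com"),
  names_sep = "_"
)
```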

dplyr table reconstructing/data wrangling

﹥>﹥吖頭↗ submitted on 2019-12-25 00:18:02
Question: I'm trying to create a variable that defines true vs false searches. The original dataset is located here: https://github.com/wikimedia-research/Discovery-Hiring-Analyst-2016/blob/master/events_log.csv.gz The basic scenario is that there are variables that define how many times a user (defined by ID, either session_id or uuid in the original dataset) performs a true search vs a false search, such that a visit is always preceded by a search, but a search does not have to be followed by a visit

R - tidyr - spread() - dealing with NA as column name

孤街醉人 submitted on 2019-12-24 16:03:04
Question: I am spreading multiple categorical variables to Boolean columns using tidyr::spread(). As the data contains NAs, spread creates a new column without a name. What I'm looking for is a way to get rid of the NAs using a) a piping solution (I've tried select_() and '['(), but don't know how to refer to the NA column's name or index), or b) a custom function, which would be even better, or c) a way to simply not generate the NA columns, Hadleyverse compatible, if possible. Below is my current (and
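
One way to avoid generating the NA column at all (option c) is to drop rows with a missing key before spreading. This is only a sketch on hypothetical data (`toy`, `colour`, and `flag` are invented names), and it has a side effect worth noting: an id whose only value is NA disappears from the result.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with one categorical variable containing an NA.
toy <- tibble(id = c(1, 2, 3), colour = c("red", NA, "blue"))

wide <- toy %>%
  mutate(flag = TRUE) %>%
  filter(!is.na(colour)) %>%           # drop NA keys so no NA column is created
  spread(colour, flag, fill = FALSE)   # absent combinations become FALSE
```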

stacking/melting multiple columns into multiple columns in R

蓝咒 submitted on 2019-12-24 11:19:15
Question: I am trying to melt/stack/gather multiple specific columns of a data frame into 2 columns, retaining all the others. I have tried many, many answers on Stack Overflow without success (some below). I basically have a situation similar to this post: "Reshaping multiple sets of measurement columns (wide format) into single columns (long format)", only with many more columns to retain and combine. It is important to mention that my year columns are factors and I have many, many more columns than the sample

Elegant solution for casting (spreading) multiple columns of character vectors

限于喜欢 submitted on 2019-12-24 09:29:11
Question: I want to transform a data frame with contact information for a list of municipalities, in which similar information (e.g. phone number) appears in multiple columns. I have tried using both reshape2::dcast() and tidyr::spread(), neither of which solves my problem. I have also checked other posts on Stack Overflow, e.g. "Multiple column spread", but have yet to find a solution that works. It seems to me that the problem should be fairly straightforward (and solvable with