tidyr | 易学教程

Remove duplicates by multiple conditions


Question: I have data where an individual (Name) appears multiple times in an Eggphase category. I would like there to be only one sample per individual, but I don't just want to keep the first one that R finds. I would like to keep the one whose group appears most often in all the other categories. Hopefully my example helps make this clear.

    library(tidyverse)
    myDF <- read.table(text="Tissue Food Eggphase Name Group
    wb fl after Kia a
    wb fl after Kia c
    wb wf before Kia b
    wb fl before Lucy c
    wb fl after Lucy
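One possible reading of this question can be sketched as follows: count how often each Name/Group pair occurs in the whole table, then keep, within each Name and Eggphase, the row whose pair is most frequent. The data below is a small hypothetical stand-in (the excerpt's own table is truncated), and the helper column `group_n` is an illustrative name, not something from the original post.

```r
library(dplyr)

# Hypothetical stand-in data; NOT the truncated table from the question.
myDF <- tibble(
  Name     = c("Kia", "Kia", "Kia", "Lucy"),
  Eggphase = c("after", "after", "before", "before"),
  Group    = c("a", "c", "c", "b")
)

out <- myDF %>%
  # How often does each Name/Group pair occur across all rows?
  add_count(Name, Group, name = "group_n") %>%
  # Within each individual and egg phase, keep the most frequent pair.
  group_by(Name, Eggphase) %>%
  slice_max(group_n, n = 1, with_ties = FALSE) %>%
  ungroup()

print(out)
```

With `with_ties = FALSE`, ties are broken by row order, which may or may not be what the asker wants; that choice is an assumption of this sketch.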

tidyr; %>% group_by() mutate(foo = fill() )


Question: I'm struggling to create a new variable that indicates which letter (LET) each group (grp) within an id (id) begins with. In the following I'll illustrate my question. I have data like this:

    library(dplyr); library(tidyr)
    df <- tibble(id = rep(0:1, c(7, 10)),
                 grp = rep(c(0,1,0,1,2), c(3,4,2,5,3)),
                 LET = rep(c('A', 'B', 'A', 'B', 'A', 'B'), c(1,4, 3, 3, 4, 2)))
    #> # A tibble: 17 x 3
    #>       id   grp LET
    #>    <int> <dbl> <chr>
    #>  1     0     0 A
    #>  2     0     0 B
    #>  3     0     0 B
    #>  4     0     1 B
    #>  5     0     1 B
    #>  6     0     1 A
    #>  7     0     1 A
    #>  8
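A minimal sketch of one way to get the starting letter per group, assuming that "begins with" means the first LET value in row order within each id/grp combination. It uses `dplyr::first()` rather than the `fill()` mentioned in the title, and the column name `first_LET` is a hypothetical choice.

```r
library(dplyr)

# Same construction as in the question.
df <- tibble(
  id  = rep(0:1, c(7, 10)),
  grp = rep(c(0, 1, 0, 1, 2), c(3, 4, 2, 5, 3)),
  LET = rep(c("A", "B", "A", "B", "A", "B"), c(1, 4, 3, 3, 4, 2))
)

out <- df %>%
  group_by(id, grp) %>%
  # Record the letter found in the first row of each id/grp group.
  mutate(first_LET = first(LET)) %>%
  ungroup()

print(out, n = 17)
```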

Applying functions to nested dataframes with map


Question: I am having an issue with nesting and mapping that I am not sure how to get around. I have a tibble with nested dataframes, as follows:

    > x
    # A tibble: 18 × 3
       event.no data               dr.dur
          <dbl> <list>              <int>
     1        1 <tibble [7 × 4]>        7
     2        4 <tibble [123 × 4]>    123
     3        5 <tibble [9 × 4]>        9
     4        7 <tibble [14 × 4]>      14
     5       10 <tibble [19 × 4]>      19
     6       11 <tibble [220 × 4]>    220
     7       12 <tibble [253 × 4]>    253
     8       14 <tibble [153 × 4]>    153
     9       15 <tibble [28 × 4]>      28
    10       17 <tibble [169 × 4]>    169
    11       18 <tibble [7 × 4]>        7
    12       19

Gathering wide columns into multiple long columns using pivot_longer


Question: I have code which converts from wide to long with gather, but I have to do this column by column. I want to use pivot_longer to gather multiple wide columns into multiple long columns, rather than going column by column. For example, the columns hf_1, hf_2, hf_3, hf_4, hf_5, hf_6 need to be pivoted into 2 columns (hf_com - this column with values 1,2,3,4,5,6 from the wide hf columns) and (hf_com_freq - this column with value 1). The same needs to occur for the columns ac_1, ac_2, ac_3, ac_4, ac_5,
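For the hf columns alone, the reshaping described above might look like the sketch below. The input table is hypothetical (three hf columns and an invented `id` column), and it assumes tidyr 1.1.0 or later for `names_transform`. Handling the ac columns in the same call is not shown, since the excerpt is cut off before the full requirement is stated.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data; the real column set is longer.
wide <- tibble(
  id   = 1:2,
  hf_1 = c(1, 0),
  hf_2 = c(0, 1),
  hf_3 = c(1, 1)
)

long <- wide %>%
  pivot_longer(
    cols            = starts_with("hf_"),
    names_to        = "hf_com",        # 1, 2, 3 taken from the column suffix
    names_prefix    = "hf_",           # strip the shared prefix
    names_transform = list(hf_com = as.integer),
    values_to       = "hf_com_freq"    # the cell value from the wide column
  )

print(long)
```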

dplyr table reconstructing/data wrangling


Question: I'm trying to create a variable that defines true vs false searches. The original dataset is located here: https://github.com/wikimedia-research/Discovery-Hiring-Analyst-2016/blob/master/events_log.csv.gz The basic scenario is that there are variables that define how many times a user (identified by an ID, either session_id or uuid in the original dataset) performs a true search vs a false search, such that a visit is always preceded by a search, but a search does not have to be followed by a visit

R - tidyr - spread() - dealing with NA as column name


Question: I am spreading multiple categorical variables to Boolean columns using tidyr::spread(). As the data contains NAs, spread creates a new column without a name. What I'm looking for is a way to get rid of the NAs using a) a piping solution (I've tried select_() and '['(), but don't know how to refer to the NA column's name or index), or b) a custom function, which would be even better, or c) a way to simply not generate the NA columns, Hadleyverse compatible, if possible. Below is my current (and

stacking/melting multiple columns into multiple columns in R


Question: I am trying to melt/stack/gather multiple specific columns of a dataframe into 2 columns, retaining all the others. I have tried many, many answers on Stack Overflow without success (some below). I basically have a situation similar to the post "Reshaping multiple sets of measurement columns (wide format) into single columns (long format)", only with many more columns to retain and combine. It is important to mention that my year columns are factors and that I have many, many more columns than the sample

Elegant solution for casting (spreading) multiple columns of character vectors


Question: I want to transform a data frame with contact information for a list of municipalities, in which similar information, e.g. a phone number, appears in multiple columns. I have tried using both reshape2::dcast() and tidyr::spread(), neither of which solves my problem. I have also checked other posts on Stack Overflow, e.g. "Multiple column spread", but have yet to find a solution which works. It seems to me that the problem should be fairly straightforward (and solvable with