How to combine multiple columns of an R data frame into a single column that is a list

Submitted by 為{幸葍}努か on 2021-01-28 13:32:28

Question


I would like to combine multiple columns that I have in a data frame into one column in that data frame that is a list. For example, I have the following data frame ingredients:

name1  name2    imgID  attr1  attr2      attr3...
Item1  ItemID1  Img1   water  chocolate  soy...
Item2  ItemID2  Img2   cocoa  spice      milk...

I would like to combine the attr columns into one column that is a comma-separated list of those items and if possible have them appear in the following format:

name1  name2    imgID  attrs
Item1  ItemID1  Img1   c("water", "chocolate", "soy", ...)
Item2  ItemID2  Img2   c("cocoa", "spice", "milk", ...)

Is there a succinct way to write the code using a paste or join that allows me to call the columns of the data frame as ingredients[4:50] rather than each one by name? Is there also a way to not include NA or NULL values in that list?


Answer 1:


You could use tidyr::nest, though you'll probably want to simplify the nested data frames to character vectors afterwards, e.g.

library(tidyverse)

items <- tibble(name1 = c("Item1", "Item2"), 
                name2 = c("ItemID1", "ItemID2"), 
                imgID = c("Img1", "Img2"), 
                attr1 = c("water", "cocoa"), 
                attr2 = c("chocolate", "spice"), 
                attr3 = c("soy", "milk"))

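items_nested <- items %>% 
    # tidyr >= 1.0 syntax; older versions used nest(contains('attr'), .key = 'attr')
    nest(attr = contains('attr')) %>% 
    # flatten each nested one-row data frame into a plain character vector
    mutate(attr = map(attr, unlist, use.names = FALSE))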

items_nested
#> # A tibble: 2 x 4
#>   name1 name2   imgID attr     
#>   <chr> <chr>   <chr> <list>   
#> 1 Item1 ItemID1 Img1  <chr [3]>
#> 2 Item2 ItemID2 Img2  <chr [3]>

Other options include reshaping to long with tidyr::gather, grouping by all but the new columns, and aggregating the value column into a list in a more dplyr-focused style:

items %>% 
    gather(attr_num, attr, contains('attr')) %>% 
    group_by_at(vars(-attr_num, -attr)) %>% 
    summarise(attr = list(attr)) %>% 
    ungroup()

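or uniting the attr* columns and then splitting them back apart within a list column with strsplit, in a more string-focused style (unite joins with "_" by default, so this assumes no value itself contains an underscore):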

items %>% 
    unite(attr, contains('attr')) %>% 
    mutate(attr = strsplit(attr, '_'))

or using purrr::transpose and tidyselect in a list-focused style:

items %>% 
    mutate(attr = transpose(select(., contains('attr')))) %>% 
    select(-matches('attr.'))

All options return the same thing, at least on the sample data. Further cleanup, e.g. dropping NAs, can be done by iterating over the new column with lapply/purrr::map.
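
As one illustration of the NA cleanup, and of selecting the columns by position as the question asks, here is a minimal base-R sketch. It is not part of the original answer. The sample data adds an NA for demonstration, and the names attr_cols, attrs, and result are placeholders chosen here; positions 4:6 stand in for the 4:50 mentioned in the question.

```r
# Sample data modeled on the question, with one NA added for illustration.
items <- data.frame(
  name1 = c("Item1", "Item2"),
  name2 = c("ItemID1", "ItemID2"),
  imgID = c("Img1", "Img2"),
  attr1 = c("water", "cocoa"),
  attr2 = c("chocolate", NA),
  attr3 = c("soy", "milk"),
  stringsAsFactors = FALSE
)

attr_cols <- 4:6  # assumed positions; the question's data would use 4:50

# Build one character vector per row, dropping NA entries.
attrs <- lapply(seq_len(nrow(items)), function(i) {
  vals <- unlist(items[i, attr_cols], use.names = FALSE)
  vals[!is.na(vals)]
})

result <- items[-attr_cols]  # keep only the non-attribute columns
result$attrs <- attrs        # attach the list column
```

If the list column already exists (for example from one of the pipelines above), the same filter can be applied on its own with lapply(column, function(x) x[!is.na(x)]).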



Source: https://stackoverflow.com/questions/48837024/how-to-combine-multiple-columns-of-an-r-data-frame-into-a-single-column-that-is
