Expanding columns associated with a categorical variable into multiple columns with dplyr/tidyr while retaining id variable [duplicate]

≯℡__Kan透↙ posted on 2019-12-11 07:39:55

Question


I have a data.frame that looks like this:

library(tibble)

# frame_data() is deprecated; tribble() is its current name
dfTall <- tribble(
    ~id, ~x, ~y, ~z,
      1, "a", 4, 5,
      1, "b", 6, 5,
      2, "a", 5, 4,
      2, "b", 1, 9)

I want to turn it into this:

dfWide <- tribble(
    ~id, ~y_a, ~y_b, ~z_a, ~z_b,
      1,    4,    6,    5,    5,
      2,    5,    1,    4,    9)

Currently, I'm doing this

dfTall %>%
    split(., .$x) %>%
    mapply(function(df,name) 
        {df$x <- NULL; names(df) <- paste(names(df), name, sep='_'); df}, 
        SIMPLIFY=FALSE, ., names(.)) %>%
    bind_cols() %>%
    select(-id_b) %>%
    rename(id = id_a)

In practice, I will have a larger number of numeric columns that need to be expanded (i.e., not just y and z). My current solution works, but it has issues, like the fact that multiple copies of the id variable get added into the final data.frame and need to be removed.

Can this expansion be done using a function from tidyr such as spread?


Answer 1:


This can be done with spread, but not in a single step, because several columns hold the values. First gather the value columns, then unite the gathered column names with x to form the new headers, and finally spread:

library(dplyr)
library(tidyr)

dfTall %>% 
    gather(col, val, -id, -x) %>% 
    unite(key, col, x) %>% 
    spread(key, val)

# A tibble: 2 x 5
#     id   y_a   y_b   z_a   z_b
#* <dbl> <dbl> <dbl> <dbl> <dbl>
#1     1     4     6     5     5
#2     2     5     1     4     9
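
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider, and pivot_wider accepts several value columns at once. The following is a minimal sketch of the same reshaping, assuming tidyr >= 1.0.0 is installed; the input is rebuilt with tribble so the block runs on its own, and the name dfWide is reused from the question for the result.

```r
library(tidyr)

# Same input as in the question, rebuilt so this block is self-contained
dfTall <- tibble::tribble(
    ~id, ~x, ~y, ~z,
      1, "a", 4, 5,
      1, "b", 6, 5,
      2, "a", 5, 4,
      2, "b", 1, 9)

# One column per (value column, level of x) pair; id is kept as the row key.
# New names are built as <value column>_<level of x>, e.g. y_a.
dfWide <- pivot_wider(dfTall, names_from = x, values_from = c(y, z))

names(dfWide)
# "id" "y_a" "y_b" "z_a" "z_b"
```

With more numeric columns, only the values_from selection needs to change; the gather/unite/spread steps collapse into this single call.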

If you use data.table, dcast supports casting multiple value columns at once:

library(data.table)
dcast(setDT(dfTall), id ~ x, value.var = c('y', 'z'))

#   id y_a y_b z_a z_b
#1:  1   4   6   5   5
#2:  2   5   1   4   9 


Source: https://stackoverflow.com/questions/45872076/expanding-columns-associated-with-a-categorical-variable-into-multiple-columns-w
