tidyr::unite across column patterns

戏子无情 提交于 2019-12-05 04:38:22

Two options, which are really the same thing rearranged.


Option 1. Nested calls

First, you can use lapply to apply unite_ (the standard evaluation version to which you can pass strings) programmatically across columns. To do so, you'll need to build a list of names for it to use, and then wrap the lapply in do.call(cbind to catch columns, and cbind site back to it. Altogether:

cols <- unique(substr(names(df)[-1], 1, 3))
cbind(site = df$site, do.call(cbind,
        lapply(cols, function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                                        sep = '/', remove = TRUE) %>% select_(x)})
        ))

#   site D01 D02 D03
# 1    A 1/1 1/0 1/0
# 2    B 0/1 0/1 1/1
# 3    C 0/0 1/0 0/0
# 4    D 0/1 0/0 0/0
# 5    E 1/1 1/1 0/1

Option 2: Chained

Alternately, if you really like pipes, you can actually hack the whole thing into a chain (lapply included!), swapping out a few of the base functions for dplyr ones:

df %>% select(-site) %>% names() %>% substr(1,3) %>% unique() %>%
  lapply(function(x){unite_(df, x, grep(x, names(df), value = TRUE), 
                            sep = '/', remove = TRUE) %>% select_(x)}) %>%
  bind_cols() %>% mutate(site = as.character(df$site)) %>% select(site, starts_with('D'))

# Source: local data frame [5 x 4]
# 
#    site   D01   D02   D03
#   (chr) (chr) (chr) (chr)
# 1     A   1/1   1/0   1/0
# 2     B   0/1   0/1   1/1
# 3     C   0/0   1/0   0/0
# 4     D   0/1   0/0   0/0
# 5     E   1/1   1/1   0/1

Check out the intermediate products to see how it fits together, but it's pretty much the same logic as the base approach.

This is a solution with base functions. First, I looked for indexes of ***_1 in columns. I also created names for columns for the final process, using gsub() and unique(). The sapply part pastes two columns with /. If x = 1, then, x +1 = 2. So you always choose two columns next to each other and handle the pasting job. Then, I added site with cbind() and created a data frame. The last job is to assign column names.

library(magrittr)

ind <- grep(pattern = "1$", x = names(df))

names <- unique(gsub(pattern = "_\\d+$",
                replacement = "", x = names(df)))

sapply(ind, function(x){
        foo <- paste(df[,x], df[, x+1], sep = "/")
        foo
       }) %>%
cbind(as.character(df$site), .) %>%
data.frame -> out

names(out) <- names

#  site D01 D02 D03
#1    A 1/1 1/0 1/0
#2    B 0/1 0/1 1/1
#3    C 0/0 1/0 0/0
#4    D 0/1 0/0 0/0
#5    E 1/1 1/1 0/1

You can use an easy base R approach, too:

cols <- split(names(df)[-1], sub("_\\d+", "", names(df)[-1]))

cbind(df[1], sapply(names(cols), function(col) {
  do.call(paste, c(df[cols[[col]]], sep = "/"))
}))
#  site D01 D02 D03
#1    A 1/1 1/0 1/0
#2    B 0/1 0/1 1/1
#3    C 0/0 1/0 0/0
#4    D 0/1 0/0 0/0
#5    E 1/1 1/1 0/1
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!