问题
I have a dataset that looks something like this
site <- c("A", "B", "C", "D", "E")
D01_1 <- c(1, 0, 0, 0, 1)
D01_2 <- c(1, 1, 0, 1, 1)
D02_1 <- c(1, 0, 1, 0, 1)
D02_2 <- c(0, 1, 0, 0, 1)
D03_1 <- c(1, 1, 0, 0, 0)
D03_2 <- c(0, 1, 0, 0, 1)
df <- data.frame(site, D01_1, D01_2, D02_1, D02_2, D03_1, D03_2)
I am trying to unite the D0x_1
and D0x_2
columns so that the values in the columns are separated by a slash. I can do this with the following code and it works just fine:
library(dplyr)
library(tidyr)
df.unite <- df %>%
unite(D01, D01_1, D01_2, sep = "/", remove = TRUE) %>%
unite(D02, D02_1, D02_2, sep = "/", remove = TRUE) %>%
unite(D03, D03_1, D03_2, sep = "/", remove = TRUE)
...but the problem is that it requires me to type out each unite
pair multiple times and it is unwieldy across the large number of columns in my dataset. Is there a way in dplyr
to unite across similarly patterned column names and then loop across the columns? unite_each
doesn't seem to exist.
回答1:
Two options, which are really the same thing rearranged.
Option 1. Nested calls
First, you can use lapply
to apply unite_
(the standard evaluation version to which you can pass strings) programmatically across columns. To do so, you'll need to build a list of names for it to use, and then wrap the lapply
in do.call(cbind
to catch columns, and cbind
site
back to it. Altogether:
cols <- unique(substr(names(df)[-1], 1, 3))
cbind(site = df$site, do.call(cbind,
lapply(cols, function(x){unite_(df, x, grep(x, names(df), value = TRUE),
sep = '/', remove = TRUE) %>% select_(x)})
))
# site D01 D02 D03
# 1 A 1/1 1/0 1/0
# 2 B 0/1 0/1 1/1
# 3 C 0/0 1/0 0/0
# 4 D 0/1 0/0 0/0
# 5 E 1/1 1/1 0/1
Option 2: Chained
Alternately, if you really like pipes, you can actually hack the whole thing into a chain (lapply
included!), swapping out a few of the base functions for dplyr
ones:
df %>% select(-site) %>% names() %>% substr(1,3) %>% unique() %>%
lapply(function(x){unite_(df, x, grep(x, names(df), value = TRUE),
sep = '/', remove = TRUE) %>% select_(x)}) %>%
bind_cols() %>% mutate(site = as.character(df$site)) %>% select(site, starts_with('D'))
# Source: local data frame [5 x 4]
#
# site D01 D02 D03
# (chr) (chr) (chr) (chr)
# 1 A 1/1 1/0 1/0
# 2 B 0/1 0/1 1/1
# 3 C 0/0 1/0 0/0
# 4 D 0/1 0/0 0/0
# 5 E 1/1 1/1 0/1
Check out the intermediate products to see how it fits together, but it's pretty much the same logic as the base approach.
回答2:
This is a solution with base functions. First, I looked for indexes of ***_1 in columns. I also created names for columns for the final process, using gsub()
and unique()
. The sapply part pastes two columns with /
. If x = 1, then, x +1 = 2. So you always choose two columns next to each other and handle the pasting job. Then, I added site
with cbind()
and created a data frame. The last job is to assign column names.
library(magrittr)
ind <- grep(pattern = "1$", x = names(df))
names <- unique(gsub(pattern = "_\\d+$",
replacement = "", x = names(df)))
sapply(ind, function(x){
foo <- paste(df[,x], df[, x+1], sep = "/")
foo
}) %>%
cbind(as.character(df$site), .) %>%
data.frame -> out
names(out) <- names
# site D01 D02 D03
#1 A 1/1 1/0 1/0
#2 B 0/1 0/1 1/1
#3 C 0/0 1/0 0/0
#4 D 0/1 0/0 0/0
#5 E 1/1 1/1 0/1
回答3:
You can use an easy base R approach, too:
cols <- split(names(df)[-1], sub("_\\d+", "", names(df)[-1]))
cbind(df[1], sapply(names(cols), function(col) {
do.call(paste, c(df[cols[[col]]], sep = "/"))
}))
# site D01 D02 D03
#1 A 1/1 1/0 1/0
#2 B 0/1 0/1 1/1
#3 C 0/0 1/0 0/0
#4 D 0/1 0/0 0/0
#5 E 1/1 1/1 0/1
来源:https://stackoverflow.com/questions/36002235/tidyrunite-across-column-patterns