Question
I have a data frame with two columns. Each cell holds items separated by ">". I need to combine Col1 and Col2 into Col3 by pairing the items position by position: a1-b1 > a2-b2 > a3-b3 > ...
Example
| Col1 | Col2 | Col3 |
| abcd > de > efg | ppppp > ppt > pp | abcd-ppppp > de-ppt > efg-pp |
| hij > kl > iiii | aaa > bbb > hhh | hij-aaa > kl-bbb > iiii-hhh |
| aa | fff | aa-fff |
| a > bbb | pp > a | a-pp > bbb-a |
....
How can I do that in R? Thanks.
Answer 1:
This was a pain to solve. In future, for everyone's sanity, please consider how you output your data: this would have been easy if downstream analysis had been considered when the data was generated. Anyway, enough whinging; here is the solution.
Let's generate your data:
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)
Next, using apply, we strip, separate and flatten Col1 and Col2, and add the first separator (-):
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))
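Note that the flattening above quietly assumes every row has the same number of ">"-separated items in Col1 and Col2; otherwise the two flattened columns no longer line up. A minimal sketch of a sanity check, with the intermediate shapes shown (the names n1 and n2 are mine, purely for illustration):

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Items per row in each column; these must agree row by row
n1 <- lengths(strsplit(dat$Col1, ">", fixed = TRUE))
n2 <- lengths(strsplit(dat$Col2, ">", fixed = TRUE))
stopifnot(identical(n1, n2))

# Same two steps as above: flatten each column, then pair item by item
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))

dim(l1)     # one row per item, one column per original column
length(l2)  # one "a-b" pair per item
```

If the stopifnot() fails, the rows with mismatched counts need fixing before going any further.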
The next part was surprisingly difficult. After much googling I found a solution (a hack) for splitting a vector of strings into groups whose sizes are given by a numeric vector.
#thanks: https://techoverflow.net/2012/11/10/r-count-occurrences-of-character-in-string/
#gets occurrences of ">" for later use
countCharOccurrences <- function(char, s) {
  s2 <- gsub(char, "", s)
  return(nchar(s) - nchar(s2))
}
o <- countCharOccurrences(">", dat$Col1)+1
df <- as.data.frame(l2, stringsAsFactors = FALSE)
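One caveat with the counting helper: gsub() treats its pattern as a regular expression by default, so a separator such as "." or "|" would be miscounted. A hedged variant that passes fixed = TRUE (my change, not part of the linked post) might look like this:

```r
# Count literal occurrences of `char` in each element of `s`
countCharOccurrences <- function(char, s) {
  s2 <- gsub(char, "", s, fixed = TRUE)  # fixed = TRUE: match literally, not as a regex
  nchar(s) - nchar(s2)
}

Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")

# Number of items per row = number of separators + 1
o <- countCharOccurrences(">", Col1) + 1
o

# A regex metacharacter is now counted literally
dots <- countCharOccurrences(".", "a.b.c")
dots
```

For ">" the result is the same as with the original helper, since ">" has no special meaning in a regular expression.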
Split df by the occurrences of ">" (i.e. the values of o):
# Thanks to this SO answer:
# https://stackoverflow.com/questions/27132290/split-dataframe-by-row-number-in-r
l2a <- split(df, cumsum(c(TRUE,(1:nrow(df) %in% cumsum(o))[-nrow(df)])))
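The cumsum() expression builds a group label for every row of df. As far as I can tell, rep() with a times argument produces the same labels and may be easier to read; the sketch below compares the two on the item counts from this example (g_orig and g_rep are illustrative names):

```r
o <- c(3, 3, 1, 2)  # items per original row, as computed above
n <- sum(o)         # total number of flattened rows

# Grouping vector as built in the split() call
g_orig <- cumsum(c(TRUE, (1:n %in% cumsum(o))[-n]))

# Possible simpler equivalent: repeat each row index o[i] times
g_rep <- rep(seq_along(o), times = o)

g_orig
g_rep
```

Either vector can be handed to split() as the grouping factor.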
Finally, we collapse the list of data frames and add the final separator (>):
l3 <- lapply(l2a, function(x) paste(x[,1], collapse = " > "))
Then combine with your starting dataframe:
dat$Col3 <- l3
             Col1             Col2                         Col3
1 abcd > de > efg ppppp > ppt > pp abcd-ppppp > de-ppt > efg-pp
2 hij > kl > iiii  aaa > bbb > hhh  hij-aaa > kl-bbb > iiii-hhh
3              aa              fff                       aa-fff
4         a > bbb           pp > a                 a-pp > bbb-a
Tada!
Edit: I had forgotten that l3 is a list of objects. You need to use unlist to flatten it, like this:
dat$Col3 <- unlist(l3)
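For what it's worth, the same result can probably be reached more directly by splitting each cell and pairing the pieces row by row with mapply(). This is only a sketch of an alternative, not the method used above; parts1 and parts2 are names I made up, and it still assumes both columns have the same number of items in every row (paste() would silently recycle otherwise).

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Split every cell on ">" and trim the surrounding whitespace
parts1 <- lapply(strsplit(dat$Col1, ">", fixed = TRUE), trimws)
parts2 <- lapply(strsplit(dat$Col2, ">", fixed = TRUE), trimws)

# Pair the pieces of each row with "-" and rejoin the pairs with " > "
dat$Col3 <- mapply(
  function(a, b) paste(a, b, sep = "-", collapse = " > "),
  parts1, parts2,
  USE.NAMES = FALSE
)

dat
```

Because mapply() returns a plain character vector here, no unlist() step is needed.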
Source: https://stackoverflow.com/questions/49910473/concatenate-alternate-characters-from-different-columns-in-r-programming