Question
I have a data frame with 2 columns. I need to combine Col1 and Col2 into Col3 by pairing up, in order, the pieces separated by ">": a1-b1 > a2-b2 > a3-b3 > ...
Example
| Col1 | Col2 | Col3 |
| abcd > de > efg | ppppp > ppt > pp | abcd-ppppp > de-ppt > efg-pp |
| hij > kl > iiii | aaa > bbb > hhh | hij-aaa > kl-bbb > iiii-hhh |
| aa | fff | aa-fff |
| a > bbb | pp > a | a-pp > bbb-a |
....
How can I do that in R programming? Thanks
Answer 1:
This was a pain in the ass to solve. In the future, for our sanity, please consider how you output your data. This could have been solved easily if downstream analysis had been considered when the data was generated. Anyway, enough whinging; here is the solution.
Let's generate your data:
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)
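Next, using apply, we split each of Col1 and Col2 on ">", flatten the pieces into one vector per column and trim the whitespace. Then we paste the two columns together row by row with the first separator, "-". This relies on Col1 and Col2 having the same number of pieces in every row, so that apply returns a matrix: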
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))
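To make the intermediate objects concrete, here is a small sketch that rebuilds the example data and inspects l1 and l2. It only repeats the calls above and assumes nothing beyond the four example rows.

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# One row per piece, one column per original column
l1 <- apply(dat, 2, function(x) trimws(unlist(strsplit(x, split = ">"))))
dim(l1)  # 9 2

# Pair the pieces across the two columns
l2 <- apply(l1, 1, function(x) paste0(x[1], "-", x[2]))
l2[1:3]  # "abcd-ppppp" "de-ppt" "efg-pp"
```

At this point the row boundaries are gone: l2 is one flat vector of 9 pairs, which is why the next steps have to work out how many pairs belong to each original row.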
The next part was surprisingly difficult. After much googling I found a solution (a hack) for splitting a character vector into groups whose sizes are given by a numeric vector.
#thanks: https://techoverflow.net/2012/11/10/r-count-occurrences-of-character-in-string/
#gets occurrences of ">" for later use
countCharOccurrences <- function(char, s) {
  # fixed = TRUE treats char literally rather than as a regular expression
  s2 <- gsub(char, "", s, fixed = TRUE)
  return(nchar(s) - nchar(s2))
}
o <- countCharOccurrences(">", dat$Col1)+1
df <- as.data.frame(l2, stringsAsFactors = FALSE)
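The piece counts in o can also be read off the split itself rather than by counting ">" characters. A minimal sketch, assuming the same Col1 values as in the example (o_alt is a name introduced here for illustration):

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")

# strsplit returns one character vector per row;
# lengths() counts the pieces in each of them
o_alt <- lengths(strsplit(Col1, ">", fixed = TRUE))
o_alt  # 3 3 1 2
```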
Split df into consecutive groups whose sizes are the values of o (the number of ">" in each row of Col1, plus one):
# Thanks to this SO answer:
# https://stackoverflow.com/questions/27132290/split-dataframe-by-row-number-in-r
l2a <- split(df, cumsum(c(TRUE,(1:nrow(df) %in% cumsum(o))[-nrow(df)])))
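The grouping vector passed to split can probably be written more readably with rep. The sketch below compares the two forms on the group sizes from the example (o is hard-coded here, and g_hack and g_rep are names introduced for illustration):

```r
o <- c(3, 3, 1, 2)  # pieces per original row, as computed above
n <- sum(o)         # total number of pairs, i.e. nrow(df)

# Grouping vector as built in the split() call above
g_hack <- cumsum(c(TRUE, (1:n %in% cumsum(o))[-n]))

# Same grouping: repeat each row index once per piece
g_rep <- rep(seq_along(o), times = o)

all(g_hack == g_rep)  # TRUE
g_rep                 # 1 1 1 2 2 2 3 4 4
```

With that, the split could be written as split(df, rep(seq_along(o), times = o)).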
Finally, we collapse each data frame in the list into a single string and add the final separator, " > ":
l3 <- lapply(l2a, function(x) paste(x[,1], collapse = " > "))
Then combine with your starting dataframe:
dat$Col3 <- l3
             Col1             Col2                         Col3
1 abcd > de > efg ppppp > ppt > pp abcd-ppppp > de-ppt > efg-pp
2 hij > kl > iiii  aaa > bbb > hhh  hij-aaa > kl-bbb > iiii-hhh
3              aa              fff                       aa-fff
4         a > bbb           pp > a                 a-pp > bbb-a
Tada!
Edit: I had forgotten that l3 is a list, not a character vector. Use unlist to flatten it:
dat$Col3 <- unlist(l3)
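For comparison, the same result can be sketched more directly by handling one row at a time with mapply, which avoids flattening and re-splitting. This is an alternative to the steps above, not the method used in this answer. combine_pairs is a hypothetical helper name, and the sketch assumes both columns have the same number of pieces in each row (otherwise paste would silently recycle the shorter side).

```r
Col1 <- c("abcd > de > efg", "hij > kl > iiii", "aa", "a > bbb")
Col2 <- c("ppppp > ppt > pp", "aaa > bbb > hhh", "fff", "pp > a")
dat <- data.frame(Col1, Col2, stringsAsFactors = FALSE)

# Hypothetical helper: pair up the pieces of two ">"-separated strings
combine_pairs <- function(a, b) {
  a_parts <- trimws(strsplit(a, ">", fixed = TRUE)[[1]])
  b_parts <- trimws(strsplit(b, ">", fixed = TRUE)[[1]])
  # Assumes length(a_parts) == length(b_parts)
  paste(paste(a_parts, b_parts, sep = "-"), collapse = " > ")
}

# Apply the helper to each row; USE.NAMES = FALSE keeps the result unnamed
dat$Col3 <- mapply(combine_pairs, dat$Col1, dat$Col2, USE.NAMES = FALSE)
dat$Col3[4]  # "a-pp > bbb-a"
```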
Source: https://stackoverflow.com/questions/49910473/concatenate-alternate-characters-from-different-columns-in-r-programming