Concatenate row-wise across specific columns of dataframe

放肆的年华 posted on 2019-11-26 09:31:11

Question


I have a data frame with columns that, when concatenated (row-wise) as a string, would allow me to partition the data frame into a desired form.

> str(data)
'data.frame':   680420 obs. of  10 variables:
 $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
 $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
 $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
 $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H              : int  4 4 4 4 4 4 4 4 4 4 ...

For each row, I would like to concatenate the data in columns F, E, D, and C into a string (with the underscore character as separator). Below is my unsuccessful attempt at this:

data$id <- sapply(as.data.frame(cbind(data$F,data$E,data$D,data$C)), paste, sep="_")

And below is the undesired result:

> str(data)
'data.frame':   680420 obs. of  10 variables:
 $ A              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ B              : chr  "2011-01-26" "2011-01-27" "2011-02-09" "2011-02-10" ...
 $ C              : chr  "2011-01-26" "2011-01-26" "2011-02-09" "2011-02-09" ...
 $ D              : chr  "AAA" "AAA" "BCB" "CCC" ...
 $ E              : chr  "A00001" "A00002" "B00002" "B00001" ...
 $ F              : int  9 9 37 37 37 37 191 191 191 191 ...
 $ G              : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H              : int  4 4 4 4 4 4 4 4 4 4 ...
 $ id             : chr [1:680420, 1:4] "9" "9" "37" "37" ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr  "V1" "V2" "V3" "V4"

Any help would be greatly appreciated.


Answer 1:


Try

 data$id <- paste(data$F, data$E, data$D, data$C, sep="_")

instead. The beauty of vectorized code is that you do not need row-by-row loops, or loop-equivalent *apply functions.

Edit: Even better is

 data <- within(data,  id <- paste(F, E, D, C, sep="_"))
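To make the vectorized behaviour concrete, here is a minimal sketch on a toy data frame. The two rows and the names df, cols and id2 are made up for illustration; only the column names F, E, D and C come from the question. It also shows one way to pass the columns as a character vector, which is an addition and not part of the answer above.

```r
# Toy data frame mirroring the question's columns (values are illustrative).
df <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# paste() is vectorized: a single call builds the key for every row.
df$id <- paste(df$F, df$E, df$D, df$C, sep = "_")

# Same result when the columns are named in a character vector:
# do.call() passes each selected column, plus sep, as an argument to paste().
cols <- c("F", "E", "D", "C")
df$id2 <- do.call(paste, c(df[cols], sep = "_"))

print(df$id)
# "9_A00001_AAA_2011-01-26"  "37_B00002_BCB_2011-02-09"
```

The do.call() form can be handy when the list of key columns is only known at run time.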



Answer 2:


Use unite() from the tidyr package:

library(tidyr)
data <- data %>% unite(id, F, E, D, C, sep = '_')

The first argument is the name of the new column; the arguments that follow, up to sep, are the columns to concatenate.
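One detail worth knowing: by default unite() drops the columns it combines, and its remove argument controls this. The sketch below assumes tidyr is installed and uses a made-up two-row data frame; the names df, dropped and kept are illustrative only.

```r
library(tidyr)

# Toy data frame mirroring the question's columns (values are illustrative).
df <- data.frame(
  C = c("2011-01-26", "2011-02-09"),
  D = c("AAA", "BCB"),
  E = c("A00001", "B00002"),
  F = c(9L, 37L),
  stringsAsFactors = FALSE
)

# Default (remove = TRUE): F, E, D and C are replaced by the new id column.
dropped <- unite(df, id, F, E, D, C, sep = "_")

# remove = FALSE keeps the input columns alongside id.
kept <- unite(df, id, F, E, D, C, sep = "_", remove = FALSE)

print(names(dropped))
print(names(kept))
```

If the original columns are still needed downstream, remove = FALSE is probably the safer choice.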




Answer 3:


Either stringr::str_c() or paste() will work.

library(stringr)
data <- within(data, id <- str_c(F, E, D, C, sep="_"))

or else

data <- within(data, id <- paste(F, E, D, C, sep="_"))

(stringr may perform better on large datasets.)
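The two functions are not fully interchangeable when the data contain missing values, which may matter here since the question's data frame has NA entries. A minimal sketch, assuming stringr is installed; the vectors x and y and the result names are hypothetical:

```r
library(stringr)

# Two short character vectors; the second element of x is missing.
x <- c("9", NA)
y <- c("AAA", "BCB")

# paste() converts NA to the literal text "NA" before joining.
with_paste <- paste(x, y, sep = "_")

# str_c() propagates the missing value: the whole result element is NA.
with_str_c <- str_c(x, y, sep = "_")

print(with_paste)  # "9_AAA"  "NA_BCB"
print(with_str_c)  # "9_AAA"  NA
```

Which behaviour is preferable depends on whether an id built from incomplete data should still exist or be flagged as missing.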



Source: https://stackoverflow.com/questions/6308933/concatenate-row-wise-across-specific-columns-of-dataframe