Assigning results of strsplit to multiple columns of data frame

Submitted by 烂漫一生 on 2019-12-22 08:48:14

Question


I am trying to split a character vector into three different vectors, inside a data frame.

My data is something like:

> df <- data.frame(filename = c("Author1 (2010) Title of paper", 
                                "Author2 et al (2009) Title of paper",
                                "Author3 & Author4 (2004) Title of paper"),
                   stringsAsFactors = FALSE)

I would like to split these three pieces of information (authors, year, title) into three different columns, so that the result looks like this:

> df
                          filename             author  year   title
 1           Author1 (2010) Title1            Author1  2010  Title1
 2     Author2 et al (2009) Title2      Author2 et al  2009  Title2
 3 Author3 & Author4 (2004) Title3  Author3 & Author4  2004  Title3

I have used strsplit to split each filename into a vector of three elements:

 df$temp <- strsplit(df$filename, " \\(|\\) ")

But now I can't find a way to put each element in a separate column. I can access a specific piece of information like this:

> df$temp[[2]][1]
[1] "Author2 et al"

but I can't work out how to put it in the other columns:

> df$author <- df$temp[[]][1]
Error

Answer 1:


With the tidyr package, here's a solution using separate:

library(tidyr)
separate(df, "filename", c("Author","Year","Title"), sep=" \\(|\\) ", remove=FALSE)
#                                  filename            Author
# 1           Author1 (2010) Title of paper           Author1
# 2     Author2 et al (2009) Title of paper     Author2 et al
# 3 Author3 & Author4 (2004) Title of paper Author3 & Author4
#   Year          Title
# 1 2010 Title of paper
# 2 2009 Title of paper
# 3 2004 Title of paper

Leading and trailing spaces are accounted for, because the split pattern includes the space before "(" and the space after ")".
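
One caveat with any split on " (" or ") ": a title that itself contains parentheses would produce more than three pieces. As a rough sketch in base R (not part of the original answer), the three fields can instead be captured with regexec and regmatches. The second sample string and the assumption that the year is always four digits are made up for illustration:

```r
x <- c("Author1 (2010) Title of paper",
       "Author2 et al (2009) Title (part 2) of paper")  # hypothetical extra case

# Groups: text before " (", a four-digit year, everything after ") "
pattern <- "^([^(]*) \\(([0-9]{4})\\) (.*)$"
m <- regmatches(x, regexec(pattern, x))

# Each element of m is c(full match, group 1, group 2, group 3);
# keep the groups and stack them into a character matrix
fields <- do.call(rbind, lapply(m, `[`, 2:4))
colnames(fields) <- c("author", "year", "title")
fields
```

Strings that do not match the pattern end up as a row of NA values, which makes malformed filenames easy to spot.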




Answer 2:


You could try tstrsplit from data.table (the devel version, v1.9.5+, when this was written):

library(data.table)  # v1.9.5+
setDT(df)[, c('author', 'year', 'title') := tstrsplit(filename, ' \\(|\\) ')]
df
#                                  filename             author year
#1:           Author1 (2010) Title of paper           Author1  2010
#2:     Author2 et al (2009) Title of paper     Author2 et al  2009
#3: Author3 & Author4 (2004) Title of paper Author3 & Author4  2004
#             title
#1:  Title of paper
#2:  Title of paper
#3:  Title of paper

Edit: Included OP's split pattern to remove the white spaces.




Answer 3:


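# Split each filename, stack the pieces row-wise into a matrix, and bind it to df
result <- cbind(df, do.call("rbind", strsplit(df$filename, " \\(|\\) ")))
# Name the three new columns (assumes filename is the only existing column in df)
colnames(result)[2:4] <- c("author", "year", "title")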
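
The question's attempt, df$temp[[]][1], was reaching for "the first piece of every element". A minimal base R sketch of that idea, assuming every filename splits into exactly three pieces, uses vapply with the subsetting function `[`. Converting year to integer is an extra step not requested in the question:

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$filename, " \\(|\\) ")

# `[` is an ordinary function; vapply calls it on each list element
# with the given index and checks that one string comes back each time
df$author <- vapply(parts, `[`, character(1), 1)
df$year   <- as.integer(vapply(parts, `[`, character(1), 2))
df$title  <- vapply(parts, `[`, character(1), 3)
df
```

Assigning column by column keeps df a plain data frame and avoids storing the intermediate list in df$temp.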



Answer 4:


Base R has a t (transpose) method for data frames; note that it returns a character matrix:

 res <- t( data.frame(  strsplit(df$filename, " \\(|\\) ") ))
 colnames(res) <- c("author", "year", "title")
 rownames(res) <- seq_along(rownames(res) )
 res
#--------------
  author              year   title           
1 "Author1"           "2010" "Title of paper"
2 "Author2 et al"     "2009" "Title of paper"
3 "Author3 & Author4" "2004" "Title of paper"
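
Since the result above is a matrix, it still has to be turned back into columns of the original data frame. One possible way to do that, sketched under the assumption that df holds only the filename column (the names parts and out are made up for this example):

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

res <- t(data.frame(strsplit(df$filename, " \\(|\\) ")))
colnames(res) <- c("author", "year", "title")

# Convert the character matrix to a data frame and drop the mangled row names
parts <- as.data.frame(res, stringsAsFactors = FALSE)
rownames(parts) <- NULL

# Attach the new columns next to the original filename column
out <- cbind(df, parts)
out
```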


Source: https://stackoverflow.com/questions/31357963/assigning-results-of-strsplit-to-multiple-columns-of-data-frame
