conditional string splitting in R (using tidyr)

浪尽此生 提交于 2019-12-05 23:43:49

Assuming you may or may not have a separator and that cost and reed aren't necessarily mutually exclusive, why not search for the specific string instead of the separator?

Example:

library(stringr)
X <- data.frame(value = c(1,2,3,4), 
                variable = c("cost", "cost", "reed_cost", "reed_cost"))
X$cost <- str_detect(X$variable,"cost")
X$reed <- str_detect(X$variable,"reed") 

You could try:

X$variable <- ifelse(!grepl("_", X$variable), paste0("_", X$variable), as.character(X$variable))

 separate(X, variable, c("Policy-cost", "Reed"), "_")
 # value Policy-cost Reed
 #1     1             cost
 #2     2             cost
 #3     3        reed cost
 #4     4        reed cost

Or

X$variable <-  gsub("\\b(?=[A-Za-z]+\\b)", "_", X$variable, perl=T)
 X$variable
#[1] "_cost"     "_cost"     "reed_cost" "reed_cost"

 separate(X, variable, c("Policy-cost", "Reed"), "_")

Explanation

\\b(?=[A-Za-z]+\\b) : matches a word boundary \\b and looks ahead for characters followed by word boundary. The third and fourth elements does not match, so it was not replaced.

Another approach with base R:

cbind(X["value"], 
      setNames(as.data.frame(t(sapply(strsplit(as.character(X$variable), "_"), 
                                      function(x) 
                                        if (length(x) == 1) c("", x) 
                                        else x))), 
               c("Policy-cost", "Reed")))

#   value Policy-cost Reed
# 1     1             cost
# 2     2             cost
# 3     3        reed cost
# 4     4        reed cost
标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!