Question
I have the following character string:
l <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, 2, 0.1), range3 = seq(-2, 2, 0.1)"
Using regular expressions in R, I would like to split l into the following structure:
[1] "mod" "range1 = seq(-m, n, 0.1)"
[3] "range2 = seq(-2, 2, 0.1)" "range3 = seq(-2, 2, 0.1)"
Unfortunately, I haven't found a proper way to solve this yet. Does anyone have an idea how to achieve such an elegant split?
Answer 1:
I really doubt you can do this with a regular expression. You are trying to parse your string, so you need a parser, which is generally more powerful than a regex. It may not be general enough for every input, but you can take advantage of the R parser and the alist function. Try:
res<-eval(parse(text=paste0("alist(",l,")")))
paste0(names(res),ifelse(names(res)!="","=",""),as.character(res))
#[1] "mod" "range1=seq(-m, n, 0.1)" "range2=seq(-2, 2, 0.1)"
#[4] "range3=seq(-2, 2, 0.1)"
Keep in mind that the proposed regex solutions fail if there are nested parentheses. Try them and mine with:
l<-"mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, exp(2), 0.1), range3 = seq(-2, 2, 0.1)"
to understand what I mean.
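The parser-based idea above can be wrapped in a small helper, sketched here under a few assumptions: the name split_args is hypothetical (it is not from the original answer), each piece is rebuilt with deparse, and the input is assumed to be trusted, since eval(parse(...)) will run whatever code the string contains.

```r
# Hypothetical helper (name is illustrative): split an argument-like string
# by letting the R parser do the work instead of a regex.
split_args <- function(s) {
  # alist() keeps its arguments unevaluated, so -m and n need not exist.
  res <- eval(parse(text = paste0("alist(", s, ")")))
  nms <- names(res)
  # names() is NULL when no argument is named at all.
  if (is.null(nms)) nms <- rep("", length(res))
  # Turn every unevaluated expression back into text.
  vals <- vapply(res, function(x) paste(deparse(x), collapse = " "),
                 character(1))
  unname(ifelse(nms != "", paste(nms, "=", vals), vals))
}

l_nested <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, exp(2), 0.1), range3 = seq(-2, 2, 0.1)"
split_args(l_nested)
```

Because the text is regenerated by deparse, spacing in the output follows R's own formatting rather than the original string, and an input that is not syntactically valid R raises a parse error.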
Answer 2:
Based on this regex, you can use str_extract_all from stringr:
library(stringr)
str_extract_all(l, '(?:[^,(]|\\([^)]*\\))+')
#[[1]]
#[1] "mod" " range1 = seq(-m, n, 0.1)" " range2 = seq(-2, 2, 0.1)" " range3 = seq(-2, 2, 0.1)"
or
trimws(unlist(str_extract_all(l, '(?:[^,(]|\\([^)]*\\))+')))
#[1] "mod" "range1 = seq(-m, n, 0.1)" "range2 = seq(-2, 2, 0.1)" "range3 = seq(-2, 2, 0.1)"
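If adding a stringr dependency is not desirable, the same extraction can probably be done in base R with gregexpr and regmatches. This is a sketch, not part of the original answer; the function name extract_args is hypothetical, and perl = TRUE is assumed because the pattern uses a non-capturing group.

```r
# Hypothetical base R equivalent of the str_extract_all() call above.
extract_args <- function(s) {
  # Match runs of characters that are neither a comma nor "(",
  # or a whole "(...)" group without nested parentheses.
  pattern <- "(?:[^,(]|\\([^)]*\\))+"
  pieces <- regmatches(s, gregexpr(pattern, s, perl = TRUE))[[1]]
  trimws(pieces)
}

l <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, 2, 0.1), range3 = seq(-2, 2, 0.1)"
extract_args(l)
```

The pattern treats the first ) it meets as the end of a group, so an argument such as seq(-2, exp(2), 0.1) is still split at its inner commas.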
Answer 3:
Here is a base R option based on the pattern shown in the OP's post. Here we match all the characters from ( to ), skip them, and split on , followed by a space.
strsplit(l, "\\([^)]+\\)(*SKIP)(*F)|, ", perl = TRUE)[[1]]
#[1] "mod" "range1 = seq(-m, n, 0.1)"
#[3] "range2 = seq(-2, 2, 0.1)" "range3 = seq(-2, 2, 0.1)"
Update
Using @nicola's 'l'
strsplit(l, ", (?=[[:alnum:]]+\\s+\\=)", perl = TRUE)[[1]]
#[1] "mod" "range1 = seq(-m, n, 0.1)"
#[3] "range2 = seq(-2, exp(2), 0.1)" "range3 = seq(-2, 2, 0.1)"
and the previous 'l'
strsplit(l, ", (?=[[:alnum:]]+\\s+\\=)", perl = TRUE)[[1]]
#[1] "mod" "range1 = seq(-m, n, 0.1)"
#[3] "range2 = seq(-2, 2, 0.1)" "range3 = seq(-2, 2, 0.1)"
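The lookahead version depends on every later argument having the form name =. As an alternative sketch, not taken from the original answers, the (*SKIP)(*F) idea can be combined with PCRE recursion so that balanced, nested parentheses are skipped as a unit. The function name split_top_level is hypothetical, and the pattern assumes parentheses are balanced and do not appear inside quoted strings.

```r
# Hypothetical helper: split on commas that are not inside parentheses.
split_top_level <- function(s) {
  # Group 1 matches a balanced "(...)" block; (?1) recurses into group 1
  # for nested blocks. (*SKIP)(*F) discards that match, so only commas
  # outside of any parentheses are left as split points.
  pattern <- "(\\((?:[^()]++|(?1))*\\))(*SKIP)(*F)|,\\s*"
  strsplit(s, pattern, perl = TRUE)[[1]]
}

l_nested <- "mod, range1 = seq(-m, n, 0.1), range2 = seq(-2, exp(2), 0.1), range3 = seq(-2, 2, 0.1)"
split_top_level(l_nested)
```

Unlike the lookahead pattern, this does not care whether arguments are named, but it offers no protection against unbalanced input.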
Source: https://stackoverflow.com/questions/37811450/spliting-the-character-into-parts