Splitting a string expression at multiple delimiters in R

I am trying to parse some math expressions in R, so I would like to split them at multiple delimiters (+, -, *, /, -(, +(, ), )+ etc.) to get the list of symbolic variables contained in the expression.

For example, I would like 2*(x1+x2-3*x3) to return "x1", "x2", "x3".

Is there a good way of doing it? Thanks.

There's probably a cleaner way of doing this, but does this cover your use case(s)?

eqn = "3 + 2*(x1+x2-3*x3 - x1/x3) - 5"

vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = vars[nchar(vars)>0]  # To remove empty strings

vars
[1] "x1" "x2" "x3" "x1" "x3"

If you only want each unique value to show up once, you can do:

vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = unique(vars[nchar(vars)>0])

vars
[1] "x1" "x2" "x3"

MrFlick

Rather than using regular expressions, you could use the R parser to find the symbols in your expression. If I recycle the find_vars() function from this answer, you could do:

extract_vars <- function(x) {
    find_vars(parse(text=x)[[1]])$found
}
expr <- "2*(x1+x2-3*x3)"
extract_vars(expr)
# [1] "x1" "x2" "x3"

Of course, this method assumes that all the math expressions your users enter are also syntactically valid R code.
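
If the find_vars() helper from the linked answer is not at hand, base R's all.vars() does a similar job on a parsed expression. This is only a minimal sketch, not the function from that answer: the name extract_vars_base is made up for illustration, and returning character(0) when the input does not parse is an assumption about what you would want.

```r
# Hypothetical helper (not from the linked answer): base R only.
extract_vars_base <- function(x) {
    tryCatch(
        # all.vars() returns the unique variable names in a parsed expression
        all.vars(parse(text = x)),
        # Assumption: treat input that is not valid R syntax as "no variables"
        error = function(e) character(0)
    )
}

extract_vars_base("2*(x1+x2-3*x3)")
# [1] "x1" "x2" "x3"

# Implicit multiplication such as 2x1 is not valid R, so parse() fails
extract_vars_base("2x1*(x1+x2)")
# character(0)
```

Depending on the use case, it may be better to let the parse error propagate instead of swallowing it, so that the user is told the expression is malformed.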

DatamineR

More generally, you can use this regex: "([A-Za-z]\d+)"

library(stringr)
f <- "2*(x1+x2-3*x3)"
pattern <- "([A-Za-z]\\d+)"  # a letter followed by one or more digits
str_extract_all(f, pattern)
[[1]]
[1] "x1" "x2" "x3"

Since this is symbolic math, you may have variables other than x. A pattern of this form handles them too: "([A-Za-z]\d+)"

library(stringr)
# A slightly different example, with a second variable name (y1)
var <- "2x1*(x1+x2-3*x3)*y1"
pattern <- "([A-Za-z]\\d+)"  # a letter followed by one or more digits
str_extract_all(var, pattern)
[[1]]
[1] "x1" "x1" "x2" "x3" "y1"
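
If you would rather not depend on stringr, the same kind of extraction can probably be done with base R's gregexpr() and regmatches(). This is a sketch under the same assumption as above, namely that every variable is a single letter followed by digits.

```r
f <- "2x1*(x1+x2-3*x3)*y1"

# gregexpr() locates every match; regmatches() pulls the matched text out
m <- regmatches(f, gregexpr("[A-Za-z][0-9]+", f))[[1]]
m
# [1] "x1" "x1" "x2" "x3" "y1"

# Keep each variable once, in order of first appearance
unique(m)
# [1] "x1" "x2" "x3" "y1"
```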

Source: https://stackoverflow.com/questions/27892140/splitting-string-expression-at-multiple-delimiters-in-r
