I am trying to parse some math expressions in R, so I would like to split them at multiple delimiters (+, -, *, /, -(, +(, ), )+ etc.) to get the list of symbolic variables contained in each expression.
For example, I would like 2*(x1+x2-3*x3) to return "x1", "x2", "x3".
Is there a good way of doing this? Thanks.
There's probably a cleaner way of doing this, but does this cover your use case(s)? Note that the pattern below assumes every variable name starts with x.
eqn = "3 + 2*(x1+x2-3*x3 - x1/x3) - 5"
vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = vars[nchar(vars)>0] # To remove empty strings
vars
[1] "x1" "x2" "x3" "x1" "x3"
If you only want each unique value to show up once, you can do:
vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = unique(vars[nchar(vars)>0])
vars
[1] "x1" "x2" "x3"
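If your variables don't all start with x, one possible variation is to split on the operators only and then drop the purely numeric tokens. This is just a sketch: split_vars() is a made-up helper name, and it assumes explicit multiplication (something like 2x1 would be kept as one token).

```r
# Hypothetical helper: split on operators, parentheses and spaces,
# then discard empty strings and plain numbers.
split_vars <- function(eqn) {
  tokens <- unlist(strsplit(eqn, "[-+*/() ]+"))
  tokens <- tokens[nzchar(tokens)]                      # drop empty strings
  is_number <- grepl("^[0-9]+(\\.[0-9]+)?$", tokens)    # integers and decimals
  unique(tokens[!is_number])
}

split_vars("3 + 2*(y1+x2-3*x3 - y1/x3) - 5")
# [1] "y1" "x2" "x3"
```

Leave out the unique() call if you want every occurrence rather than the distinct names.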
Rather than using regular expressions, you could use the R parser to find particular symbols in your expression. If I recycle the find_vars() function from this answer, you could do
extract_vars <- function(x) {
  find_vars(parse(text=x)[[1]])$found
}
expr <- "2*(x1+x2-3*x3)"
extract_vars(expr)
# [1] "x1" "x2" "x3"
Of course this method assumes that all the math expressions that your users enter would also be syntactically-valid R code.
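If you don't have find_vars() at hand, base R's all.vars() might be enough for the same idea. This is a sketch under that assumption, not the function from the linked answer; extract_vars_base() is a hypothetical name, and returning character(0) for input that is not valid R is just one possible choice.

```r
# Hypothetical helper built on base R only: parse the text as R code and
# collect the variable names; return character(0) if it does not parse.
extract_vars_base <- function(x) {
  parsed <- tryCatch(parse(text = x), error = function(e) NULL)
  if (is.null(parsed)) return(character(0))
  all.vars(parsed)   # unique names, function names such as * and + excluded
}

extract_vars_base("2*(x1+x2-3*x3)")
# [1] "x1" "x2" "x3"
extract_vars_base("2x1*(x1+x2)")   # implicit multiplication is not valid R
# character(0)
```

Since all.vars() treats any non-function symbol as a variable, built-in constants such as pi would be reported too.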
More generally, you can use this regex: "([A-Za-z]\d)"
library(stringr)
f <- "2*(x1+x2-3*x3)"
pattern <- "([A-Za-z]\\d)"  # [A-Za-z] rather than [A-z], which also matches some punctuation
str_extract_all(f, pattern)
[[1]]
[1] "x1" "x2" "x3"
More generally, use this pattern (as it's symbolic math, you may have other variables): "([A-Za-z]\d)"
library(stringr)
# A little different example
var <- "2x1*(x1+x2-3*x3)*y1"
pattern <- "([A-Za-z]\\d)"  # [A-Za-z] rather than [A-z], which also matches some punctuation
str_extract_all(var, pattern)
[[1]]
[1] "x1" "x1" "x2" "x3" "y1"
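The pattern above only picks up one letter followed by one digit. If your names can be longer (x10, alpha, beta_2), something along these lines in base R might work. It's a sketch: extract_names() is a made-up helper, and the assumed naming rule (a letter followed by letters, digits or underscores) will also pick up function names like sin.

```r
# Hypothetical helper: extract every identifier-like token with base R,
# assuming names start with a letter and continue with letters, digits or _.
extract_names <- function(x) {
  pattern <- "[A-Za-z][A-Za-z0-9_]*"
  regmatches(x, gregexpr(pattern, x))[[1]]
}

extract_names("2x1*(x1+x2-3*x3)*y1")
# [1] "x1" "x1" "x2" "x3" "y1"
extract_names("alpha*x10 + beta_2")
# [1] "alpha"  "x10"    "beta_2"
```

Wrap the result in unique() if you only want each name once.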
Source: https://stackoverflow.com/questions/27892140/splitting-string-expression-at-multiple-delimiters-in-r