splitting string expression at multiple delimiters in R

Submitted by 匆匆过客 on 2019-12-02 20:26:21

Question


I am trying to parse some math expressions in R, so I would like to split them at multiple delimiters (+, -, *, /, -(, +(, ), )+, etc.) to get the list of symbolic variables contained in each expression.

For example, I would like 2*(x1+x2-3*x3) to return "x1", "x2", "x3".

Is there a good way of doing it? Thanks.


Answer 1:


There's probably a cleaner way of doing this, but does this cover your use case(s)? Note that the pattern is tailored to variable names made of x followed by a single digit.

eqn = "3 + 2*(x1+x2-3*x3 - x1/x3) - 5"

vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = vars[nchar(vars)>0]  # To remove empty strings

vars
[1] "x1" "x2" "x3" "x1" "x3"

If you only want each unique value to show up once, you can do:

vars = unlist(strsplit(eqn, split="[-+*/)( ]|[^x][0-9]+|^[0-9]+"))
vars = unique(vars[nchar(vars)>0])

vars
[1] "x1" "x2" "x3"



Answer 2:


Rather than using regular expressions, you could use the R parser to find particular symbols in your expression. If I recycle the find_vars() function from this answer, you could do:

extract_vars <- function(x) {
    find_vars(parse(text=x)[[1]])$found
}
expr <- "2*(x1+x2-3*x3)"
extract_vars(expr)
# [1] "x1" "x2" "x3"

Of course, this method assumes that every math expression your users enter is also syntactically valid R code.
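
The find_vars() helper is defined in the linked answer and is not reproduced here. As a rough alternative sketch of the same parser-based idea, base R's all.vars() returns the names that occur in a parsed expression, leaving out function names such as * and ( by default. The wrapper name extract_vars_base is made up for this illustration.

```r
# Hypothetical wrapper name; parse() and all.vars() are base R functions.
extract_vars_base <- function(x) {
  # parse() turns the string into an R expression;
  # all.vars() collects the unique symbol names used in it
  all.vars(parse(text = x))
}

extract_vars_base("2*(x1+x2-3*x3)")
# [1] "x1" "x2" "x3"
extract_vars_base("3 + 2*(x1+x2-3*x3 - x1/x3) - 5")
# [1] "x1" "x2" "x3"
```

The same caveat applies: input that is not valid R, such as implicit multiplication like 2x1, makes parse() throw an error, and built-in constants such as pi are reported as variables too.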




Answer 3:


More generally, you can use this regex: "([A-Za-z]\d)"

library(stringr)
f <- "2*(x1+x2-3*x3)"
# [A-Za-z] rather than [A-z]: the latter also matches [ \ ] ^ _ and the backtick
pattern <- "([A-Za-z]\\d)"
str_extract_all(f, pattern)
[[1]]
[1] "x1" "x2" "x3"



Answer 4:


More generally, use this pattern (as it's symbolic math, you may have other variables): "([A-Za-z]\d)"

library(stringr)
# A slightly different example
var <- "2x1*(x1+x2-3*x3)*y1"
# [A-Za-z] rather than [A-z]: the latter also matches [ \ ] ^ _ and the backtick
pattern <- "([A-Za-z]\\d)"
str_extract_all(var,pattern)  
[[1]]
[1] "x1" "x1" "x2" "x3" "y1"
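
The pattern above matches exactly one letter followed by one digit, so a name like x12 or alpha would be cut short or missed. A minimal base R sketch of the same extraction idea, assuming variable names are a letter followed by any run of letters, digits, or underscores (the function name extract_names and that naming rule are assumptions for illustration, not part of the original answers):

```r
# Hypothetical helper; regmatches() and gregexpr() are base R functions.
extract_names <- function(s) {
  # A letter, then zero or more letters, digits, or underscores
  pattern <- "[A-Za-z][A-Za-z0-9_]*"
  # gregexpr() finds every match; regmatches() pulls out the matched text
  unique(regmatches(s, gregexpr(pattern, s))[[1]])
}

extract_names("2x1*(x1+x2-3*x3)*y1")
# [1] "x1" "x2" "x3" "y1"
extract_names("alpha*x12 + beta_2")
# [1] "alpha" "x12"   "beta_2"
```

This is still purely textual: function names such as sin are returned as if they were variables, and a number in scientific notation like 2e3 would yield a spurious "e3".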


Source: https://stackoverflow.com/questions/27892140/splitting-string-expression-at-multiple-delimiters-in-r
