Tokenizing an infix string in Java

Posted by 醉酒当歌 on 2019-11-30 17:20:23

Question


I'm implementing the Shunting Yard Algorithm in Java as a side project for my AP Computer Science class. I've already implemented a simple version in JavaScript that handles only basic arithmetic expressions (addition, subtraction, multiplication, division, and exponentiation). To split an expression into an array of tokens, I found each operator (+-*/^), number, and parenthesis, put a space around it, and then split the string on whitespace. For example, the infix string 4+(3+2) becomes 4 + ( 3 + 2 ), which is then split on whitespace.
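The pad-and-split approach described above can be sketched in Java roughly as follows. This is an illustrative translation, not the asker's actual JavaScript; the class name PadAndSplit and the method tokenize are hypothetical, and only the operators listed in the question are handled.

```java
import java.util.Arrays;

// Hypothetical sketch of "put spaces around operators, then split on whitespace".
class PadAndSplit {
    static String[] tokenize(String expr) {
        // Surround every operator and parenthesis with spaces.
        // Inside a character class, + * / ^ ( ) are literal; '-' is last so it is literal too.
        String padded = expr.replaceAll("([+*/^()-])", " $1 ");
        // trim() removes leading/trailing spaces so split() yields no leading empty token;
        // \\s+ treats each run of spaces as a single separator.
        return padded.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("4+(3+2)"))); // [4, +, (, 3, +, 2, )]
    }
}
```

Every new operator has to be added to the character class by hand, which is part of why this approach becomes awkward as the grammar grows.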

However, I feel that this method is slow, and it becomes increasingly hard and inefficient to extend as I add mathematical functions such as sine, cosine, tangent, absolute value, and others.

What would be the best way to split a string like sin(4+3)-8 into the array ["sin", "(", "4", "+", "3", ")", "-", "8"]?

I could use a regex for this, but I don't understand regular expressions well yet and I'm trying to learn them. If a regex is the best solution, could the answerer please explain what it does?
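For comparison, the same split can also be done without any regex by scanning the string once, character by character. This is a different technique from the regex in the answer: a minimal sketch, assuming that a function name is a run of letters, a number is a run of digits and periods, and every other non-space character is a one-character token. The class name InfixScanner is hypothetical, and unary minus is not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner: one left-to-right pass over the input.
class InfixScanner {
    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // spaces separate tokens but are not kept
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < expr.length() && Character.isLetter(expr.charAt(i))) i++;
                tokens.add(expr.substring(start, i));  // function name or variable, e.g. "sin"
            } else if (Character.isDigit(c) || c == '.') {
                int start = i;
                while (i < expr.length()
                        && (Character.isDigit(expr.charAt(i)) || expr.charAt(i) == '.')) i++;
                tokens.add(expr.substring(start, i));  // number, e.g. "4" or "2.5"
            } else {
                tokens.add(String.valueOf(c));         // operator or parenthesis
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("sin(4+3)-8"));   // [sin, (, 4, +, 3, ), -, 8]
    }
}
```

Adding a new function such as abs or tan needs no change here, because any run of letters is already grouped into one token.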


Answer 1:


Try calling .split() with the regex

(?<=[^\.a-zA-Z\d])|(?=[^\.a-zA-Z\d])

It splits the string at every position that is immediately preceded or followed by a character that is not a letter, a digit, or a period.

  • (?<=[^\.a-zA-Z\d]) is a positive lookbehind. It matches a position between two characters (it consumes no characters) whenever the text immediately before that position matches the sub-regex inside (?<=...).
    • [^\.a-zA-Z\d] is a negated character class. It matches any single character that is not listed inside [^...].
      • \. matches a literal period (.).
      • a-z matches any lowercase letter from a to z.
      • A-Z is the same, but for uppercase letters.
      • \d is equivalent to [0-9], so it matches any digit.
  • | means "or". The regex matches wherever either the part before it or the part after it matches.
  • (?=[^\.a-zA-Z\d]) is the same as the first half, except that it is a positive lookahead. It matches a position whenever the text immediately after that position matches the sub-regex inside (?=...).
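To see how the two halves combine, the sketch below applies the lookbehind alone, the lookahead alone, and then the full alternation to the same input. The class LookaroundDemo and its method names are hypothetical; the regex pieces are exactly those from the answer.

```java
import java.util.Arrays;

// Hypothetical demo: each half of the answer's regex on its own, then combined.
class LookaroundDemo {
    // One character that is not a period, an ASCII letter, or a digit.
    static final String NOT_OPERAND = "[^\\.a-zA-Z\\d]";

    // Lookbehind only: split right AFTER each operator or parenthesis.
    static String[] splitAfter(String s) {
        return s.split("(?<=" + NOT_OPERAND + ")");
    }

    // Lookahead only: split right BEFORE each operator or parenthesis.
    static String[] splitBefore(String s) {
        return s.split("(?=" + NOT_OPERAND + ")");
    }

    // Both halves joined with |: split before AND after, isolating each operator.
    static String[] splitBoth(String s) {
        return s.split("(?<=" + NOT_OPERAND + ")|(?=" + NOT_OPERAND + ")");
    }

    public static void main(String[] args) {
        String s = "sin(4+3)-8";
        System.out.println(Arrays.toString(splitAfter(s)));  // [sin(, 4+, 3), -, 8]
        System.out.println(Arrays.toString(splitBefore(s))); // [sin, (4, +3, ), -8]
        System.out.println(Arrays.toString(splitBoth(s)));   // [sin, (, 4, +, 3, ), -, 8]
    }
}
```

The lookbehind alone leaves each operator attached to the end of the preceding piece, and the lookahead alone attaches it to the start of the next piece; only the alternation cuts on both sides.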

In Java you can use this regex as follows (the backslashes must be doubled inside a Java string literal):

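String str = "sin(4+3)-8";
// Split at every position immediately before or after a character that is not a period, letter, or digit.
String[] parts = str.split("(?<=[^\\.a-zA-Z\\d])|(?=[^\\.a-zA-Z\\d])");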

Result:

["sin", "(", "4", "+", "3", ")", "-", "8"]
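A self-contained variant of the snippet above, as a sketch under one added assumption: the input may contain spaces. A space is also "not a letter, digit, or period", so the regex turns each space into its own token; this version simply drops blank tokens after splitting. The class name RegexTokenizer is hypothetical, and the pattern is precompiled once because String.split compiles a multi-character regex on every call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical wrapper around the answer's regex that also tolerates spaces.
class RegexTokenizer {
    // Same regex as the answer, compiled once and reused.
    private static final Pattern SPLIT =
            Pattern.compile("(?<=[^\\.a-zA-Z\\d])|(?=[^\\.a-zA-Z\\d])");

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        for (String part : SPLIT.split(expr)) {
            // Spaces come out as separate " " tokens; skip them.
            if (!part.trim().isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("sin(4+3)-8"));    // [sin, (, 4, +, 3, ), -, 8]
        System.out.println(tokenize("4 + ( 3 + 2 )")); // [4, +, (, 3, +, 2, )]
    }
}
```

Because the period is treated like a digit, a decimal such as 3.14 stays a single token.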


Source: https://stackoverflow.com/questions/21408570/tokenizing-an-infix-string-in-java
