How to tokenize a String like a lexer in Java?

Submitted by 心不动则不痛 on 2019-12-24 06:31:14

Question


These are the lines of code that I want to tokenize according to lexer rules:

String input1 = input.replaceAll("\\s+"," ");
List<String> uncleanList = Arrays.asList(input1.split(" "));

I put this code into a String and replaced every run of whitespace with a single space:

String s = codeString.replaceAll("\\s+"," ");

Then I called the split method on that string, which gave me an array by splitting on the single spaces:

  String[] t = s.split(" ");

I got this array result (this is the console output from System.out.println(Arrays.toString(t));):

[String, input1, =, input.replaceAll("\\s+",", ");, List<String>, uncleanList, =, Arrays.asList(input1.split(", "));]

But there are a lot of parentheses ( ), angle brackets < >, dots ., quotes "" etc. that are not separated by spaces, so they stay attached to the words. This is where I am stuck: how do I separate the symbols from the letters and digits so that each lands at its own index?
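For reference, a self-contained sketch of the attempt described above (the class name and the sample line are illustrative, not from the original question):

import java.util.Arrays;

public class SplitAttempt {
    public static void main(String[] args) {
        String codeString = "String  input1 =  input.replaceAll(\"\\\\s+\",\" \");";
        // Collapse every run of whitespace into a single space
        String s = codeString.replaceAll("\\s+", " ");
        // Split on single spaces; punctuation stays attached to the words
        String[] t = s.split(" ");
        System.out.println(Arrays.toString(t));
        // Prints: [String, input1, =, input.replaceAll("\\s+",", ");]
    }
}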

Desired array output when I print the array to the console:

 [String, input1, =, input,.,replaceAll,(,"\\s+"," ",),;, List,<,String,>, uncleanList, =, Arrays,.,asList,(,input1,.,split,(," ",),),;,]   

Answer 1:


When you don't have a delimiter to use, split stops being an effective way to do tokenization. Instead of using split to find the parts you don't want, use find to find the parts you do want, like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\\w+|[+-]?[0-9\\._Ee]+|\\S");
Matcher matcher = pattern.matcher(input);

// Find all matches and pull each token out of the matcher
while (matcher.find()) {
  String token = matcher.group();
  // ... use the token, e.g. add it to a List<String>
}

The example regex I provide here is simpler than what you really want. The important thing is that you provide the default pattern (\S) to match any non-whitespace character that isn't included in a longer match. That will take care of all the single-character tokens.
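To make this concrete, here is a runnable sketch that applies the pattern above to one of the lines from the question (the class name and the hard-coded input are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindTokens {
    public static void main(String[] args) {
        String input = "List<String> uncleanList = Arrays.asList(input1.split(\" \"));";
        Pattern pattern = Pattern.compile("\\w+|[+-]?[0-9\\._Ee]+|\\S");
        Matcher matcher = pattern.matcher(input);
        // Collect every match; anything the longer alternatives miss
        // falls through to \S and becomes a one-character token
        List<String> tokens = new ArrayList<>();
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        System.out.println(tokens);
        // Prints: [List, <, String, >, uncleanList, =, Arrays, ., asList, (, input1, ., split, (, ", ", ), ), ;]
    }
}

Note that each quote character comes out as its own token here; turning a whole string literal such as " " into a single token needs a dedicated alternative in the pattern, which leads to the next point.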

Some of the longer tokens you have to match, like strings and comments, are pretty complicated, so it will take some work to get this right.
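As a starting point only, one way to extend the pattern is to put a string-literal alternative first, so that whole quoted strings win over the single-character fallback. The string rule below (with backslash escapes) is an assumption about the input language, and comments would need similar alternatives:

Pattern pattern = Pattern.compile(
    "\"(?:[^\"\\\\]|\\\\.)*\""    // assumed rule: a double-quoted string literal with \-escapes
    + "|\\w+"                     // identifiers, keywords, plain digits
    + "|[+-]?[0-9\\._Ee]+"        // the crude number rule from the answer above
    + "|\\S");                    // fallback: any single non-whitespace character

With this ordering, a line like input.replaceAll("\\s+"," "); tokenizes with "\\s+" and " " as single tokens, which is closer to the desired output in the question.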



Source: https://stackoverflow.com/questions/58920961/how-to-tokenize-an-string-like-in-lexer-in-java
