Which is the right regular expression to use for Numbers and Strings?

Submitted by 蓝咒 on 2019-12-10 13:17:32

Question


I am trying to create a simple IDE and color my JTextPane based on:

  • Strings (" ")
  • Comments (// and /* */)
  • Keywords (public, int ...)
  • Numbers (integers like 69 and floats like 1.5)

I color my source code by overriding the insertString and removeString methods inside the StyledDocument.

After much testing, I have completed comments and keywords.

Q1: For strings, I apply this regular expression:

Pattern strings = Pattern.compile("\"[^\"]*\"");
Matcher matcherS = strings.matcher(text);

while (matcherS.find()) {
    setCharacterAttributes(matcherS.start(), matcherS.end() - matcherS.start(), red, false);
}

This works 99% of the time, except when a string contains an escaped quote (\"). That breaks all of my color coding. Can anyone correct my regular expression to fix this?

Q2: For integers and decimals, numbers are detected with this regular expression:

Pattern numbers = Pattern.compile("\\d+");
Matcher matcherN = numbers.matcher(text);
while (matcherN.find()) {
    setCharacterAttributes(matcherN.start(), matcherN.end() - matcherN.start(), magenta, false);
}

With the regular expression "\d+", I only handle integers, not floats. Also, digits that are part of another string are matched, which is not what I want inside an IDE. Which expression should I use to color numbers?
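For reference, the question's two loops can be run end to end against a plain DefaultStyledDocument. This is a minimal sketch, assuming a standalone helper instead of the overridden insertString the question describes; the class RegexHighlightPass and its methods are hypothetical names, and red/magenta are built as SimpleAttributeSet foreground attributes. It keeps the question's order (strings, then numbers), so it also reproduces the reported problem: the digits inside "room 101" end up magenta.

```java
import java.awt.Color;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

class RegexHighlightPass {
    // The question's two patterns, unchanged.
    static final Pattern STRINGS = Pattern.compile("\"[^\"]*\"");
    static final Pattern NUMBERS = Pattern.compile("\\d+");

    // Sets the foreground color of every match, like the question's while loops.
    static void paint(DefaultStyledDocument doc, String text, Pattern p, Color color) {
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        StyleConstants.setForeground(attrs, color);
        Matcher m = p.matcher(text);
        while (m.find()) {
            doc.setCharacterAttributes(m.start(), m.end() - m.start(), attrs, false);
        }
    }

    // Builds a document from source and colors it: strings first, numbers second.
    static DefaultStyledDocument highlighted(String source) {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        try {
            doc.insertString(0, source, null);
            String text = doc.getText(0, doc.getLength());
            paint(doc, text, STRINGS, Color.RED);
            paint(doc, text, NUMBERS, Color.MAGENTA);
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
        return doc;
    }

    // Foreground color currently applied to the character at offset.
    static Color colorAt(DefaultStyledDocument doc, int offset) {
        return StyleConstants.getForeground(doc.getCharacterElement(offset).getAttributes());
    }

    public static void main(String[] args) {
        DefaultStyledDocument doc = highlighted("x = 42 + \"room 101\";");
        System.out.println("'4' of 42:         " + colorAt(doc, 4));
        System.out.println("'r' of room:       " + colorAt(doc, 10));
        System.out.println("'1' inside string: " + colorAt(doc, 15));
    }
}
```

Because the number pass runs last and uses replace=false, its foreground attribute overwrites the red already applied inside the string literal.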


Thank you for any help in advance!


Answer 1:


For strings, this is probably the fastest regex:

"\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\""

Formatted:

 " [^"\\]* 
 (?: \\ . [^"\\]* )*
 "
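To see Answer 1's string pattern cope with an escaped quote, here is a small sketch that collects every match from a made-up source line. The class and method names (UnrolledStringRegex, findStrings) are hypothetical. The point is that a backslash is always consumed together with the character after it, so \" never closes the literal and an escaped backslash (\\) right before the closing quote does not hide it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledStringRegex {
    // Answer 1's pattern: a quote, then runs of characters that are neither a quote
    // nor a backslash, where each backslash consumes exactly one following character.
    static final Pattern STRING =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Returns every string literal found in the source text, quotes included.
    static List<String> findStrings(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The Java text being scanned is: String s = "say \"hi\""; String t = "C:\\";
        String source = "String s = \"say \\\"hi\\\"\"; String t = \"C:\\\\\";";
        // Prints ["say \"hi\"", "C:\\"]
        System.out.println(findStrings(source));
    }
}
```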

For integers and decimal numbers, the only foolproof expression I know of is this:

"(?:\\d+(?:\\.\\d*)?|\\.\\d+)"

Formatted:

 (?:
      \d+ 
      (?: \. \d* )?
   |  \. \d+ 
 )
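A quick sketch of Answer 1's number pattern on a made-up line, showing the forms it accepts ("69", "1.5", ".5" and "1."). The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DecimalNumberRegex {
    // Answer 1's pattern: digits with an optional fractional part ("69", "1.", "1.5"),
    // or a fractional part with no leading digits (".5").
    static final Pattern NUMBER = Pattern.compile("(?:\\d+(?:\\.\\d*)?|\\.\\d+)");

    // Returns every number-like token in the source text.
    static List<String> findNumbers(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [69, 1.5, .5, 1.]
        System.out.println(findNumbers("int a = 69; double b = 1.5, c = .5, d = 1.;"));
    }
}
```

Because the pattern has no boundary checks, it also matches the digit in an identifier such as var1.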

As a side note, if you run each pattern independently from the start of
the text, the highlights may overlap.
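One way to avoid those overlaps (a swapped-in technique, not something Answer 1 spells out) is to combine both patterns into a single alternation and scan the text once. A sketch, assuming named groups str and num and a hypothetical SinglePassTokens class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassTokens {
    // Answer 1's string and number patterns joined into one alternation.
    static final Pattern TOKEN = Pattern.compile(
            "(?<str>\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\")"
            + "|(?<num>\\d+(?:\\.\\d*)?|\\.\\d+)");

    // find() scans left to right and resumes after each match, so the digits inside
    // a string literal are consumed by the string branch and never reported as numbers.
    static List<String> tokens(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String kind = (m.group("str") != null) ? "str" : "num";
            out.add(kind + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [num:42, str:"room 101"]
        System.out.println(tokens("x = 42 + \"room 101\";"));
    }
}
```

In an editor, each match's start(), end() and kind would drive one setCharacterAttributes call, instead of one independent loop per pattern.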




Answer 2:


Try:

  1. \\b\\d+(\\.\\d+)?\\b for int, float and double;
  2. "(?<=[{(,=\\s+]+)".+?"(?=[,;)+ }]+)" for strings.
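A quick check of Answer 2's number pattern (item 1) on a made-up line; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryNumbers {
    // Answer 2's number pattern: \b on both sides rejects digits glued to letters,
    // such as the 1 in "var1".
    static final Pattern NUMBER = Pattern.compile("\\b\\d+(\\.\\d+)?\\b");

    static List<String> findNumbers(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [69, 1, 2.25]
        System.out.println(findNumbers("int var1 = 69; float f = 1.5f; double d = 2.25;"));
    }
}
```

The \b anchors skip the 1 in var1, but they also mean a suffixed literal such as 1.5f only yields 1: there is no word boundary between 5 and f, so the engine backtracks to the integer part.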



Answer 3:


For integers, use:

(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))
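A sketch of Answer 3's pattern on a made-up line, to show what the optional sign and the lookarounds do; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SignedNumberRegex {
    // Answer 3's pattern: an optional sign, digits and an optional fraction.
    // The lookbehind rejects a start right after '^', a digit or '.';
    // the lookahead rejects an end right before 'x', a digit or '.'.
    static final Pattern NUMBER = Pattern.compile(
            "(?<!(\\^|\\d|\\.))[+-]?(\\d+(\\.\\d+)?)(?!(x|\\d|\\.))");

    static List<String> findNumbers(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [-3.5, 42, 7, 2]; the binary minus in "7 - 2" is not followed
        // directly by a digit, so it is not taken as a sign.
        System.out.println(findNumbers("a = -3.5; b = 42; c = 7 - 2;"));
    }
}
```

Nothing in the lookbehind excludes letters, so the 1 in an identifier such as var1 is still matched.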



Answer 4:


  1. Match a string while ignoring escaped quotes (\")

    ".*?(?<!\\)"

The pattern above starts a match at a " and lazily matches anything up to the next " that is not preceded by a \. This relies on the lookbehind feature, explained very well at http://www.regular-expressions.info/lookaround.html

  2. Match all numbers, with and without decimal points

(\d+)(\.\d+)? matches one or more digits, optionally followed by a decimal point and one or more further digits.

  3. Avoiding matches on numbers inside strings can be handled in two ways:

    • (a) Modify the above so that a number must have a non-word character (\W) on either side: \W(\d+)(\.\d+)?\W. I don't think this will be satisfactory in mathematical expressions (e.g. 10+10) or at the end of a statement (e.g. 10;).

    • (b) Make it a matter of precedence. If strings are coloured after numbers, a number inside a string is first coloured magenta and then immediately overwritten with red, so string colouring takes precedence.
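Option (b), together with the string and number patterns from items 1 and 2, can be sketched without Swing by painting a character mask instead of calling setCharacterAttributes. In this sketch, assuming a hypothetical PrecedenceMask class, 'M' stands in for magenta, 'R' for red and '.' for the default colour; the sample line is made up.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrecedenceMask {
    // The two patterns from this answer, written as Java string literals.
    static final Pattern NUMBER = Pattern.compile("(\\d+)(\\.\\d+)?");
    static final Pattern STRING = Pattern.compile("\".*?(?<!\\\\)\"");

    // Marks every character covered by a match of p with the given color code.
    static void paint(char[] mask, String text, Pattern p, char color) {
        Matcher m = p.matcher(text);
        while (m.find()) {
            Arrays.fill(mask, m.start(), m.end(), color);
        }
    }

    // Numbers are painted first and strings second, so strings win any overlap.
    static String colorMask(String text) {
        char[] mask = new char[text.length()];
        Arrays.fill(mask, '.');
        paint(mask, text, NUMBER, 'M');
        paint(mask, text, STRING, 'R');
        return new String(mask);
    }

    public static void main(String[] args) {
        // The text being scanned is: s = "say \"hi\" 2"; n = 3.5;
        String source = "s = \"say \\\"hi\\\" 2\"; n = 3.5;";
        System.out.println(source);
        System.out.println(colorMask(source));
    }
}
```

The 2 inside the literal ends up 'R' while 3.5 stays 'M'. Note that the lookbehind only inspects one character, so a literal ending in an escaped backslash, such as "C:\\", is not closed where it should be.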




Answer 5:


R1: I believe there is no regex-based answer to unescaped " characters in the middle of an ongoing string. You would need to actively process the text to eliminate or work around the false positives for characters that are not meant to be matched, based on your specific syntax rules (which you didn't specify).

However, if you simply mean to ignore escaped quotes (\"), as Java does, then I believe you can include the escape+quote pair as a group inside the repetition, and the greedy * will take care of the rest: \"((\\\\\")|[^\"])*\"
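A small sketch of the R1 pattern as a Java literal, run on a made-up line with an escaped quote; the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteAlternation {
    // The regex itself is "((\\")|[^"])*": each repetition consumes either an
    // escaped quote (\") or any single non-quote character.
    static final Pattern STRING = Pattern.compile("\"((\\\\\")|[^\"])*\"");

    static List<String> findStrings(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The text being scanned is: String s = "say \"hi\""; String t = "ok";
        // Prints ["say \"hi\"", "ok"]
        System.out.println(findStrings("String s = \"say \\\"hi\\\"\"; String t = \"ok\";"));
    }
}
```

A bare " cannot be consumed by either branch, so the repetition stops at the first unescaped quote and the two literals are matched separately.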

R2: I believe the following regex would work for finding both integers and fractions: \\d+(\\.\\d+)?

You can expand it to find other kinds of numerals too. For example, \\d+([\\./]\\d+)? would additionally match numerals like "1/4".
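A quick check of the extended R2 pattern as a Java literal (a bare \. is not a legal escape inside a Java string, so the backslash is doubled); the class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FractionNumerals {
    // Digits, optionally followed by '.' or '/' and more digits.
    static final Pattern NUMERAL = Pattern.compile("\\d+([\\./]\\d+)?");

    static List<String> findNumerals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMERAL.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [1/4, 0.75, 3]
        System.out.println(findNumerals("1/4 + 0.75 + 3"));
    }
}
```

In source code this also colours an integer division such as 6/3 as a single numeral.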



Source: https://stackoverflow.com/questions/31299328/which-is-the-right-regular-expression-to-use-for-numbers-and-strings
