Saving substrings using Regular Expressions

我们两清 submitted on 2019-12-20 03:36:13

Question


I'm new to regular expressions in Java (or any language, for that matter), and I'd like to use them to do a find. The tricky part I don't understand is how to replace something inside the string that matches.

For example, if the line I'm looking for is

Person item6 [can {item thing [wrap]}]

I'm able to write a regex that finds that line, but extracting the word "thing" (which may differ from line to line) is my problem. I may want to either replace that word with something else or save it in a variable for later. Is there an easy way to do this with Java's regex engine?


Answer 1:


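Yes. You wrap it in a "capturing group", which is just a pair of parentheses ( ) around the part of the regular expression that matches the interesting word. Groups are numbered from 1 in the order of their opening parentheses, and Matcher.group(n) returns the text that group n matched.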

Here is an example:

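import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WidgetTest {
    public static void main(String[] args) {
        // Group 1 captures the digits between "testing " and " widgets"
        Pattern pat = Pattern.compile("testing (\\d+) widgets");
        String text = "testing 5 widgets";
        Matcher matcher = pat.matcher(text);

        // matches() succeeds only if the entire text matches the pattern
        if (matcher.matches()) {
            System.out.println("Widgets tested : " + matcher.group(1));
        } else {
            System.out.println("No match");
        }
    }
}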

Pattern and Matcher come from java.util.regex. The String class has some shortcuts (matches, replaceFirst, replaceAll), but these two classes are the most flexible.
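Applied to the line from the question, a minimal sketch might look like this. It assumes the word of interest always sits between "{item " and " [" (inferred from the single example line), and the class and method names are hypothetical. find() is used instead of matches() because the word is only part of the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractWord {
    // Assumption: the word of interest sits between "{item " and " [".
    // Group 1 captures that word.
    static final Pattern ITEM_WORD = Pattern.compile("\\{item (\\w+) \\[");

    // Returns the captured word, or null if the line does not contain the pattern.
    static String extract(String line) {
        Matcher m = ITEM_WORD.matcher(line);
        // find() searches anywhere in the line; matches() would require
        // the entire line to match the pattern.
        return m.find() ? m.group(1) : null;
    }

    // Replaces the word, writing the literal delimiters back around it.
    static String replaceWord(String line, String newWord) {
        // quoteReplacement keeps '$' or '\' in newWord from being interpreted.
        return ITEM_WORD.matcher(line)
                .replaceFirst("{item " + Matcher.quoteReplacement(newWord) + " [");
    }

    public static void main(String[] args) {
        String line = "Person item6 [can {item thing [wrap]}]";
        String word = extract(line);                   // saved for later use
        System.out.println(word);                      // thing
        System.out.println(replaceWord(line, "other")); // Person item6 [can {item other [wrap]}]
    }
}
```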




Answer 2:


The problem specification isn't very clear, but here are some ideas that may work:

Use lookarounds and replaceAll/First

The following regex matches the \w+ that is preceded by the string "{item " and followed by the string " [". Lookarounds are used to match exactly the \w+ only. Metacharacters { and [ are escaped as necessary.

String text =
    "Person item6 [can {item thing [wrap]}]\n" +
    "Cat item7 [meow meow {item thang [purr]}]\n" +
    "Dog item8 [maybe perhaps {itemmmm thong [woof]}]" ;

String LOOKAROUND_REGEX = "(?<=\\{item )\\w+(?= \\[)";

System.out.println(
    text.replaceAll(LOOKAROUND_REGEX, "STUFF")
);

This prints:

Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm thong [woof]}]
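The third line is left unchanged because its "{itemmmm " prefix is not the literal "{item " the lookbehind requires. Java rejects a lookbehind that has no obvious maximum length, so item+ cannot be used there, but bounded repetition is accepted. A sketch under the assumption that "item" never contains more than 10 m's (an arbitrary, hypothetical limit); the class name is also hypothetical:

```java
import java.util.regex.Pattern;

public class BoundedLookbehind {
    // m{1,10} gives the lookbehind a finite maximum length, which Java allows.
    // The upper bound of 10 is an assumption about the input, not a rule.
    static final Pattern BOUNDED =
        Pattern.compile("(?<=\\{item{1,10} )\\w+(?= \\[)");

    static String replaceWords(String text, String replacement) {
        return BOUNDED.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String text =
            "Cat item7 [meow meow {item thang [purr]}]\n" +
            "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
        System.out.println(replaceWords(text, "STUFF"));
    }
}
```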

References

  • regular-expressions.info/Lookarounds
  • String.replaceAll(String regex, String replacement)
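Because lookarounds are zero-width, the text they check is not part of the match. When the goal is to save the word rather than replace it, a find() loop can therefore read it straight from group(). A minimal sketch using the same pattern and input as above; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundCollect {
    // The lookbehind and lookahead consume nothing, so the match
    // itself (group 0) is only the \w+ word.
    static final Pattern LOOKAROUND = Pattern.compile("(?<=\\{item )\\w+(?= \\[)");

    // Collects every matched word so it can be used later.
    static List<String> collect(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = LOOKAROUND.matcher(text);
        while (m.find()) {
            words.add(m.group()); // group() is equivalent to group(0)
        }
        return words;
    }

    public static void main(String[] args) {
        String text =
            "Person item6 [can {item thing [wrap]}]\n" +
            "Cat item7 [meow meow {item thang [purr]}]\n" +
            "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
        System.out.println(collect(text)); // [thing, thang]
    }
}
```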

Use capturing groups instead of lookarounds

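Lookarounds should be used judiciously. Lookbehinds in Java, in particular, are limited: the lookbehind pattern must have an obvious maximum length, so unbounded repetition such as + or * is generally rejected inside one. A more common technique is to use capturing groups that match more than just the interesting part.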

The following regex matches the same \w+ as before, but also includes the "{item " prefix and the " [" suffix. Additionally, the m in item can now repeat any number of times (item+), which cannot be expressed in a Java lookbehind.

String CAPTURING_REGEX = "(\\{item+ )(\\w+)( \\[)";

System.out.println(
    text.replaceAll(CAPTURING_REGEX, "$1STUFF$3")
);

This prints:

Person item6 [can {item STUFF [wrap]}]
Cat item7 [meow meow {item STUFF [purr]}]
Dog item8 [maybe perhaps {itemmmm STUFF [woof]}]

Our pattern has 3 capturing groups:

(\{item+ )(\w+)( \[)
\________/\___/\___/
 group 1    2    3

Note that we can't simply replace the whole match with "STUFF", because the match includes some "extraneous" parts. We don't want to replace those, so we capture them and put them back in the replacement string. In a Java replacement string, the text captured by a group is referred to with the $ sigil followed by the group number; hence the $1 and $3 in the above example.
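Since Java 7, groups can also be named with (?<name>...) and referenced in a replacement string as ${name}, which avoids counting parentheses. A sketch of the same replacement with named groups; the group names prefix, word, and suffix, and the class name, are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroups {
    // Same structure as CAPTURING_REGEX, but each group has a name.
    static final Pattern NAMED =
        Pattern.compile("(?<prefix>\\{item+ )(?<word>\\w+)(?<suffix> \\[)");

    // ${name} in the replacement string refers to a named group.
    static String replaceWords(String text, String replacement) {
        return NAMED.matcher(text).replaceAll(
            "${prefix}" + Matcher.quoteReplacement(replacement) + "${suffix}");
    }

    // A named group can also be read back from the Matcher by name.
    static String firstWord(String text) {
        Matcher m = NAMED.matcher(text);
        return m.find() ? m.group("word") : null;
    }

    public static void main(String[] args) {
        String line = "Dog item8 [maybe perhaps {itemmmm thong [woof]}]";
        System.out.println(firstWord(line));             // thong
        System.out.println(replaceWords(line, "STUFF")); // Dog item8 [maybe perhaps {itemmmm STUFF [woof]}]
    }
}
```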

References

  • regular-expressions.info/Grouping

Use a Matcher for more flexibility

Not everything can be done with replacement strings. For example, Java's replacement-string syntax has no way to post-process a captured string, such as capitalizing it. For these more general replacement scenarios, you can use a Matcher loop like the following:

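Matcher m = Pattern.compile(CAPTURING_REGEX).matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    System.out.println("Match found");
    // Group 0 is the whole match; groups 1..groupCount() are the capturing groups
    for (int i = 0; i <= m.groupCount(); i++) {
        System.out.printf("Group %d captured <%s>%n", i, m.group(i));
    }
    // %<s reuses the previous argument; %<S reuses it in upper case.
    // quoteReplacement stops '$' or '\' in the captured text from being
    // interpreted as group references by appendReplacement.
    m.appendReplacement(sb, Matcher.quoteReplacement(
        String.format("%s%s %<s and more %<SS%s",
            m.group(1), m.group(2), m.group(3)
        )
    ));
}
m.appendTail(sb);

System.out.println(sb.toString());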

The above prints:

Match found
Group 0 captured <{item thing [>
Group 1 captured <{item >
Group 2 captured <thing>
Group 3 captured < [>

Match found
Group 0 captured <{item thang [>
Group 1 captured <{item >
Group 2 captured <thang>
Group 3 captured < [>

Match found
Group 0 captured <{itemmmm thong [>
Group 1 captured <{itemmmm >
Group 2 captured <thong>
Group 3 captured < [>

Person item6 [can {item thing thing and more THINGS [wrap]}]
Cat item7 [meow meow {item thang thang and more THANGS [purr]}]
Dog item8 [maybe perhaps {itemmmm thong thong and more THONGS [woof]}]

References

  • java.util.regex.Pattern
  • java.util.regex.Matcher
    • group(int) - access individual captured strings
    • appendReplacement -- StringBuffer-only before Java 9; Java 9 added StringBuilder overloads of appendReplacement and appendTail
  • java.util.Formatter - used by printf and String.format in the above example
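On Java 9 and later, Matcher.replaceAll(Function<MatchResult, String>) can stand in for the explicit appendReplacement/appendTail loop when no per-match side effects (like the printing above) are needed. The string the function returns is still parsed for $ and \, so it is quoted here. A minimal sketch reproducing the "and more THINGS" transformation; the class and method names are hypothetical, and Locale.ROOT is used so the upper-casing does not depend on the default locale:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionReplace {
    static final Pattern CAPTURING = Pattern.compile("(\\{item+ )(\\w+)( \\[)");

    // The function receives each match and returns its replacement text.
    static String expand(String text) {
        return CAPTURING.matcher(text).replaceAll(mr -> Matcher.quoteReplacement(
            mr.group(1) + mr.group(2) + " " + mr.group(2)
                + " and more " + mr.group(2).toUpperCase(Locale.ROOT) + "S"
                + mr.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(expand("Person item6 [can {item thing [wrap]}]"));
        // prints: Person item6 [can {item thing thing and more THINGS [wrap]}]
    }
}
```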

Attachments

  • Source code of above example in ideone.com


Source: https://stackoverflow.com/questions/3010684/saving-substrings-using-regular-expressions
