Matching on Substrings in Delimited List Using Regex

冷暖自知 submitted on 2019-12-12 19:16:58

Question


I'm attempting to formulate a regex in Java to capture multiple strings in a space-delimited list. Here is the string I am trying to capture from ...

String output = "regulations { qux def } standards none rules { abc-123 456-defghi wxyz_678  } security { enabled }";

I want to use a regex to match each word in the space-delimited list between the braces immediately following rules. In other words, I would like the regex to match abc-123, 456-defghi, and wxyz_678. The substrings in this list can contain any characters except whitespace, and there can be any number of them; I've used these 3 only as an example. The following doesn't work, since it needs to be modified to match multiple times ...

String regex = "rules\\s\\{\\s([^\\s]*)\\s\\}";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(output);
while (matcher.find()) {
    System.out.println(matcher.group(1));
}

How could I modify the above regex to account for multiple possible matches and get the following output?

abc-123
456-defghi
wxyz_678

Answer 1:


Here is a 1-step approach: use 1 regex to "match them all".

The regex:

(?:\brules\s+\{|(?!^)\G)\s+([\w-]+)

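The regex has two alternative starting points. The first, \brules\s+\{, matches the whole word rules followed by 1 or more whitespace characters and an opening brace. The second, (?!^)\G, matches at the position where the previous successful match ended; the (?!^) lookahead keeps \G from matching at the very start of the string. After either starting point, \s+ consumes 1 or more whitespace characters and ([\w-]+) captures 1 or more word characters (letters, digits, underscores) or hyphens into Group 1. The word rules therefore acts as the leading boundary, and the chain of matches stops at the closing brace because } cannot be matched by [\w-]+.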

Java code:

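import java.util.regex.Matcher;
import java.util.regex.Pattern;

String output = "regulations { qux def } standards none rules { abc-123 456-defghi wxyz_678  } security { enabled }";
// Start at "rules {" or continue from the end of the previous match (\G)
String regex = "(?:\\brules\\s+\\{|(?!^)\\G)\\s+([\\w-]+)";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(output);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // Group 1 holds one list item
}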
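
The question says the list items may contain any characters except whitespace, while [\w-]+ only covers letters, digits, underscores and hyphens. Below is a minimal sketch of the same \G technique with a broader token class, [^\s{}]+. It assumes that items never contain braces: without that exclusion the chain of matches would run past the closing } into the rest of the string. The class name RuleItems and the helper extractRuleItems are hypothetical, introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleItems {
    // Either "rules {" starts a chain, or \G continues right after the previous match.
    // (?!^) stops \G from matching at the very start of the input.
    // [^\s{}]+ is an assumed token class: a run of non-whitespace, non-brace characters.
    private static final Pattern ITEM =
            Pattern.compile("(?:\\brules\\s+\\{|(?!^)\\G)\\s+([^\\s{}]+)");

    // Hypothetical helper: collects the items listed inside the braces after "rules".
    static List<String> extractRuleItems(String input) {
        List<String> items = new ArrayList<>();
        Matcher matcher = ITEM.matcher(input);
        while (matcher.find()) {
            items.add(matcher.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        String output = "regulations { qux def } standards none rules { abc-123 456-defghi wxyz_678  } security { enabled }";
        extractRuleItems(output).forEach(System.out::println);
    }
}
```

For the sample string this prints the same three items as the original pattern; the difference only shows up for items such as a.b:c or x/y, which [\w-]+ would split or skip.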

Here is a 2-step approach: 1) get the substring between rules { and }, 2) split with whitespace.

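import java.util.Arrays;
String output = "regulations { qux def } standards none rules { abc-123 456-defghi wxyz_678  } security { enabled }";
// Step 1: keep only the text between "rules {" and "}"
String subst = output.replaceFirst("(?s)^.*\\brules\\s*[{]\\s*([^{}]+)[}].*$", "$1");
// Step 2: split on runs of whitespace
String[] res = subst.split("\\s+");
System.out.println(Arrays.toString(res)); // [abc-123, 456-defghi, wxyz_678]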

See IDEONE demo and the regex demo.

This regex is much simpler: it matches everything up to and including rules {, captures what is inside the {...}, and then matches } and the rest of the string. The replacement reference $1 replaces the whole input with the Group 1 value, which is assigned to subst. Then just split on whitespace.
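
One caveat with replaceFirst: if the input has no rules { ... } block, the pattern does not match and replaceFirst returns the input unchanged, so the split would yield every word of the original string. Here is a sketch of the same two-step idea that uses Matcher.find() to make the no-match case explicit. The class name RuleBlock and the helper splitRuleItems are hypothetical, and returning an empty array when nothing is found is an assumption about the desired behavior.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleBlock {
    // Step 1 pattern: capture everything between "rules {" and the next "}".
    private static final Pattern BLOCK =
            Pattern.compile("\\brules\\s*\\{([^{}]*)\\}");

    // Hypothetical helper: returns the items, or an empty array
    // when there is no rules block or the block is empty.
    static String[] splitRuleItems(String input) {
        Matcher matcher = BLOCK.matcher(input);
        if (!matcher.find()) {
            return new String[0];
        }
        String inner = matcher.group(1).trim();
        // Step 2: split on whitespace. "".split(...) would return one empty
        // string, so the empty block is handled separately.
        return inner.isEmpty() ? new String[0] : inner.split("\\s+");
    }

    public static void main(String[] args) {
        String output = "regulations { qux def } standards none rules { abc-123 456-defghi wxyz_678  } security { enabled }";
        System.out.println(Arrays.toString(splitRuleItems(output)));
        System.out.println(Arrays.toString(splitRuleItems("standards none")));
    }
}
```

Note that find() picks the first rules block in the input, whereas the greedy .* in the replaceFirst version would pick the last one if there were several.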



Source: https://stackoverflow.com/questions/34069272/matching-on-substrings-in-delimited-list-using-regex
