Java regex to match multiline records starting with fixed label

你。 submitted on 2021-01-28 14:39:38

Question


Following is an example of a list of multiline records, each starting with a fixed string label (LABEL):

<Irrelevant line>
...
<Irrelevant line>
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...
LABEL ...
...
...

Is there a Java regular expression that can match the above and extract each record, i.e.

LABEL ...
...
...

Also, is this the fastest way to extract those records, or would reading line by line and checking the start of each line be faster?


Answer 1:


To iterate over all the LABEL groups, use this:

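import java.util.regex.Matcher;
import java.util.regex.Pattern;

// subjectString holds the whole input text
Pattern regex = Pattern.compile("(?sm)LABEL.*?(?=^LABEL|\\Z)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // the current LABEL record: regexMatcher.group()
}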


Explanation

  • (?s) activates DOTALL mode, allowing the dot to match across lines
  • (?m) turns on multi-line mode, allowing ^ and $ to match on each line
  • LABEL matches literal characters
  • .*? lazily matches all chars up to...
  • the point where the lookahead (?=^LABEL|\\Z) can assert that what follows is the next LABEL or the end of the string
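The pieces explained above can be put together into a small runnable sketch. The class name, method name, and sample text are made up for illustration. It also assumes one small variation on the pattern: a leading ^ is added so that a record can only begin at the start of a line. Note that \Z matches before a final line terminator, so the last record may come back without its trailing line break while earlier records keep theirs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelRecordRegexDemo {
    // Assumption: same pattern as in the answer, plus a leading ^ so that
    // "LABEL" in the middle of a line does not start a record.
    private static final Pattern RECORD =
            Pattern.compile("(?sm)^LABEL.*?(?=^LABEL|\\Z)");

    // Returns every record found in the text, in order of appearance.
    static List<String> extractRecords(String text) {
        List<String> records = new ArrayList<>();
        Matcher matcher = RECORD.matcher(text);
        while (matcher.find()) {
            records.add(matcher.group());
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: one irrelevant line, then two records.
        String text = "noise\nLABEL a\n1\n2\nLABEL b\n3\n";
        for (String record : extractRecords(text)) {
            System.out.println("[" + record.trim() + "]");
        }
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call, which matters if the method runs on many inputs.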



Answer 2:


I think you can start with the expression:

^LABEL\s*\w*

OR

^LABEL.*

It may need some improvements but you can at least start with it.




Answer 3:


The following matches every line that starts with the string LABEL (multi-line mode must be on so that ^ matches at the start of each line):

(?=^LABEL).*
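As a rough illustration of what this kind of per-line pattern returns, the sketch below runs it with Pattern.MULTILINE. The class name, method name, and sample text are hypothetical. Because the dot does not cross line breaks here, each match is only the first line of a record, so the continuation lines would still have to be collected separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelHeaderLinesDemo {
    // MULTILINE lets ^ match at the start of every line, not just the input.
    private static final Pattern HEADER =
            Pattern.compile("(?=^LABEL).*", Pattern.MULTILINE);

    // Returns only the lines that begin with LABEL (no continuation lines).
    static List<String> headerLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = HEADER.matcher(text);
        while (matcher.find()) {
            lines.add(matcher.group());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String text = "noise\nLABEL a\n1\nLABEL b\n2";
        headerLines(text).forEach(System.out::println);
    }
}
```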





Answer 4:


In my view, you can read the stream line by line and check whether each line starts with "LABEL".

You can use the substring method, for example:

line.substring(0, "LABEL".length()).equals("LABEL"); // add a length check first: substring throws if the line is shorter than the label

In my view, regular expressions are most useful for finding a pattern, not a specific fixed text.
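A minimal sketch of the line-by-line approach follows, using String.startsWith instead of substring so that lines shorter than the label need no extra check. The class name, method name, and sample input are hypothetical, and the lines are passed in as an Iterable for simplicity; in practice they could come from BufferedReader.readLine(). Whether this is faster than the regex would need to be measured on real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LabelRecordLineScanner {
    // Groups lines into records; a new record starts at each line that
    // begins with the label. Lines before the first label are ignored.
    static List<String> extractRecords(Iterable<String> lines, String label) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null; // null until the first label line is seen
        for (String line : lines) {
            if (line.startsWith(label)) {
                if (current != null) {
                    records.add(current.toString()); // close the previous record
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString()); // close the last record
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, split into lines up front.
        String text = "noise\nLABEL a\n1\n2\nLABEL b\n3\n";
        List<String> lines = Arrays.asList(text.split("\n"));
        for (String record : extractRecords(lines, "LABEL")) {
            System.out.println("[" + record + "]");
        }
    }
}
```

This version holds only the current record in memory while scanning, which can help when the input is too large to load as a single string.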



Source: https://stackoverflow.com/questions/24605556/java-regex-to-match-multiline-records-starting-with-fixed-label
