Java regex: how to back-reference capturing groups in a certain context when their number is not known in advance

Submitted by 柔情痞子 on 2019-12-11 06:22:15

Question


As an introductory note, I am aware of the old saying about solving problems with regex, and I am also aware of the precautions about processing XML with regex. But please bear with me for a moment...

I am trying to do a regex search and replace on a group of characters. I don't know in advance how often this group will be matched, and I want to search only within a certain context.

An example: If I have the following string "abdfabsdfabfdsaabbb" and I want to search for "ab" and replace it with "@ab@", this works fine using the following regex:

Search regex:

(.*?)(ab)(.*?)

Replace:

$1@$2@$3

I get four matches in total, as expected. Within each match, the group IDs are the same, so the back-references ($1, $2 ...) work fine, too.
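For reference, here is a minimal Java sketch of that context-free replacement, applied to the letters abdfabsdfabfdsaabbb. The class and method names are made up for illustration; only the pattern and the replacement string come from the question.

```java
class PlainReplaceDemo {
    // Wraps every "ab" in '@' using the question's three-group pattern.
    // Groups 1 and 3 are lazy, so group 1 holds the text before each "ab"
    // and group 3 ends up empty.
    static String wrap(String input) {
        return input.replaceAll("(.*?)(ab)(.*?)", "$1@$2@$3");
    }

    public static void main(String[] args) {
        // prints "@ab@df@ab@sdf@ab@fdsa@ab@bb"
        System.out.println(wrap("abdfabsdfabfdsaabbb"));
    }
}
```

Every match has the same three groups, which is why the numbered back-references behave the same way for each of the four replacements.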

However, if I now add a certain context to the string, the regex above fails:

Search string:

<context>abdfabsdfabfdsaabbb</context>

Search regex:

<context>(.*?)(ab)(.*?)</context>

This will find only the first match. But even if I add a non-capturing group to the original regex, it doesn't work ("<context>(?:(.*?)(ab)(.*?))*</context>").
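The single match is easy to reproduce. The sketch below (class and method names are hypothetical) collects the three groups of every match that the context-wrapped pattern finds. Because the pattern consumes the whole element from <context> to </context>, there is nothing left for a second match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContextSingleMatchDemo {
    private static final Pattern WRAPPED =
            Pattern.compile("<context>(.*?)(ab)(.*?)</context>");

    // Returns "group1|group2|group3" for every match found in the input.
    static List<String> groupsOfEachMatch(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WRAPPED.matcher(input);
        while (m.find()) {
            result.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[|ab|dfabsdfabfdsaabbb]": one match, and group 3 swallows the rest
        System.out.println(groupsOfEachMatch("<context>abdfabsdfabfdsaabbb</context>"));
    }
}
```

Wrapping the groups in a repeated non-capturing group does not help either, since a capturing group that is repeated keeps only the text from its last iteration.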

What I would like is a list of matches as in the first search (without the context), whereby within each match the group IDs are the same.

Any idea how this could be achieved?


Answer 1:


Solution

Your requirement is similar to the one in this question: match and capture multiple instances of a pattern between a prefix and a suffix. The method described in this answer of mine gives the following regex:

(?s)(?:<context>|(?!^)\G)(?:(?!</context>|ab).)*ab

Add capturing groups as needed.
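As one possible way to add the groups, the sketch below captures the prefix (the opening tag or the empty \G position), the skipped text, and ab itself, then rebuilds each match with @ around ab. The grouping, the replacement string, and the class and method names are assumptions made for illustration, not part of the original answer.

```java
import java.util.regex.Pattern;

class ContextReplaceDemo {
    // Group 1: "<context>" or the empty \G continuation point
    // Group 2: text skipped before the next "ab"
    // Group 3: the "ab" to be wrapped
    private static final Pattern AB_IN_CONTEXT = Pattern.compile(
            "(?s)(<context>|(?!^)\\G)((?:(?!</context>|ab).)*)(ab)");

    static String wrap(String input) {
        return AB_IN_CONTEXT.matcher(input).replaceAll("$1$2@$3@");
    }

    public static void main(String[] args) {
        // prints "ab<context>@ab@df@ab@sdf@ab@fdsa@ab@bb</context>ab"
        System.out.println(wrap("ab<context>abdfabsdfabfdsaabbb</context>ab"));
    }
}
```

The ab before the opening tag and the ab after the closing tag are left alone, and the group numbers are the same in every match, so $1, $2 and $3 can be used uniformly.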

Caveat

Note that the regex only works for tags that may contain nothing but text. If a tag contains other tags, it won't work correctly.

It also matches ab inside a <context> tag that has no closing </context> tag. To prevent this, add a lookahead for the closing tag:

(?s)(?:<context>(?=.*?</context>)|(?!^)\G)(?:(?!</context>|ab).)*ab
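To see the difference between the two variants, here is a small comparison sketch. It assumes the same capturing groups and "$1$2@$3@" replacement as a hypothetical wrapping helper; the constant names LOOSE and STRICT are made up for this example.

```java
class UnclosedTagDemo {
    // Variant without the lookahead: an unclosed <context> still starts a match chain.
    static final String LOOSE =
            "(?s)(<context>|(?!^)\\G)((?:(?!</context>|ab).)*)(ab)";
    // Variant with the lookahead: <context> only counts if </context> follows somewhere.
    static final String STRICT =
            "(?s)(<context>(?=.*?</context>)|(?!^)\\G)((?:(?!</context>|ab).)*)(ab)";

    static String wrap(String regex, String input) {
        return input.replaceAll(regex, "$1$2@$3@");
    }

    public static void main(String[] args) {
        String unclosed = "<context>abdfab";
        System.out.println(wrap(LOOSE, unclosed));   // prints "<context>@ab@df@ab@"
        System.out.println(wrap(STRICT, unclosed));  // prints "<context>abdfab"
        System.out.println(wrap(STRICT, "<context>abdfab</context>"));
        // prints "<context>@ab@df@ab@</context>"
    }
}
```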

Explanation

Let us break down the regex:

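(?s)                        # Make . match any character, including line terminators
(?:
  <context>                 # Either start at an opening <context> tag
    |
  (?!^)\G                   # or resume where the previous match ended (not at the start of input)
)
(?:(?!</context>|ab).)*     # Skip characters without crossing </context> or reaching ab
ab                          # The text we are looking for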

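(?:<context>|(?!^)\G) makes sure that we either enter a new <context> tag or continue from the end of the previous match and try to match another instance of the sub-pattern. The (?!^) guard is needed because \G also matches at the start of the input before any match has been made.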

(?:(?!</context>|ab).)* matches the text we don't care about (anything that is not ab) and prevents the match from going past the closing tag </context>. Finally, we match the pattern we want, ab, at the end.
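The chaining is easiest to see by listing what each call to find() returns. The following sketch uses the regex exactly as given in the answer, without extra groups; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainedMatchDemo {
    private static final Pattern CHAINED = Pattern.compile(
            "(?s)(?:<context>|(?!^)\\G)(?:(?!</context>|ab).)*ab");

    // Collects the full text of every successive match.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = CHAINED.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints "[<context>ab, dfab, sdfab, fdsaab]"
        System.out.println(matches("<context>abdfabsdfabfdsaabbb</context>"));
    }
}
```

Only the first match begins with the opening tag. Each later match starts exactly where the previous one ended, which is the \G branch at work, and the chain stops once the remaining text up to </context> no longer contains ab.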



Source: https://stackoverflow.com/questions/21428545/java-regex-how-to-back-reference-capturing-groups-in-a-certain-context-when-the
