Ignoring an optional suffix with a greedy regex

Posted by 两盒软妹~` on 2019-12-10 18:45:49

Question


I'm performing regex matching in .NET against strings that look like this:

1;#Lists/General Discussion/Waffles Win
2;#Lists/General Discussion/Waffles Win/2_.000
3;#Lists/General Discussion/Waffles Win/3_.000

I need to match the URL portion without the numbers at the end, so that I get this:

Lists/General Discussion/Waffles Win

This is the regex I'm trying:

(?:\d+;#)(?<url>.+)(?:/\d+_.\d+)*

The problem is that the last group is being included as part of the middle group's match. I've also tried without the * at the end but then only the first string above matches and not the rest.

I have the multi-line option enabled. Any ideas?
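The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module rather than .NET, so the named group is spelled `(?P<url>...)` instead of `(?<url>...)`. The sample strings come from the question, the variable names are illustrative only, and each line is matched on its own instead of relying on the multi-line option.

```python
import re

# The pattern from the question, with the named group in Python syntax.
pattern = re.compile(r"(?:\d+;#)(?P<url>.+)(?:/\d+_.\d+)*")

lines = [
    "1;#Lists/General Discussion/Waffles Win",
    "2;#Lists/General Discussion/Waffles Win/2_.000",
    "3;#Lists/General Discussion/Waffles Win/3_.000",
]

# Greedy .+ consumes the rest of the line; the trailing group is
# optional because of *, so it is satisfied by matching zero times.
captured = [pattern.match(line).group("url") for line in lines]

for url in captured:
    print(url)
```

The numeric suffix stays inside `url` because nothing in the pattern forces the engine to give those characters back.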


Answer 1:


A few different alternatives:

@"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$"

This matches as few path segments as possible, followed by an optional last part, and the end of the line.
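As a rough check of this first pattern, here is a minimal sketch in Python's `re` module (an assumption on my part; the question targets .NET, but this pattern uses only constructs that both engines support). Each sample line from the question is matched separately.

```python
import re

# Lazy path segments, then an optional numeric suffix, then end of line.
pattern = re.compile(r"^\d+;#([^/]+(?:/[^/]+)*?)(?:/\d+_\.\d+)?$")

lines = [
    "1;#Lists/General Discussion/Waffles Win",
    "2;#Lists/General Discussion/Waffles Win/2_.000",
    "3;#Lists/General Discussion/Waffles Win/3_.000",
]

# Group 1 holds the URL portion without the numeric suffix.
urls = [pattern.match(line).group(1) for line in lines]

for url in urls:
    print(url)
```

The lazy quantifier adds one segment at a time and stops as soon as the rest of the line is either empty or exactly the optional suffix.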

@"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)"

This matches as many path segments as possible, as long as a segment is not the numeric suffix at the end of the line.
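The effect of the negative lookahead can be sketched as follows, again using Python's `re` as a stand-in for .NET. The helper `url_part` and the second input string are hypothetical, added only to show that a numeric-looking segment is rejected solely when it is the last thing on the line.

```python
import re

# Greedy path segments; the lookahead refuses a segment only when it
# is the numeric suffix sitting at the end of the line.
pattern = re.compile(r"^\d+;#([^/]+(?:/(?!\d+_\.\d+$)[^/]+)*)")

def url_part(line):
    """Return the captured path, or None if the line does not match."""
    m = pattern.match(line)
    return m.group(1) if m else None

# Sample from the question: the suffix is last, so it is left out.
suffix_at_end = url_part("2;#Lists/General Discussion/Waffles Win/2_.000")

# Hypothetical input: the same text appears mid-path, so it is kept.
suffix_in_middle = url_part("4;#Lists/2_.000/Waffles Win")

print(suffix_at_end)
print(suffix_in_middle)
```

Because the lookahead itself contains `$`, this variant does not need an end-of-line anchor after the capturing group.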

@"^\d+;#(.*?)(?:/\d+_\.\d+)?$"

This matches as few characters as possible, followed by an optional last part, and the end of the line.
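With a lazy `.*?`, the trailing `$` is what makes the pattern work. The sketch below (Python's `re`, assumed here in place of .NET) compares the pattern with and without the anchor on one sample line from the question; the variable names are illustrative.

```python
import re

line = "3;#Lists/General Discussion/Waffles Win/3_.000"

# With $, the lazy group must grow until the rest of the line is
# either empty or exactly the optional numeric suffix.
anchored = re.match(r"^\d+;#(.*?)(?:/\d+_\.\d+)?$", line).group(1)

# Without $, the lazy group is satisfied immediately with nothing.
unanchored = re.match(r"^\d+;#(.*?)(?:/\d+_\.\d+)?", line).group(1)

print(repr(anchored))
print(repr(unanchored))
```

Dropping the anchor leaves the capture empty, since both the lazy group and the optional suffix are allowed to match zero characters.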




Answer 2:


You could try

^(\d+;#)([^/]+(/[^\d][^/]*)*)

and take the second group. The first group matches the prefix, such as 1;#. The second group matches the first part of the URL (assumed to contain any characters other than /), followed by any number of groups that each consist of a /, a non-digit character, and then any run of characters other than /.

Tested on this site, appears to do what you want. Give it a try with some more samples.
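Trying a few more samples, as suggested, might look like the sketch below. It uses Python's `re` rather than .NET, and both the helper `second_group` and the last input string are hypothetical. That input probes one consequence of the pattern: any path segment that starts with a digit ends the match, not only the numeric suffix.

```python
import re

pattern = re.compile(r"^(\d+;#)([^/]+(/[^\d][^/]*)*)")

def second_group(line):
    """Return group 2 of the match (the URL portion)."""
    return pattern.match(line).group(2)

# Sample from the question: the numeric suffix is dropped.
with_suffix = second_group("2;#Lists/General Discussion/Waffles Win/2_.000")

# Hypothetical input: a folder name that begins with a digit stops
# the match early, so everything after "Lists" is lost.
digit_segment = second_group("5;#Lists/2019 Archive/Waffles Win")

print(with_suffix)
print(digit_segment)
```

If folder names in the real data can begin with a digit, a pattern that checks for the full suffix at the end of the line is likely the safer choice.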



Source: https://stackoverflow.com/questions/1422434/ignoring-an-optional-suffix-with-a-greedy-regex
