Regex to find last occurrence of pattern in a string

Posted by 北城余情 on 2020-12-05 12:11:20

Question


My string is of the form:

"as.asd.sd fdsfs. dfsd  d.sdfsd. sdfsdf sd   .COM"

I only want to match the last run of whitespace before the last period (.).

So far I can capture whitespace, but not just the very last occurrence, using:

\s+(?=\.\w)

How can I make it less greedy?


Answer 1:


You can try this:

(\s+)(?=\.[^.]+$)

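(?=\.[^.]+$) is a positive lookahead that requires a dot followed by one or more non-dot characters up to the end of the line. Because no further dot may follow, the dot it finds is the last one in the line.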

Demo:

https://regex101.com/r/k9VwC6/3
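
For readers who prefer to test this outside regex101, here is a minimal Python sketch using the standard re module. The variable names are made up for illustration, and the sample string is the one from the question.

```python
import re

# Sample string from the question.
text = "as.asd.sd fdsfs. dfsd  d.sdfsd. sdfsdf sd   .COM"

# Whitespace run that sits directly before the last dot: the lookahead
# demands a dot, then only non-dot characters up to the end of the string.
last_ws_before_dot = re.compile(r"(\s+)(?=\.[^.]+$)")

match = last_ws_before_dot.search(text)
if match:
    # The lookahead consumes nothing, so the match is the whitespace alone.
    print(repr(match.group(1)))  # '   '
    print(text[match.end():])    # .COM
```

Since the lookahead is zero-width, the overall match and group 1 are the same text here; the parentheses only matter if the group is referenced in a replacement.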




Answer 2:


"as.asd.sd ffindMyLastOccurrencedsfs. dfindMyLastOccurrencefsd  d.sdfsd. sdfsdf sd   ..COM"

.*(?=((?<=\S)\s+)).*

replaced by `>\1<`

>   <

As a more generalized example

"as.asd.sd ffindMyLastOccurrencedsfs. dfindMyLastOccurrencefsd  d.sdfsd. sdfsdf sd   ..COM"

.*(?=(findMyLastOccurrence|(?<=\S)\s+|(?<=[^\.])\.+)).*

replaced by `>\1<`

>..<

Explanation:

Part 1 .*

  • is greedy and matches as much as possible while still allowing a needle to follow. It therefore runs past every earlier needle occurrence and stops right before the very last one.

  • if we are interested in the first hit instead, we can prevent the greediness by writing .*?

Part 2 (?=(findMyLastOccurrence|(?<=\S)\s+|(?<=[^\.])\.+|(?<=NotNeedlePart)NeedlePart+))

  • defines the 'break' condition for the greedy 'find-all'. It has the form (?=(needles)) and consists of several parts:
    • a positive lookahead, which ensures that everything matched so far is followed by one of the needles, without consuming it.
    • the needles we are looking for, separated by |: findMyLastOccurrence|(?<=\S)\s+|(?<=[^\.])\.+|(?<=NotNeedlePart)NeedlePart+. Needles are patterns themselves; NeedlePart and NotNeedlePart are placeholders, not literal regex.
    • if we look for a run of whitespace, dots or other needle parts, the pattern we are looking for is actually: something that is not a needle part, followed by one or more needle parts (hence the + after the needle part). See the examples: whitespace \s negated with \S, and a literal dot \. negated with [^.]

Part 3 .*

  • as we aren't interested in the remainder, we match it and don't use it any further. We could capture it with parentheses and use it as another group, but that's out of scope here.
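
The three parts can be tried out with a short Python sketch. It assumes the standard re module and the sample string from this answer; the variable names are illustrative only.

```python
import re

# Sample string from this answer.
text = ("as.asd.sd ffindMyLastOccurrencedsfs. dfindMyLastOccurrencefsd"
        "  d.sdfsd. sdfsdf sd   ..COM")

# Last whitespace run: the greedy leading .* backtracks from the end of the
# string until the lookahead finds a needle, which lands in group 1.
last_ws = re.sub(r".*(?=((?<=\S)\s+)).*", r">\1<", text)

# Several needles at once: whichever needle starts furthest right wins.
needles = r"findMyLastOccurrence|(?<=\S)\s+|(?<=[^.])\.+"
last_any = re.sub(r".*(?=(" + needles + r")).*", r">\1<", text)

# A lazy leading .*? stops at the first needle instead of the last.
first_ws = re.sub(r".*?(?=((?<=\S)\s+)).*", r">\1<", text)

print(last_ws)   # >   <
print(last_any)  # >..<
print(first_ws)  # > <
```

Note that . does not match a newline by default, so on multi-line input this sketch would work line by line unless the re.DOTALL flag is added.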



Answer 3:


In a general case, you can match the last occurrence of any pattern using the following scheme:

pattern(?![\s\S]*pattern)
(?s)pattern(?!.*pattern)
pattern(?!(?s:.*)pattern)

where [\s\S]* matches zero or more of any characters, as many as possible. (?s) and (?s:.*) can be used with regex engines that support these constructs, so that . matches any character, including line breaks.
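
As a rough sketch of the scheme in Python (standard re module), the snippet below looks for the last occurrence of a literal needle. The sample string is borrowed from Answer 2 and the variable names are hypothetical.

```python
import re

text = ("as.asd.sd ffindMyLastOccurrencedsfs. dfindMyLastOccurrencefsd"
        "  d.sdfsd. sdfsdf sd   ..COM")
needle = re.escape("findMyLastOccurrence")  # literal text, so escape it

# needle(?![\s\S]*needle): a needle with no further needle to its right.
last_needle = re.compile(needle + r"(?![\s\S]*" + needle + r")")

# Same idea with the inline DOTALL flag, so that . also matches newlines.
last_needle_dotall = re.compile(r"(?s)" + needle + r"(?!.*" + needle + r")")

match = last_needle.search(text)
print(match.start() == text.rfind("findMyLastOccurrence"))  # True
```

Both compiled patterns should find the same position; which spelling to use mostly depends on what the target regex engine supports.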

In this case, rather than \s+(?![\s\S]*\s), you may use

\s+(?!\S*\s)

Note that \s and \S are inverse classes, so there is no need for [\s\S]* here; \S* is enough.

Details:

  • \s+ - one or more whitespace chars
  • (?!\S*\s) - a negative lookahead that fails the match if, immediately to the right, there are zero or more non-whitespace chars followed by a whitespace char.
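
A small Python check of this pattern, assuming the standard re module and the string from the question (names are illustrative):

```python
import re

text = "as.asd.sd fdsfs. dfsd  d.sdfsd. sdfsdf sd   .COM"

# Whitespace run with no other whitespace anywhere after it.
last_ws = re.compile(r"\s+(?!\S*\s)")

matches = [m.group(0) for m in last_ws.finditer(text)]
print(matches)  # ['   ']
```

This finds the last whitespace run in the string regardless of where the dots are. For the sample string that is also the run before the last period, but for an input such as "a .b c" it would pick the space before c.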



Answer 4:


You can try this. It captures the last whitespace segment in the first capture group.

(\s+)\.[^\.]*$
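
A minimal Python sketch of how this behaves, assuming the standard re module and the string from the question:

```python
import re

text = "as.asd.sd fdsfs. dfsd  d.sdfsd. sdfsdf sd   .COM"

# Whitespace, then a dot, then only non-dot characters up to the end.
tail = re.compile(r"(\s+)\.[^.]*$")

match = tail.search(text)
if match:
    print(repr(match.group(1)))  # '   '
    print(repr(match.group(0)))  # '   .COM'
```

Unlike the lookahead version in Answer 1, the overall match here also consumes the dot and everything after it, so a replacement should refer to group 1 rather than the whole match.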


Source: https://stackoverflow.com/questions/41870124/regex-to-find-last-occurrence-of-pattern-in-a-string
