REGEX for search and exclude combined

Submitted by 南楼画角 on 2021-01-28 04:20:55

Question


Overview:

I am trying to combine two REGEX queries into one:

  • \d+\.\d+\.\d+\.\d+
  • ^(?!(10\.|169\.)).*$

I wrote this as a two-part query. The first part isolates IPs in a block of text. After I copy and paste the results, the second part selects every line that does not begin with 10 or 169.
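For reference, the two-pass workflow described above can be sketched in Python roughly as follows. The sample text and variable names are made up for illustration, and the copy-and-paste step is replaced by a list comprehension.

```python
import re

# Hypothetical sample input; any block of text containing IPs would do.
text = "a 10.1.1.1 b 192.168.1.1 c 169.254.2.2 d 8.8.4.4"

# Pass 1: isolate everything that looks like an IP.
candidates = re.findall(r"\d+\.\d+\.\d+\.\d+", text)

# Pass 2: keep only the candidates that do not begin with 10. or 169.
exclude_prefix = re.compile(r"^(?!(10\.|169\.)).*$")
kept = [ip for ip in candidates if exclude_prefix.match(ip)]

print(kept)  # ['192.168.1.1', '8.8.4.4']
```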

Questions:

It seems like I am over complicating this:

  • Can anybody see a better way to do this?
  • Is there a way to combine these two queries?

Answer 1:


A quick-and-gimme-regex style answer

Basic one (whole string looks like an IP): ^\d+\.\d+\.\d+\.\d+$

Lite (four period-separated digit chunks, matched as a whole word): \b\d+\.\d+\.\d+\.\d+\b

Medium (excluding junk like 1.2.4.6.7.9.0): (?<!\d\.)\b\d+\.\d+\.\d+\.\d+\b(?!\.\d+)

Advanced 1 (not starting with 10 or 169): (?<!\d\.)\b(?!(?:1(?:0|69))\.)\d+\.\d+\.\d+\.\d+\b(?!\.\d+)

Advanced 2 (not ending with 8 or 10): (?<!\d\.)\b\d+\.\d+\.\d+\.(?!(?:8|10)\b)\d+\b(?!\.\d+)

Details for the curious

The \b is a word boundary that makes it possible to match exact "words" (entities consisting of [a-zA-Z0-9_] characters) inside a longer text. So, if we do not want to match 12.12.23.56 inside g12.12.23.56g, we use the Lite version.
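A small Python check of the word-boundary behaviour, as a sketch: the sample string is invented, and the unanchored pattern is the first one from the question.

```python
import re

# Hypothetical sample: one standalone IP and one glued between letters.
text = "host 12.12.23.56 and token g12.12.23.56g"

plain = re.compile(r"\d+\.\d+\.\d+\.\d+")      # no boundaries
lite = re.compile(r"\b\d+\.\d+\.\d+\.\d+\b")   # the Lite version

plain_matches = plain.findall(text)  # also finds the IP inside g...g
lite_matches = lite.findall(text)    # only the standalone IP

print(plain_matches)  # ['12.12.23.56', '12.12.23.56']
print(lite_matches)   # ['12.12.23.56']
```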

The lookarounds, together with the word boundaries, restrict the matches further. (?<!\d\.) is a negative lookbehind and (?!\.\d+) is a negative lookahead: they fail the match if the IP-like substring is immediately preceded by a digit and a dot, or immediately followed by a dot and a digit. So this regex does not match inside entities like 12.12.34.56.78.90899. Choose the Medium regex for that case.
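To see the difference between Lite and Medium, here is a minimal Python sketch. The sample string is an assumption; Python's re module accepts this lookbehind because it has a fixed width.

```python
import re

# Hypothetical sample: a real-looking IP plus a longer dotted number.
text = "ip 12.12.34.56 version 1.2.4.6.7.9.0"

lite = re.compile(r"\b\d+\.\d+\.\d+\.\d+\b")
medium = re.compile(r"(?<!\d\.)\b\d+\.\d+\.\d+\.\d+\b(?!\.\d+)")

lite_matches = lite.findall(text)      # picks up the head of the long chain
medium_matches = medium.findall(text)  # rejects anything inside the chain

print(lite_matches)    # ['12.12.34.56', '1.2.4.6']
print(medium_matches)  # ['12.12.34.56']
```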

Now, to restrict the matches to those that do not start with a certain number, you need either a lookbehind or a lookahead. Prefer the lookahead, because 1) it is generally less resource consuming, and 2) more regex flavors support it. Thus, to fail all matches where the first number of the IP is 10 or 169, we can use a negative lookahead placed right after the leading word boundary: (?!(?:1(?:0|69))\.). The syntax is (?!...), and inside it we match 1 followed by either 0 or 69, and then a literal dot. Note that we could write (?!10\.|169\.) instead, but that adds a little redundant backtracking, since both branches start with the same 1. Best practice is to "contract" alternations so that branches do not start with the same text, which makes the alternation group more linear. So, use the Advanced 1 regex to get those IPs.
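The following Python sketch runs Advanced 1 on an invented sample and checks that the contracted lookahead and the plain (?!10\.|169\.) form select the same IPs. It does not measure the backtracking difference.

```python
import re

# Hypothetical sample; 110.1.2.3 is included to show that only the
# exact first numbers 10 and 169 are excluded.
text = "10.0.0.1 169.254.1.1 192.168.1.5 8.8.8.8 110.1.2.3"

# Advanced 1, with the contracted alternation.
contracted = re.compile(
    r"(?<!\d\.)\b(?!(?:1(?:0|69))\.)\d+\.\d+\.\d+\.\d+\b(?!\.\d+)"
)
# Same regex with the uncontracted alternation.
simple = re.compile(
    r"(?<!\d\.)\b(?!10\.|169\.)\d+\.\d+\.\d+\.\d+\b(?!\.\d+)"
)

contracted_matches = contracted.findall(text)
simple_matches = simple.findall(text)

print(contracted_matches)  # ['192.168.1.5', '8.8.8.8', '110.1.2.3']
print(contracted_matches == simple_matches)  # True
```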

The Advanced 2 regex handles the similar case of IPs whose last number must not be 8 or 10: there, the negative lookahead (?!(?:8|10)\b) sits right before the final \d+.
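A quick Python sketch of Advanced 2 on an invented sample. The \b inside the lookahead is what keeps last numbers such as 80 or 108 from being excluded.

```python
import re

# Hypothetical sample: two IPs ending in 8 or 10, three that do not.
text = "8.8.8.8 192.168.1.10 192.168.1.80 192.168.1.108 172.16.0.1"

advanced2 = re.compile(
    r"(?<!\d\.)\b\d+\.\d+\.\d+\.(?!(?:8|10)\b)\d+\b(?!\.\d+)"
)

advanced2_matches = advanced2.findall(text)

print(advanced2_matches)
# ['192.168.1.80', '192.168.1.108', '172.16.0.1']
```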




Answer 2:


Sure. Just put the anchored negative lookahead at the start:

^(?!10\.|169\.)\d+\.\d+\.\d+\.\d+$

Note: Unnecessary brackets have been removed.
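As a sketch of how the anchored version might be used in Python on one-IP-per-line input: the sample lines are made up, and re.MULTILINE is an assumption about how the text is fed in, so that ^ and $ apply to each line.

```python
import re

# Hypothetical input: one candidate IP per line.
text = "10.1.2.3\n192.168.0.1\n169.254.0.9\n8.8.4.4"

anchored = re.compile(r"^(?!10\.|169\.)\d+\.\d+\.\d+\.\d+$", re.MULTILINE)

anchored_matches = anchored.findall(text)

print(anchored_matches)  # ['192.168.0.1', '8.8.4.4']
```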


To match within a line, i.e. anywhere in the text, remove the anchors and use a "word boundary" \b as the anchor instead:

\b(?!10\.|169\.)\d+\.\d+\.\d+\.\d+
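A minimal Python sketch of the in-line version on an invented log-like string. The second input shows a caveat: with five dot-separated numbers, this pattern can still match the tail of an excluded address, which is the case the Medium regex in Answer 1 guards against.

```python
import re

inline = re.compile(r"\b(?!10\.|169\.)\d+\.\d+\.\d+\.\d+")

# Hypothetical log-like line with four addresses.
text = "src=10.0.0.7 dst=172.16.5.4 gw=169.254.0.1 dns=8.8.8.8"
inline_matches = inline.findall(text)
print(inline_matches)  # ['172.16.5.4', '8.8.8.8']

# Caveat: a five-part dotted number starting with 10 still yields a match.
tail_matches = inline.findall("10.1.2.3.4")
print(tail_matches)  # ['1.2.3.4']
```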


Source: https://stackoverflow.com/questions/35417466/regex-for-search-and-exclude-combined
