Negative Lookaround Regex - Only one occurrence - Java

Posted by 微笑、不失礼 on 2020-01-01 06:28:31

Question


I am trying to find out whether a string contains exactly one occurrence of a word,

e.g.

String : `jjdhfoobarfoo` , Regex : `foo` --> false

String : `wewwfobarfoo` , Regex : `foo` --> true

String : `jjfffoobarfo` , Regex : `foo` --> true

Multiple occurrences of foo may appear anywhere in the string, so they need not be consecutive.

I tested the following regex in Java with the string foobarfoo, but it doesn't work: it returns true.

static boolean testRegEx(String str){
    return str.matches(".*(foo)(?!.*foo).*");
}

I know this topic may look like a duplicate, but I am surprised, because when I use the regex (foo)(?!.*foo).* instead, it works!

Any idea why this happens?


Answer 1:


Use two anchored look-aheads:

static boolean testRegEx(String str){
    return str.matches("^(?=.*foo)(?!.*foo.*foo.*$).*");
}

A couple of key points: the negative look-ahead that checks for two foo's is anchored to the start and, importantly, contains an end-of-input anchor.
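As a quick sanity check, the pattern can be run against the strings from the question plus a string with no foo at all. This is only a sketch; the class name TwoLookaheadsDemo is made up for illustration.

```java
class TwoLookaheadsDemo {
    // Same pattern as in the answer: (?=.*foo) requires at least one foo,
    // (?!.*foo.*foo.*$) rejects the string if a second foo exists.
    static boolean testRegEx(String str) {
        return str.matches("^(?=.*foo)(?!.*foo.*foo.*$).*");
    }

    public static void main(String[] args) {
        System.out.println(testRegEx("jjdhfoobarfoo")); // false (two foo's)
        System.out.println(testRegEx("wewwfobarfoo"));  // true  (one foo)
        System.out.println(testRegEx("jjfffoobarfo"));  // true  (one foo)
        System.out.println(testRegEx("bar"));           // false (no foo)
    }
}
```

Since String.matches already requires the whole input to match, the leading ^ is harmless but not strictly needed.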




Answer 2:


If you want to check whether a string contains another string exactly once, here are two possible solutions (one with regex, one without):

static boolean containsRegexOnlyOnce(String string, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(string);
    return matcher.find() && !matcher.find();
}

static boolean containsOnlyOnce(String string, String substring) {
    int index = string.indexOf(substring);
    if (index != -1) {
        return string.indexOf(substring, index + substring.length()) == -1;
    }
    return false;
}

Both work fine. Here's a demo using your examples:

    String str1 = "jjdhfoobarfoo";
    String str2 = "wewwfobarfoo";
    String str3 = "jjfffoobarfo";
    String foo = "foo";
    System.out.println(containsOnlyOnce(str1, foo)); // false
    System.out.println(containsOnlyOnce(str2, foo)); // true
    System.out.println(containsOnlyOnce(str3, foo)); // true
    System.out.println(containsRegexOnlyOnce(str1, foo)); // false
    System.out.println(containsRegexOnlyOnce(str2, foo)); // true
    System.out.println(containsRegexOnlyOnce(str3, foo)); // true
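
One caveat worth sketching: containsRegexOnlyOnce treats its second argument as a regular expression, so a search word containing metacharacters such as . or * is not matched literally. A possible variant, assuming a literal match is what you want, wraps the word in Pattern.quote (a standard java.util.regex method). The names LiteralOnceDemo and containsLiteralOnlyOnce are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralOnceDemo {
    // Hypothetical helper: quotes the word so every character is taken literally.
    static boolean containsLiteralOnlyOnce(String string, String word) {
        Matcher matcher = Pattern.compile(Pattern.quote(word)).matcher(string);
        // The first find() must succeed and the second find() must fail.
        return matcher.find() && !matcher.find();
    }

    public static void main(String[] args) {
        // As a raw regex, "a.b" would also match "axb"; quoted, it matches only "a.b".
        System.out.println(containsLiteralOnlyOnce("a.b and axb", "a.b")); // true
        System.out.println(containsLiteralOnlyOnce("a.b and a.b", "a.b")); // false
    }
}
```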



Answer 3:


You can use this pattern:

^(?>[^f]++|f(?!oo))*foo(?>[^f]++|f(?!oo))*$

It's a bit long but performant.

The same with the classical example of the ashdflasd string:

^(?>[^a]++|a(?!shdflasd))*ashdflasd(?>[^a]++|a(?!shdflasd))*$

Details:

(?>               # open an atomic group
    [^f]++        # all characters but f, one or more times (possessive)
  |               # OR
    f(?!oo)       # f not followed by oo
)*                # close the group, zero or more times

The possessive quantifier ++ is like the greedy quantifier + but doesn't allow backtracking.

The atomic group (?>..) is like a non-capturing group (?:..) but doesn't allow backtracking either.
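
To make the difference concrete, here is a small sketch that is not part of the original answer. It compares a greedy quantifier with its possessive form, and a non-capturing group with an atomic one, on tiny inputs chosen purely for illustration.

```java
class BacktrackingDemo {
    public static void main(String[] args) {
        // Greedy: a+ first takes "aaa", then gives one 'a' back so the final a can match.
        System.out.println("aaa".matches("a+a"));   // true
        // Possessive: a++ takes "aaa" and never gives anything back, so the final a fails.
        System.out.println("aaa".matches("a++a"));  // false

        // Non-capturing group: when "foo" does not lead to a match, "foob" is tried next.
        System.out.println("foobar".matches("(?:foo|foob)ar")); // true
        // Atomic group: once "foo" has matched, the alternative "foob" is never tried.
        System.out.println("foobar".matches("(?>foo|foob)ar")); // false
    }
}
```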

These features are used here for performance (memory and speed), but the subpattern can be replaced by:

(?:[^f]+|f(?!oo))*
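
As a rough check, the sketch below runs both the atomic/possessive pattern and the plain variant over the strings from the question. The class and constant names are invented for this example.

```java
class UnrolledPatternDemo {
    // Pattern from the answer, with atomic groups and possessive quantifiers.
    static final String ATOMIC = "^(?>[^f]++|f(?!oo))*foo(?>[^f]++|f(?!oo))*$";
    // Same structure with the plain subpattern (?:[^f]+|f(?!oo))*.
    // Note: on long non-matching input this form may backtrack far more.
    static final String PLAIN = "^(?:[^f]+|f(?!oo))*foo(?:[^f]+|f(?!oo))*$";

    public static void main(String[] args) {
        String[] samples = {"jjdhfoobarfoo", "wewwfobarfoo", "jjfffoobarfo", "bar"};
        for (String s : samples) {
            // Both patterns should agree on every sample.
            System.out.println(s + " -> " + s.matches(ATOMIC) + " / " + s.matches(PLAIN));
        }
    }
}
```

On these samples both patterns agree: false for jjdhfoobarfoo and bar, true for wewwfobarfoo and jjfffoobarfo.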



Answer 4:


The problem with your regex is that the first .* initially consumes the whole string, then backs off until it finds a spot where the rest of the regex can match. That means, if there's more than one foo in the string, your regex will always match the last one. And from that position, the lookahead will always succeed as well.
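
The effect can be observed directly by asking the matcher where the captured foo starts. The following is a minimal sketch; the helper capturedFooStart is a made-up name, while Matcher.matches and Matcher.start(int) are standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastFooDemo {
    // Returns the index where group 1 (the captured foo) starts,
    // or -1 if the regex does not match the whole input.
    static int capturedFooStart(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        // With the leading .*, the captured foo is the last one (index 6),
        // so the lookahead sees no later foo and the match succeeds.
        System.out.println(capturedFooStart(".*(foo)(?!.*foo).*", "foobarfoo")); // 6
        // Without the leading .*, matches() forces foo to sit at index 0,
        // where the lookahead does find the second foo.
        System.out.println(capturedFooStart("(foo)(?!.*foo).*", "foobarfoo"));   // -1
        // The shorter regex also rejects a valid string that does not start with foo.
        System.out.println(capturedFooStart("(foo)(?!.*foo).*", "barfoo"));      // -1
    }
}
```

So the shorter regex from the question only appears to work: with matches() it can succeed only when the string starts with foo.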

Regexes that you use for validating have to be more precise than the ones you use for matching. Your regex is failing because the .* can match the sentinel string, 'foo'. You need to actively prevent matches of foo before and after the one you're trying to match. Casimir's answer shows one way to do that; here's another:

"^(?>(?!foo).)*+foo(?>(?!foo).)*+$"

It's not quite as efficient, but I think it's a lot easier to read. In fact, you could probably use this regex:

"^(?!.*foo.*foo).*foo.*$"

It's a great deal more inefficient, but a complete regex n00b would probably figure out what it does.
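
As an illustration, the (?>(?!foo).)*+ idea can be generalised to an arbitrary literal word. This is a sketch under assumptions: the class TemperedDotDemo and the method containsOnce are hypothetical, and the word is quoted with Pattern.quote so that metacharacters are taken literally.

```java
import java.util.regex.Pattern;

class TemperedDotDemo {
    // Hypothetical helper: true if word occurs exactly once in s.
    static boolean containsOnce(String s, String word) {
        String w = Pattern.quote(word);
        // Any run of characters, none of which starts an occurrence of word.
        String notWord = "(?>(?!" + w + ").)*+";
        return s.matches("^" + notWord + w + notWord + "$");
    }

    public static void main(String[] args) {
        System.out.println(containsOnce("jjdhfoobarfoo", "foo")); // false
        System.out.println(containsOnce("wewwfobarfoo", "foo"));  // true
        System.out.println(containsOnce("jjfffoobarfo", "foo"));  // true
        System.out.println(containsOnce("bar", "foo"));           // false
    }
}
```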

Finally, notice that none of these regexes--mine or Casimir's--use lookbehinds. I know a lookbehind seems like the perfect tool for the job, but no. In fact, lookbehind should never be the first tool you reach for. And not just in Java. Whatever regex flavor you use, it's almost always easier to match the whole string in the normal way than it is to use lookbehinds. And usually much more efficient, too.




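Answer 5:


Someone answered the question but later deleted the answer.

The following short code gives the expected results for the examples in the question. Note, however, that (.*?foo.*){0} can only match the empty string, so a non-empty string with no foo at all makes the method return true: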

static boolean testRegEx(String str){
    return !str.matches("(.*?foo.*){0}|(.*?foo.*){2,}");
}

Any idea how to invert the result inside the regex itself?
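
One possible way to do the inversion, offered as a sketch rather than as the deleted answer's approach: wrap the unwanted alternatives in a negative lookahead that ends with $, so the overall pattern matches exactly when neither alternative could match the whole string. The version below assumes the zero-occurrence case is written as (?!.*foo).* instead of (.*?foo.*){0}, so that strings with no foo are rejected as well. The class and method names are made up.

```java
class InvertedRegexDemo {
    // Negative lookahead over "no foo at all" OR "two or more foo's",
    // each required to span the whole input because of the trailing $.
    static final String EXACTLY_ONE =
            "^(?!(?:(?!.*foo).*|(?:.*?foo.*){2,})$).*";

    static boolean testRegEx(String str) {
        return str.matches(EXACTLY_ONE);
    }

    public static void main(String[] args) {
        System.out.println(testRegEx("jjdhfoobarfoo")); // false (two foo's)
        System.out.println(testRegEx("wewwfobarfoo"));  // true  (one foo)
        System.out.println(testRegEx("jjfffoobarfo"));  // true  (one foo)
        System.out.println(testRegEx("bar"));           // false (no foo)
    }
}
```

This keeps the structure of the original two alternatives, but it is harder to read than the patterns in the other answers.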



Source: https://stackoverflow.com/questions/17374967/negative-lookaround-regex-only-one-occurrence-java
