How to match the first word after an expression with regex?

借酒劲吻你 2020-12-01 12:19

For example, in this text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eu tellus vel nunc pretium lacinia. Proin sed lorem. Cras sed ipsum. Nunc a libero quis risus sollicitudin imperdiet.

6 Answers
  •  清歌不尽
    2020-12-01 12:39

    Some of the other responders have suggested using a regex that doesn't depend on lookbehinds, but I think a complete, working example is needed to get the point across. The idea is that you match the whole sequence ("ipsum" plus the next word) in the normal way, then use a capturing group to isolate the part that interests you. For example:

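    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FirstWordAfter {
      public static void main(String[] args) {
        String s = "Lorem ipsum dolor sit amet, consectetur " +
            "adipiscing elit. Nunc eu tellus vel nunc pretium " +
            "lacinia. Proin sed lorem. Cras sed ipsum. Nunc " +
            "a libero quis risus sollicitudin imperdiet.";

        // Match "ipsum", then any run of non-word characters, then capture the next word.
        Pattern p = Pattern.compile("ipsum\\W+(\\w+)");
        Matcher m = p.matcher(s);
        while (m.find()) {
          // Group 1 holds only the captured word, not "ipsum" or the separator.
          System.out.println(m.group(1));
        }
      }
    }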
    

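    Note that this prints both "dolor" and "Nunc", even though the second "ipsum" is followed by a period and a space rather than a single space. To do that with the lookbehind version, you would have to resort to something hackish like: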

    Pattern p = Pattern.compile("(?<=ipsum\\W{1,2})(\\w+)");
    

    That works in Java, which requires a lookbehind to have an obvious maximum length (here, "ipsum" plus at most two non-word characters). Some flavors don't allow even that much flexibility, and some don't support lookbehinds at all.
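    To check that the bounded-lookbehind pattern behaves like the capture-group version on the sample text, here is a small runnable Java sketch. The class and method names (LookbehindDemo, wordsAfterIpsum) are made up for illustration, and the capturing group is dropped: the lookbehind is zero-width, so group() already returns just the word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindDemo {
  // Hypothetical helper: collects every word preceded by "ipsum" plus one or two non-word characters.
  static List<String> wordsAfterIpsum(String text) {
    // Java accepts this lookbehind because \W{1,2} gives it a maximum length.
    Pattern p = Pattern.compile("(?<=ipsum\\W{1,2})\\w+");
    Matcher m = p.matcher(text);
    List<String> words = new ArrayList<>();
    while (m.find()) {
      // The lookbehind consumes nothing, so the whole match is just the word.
      words.add(m.group());
    }
    return words;
  }

  public static void main(String[] args) {
    String s = "Lorem ipsum dolor sit amet, consectetur "
        + "adipiscing elit. Nunc eu tellus vel nunc pretium "
        + "lacinia. Proin sed lorem. Cras sed ipsum. Nunc "
        + "a libero quis risus sollicitudin imperdiet.";
    System.out.println(wordsAfterIpsum(s)); // [dolor, Nunc]
  }
}
```

    The {1,2} bound is what makes it hackish: a separator of three or more non-word characters (say "ipsum ... dolor") yields no match at all, while the capture-group pattern with \W+ has no such limit.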

    However, the biggest problem people seem to be having in their examples is not with lookbehinds, but with word boundaries. Both David Kemp and ck seem to expect \b to match the space character following the 'm', but it doesn't; it matches the position (or boundary) between the 'm' and the space.

    It's a common mistake, one I've even seen repeated in a few books and tutorials, but the word-boundary construct, \b, never matches any characters. It's a zero-width assertion, like lookarounds and anchors (^, $, \z, etc.), and what it matches is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
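    A minimal Java sketch of the zero-width point, using a hypothetical WordBoundaryDemo class: adding \b after "ipsum" does not move the end of the match, and a pattern that expects the next word to start immediately after \b finds nothing, because the space still has to be matched by something.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
  // Returns the end index of the first match of regex in text, or -1 if there is none.
  static int matchEnd(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.find() ? m.end() : -1;
  }

  // Returns group 1 of the first match, or null if the regex does not match.
  static String firstGroup(String regex, String text) {
    Matcher m = Pattern.compile(regex).matcher(text);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    String s = "Lorem ipsum dolor sit amet";
    // "ipsum" occupies indices 6..10, so the match ends at 11 with or without \b.
    System.out.println(matchEnd("ipsum", s));    // 11
    System.out.println(matchEnd("ipsum\\b", s)); // 11
    // \b does not consume the space, so \w+ cannot start right after it.
    System.out.println(firstGroup("ipsum\\b(\\w+)", s));     // null
    // The space has to be matched explicitly.
    System.out.println(firstGroup("ipsum\\b\\s+(\\w+)", s)); // dolor
  }
}
```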
