Whole-word matching with regex.h

Posted by 随声附和 on 2021-01-27 04:12:37

Question


I want a C++ regex that matches "bananas" or "pajamas" but not "bananas2" or "bananaspajamas" or "banana" or basically anything besides those exact two words. So I did this:

#include <regex.h>
#include <stdio.h>

int main()
{
  regex_t rexp;

  int rv = regcomp(&rexp, "\\bbananas\\b|\\bpajamas\\b", REG_EXTENDED | REG_NOSUB);
  if (rv != 0) {
    printf("Abandon hope, all ye who enter here\n");
    return 1;  /* rexp is unusable if compilation failed */
  }
  /* With REG_NOSUB no match offsets are reported, so none are requested. */
  int diditmatch = regexec(&rexp, "bananas", 0, NULL, 0);
  printf("%d %d\n", diditmatch, REG_NOMATCH);
  regfree(&rexp);
  return 0;
}

and it printed 1 1 as if there wasn't a match. What happened? I also tried \bbananas\b|\bpajamas\b for my regex and that failed too.

I asked "Whole-word matching using regex" about std::regex, but std::regex is awful and slow, so I'm trying regex.h.


Answer 1:


The POSIX standard specifies neither word-boundary syntax nor look-behind and look-ahead syntax (which could be used to emulate a word boundary) for either BRE or ERE. Therefore, it's not possible to write a regex with word boundaries that works across different POSIX-compliant platforms.

For a portable solution, you should consider using PCRE, or Boost.Regex if you plan to code in C++.

Otherwise, you are stuck with a non-portable solution. If you can accept that restriction, there are several alternatives:

  • If you link against the GNU C library, it extends the syntax with word-boundary operators, among other things: \b (word boundary), \B (not a word boundary), \< (start of word), \> (end of word).
  • Some systems extend the BRE and ERE grammar with [[:<:]] (start of word) and [[:>:]] (end of word).
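
If only a yes/no answer is needed, a whole-word test can be approximated in plain POSIX ERE by consuming a neighbouring non-word character (or the start/end of the string) instead of asserting a zero-width boundary. The following is a minimal sketch under stated assumptions: a "word character" is taken to be [[:alnum:]_], and contains_whole_word is a hypothetical helper name, not part of regex.h. Because the neighbouring characters are consumed, this is not a drop-in replacement for \b when match offsets or back-to-back matches matter.

```cpp
#include <regex.h>
#include <cstddef>

// Hypothetical helper: true if `text` contains "bananas" or "pajamas"
// as a whole word. Uses only constructs defined by POSIX ERE.
bool contains_whole_word(const char *text)
{
    // The word must be preceded by start-of-string or a non-word character,
    // and followed by a non-word character or end-of-string.
    static const char *pattern =
        "(^|[^[:alnum:]_])(bananas|pajamas)([^[:alnum:]_]|$)";

    regex_t rexp;
    if (regcomp(&rexp, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return false;  // assumption: a compile failure is treated as "no match"
    }
    int rv = regexec(&rexp, text, 0, NULL, 0);  // REG_NOSUB: no offsets requested
    regfree(&rexp);
    return rv == 0;
}
```

If the input is always a single candidate word, as in the question, the simpler anchored pattern ^(bananas|pajamas)$ should also do the job with REG_EXTENDED.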



Answer 2:


Konrad left a great answer that solved my problem but it disappeared somehow so I can't accept it. Here's the code that printed the right thing, for posterity:

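#include <regex.h>
#include <stdio.h>

int main()
{
  regex_t rexp;

  /* [[:<:]] and [[:>:]] are non-POSIX start-of-word / end-of-word extensions. */
  int rv = regcomp(&rexp, "[[:<:]]bananas[[:>:]]|[[:<:]]pajamas[[:>:]]", REG_EXTENDED | REG_NOSUB);
  if (rv != 0) {
    printf("Abandon hope, all ye who enter here\n");
    return 1;  /* rexp is unusable if compilation failed */
  }
  /* With REG_NOSUB no match offsets are reported, so none are requested. */
  int diditmatch = regexec(&rexp, "bananas", 0, NULL, 0);
  printf("%d %d\n", diditmatch, REG_NOMATCH);  /* 0 means a match */
  regfree(&rexp);
  return 0;
}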
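
Since which word-boundary extension is available depends on the platform, it can help to check at run time before relying on one. The sketch below is one way to do that; ere_matches and boundary_syntax_works are hypothetical helper names, and the probe words are arbitrary. It treats a pattern that fails to compile, or that compiles but does not behave like a whole-word match, as unsupported.

```cpp
#include <regex.h>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `pattern` compiles as an ERE and matches `text`.
static bool ere_matches(const std::string &pattern, const char *text)
{
    regex_t rexp;
    if (regcomp(&rexp, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;  // syntax rejected by this platform's regcomp
    }
    int rv = regexec(&rexp, text, 0, NULL, 0);
    regfree(&rexp);
    return rv == 0;
}

// Hypothetical probe: does wrapping a word in `open` / `close` behave like
// whole-word matching with this platform's regex.h?
bool boundary_syntax_works(const std::string &open, const std::string &close)
{
    const std::string pattern = open + "bananas" + close;
    return ere_matches(pattern, "bananas")          // the bare word
        && ere_matches(pattern, "eat bananas now")  // the word inside a sentence
        && !ere_matches(pattern, "bananas2")        // trailing word character
        && !ere_matches(pattern, "xbananas");       // leading word character
}
```

Candidates to probe would be boundary_syntax_works("\\b", "\\b"), boundary_syntax_works("\\<", "\\>") and boundary_syntax_works("[[:<:]]", "[[:>:]]"); the results are expected to differ between C libraries.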



Answer 3:


Use

s == "bananas" || s == "pajamas"

instead where s is a std::string.

Regular expressions can overcomplicate a simple solution. Avoid them in particular if you want a fixed match.
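
As a minimal sketch of that approach, using the two words from the question (is_target_word is a hypothetical function name):

```cpp
#include <string>

// Hypothetical helper: true only if `s` is exactly one of the two words.
bool is_target_word(const std::string &s)
{
    // std::string's operator== compares the whole string, so "bananas2"
    // and "bananaspajamas" are rejected without any regex.
    return s == "bananas" || s == "pajamas";
}
```

This only covers the case where the whole input is the candidate word; finding the word inside a longer text would still need a search plus boundary checks.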



Source: https://stackoverflow.com/questions/31112053/whole-word-matching-with-regex-h
