Question
I want a C++ regex that matches "bananas" or "pajamas" but not "bananas2" or "bananaspajamas" or "banana" or basically anything besides those exact two words. So I did this:
#include <regex.h>
#include <stdio.h>
int main()
{
    regex_t rexp;
    int rv = regcomp(&rexp, "\\bbananas\\b|\\bpajamas\\b", REG_EXTENDED | REG_NOSUB);
    if (rv != 0) {
        printf("Abandon hope, all ye who enter here\n");
    }
    regmatch_t match;
    int diditmatch = regexec(&rexp, "bananas", 1, &match, 0);
    printf("%d %d\n", diditmatch, REG_NOMATCH);
}
and it printed 1 1 as if there wasn't a match. What happened? I also tried \bbananas\b|\bpajamas\b for my regex and that failed too.
I asked about std::regex in "Whole-word matching using regex", but std::regex is awful and slow, so I'm trying regex.h.
Answer 1:
The POSIX standard specifies neither word-boundary syntax nor look-behind and look-ahead syntax (which could be used to emulate a word boundary) in either BRE or ERE. Therefore, it's not possible to write a regex with word boundaries that works across different POSIX-compliant platforms.
For a portable solution, you should consider using PCRE, or Boost.Regex if you plan to code in C++.
Otherwise, you are stuck with a non-portable solution. If you are fine with such a restriction, there are several alternatives:
- If you link against the GNU C library, it extends the syntax with, among other things, \b (word boundary), \B (non-word boundary), \< (start of word), and \> (end of word).
- Some systems extend the BRE and ERE grammar with [[:<:]] (start of word) and [[:>:]] (end of word).
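If the goal is only what the question describes (accept exactly "bananas" or "pajamas"), the ^ and $ anchors may be enough, since they are part of standard ERE. For finding a whole word inside a longer string, a boundary can be approximated with "start of string or a non-word character" on each side. The following is a minimal sketch under those assumptions, not a full replacement for \b: the helper names ere_search, is_exact_word and contains_word are made up for illustration, and the emulated boundary consumes the neighbouring character, so it is only suitable for a yes/no test.

```cpp
#include <regex.h>
#include <string>

// Returns true if `pattern` (a POSIX ERE) matches somewhere in `text`.
// Returns false on no match or if the pattern fails to compile.
static bool ere_search(const char* pattern, const std::string& text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    // REG_NOSUB was requested, so no match array is needed.
    const bool found = (regexec(&re, text.c_str(), 0, nullptr, 0) == 0);
    regfree(&re);
    return found;
}

// Whole-string match using only the standard ^ and $ anchors.
bool is_exact_word(const std::string& text)
{
    return ere_search("^(bananas|pajamas)$", text);
}

// Whole-word search inside a longer string. The word boundary is emulated
// as "start/end of string or a character that is not alphanumeric or '_'".
bool contains_word(const std::string& text)
{
    return ere_search("(^|[^[:alnum:]_])(bananas|pajamas)([^[:alnum:]_]|$)", text);
}
```

With these helpers, is_exact_word("bananas") is true while is_exact_word("bananas2") and is_exact_word("bananaspajamas") are false, and contains_word("I like bananas.") is true.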
Answer 2:
Konrad left a great answer that solved my problem but it disappeared somehow so I can't accept it. Here's the code that printed the right thing, for posterity:
#include <regex.h>
#include <stdio.h>
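int main()
{
    regex_t rexp;
    /* [[:<:]] and [[:>:]] (start and end of word) are an extension, not standard POSIX. */
    int rv = regcomp(&rexp, "[[:<:]]bananas[[:>:]]|[[:<:]]pajamas[[:>:]]", REG_EXTENDED | REG_NOSUB);
    if (rv != 0) {
        printf("Abandon hope, all ye who enter here\n");
        return 1; /* rexp is not usable if compilation failed */
    }
    regmatch_t match;
    int diditmatch = regexec(&rexp, "bananas", 1, &match, 0);
    /* diditmatch is 0 when the regex matched */
    printf("%d %d\n", diditmatch, REG_NOMATCH);
    regfree(&rexp);
    return 0;
}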
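Because [[:<:]] and [[:>:]] are an extension, a regex library that lacks them may reject the pattern in regcomp. In that case the message from regerror is more useful than a generic failure line. Here is a small sketch of that check; compile_or_explain is a hypothetical helper name, and the 256-byte buffer is an arbitrary choice.

```cpp
#include <regex.h>
#include <string>

// Compiles `pattern` as a POSIX ERE into `out`.
// On success returns an empty string; the caller must regfree(&out) later.
// On failure returns the human-readable message produced by regerror().
std::string compile_or_explain(regex_t& out, const char* pattern)
{
    const int rv = regcomp(&out, pattern, REG_EXTENDED | REG_NOSUB);
    if (rv == 0) {
        return std::string();
    }
    char buf[256];  // arbitrary size; regerror() truncates longer messages
    regerror(rv, &out, buf, sizeof buf);
    return std::string(buf);
}
```

A caller could print the returned string when it is non-empty and fall back to a different pattern or library.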
Answer 3:
Use
s == "bananas" || s == "pajamas"
instead where s is a std::string.
Regular expressions can overcomplicate a simple solution. Avoid them in particular if you want a fixed match.
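Wrapped in a function, the comparison could look like the sketch below. The name is_bananas_or_pajamas is only illustrative, and the two words are the ones from the question.

```cpp
#include <string>

// True only when `s` is exactly one of the two accepted words.
bool is_bananas_or_pajamas(const std::string& s)
{
    return s == "bananas" || s == "pajamas";
}
```

This rejects "bananas2", "bananaspajamas" and "banana" without compiling any pattern.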
Source: https://stackoverflow.com/questions/31112053/whole-word-matching-with-regex-h