Write a C++ regular expression to match a #include preprocessing directive

Submitted by 大憨熊 on 2021-01-29 04:06:46

Question


I want to write a regular expression in C++ to match a #include preprocessing directive. So I wrote this:

std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*");

This works for:

std::string matchString = "#include <vector>";

However, it matches only part of the string, leaving out the trailing ">". If I change the regex to this:

std::regex includePattern("^[[:blank:]|[:space:]]*#[[:blank:]|[:space:]]*include[[:blank:]|[:space:]]+[<|\"]{1}[_[:alpha:]]+[_[:alnum:]]*[>|\"]{1}");

it no longer gives me the desired result and just reports "not found". Is there anything wrong?

Can anybody help me write an accurate regular expression in C++ to match a #include preprocessing directive?

Thanks in advance!


Answer 1:


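It depends on whether the input to be matched may contain newlines. [[:space:]] matches any whitespace, including newlines; [[:blank:]] matches only spaces and tabs, so it never matches a newline (std::regex accepts both class names). Note also that | inside a bracket expression is a literal character, not an alternation, so [[:blank:]|[:space:]] is just [[:space:]] plus |, and [<|\"] also accepts a |. Anyway, something like: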

"^\\s*#\\s*include\\s+[<\"][^>\"]*[>\"]\\s*"

should do the trick, but...

  • If your source has new lines where it shouldn't, it still might match.

  • If your source has escaped new lines, say in the middle of the token include or the file name, it won't match. (This is legal C++, but no one in their right mind would do it.)

  • If your source has mismatched delimiters, a " at one end and a < or a > at the other, it will still match.

  • And it doesn't handle comments at the end of line. Handling C++ style comments (//) should only be a matter of adding "(?://.*)?" to the end of the expression. Handling C style comments (particularly since there can be several) is a bit more complicated.
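The caveats in the list above can be checked with a small sketch. The helper name matchesLoosePattern and the sample inputs are made up for illustration; the pattern is the one given above, optionally followed by the "(?://.*)?" suffix suggested for C++ style comments.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: full-match `line` against the loose pattern from above,
// optionally extended with the suffix that tolerates a trailing // comment.
bool matchesLoosePattern(const std::string& line, bool allowLineComment) {
    std::string pattern = "^\\s*#\\s*include\\s+[<\"][^>\"]*[>\"]\\s*";
    if (allowLineComment) {
        pattern += "(?://.*)?";  // optional C++ style comment up to end of line
    }
    return std::regex_match(line, std::regex(pattern));
}
```

Under these assumptions, "#include <vector\"" (mismatched delimiters) and "#\ninclude\n<vector>" (stray newlines) are both accepted, while "#include <vector> // c" is accepted only when allowLineComment is true.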

To ensure that the delimiters match, you'd probably have to put everything after the include in an alternation:

"^\\s*#\\s*include\\s+(?:<[^>]*>|\"[^\"]*\")\\s*"

Again, you'd need to add to the end to handle comments.
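As a minimal sketch of how the alternation might be used, assuming each candidate line is tested on its own and that only // comments need to be tolerated, something like the following could work. The function name isIncludeDirective is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `line` is an #include directive whose
// delimiters pair up (<...> or "..."), optionally followed by a // comment.
bool isIncludeDirective(const std::string& line) {
    static const std::regex pattern(
        "^\\s*#\\s*include\\s+(?:<[^>]*>|\"[^\"]*\")\\s*(?://.*)?");
    // regex_match requires the whole line to match, not just a prefix.
    return std::regex_match(line, pattern);
}
```

With this version a line such as #include <vector" is rejected, because neither branch of the alternation can close with the other branch's delimiter. C style comments (/* ... */) after the directive are still not handled.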




Answer 2:


You aren't validating, are you?
One thing you can probably count on is that an include comes after the beginning of the line (BOL) and optional spaces,
and is delimited on its right side by whitespace.
Other than that, I wouldn't try to validate what's to the right of it.

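Using only the multi-line modifier (the inline (?m) is Perl-style syntax; std::regex rejects it, so there you would drop the prefix and pass the std::regex::multiline flag, available since C++17). The trailing $ is needed, otherwise the lazy group (.*?) always captures an empty string:
"(?m)^[^\\S\\r\\n]*#include[^\\S\\r\\n]+(.*?)[^\\S\\r\\n]*$"

Expanded:

 (?m)
 ^ [^\S\r\n]* 
 \#include
 [^\S\r\n]+ 
 ( .*? )               # (1)
 [^\S\r\n]* 
 $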
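std::regex has no inline modifiers, so one possible way to get the same effect is to apply the pattern to one line at a time. The sketch below takes that route; collectIncludeTargets is a hypothetical name, [ \t] stands in for [^\S\r\n], and the pattern is anchored with $ so that group 1 captures the rest of the line.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return whatever follows "#include" on each directive
// line. The text is split into lines first, so no multi-line mode is needed.
std::vector<std::string> collectIncludeTargets(const std::string& source) {
    // [ \t] replaces [^\S\r\n]; a trailing \r left over from CRLF input is tolerated.
    static const std::regex pattern("^[ \\t]*#include[ \\t]+(.*?)[ \\t]*\\r?$");
    std::vector<std::string> targets;
    std::istringstream stream(source);
    std::string line;
    std::smatch match;
    while (std::getline(stream, line)) {
        if (std::regex_match(line, match, pattern)) {
            targets.push_back(match[1].str());  // group 1: text after #include
        }
    }
    return targets;
}
```

As in the answer, nothing to the right of #include is validated, so a trailing comment ends up inside the captured text.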



Answer 3:


If you need to capture the type of inclusion < or " and the included file name you could use:

std::string reg = "\\s*#\\s*include\\s*([<\"])([^>\"]+)([>\"])"; // escaped version

- or -

std::string raw = R"reg(\s*#\s*include\s*([<"])([^>"]+)([>"]))reg"; // raw string version


Group 1 = `<` or `"`
Group 2 = file name
Group 3 = `>` or `"`



Answer 4:


The following regex will match #include directives such as #include <vector>

^#include\s+<\w+>$

Note: this won't match directives such as #include <stdio.h>, because \w does not match the dot, nor directives that use quotes instead of angle brackets.
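The limitation is easy to see in code. The first function below uses the pattern from this answer; the second is an assumed variant, not part of the answer, that widens the character class to allow dots and slashes in the header name. Both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// The pattern from the answer: only word characters between the angle brackets.
bool matchesWordOnly(const std::string& line) {
    static const std::regex pattern(R"(^#include\s+<\w+>$)");
    return std::regex_match(line, pattern);
}

// Assumed variant (not from the answer): also allow '.' and '/' in the name.
bool matchesWithPathChars(const std::string& line) {
    static const std::regex pattern(R"(^#include\s+<[\w./]+>$)");
    return std::regex_match(line, pattern);
}
```

Here matchesWordOnly accepts #include <vector> but rejects #include <stdio.h>, while matchesWithPathChars accepts both as well as #include <sys/types.h>. Neither handles quoted includes or leading whitespace.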



Source: https://stackoverflow.com/questions/26492513/write-c-regular-expression-to-match-a-include-preprocessing-directive
