Optimization techniques used by std::regex_constants::optimize

六月ゝ 毕业季﹏ 提交于 2021-02-18 10:58:43

问题


I am working with std::regex, and whilst reading about the various constants defined in std::regex_constants, I came across std::optimize, reading about it, it sounds like it is useful in my application (I only need one instance of the regex, initialized at the beginning, but it is used multiple times throughout the loading process).

According to the working paper n3126 (pg. 1077), std::regex_constants::optimize:

Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.

I was curious as to what type of optimization would be performed, but there doesn't seem to be much literature about it (indeed, it seems to be undefined), and one of the only things I found was at cppreference.com, which stated that std::regex_constants::optimize:

Instructs the regular expression engine to make matching faster, with the potential cost of making construction slower. For example, this might mean converting a non-deterministic FSA to a deterministic FSA.

However, I have no formal background in computer science, and whilst I'm aware of the basics of what an FSA is, and understand the basic difference between a deterministic FSA (each state only has one possible next state), and a non-deterministic FSA (with multiple potential next states); I do not understand how this improves matching time. Also, I would be interested to know if there are any other optimizations in various C++ Standard Library implementations.


回答1:


There's some useful information on the topic of regex engines and performance trade offs (far more than can fit in a stackoverflow answer) in Mastering Regular Expressions by Jeffrey Friedl.

It's worth noting that Boost.Regex, which was the source for N3126, documents optimize as "This currently has no effect for Boost.Regex."

P.S.

indeed, it seems to be implementation-defined

No, it's unspecified. Implementation-defined means an implementation is required to define the choice of behaviour. Implementations are not required to document how their regex engines are implemented or what (if any) difference the optimize flag makes.

P.S. 2

in various STL implementations

std::regex is not part of the STL, the C++ Standard Library is not the same thing as the STL.




回答2:


See http://swtch.com/~rsc/regexp/regexp1.html for a nice explanation on how NFA based regex implementations can avoid the exponential backtracking that occurs in DFA matchers in certain circumstances.



来源:https://stackoverflow.com/questions/11592665/optimization-techniques-used-by-stdregex-constantsoptimize

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!