How to save/serialize compiled regular expression (std::regex) to a file?

Submitted by 倖福魔咒の on 2019-12-14 01:23:37

Question


I'm using <regex> from Visual Studio 2010. I understand that a regex object is compiled when I create it. There is no explicit compile method as in other languages and libraries, but I think that's how it works. Am I right?

I need to store a large number of these compiled regexes in a file, so that I could just load a chunk of memory and get my compiled regex back.

I can't figure out how to do this. I found that it is possible in PCRE, but that's a Linux library. There is a Windows version, but it's 3 years old, and I would like to use a more high-level approach (there isn't a C++ wrapper in the Windows version).

So is it possible to save a std::regex or boost::regex (they're the same, right?) as a chunk of memory and then simply reuse it later?

Or is there another simple library for Windows that allows this?

EDIT: Thanks for the great answers. I'll first check whether simply storing each regex as a string is sufficient, and if it's still too slow, I'll test it and compare it with this old PCRE library.


Answer 1:


I don't think it can be done without modifying the boost library to support it.

I don't know specifically how the boost regex library is implemented, but most regex libraries compile things to a binary blob that's then interpreted later as a series of instructions for a sort of limited virtual machine.

If boost's regex library is implemented in this way, serializing it would be relatively easy. Just get at the binary blob somehow and dump it to disk. The existence of the POSIX regex API for the boost library tells me that this is probably how it's implemented.
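To make the "binary blob" idea concrete, here is a toy sketch; it is not how Boost or the Visual Studio 2010 <regex> actually store anything. A pattern made only of literal characters and `.` is compiled into a flat byte program that a tiny interpreter runs. The opcodes and the names `toy_compile`, `toy_serialize`, `toy_deserialize` and `toy_run` are hypothetical. Because the whole program lives in one contiguous buffer, saving it amounts to copying bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical instruction set for a toy regex VM (not any real library's format).
enum Op : std::uint8_t { OP_CHAR = 1, OP_ANY = 2, OP_MATCH = 3 };

// "Compile" a pattern of literal characters and '.' into a flat byte program.
std::vector<std::uint8_t> toy_compile(const std::string& pattern) {
    std::vector<std::uint8_t> prog;
    for (char c : pattern) {
        if (c == '.') {
            prog.push_back(OP_ANY);
        } else {
            prog.push_back(OP_CHAR);
            prog.push_back(static_cast<std::uint8_t>(c));
        }
    }
    prog.push_back(OP_MATCH);
    return prog;
}

// The program is one contiguous buffer, so serializing it is a plain byte copy.
std::string toy_serialize(const std::vector<std::uint8_t>& prog) {
    return std::string(prog.begin(), prog.end());
}

std::vector<std::uint8_t> toy_deserialize(const std::string& bytes) {
    return std::vector<std::uint8_t>(bytes.begin(), bytes.end());
}

// Interpret the program against the whole input (a full match, like std::regex_match).
bool toy_run(const std::vector<std::uint8_t>& prog, const std::string& input) {
    std::size_t pc = 0, pos = 0;
    while (pc < prog.size()) {
        switch (prog[pc]) {
            case OP_CHAR:
                if (pc + 1 >= prog.size()) throw std::runtime_error("truncated program");
                if (pos >= input.size() ||
                    static_cast<std::uint8_t>(input[pos]) != prog[pc + 1]) return false;
                ++pos;
                pc += 2;
                break;
            case OP_ANY:
                if (pos >= input.size()) return false;
                ++pos;
                ++pc;
                break;
            case OP_MATCH:
                return pos == input.size();
            default:
                throw std::runtime_error("corrupt program");
        }
    }
    return false;  // no OP_MATCH reached: treat as a malformed program
}
```

In practice the string returned by `toy_serialize` would be written to a file opened with `std::ios::binary`. A blob like this is only meaningful to the exact engine that produced it.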

OTOH, another (less common) way to implement it is to build something like an abstract syntax tree for the regex. This means that the individual pieces of the regex would be represented by their own objects, and those objects would be linked together into some larger structure that represents the whole regex.

If boost does it this way then serialization will be very complex.
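By contrast, if the compiled form is a tree of linked objects, there is no single buffer to dump: you have to walk the tree and invent a format that lets you rebuild it. The sketch below uses a hypothetical AST for a tiny subset (literals, concatenation, alternation, star), not any library's internal representation; `Node`, `save_tree` and `load_tree` are illustrative names. Even this toy needs a recursive writer and a matching recursive reader, and a real engine would likely have many more node kinds to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical AST node: 'L' literal, 'C' concatenation, 'A' alternation, 'S' star.
struct Node {
    char kind = 'L';
    char ch = 0;                                // used only by literals
    std::vector<std::unique_ptr<Node>> kids;    // child subtrees
};

// Pre-order walk: write the kind, then either the literal char or the
// child count (max 255 in this toy) followed by each child.
void save_tree(const Node& n, std::string& out) {
    out += n.kind;
    if (n.kind == 'L') {
        out += n.ch;
        return;
    }
    out += static_cast<char>(n.kids.size());
    for (const auto& k : n.kids) save_tree(*k, out);
}

// Rebuild the tree by reading the same pre-order stream back.
std::unique_ptr<Node> load_tree(const std::string& in, std::size_t& pos) {
    if (pos >= in.size()) throw std::runtime_error("truncated input");
    auto n = std::make_unique<Node>();
    n->kind = in[pos++];
    if (pos >= in.size()) throw std::runtime_error("truncated input");
    if (n->kind == 'L') {
        n->ch = in[pos++];
        return n;
    }
    std::size_t count = static_cast<unsigned char>(in[pos++]);
    for (std::size_t i = 0; i < count; ++i) n->kids.push_back(load_tree(in, pos));
    return n;
}
```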

This is not possible in C++, but what I really wish is that Boost could compile constant-string regular expressions at compile time with template metaprogramming. The reason this is not possible is that a template can't iterate over the contents of a string, even a constant one.




Answer 2:


You can use the regex strings themselves as the 'serialized' regex - just save those to a file, then when you want to reconstitute the regex objects, just pass the saved strings to the regex constructor.
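A minimal sketch of this approach with `std::regex`, assuming one pattern per line (so no pattern contains a newline) and the default ECMAScript syntax for every regex. `save_patterns`, `load_regexes` and the file name are hypothetical. Note that `std::regex` has no member that returns its original pattern, so you have to keep the strings yourself. Compilation happens inside the `std::regex` constructor, which throws `std::regex_error` for an invalid pattern.

```cpp
#include <cassert>
#include <fstream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: write each pattern on its own line.
void save_patterns(const std::string& path, const std::vector<std::string>& patterns) {
    std::ofstream out(path);
    for (const auto& p : patterns) {
        assert(p.find('\n') == std::string::npos && "one pattern per line");
        out << p << '\n';
    }
}  // the file is flushed and closed when `out` goes out of scope

// Hypothetical helper: read the strings back and recompile each one.
std::vector<std::regex> load_regexes(const std::string& path) {
    std::ifstream in(path);
    std::vector<std::regex> result;
    std::string line;
    while (std::getline(in, line)) {
        result.emplace_back(line);  // compiles here; throws std::regex_error if invalid
    }
    return result;
}
```

For example, `save_patterns("regexes.txt", {R"(\d{4}-\d{2}-\d{2})"})` followed by `load_regexes("regexes.txt")` yields one compiled regex that matches `2010-12-21`.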

The only drawbacks I can think of:

  • it might take somewhat longer to 'reconstitute' the regex database, but I don't know how much longer. I suspect the time would be dominated by I/O anyway, so the difference may not be significant, but I don't know how much overhead regex compilation has in the boost library's implementation
  • if you want the stored regexes obfuscated, you'll have to do that yourself instead of relying on the compiled-binary state to be unreadable

The advantages to this are:

  • it's 100% supported, so it's not fragile/brittle
  • it's portable across compiler versions and platforms (i.e., not fragile/brittle)

Is the time to compile the regex database (excluding I/O) really significant enough to warrant trying to save the compiled state?
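One way to answer that question is to time the compilation step on its own. The sketch below, assuming the patterns are already in memory, measures only the `std::regex` constructor calls with `std::chrono::steady_clock`; `seconds_to_compile` is a hypothetical helper. Timing the file read separately and comparing the two numbers shows whether saving the compiled state would be worth the effort.

```cpp
#include <cassert>
#include <chrono>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: measures only regex construction (compilation),
// excluding file I/O. The compiled regexes are kept in `out` so the
// work cannot be discarded.
double seconds_to_compile(const std::vector<std::string>& patterns,
                          std::vector<std::regex>& out) {
    auto start = std::chrono::steady_clock::now();
    out.clear();
    out.reserve(patterns.size());
    for (const auto& p : patterns) {
        out.emplace_back(p);  // each constructor call compiles one pattern
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

The result depends heavily on the standard library implementation and on the patterns themselves, so it is best measured with your own regex set.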




Answer 3:


I'm not sure, but have you looked at boost::serialization, which can serialize C++ objects?
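As far as I know, Boost.Serialization has no built-in support for `std::regex`, so you would have to write its save/load code yourself. Because `std::regex` offers no member that returns its source pattern, that code has to keep the pattern (and any flags) next to the compiled object. The sketch below shows the idea without Boost, using a hypothetical `StoredRegex` wrapper that writes a length-prefixed pattern plus an ignore-case flag to any `std::ostream` and recompiles on read. It is an illustration under those assumptions, not Boost.Serialization's API.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical wrapper that keeps the source pattern alongside the compiled regex.
class StoredRegex {
public:
    StoredRegex(std::string pattern, bool icase)
        : pattern_(std::move(pattern)), icase_(icase),
          re_(pattern_, icase_ ? std::regex::ECMAScript | std::regex::icase
                               : std::regex::ECMAScript) {}

    const std::regex& get() const { return re_; }

    // Length-prefixed record, so patterns may contain spaces or newlines.
    void write(std::ostream& out) const {
        out << pattern_.size() << ' ' << icase_ << ' ' << pattern_ << '\n';
    }

    // Reads one record and recompiles it (no error handling in this sketch).
    static StoredRegex read(std::istream& in) {
        std::size_t len = 0;
        bool icase = false;
        in >> len >> icase;
        in.get();  // skip the single space before the pattern
        std::string pattern(len, '\0');
        in.read(&pattern[0], static_cast<std::streamsize>(len));
        return StoredRegex(pattern, icase);
    }

private:
    std::string pattern_;  // declared before re_ so it is initialized first
    bool icase_;
    std::regex re_;
};
```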



Source: https://stackoverflow.com/questions/4499808/how-to-save-serialize-compiled-regular-expression-stdregex-to-a-file
