Regular Expressions in C

后端 未结 7 912
梦如初夏
梦如初夏 2021-01-01 02:42

I\'m curious, does anybody know a good way to do regular expression matching in C? The only way I can think of is through Flex. Is this the only way or is there a better w

相关标签:
7条回答
  • 2021-01-01 03:15

    Depending on what dialect you're looking for and what platform you're on:

    • POSIX regular expressions are likely to be in the standard C library - see <regex.h> and regcomp, regerror, regexec, regfree.
    • libPCRE provides most of perl's extended regex features
    • glib has GRegex
    • Henry Spencer's Tcl Regex library.

    Henry Spencer has also done another regex library, used by the current versions of TCL and PostgreSQL. This one is interesting because it is a hybrid NFA/DFA implementation.

    Regexes that accept the same string multiple ways (like a*a?) inherently require an NFA. The usual implementation models the NFA as a DFA using backtracking, which can be as much as O(2^length(input)) for particularly degenerate pattern. But, the simple recursive implementation can be extended with capturing, backreferences, callbacks to code, and all the usual "extra" features that many languages offer besides testing for matches.

    The "NFA" approach tracks multiple current states and updates all of them with each incoming character (See Ross Cox's writeup of Thompson NFA regexes for more explanation). This approach is O(input.length*pattern.length), which is faster - immensely so in the worst cases - but cannot perform backrefs or captures since it doesn't keep track of how it got to a state.

    Spencer's approach is a hybrid, compiling some portions of a pattern to the NFA approach, and using backtracking only where necessary for captures that were actually requested. This is often a substantial win (see TCL's position in the regex-dna shootout).

    0 讨论(0)
  • 2021-01-01 03:16

    Uh ... The best way is to

    #include <regex.h>
    

    That is the POSIX standard API for regular expressions.

    For non-POSIX systems, rolling your own is one alternative, a basic regex engine is not too hard to accomplish. I'm sure there are off-the-shelf solutions too, I haven't neded one.

    Come to think of it, I think glib has one.

    0 讨论(0)
  • 2021-01-01 03:29

    A completely different approach is to try a Parsing Expression Grammar (PEG). A PEG comes at the pattern matching problem from the point of view of a parser, and can even take advantage of multiple rules that form a complete grammar. That makes it possible to write expressions that match balanced parenthesis, which are otherwise quite difficult to express in most regexp dialects.

    Although PEGs are relatively new, there should be a few implementations out there that are usable from C.

    The PEG implementation I've personally used is LPeg. It is neatly bound to Lua, and coincidentally was written by one of Lua's principle authors, Roberto Ierusalimschy. It provides a complete PEG implementation, and also includes an adapter that translates a regexp into the equivalent PEG for execution.

    Linking the Lua core to a C program just to get access to LPeg might sound like overkill, but it really wouldn't be that difficult to do, even if you had no plan to use Lua for other purposes.

    0 讨论(0)
  • 2021-01-01 03:30

    Boost REGEX

    0 讨论(0)
  • 2021-01-01 03:31

    Look this: http://www.pcre.org/

    It is a VERY nice lib!

    0 讨论(0)
  • 2021-01-01 03:31

    http://www.gnu.org/software/hello/manual/libc/Regular-Expressions.html

    0 讨论(0)
提交回复
热议问题