Regex implementation that can handle machine-generated regexes: *non-backtracking*, O(n)?

暖寄归人 2020-12-24 09:15

Edit 2: For a practical demonstration of why this remains important, look no further than Stack Overflow's own regex-caused outage today (2016-07-2

5 answers
  •  醉话见心
    2020-12-24 10:04

    If you can live with unsafe code (and the licensing issue), you could take the implementation from this Windows port of TRE.

    You might be able to use this directly with P/Invoke and explicit layout structs for the following:

    typedef int regoff_t;
    typedef struct {
      size_t re_nsub;  /* Number of parenthesized subexpressions. */
      void *value;     /* For internal use only. */
    } regex_t;
    
    typedef struct {
      regoff_t rm_so;
      regoff_t rm_eo;
    } regmatch_t;
    
    
    typedef enum {
      REG_OK = 0,       /* No error. */
      /* POSIX regcomp() return error codes.  (In the order listed in the
         standard.)  */
      REG_NOMATCH,      /* No match. */
      REG_BADPAT,       /* Invalid regexp. */
      REG_ECOLLATE,     /* Unknown collating element. */
      REG_ECTYPE,       /* Unknown character class name. */
      REG_EESCAPE,      /* Trailing backslash. */
      REG_ESUBREG,      /* Invalid back reference. */
      REG_EBRACK,       /* "[]" imbalance */
      REG_EPAREN,       /* "\(\)" or "()" imbalance */
      REG_EBRACE,       /* "\{\}" or "{}" imbalance */
      REG_BADBR,        /* Invalid content of {} */
      REG_ERANGE,       /* Invalid use of range operator */
      REG_ESPACE,       /* Out of memory.  */
      REG_BADRPT        /* Invalid use of repetition operators. */
    } reg_errcode_t;
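
    For explicit-layout structs, the field offsets and sizes on the managed side have to match what the C compiler produced. A minimal sketch, assuming it is compiled with the same compiler and target architecture as the native library: it re-declares the two structs from the excerpt above and prints their layout. `print_layout` is a hypothetical helper name, not part of TRE.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Re-declared from the header excerpt above so this compiles on its own. */
    typedef int regoff_t;
    typedef struct {
      size_t re_nsub;  /* Number of parenthesized subexpressions. */
      void *value;     /* For internal use only. */
    } regex_t;

    typedef struct {
      regoff_t rm_so;
      regoff_t rm_eo;
    } regmatch_t;

    /* Hypothetical helper: prints the sizes and field offsets that an
       explicit-layout struct on the managed side would need to mirror. */
    static void print_layout(void) {
      /* Two adjacent ints: no padding is expected between them. */
      assert(sizeof(regmatch_t) == 2 * sizeof(regoff_t));

      printf("regex_t:    size=%zu re_nsub@%zu value@%zu\n",
             sizeof(regex_t),
             offsetof(regex_t, re_nsub),
             offsetof(regex_t, value));
      printf("regmatch_t: size=%zu rm_so@%zu rm_eo@%zu\n",
             sizeof(regmatch_t),
             offsetof(regmatch_t, rm_so),
             offsetof(regmatch_t, rm_eo));
    }
    ```

    Call `print_layout()` from a small `main` built for the same architecture as the DLL. On a typical 64-bit build `value` sits at offset 8, and on a typical 32-bit build at offset 4, so the managed declaration has to follow the process bitness.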
    

    Then use the exports that can handle strings with embedded nulls (these are the wide-character versions):

    /* Versions with a maximum length argument and therefore the capability to
       handle null characters in the middle of the strings (not in POSIX.2). */
    int regwncomp(regex_t *preg, const wchar_t *regex, size_t len, int cflags);
    
    int regwnexec(const regex_t *preg, const wchar_t *string, size_t len,
          size_t nmatch, regmatch_t pmatch[], int eflags);
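
    The calling pattern is the usual POSIX one: compile, execute, read `rm_so`/`rm_eo`, free. A minimal sketch of that pattern follows. So that it builds without TRE installed, it uses the standard `regcomp`/`regexec` from `<regex.h>` as a stand-in; going by the declarations above, `regwncomp`/`regwnexec` would take a `wchar_t` buffer plus an explicit length instead. `first_match` is a hypothetical helper name.

    ```c
    #include <assert.h>
    #include <regex.h>
    #include <stddef.h>

    /* Hypothetical helper: finds the first match of `pattern` in `text`.
       Returns 0 and fills *start / *end (byte offsets) on a match,
       otherwise returns the nonzero regcomp/regexec error code. */
    static int first_match(const char *pattern, const char *text,
                           int *start, int *end) {
      regex_t re;
      regmatch_t m[1];
      int rc;

      assert(pattern && text && start && end);

      rc = regcomp(&re, pattern, REG_EXTENDED);
      if (rc != 0) {
        return rc;               /* compile error, e.g. REG_EBRACK */
      }

      rc = regexec(&re, text, 1, m, 0);
      if (rc == 0) {
        *start = (int)m[0].rm_so;  /* offset of first matched char */
        *end = (int)m[0].rm_eo;    /* offset one past the last matched char */
      }

      regfree(&re);              /* release what regcomp allocated */
      return rc;
    }
    ```

    From managed code the same three steps apply: the compiled `regex_t` has to stay alive (and unmoved) between the compile and execute calls, and has to be freed through the native library rather than by the garbage collector.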
    

    Alternatively, wrap it in a C++/CLI layer for easier translation and more flexibility. I would certainly suggest this if you are comfortable with C++/CLI.
