Segmentation Fault with complex regex - Regex.h

走远了吗. Submitted on 2019-12-11 11:02:33

Question


I'm trying to find URLs within a large block of text:

regex_t reg;

const char *regex="REGEXGOESHERE";
regmatch_t matches[16];

//Read data into variable filecontent
regcomp(&reg, regex, REG_EXTENDED);

int offset=0;
int j;
int found=0;
int start,end;
while( regexec(&reg, filecontent+offset, 16, matches, 0) == 0)
{
    printf("\n\n");
    start = matches[0].rm_so+offset;
    end = matches[0].rm_eo-1+offset;

    printf("regex /%s/ at bytes %d-%d\n",
                regex, start, end);

    for (j=start; j<=end; j++)
    {
        printf("%c",filecontent[j]);
    }
    offset += matches[0].rm_eo;
    found = 1;
}
close(f);

This works for a simple pattern in const char *regex, for example regex = "https?.*.png". But for a more complex URL pattern such as (https?:\/\/.*\.(?:png|jpg)), I have to escape the backslashes in the C string, so it becomes:

"(https?:\\/\\/.*\\.(?:png|jpg))";

And then running it gives a segmentation fault.

What might be going wrong?


Answer 1:


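You need to check the return value of regcomp; it will tell you that your regex is invalid. When regcomp fails, the contents of reg are undefined, so passing it to regexec anyway is the likely cause of the segmentation fault.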

#include <regex.h>
#include <stdio.h>

int main() {
  regex_t reg;

  const char *regex="(https?:\\/\\/.*\\.(?:png|jpg))";
  regmatch_t matches[16];
  int ret;
  char err[1024];

  // Compile the pattern and check the result before using reg
  ret = regcomp(&reg, regex, REG_EXTENDED);
  if (ret != 0) {
    // Translate the error code into a readable message
    regerror(ret, &reg, err, sizeof err);
    fprintf(stderr, "%s\n", err);
    return 1;
  }

  regfree(&reg);
  return 0;
}

You will get an error message such as "Invalid preceding regular expression" (the exact wording depends on the C library).

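The reason is the non-capturing group syntax (?: which POSIX regular expressions do not support, even the extended ones. Replacing (?:png|jpg) with a plain group (png|jpg) makes the pattern valid.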
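As a rough sketch of how the corrected pattern and the error check might fit together, the function below compiles a POSIX-compatible version of the pattern and prints each match. The function name print_image_urls is hypothetical and not from the question. The sketch also swaps .* for [^[:space:]]*, which is an assumption on my part: POSIX matching takes the longest match, so .* would run from the first URL to the last extension in a large block of text.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: prints every image URL found in text and returns
 * the number of matches, or -1 if the pattern fails to compile. */
int print_image_urls(const char *text)
{
    /* POSIX ERE has no (?:...), so a plain group is used, and '/' needs
     * no escaping. [^[:space:]]* stands in for .* (an assumption) so a
     * match stops at the first whitespace. */
    const char *pattern = "https?://[^[:space:]]*\\.(png|jpg)";
    regex_t reg;
    regmatch_t m[2];
    char err[256];
    int count = 0;

    int ret = regcomp(&reg, pattern, REG_EXTENDED);
    if (ret != 0) {
        regerror(ret, &reg, err, sizeof err);
        fprintf(stderr, "regcomp failed: %s\n", err);
        return -1; /* never hand an uncompiled regex_t to regexec */
    }

    const char *p = text;
    while (regexec(&reg, p, 2, m, 0) == 0) {
        /* Offsets in m[] are relative to p, not to the start of text */
        printf("%.*s\n", (int)(m[0].rm_eo - m[0].rm_so), p + m[0].rm_so);
        count++;
        if (m[0].rm_eo == m[0].rm_so)
            break; /* guard against looping forever on an empty match */
        p += m[0].rm_eo;
    }

    regfree(&reg);
    return count;
}
```

Calling print_image_urls(filecontent) after reading the file would print one URL per line. Advancing a pointer rather than keeping a separate offset keeps the match offsets and the search position in the same frame of reference.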



Source: https://stackoverflow.com/questions/22567118/segmentation-fault-with-complex-regex-regex-h
