How do I write a simple regular expression pattern matching function in C or C++?

我寻月下人不归 2021-02-01 06:58

This was a question on my written test today. The function signature is:

int is_match(char* pattern,char* string)

The pattern is limited to only A

9 answers
  •  情书的邮戳
    2021-02-01 07:46

    The full power of regular expressions and finite state machines is not needed to solve this problem. As an alternative, there is a relatively simple dynamic programming solution.

    Let match(i, j) be 1 if it is possible to match the sub-string string[i..n-1] with the sub-pattern pattern[j..m-1], where n and m are the lengths of string and pattern respectively. Otherwise let match(i, j) be 0.

    The base cases are:

    • match(n, m) = 1: an empty string matches an empty pattern;

    • match(i, m) = 0 for i < n: a non-empty string cannot match an empty pattern.

    The transition is divided into 3 cases, depending on whether the current sub-pattern starts with a character followed by a '*' (zero or more copies of that character), a character followed by a '?' (zero or one copy of that character), or a character with no special symbol after it (exactly one copy).
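
The three cases can also be read as a plain recursion on the two suffixes. The sketch below is only an illustration of the recurrence, not the table-based code of this answer: match_here is a hypothetical helper name, and it assumes, as the code in this answer does, that "x*" means zero or more copies of x and "x?" means zero or one. Without a table it may revisit the same pair of positions many times, which is the cost the dynamic programming version avoids.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the pattern suffix p matches
   the whole string suffix s, 0 otherwise. */
static int match_here(const char *p, const char *s)
{
    /* Base case: an empty pattern matches only an empty string. */
    if (*p == '\0')
        return *s == '\0';

    /* Does the current pattern character equal the current string character? */
    int same = (*s != '\0' && *s == p[0]);

    if (p[1] == '*') {
        /* Case 1: skip "x*" entirely, or consume one x and stay on "x*". */
        return match_here(p + 2, s) || (same && match_here(p, s + 1));
    }
    if (p[1] == '?') {
        /* Case 2: skip "x?" entirely, or consume one x and move past "x?". */
        return match_here(p + 2, s) || (same && match_here(p + 2, s + 1));
    }
    /* Case 3: plain character, must match; advance both. */
    return same && match_here(p + 1, s + 1);
}
```

For example, match_here("dum*y", "dummy") takes the '*' case at the 'm' and consumes both 'm' characters before matching the final 'y'.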

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int is_match(char* pattern, char* string)
    {
      int n = strlen(string);
      int m = strlen(pattern);
    
      int i, j;
      int **match;
    
      match = (int **) malloc((n + 1) * sizeof(int *));
      for(i = 0; i <= n; i++) {
        match[i] = (int *) malloc((m + 1) * sizeof(int));
      }
    
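      /* Fill the table from the end of both strings back to the start, */
      /* so every entry read below has already been computed. */
      for(i = n; i >= 0; i--) {
        for(j = m; j >= 0; j--) {
          if(i == n && j == m) {
            match[i][j] = 1;   /* empty string vs. empty pattern */
          }
          else if(i < n && j == m) {
            match[i][j] = 0;   /* non-empty string vs. empty pattern */
          }
          else {
            match[i][j] = 0;
            if(pattern[j + 1] == '*') {
              /* x*: skip it (zero copies), or consume one x and stay on x* */
              if(match[i][j + 2]) match[i][j] = 1;
              if(i < n && pattern[j] == string[i] && match[i + 1][j]) match[i][j] = 1;
            }
            else if(pattern[j + 1] == '?') {
              /* x?: skip it (zero copies), or consume one x and move past x? */
              if(match[i][j + 2]) match[i][j] = 1;
              if(i < n && pattern[j] == string[i] && match[i + 1][j + 2]) match[i][j] = 1;
            }
            else if(i < n && pattern[j] == string[i] && match[i + 1][j + 1]) {
              /* plain character: it must match, then both positions advance */
              match[i][j] = 1;
            }
          }
        }
      }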
    
      int result = match[0][0];
    
      for(i = 0; i <= n; i++) {
        free(match[i]);
      }
    
      free(match);
    
      return result;
    }
    
    int main(void)
    {
      printf("is_match(dummy, dummy)  = %d\n", is_match("dummy","dummy"));
      printf("is_match(dumm?y, dummy) = %d\n", is_match("dumm?y","dummy"));
      printf("is_match(dum?y, dummy)  = %d\n", is_match("dum?y","dummy"));
      printf("is_match(dum*y, dummy)  = %d\n", is_match("dum*y","dummy")); 
    
      system("pause");
    
      return 0;
    }
    

    The time complexity of this approach is O(n * m). The memory complexity is also O(n * m), but because row i of the table depends only on rows i and i + 1, it can be reduced to O(m) by keeping just two rows.
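
One way the O(m) variant might look is sketched below. It uses the same recurrence but keeps only two rows of the table. The name is_match_rolling, the const-qualified parameters, and the choice to return 0 on allocation failure are assumptions made for this illustration, not part of the original answer.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: same recurrence as the full table, keeping only rows i and i + 1. */
int is_match_rolling(const char *pattern, const char *string)
{
    int n = (int) strlen(string);
    int m = (int) strlen(pattern);

    /* next holds row i + 1, cur holds row i; each has m + 1 entries. */
    int *next = calloc((size_t) m + 1, sizeof *next);
    int *cur = calloc((size_t) m + 1, sizeof *cur);
    if (next == NULL || cur == NULL) {
        free(next);
        free(cur);
        return 0;   /* assumption: treat allocation failure as "no match" */
    }

    for (int i = n; i >= 0; i--) {
        for (int j = m; j >= 0; j--) {
            int ok;
            if (j == m) {
                ok = (i == n);   /* empty pattern matches only the empty string */
            } else {
                int same = (i < n && pattern[j] == string[i]);
                if (pattern[j + 1] == '*')
                    ok = cur[j + 2] || (same && next[j]);
                else if (pattern[j + 1] == '?')
                    ok = cur[j + 2] || (same && next[j + 2]);
                else
                    ok = same && next[j + 1];
            }
            cur[j] = ok;
        }
        /* Row i is finished; it becomes row i + 1 for the next iteration. */
        int *tmp = next;
        next = cur;
        cur = tmp;
    }

    int result = next[0];   /* after the last swap, next holds row 0 */
    free(next);
    free(cur);
    return result;
}
```

The swap at the end of each outer iteration reuses the two buffers, so the extra memory stays at 2 * (m + 1) integers regardless of the length of the string.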
