How do you perform a preg_match where the pattern is an array, in PHP?

予麋鹿 2020-12-16 20:41

I have an array full of patterns that I need to match. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing

7 answers
  • 2020-12-16 20:55

    You can combine all the patterns in the list into a single regular expression using PHP's implode() function, then test your string in one call to preg_match().

    $patterns = array(
      'abc',
      '\d+h',
      '[abc]{6,8}\-\s*[xyz]{6,8}',
    );
    
    $master_pattern = '/(' . implode(')|(', $patterns) . ')/';
    
    if(preg_match($master_pattern, $string_to_check))
    {
      //do something
    }
    

    Of course, the code could be even shorter with implode() called inline in the if() condition instead of going through the $master_pattern variable.
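
    The inline form mentioned above might look like the following. This is only a sketch: the sample input in $string_to_check is made up for illustration, and the patterns are assumed to contain no unescaped "/" since that character is used as the delimiter.

```php
<?php
// Patterns are written WITHOUT delimiters; the master pattern adds them once.
$patterns = [
    'abc',
    '\d+h',
    '[abc]{6,8}\-\s*[xyz]{6,8}',
];

// Hypothetical sample input, chosen only for illustration.
$string_to_check = 'the meeting lasted 12h';

// implode() takes the separator first. Wrapping each pattern in (...)
// keeps every alternative self-contained.
$found = preg_match('/(' . implode(')|(', $patterns) . ')/', $string_to_check) === 1;

echo $found ? "match\n" : "no match\n";
```

    Comparing the result with === 1 also separates "no match" (0) from a pattern error (false).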

  • 2020-12-16 20:56

    If you're merely searching for the presence of a string in another string, use strpos as it is faster.

    Otherwise, you could just iterate over the array of patterns, calling preg_match each time.
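
    For the plain-string case, a loop over strpos() could be sketched as below. The helper name containsAny and the sample strings are hypothetical, not part of any library.

```php
<?php
// Hypothetical helper: true if any literal needle occurs in $haystack.
function containsAny(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        // strpos() returns 0 for a hit at the very start of the string,
        // so the result must be compared with !== false, not tested loosely.
        if (strpos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Made-up sample input.
$page = 'Site closed for maintenance';
var_dump(containsAny($page, ['closed', 'moved']));
```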

  • 2020-12-16 20:57

    First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:

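    $matches = false;
    foreach ($pattern_array as $pattern)
    {
      // each $pattern is a complete regex, delimiters included
      if (preg_match($pattern, $page))
      {
        $matches = true;
        break; // one hit is enough, skip the remaining patterns
      }
    }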
    

    You can indeed combine all the patterns into one using the or operator like some people are suggesting, but don't just slap them together with a |. This will break badly if any of your patterns contain the or operator.

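    I would recommend at least grouping your patterns using parentheses, like:

    // Here the patterns carry no delimiters; they are added once at the end.
    $grouped_patterns = array();
    foreach ($patterns as $pattern)
    {
      $grouped_patterns[] = "(" . $pattern . ")";
    }
    $master_pattern = "/" . implode("|", $grouped_patterns) . "/";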
    

    But... I'm not really sure if this ends up being faster. Something has to loop through them, whether it's the preg_match or PHP. If I had to guess I'd guess that individual matches would be close to as fast and easier to read and maintain.
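
    Rather than guessing, the two approaches can be timed against each other. The harness below is a rough sketch, not a rigorous benchmark: the patterns, the sample page, and the helper names (matchEach, matchCombined, timeIt) are all made up for illustration, and real inputs may behave quite differently. It assumes PHP 7.4 or later for arrow functions and hrtime().

```php
<?php
// Illustrative inputs only. Patterns are stored without delimiters here.
$patterns = ['abc', '\d+h', 'closed|moved'];
$page = str_repeat('lorem ipsum ', 1000) . 'took 12h';

// Approach 1: one preg_match() call per pattern, stopping at the first hit.
function matchEach(array $patterns, string $subject): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match('/' . $pattern . '/', $subject)) {
            return true;
        }
    }
    return false;
}

// Approach 2: a single call with every pattern grouped and joined by "|".
function matchCombined(array $patterns, string $subject): bool
{
    $master = '/(' . implode(')|(', $patterns) . ')/';
    return preg_match($master, $subject) === 1;
}

// Hypothetical timing helper: seconds spent on $iterations calls of $fn.
function timeIt(callable $fn, int $iterations = 1000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9;
}

printf("separate: %.4fs\n", timeIt(fn() => matchEach($patterns, $page)));
printf("combined: %.4fs\n", timeIt(fn() => matchCombined($patterns, $page)));
```

    The numbers depend heavily on the patterns and on where in the page the first match sits, so it is worth running this with your own data before choosing.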

    Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks like looking to see if "This Site is Closed" is on the page.

    So doing this:

    foreach ($strings_to_match as $string_to_match)
    {
      if (strpos($page, $string_to_match) !== false)
      {
        // etc.
        break;
      }
    }
    foreach ($pattern_array as $pattern)
    {
      if (preg_match($pattern, $page))
      {
        // etc.
        break;
      } 
    }
    

    and avoiding as many preg_match() as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().

  • 2020-12-16 21:10
    // assuming you have something like this
    $patterns = array('a','b','\w');
    
    // converts the array into a regex friendly or list
    $patterns_flattened = implode('|', $patterns);
    
    if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
    {
    }
    
    // PS: that's off the top of my head, I didn't check it in a code editor
    
  • 2020-12-16 21:18

    If your patterns don't contain much whitespace, another option would be to eschew the arrays and use the /x modifier. Your list of regular expressions would then look like this:

    $regex = "/
    pattern1|   # search for occurrences of 'pattern1'
    pa..ern2|   # wildcard search for occurrences of 'pa..ern2'
    pat[ ]tern| # search for 'pat tern', the space is kept by the character class
    mypat       # Note that the last pattern does NOT have a pipe char
    /x";
    

    With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like above are also allowed.

    This would avoid the looping through the array.
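
    To see the /x behaviour in action, here is a small runnable sketch. The alternatives and the test strings are hypothetical. Note that inside a double-quoted PHP string a backslash has to be doubled, and that the comments must not contain the delimiter character.

```php
<?php
// In /x mode, whitespace outside character classes and "#" comments
// (up to the end of the line) are ignored by the regex engine.
$regex = "/
    closed     |  # plain literal
    mov.d      |  # '.' matches any single character
    pat[ ]tern |  # a literal space survives inside a character class
    \\d+h         # last alternative: no trailing pipe
/x";

foreach (['site closed', 'pat tern', 'pattern', 'took 12h'] as $subject) {
    printf("%-12s %s\n", $subject, preg_match($regex, $subject) ? 'match' : 'no match');
}
```

    Of the four sample strings, only 'pattern' fails to match, because none of the alternatives covers it without the space.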

  • 2020-12-16 21:21

    If you have a bunch of patterns, what you can do is concatenate them in a single regular expression and match that. No need for a loop.
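
    One way to do the concatenation is sketched below. The helper name buildMasterPattern is hypothetical, and the sketch assumes the patterns are stored without delimiters and contain no unescaped "~", which is used as the delimiter here.

```php
<?php
// Hypothetical helper: joins delimiter-less patterns into one regex.
function buildMasterPattern(array $patterns): string
{
    // (?:...) groups each pattern without creating capture groups,
    // so every alternative stays self-contained.
    return '~(?:' . implode(')|(?:', $patterns) . ')~';
}

// Made-up patterns for illustration.
$master = buildMasterPattern(['abc', 'closed|moved', '\d+h']);

echo $master, "\n";
var_dump(preg_match($master, 'we have moved') === 1);
```

    An empty pattern list would produce a regex that matches every string, so callers may want to check for that case first.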
