How could I find all whitespaces excluding the ones between quotes?

野性不改 2020-12-10 06:48

I need to split a string by spaces, but phrases in quotes should be kept intact (not split). Example:

  word1 word2 \"this is a phrase\" word3 word4 \"this is a s         


        
5 Answers
  • 2020-12-10 07:07

    Assuming your quotes are well-formed (i.e., they come in pairs), you can explode() the string on the quote character and loop over every second field, since the odd-indexed fields are the quoted ones. E.g.:

    <?php
    $str = "word1 word2 \"this is a phrase\" word3 word4 \"this is a second phrase\" word5 word6 \"lastword\"";
    print $str ."\n";
    // split on the double quote: odd-indexed fields lie inside quotes
    $s = explode('"',$str);
    for($i=1;$i<count($s);$i+=2){
        // report quoted fields that contain a space
        if ( strpos($s[$i] ," ")!==FALSE) {
            print "Spaces found: $s[$i]\n";
        }
    }
    

    Output:

    $ php test.php
    word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5 word6 "lastword"
    Spaces found: this is a phrase
    Spaces found: this is a second phrase
    

    No complicated regexp required.
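    The loop above only reports the quoted phrases. A minimal sketch that carries the same explode() idea through to the split the question asks for, assuming balanced, unescaped double quotes; `split_keep_quoted` is a hypothetical helper name:

```php
<?php
// Split on whitespace while keeping double-quoted phrases whole.
// Assumes balanced, unescaped double quotes: after explode('"', ...),
// even-indexed fields lie outside quotes and odd-indexed fields inside.
// split_keep_quoted() is a hypothetical helper name.
function split_keep_quoted($str) {
    $result = array();
    foreach (explode('"', $str) as $i => $field) {
        if ($i % 2 == 1) {
            // inside quotes: keep the phrase as one element (quotes dropped)
            $result[] = $field;
        } else {
            // outside quotes: split on runs of whitespace, skipping empty pieces
            foreach (preg_split('/\s+/', $field, -1, PREG_SPLIT_NO_EMPTY) as $word) {
                $result[] = $word;
            }
        }
    }
    return $result;
}

print_r(split_keep_quoted('word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5'));
```

    Because explode() keeps empty fields, a string that begins with a quote (e.g. '"a b" c') still puts the phrase at index 1, so the even/odd rule holds.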

  • 2020-12-10 07:10

    With the help of user MizardX from the #regex IRC channel (irc.freenode.net), a solution was found. It even supports single quotes.

    $str= 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
    
    $regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';
    
    $arr = preg_split($regexp, $str);
    
    print_r($arr);
    

    Result is:

    Array (
        [0] => word1
        [1] => word2
        [2] => 'this is a phrase'
        [3] => word3
        [4] => word4
        [5] => "this is a second phrase"
        [6] => word5
        [7] => word1
        [8] => word2
        [9] => "this is a phrase"
        [10] => word3
        [11] => word4
        [12] => "this is a second phrase"
        [13] => word5  
    )
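    To see how the pattern works, here is a sketch of the same regexp written with the x (extended) modifier so each part can carry a comment. It differs in one assumed change: the bare-word run uses a possessive `++`, which stops PCRE from retrying every way of splitting a long final word when the string does not end in whitespace. The sample string is illustrative.

```php
<?php
// Annotated \G...\K split pattern (x modifier: whitespace and # comments ignored).
// Note: no slash may appear inside the comments, since it is the delimiter.
$regexp = <<<'RE'
/
  \G              # continue exactly where the previous match ended
  (?:
      "[^"]*"     # a double-quoted phrase
    | '[^']*'     # a single-quoted phrase
    | [^"'\s]++   # a run of ordinary characters (possessive)
  )*
  \K              # drop everything matched so far from the reported match
  \s+             # the whitespace that preg_split() actually splits on
/x
RE;

$str = "word1 word2 'this is a phrase' word3 \"this is a second phrase\" word5";
$parts = preg_split($regexp, $str);
print_r($parts);
```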
    

    P.S. The only disadvantage is that this regexp requires PCRE 7, because \K is not available in earlier versions.

    It turned out that the production server has only PCRE 6, not PCRE 7. The regexp that works there (with \G and \K removed), although less flexible than the PCRE 7 version, is:

    /(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/
    

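    Used with preg_match_all() rather than preg_split() (this pattern matches the tokens themselves, not the whitespace between them), it gives the same result for the given input in $matches[0].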
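    A minimal sketch of the PCRE 6 fallback in use, assuming the same sample input as above: the tokens are collected with preg_match_all(), and the full matches land in `$matches[0]`.

```php
<?php
// PCRE 6-compatible variant: no \G or \K, so match the tokens themselves
// with preg_match_all() instead of matching the separators with preg_split().
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5';
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

preg_match_all($regexp, $str, $matches);
print_r($matches[0]);
```

    Because the outer group repeats with `+`, adjacent pieces such as `foo"bar baz"` are kept together as one token.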

  • 2020-12-10 07:15

    Does anybody want to benchmark tokenizing vs. regex? My guess is that explode() is a little too heavy to give any speed benefit. Nonetheless, here's another method:

    (edited because I forgot the else case for storing the quoted string)

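    $str = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
    
    // initialize storage array
    $arr = array();
    // count tokens: even-numbered tokens are outside quotes, odd ones inside
    // (assumes the string does not start with a quote, since strtok() skips empty tokens)
    $count = 0;
    // split on the double-quote character
    $tok = strtok($str, '"');
    while ($tok !== false) {
        if ($count % 2 == 0) {
            // outside quotes: split on spaces, dropping empty pieces
            $arr = array_merge($arr, array_filter(explode(' ', trim($tok)), 'strlen'));
        } else {
            // inside quotes: keep the phrase as one element
            $arr[] = trim($tok);
        }
        $tok = strtok('"');
        ++$count;
    }
    
    // output results
    var_dump($arr);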
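    The benchmark question above can be tried with a rough timing harness like the one below. It is a sketch, not a rigorous benchmark: `tokenize_split`, `regex_split` and the iteration count are hypothetical choices, the tokenizer re-adds the quotes so both outputs can be compared, and timings depend on PHP version and input, so none are quoted here.

```php
<?php
// Rough timing sketch: strtok() tokenizer vs. the \G...\K preg_split() regexp
// from another answer. Function names and the iteration count are arbitrary.
function tokenize_split($str) {
    $arr = array();
    $count = 0;
    $tok = strtok($str, '"');
    while ($tok !== false) {
        if ($count % 2 == 0) {
            // outside quotes: split on spaces, dropping empty pieces
            $arr = array_merge($arr, array_filter(explode(' ', trim($tok)), 'strlen'));
        } else {
            // inside quotes: re-add the quotes so the output matches preg_split()
            $arr[] = '"' . $tok . '"';
        }
        $tok = strtok('"');
        ++$count;
    }
    return $arr;
}

function regex_split($str) {
    return preg_split('/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/', $str);
}

$str = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
$n = 10000;

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $a = tokenize_split($str);
}
printf("strtok:     %.4f s\n", microtime(true) - $t);

$t = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $b = regex_split($str);
}
printf("preg_split: %.4f s\n", microtime(true) - $t);

// both approaches should produce the same tokens
var_dump($a === $b);
```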
    
  • 2020-12-10 07:16

    Using the regex from the other question you linked, this is rather easy:

    <?php
    
    $string = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
    
    preg_match_all( '/(\w+|"[\w\s]*")+/' , $string , $matches );
    
    print_r( $matches[1] );
    
    ?>
    

    output:

    Array
    (
         [0] => word1
         [1] => word2
         [2] => "this is a phrase"
         [3] => word3
         [4] => word4
         [5] => "this is a second phrase"
         [6] => word5
    )
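    One caveat with this pattern: the `[\w\s]` class means a quoted phrase containing punctuation, such as `"hello, world!"`, is not kept together, and `\w+` silently drops punctuation outside quotes. A minimal alternative, assuming balanced, unescaped double quotes, swaps in a negated class for the phrase and `\S+` for everything else:

```php
<?php
// Variant of the pattern above: a quoted phrase may contain any character
// except a double quote, and any other run of non-whitespace is a token.
// Assumes double quotes are balanced and never escaped.
$string = 'say "hello, world!" to word2 "this is a phrase"';
preg_match_all('/"[^"]*"|\S+/', $string, $matches);
print_r($matches[0]);
```

    The quoted alternative is listed first so that, at an opening quote, the whole phrase is tried before `\S+` can grab just the first word.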
    
  • 2020-12-10 07:18
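
    $test = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
    // $matches[0]: every token (quoted phrases keep their quotes)
    // $matches[3]: the text of each quoted phrase without quotes ('' for plain words)
    preg_match_all( '/([^"\s]+)|("([^"]+)")/', $test, $matches);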
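    This pattern captures a quoted phrase twice: group 2 includes the quotes and group 3 holds only the text between them, while group 1 holds an unquoted word. A minimal sketch that collects each token without its quotes, using `PREG_SET_ORDER` so each match's groups stay together (`$tokens` is an illustrative name):

```php
<?php
$test = 'word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';
preg_match_all('/([^"\s]+)|("([^"]+)")/', $test, $matches, PREG_SET_ORDER);

$tokens = array();
foreach ($matches as $m) {
    // group 3 is present only for quoted phrases; otherwise use the plain word in group 1
    $tokens[] = isset($m[3]) ? $m[3] : $m[1];
}
print_r($tokens);
```

    With the default flags PHP omits trailing groups that did not participate, which is why `isset($m[3])` distinguishes the two cases.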
    