How can I split a string by whitespace unless inside of a single quoted string?

无人共我 2020-12-19 13:07

I'm seeking a solution to splitting a string which contains text in the following format:

"abcd efgh 'ijklm no pqrs' tuv"

which will p

3 Answers
  • 2020-12-19 13:28
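    use strict; use warnings;
    use Data::Dumper;
    
    my $text = "abcd efgh 'ijklm no pqrs' tuv 'xwyz 1234 9999' 'blah'";
    my @out;
    
    # Splitting on the quote character leaves unquoted text at even
    # indices and quoted text at odd indices.
    my @parts = split /'/, $text;
    
    for my $i ( 0 .. $#parts ) {
        if ( $i % 2 ) {
            push @out, $parts[$i];               # quoted: keep as one field
        }
        else {
            # split ' ' skips leading whitespace, so no empty fields appear
            push @out, split ' ', $parts[$i];
        }
    }
    
    print Dumper \@out;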
    
  • 2020-12-19 13:44

    Use Text::ParseWords:

    #!/usr/bin/perl
    
    use strict; use warnings;
    use Text::ParseWords;
    
    my @words = parse_line('\s+', 0, "abcd efgh 'ijklm no pqrs' tuv");
    
    use Data::Dumper;
    print Dumper \@words;
    

    Output:

    C:\Temp> ff
    $VAR1 = [
              'abcd',
              'efgh',
              'ijklm no pqrs',
              'tuv'
            ];

    You can look at the source code for Text::ParseWords::parse_line to see the pattern used.
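
    For reference, the second argument to parse_line is the keep flag: 0 strips the quotes from each field, while a true value leaves them in place. A minimal sketch, reusing the string from the question (the variable names are only illustrative):

    ```perl
    use strict; use warnings;
    use Text::ParseWords;

    my $text = "abcd efgh 'ijklm no pqrs' tuv";

    # keep = 0: the quotes are removed from the returned fields
    my @stripped = parse_line('\s+', 0, $text);

    # keep = 1: the quotes are left in the returned fields
    my @kept = parse_line('\s+', 1, $text);

    print join('|', @stripped), "\n";   # abcd|efgh|ijklm no pqrs|tuv
    print join('|', @kept), "\n";       # abcd|efgh|'ijklm no pqrs'|tuv
    ```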

  • 2020-12-19 13:44

    So you've decided to use a regex? Now you have two problems.

    Allow me to infer a little bit. You want an arbitrary number of fields, where a field is either a run of text containing no spaces, or a quoted string that begins with a quote and ends with a quote (possibly with spaces in between).

    In other words, you want to do what a command line shell does. You really should just reuse something. Failing that, you should capture a field at a time, with a regex something like:

    ^ *('[^']*'|[^ ]+)(.*)
    

    Where you append group one to your list, and continue the loop with the contents of group 2.

    A single match with a fixed set of capture groups wouldn't be able to capture an arbitrarily large number of fields. You might be able to split on a regex (Python will do this, and Perl's split takes a pattern too), but since you are matching the stuff outside the spaces, I'm not sure that is even an option.
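
    The field-at-a-time loop described above could look roughly like this in Perl. This is only a sketch: it tries the quoted alternative first (otherwise `[^ ]+` would also match an opening quote), it strips the surrounding quotes, and the names `$rest`, `$field`, `$tail` and `@fields` are made up for the example.

    ```perl
    use strict; use warnings;

    my $rest = "abcd efgh 'ijklm no pqrs' tuv";
    my @fields;

    # Group 1 is the next field, group 2 is whatever follows it.
    while ( $rest =~ /^ *('[^']*'|[^ ]+)(.*)/s ) {
        my ( $field, $tail ) = ( $1, $2 );
        $field =~ s/^'(.*)'$/$1/s;    # drop the surrounding quotes, if any
        push @fields, $field;
        $rest = $tail;                # continue the loop with the remainder
    }

    print join( '|', @fields ), "\n";   # abcd|efgh|ijklm no pqrs|tuv
    ```

    Note that in Perl a global match in list context, such as `my @fields = $text =~ /'[^']*'|[^ ]+/g;`, returns every match in one statement. That avoids the explicit loop, but it leaves the quotes on the quoted fields.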
