Regular expression to match CSV delimiters

天命终不由人 2020-12-17 22:39

I'm trying to create a PCRE that will match only the commas used as delimiters in a line from a CSV file. Assuming the format of a line is this:

1,"abcd",


        
6 Answers
  •  失恋的感觉
    2020-12-17 23:04

    Andy's right: correctly parsing CSV is a lot harder than you probably realise, and has all kinds of ugly edge cases. I suspect that it's mathematically impossible to correctly parse CSV with regexes, particularly those understood by sed.
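    To make the edge cases concrete, here is a minimal sketch in Python (chosen purely for illustration; the sample lines are hypothetical, not taken from the question). It compares a naive split on commas with a real CSV parser for a quoted field containing a comma and for a field containing doubled quotes.

```python
import csv
import io

# Hypothetical sample lines, not from the original question.
line = '1,"abcd","a,b",42'          # third field contains a comma
escaped = '"say ""hi"", ok",2'      # first field contains doubled quotes and a comma

# Naive approach: treat every comma as a delimiter.
naive_fields = line.split(",")

# CSV-aware approach: the parser tracks quoting state for us.
parsed_fields = next(csv.reader(io.StringIO(line)))
parsed_escaped = next(csv.reader(io.StringIO(escaped)))

print(naive_fields)    # 5 pieces, quotes left in place, "a,b" torn apart
print(parsed_fields)   # 4 fields, quotes removed, "a,b" kept whole
print(parsed_escaped)  # 2 fields, "" collapsed to a single quote
```

    The naive split returns five pieces for a four-field record, which is the kind of failure a delimiter-only pattern has to work around.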

    Instead of sed, use a Perl script that uses the Text::CSV module from CPAN (or the equivalent in your preferred scripting language). Something like this should do it:

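    use strict;
    use warnings;
    use feature 'say';
    use Text::CSV;

    # binary => 1 permits embedded newlines and non-ASCII bytes inside quoted fields
    my $csv = Text::CSV->new ( { binary => 1, eol => $/ } )
        or die "Cannot use CSV: ".Text::CSV->error_diag ();
    # Read every record from standard input as a reference to an array of rows
    my $rows = $csv->getline_all(\*STDIN);
    for my $row (@$rows) {
        # Print the fields of each record separated by tabs
        say join("\t", @$row);
    }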
    

    That assumes that you don't have any tab characters embedded in your data, of course - perhaps it would be better to do the subsequent stages in a Real Scripting Language as well, so you could take advantage of proper lists?
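    As a rough sketch of the "proper lists" idea, here is what the same step might look like in Python with its standard csv module. The function names and sample data are hypothetical. Rows stay as lists of strings, and if tab-separated output is still wanted, csv.writer quotes any field that contains a tab, so embedded tabs survive a round trip.

```python
import csv
import io


def csv_to_rows(text):
    """Parse CSV text into a list of rows, each row a list of field strings."""
    return list(csv.reader(io.StringIO(text, newline="")))


def rows_to_tsv(rows):
    """Serialise rows as tab-separated text, quoting fields only when needed."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical input: one field with a comma, one with a tab, one with quotes.
sample = '1,"abcd","x,y"\n2,"tab\there","say ""hi"""\n'

rows = csv_to_rows(sample)
tsv = rows_to_tsv(rows)

# Reading the TSV back with the same module recovers the original rows.
round_trip = list(csv.reader(io.StringIO(tsv, newline=""), delimiter="\t"))
```

    Later stages can work on `rows` directly instead of re-splitting text, which sidesteps the delimiter problem entirely.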
