I'm working on a project that involves parsing a large CSV-formatted file in Perl, and I'm looking to make things more efficient.
My approach has been to split() each line on commas, but I'm wondering whether there's a faster or more robust way.
The right way to do it is to use Text::CSV_XS. It will be faster -- by an order of magnitude -- and much more robust than anything you're likely to write on your own. If you're determined to use only core functionality, you have a couple of options depending on how you weigh speed against robustness.
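For reference, the Text::CSV_XS version is hardly more code than the naive loop below. A minimal sketch (using the same placeholder filename as the examples that follow):

use strict;
use warnings;
use Text::CSV_XS;

my $file = 'somefile.csv';
my $csv  = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";

my @data;
# getline() parses one record at a time and correctly handles
# quoted fields, embedded commas, and even embedded newlines
while (my $row = $csv->getline($fh)) {
    push @data, $row;    # $row is already an array reference
}
close $fh;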
About the fastest you'll get for pure-Perl is to read the file line by line and then naively split the data:
my $file = 'somefile.csv';
my @data;

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;                     # remove the trailing newline
    my @fields = split(/,/, $line);  # naive: ignores quoting entirely
    push @data, \@fields;            # store each row as an array reference
}
close $fh;
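One subtlety: split with no limit argument drops trailing empty fields, so a line ending in a comma silently loses its last column. If that matters, pass a limit of -1:

my @fields = split(/,/, $line, -1);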
This will fail if any fields contain embedded commas. A more robust (but slower) approach is to use Text::ParseWords. To do that, replace the split line with this:
my @fields = Text::ParseWords::parse_line(',', 0, $line);
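Text::ParseWords has shipped with Perl for a long time, so it fits the core-only constraint, and parse_line handles quoted fields with embedded commas. You'll need to load the module first; a sketch of the full loop with that change (same placeholder filename as above):

use strict;
use warnings;
use Text::ParseWords;

my $file = 'somefile.csv';
my @data;

open(my $fh, '<', $file) or die "Can't read file '$file' [$!]\n";
while (my $line = <$fh>) {
    chomp $line;
    # the 0 tells parse_line not to keep the surrounding quotes
    my @fields = parse_line(',', 0, $line);
    push @data, \@fields;
}
close $fh;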