I have multiple CSV files, each with a different number of entries and roughly 300 lines.
The first line in each file contains the data labels.
Let's get this out of the way: you cannot read a CSV by just splitting on commas. You've just demonstrated why; commas might be escaped or inside quotes. Those commas are totally valid, they're part of the data. Discarding them mangles the data in the CSV.
For this reason, and others, CSV files must be read using a CSV parsing library. Working out which commas are data and which are structural itself requires parsing the CSV, so trying to strip the commas inside quotes first won't save you any time. It just gives you more work while mangling the data.
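To make the difference concrete, here is a minimal sketch comparing a naive split with a real parser (Text::CSV_XS, one such library). The sample line is made up for illustration; it is not taken from your files.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample line: the second field contains a comma inside quotes.
my $line = '1,"Smith, John",42';

# Naive approach: split on every comma, including the one that is data.
my @naive = split /,/, $line;

# Parser approach: the quoted comma stays inside its field.
my $csv = Text::CSV_XS->new({ binary => 1 });
$csv->parse($line) or die "Cannot parse line: " . $csv->error_diag;
my @parsed = $csv->fields;

printf "split:  %d fields\n", scalar @naive;
printf "parsed: %d fields\n", scalar @parsed;
```

The split version breaks the quoted name into two pieces and leaves stray quote characters behind, while the parser returns the name as a single field.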
Text::CSV_XS is a very good, very fast CSV parsing library. It has a ton of features, most of which you do not need. Fortunately it has examples for doing most common actions.
For example, here's how you read and print each row from a file called file.csv.
use strict;
use warnings;
use autodie;
use v5.10; # for `say`
use Text::CSV_XS;
# Open the file.
open my $fh, "<", "file.csv";
# Create a new Text::CSV_XS object.
# allow_whitespace allows there to be whitespace between the fields
my $csv = Text::CSV_XS->new({
    allow_whitespace => 1
});
# Read in the header line so it's not counted as data.
# Then you can use $csv->getline_hr() to read each row in as a hash.
$csv->header($fh);
# Read each row.
while( my $row = $csv->getline($fh) ) {
    # Do whatever you want with the list of cells in $row.
    # This prints them separated by semicolons.
    say join "; ", @$row;
}
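The comment above mentions getline_hr(); here is a rough sketch of what that looks like. The data and column names (id, name, city) are hypothetical, and an in-memory filehandle stands in for file.csv so the snippet is self-contained.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical data standing in for file.csv.
my $data = <<'CSV';
id,name,city
1,"Smith, John",Boston
2,"Doe, Jane",Austin
CSV

# Open a read handle on the string instead of a real file.
open my $fh, "<", \$data or die "Cannot open in-memory data: $!";

my $csv = Text::CSV_XS->new({ binary => 1, allow_whitespace => 1 });

# header() reads the first line and registers its cells as column names.
$csv->header($fh);

# getline_hr() returns each row as a hash reference keyed by column name.
my @names;
while ( my $row = $csv->getline_hr($fh) ) {
    push @names, $row->{name};
    print "$row->{id}: $row->{name} ($row->{city})\n";
}
close $fh;
```

Note that header() lowercases the column names by default, so a label like "Name" in the file would be looked up as $row->{name}.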
A good CSV parser will have no trouble with this since commas are inside the quoted fields, so you can simply parse the file with it.
A really nice module is Text::CSV_XS, which the wrapper Text::CSV loads by default if it is installed. The only thing to address in your data is the spaces between fields, since the CSV specs don't allow them, so I use the option for that in the example below.
If you really must remove commas for further work, do that as the parser hands you lines.
use warnings;
use strict;
use feature 'say';
use Text::CSV;
my $file = 'commas_in_fields.csv';
my $csv = Text::CSV->new( { binary => 1, allow_whitespace => 1 } )
    or die "Cannot use CSV: " . Text::CSV->error_diag ();
open my $fh, '<', $file or die "Can't open $file: $!";
my @headers = @{ $csv->getline($fh) }; # if there is a separate header line
while (my $line = $csv->getline($fh)) {  # returns arrayref
    tr/,//d for @$line;                  # delete commas from each field
    say "@$line";
}
This uses tr on $_ in the for loop, changing the elements of the array, for conciseness.
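In case the in-place modification isn't obvious, here is a small standalone sketch of that idiom. The @fields values are made up; the point is only that $_ in a for loop is an alias to each element, so tr edits the array itself.

```perl
use strict;
use warnings;

# Hypothetical parsed fields, two of which contain commas.
my @fields = ('1,000', 'Smith, John', 'plain');

# In a statement-modifier for loop, $_ aliases each element in turn,
# so tr/,//d deletes the commas directly in @fields.
tr/,//d for @fields;

print join("|", @fields), "\n";
```

Only the commas are removed; the space after the comma in the name is left alone.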
I'd like to repeat and emphasize what others have explained: do not parse CSV by hand, since only trouble awaits; use a library. This is very much akin to parsing XML and similar formats: no regex please, but libraries.