Question
I need to split a large CSV file into multiple files by number of rows, using PHP or Linux tools.
The CSV contains:
"id","name","address"
"1","abc","this is test address1 which having multiple newline
separators."
"2","abc","this is test address2
which having multiple newline separators"
"3","abc","this is test address3.
which having multiple
newline separators."
I used the Linux command: split -l 5000 testfile
But it does not split the CSV correctly. The address field contains embedded newline characters, and split treats each of them as a line break, so a record can be cut in half across two output files.
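To make the failure concrete, here is a minimal Ruby sketch using Ruby's standard csv library (as Answer 1 does). It compares the number of physical lines, which is what split -l counts, with the number of records a CSV parser sees in the sample data above. The variable names are illustrative only.

```ruby
require "csv"

# Sample data copied from the question: 4 records (header + 3 rows)
# spread over 8 physical lines because the address field contains newlines.
sample = <<~CSV_TEXT
  "id","name","address"
  "1","abc","this is test address1 which having multiple newline
  separators."
  "2","abc","this is test address2
  which having multiple newline separators"
  "3","abc","this is test address3.
  which having multiple
  newline separators."
CSV_TEXT

physical_lines = sample.lines.count   # what `split -l` counts
records = CSV.parse(sample)           # what a CSV parser counts

puts "physical lines: #{physical_lines}"  # physical lines: 8
puts "CSV records:    #{records.size}"    # CSV records:    4
```

Any tool that splits on record boundaries instead of line boundaries (PHP's fgetcsv(), Ruby's CSV) avoids the problem.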
I've also tried to use PHP:
$inputFile = 'filename.csv';
$outputFile = "outputfile";
$splitSize = 5000;
$in = fopen($inputFile, 'r');
$header = fgetcsv($in);
$rowCount = 0;
$fileCount = 1;
while (!feof($in)) {
    if (($rowCount % $splitSize) == 0) {
        if ($rowCount > 0) {
            fclose($out);
        }
        $filename = $outputFile . $fileCount++;
        $out = fopen($filename . '.csv', 'w');
        // octal mode, applied to the .csv file that was actually created
        chmod($filename . '.csv', 0777);
        fputcsv($out, $header);
    }
    $data = fgetcsv($in);
    if ($data) {
        fputcsv($out, $data);
        $rowCount++;
    }
}
fclose($out);
How can I resolve this problem?
Answer 1:
Using Ruby (this writes each record, header included, to its own file: file.csv.0, file.csv.1, ...):
ruby -e 'require "csv"
f = ARGV.shift
CSV.foreach(f).with_index { |e, i|
  File.write("#{f}.#{i}", CSV.generate_line(e, force_quotes: true))
}' file.csv
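The one-liner above writes one record per file, and only the first file gets the header. For the question's actual goal (5000 rows per file, with the header repeated in each file), a Ruby sketch along the same lines might look like this. split_csv, its parameters, and the <prefix>N.csv naming are hypothetical choices, not part of the original answer.

```ruby
require "csv"

# Split a CSV file into chunks of `rows_per_file` data records, repeating the
# header row at the top of every chunk. CSV.foreach parses quoted fields, so a
# record whose field contains newlines is never cut in half.
# Hypothetical helper; "<prefix>N.csv" naming is an illustrative choice.
def split_csv(input_path, rows_per_file, prefix)
  header = nil
  out = nil
  file_count = 0
  row_count = 0

  CSV.foreach(input_path) do |row|
    if header.nil?
      header = row              # first record is the header
      next
    end
    if row_count % rows_per_file == 0
      out&.close                # finish the previous chunk, if any
      file_count += 1
      out = CSV.open("#{prefix}#{file_count}.csv", "w", force_quotes: true)
      out << header
    end
    out << row
    row_count += 1
  end

  file_count                    # number of chunk files written
ensure
  out&.close
end
```

For the question's file, split_csv("filename.csv", 5000, "outputfile") would write outputfile1.csv, outputfile2.csv, and so on. Because rows are re-serialized, every field is quoted (force_quotes: true) rather than copied byte-for-byte from the input.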
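PHP (with $splitSize = 1 this also writes one row per file; set it to 5000 for 5000-row chunks):
<?php
$inputFile = 'file.csv';
$outputFile = 'file.out';
$splitSize = 1;  // data rows per output file
if (($in = fopen($inputFile, 'r'))) {
    $header = fgetcsv($in);
    $rowCount = 0;
    $fileCount = 0;
    // fgetcsv() reads a whole record, including quoted fields that span lines
    while (($data = fgetcsv($in))) {
        if (($rowCount % $splitSize) == 0) {
            if ($rowCount > 0) {
                fclose($out);
            }
            $filename = $outputFile . ++$fileCount . '.csv';
            $out = fopen($filename, 'w');
            chmod($filename, 0755);  // octal permissions
            fputcsv($out, $header);
        }
        fputcsv($out, $data);
        $rowCount++;
    }
    if (isset($out)) {  // no output file exists if there were no data rows
        fclose($out);
    }
    fclose($in);
}
?>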
Answer 2:
Linux has a great little utility called split, which can take a file and split it into chunks of whatever size you want, e.g. 100-line chunks.
However, for CSV files each chunk generally needs to include the header row, and split has no option for that. With a little more code you can achieve it:
tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done
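The split -l loop above still counts physical lines, so with the question's data it can cut a record whose address spans several lines. One alternative, sketched here in Ruby, groups physical lines into records by tracking whether a double quote is still open, and starts a new chunk only on a record boundary. This assumes standard CSV quoting (fields enclosed in double quotes, embedded quotes escaped by doubling them as ""). Unlike re-serializing with a CSV library, it copies the original lines unchanged. The each_record and split_keeping_lines helpers and the file naming are hypothetical.

```ruby
# Group physical lines into CSV records by tracking quote parity: an opening or
# closing field quote toggles the "inside quotes" state, and an escaped quote
# ("") toggles it twice, so a record ends at the first line after which no
# quote is open. Assumes standard double-quote CSV quoting.
def each_record(io)
  buffer = +""
  open_quote = false
  io.each_line do |line|
    buffer << line
    open_quote ^= line.count('"').odd?
    next if open_quote          # record continues on the next line
    yield buffer
    buffer = +""
  end
  yield buffer unless buffer.empty?   # unterminated trailing record, if any
end

# Write the header plus up to `records_per_file` records, verbatim, into each
# "<prefix>N.csv" file (hypothetical naming).
def split_keeping_lines(input_path, records_per_file, prefix)
  file_count = 0
  out = nil
  File.open(input_path) do |input|
    header = nil
    count = 0
    each_record(input) do |record|
      if header.nil?
        header = record         # first record is the header row
        next
      end
      if count % records_per_file == 0
        out&.close
        file_count += 1
        out = File.open("#{prefix}#{file_count}.csv", "w")
        out.write(header)
      end
      out.write(record)
      count += 1
    end
  end
  file_count
ensure
  out&.close
end
```

Quote counting only works when every quote character belongs to CSV quoting; a file that uses a different escape convention (for example backslash-escaped quotes) would need a real CSV parser instead.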
Source: https://stackoverflow.com/questions/25241018/how-to-split-large-csv-into-multiple-small-csv-using-linux