How to split a large CSV into multiple small CSVs using Linux?

南楼画角 submitted on 2021-02-08 06:38:42

Question


I need to split a large CSV file into multiple files by number of lines, using PHP and Linux.

The CSV contains:

"id","name","address"
"1","abc","this is test address1 which having multiple  newline
separators."
"2","abc","this is test address2
which having multiple newline  separators"
"3","abc","this is test address3.
which having multiple
newline separators."

I used the Linux command: split -l 5000 testfile

But it cannot split the CSV correctly. The address field contains multiple newline characters, so split cuts the file in the middle of a record.

I've also tried to use PHP:

$inputFile = 'filename.csv';
$outputFile = "outputfile";
$splitSize = 5000;
$in = fopen($inputFile, 'r');
$header = fgetcsv($in);
$rowCount = 0;
$fileCount = 1;

while (!feof($in)) { 
    if (($rowCount % $splitSize) == 0) {
        if ($rowCount > 0) {
            fclose($out);
        }   
        $filename = $outputFile . $fileCount++;
        $out = fopen($filename .'.csv', 'w');
        chmod($filename . '.csv', 0777); // octal mode; path must match the file just opened
        fputcsv($out, $header);
    }   
    $data = fgetcsv($in);
    if ($data) {
        fputcsv($out, $data);
        $rowCount++;
    }   
}
fclose($out);

How can I resolve this problem?


Answer 1:


Using Ruby:

ruby -e 'require "csv"
        f = ARGV.shift
        CSV.foreach(f).with_index{ |e, i|
            File.write("#{f}.#{i}", CSV.generate_line(e, force_quotes: true))
        }' file.csv
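
The one-liner above writes one file per row, and the header row ends up in a file of its own. A minimal sketch of a chunked variant follows, assuming Ruby's standard csv library. The method name split_csv, the rows_per_file parameter and the "<input>.<n>.csv" output naming are hypothetical choices for illustration, not part of the original answer:

```ruby
require "csv"

# Hypothetical helper: split `path` into files of at most `rows_per_file`
# records each, repeating the header row at the top of every output file.
# Records are counted by the CSV parser, so a quoted field containing
# newlines still counts as a single record.
# Returns the list of output file names.
def split_csv(path, rows_per_file)
  header = nil
  out = nil
  file_count = 0
  row_count = 0
  outputs = []

  CSV.foreach(path) do |row|
    if header.nil?
      header = row # the first record is treated as the header
      next
    end
    if (row_count % rows_per_file).zero?
      out&.close
      file_count += 1
      name = "#{path}.#{file_count}.csv" # assumed naming scheme
      out = CSV.open(name, "w", force_quotes: true)
      out << header
      outputs << name
    end
    out << row
    row_count += 1
  end
  outputs
ensure
  out&.close
end
```

It could be called as split_csv("file.csv", 5000). Only one record is held in memory at a time, which should matter for large inputs.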

PHP:

<?php
    $inputFile = 'file.csv';
    $outputFile = 'file.out';
    $splitSize = 1;
    if (($in = fopen($inputFile, 'r'))) {
        $header = fgetcsv($in);
        $rowCount = 0;
        $fileCount = 0;
        while (($data = fgetcsv($in))) {
            if (($rowCount % $splitSize) == 0) {
                if ($rowCount > 0) {
                    fclose($out);
                }
                $filename = $outputFile . ++$fileCount . '.csv';
                $out = fopen($filename, 'w');
                chmod($filename, 0755); // mode must be an octal literal
                fputcsv($out, $header);
            }
            fputcsv($out, $data);
            $rowCount++;
        }
        fclose($out);
    }
?>



Answer 2:


Linux has a great little utility called split, which can take a file and split it into chunks of whatever size you want, e.g. 100-line chunks.

For CSV files, however, each chunk generally needs to include the header row, and the split command has no option for that. With a little more code you can achieve it:

tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done
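
Note that split -l counts physical lines, so a quoted field that contains newlines can still be cut in two by the script above. A rough way to check whether a given file is affected is sketched here in Ruby with the standard csv library. The method name multiline_fields? is hypothetical, and the check assumes the file is otherwise well-formed CSV:

```ruby
require "csv"

# Hypothetical check: true when the file has more physical lines than CSV
# records, i.e. at least one quoted field spans several lines and a purely
# line-based split could cut a record in half.
def multiline_fields?(path)
  physical_lines = File.foreach(path).count
  records = CSV.foreach(path).count
  physical_lines > records
end
```

If this returns false, a line-based split should be safe. Otherwise a CSV-aware splitter is the safer choice.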


Source: https://stackoverflow.com/questions/25241018/how-to-split-large-csv-into-multiple-small-csv-using-linux
