Convert large CSV files to JSON

Submitted by ∥☆過路亽.° on 2019-12-05 00:13:03

Question


I don't mind if this is done with a separate program, with Excel, in NodeJS or in a web app.

It's exactly the same problem as described here:

Large CSV to JSON/Object in Node.js

It seems that the OP didn't get that answer to work (yet accepted it anyway?). I've tried working with it but can't seem to get it to work either.

In short: I'm working with a ~50,000 row CSV and I want to convert it to JSON. I've tried just about every online "csv to json" webapp out there, and they all crash with a dataset this large.

I've tried many Node CSV to JSON modules but, again, they all crash. The csvtojson module seemed promising, but I got this error: FATAL ERROR: JS Allocation failed - process out of memory.

What on earth can I do to get this data in a useable format? As above, I don't mind if it's an application, something that works within Excel, a webapp or a Node module, so long as I either get a .JSON file or an object that I can work with within Node.

Any ideas?


Answer 1:


You mentioned the csvtojson module above; it is an open source project that I maintain.

I am sorry it did not work for you. The failure was caused by a bug that was fixed several months ago. I also added a few lines to the README for your scenario. Please check out Process Big CSV File in Command Line.

Please make sure you have the latest csvtojson release. (Currently it is 0.2.2)

You can update it by running:

npm install -g csvtojson

After you've installed the latest csvtojson, you just need to run:

csvtojson [path to bigcsvdata] > converted.json

This streams data from the CSV file. Alternatively, to stream data from another application:

cat [path to bigcsvdata] | csvtojson > converted.json

Both commands produce the same output.

I have manually tested it with a CSV file of over 3 million records and it works without an issue.

I believe you just need a simple tool. The purpose of the library is to relieve stress like this. Please do let me know if you run into any problems in the future so that I can fix them promptly.




Answer 2:


The npm csv package can process a CSV stream without having to store the full file in memory. You'll need to install Node.js and csv (npm install csv). Here is a sample application that writes JSON objects to a file:

var csv = require('csv');
var fs = require('fs');
var f = fs.createReadStream('Fielding.csv');
var w = fs.createWriteStream('out.txt');

w.write('[');

csv()
  .from.stream(f, {columns: true})
  .transform(function(row, index) {
    // Prefix every row except the first with a separator.
    return (index === 0 ? '' : ',\n') + JSON.stringify(row);
  })
  .to.stream(w, {columns: true, end: false})
  .on('end', function() {
    w.write(']');
    w.end();
  });

Please note the columns option, which is needed to keep the column names in the JSON objects (otherwise you'll get a simple array), and the end option set to false, which tells node not to close the file stream when the CSV stream closes: this allows us to add the final ']'. The transform callback provides a way for your program to hook into the data stream and transform the data before it is written to the next stream.
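
To make those two points concrete without depending on the csv package, here is a minimal plain-JavaScript sketch. The header names and record values are hypothetical sample data, and toObject and serialize are illustrative helpers, not part of any library. It shows a record serialized as a plain array versus as an object keyed by column name, and how the "separator before every row but the first" trick produces a valid JSON array once the brackets are added.

```javascript
// Hypothetical sample data, made up for illustration only.
const header = ['playerID', 'yearID'];
const records = [['a01', '1989'], ['b02', '1990']];

// Without column names, a record serializes as a plain array.
const asArray = JSON.stringify(records[0]);

// With column names, each record becomes an object keyed by the header.
function toObject(names, record) {
  const row = {};
  names.forEach((name, i) => { row[name] = record[i]; });
  return row;
}
const asObject = JSON.stringify(toObject(header, records[0]));

// Same separator logic as the transform callback in the answer:
// nothing before the first row, ',\n' before every later row.
function serialize(row, index) {
  return (index === 0 ? '' : ',\n') + JSON.stringify(row);
}

// The surrounding brackets are written separately, as in the answer.
const body = records.map((r) => toObject(header, r)).map(serialize).join('');
const json = '[' + body + ']';
```

Because the separator is written before each row rather than after it, no trailing comma is left in front of the closing ']'.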




Answer 3:


When you work with such a large dataset, you need to process it as a stream rather than load > convert > save, because loading something this big at once may not fit in memory.

The CSV format itself is very simple and varies little between dialects, so you can write a simple parser yourself. JSON is usually simple too, and can easily be written out line by line without loading the whole thing.

  1. createReadStream from the CSV file.
  2. createWriteStream for the new JSON file.
  3. In on('data', ...), process the data read so far: append it to a buffer string and extract any complete lines.
  4. Whenever complete lines are available from the readStream, convert them to JSON objects and push them into the writeStream of the new JSON file.

This is quite doable with pipe and a custom stream in the middle that converts lines into objects to be written to the new file.

This approach avoids loading the whole file into memory; instead it processes the file gradually: load a part, process it, write it out, and move on.
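
The steps above could be sketched roughly as follows, using only Node's built-in modules. This is a minimal sketch under stated assumptions, not a full CSV parser: it assumes the first line is a header and that fields contain no quotes, embedded commas, or embedded newlines. The names createConverter, csvToJsonTransform, and convertFile are hypothetical helpers introduced for illustration.

```javascript
const fs = require('fs');
const { Transform, pipeline } = require('stream');
const { StringDecoder } = require('string_decoder');

// Core logic: accepts text in arbitrary pieces, returns JSON text for
// every complete line seen so far. Assumes naive comma-separated fields.
function createConverter() {
  let buffer = '';     // text of the current, possibly incomplete, line
  let headers = null;  // column names taken from the first line
  let count = 0;       // rows emitted so far
  let opened = false;  // whether the opening '[' has been emitted

  function convertLine(line) {
    if (line.endsWith('\r')) line = line.slice(0, -1); // tolerate CRLF
    if (line === '') return '';
    const fields = line.split(',');
    if (headers === null) { headers = fields; return ''; }
    const row = {};
    headers.forEach((name, i) => { row[name] = fields[i]; });
    return (count++ === 0 ? '' : ',\n') + JSON.stringify(row);
  }

  function withOpening(out) {
    if (opened) return out;
    opened = true;
    return '[' + out;
  }

  return {
    push(text) {
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // the last piece may be an incomplete line
      return withOpening(lines.map((l) => convertLine(l)).join(''));
    },
    end() {
      const out = convertLine(buffer); // last line may lack a newline
      buffer = '';
      return withOpening(out) + ']';
    },
  };
}

// The "custom stream in the middle": wraps the converter in a Transform.
function csvToJsonTransform() {
  const converter = createConverter();
  const decoder = new StringDecoder('utf8'); // handles split multi-byte chars
  return new Transform({
    transform(chunk, _encoding, done) {
      done(null, converter.push(decoder.write(chunk)));
    },
    flush(done) {
      done(null, converter.push(decoder.end()) + converter.end());
    },
  });
}

// Read stream -> converter -> write stream; done(err) is called at the end.
function convertFile(csvPath, jsonPath, done) {
  pipeline(
    fs.createReadStream(csvPath),
    csvToJsonTransform(),
    fs.createWriteStream(jsonPath),
    done
  );
}
```

A call such as convertFile('data.csv', 'result.json', function (err) { if (err) throw err; }) would run the conversion, where both file names are placeholders. For real-world CSV with quoted fields, a tested parser module is likely a safer choice than the naive split used here.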




Answer 4:


You can try using OpenRefine (formerly Google Refine).

Import your CSV file, then export it, editing the export template to produce JSON.

http://multimedia.journalism.berkeley.edu/tutorials/google-refine-export-json/




Answer 5:


This should do the job.

npm i --save csv2json fs-extra   # install the modules

const csv2json = require('csv2json');
const fs = require('fs-extra');

// Stream the CSV through the converter into the output file.
const source = fs.createReadStream(__dirname + '/data.csv');
const output = fs.createWriteStream(__dirname + '/result.json');

source
  .pipe(csv2json())
  .pipe(output);



Answer 6:


  • Use Python from the command line

This converts a numbered set of CSV files in a folder to JSON files, written without line breaks (no \n\r).

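import csv
import json

# Convert 9447440523-Huge1.csv ... 9447440523-Huge10.csv to 1.json ... 10.json.
for n in range(1, 11):
    csv_name = '9447440523-Huge' + str(n) + '.csv'
    json_name = str(n) + '.json'
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(csv_name, 'r', newline='') as src, open(json_name, 'w') as dst:
        dst.write('[')
        for i, row in enumerate(csv.DictReader(src)):
            if i > 0:
                dst.write(',')  # separator before every row but the first
            json.dump(row, dst)
        dst.write(']')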


Source: https://stackoverflow.com/questions/18759516/convert-large-csv-files-to-json
