How can I split a large csv file (7GB) in Python?

我在风中等你 2020-12-23 22:34

I have a 7GB csv file which I'd like to split into smaller chunks, so it is readable and faster for analysis in Python on a notebook. I would like to grab a small part of it to work with first.

5 Answers
  • 2020-12-23 22:48

    You don't need Python to split a csv file. Using your shell:

    $ split -l 100 data.csv
    

    This splits data.csv into chunks of 100 lines each.

  • 2020-12-23 22:57

    I had to do a similar task, and used the pandas package:

    for i,chunk in enumerate(pd.read_csv('bigfile.csv', chunksize=500000)):
        chunk.to_csv('chunk{}.csv'.format(i), index=False)
    
  • 2020-12-23 22:58

    See the Python docs on file objects (the object returned by open(filename)): you can choose to read a specified number of bytes, or use readline to work through one line at a time.
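    As a sketch of that approach, here is one way to split a file using only the file object and itertools, repeating the header in every chunk so each piece stays readable on its own (all filenames and sizes here are made up):

```python
import itertools

# Demo input standing in for the real 7GB file.
with open('data.csv', 'w') as f:
    f.write('col1,col2\n')
    for i in range(25):
        f.write('{},{}\n'.format(i, i))

lines_per_chunk = 10

with open('data.csv') as infile:
    header = infile.readline()  # keep the header for every chunk
    for chunkno in itertools.count(1):
        # islice pulls at most lines_per_chunk lines without
        # reading the rest of the file into memory.
        lines = list(itertools.islice(infile, lines_per_chunk))
        if not lines:
            break
        with open('part-{}.csv'.format(chunkno), 'w') as out:
            out.write(header)
            out.writelines(lines)
```

    With 25 data rows and 10 lines per chunk, this produces part-1.csv and part-2.csv with 10 rows each, and part-3.csv with the remaining 5.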

  • 2020-12-23 23:08

    I agree with @jonrsharpe: readline should be able to read one line at a time, even for big files.

    If you are dealing with big csv files, might I suggest using pandas.read_csv. I often use it for the same purpose and always find it awesome (and fast). It takes a bit of time to get used to the idea of DataFrames, but once you get over that, it speeds up large operations like yours massively.

    Hope it helps.

  • 2020-12-23 23:09

    Maybe something like this?

    #!/usr/bin/env python3
    
    import csv
    
    divisor = 10  # rows per output file
    
    outfileno = 1
    outfile = None
    writer = None
    
    # newline='' is the csv module's recommended mode for csv files
    with open('big.csv', 'r', newline='') as infile:
        for index, row in enumerate(csv.reader(infile)):
            if index % divisor == 0:
                if outfile is not None:
                    outfile.close()
                outfilename = 'big-{}.csv'.format(outfileno)
                outfile = open(outfilename, 'w', newline='')
                outfileno += 1
                writer = csv.writer(outfile)
            writer.writerow(row)
    
    # close the last chunk, which the loop above never closes
    if outfile is not None:
        outfile.close()
    