How can I split a large CSV file into smaller files based on common records using Python?

自闭症网瘾萝莉.ら submitted on 2019-12-02 13:39:23

For the data you have provided, the following script will produce your requested output files. It performs this operation on ALL CSV files found in the current folder:

from itertools import groupby
import glob
import csv
import os

def remove_unwanted(rows):
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

output_folder = 'temp'  # make sure this folder exists

# Search for ALL CSV files in the current folder
for csv_filename in glob.glob('*.csv'):
    with open(csv_filename) as f_input:
        basename = os.path.splitext(os.path.basename(csv_filename))[0]      # e.g. bigfile

        csv_input = csv.reader(f_input)
        header = next(csv_input)
        # Create a list of entries with '0' in last column
        id_list = remove_unwanted(row for row in csv_input if row[7] == '0')
        f_input.seek(0)     # Go back to the start
        header = remove_unwanted([next(csv_input)])

        for k, g in groupby(csv_input, key=lambda x: x[1]):
            if k == '':
                break

            # Format an output file name in the form 'bigfile_53.csv'
            file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, k))

            with open(file_name, 'wb') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerows(header)
                csv_output.writerows(remove_unwanted(g))
                csv_output.writerows(id_list)

This creates the files bigfile_53.csv, bigfile_59.csv and bigfile_61.csv in an output folder called temp. For example, bigfile_53.csv will appear as follows:

Entries containing the string 'NULL' will be converted to an empty string, and the first two columns will be removed (as per OP's comment).
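To make that conversion concrete, here is a small check of remove_unwanted on made-up rows. The sample values are hypothetical and are not taken from the question's data:

```python
def remove_unwanted(rows):
    # Drop the first two columns and replace any 'NULL' cell with ''
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

# Hypothetical sample rows, for illustration only
sample = [
    ['a', '53', 'x', 'NULL', 'y'],
    ['b', '59', 'NULL', 'z', 'NULL'],
]

cleaned = remove_unwanted(sample)
print(cleaned)  # [['x', '', 'y'], ['', 'z', '']]
```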

Tested in Python 2.7.9
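The script opens its output files in 'wb' mode, which suits Python 2. Under Python 3 the csv module expects a text-mode file opened with newline=''. A minimal sketch of the write step under that assumption follows; the helper name write_group and the sample rows are hypothetical, not part of the original script:

```python
import csv
import os
import tempfile

def write_group(file_name, header, rows):
    # Python 3: text mode with newline='' lets the csv module control line endings
    with open(file_name, 'w', newline='') as f_output:
        csv_output = csv.writer(f_output)
        csv_output.writerows(header)   # header is a list containing one row
        csv_output.writerows(rows)

# Hypothetical usage, writing into a temporary folder
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'bigfile_53.csv')
write_group(out_path, [['c', 'd']], [['1', ''], ['2', '3']])

# Read the file back to confirm what was written
with open(out_path, newline='') as f_check:
    written = list(csv.reader(f_check))
```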

You should look into the csv module. You can read your input file line by line and group the rows by the value in the BB column. This is easy to do with a dictionary whose keys are the values in the BB column and whose values are lists of the rows sharing that value. You can then write each list to its own CSV file using the csv module.
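The dictionary approach might look roughly like the sketch below. It assumes Python 3 and a header row containing a column named BB. The function name group_rows, the other column names, the sample data and the output file naming are all hypothetical:

```python
import csv
import io
import os
import tempfile
from collections import defaultdict

def group_rows(csv_file, key_column):
    """Return (header, groups), where groups maps each key value to its rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    key_index = header.index(key_column)
    groups = defaultdict(list)
    for row in reader:
        groups[row[key_index]].append(row)
    return header, groups

# Hypothetical in-memory input standing in for the large CSV file
sample = io.StringIO("AA,BB,CC\n1,53,x\n2,59,y\n3,53,z\n")
header, groups = group_rows(sample, 'BB')

# Write one file per distinct BB value into a temporary folder
out_dir = tempfile.mkdtemp()
for key, rows in groups.items():
    path = os.path.join(out_dir, 'bigfile_{}.csv'.format(key))
    with open(path, 'w', newline='') as f_output:
        writer = csv.writer(f_output)
        writer.writerow(header)
        writer.writerows(rows)
```

Unlike itertools.groupby, which only groups consecutive rows with the same key, a dictionary collects matching rows wherever they appear in the file, at the cost of holding all rows in memory.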
