Delete rows in a CSV based on a specific column value (Python)

Submitted by 我的梦境 on 2021-02-10 18:24:50

Question


I have a large CSV file with the following header columns: id, type, state, location, number of students

and the following values:

124, preschool, Pennsylvania, Pittsburgh, 1242
421, secondary school, Ohio, Cleveland, 1244
213, primary school, California, Los Angeles, 3213
155, secondary school, Pennsylvania, Pittsburgh, 2141
etc...

The file is not ordered, and I want a new CSV file that contains all the schools with more than 2000 students.

The answers I found dealt with ordered CSV files, or with splitting a file after a specific number of rows.


Answer 1:


Here's a solution using the csv module:

import csv

with open('fin.csv', 'r') as fin, open('fout.csv', 'w', newline='') as fout:

    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=True)
    writer = csv.writer(fout, delimiter=',')

    # write headers
    writer.writerow(next(reader))

    # iterate and write rows based on condition
    for i in reader:
        if int(i[-1]) > 2000:
            writer.writerow(i)

Result:

id,type,state,location,number of students
213,primary school,California,Los Angeles,3213
155,secondary school,Pennsylvania,Pittsburgh,2141
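
The solution above picks the student count by position (the last field). As a hedged variant, the sketch below looks the column up by its header name using csv.DictReader, and skips rows whose value is missing or not an integer. The function name filter_by_column and its parameters are illustrative assumptions, not part of the original answer.

```python
import csv

def filter_by_column(src, dst, column="number of students", threshold=2000):
    """Copy rows whose `column` value exceeds `threshold`; return rows written."""
    written = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        # skipinitialspace drops the blank after each comma, in header and data
        reader = csv.DictReader(fin, skipinitialspace=True)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            try:
                value = int(row[column])
            except (TypeError, ValueError):
                continue  # missing or non-numeric value: skip the row
            if value > threshold:
                writer.writerow(row)
                written += 1
    return written
```

Calling filter_by_column('fin.csv', 'fout.csv') on the sample data should write the same two rows as shown in the result above. The sketch assumes the input has a header row containing the named column.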



Answer 2:


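In case you just want to read the file and avoid any other processing, you can use a regex (assuming the value is in the last column and is a positive integer). The header line does not match the pattern, so it is not copied:

import re

# Last field is an integer above 2000: 2001-9999, or any number with 5+ digits
pattern = re.compile(r'(?<!\d)(?!2000\s*$)(?:[2-9]\d{3}|[1-9]\d{4,})\s*$')

# Text mode ('w'), since the lines read from Test.txt are str, not bytes
with open('Test.txt') as f, open('Test1.txt', 'w') as f1:
    for line in f:
        if pattern.search(line):
            f1.write(line)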

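The same filter can be written in bash:

# Last field is an integer above 2000: 2001-9999, or any number with 5+ digits
K='(^|[^0-9])(200[1-9]|20[1-9][0-9]|2[1-9][0-9]{2}|[3-9][0-9]{3}|[1-9][0-9]{4,})[[:space:]]*$'
while IFS= read -r line; do
  if [[ $line =~ $K ]]; then printf '%s\n' "$line"; fi
done <Test.txt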
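
Matching digits with a pattern is easy to get subtly wrong at the boundaries (a value of exactly 2000, or values with five or more digits). As a hedged alternative that still avoids the csv module, the sketch below splits off the text after the last comma and compares it numerically. The helper names last_field_above and filter_lines are illustrative, and the approach assumes the last field never contains a quoted comma.

```python
def last_field_above(line, threshold=2000):
    """Return True if the text after the last comma is an integer > threshold."""
    last = line.rsplit(",", 1)[-1].strip()
    try:
        return int(last) > threshold
    except ValueError:
        return False  # header line, blank line, or non-numeric field

def filter_lines(lines, threshold=2000):
    """Keep only the lines whose last field exceeds the threshold."""
    return [line for line in lines if last_field_above(line, threshold)]
```

Because the comparison is numeric, changing the threshold only means passing a different number rather than rewriting a pattern.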


Source: https://stackoverflow.com/questions/50623009/delete-rows-in-csv-based-on-specific-column-value-python
