Fix numbering on CSV files that have deleted lines

Question

I have a bunch of CSV files that I have edited to remove every line that has 'DIF' in it. The problem I realized later is that the count in the file stays the same as before. Here is an example of a CSV file before I edit it:

Name    bunch of stuff                          
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
Count   11                           
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N6,DIF
N7,DIF
N8,DIF
N9,DIF
N10,Heel
N11,Tee

This is how the output CSV looks. I want the number next to 'Count' to equal the number of rows now in the 'ITEM' column, and I want the values in the 'NUMBER' column to be sequential.

Name    bunch of stuff                          
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
header stuff    stuff                           
Count   11                           
NUMBER,ITEM
N1,Shoe
N2,Heel
N3,Tee
N4,Polo
N5,Sneaker
N10,Heel
N11,Tee

Here is my current code for removing the DIF lines. It does that part correctly, but it leaves the Count value and the NUMBER column wrong, as described above.

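import csv
import glob
import os

fns = glob.glob('*.csv')  # goes through every CSV file in the directory
os.makedirs('out', exist_ok=True)  # make sure the output folder exists

for fn in fns:
    with open(fn, newline='') as src, \
         open(os.path.join('out', fn), 'w', newline='') as f:
        reader = csv.reader(src)
        w = csv.writer(f)
        for row in reader:
            # remove DIF rows: compare whole cells, ignoring surrounding spaces
            if not any(cell.strip() == 'DIF' for cell in row):
                w.writerow(row)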

I've tried a few small things to fix it, but I am fairly new to programming and nothing I try seems to do much. Any help would be appreciated.

Thank You
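To see the mismatch concretely in an output file, a small check can compare the number after 'Count' with the number of item rows actually present. This is a hypothetical helper, not from the original post, and it assumes the line formats shown in the example above: 'Count' followed by a number, and item rows of the form `N<digits>,<item>`.

```python
import re

def declared_and_actual_count(lines):
    """Return (declared, actual): the number after 'Count' and the number of N-rows.

    Hypothetical helper; assumes the Count line looks like 'Count   11'
    and item rows look like 'N<digits>,<item>'.
    """
    declared = None
    actual = 0
    for line in lines:
        if line.startswith('Count'):
            match = re.search(r'\d+', line)
            if match:
                declared = int(match.group())
        elif re.match(r'N\d+,', line):  # 'NUMBER,ITEM' does not match
            actual += 1
    return declared, actual

# the filtered output from the question: Count still says 11
output_lines = [
    'Name    bunch of stuff',
    'Count   11',
    'NUMBER,ITEM',
    'N1,Shoe', 'N2,Heel', 'N3,Tee', 'N4,Polo', 'N5,Sneaker',
    'N10,Heel', 'N11,Tee',
]
print(declared_and_actual_count(output_lines))  # (11, 7)
```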

Answer 1:

If you need to update the count, you have to read the file twice: first to count the rows you are keeping, then to write them out. You can keep a separate counter to renumber the first column as you write the kept rows:

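import csv
import glob
import os
import re

# matches values like N1 or N10 (but not the 'NUMBER' header)
numbered = re.compile(r'N\d+').match

def is_dif(row):
    # True if any cell is exactly 'DIF', ignoring surrounding spaces
    return any(r.strip() == 'DIF' for r in row)

fns = glob.glob('*.csv')  # every CSV file in the directory
os.makedirs('out', exist_ok=True)

for fn in fns:
    # first pass: count the numbered rows that will be kept
    with open(fn, newline='') as src:
        count = sum(1 for row in csv.reader(src)
                    if row and not is_dif(row) and numbered(row[0]))

    # second pass: filter, renumber, and write
    with open(fn, newline='') as src, \
         open(os.path.join('out', fn), 'w', newline='') as f:
        counter = 0
        w = csv.writer(f)
        for row in csv.reader(src):
            if row and is_dif(row):
                continue  # remove DIF rows entirely
            if row and 'Count' in row[0].strip():
                row = ['Count', count]
            elif row and numbered(row[0]):
                counter += 1
                row[0] = 'N%d' % counter
            w.writerow(row)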
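If the files are small enough to fit in memory, an alternative to reading each file twice is to read the rows into a list once, then filter, count, and renumber that list. The sketch below assumes that, and also assumes the Count line is a single whitespace-padded cell (as it appears in the question); `fix_rows` and `fix_file` are hypothetical names. Unlike `row = ['Count', count]`, it substitutes only the first number in the Count cell, so the original padding is kept.

```python
import csv
import re

NUMBERED = re.compile(r'N\d+$')  # N1, N10, ... but not the 'NUMBER' header

def is_dif(row):
    # True if any cell is exactly 'DIF', ignoring surrounding spaces
    return any(cell.strip() == 'DIF' for cell in row)

def fix_rows(rows):
    """Drop DIF rows, rewrite the Count value, and renumber the N column.

    rows is a list of lists, as produced by csv.reader.
    """
    kept = [row for row in rows if not is_dif(row)]
    n_items = sum(1 for row in kept if row and NUMBERED.match(row[0].strip()))

    fixed = []
    counter = 0
    for row in kept:
        if row and row[0].startswith('Count'):
            # replace only the first number, keeping the cell's padding
            row = [re.sub(r'\d+', str(n_items), row[0], count=1)] + row[1:]
        elif row and NUMBERED.match(row[0].strip()):
            counter += 1
            row = ['N%d' % counter] + row[1:]
        fixed.append(row)
    return fixed

def fix_file(src_path, dst_path):
    # read every row once, fix them in memory, then write them out
    with open(src_path, newline='') as src:
        rows = list(csv.reader(src))
    with open(dst_path, 'w', newline='') as dst:
        csv.writer(dst).writerows(fix_rows(rows))

rows = [['Count   11   '], ['NUMBER', 'ITEM'], ['N1', 'Shoe'],
        ['N6', 'DIF'], ['N10', 'Heel']]
print(fix_rows(rows))
# [['Count   2   '], ['NUMBER', 'ITEM'], ['N1', 'Shoe'], ['N2', 'Heel']]
```

The trade-off is memory for simplicity: one read per file and no need to keep the counting and writing passes in sync.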

Answer 2:

Your question is a little unclear, but I think you want each N value to be updated to match its position in the filtered list. I am assuming you are on Windows.

Since it appears that you are not using row dictionaries, I am going to do it a little differently, with plain string handling:

import glob

my_files = glob.glob('c:\\thedirectory\\orsubdirectorywhereyourfilesare\\*.csv')
for each_file in my_files:
    with open(each_file) as inref:
        initial = inref.readlines()
    no_diff = [row for row in initial if 'DIF' not in row]
    # the items are every line after the 'NUMBER,ITEM' header
    newCount = len(no_diff) - no_diff.index('NUMBER,ITEM\n') - 1  # you might have to tweak this
    outList = []
    counter = 0
    for row in no_diff:
        if 'Count' in row:
            new_row = 'Count ' + str(newCount) + '\n'  # '\n' is a newline character
            outList.append(new_row)
        elif row.startswith('NUMBER'):
            outList.append(row)
        elif row.startswith('Name'):  # checked before 'N' so it is not renumbered
            outList.append(row)
        elif row.startswith('N'):
            row_end = row.split(',')[-1]  # the item text, with its newline
            row_begin = 'N' + str(counter + 1)
            new_row = row_begin + ',' + row_end
            outList.append(new_row)
            counter += 1
        else:
            outList.append(row)
    # open for writing ('w'); note that this overwrites the original file
    with open(each_file, 'w') as outref:
        outref.writelines(outList)

I copied this into a file (each line is shown as a Python string):

'Name    bunch of stuff                          \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'Count   11                           \n'
'NUMBER,ITEM\n'
'N1,Shoe\n'
'N2,Heel\n'
'N3,Tee\n'
'N4,Polo\n'
'N5,Sneaker\n'
'N6,DIF\n'
'N7,DIF\n'
'N8,DIF\n'
'N9,DIF\n'
'N10,Heel\n'
'N11,Tee'

I ran the code above (which I had to tweak) and got this result:

'Name    bunch of stuff                          \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'header stuff    stuff                           \n'
'Count 7\n'
'NUMBER,ITEM\n'
'N1,Shoe\n'
'N2,Heel\n'
'N3,Tee\n'
'N4,Polo\n'
'N5,Sneaker\n'
'N6,Heel\n'
'N7,Tee'
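One caveat on `'DIF' not in row`: it is a substring test on the whole line, so an item such as a hypothetical `N12,DIFFUSER` would be dropped as well. If that could happen, a slightly stricter sketch compares whole comma-separated fields instead (`is_dif_line` is a hypothetical helper):

```python
def is_dif_line(line):
    """True if any comma-separated field is exactly 'DIF', ignoring whitespace."""
    return any(field.strip() == 'DIF' for field in line.split(','))

lines = ['N5,Sneaker\n', 'N6,DIF\n', 'N12,DIFFUSER\n']
no_diff = [row for row in lines if not is_dif_line(row)]
print(no_diff)  # ['N5,Sneaker\n', 'N12,DIFFUSER\n']
```

In the answer's code, this would replace the condition inside the `no_diff` list comprehension.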

The other approach here, and the one on your second question, are definitely more elegant, but elegance only comes after you really understand the code. In my opinion there are too many moving parts. You need to:

read a file
handle parts of the file
write it back out
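A minimal sketch of those three steps as separate functions, using only plain file and string handling. The function names are hypothetical, and the "handle" step here only drops lines containing 'DIF' as a stand-in for the real processing. Writing to a separate `out` folder rather than over the input is an assumption that makes mistakes easy to undo.

```python
import os
import tempfile

def read_lines(path):
    # step 1: read the file into a list of lines (newlines kept)
    with open(path) as f:
        return f.readlines()

def handle_lines(lines):
    # step 2: process the lines; this stand-in only drops 'DIF' lines
    return [line for line in lines if 'DIF' not in line]

def write_lines(path, lines):
    # step 3: write the lines out, creating the target folder if needed
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with open(path, 'w') as f:
        f.writelines(lines)

def process_file(src, dst):
    write_lines(dst, handle_lines(read_lines(src)))

# usage on a throwaway file in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'example.csv')
    with open(src, 'w') as f:
        f.write('N1,Shoe\nN2,DIF\nN3,Tee\n')
    dst = os.path.join(tmp, 'out', 'example.csv')
    process_file(src, dst)
    result = read_lines(dst)

print(result)  # ['N1,Shoe\n', 'N3,Tee\n']
```

Keeping the steps separate means each one can be tried on its own before they are chained together.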

If you add regular expressions and csv handling on top of that, you multiply the places where you can get into trouble. Those are great tools and I use them often, but not when you are just starting to learn how to program in Python. Otherwise, look at csv.DictReader if your header is not too messy.
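A rough sketch of the `csv.DictReader` route. It assumes the item table always starts at the first line beginning with `NUMBER,`. The idea is to keep the free-form preamble lines as plain text and hand only the table lines to `csv.DictReader`, which yields one dictionary per row keyed by the header names. `split_table` is a hypothetical helper, and the preamble's Count line would still need updating before everything is written back (for example, with `csv.DictWriter` for the table part).

```python
import csv

def split_table(text, header_prefix='NUMBER,'):
    """Return (preamble_lines, item_rows) for text whose table follows free-form lines."""
    lines = text.splitlines(keepends=True)
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(header_prefix)), None)
    if start is None:
        raise ValueError('no table header found')
    # DictReader takes the first line it sees ('NUMBER,ITEM') as the field names
    items = list(csv.DictReader(lines[start:]))
    return lines[:start], items

text = ('Name    bunch of stuff\n'
        'Count   3\n'
        'NUMBER,ITEM\n'
        'N1,Shoe\n'
        'N2,DIF\n'
        'N3,Tee\n')
preamble, items = split_table(text)

# filter by column name instead of position, then renumber
kept = [row for row in items if row['ITEM'].strip() != 'DIF']
for i, row in enumerate(kept, start=1):
    row['NUMBER'] = 'N%d' % i

print([(row['NUMBER'], row['ITEM']) for row in kept])  # [('N1', 'Shoe'), ('N2', 'Tee')]
```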

Source: https://stackoverflow.com/questions/18257999/fix-numbering-on-csv-files-that-have-deleted-lines

Tags

python

csv

glob