Question
I have a few .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is:
import os, fnmatch, linecache, csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1, row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass

workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
Answer 1:
I think your main issue is that you're writing to Excel and saving on every single line in the file, for every single file in the directory. I'm not sure how long it takes to actually write a value to Excel, but just moving the save out of the loop and saving only once everything has been added should cut some time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they aren't overly large you can probably do without it.
def batch_processing(file_name):
    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        reader = csv.reader(f)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^ You actually don't need to do this at all (though it is clever :)
        # You were using it to govern the loop, but the more Pythonic way
        # is as follows
        for line_no, line in enumerate(reader):
            # Split the line into two variables that hold val1 and val2
            val1, val2 = line
            print val1, val2  # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)
    # Doing this here saves the file once after you process an entire file.
    # You could save a bit more time and move it to after your walk statement -
    # that way, you save only once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
Source: https://stackoverflow.com/questions/13558411/optimizing-or-speeding-up-reading-from-xy-files-into-excel