Question
I have a few .xy files (2 columns with x and y values). I have been trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all these files). The code I have so far reads the files one by one, but it's extremely slow (about 20 seconds per file). I have quite a few .xy files, so the time adds up considerably. The code I have so far is:
import os, fnmatch, linecache, csv
from openpyxl import Workbook

wb = Workbook()
ws = wb.worksheets[0]
ws.title = "Sheet1"

def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1, row_count):
            data = linecache.getline(file_name, row)
            print data.strip().split()[1]
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])
            print file_name
            wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass

workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)
Any help is appreciated. Thanks.
Answer 1:
I think your main issue is that you're writing to Excel and saving on every single line in the file, for every single file in the directory. I'm not sure how long it takes to actually write a value to Excel, but just moving the save out of the loop and saving only once everything has been added should cut some time. Also, how large are these files? If they are massive, then linecache may be a good idea, but assuming they aren't overly large you can probably do without it.
def batch_processing(file_name):
    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        reader = csv.reader(f)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^ You actually don't need to do this at all (though it is clever :)
        # You were using it to govern the loop, but the more Pythonic way
        # is as follows
        for line_no, line in enumerate(reader):
            # Split the line into two variables that hold val1 and val2
            val1, val2 = line
            print val1, val2  # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)
    # Doing this here saves the file once after you process an entire file.
    # You could save a bit more time and move it to after your walk statement -
    # that way, you save only once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
Source: https://stackoverflow.com/questions/13558411/optimizing-or-speeding-up-reading-from-xy-files-into-excel