I'm making a Python script which parses data files. The parsed data is then sent to an Excel file. The data can be rather huge. I'm looking at 10 to 20 columns, but the nu…
Using COM to read data from an Excel file is an extreme waste of time. It's like killing flies with a tank. Take into account that win32com makes complicated calls through the Windows API which talk to Excel, retrieve the data and send it back to Python. Why do that when the information is already there in the file?
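For contrast, here is roughly what the COM route looks like (a minimal sketch using pywin32; the file path and sheet name are made up). Every Cells(...).Value access is a cross-process round trip to a live Excel instance:

import win32com.client

# Start (or attach to) an actual Excel process just to read a file
excel = win32com.client.Dispatch('Excel.Application')
wb = excel.Workbooks.Open(r'C:\data\large_file.xlsx')
ws = wb.Worksheets('big_data')

# Each cell access crosses the process boundary via COM
for row in range(1, 600):
    value = ws.Cells(row, 1).Value

wb.Close(False)  # don't save changes
excel.Quit()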
There are libraries that parse the Excel file directly, and as you can imagine they can easily be 100x faster, since there are no over-complex calls to the Windows API.
I've worked a lot, and successfully, with openpyxl, but there are other libraries out there that may be just as good or even better.
Just an example for huge data (it streams rows lazily instead of loading everything into memory):
from openpyxl import load_workbook

wb = load_workbook(filename='large_file.xlsx', read_only=True)
ws = wb['big_data']  # ws is now a read-only worksheet
for row in ws.iter_rows():  # iter_rows() yields rows lazily
    for cell in row:
        print(cell.value)
Equivalent methods are available for writing cells. You can even format them, although the formatting support is not (or at least used not to be) very complete.
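For big writes there is a matching streaming mode. A minimal sketch, assuming a recent openpyxl (the filename and data are made up):

from openpyxl import Workbook

# write_only mode streams rows to disk instead of building the sheet in memory
wb = Workbook(write_only=True)
ws = wb.create_sheet(title='big_data')

for i in range(100000):
    # in write-only mode you append whole rows; cells cannot be revisited
    ws.append([i, i * 2, 'row %d' % i])

wb.save('streamed.xlsx')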
EDIT
Example of how to write a large amount of data to an xlsx file:
from openpyxl import Workbook
from openpyxl.utils import get_column_letter

wb = Workbook()
dest_filename = r'empty_book.xlsx'
ws = wb.active
ws.title = "range names"
for col_idx in range(1, 40):
    col = get_column_letter(col_idx)
    for row in range(1, 600):
        ws['%s%d' % (col, row)] = '%s%d' % (col, row)  # write each cell by coordinate
ws = wb.create_sheet()
ws.title = 'Pi'
ws['F5'] = 3.14
wb.save(filename=dest_filename)