Openpyxl: optimizing cell search speed

Submitted by 白昼怎懂夜的黑 on 2019-11-27 16:21:12

Looping over a worksheet multiple times is inefficient. The search appears to get progressively slower because each loop uses more memory than the last. The cause is that last_row = FindXlCell("Cell[0,0]", last_row) makes the next search create new cells past the end of the existing rows: openpyxl creates cells on demand, because a row can be technically empty while the cells in it are still addressable. By the end of your script the worksheet has a total of 598000 rows, yet every search still starts from A1.

If you need to search a large file for text multiple times, it probably makes sense to scan the worksheet once and build a lookup table (a dictionary, called matrix below) keyed by the text, with the cell coordinates as the value.

Something like:

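# Scan the worksheet once; each lookup afterwards is a plain dictionary access.
matrix = {}
for row in ws:
    for cell in row:
        # A later cell with the same value overwrites the earlier entry.
        matrix[cell.value] = (cell.row, cell.col_idx)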

In a real-world example you'd probably want to use a defaultdict to be able to handle multiple cells with the same text.

This could be combined with read-only mode for a minimal memory footprint, unless, of course, you want to edit the file.
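
As a rough sketch of how the defaultdict and read-only suggestions might fit together: build_index and index_worksheet are hypothetical helper names, not part of openpyxl. The sketch assumes a version of openpyxl whose iter_rows accepts values_only=True (2.6 or later), and that rows are yielded starting from row 1, which is the default. Row and column numbers are derived by counting, so they are 1-based positions in the iteration rather than values read from cell objects.

```python
from collections import defaultdict


def build_index(rows):
    """Map each non-empty value to a list of (row, column) positions.

    `rows` is any iterable of rows of plain values, counted from 1.
    """
    index = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        for col_number, value in enumerate(row, start=1):
            if value is not None:  # skip empty cells
                index[value].append((row_number, col_number))
    return index


def index_worksheet(path, sheet_name=None):
    """Index one sheet of a workbook opened in read-only mode (hypothetical helper)."""
    # Imported here so build_index can be used and tested without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(path, read_only=True)
    try:
        ws = wb[sheet_name] if sheet_name else wb.active
        # values_only=True yields tuples of plain values instead of cell objects.
        return build_index(ws.iter_rows(values_only=True))
    finally:
        wb.close()  # a read-only workbook keeps the file open until closed
```

After one pass over the file, index.get("Cell[0,0]", []) returns every position holding that text. Using get rather than index["Cell[0,0]"] avoids adding an empty list for text that is not present.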
