I need to search the Excel sheet for cells containing some pattern. It takes more time than I can handle. The most optimized code I could write is below.
Looping over a worksheet multiple times is inefficient, and the search gets progressively slower because each pass uses more and more memory. The culprit is

    last_row = FindXlCell("Cell[0,0]", last_row)

which makes every subsequent search create new cells at the end of the rows: openpyxl creates cells on demand, because rows can be technically empty while the cells in them are still addressable. By the end of your script the worksheet holds a total of 598000 rows, yet you always start searching from A1.
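The on-demand behaviour is easy to demonstrate (a minimal sketch using nothing but openpyxl; the row number 1000 is arbitrary):

    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    print(ws.max_row)            # 1: a fresh sheet reports a single row
    ws.cell(row=1000, column=1)  # merely *reading* this cell creates it
    print(ws.max_row)            # 1000: the sheet now spans 1000 rows

Every search that runs past the previous end of the data therefore leaves the worksheet bigger, and the next scan from A1 has even more cells to walk.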
If you wish to search a large file for text multiple times, it would probably make sense to create a matrix keyed by the text, with the coordinates as the values. Something like:
    from openpyxl import load_workbook

    wb = load_workbook("file.xlsx")  # path is a placeholder
    ws = wb.active

    matrix = {}
    for row in ws:
        for cell in row:
            matrix[cell.value] = (cell.row, cell.col_idx)
In a real-world example you'd probably want to use a defaultdict to be able to handle multiple cells containing the same text, as sketched below.
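A minimal sketch of that variant, reusing the ws worksheet from above; the search string "Cell[0,0]" is the one from the question:

    from collections import defaultdict

    matrix = defaultdict(list)
    for row in ws:
        for cell in row:
            if cell.value is not None:  # skip empty cells
                matrix[cell.value].append((cell.row, cell.col_idx))

    # all coordinates whose cell contains "Cell[0,0]"
    print(matrix["Cell[0,0]"])

After the single pass that builds the dictionary, every lookup is a constant-time operation instead of another full scan of the sheet.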
This could be combined with read-only mode for a minimal memory footprint, unless, of course, you want to edit the file.
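A sketch of building the same lookup in read-only mode (the filename is a placeholder; in read-only mode cells expose a numeric column attribute rather than col_idx):

    from collections import defaultdict
    from openpyxl import load_workbook

    wb = load_workbook("big_file.xlsx", read_only=True)  # placeholder path
    ws = wb.active

    matrix = defaultdict(list)
    for row in ws.iter_rows():
        for cell in row:
            if cell.value is not None:
                matrix[cell.value].append((cell.row, cell.column))

    wb.close()  # read-only workbooks keep the file handle open until closed

Read-only mode streams rows instead of building the whole cell tree in memory, which is why the footprint stays small, but the workbook cannot be modified or saved.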