Optimizing performance for ClosedXML loops and row deletion

最后都变了 - submitted on 2020-05-17 06:11:51

Question


I'm reading an Excel file and looping through its rows, deleting those that meet a condition:

using (var wb = new XLWorkbook(path))
{
    var ws = wb.Worksheet(sheet);
    int deleted = 0;
    for (int row_i = 2; row_i <= ws.LastRowUsed().RowNumber(); row_i++)
    {
        ExcelRow row = new ExcelRow(ws.Row(row_i-deleted));
        row.styleCol = header.styleCol;
        K key = keyReader(row);
        if (!writeData(row,dict[key])) deleted++;
    }
    wb.Save();
}

The code is very slow on a file with thousands of rows, even when nothing is deleted, and also when hundreds of rows must be deleted.


Answer 1:


First, please read the speed rant: https://ericlippert.com/2012/12/17/performance-rant/

As for optimisation potential:

The bottleneck should be the disk. Unless you have something like a RAID 0 of SSDs, or some serious computation in keyReader or those dictionaries, there is no way the CPU will be a relevant factor. So the most important thing is to never retrieve the same value twice.

If you want to eliminate the computation time, you could implement some deferred background loading of the next column. You should easily be able to replace direct access with an enumerator. This will reduce the execution time to roughly the speed of the disk.
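As a rough illustration of the "enumerator instead of direct access" idea, the sketch below wraps the worksheet's used rows in a lazy iterator. The class name RowEnumeration, the method DataRows and the firstDataRow parameter are made up for this example, and it only shows enumerator-style access: it does not implement any background loading. Rows should not be deleted while the sequence is being enumerated.

```csharp
using System.Collections.Generic;
using ClosedXML.Excel;

public static class RowEnumeration
{
    // Hypothetical helper: lazily yields the used rows at or below firstDataRow,
    // so the caller never indexes rows by number or re-queries the last row.
    public static IEnumerable<IXLRow> DataRows(IXLWorksheet ws, int firstDataRow = 2)
    {
        foreach (IXLRow row in ws.RowsUsed())
        {
            if (row.RowNumber() >= firstDataRow)
                yield return row;
        }
    }
}
```

A caller would then write foreach (var row in RowEnumeration.DataRows(ws)) { ... } in place of the indexed for loop.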




Answer 2:


There are two important optimizations to make. The first is trivial but has a great impact: store the last row number in a variable before the loop, because the call that computes it is more expensive than you might expect, and the loop condition otherwise repeats it on every iteration.

int lastrow = ws.LastRowUsed().RowNumber();
for (int row_i = 2; row_i <= lastrow; row_i++)

The second is a bit more involved. It concerns the many slow row/cell shifts (XLShiftDeletedCells.ShiftCellsUp) that occur when rows are deleted one at a time instead of as a single range. As a workaround, do not delete the row inside writeData. As a consequence, the row index no longer has to be offset by the number of deleted rows:

ExcelRow row = new ExcelRow(ws.Row(row_i)); // no deletion in the loop

Instead, temporarily add a column (temp_col) that marks each row as "ok" or "skip", then sort the data by that column so that all the rows to remove form one contiguous range that can be deleted in a single call.

if (deleted > 0)
{
    int lastcol = ws.LastColumnUsed().ColumnNumber();
    var tab = ws.Range(ws.Cell(2, 1), ws.Cell(lastrow, lastcol));
    tab.Sort(temp_col);
    tab = ws.Range(ws.Cell(lastrow - deleted + 1, 1), ws.Cell(lastrow, lastcol));
    tab.Delete(XLShiftDeletedCells.ShiftCellsUp);
}
ws.Column(temp_col).Delete();
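
The pieces above can be put together roughly as follows. This is only a sketch under some assumptions: the names BulkRowDeletion, DeleteRejectedRows and shouldKeep are hypothetical (shouldKeep stands in for the question's writeData), the temporary column is placed right after the last used column, and rows are marked 0 ("ok") or 1 ("skip") so that an ascending sort moves the rows to delete to the bottom. The answer does not say whether Sort preserves the relative order of the kept rows, so do not rely on that without checking.

```csharp
using System;
using ClosedXML.Excel;

public static class BulkRowDeletion
{
    // Removes every data row (row 2 and below) for which shouldKeep returns false
    // and returns the number of rows removed. shouldKeep is a hypothetical
    // stand-in for the question's writeData call.
    public static int DeleteRejectedRows(IXLWorksheet ws, Func<IXLRow, bool> shouldKeep)
    {
        var lastUsedRow = ws.LastRowUsed();
        if (lastUsedRow == null) return 0;           // empty sheet

        int lastRow = lastUsedRow.RowNumber();       // computed once, not per iteration
        if (lastRow < 2) return 0;                   // header only

        int lastCol = ws.LastColumnUsed().ColumnNumber();
        int tempCol = lastCol + 1;                   // assumed free: temporary marker column
        int deleted = 0;

        for (int rowI = 2; rowI <= lastRow; rowI++)
        {
            bool keep = shouldKeep(ws.Row(rowI));
            ws.Cell(rowI, tempCol).Value = keep ? 0 : 1;   // 0 = ok, 1 = skip
            if (!keep) deleted++;
        }

        if (deleted > 0)
        {
            // The range starts at column 1, so the sort column index
            // (relative to the range) equals the sheet column number.
            var table = ws.Range(ws.Cell(2, 1), ws.Cell(lastRow, tempCol));
            table.Sort(tempCol);

            // All "skip" rows are now at the bottom: delete them as one range.
            ws.Range(ws.Cell(lastRow - deleted + 1, 1), ws.Cell(lastRow, tempCol))
              .Delete(XLShiftDeletedCells.ShiftCellsUp);
        }

        ws.Column(tempCol).Delete();                 // drop the marker column
        return deleted;
    }
}
```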

Performance Test

Nothing needs to be added about the first point. The second is original to this answer. Measuring the elapsed time with a Stopwatch, I observed a reduction in execution time of more than 80% in my case (from 200 to 27 seconds).
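
For anyone who wants to repeat such a measurement, a minimal timing helper might look like the sketch below. The Timing class and Measure method are names invented for this example; only Stopwatch itself comes from the answer.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the given action once and returns the wall-clock time it took.
    public static TimeSpan Measure(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }
}
```

Wrapping the whole read/modify/save sequence in Timing.Measure(() => { ... }) before and after the change gives two figures that can be compared directly. With the numbers quoted above, 27 / 200 = 0.135, i.e. a reduction of about 86%.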



Source: https://stackoverflow.com/questions/61749972/optimizing-performance-for-closedxml-loops-and-row-deletion
