.NET OpenXML performance issues

Submitted by 时光毁灭记忆、已成空白 on 2019-12-03 13:19:58

It looks like someone in the MSDN community docs ran into similar performance problems. The code below is very inefficient; someone recommended using a hash table instead.

For our solution we simply removed the insertion of shared strings altogether, and the download time went from 1:03 to 0:03 (m:ss).

// Old: (1:03)
cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
index = ExcelWriter.InsertSharedStringItem(thing.CreateDate.ToShortDateString(), sharedStringPart);
cell.CellValue = new CellValue(index.ToString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

// New: (0:03)
cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
cell.CellValue = new CellValue(thing.CreateDate.ToShortDateString());
cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.String);

MSDN docs (slow solution; they should use a hash table instead):

private static int InsertSharedStringItem(string text, SharedStringTablePart shareStringPart)
{
    // If the part does not contain a SharedStringTable, create one.
    if (shareStringPart.SharedStringTable == null)
    {
        shareStringPart.SharedStringTable = new SharedStringTable();
    }

    int i = 0;

    // Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
    foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
    {
        if (item.InnerText == text)
        {
            return i;
        }

        i++;
    }

    // The text does not exist in the part. Create the SharedStringItem and return its index.
    shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
    shareStringPart.SharedStringTable.Save();

    return i;
}
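A rough cost estimate for the sample above, assuming n distinct strings are inserted into an initially empty table: the i-th call scans the i-1 items already present before appending, so the total number of string comparisons is

$$\sum_{i=1}^{n}(i-1) = \frac{n(n-1)}{2} = O(n^2)$$

For an illustrative n = 10,000 that is 49,995,000 comparisons, whereas a dictionary lookup is expected O(1) per call, or O(n) overall. The Save() call inside the method adds to this, since it runs on every insert rather than once at the end.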

@The Internet

Note that the String data type is actually meant for cells containing a formula string; for plain text you should use InlineString. See 17.18.11 ST_CellType (Cell Type):

  • inlineStr (Inline String) - Cell containing an (inline) rich string, i.e., one not in the shared string table. If this cell type is used, then the cell value is in the <is> element rather than the <v> element in the cell (<c> element).
  • str (String) - Cell containing a formula string.
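To make the difference concrete, here is a minimal sketch of the raw XML for an inline-string cell next to a shared-string cell. It uses only System.Xml.Linq so it runs without the OpenXML SDK; CellXmlSketch and its methods are hypothetical names. With the SDK, the inline form should correspond to setting DataType = CellValues.InlineString and adding an InlineString child (holding a Text) instead of a CellValue.

```csharp
using System.Xml.Linq;

// Hypothetical helper: builds raw SpreadsheetML <c> elements for the two
// string representations described by ST_CellType. No OpenXML SDK needed.
public static class CellXmlSketch
{
    // SpreadsheetML main namespace.
    public static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // t="inlineStr": the text lives in <is><t>...</t></is>; there is no <v>.
    public static XElement InlineStringCell(string reference, string text) =>
        new XElement(Main + "c",
            new XAttribute("r", reference),
            new XAttribute("t", "inlineStr"),
            new XElement(Main + "is",
                new XElement(Main + "t", text)));

    // t="s": <v> holds a zero-based index into the shared string table.
    public static XElement SharedStringCell(string reference, int sharedStringIndex) =>
        new XElement(Main + "c",
            new XAttribute("r", reference),
            new XAttribute("t", "s"),
            new XElement(Main + "v", sharedStringIndex));
}
```

Following the spec text above, the inline variant keeps its text in <is> and has no <v>; putting the text into <v> while declaring t="inlineStr" may be one way to end up with a file Excel offers to repair.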

The big improvement is moving the Save() calls out of the loop:

// Save data once, after the loop
shareStringPart.SharedStringTable.Save();
worksheetPart.Worksheet.Save();

For 500 records, this brought the time down from 10 minutes to 1 minute for me.
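Why this helps, as I understand it: a part-level Save() serializes the whole element tree each time, so calling it once per row writes a roughly quadratic number of bytes in total. The sketch below is an analogy, not OpenXML code: an in-memory XDocument stands in for the worksheet part and XDocument.Save stands in for Worksheet.Save(). SaveCostSketch, BuildSheet and SaveToMemory are hypothetical names.

```csharp
using System.IO;
using System.Xml.Linq;

// Analogy only: XDocument plays the role of the worksheet part, and
// XDocument.Save plays the role of Worksheet.Save(). Both write the whole tree.
public static class SaveCostSketch
{
    static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Appends `rows` empty rows. If saveEveryRow is true, the whole document is
    // re-serialized after each row; otherwise it is serialized once at the end.
    // Returns the total number of bytes written across all saves.
    public static long BuildSheet(int rows, bool saveEveryRow)
    {
        var sheetData = new XElement(Main + "sheetData");
        var doc = new XDocument(new XElement(Main + "worksheet", sheetData));
        long bytesWritten = 0;

        for (int r = 1; r <= rows; r++)
        {
            sheetData.Add(new XElement(Main + "row", new XAttribute("r", r)));
            if (saveEveryRow)
                bytesWritten += SaveToMemory(doc);   // anti-pattern: save inside the loop
        }

        if (!saveEveryRow)
            bytesWritten += SaveToMemory(doc);       // single save after the loop

        return bytesWritten;
    }

    // Serializes the entire document, as a part-level Save() would.
    static long SaveToMemory(XDocument doc)
    {
        using var stream = new MemoryStream();
        doc.Save(stream);
        return stream.Length;
    }
}
```

Comparing BuildSheet(500, true) with BuildSheet(500, false) should show the per-row variant writing far more bytes, and the gap widens as the row count grows.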

@kunjee

If you want performance, create all required objects upfront so that they are not checked on each invocation of this method. This is why the SharedStringTable is passed in as a parameter instead of the part.

Dictionaries are designed for fast keyed lookup and perform better than a for loop. They are a bit faster than Hashtables because they are strongly typed, so value types such as int don't need boxing. Being strongly typed is a great benefit anyway.

private static int InsertSharedStringItem(string sharedString, SharedStringTable sharedStringTable, Dictionary<string, int> sharedStrings)
{
    int sharedStringIndex;

    if (!sharedStrings.TryGetValue(sharedString, out sharedStringIndex))
    {
        // The text does not exist in the part. Create the SharedStringItem now.
        sharedStringTable.AppendChild(new SharedStringItem(new Text(sharedString)));

        sharedStringIndex = sharedStrings.Count;

        sharedStrings.Add(sharedString, sharedStringIndex);
    }

    return sharedStringIndex;
}
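Building on the Dictionary version above and the advice to create everything upfront, here is a self-contained sketch of the same idea. SharedStringCache and GetOrAdd are hypothetical names, and a List<string> stands in for the SharedStringTable so the sketch runs without the OpenXML SDK. The dictionary is seeded once from any strings already in the table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the List<string> stands in for the SharedStringTable,
// where an item's position in the list is its shared string index.
public sealed class SharedStringCache
{
    private readonly List<string> _items = new List<string>();
    private readonly Dictionary<string, int> _indexByText =
        new Dictionary<string, int>(StringComparer.Ordinal);

    // Seed once, up front, from the strings already present in the table.
    public SharedStringCache(IEnumerable<string> existingItems)
    {
        foreach (var text in existingItems)
        {
            // If the table already holds duplicates, keep the first index.
            if (!_indexByText.ContainsKey(text))
                _indexByText.Add(text, _items.Count);
            _items.Add(text);
        }
    }

    public int Count => _items.Count;

    // Returns the existing index, or appends the text and returns its new index.
    public int GetOrAdd(string text)
    {
        if (!_indexByText.TryGetValue(text, out int index))
        {
            index = _items.Count;   // next zero-based position in the table
            _items.Add(text);       // in OpenXML: AppendChild(new SharedStringItem(new Text(text)))
            _indexByText.Add(text, index);
        }
        return index;
    }
}
```

One design choice differs from the snippet above: the new index is taken from the table's item count rather than the dictionary's Count, so it stays correct even if an existing table already contains duplicate entries (the dictionary would then hold fewer keys than the table holds items).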

As mentioned by The Internet, they should have used a Hashtable, and, as proposed by zquanghoangz, they should have moved the Save() out of the loop.

InlineString does work, but it gives MS Excel a headache when opening the generated file: Excel shows uninformative error messages and offers to repair the file, which works, but the pop-up is still annoying.

static Cell AddCellWithSharedStringText(
    [NotNull]string text, 
    [NotNull]Hashtable texts, 
    [NotNull]SharedStringTablePart shareStringPart
)
{
    if (!texts.ContainsKey(text))
    {
        shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new Text(text)));
        texts[text] = texts.Count;
    }
    var idx = (int)texts[text];
    Cell c1 = new Cell();
    c1.DataType = CellValues.SharedString;
    c1.CellValue = new CellValue(idx.ToString());
    return c1;
}

This solution brought the export time down from ~5 minutes to 6 seconds on a [9880 x 66] grid.
