How can I improve the performance of retrieving values from SharedStringTable in OpenXml Excel spreadsheet tools?

前端 未结 1 1131
走了就别回头了
走了就别回头了 2020-12-21 19:31

I\'m using DocumentFormat.OpenXml to read an Excel spreadsheet. I have a performance bottleneck with the code used to look up the cell value from the Shar

相关标签:
1条回答
  • 2020-12-21 20:27

    I would read the whole shared string table into your dictionary in one go rather than looking up each value as required. This will allow you to move through the file in order and stash the values ready for a hashed lookup which will be more efficient than scanning the SST for each value you require.

    Running something like the following at the start of your process will allow you to access each value using dictionary[parsedValue].

    private static void LoadDictionary()
    {
        int i = 0;
    
        foreach (var ss in sharedStringTablePart.SharedStringTable.ChildElements)
        {
            dictionary.Add(i++, ss.InnerText);
        }
    }
    

    If your file is very large, you might see some gains using a SAX approach to read the file rather than the DOM approach above:

    private static void LoadDictionarySax()
    {
        using (OpenXmlReader reader = OpenXmlReader.Create(sharedStringTablePart))
        {
            int i = 0;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(SharedStringItem))
                {
                    SharedStringItem ssi = (SharedStringItem)reader.LoadCurrentElement();
                    dictionary.Add(i++, ssi.Text != null ? ssi.Text.Text : string.Empty);
                }
            }
        }
    }
    

    On my machine, using a file with 60000 rows and 2 columns it was around 300 times quicker using the LoadDictionary method above instead of the GetValue method from your question. The LoadDictionarySax method gave similar performance but on a larger file (100000 rows with 10 columns) the SAX approach was around 25% faster than the LoadDictionary method. On an even larger file (100000 rows, 26 columns), the LoadDictionary method threw an out of memory exception but the LoadDictionarySax worked without issue.

    0 讨论(0)
提交回复
热议问题