Why does OpenXML read rows twice

心不动则不痛 提交于 2019-12-07 12:53:18

问题


I count rows in two worksheets like this:

foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
{
    OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
    if (count == 0)
    {
        while (reader.Read())
        {
            if (reader.ElementType == typeof(Row))
            {
                count_first++;
            }
        }
    }
    else if (count == 1)
    {
        while (reader.Read())
        {
            if (reader.ElementType == typeof(Row))
            {
                count_second++;
            }
        }
    }
    count++;
}

For both worksheets in count_first and count_second I get twice as much as there are rows with data. Why is that and what does it actually mean? Does it mean that OpenXML parses each list twice?

EDIT

Well, I found a solution. To get it right away, I guess, you should keep this sacred knowledge in some secret place. So, here it is:

while (reader.Read())
{
    if (reader.ElementType == typeof(Row))
    {
        do
        {
            count_first++;
        }   while (reader.ReadNextSibling());
    }
}

回答1:


The reason you are getting twice the count is due to the way the OpenXmlReader reads each element. The reader treats the open and close nodes as independant items which can be differentiated by checking the IsStartElement and IsEndElement properties.

To demonstrate this you can run something like this:

using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
{
    while (reader.Read())
    {
        if (reader.ElementType == typeof(Row))
        {
            do
            {
                Console.WriteLine("{0} {1} {2}", 
                                  reader.ElementType,
                                  reader.IsStartElement,
                                  reader.IsEndElement);
            } while (reader.Read());

            Console.WriteLine("Finished");
        }
    }
}

Which will produce output along the lines of the following* for a sheet with two rows and two columns (I've highlighted the Rows for readibility):

Row True False
Cell True False
CellValue True False
CellValue False True
Cell False True
Cell True False
CellValue True False
CellValue False True
Cell False True
Row False True
Row True False
Cell True False
CellValue True False
CellValue False True
Cell False True
Cell True False
CellValue True False
CellValue False True
Cell False True
Row False True

There are 2 ways that you can solve this depending on how you want to read the document. The first way (as you point out in your answer) is to move to the next sibling by calling ReadNextSibling - this essentially "jumps" the end element (and any children of the Row). Changing the above example to use ReadNextSibling in the do loop:

do
{
    Console.WriteLine("{0} {1} {2}", 
                       reader.ElementType,
                       reader.IsStartElement,
                       reader.IsEndElement);
} while (reader.ReadNextSibling());

You'll get output* of:

Row True False
Row True False

The second way would be to just count the start elements (or indeed the end elements; just not both):

while (reader.Read())
{
    if (reader.ElementType == typeof(Row) && reader.IsStartElement)
    {
        count_first++;
    }
}

Which one you choose depends on whether you wish to read the Cell values and how you'd like to read them (SAX or DOM).

* In reality each row is prefixed with the namespace of "DocumentFormat.OpenXml.Spreadsheet." which I've removed for readibility.




回答2:


This works as expected:

while (reader.Read())
{
    if (reader.ElementType == typeof(Row))
    {
        do
        {
            count_first++;
        }   while (reader.ReadNextSibling());
    }
}


来源:https://stackoverflow.com/questions/28900251/why-does-openxml-read-rows-twice

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!