OpenXML to Create a DataTable from Excel - Money Cell Value Incorrect

不思量自难忘° 2020-12-21 07:15

I am attempting to create a DataTable from an Excel spreadsheet using OpenXML. When getting a row's cell value using Cell.CellValue.InnerXml, the value returned for a moneta…

1 Answer
  • 2020-12-21 07:38

    As you point out in your question, the format is stored separately from the cell value using number formats in the stylesheet.
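    To illustrate the separation, here is a minimal sketch that uses only the .NET base class library. The stored text is parsed with the invariant culture, since SpreadsheetML always writes numbers with "." as the decimal point, and what the user sees comes from a format applied afterwards. `StoredCellNumber`, `Parse` and `Display` are hypothetical names for this example, not part of the OpenXML SDK.

```csharp
using System.Globalization;

// Hypothetical helper for illustration; not part of the OpenXML SDK.
public static class StoredCellNumber
{
    // SpreadsheetML writes numbers culture-independently ("." as the decimal
    // point, no grouping), so parse them with the invariant culture.
    public static double Parse(string innerXml) =>
        double.Parse(innerXml, NumberStyles.Float, CultureInfo.InvariantCulture);

    // Formats the parsed value: the number format, not the stored text,
    // decides what is displayed.
    public static string Display(string innerXml, string dotNetFormat) =>
        Parse(innerXml).ToString(dotNetFormat, CultureInfo.InvariantCulture);
}
```

    For example, `StoredCellNumber.Display("570.80999999999995", "0.00")` gives "570.81". Parsing with the thread's current culture instead can misread the stored text on machines whose decimal separator is a comma.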

    You should be able to extend the code you have for formatting dates to include formatting for numbers. Essentially, you need to grab the NumberingFormat that corresponds to the cellFormat.NumberFormatId.Value you are already reading. The NumberingFormat elements are the children of styleSheet.NumberingFormats.

    Once you have this, you can access the FormatCode property of the NumberingFormat, which you can then use to format your data as you see fit.
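    For simple codes, the FormatCode can be passed straight to ToString. This works because .NET custom numeric formats share several symbols with Excel: 0, #, the "," group separator, ".", %, "quoted literals" and backslash escapes. The following is a minimal sketch. `SimpleFormatCode.Apply` is a hypothetical helper name, and it formats with the invariant culture so the result does not depend on the machine's locale.

```csharp
using System.Globalization;

// Hypothetical helper: applies an Excel format code directly as a .NET
// custom numeric format string. Only safe for simple codes that stay within
// the shared subset (0, #, ",", ".", %, "literals", \ escapes).
public static class SimpleFormatCode
{
    public static string Apply(double value, string excelFormatCode) =>
        value.ToString(excelFormatCode, CultureInfo.InvariantCulture);
}
```

    With the UK currency code `"£"#,##0.00`, `SimpleFormatCode.Apply(570.80999999999995, "\"£\"#,##0.00")` returns "£570.81". Codes containing colours, conditions, `_` or `*` still need the clean-up described below.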

    Unfortunately, the format is not quite that straightforward to use. Firstly, according to MSDN, not all formats are written to the file. The built-in formats (such as ID 14, the short date) are implied by their ID alone, so I guess you will need to keep those somewhere accessible and load them according to the NumberFormatId you have.
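    Here is a sketch of one way to keep the built-in formats accessible. It assumes you first copy the custom formats from styleSheet.NumberingFormats into a dictionary keyed by NumberFormatId. The table is deliberately partial, covering only built-in IDs defined by ECMA-376 for the numFmt element. It leaves out the currency and accounting IDs (such as 5-8 and 41-44), whose codes depend on the locale. `BuiltInNumberFormats` and `Resolve` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical lookup; a partial copy of the ECMA-376 built-in number formats,
// which Excel never writes to the <numFmts> element.
public static class BuiltInNumberFormats
{
    private static readonly Dictionary<uint, string> Formats = new Dictionary<uint, string>
    {
        [0] = "General",
        [1] = "0",
        [2] = "0.00",
        [3] = "#,##0",
        [4] = "#,##0.00",
        [9] = "0%",
        [10] = "0.00%",
        [11] = "0.00E+00",
        [12] = "# ?/?",
        [13] = "# ??/??",
        [14] = "mm-dd-yy",       // Excel shows this as the locale's short date
        [15] = "d-mmm-yy",
        [16] = "d-mmm",
        [17] = "mmm-yy",
        [18] = "h:mm AM/PM",
        [19] = "h:mm:ss AM/PM",
        [20] = "h:mm",
        [21] = "h:mm:ss",
        [22] = "m/d/yy h:mm",
        [45] = "mm:ss",
        [46] = "[h]:mm:ss",
        [47] = "mmss.0",
        [48] = "##0.0E+0",
        [49] = "@",
    };

    // Custom formats (copied from the stylesheet) take precedence; otherwise
    // fall back to the built-in table. Returns null for unknown IDs.
    public static string Resolve(uint numFmtId, IReadOnlyDictionary<uint, string> customFormats)
    {
        if (customFormats.TryGetValue(numFmtId, out var custom))
        {
            return custom;
        }
        return Formats.TryGetValue(numFmtId, out var builtIn) ? builtIn : null;
    }
}
```

    A null result means the ID is neither custom nor in this partial table, so the caller has to decide on a fallback, such as returning the raw value.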

    Secondly, the syntax of the format string is not compatible with C#, so you'll need to do some manipulation. Details of the format-code layout can be found on MSDN.
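    One part of that layout that matters for money values is how Excel chooses between the up-to-four `;`-separated sections (positive;negative;zero;text). This sketch makes the rules concrete. The names `ExcelFormatSections`, `Select` and `Format` are hypothetical. It does not handle `;` inside quotes or after a backslash, and it ignores conditional sections such as `[>=100]`.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating Excel's section rules for format codes.
public static class ExcelFormatSections
{
    //   1 section   -> used for every number (negatives get a minus sign)
    //   2 sections  -> first for positive and zero, second for negative
    //   3+ sections -> positive; negative; zero (a 4th, text, is ignored here)
    // A dedicated negative section carries its own sign, e.g. "(#,##0.00)",
    // so it has to be applied to the absolute value.
    public static (string Section, bool UseAbsoluteValue) Select(string formatCode, double value)
    {
        string[] sections = formatCode.Split(';');
        if (value < 0 && sections.Length >= 2)
        {
            return (sections[1], true);
        }
        if (value == 0 && sections.Length >= 3)
        {
            return (sections[2], false);
        }
        return (sections[0], false);
    }

    public static string Format(double value, string formatCode)
    {
        var (section, useAbs) = Select(formatCode, value);
        double toFormat = useAbs ? Math.Abs(value) : value;
        return toFormat.ToString(section, CultureInfo.InvariantCulture);
    }
}
```

    For instance, `ExcelFormatSections.Format(-5, "#,##0.00;(#,##0.00)")` yields "(5.00)", whereas a single-section code such as "0.00" gives "-5.00". Returning the absolute-value flag keeps the sign handling explicit instead of letting a negative section print a second minus sign.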

    I have knocked together some sample code that handles the currency situation from your question, but you may need to give some more thought to translating the Excel format string into a C# one.

    using System;
    using System.Globalization;
    using System.Linq;
    using System.Text.RegularExpressions;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;
    
    private static string GetCellValue(SharedStringTablePart stringTablePart, Cell cell, Stylesheet styleSheet)
    {
        // empty cells have no CellValue element
        if (cell.CellValue == null)
        {
            return string.Empty;
        }
    
        string value = cell.CellValue.InnerXml;
    
        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            // the cell holds an index into the shared string table
            return stringTablePart.SharedStringTable.ChildElements[int.Parse(value, CultureInfo.InvariantCulture)].InnerText;
        }
        else
        {
            double temp;
    
            // numbers are stored in the file with the invariant culture, whatever the user's locale
            if (cell.StyleIndex != null
                && double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out temp))
            {
                CellFormat cellFormat = (CellFormat)styleSheet.CellFormats.ChildElements[(int)cell.StyleIndex.Value];
    
                uint formatId = cellFormat.NumberFormatId?.Value ?? 0;
    
                if (formatId == 14) // built-in format 14: short date (mm-dd-yy)
                {
                    DateTime newDate = DateTime.FromOADate(temp);
                    value = newDate.Date.ToString(CultureInfo.InvariantCulture);
                }
                else
                {
                    // find the custom number format; NumberingFormats is absent when the
                    // workbook has no custom formats, and built-in formats are never listed
                    NumberingFormat format = styleSheet.NumberingFormats?.Elements<NumberingFormat>()
                                    .FirstOrDefault(n => n.NumberFormatId != null && n.NumberFormatId.Value == formatId);
    
                    if (format != null && format.FormatCode != null && format.FormatCode.HasValue)
                    {
                        // we have a format and a value that can be represented as a double
                        string actualFormat = GetActualFormat(format.FormatCode.Value);
                        value = temp.ToString(actualFormat);
                    }
                }
            }
            return value;
        }
    }
    
    private static string GetActualFormat(string formatCode)
    {
        // the format is actually up to 4 formats split by semi-colons:
        // 0 for positive, 1 for negative, 2 for zero (I'm ignoring the 4th, which is for text).
        // .NET custom formats give the first three sections the same meaning, including
        // Excel's 1- and 2-section rules and the absent automatic minus sign in a negative
        // section, so keep them together and let ToString pick the right one.
        // (Indexing a single section by sign failed for codes with fewer than 3 sections.)
        string actualFormat = string.Join(";", formatCode.Split(';').Take(3));
    
        // [$£-809] is a currency symbol plus a locale ID: keep only the symbol, as a literal
        actualFormat = Regex.Replace(actualFormat, @"\[\$([^\]\-]*)(-[0-9A-Fa-f]+)?\]", "\"$1\"");
        // any other [...] is a colour or condition such as [Red] or [>=100]: drop it
        actualFormat = Regex.Replace(actualFormat, @"\[[^\]]*\]", "");
    
        actualFormat = RemoveUnwantedCharacters(actualFormat, '_');
        actualFormat = RemoveUnwantedCharacters(actualFormat, '*');
    
        // quoted literals ("£") and backslash escapes (\-) mean the same thing in .NET
        // custom formats, so they can be left in place
        return actualFormat;
    }
    
    private static string RemoveUnwantedCharacters(string excelFormat, char character)
    {
        /*  The _ and * characters are used to control lining up of characters.
            They are followed by the character being manipulated, so I'm removing
            both the _ or * and the character immediately following it.
            Note that this is still buggy, as I don't check for a preceding
            backslash escape character or a quoted literal, which I probably should.
            */
        int index = excelFormat.IndexOf(character);
        while (index != -1)
        {
            // remove the marker and the character after it (just the marker if it is last)
            excelFormat = excelFormat.Remove(index, Math.Min(2, excelFormat.Length - index));
            index = excelFormat.IndexOf(character, index);
        }
        return excelFormat;
    }
    

    Given a sheet with the value 570.80999999999995 formatted as currency (in the UK), the output I get is £570.81.
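    The RemoveUnwantedCharacters helper above does not check for backslash escapes or quoted literals. This is a minimal sketch of an escape-aware alternative that removes both `_x` (pad with the width of x) and `*x` (repeat x to fill the cell) in one pass. It leaves text inside "..." and any character after a `\` untouched. `ExcelFormatCleaner.RemoveLayoutDirectives` is a hypothetical name, and the sketch assumes well-formed codes with balanced quotes.

```csharp
using System.Text;

// Hypothetical helper: strips Excel layout directives (_x and *x) from a
// format code while respecting "quoted literals" and backslash escapes.
public static class ExcelFormatCleaner
{
    public static string RemoveLayoutDirectives(string excelFormat)
    {
        var result = new StringBuilder(excelFormat.Length);
        bool inQuotes = false;
        for (int i = 0; i < excelFormat.Length; i++)
        {
            char c = excelFormat[i];
            if (c == '"')
            {
                inQuotes = !inQuotes;
                result.Append(c);
            }
            else if (inQuotes)
            {
                result.Append(c); // literal text is copied unchanged
            }
            else if (c == '\\' && i + 1 < excelFormat.Length)
            {
                // keep the escape and the escaped character together
                result.Append(c).Append(excelFormat[i + 1]);
                i++;
            }
            else if (c == '_' || c == '*')
            {
                i++; // skip the directive and the character it applies to
            }
            else
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

    For example, an accounting-style section such as `_("£"* #,##0.00_)` becomes `"£"#,##0.00`, while `"a_b"0` is left as it is.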
