What indicates an Office Open XML Cell contains a Date/Time value?

可紊 提交于 2019-11-26 19:46:33

The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).

The style.xml file has elements like:

<xf numFmtId="14" ... applyNumberFormat="1" />
<xf numFmtId="1" ... applyNumberFormat="1" />

These are the xf entries, which in turn give you a numFmtId that references the number format mask.

You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> 
    <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
        <numFmts count="3">
            <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> 
            <numFmt numFmtId="165" formatCode="0.000" /> 
            <numFmt numFmtId="166" formatCode="#,##0.000" /> 
        </numFmts>

The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".

The list that I have is incomplete:

0 = 'General';
1 = '0';
2 = '0.00';
3 = '#,##0';
4 = '#,##0.00';

9 = '0%';
10 = '0.00%';
11 = '0.00E+00';
12 = '# ?/?';
13 = '# ??/??';
14 = 'mm-dd-yy';
15 = 'd-mmm-yy';
16 = 'd-mmm';
17 = 'mmm-yy';
18 = 'h:mm AM/PM';
19 = 'h:mm:ss AM/PM';
20 = 'h:mm';
21 = 'h:mm:ss';
22 = 'm/d/yy h:mm';

37 = '#,##0 ;(#,##0)';
38 = '#,##0 ;[Red](#,##0)';
39 = '#,##0.00;(#,##0.00)';
40 = '#,##0.00;[Red](#,##0.00)';

44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)';
45 = 'mm:ss';
46 = '[h]:mm:ss';
47 = 'mmss.0';
48 = '##0.0E+0';
49 = '@';

27 = '[$-404]e/m/d';
30 = 'm/d/yy';
36 = '[$-404]e/m/d';
50 = '[$-404]e/m/d';
57 = '[$-404]e/m/d';

59 = 't0';
60 = 't0.00';
61 = 't#,##0';
62 = 't#,##0.00';
67 = 't0%';
68 = 't0.00%';
69 = 't# ?/?';
70 = 't# ??/??';

The missing values are mainly related to east asian variant formats.

The chosen answer is spot-on, but note that Excel defines some number format (numFmt) codes differently from the OpenXML spec. Per the Open XML SDK 2.5 Productivity Tool's documentation (on the "Implementer Notes" tab for the NumberingFormat class):

The standard defines built-in format ID 14: "mm-dd-yy"; 22: "m/d/yy h:mm"; 37: "#,##0 ;(#,##0)"; 38: "#,##0 ;[Red]"; 39: "#,##0.00;(#,##0.00)"; 40: "#,##0.00;[Red]"; 47: "mmss.0"; KOR fmt 55: "yyyy-mm-dd".

Excel defines built-in format ID
14: "m/d/yyyy"
22: "m/d/yyyy h:mm"
37: "#,##0_);(#,##0)"
38: "#,##0_);[Red]"
39: "#,##0.00_);(#,##0.00)"
40: "#,##0.00_);[Red]"
47: "mm:ss.0"
55: "yyyy/mm/dd"

Most are minor variations, but #14 is a doozy. I wasted a couple of hours troubleshooting why leading zeros weren't being added to single-digits months and days (e.g. 01/05/14 vs. 1/5/14).

Thought I'd add my solution that I've put together to determine if the double value FromOADate is really a date or not. Reason being is I have a zip code in my excel file as well. The numberingFormat will be null if it's text.

Alternatively you could use the numberingFormatId and check against a list of Ids that Excel uses for dates.

In my case I've explicitly determined the formatting of all fields for the client.

    /// <summary>
    /// Creates the datatable and parses the file into a datatable
    /// </summary>
    /// <param name="fileName">the file upload's filename</param>
    private void ReadAsDataTable(string fileName)
    {
        try
        {
            DataTable dt = new DataTable();
            using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))
            {
                WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
                IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string relationshipId = sheets.First().Id.Value;
                WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
                Worksheet workSheet = worksheetPart.Worksheet;
                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

                var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
                var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;

                // columns omitted for brevity

                // skip first row as this row is column header names
                foreach (Row row in rows.Skip(1))
                {
                    DataRow dataRow = dt.NewRow();

                    for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
                    {
                        bool isDate = false;
                        var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;
                        var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);

                        if (cellFormat.NumberFormatId != null)
                        {
                            var numberFormatId = cellFormat.NumberFormatId.Value;
                            var numberingFormat = numberingFormats.Cast<NumberingFormat>()
                                .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);

                            // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)
                            if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))
                            {
                                string formatString = numberingFormat.FormatCode.Value;
                                isDate = true;
                            }
                        }

                        // replace '-' with empty string
                        string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);
                        dataRow[i] = value.Equals("-") ? string.Empty : value;
                    }

                    dt.Rows.Add(dataRow);
                }
            }

            this.InsertMembers(dt);
            dt.Clear();
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }
    }

    /// <summary>
    /// Reads the cell's value
    /// </summary>
    /// <param name="document">current document</param>
    /// <param name="cell">the cell to read</param>
    /// <returns>cell's value</returns>
    private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)
    {
        string value = string.Empty;

        try
        {
            SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;
            value = cell.CellValue.InnerXml;

            if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
            {
                return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;
            }
            else
            {
                // check if this is a date or zip.
                // integers will be passed into this else statement as well. 
                if (isDate)
                {
                    value = DateTime.FromOADate(double.Parse(value)).ToString();
                }

                return value;
            }
        }
        catch (Exception ex)
        {
            LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);
        }

        return value;
    }

In styles.xml see if there is a numFmt node. I think that will hold a numFmtId of "9" which will relate to the date format that's used.

I don't know where that is in the ECMA, but if you search for numFmt, you might find it.

In case anyone else is having a hard time with this, here is what I've done:

1) Create a new excel file and put in a date time string in cell A1

2) Change formatting on the cell to whatever you want, then save file.

3) Run following powershell script to extract out the stylesheet from .xlxs

[Reflection.Assembly]::LoadWithPartialName("DocumentFormat.OpenXml")

$xlsx = (ls C:\PATH\TO\FILE.xlsx).FullName
$package = [DocumentFormat.OpenXml.Packaging.SpreadsheetDocument]::Open($xlsx, $true)

[xml]$style = $package.WorkbookPart.WorkbookStylesPart.Stylesheet.OuterXml
Out-File -InputObject $style.OuterXml -FilePath "style.xml"

style.xml now contains the information that you can inject to DocumentFormat.OpenXml.Spreadsheet.Stylesheet(string outerXml), leading to

4) Use the extracted file to construct excel object model

var style = File.ReadAllText(@"c:\PATH\TO\EXTRACTED\Style.xml");
var stylesheetPart = WorkbookPart_REFERENCE.AddNewPart<WorkbookStylesPart>();
stylesheetPart.Stylesheet = new Stylesheet(style);
stylesheetPart.Stylesheet.Save();
Gordon1000

It was unclear to me how to reliably determine whether a cell has date/time value. After spending some time experimenting I had come up with the code (see post) that would look for both built-in and custom date/time formats.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!