XML parse VBA excel (function trip, & MSXML2.DOMDocument)

故里飘歌 2020-12-11 12:52

I need to parse hundreds of XML files having all the same structure as follows:


  
    

        
2 Answers
  • 2020-12-11 13:32

    The problem is the special characters (German alphabet, e.g. the ä in Käse), which means you need to do something like a batch replace on the XML files so that the opening line is not this:

    <?xml version="1.0" encoding="UTF-8"?>
    

    but this:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    

    Code to test with after:

    Option Explicit
    Public Sub test()
        Dim xmlDoc As Object
        Set xmlDoc = CreateObject("MSXML2.DOMDocument") 'New MSXML2.DOMDocument60
        With xmlDoc
            .validateOnParse = True
            .setProperty "SelectionLanguage", "XPath"
            .async = False
            If Not .Load("C:\Users\User\Desktop\Test.xml") Then
                Err.Raise .parseError.ErrorCode, , .parseError.reason
            End If
        End With
        Debug.Print xmlDoc.SelectNodes("//Query").Length
    End Sub
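
    The batch replace mentioned above could look roughly like the following sketch. It is only an assumption about how to do it, not part of the tested code: the folder path and the names FixEncodingDeclarations, ReadLatin1 and WriteLatin1 are made up for illustration, and it presumes the files really are stored in a single-byte encoding (no UTF-8 byte order mark). Each file is read and written through ADODB.Stream with the iso-8859-1 charset so that the bytes of the special characters pass through unchanged. Try it on copies of the files first.

    ```vba
    Option Explicit

    Private Const adTypeText As Long = 2             ' ADODB StreamTypeEnum
    Private Const adSaveCreateOverWrite As Long = 2  ' ADODB SaveOptionsEnum

    Public Sub FixEncodingDeclarations()
        Const OLD_ENC As String = "encoding=""UTF-8"""
        Const NEW_ENC As String = "encoding=""iso-8859-1"""
        Dim folder As String, fileName As String, content As String
        Dim files As New Collection, f As Variant

        folder = "C:\Users\User\Desktop\xml\"   ' hypothetical folder, adjust as needed

        ' Collect the names first so no file is rewritten while Dir is still enumerating
        fileName = Dir(folder & "*.xml")
        Do While Len(fileName) > 0
            files.Add fileName
            fileName = Dir()
        Loop

        For Each f In files
            content = ReadLatin1(folder & f)
            If InStr(1, content, OLD_ENC, vbTextCompare) > 0 Then
                ' Replace only the first occurrence (assumed to be the XML declaration)
                content = Replace(content, OLD_ENC, NEW_ENC, 1, 1, vbTextCompare)
                WriteLatin1 folder & f, content
            End If
        Next f
    End Sub

    ' Reads a file as text, mapping every byte 1:1 to a character
    Private Function ReadLatin1(ByVal path As String) As String
        With CreateObject("ADODB.Stream")
            .Type = adTypeText
            .Charset = "iso-8859-1"
            .Open
            .LoadFromFile path
            ReadLatin1 = .ReadText
            .Close
        End With
    End Function

    ' Writes the text back with the same 1:1 mapping, overwriting the file
    Private Sub WriteLatin1(ByVal path As String, ByVal content As String)
        With CreateObject("ADODB.Stream")
            .Type = adTypeText
            .Charset = "iso-8859-1"
            .Open
            .WriteText content
            .SaveToFile path, adSaveCreateOverWrite
            .Close
        End With
    End Sub
    ```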
    

    This is the XML I am using:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <Concepts>
      <ConceptModel name="food">
        <Filters>
          <Filter type="CC"/>
        </Filters>
        <Queries>
          <Query lang="EN">(cheese, bread, wine)</Query>
          <Query lang="DE">(Käse, Brot, Wein)</Query>
          <Query lang="FR">(fromaige, pain, vin)</Query>
        </Queries>
      </ConceptModel>
    </Concepts>
    
  • 2020-12-11 13:44

    As close as possible to your OP

    I'd like to draw your attention to several errors or misunderstandings:

    • [1] Invalid .LoadXML Syntax

    What is then the difference between .LoadXML ("C:\folder\folder\name.xml") and .Load ("C:\folder\folder\name.xml") ?

    Load expects a file path and then loads the file content into the oXML object.

    LoadXML doesn't expect a file path, but the actual XML text, which has to be a well-formed string.
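
    To make the difference concrete, here is a minimal sketch. The procedure name is made up, the path is the placeholder from your question, and late binding is used only so that the snippet runs without a reference:

    ```vba
    Sub LoadVersusLoadXML()
        Dim oXml As Object
        Set oXml = CreateObject("MSXML2.DOMDocument.6.0")
        oXml.async = False

        ' Load: the argument is a file path; True only if the file exists and parses
        Debug.Print oXml.Load("C:\folder\folder\name.xml")

        ' LoadXML: the argument is the XML text itself
        Debug.Print oXml.LoadXML("<Concepts><ConceptModel name=""food""/></Concepts>")   ' True

        ' Passing a path to LoadXML fails, because a path is not well-formed XML
        Debug.Print oXml.LoadXML("C:\folder\folder\name.xml")                            ' False
    End Sub
    ```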

    • [2] XML distinguishes between lower and upper case, therefore nodes need to be addressed by their exact literal names: the <Query> node wouldn't be identified by "query", "ConceptModel" isn't the same as "conceptmodel".

    As second issue I would like to ask if Dim oXml As MSXML2.DOMDocument would be the same as Dim oXml As MSXML2.DOMDocument60, since I checked in tools/references "Microsoft XML, v6.0"?

    No, it isn't. Please note that the former declaration would load version 3.0 by default. However, it's absolutely preferable to use version 6.0 (any other versions are obsolete nowadays!)

    As you are using so-called early binding (referencing "Microsoft XML, v6.0"), I'll do the same, but refer explicitly to the current version 6.0:

    Dim oXml As MSXML2.DOMDocument60        ' declare the xml doc object
    Set oXml = New MSXML2.DOMDocument60     ' set an instance of it to memory
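
    If you prefer late binding (no reference needed), the version has to be named in the ProgID instead. A minimal sketch; the procedure name is made up, and it assumes MSXML 3.0 and 6.0 are both installed, as they normally are on Windows:

    ```vba
    Sub LateBindingVersions()
        Dim oXml3 As Object, oXml6 As Object

        ' The version-independent ProgID resolves to version 3.0
        Set oXml3 = CreateObject("MSXML2.DOMDocument")

        ' The version has to be requested explicitly to get 6.0
        Set oXml6 = CreateObject("MSXML2.DOMDocument.6.0")

        ' Print the class names of both objects for comparison
        Debug.Print TypeName(oXml3), TypeName(oXml6)
    End Sub
    ```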
    
    • [3] misunderstanding some XPath expressions

    A leading slash "/" in an XPath expression always refers to the document root, so "/Concepts" selects the DocumentElement (<Concepts> here); alternatively you can call SelectNodes on .DocumentElement and use a path relative to it. A leading double slash "//xyz" finds every "xyz" node anywhere in the document, if any exist.

    For instance

        oXml.SelectNodes("//Query").Length 
    

    returns the same number of nodes (here: 3) as

        oXml.DocumentElement.SelectNodes("//Query").Length   ' or
        oXml.SelectSingleNode("//Queries").ChildNodes.Length ' or even
        oXml.SelectNodes("/*/*/*/Query").Length
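
    A small self-contained sketch to try absolute and relative paths as well as case sensitivity. The procedure name is made up; the XML is a shortened version of the sample above, passed as text via LoadXML (ChrW(228) is the ä), and late binding is used so that no reference is needed:

    ```vba
    Sub XPathVariants()
        Dim oXml As Object
        Set oXml = CreateObject("MSXML2.DOMDocument.6.0")
        oXml.async = False

        ' Same structure as the sample XML, without the <Filters> branch
        If Not oXml.LoadXML( _
            "<Concepts><ConceptModel name=""food""><Queries>" & _
            "<Query lang=""EN"">(cheese, bread, wine)</Query>" & _
            "<Query lang=""DE"">(K" & ChrW(228) & "se, Brot, Wein)</Query>" & _
            "<Query lang=""FR"">(fromage, pain, vin)</Query>" & _
            "</Queries></ConceptModel></Concepts>") Then
            Debug.Print oXml.parseError.reason
            Exit Sub
        End If

        ' Absolute path, starting at the document root
        Debug.Print oXml.SelectNodes("/Concepts/ConceptModel/Queries/Query").Length         ' 3
        ' Relative path, starting at the DocumentElement <Concepts>
        Debug.Print oXml.DocumentElement.SelectNodes("ConceptModel/Queries/Query").Length   ' 3
        ' Node names are case-sensitive, so "query" matches nothing
        Debug.Print oXml.SelectNodes("//query").Length                                      ' 0
    End Sub
    ```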
    

    Code example with reference to XML version 6.0

    Of course you'd have to loop over several xml files, the example only uses one (starting in row 2).

    In case some xml files are not well formed, I added a detailed error routine that enables you to identify the presumed error location. Load and LoadXML both return a boolean value (True if loaded correctly, False if not).

    Sub xmlTest()
    
    Dim ws   As Worksheet: Set ws = ThisWorkbook.Sheets(3)
    Dim oXml As MSXML2.DOMDocument60
    Set oXml = New MSXML2.DOMDocument60
    With oXml
        .validateOnParse = True
        .setProperty "SelectionLanguage", "XPath"   ' necessary in version 3.0, possibly redundant here
        .async = False
    
        If Not .Load(ThisWorkbook.Path & "\xml\" & "name.xml") Then
            Dim xPE        As Object    ' Set xPE = CreateObject("MSXML2.IXMLDOMParseError")
            Dim strErrText As String
            Set xPE = .parseError
            With xPE
                strErrText = "Load error " & .ErrorCode & " xml file " & vbCrLf & _
                             Replace(.URL, "file:///", "") & vbCrLf & vbCrLf & _
                             .reason & _
                             "Source Text: " & .srcText & vbCrLf & vbCrLf & _
                             "Line No.:    " & .Line & vbCrLf & _
                             "Line Pos.: " & .linepos & vbCrLf & _
                             "File Pos.:  " & .filepos & vbCrLf & vbCrLf
            End With
            MsgBox strErrText, vbExclamation
            Set xPE = Nothing
            Exit Sub
        End If
    
        ' Debug.Print "|" & oXml.XML & "|"
    
        Dim Queries  As IXMLDOMNodeList, Query As IXMLDOMNode
        Dim Searched As String
        Dim i&, ii&
        i = 2       ' start row
      ' start XPath  
        Searched = "ConceptModel/Queries/Query"                     ' search string
        Set Queries = oXml.DocumentElement.SelectNodes(Searched)    ' XPath
      ' 
        ws.Cells(i, 1) = IIf(Queries.Length = 0, "No items", Queries.Length & " items")
        ii = 1
        For Each Query In Queries
            ii = ii + 1
            ws.Cells(i, ii) = Query.Text
        Next
    
    End With
    
    End Sub
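
    The example above handles a single file. Looping over a whole folder could be sketched as follows; this is an untested outline under the same assumptions (files in an "xml" subfolder of the workbook, output on the third sheet, one row per file), the procedure name is made up, and late binding is used so that the snippet does not depend on the reference:

    ```vba
    Sub ParseAllXmlFiles()
        Dim ws As Worksheet: Set ws = ThisWorkbook.Sheets(3)
        Dim oXml As Object: Set oXml = CreateObject("MSXML2.DOMDocument.6.0")
        Dim folder As String, fileName As String
        Dim Query As Object
        Dim i As Long, ii As Long

        folder = ThisWorkbook.Path & "\xml\"
        oXml.async = False

        i = 1                                   ' row 1 is left free for headers
        fileName = Dir(folder & "*.xml")        ' first matching file
        Do While Len(fileName) > 0
            i = i + 1
            ws.Cells(i, 1) = fileName
            If oXml.Load(folder & fileName) Then
                ii = 1
                For Each Query In oXml.DocumentElement.SelectNodes("ConceptModel/Queries/Query")
                    ii = ii + 1
                    ws.Cells(i, ii) = Query.Text
                Next Query
            Else
                ' Note the problem and carry on with the next file
                ws.Cells(i, 2) = "Load error: " & oXml.parseError.reason
            End If
            fileName = Dir()                    ' next matching file
        Loop
    End Sub
    ```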
    

    Additional hints

    You might also be interested in an example of how to list all child nodes via XMLDOM and how to obtain attribute names from XML using VBA.

    I include a further hint due to a later comment (thanks to @barrowc)

    "A further issue with using MSXML, v3.0 is that the default selection language is XSLPatterns instead of XPath. Details on some of the differences between MSXML versions are here and the differences between the two selection languages are discussed here."

    In the current MSXML2 version 6.0, XPath 1.0 is fully supported. It seems that XSL Patterns were implemented by Microsoft in earlier days; basically they can be regarded as a simplified subset of XPath expressions from before the W3C standardisation of XPath.

    MSXML2 version 3.0 supports XPath 1.0 as well, but only if the selection language is set explicitly:

    oXML.setProperty "SelectionLanguage", "XPath"   ' oXML being the DOMDocument object as used in original post  
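
    If you want to check which selection language a document object is currently using, you can read the property back with getProperty. A minimal sketch (the procedure name is made up; it assumes MSXML 3.0 is available under its versioned ProgID):

    ```vba
    Sub ShowSelectionLanguage()
        Dim oXml3 As Object
        Set oXml3 = CreateObject("MSXML2.DOMDocument.3.0")

        ' Default selection language of a version 3.0 document
        Debug.Print oXml3.getProperty("SelectionLanguage")

        ' Switch to XPath and read the property again
        oXml3.setProperty "SelectionLanguage", "XPath"
        Debug.Print oXml3.getProperty("SelectionLanguage")
    End Sub
    ```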
    