Updating an .xml document according to a .csv file

独自空忆成欢 提交于 2021-02-08 03:25:28

问题


I'm quite of a newbie in VBA, and I'm struggling to find a solution to my problem. Basically, what I need to do is editing some nodes in an .xml file according to the content of a .csv document.

In particular, whenever I loop through the XML document (i.e., "C:\Users\xxx\Desktop\ppp.xml") and I stumble upon a particular node (let it be thing), I need to read the text of that node and look for it in the CSV file (i.e., C:\Users\xxx\Desktop\mycopy.csv"). Then edit the text of a different node (let it be qt) in the same XML file. I was thinking about the following rationale:

  1. Since the XML file editing (below I used Microsoft XML, v3.0) needs to be done according to the .csv content.
  2. I firstly converted the CSV into a Excel workbook (.xlsx) (I don't know much of managing CSV files, so that way was more manageable for me).
  3. Then perform a sort of Vlookup version in VBA.

That works fine, if I run separately this part of the code shown below. Since I know some XML in VBA, I have a basic knowledge of how to edit nodes and attributes. However, I struggle to link the XML file to the Excel workbook. I've taken a look to a lot of XML editing examples in VBA, but the editing is performed according to the same XML, without looking for a value in a different file. I'll post a sample of my code, which obviously doesn't work, hoping it's clear enough. Thanks.

Option Explicit
    
Sub editxml()
    
    Dim Obj As DOMDocument  
    Dim xmlpath As String
    Dim loadcheck As Boolean
    Dim Node As IXMLDOMNodeList  
    Dim Nm As IXMLDOMNode 
    Dim thing As Object, q As Object
    
    Dim wb As Workbook         
    Dim ws As Worksheet
    Dim mycsvfile As String 
    Dim i As Integer, numcol As Integer
    Dim line As String
    Dim row As Integer 
    Dim matrix As Variant  
    
    Dim rngSearch As Range, rngLast As Range, rngFound As Range
    Dim strFirstAddress As String
    
    Set Obj = New DOMDocument
    Obj.async = False: Obj.validateOnParse = False
    
    xmlpath = "C:\Users\xxx\Desktop\ppp.xml"
    Obj.SetProperty "SelectionNamespaces", "xmlns:ns0='http://update.DocumentTypes.Schema.ppp.Xml'"
    
    loadcheck = Obj.Load(xmlpath)
    If loadcheck = True Then
        MsgBox "File XML uploaded"
    Else
        MsgBox "File XML not uploaded"
    End If
    
    Set Node = Obj.DocumentElement.SelectNodes("AA/BB/CC/DD")
    
    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        If thing.Text = rngFound Then
        q.Text = "do somewhat else"
        End If
    Next
        
    Obj.Save (xmlpath)
    
    Set wb = Workbooks.Add
    wb.SaveAs Filename:="csvtoxlsxfind" & ".xlsx"  
    Set ws = wb.Sheets(1)
    
    With ws
        row = 1
    
        mycsvfile = "C:\Users\xxx\Desktop\mycopy.csv"  
    
        Open mycsvfile For Input As #1
        
        Do Until EOF(1)
            Line Input #1, line    
            matrix = Split(line, ",") 
            
            numcol = UBound(matrix) - LBound(matrix) + 1    
       
            For i = 1 To numcol     
                Cells(row, i) = matrix(i - 1)      
            Next i
            row = row + 1
        
        Loop
        Close #1
        
        'set the search range, i.e where I have to find the value:
        Set rngSearch = .Range("AR:AR")
    
        'specify last cell in range:
        Set rngLast = rngSearch.Cells(rngSearch.Cells.Count)
    
        'Find the "thing" in search range, when it first occurrs (rngFound=1st occurrence).
        Set rngFound = rngSearch.find(What:=thing.Text, After:=rngLast, LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, SearchDirection:=xlNext, MatchCase:=False, SearchFormat:=False)
    
        'if the "thing" is found in search range:
        If Not rngFound Is Nothing Then
            'saves the address of the first occurrence of the "thing" in the strFirstAddress variable:
            strFirstAddress = rngFound.Address
        
            Do
            'Find next occurrence of the "thing". 
            
            MsgBox rngFound.Address & " " & rngFound.Offset(0, -29).Value * rngFound.Offset(0, -6)
            
            Set rngFound = rngSearch.FindNext(rngFound)
            rngFound.Font.Color = vbRed
            rngFound.Offset(0, -40).Font.Color = vbRed
            
            Loop Until rngFound.Address = strFirstAddress
            
        Else
            MsgBox "thing not found"
        End If
    End With
    
End Sub 

I'm well aware that the part of the code that doesn't make sense is the following:

    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        If thing.Text = rngFound Then
        q.Text = "do somewhat else"
        End If
    Next

Since I haven't defined rngFound yet (this would be the result of my Vlookup search).

Does the logic I followed make some sense, or the code needs to be rewritten from scratch? Is is possible to avoid the Excel .xlsx conversion of the CSV file, and so doing the search directly in the CSV?

Update (answering to Tim Williams' question) In the following part of the code, I need to update the text of every node "thing" with the product of two cells in the .csv file, something like

 If thing.Text = rngFound Then
     q.Text = ws.Range("A:A").value*ws.Range("K:K").value
 End If

Would it be possible to apply something like offset function to the elements in the collection object? I know that offset can only be applied to a range, so I think a new function needs to be created for that purpose, am I right?


回答1:


Untested but should be about right I think. Since "find all matching cells in a range" is a pretty common task I like to use a standalone function for that, instead of cluttering the main code with that logic.

Sub editxml()
    
    Dim Obj As MSXML2.DOMDocument60
    Dim xmlpath As String
    Dim Node As IXMLDOMNodeList
    Dim Nm As IXMLDOMNode
    Dim thing As Object, q As Object
    Dim wb As Workbook, ws As Worksheet
    Dim matches As Collection
    
    Set Obj = New DOMDocument60
    Obj.async = False
    Obj.validateOnParse = False
    
    xmlpath = "C:\Users\xxx\Desktop\ppp.xml"
    Obj.SetProperty "SelectionNamespaces", "xmlns:ns0='http://update.DocumentTypes.Schema.ppp.Xml'"
    
    If Obj.Load(xmlpath) = True Then
        MsgBox "File XML uploaded"
    Else
        MsgBox "File XML not uploaded"
        Exit Sub
    End If
    
    'open the CSV file
    Set wb = Workbooks.Open("C:\Users\xxx\Desktop\mycopy.csv")
    Set ws = wb.Worksheets(1)
    
    Set Node = Obj.DocumentElement.SelectNodes("AA/BB/CC/DD")
    
    For Each Nm In Node
        Set thing = Nm.SelectSingleNode("thing")
        Set q = Nm.SelectSingleNode("qt")
        
        'moved the Find logic to a standalone function
        Set matches = FindAll(ws.Range("AR:AR"), thing.Text)
        
        'did we get any matches in the range?
        If matches.Count > 0 Then
            'It's not clear what should go here - are you replacing
            ' with some other text from the CSV, or just a fixed value?
            q.Text = "do somewhat else"
            
            'you can apply formatting to the found cells here...
        End If
    Next
        
    Obj.Save xmlpath
    
End Sub

'find all matching cells in a range and return them in a Collection
Public Function FindAll(rng As Range, val As String) As Collection
    Dim rv As New Collection, f As Range, addr As String
    Set f = rng.Find(what:=val, after:=rng.Cells(rng.Cells.Count), _
        LookIn:=xlValues, LookAt:=xlWhole, SearchOrder:=xlByRows, _
        SearchDirection:=xlNext, MatchCase:=False)
    If Not f Is Nothing Then addr = f.Address() 'store first cell found
    Do Until f Is Nothing
        rv.Add f
        Set f = rng.FindNext(after:=f)
        If f.Address() = addr Then Exit Do 'exit if we've looped back to first cell
    Loop
    Set FindAll = rv
End Function



回答2:


Anytime you need to edit an XML, consider XSLT, the special purpose programming language (that can be layered in VBA) designed to transform XML files. Specifically for your case, you can iterate through a CSV import to pass values into a paramterized XSLT.

To demonstrate (until OP includes sample data) below uses an XML of the top three vba and xslt StackOverflow users:

Data

XML (Notice empty <total_rep> nodes to be updated)

<?xml version="1.0"?>
<stackoverflow>
  <group lang="vba">
    <topusers>
      <user>Siddharth Rout</user>
      <link>https://stackoverflow.com/users/1140579/siddharth-rout</link>
      <location>Mumbai, India</location>
      <total_rep></total_rep>
      <tag1>excel</tag1>
      <tag2>vba</tag2>
      <tag3>excel-formula</tag3>
    </topusers>
    <topusers>
      <user>Scott Craner</user>
      <link>https://stackoverflow.com/users/4851590/scott-craner</link>
      <location>Flyover Country</location>
      <total_rep></total_rep>
      <tag1>excel</tag1>
      <tag2>vba</tag2>
      <tag3>excel-formula</tag3>
    </topusers>
    <topusers>
      <user>Tim Williams</user>
      <link>https://stackoverflow.com/users/478884/tim-williams</link>
      <location>San Francisco, CA, United States</location>
      <total_rep></total_rep>
      <tag1>vba</tag1>
      <tag2>excel</tag2>
      <tag3>arrays</tag3>
    </topusers>
  </group>
  <group lang="xslt">
    <topusers>
      <user>Dimitre Novatchev</user>
      <link>https://stackoverflow.com/users/36305/dimitre-novatchev</link>
      <location>United States</location>
      <total_rep></total_rep>
      <tag1>xslt</tag1>
      <tag2>xml</tag2>
      <tag3>xpath</tag3>
    </topusers>
    <topusers>
      <user>Martin Honnen</user>
      <link>https://stackoverflow.com/users/252228/martin-honnen</link>
      <location>Germany</location>
      <total_rep></total_rep>
      <tag1>xslt</tag1>
      <tag2>xml</tag2>
      <tag3>xpath</tag3>
    </topusers>
    <topusers>
      <user>Michael Kay</user>
      <link>https://stackoverflow.com/users/415448/michael-kay</link>
      <location>Reading, United Kingdom</location>
      <total_rep></total_rep>
      <tag1>xml</tag1>
      <tag2>xslt</tag2>
      <tag3>xpath</tag3>
    </topusers>
  </group>
</stackoverflow>

CSV

user total_rep
Siddharth Rout 134,062
Scott Craner 123,313
Tim Williams 116,760
Dimitre Novatchev 227,632
Martin Honnen 134,713
Michael Kay 135,177

XSLT

(Save below as .xsl file, a special .xml file, to be loaded in VBA)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <!-- INITIALIZE PARAMS -->
  <xsl:param name="user_param"/>
  <xsl:param name="total_rep_param"/>
  
  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  
  <!-- CONDITIONALLY UPDATE TEXT BY DIFFERENT NODE MATCH -->
  <xsl:template match="topusers">
    <xsl:copy>
        <xsl:apply-templates select="user|link|location"/>
        <total_rep>
            <xsl:choose>
                <xsl:when test="user = $user_param">
                    <xsl:value-of select="$total_rep_param"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:value-of select="total_rep"/>
                </xsl:otherwise>
            </xsl:choose>
        </total_rep>
        <xsl:apply-templates select="tag1|tag2|tag3"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

VBA

(Two subroutines to load CSV and transform XML, parameter names to match above in XSLT)

Sub LoadCSV()
    Dim csv_file As String
    
    csv_file = "C:\Path\To\File.csv"
    
    With ThisWorkbook.Worksheets("Sheet1")
        .Columns("A:D").EntireColumn.Delete
        
        With .QueryTables.Add(Connection:="TEXT;" & csv_file, _
                          Destination:=.Range("A1"))
               .TextFileParseType = xlDelimited
               .TextFileCommaDelimiter = True
               .Refresh
        End With
    
       .QueryTables(1).SaveData = False
       .QueryTables.Item(1).Delete
    End With
End Sub
Sub TransformXML()
    ' REFERENCE Microsoct XML, v6.0
    Dim xmlDoc As MSXML2.DOMDocument60
    Dim xslDoc As MSXML2.FreeThreadedDOMDocument60
    Dim xslTemp As MSXML2.XSLTemplate60
    Dim xslProc As Object
    
    Dim i As Long, lastrow As Long
    Dim param1 As Variant, param2 As Variant

    Call LoadCSV                                        ' LOAD CSV FILE
    
    Set xmlDoc = New MSXML2.DOMDocument60               ' LOAD XML FILE
    xmlDoc.Load "C:\Path\To\Input.xml"

    Set xslDoc = New MSXML2.DOMDocument60               ' LOAD XSL SCRIPT
    xslDoc.Load "C:\Path\To\Script.xsl"
    
    ' INITIALIZE NEEDED OBJECTS
    Set xslTemp = New MSXML2.XSLTemplate60
    Set xslTemp.stylesheet = xslDoc
    Set xslProc = xslTemp.createProcessor()
    
    With ThisWorkbook.Worksheets("Sheet1")
        lastrow = .Cells(.Rows.Count, "A").End(xlUp).Row

        ' ITERATE THROUGH CSV DATA
        For i = 2 To lastrow
            param1 = .Range("A" & i).Value
            param2 = .Range("B" & i).Value
            
            With xslProc
               .input = xmlDoc
               .addParameter "user_param", param1      ' ADD PARAMETER(S)
               .addParameter "total_rep_param", param2

               .transform                              ' TRANSFORM XML
                xmlDoc.LoadXML .output                 ' LOAD RESULT TREE
            End With
        Next i
    End With

    xmlDoc.Save "C:\Path\To\Output.xml"                ' SAVE OUTPUT TO FILE
       
    Set xmlDoc = Nothing: Set xslDoc = Nothing
    Set xslTemp = Nothing: Set xslProc = Nothing
End Sub

Output

(Notice <total_rep> populated from CSV)

<?xml version="1.0"?>
<stackoverflow>
    <group lang="vba">
        <topusers>
            <user>Siddharth Rout</user>
            <link>https://stackoverflow.com/users/1140579/siddharth-rout</link>
            <location>Mumbai, India</location>
            <total_rep>134062</total_rep>
            <tag1>excel</tag1>
            <tag2>vba</tag2>
            <tag3>excel-formula</tag3>
        </topusers>
        <topusers>
            <user>Scott Craner</user>
            <link>https://stackoverflow.com/users/4851590/scott-craner</link>
            <location>Flyover Country</location>
            <total_rep>123313</total_rep>
            <tag1>excel</tag1>
            <tag2>vba</tag2>
            <tag3>excel-formula</tag3>
        </topusers>
        <topusers>
            <user>Tim Williams</user>
            <link>https://stackoverflow.com/users/478884/tim-williams</link>
            <location>San Francisco, CA, United States</location>
            <total_rep>116760</total_rep>
            <tag1>vba</tag1>
            <tag2>excel</tag2>
            <tag3>arrays</tag3>
        </topusers>
    </group>
    <group lang="xslt">
        <topusers>
            <user>Dimitre Novatchev</user>
            <link>https://stackoverflow.com/users/36305/dimitre-novatchev</link>
            <location>United States</location>
            <total_rep>227632</total_rep>
            <tag1>xslt</tag1>
            <tag2>xml</tag2>
            <tag3>xpath</tag3>
        </topusers>
        <topusers>
            <user>Martin Honnen</user>
            <link>https://stackoverflow.com/users/252228/martin-honnen</link>
            <location>Germany</location>
            <total_rep>134713</total_rep>
            <tag1>xslt</tag1>
            <tag2>xml</tag2>
            <tag3>xpath</tag3>
        </topusers>
        <topusers>
            <user>Michael Kay</user>
            <link>https://stackoverflow.com/users/415448/michael-kay</link>
            <location>Reading, United Kingdom</location>
            <total_rep>135177</total_rep>
            <tag1>xml</tag1>
            <tag2>xslt</tag2>
            <tag3>xpath</tag3>
        </topusers>
    </group>
</stackoverflow>


来源:https://stackoverflow.com/questions/65703113/updating-an-xml-document-according-to-a-csv-file

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!