Merge multiple XML files from command line

粉色の甜心 2020-12-09 05:14

I have several XML files. They all have the same structure, but were split due to file size. So, let's say I have A.xml, B.xml, and C.xml, and I want to merge them into a single file from the command line.

  • 2020-12-09 05:18

    Low-tech simple answer:

    echo '<products>' > combined.xml
    grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
    echo '</products>' >> combined.xml
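For comparison, here is a rough Python equivalent of the three commands above. It is a sketch, not part of the original answer: the helper name `combine_texts` is hypothetical, and it deliberately keeps the grep command's behavior, including its limitations (any line containing `<products>`, `</products>`, or `<?xml` is dropped whole).

```python
import re
import sys

# Same pattern as the grep command: any line containing <products>,
# </products>, or an XML declaration is dropped entirely.
DROP = re.compile(r"</?products>|<\?xml")


def combine_texts(texts):
    """Merge file contents using the same line filter as grep -v."""
    out = ["<products>"]
    for text in texts:
        out.extend(line for line in text.splitlines() if not DROP.search(line))
    out.append("</products>")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    contents = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            contents.append(fh.read())
    sys.stdout.write(combine_texts(contents))
```

Because it works line by line, an outer tag with attributes such as `<products id="x">` survives the filter, which is exactly the third limitation listed below.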
    

    Limitations:

    • The outer opening and closing tags must each be on their own line.
    • All files must use the same outer tags.
    • The outer tags must not have attributes.
    • The files must not contain inner tags with the same name as the outer tags.
    • Any existing contents of combined.xml are overwritten rather than included.

    Each of these limitations can be worked around, but not all of them easily.
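As one example of such a workaround, the sketch below (hypothetical helpers `inner_content` and `merge_texts`, assuming the root element is named `products`) removes only the first opening root tag and the last closing root tag instead of whole lines. That lifts the own-line, attribute, and nested-same-name restrictions. It is still text matching rather than real XML parsing, so a root tag inside a comment or CDATA section would confuse it, and the root's attributes are not carried over to the output.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
XML_DECL = re.compile(r"<\?xml[^>]*\?>")


def inner_content(text, root="products"):
    """Return everything between the first <root ...> tag and the last </root>.

    Works when the tags share a line with other content, carry attributes,
    or when the same tag name also appears nested inside the root.
    """
    text = XML_DECL.sub("", text, count=1)
    opening = re.search(rf"<{root}(\s[^>]*)?>", text)
    closing = text.rfind(f"</{root}>")
    if opening is None or closing < opening.end():
        raise ValueError(f"no <{root}>...</{root}> pair found")
    return text[opening.end():closing]


def merge_texts(texts, root="products"):
    """Wrap the inner content of every text in a single, attribute-free root."""
    body = "".join(inner_content(t, root) for t in texts)
    return f"<{root}>{body}</{root}>\n"
```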

  • 2020-12-09 05:19

    xml_grep

    http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep

    xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml

    • --wrap: encloses the XML result in the given tag (here: products)
    • --cond: the XML subtree to select (here: product)
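If XML::Twig is not installed, the same idea (select every `product` element, wrap them all in one `products` element) can be approximated with Python's standard library. This is a sketch, not a reimplementation of xml_grep: `collect` is a hypothetical name, it assumes `product` elements are not nested inside one another (`iter()` would otherwise pick up both), and `ElementTree.indent` (Python 3.9+) only roughly stands in for `--pretty_print indented`.

```python
import sys
from xml.etree import ElementTree


def collect(roots, cond="product", wrap="products"):
    """Gather every <cond> element from the given roots under a new <wrap>."""
    wrapper = ElementTree.Element(wrap)
    for root in roots:
        # iter() walks the whole tree, so matches at any depth are collected
        wrapper.extend(list(root.iter(cond)))
    ElementTree.indent(wrapper)  # Python 3.9+
    return wrapper


if __name__ == "__main__":
    roots = [ElementTree.parse(path).getroot() for path in sys.argv[1:]]
    print(ElementTree.tostring(collect(roots), encoding="unicode"))
```

Unlike the grep approach, the input files do not need to share the same outer tag; only the `product` elements are kept.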
  • 2020-12-09 05:32

    High-tech answer:

    Save this Python script as xmlcombine.py:

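    #!/usr/bin/env python3
    import sys
    from xml.etree import ElementTree
    
    def run(files):
        first = None
        for filename in files:
            data = ElementTree.parse(filename).getroot()
            if first is None:
                # The first file's root element becomes the container
                first = data
            else:
                # Append this root's children (not the root itself)
                first.extend(data)
        if first is not None:
            # encoding="unicode" returns str instead of bytes in Python 3
            print(ElementTree.tostring(first, encoding="unicode"))
    
    if __name__ == "__main__":
        run(sys.argv[1:])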
    

    To combine files, run:

    python xmlcombine.py ?.xml > combined.xml
    
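The `?.xml` pattern matters: `?` matches exactly one character, so A.xml, B.xml, and C.xml are picked up while combined.xml is not. The self-contained demo below shows this, using the same extend technique as the script. It creates throwaway files with made-up contents in a temporary directory; `merge` and `demo` are illustrative names only.

```python
import glob
import os
import tempfile
from xml.etree import ElementTree


def merge(paths):
    """Same technique as xmlcombine.py: extend the first root with the others' children."""
    first = None
    for path in paths:
        root = ElementTree.parse(path).getroot()
        if first is None:
            first = root
        else:
            first.extend(root)
    return None if first is None else ElementTree.tostring(first, encoding="unicode")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        for name in "ABC":
            with open(os.path.join(tmp, f"{name}.xml"), "w", encoding="utf-8") as fh:
                fh.write(f"<products><product>{name}</product></products>")
        # Stand-in for output left over from an earlier run
        with open(os.path.join(tmp, "combined.xml"), "w", encoding="utf-8") as fh:
            fh.write("<products><product>old</product></products>")
        # "?" matches exactly one character, so combined.xml is skipped
        inputs = sorted(glob.glob(os.path.join(tmp, "?.xml")))
        return [os.path.basename(p) for p in inputs], merge(inputs)


if __name__ == "__main__":
    names, xml = demo()
    print(names)
    print(xml)
```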

    For further enhancement, consider using:

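    • chmod +x xmlcombine.py: Makes the script executable, so you can omit python from the command line (run it as ./xmlcombine.py unless it is on your PATH)

    • xmlcombine.py !(combined).xml > combined.xml: Collects all XML files except the output, but requires bash's extglob option (shopt -s extglob)

    • xmlcombine.py *.xml | sponge combined.xml: Also includes the previous contents of combined.xml, but requires the sponge program (from moreutils), which reads all of its input before writing the output file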

    • import lxml.etree as ElementTree: Uses a potentially faster XML parser
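Another option that avoids both extglob and sponge is to let the script write the output file itself and skip it if it also appears among the inputs. The sketch below assumes a hypothetical command-line interface, `xmlcombine.py OUTPUT INPUT...`, that the original script does not have. Inputs are compared by normalized absolute path (symlinks are not resolved), and every input is fully parsed before the output file is opened.

```python
import os
import sys
from xml.etree import ElementTree


def select_inputs(paths, output):
    """Drop any input that names the output file, like bash's !(combined).xml."""
    target = os.path.abspath(output)
    return [p for p in paths if os.path.abspath(p) != target]


def combine(paths, output):
    first = None
    for path in select_inputs(paths, output):
        root = ElementTree.parse(path).getroot()
        if first is None:
            first = root
        else:
            first.extend(root)
    if first is not None:
        # All inputs are parsed before the output file is opened for writing
        ElementTree.ElementTree(first).write(output, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical interface: xmlcombine.py OUTPUT INPUT...
    if len(sys.argv) > 2:
        combine(sys.argv[2:], sys.argv[1])
    else:
        print("usage: xmlcombine.py OUTPUT INPUT...", file=sys.stderr)
```

With this variant, `xmlcombine.py combined.xml *.xml` is safe to rerun: a leftover combined.xml matched by the glob is simply ignored.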
