I have several XML files. They all have the same structure, but were split due to file size. So, let's say I have A.xml, B.xml, and C.xml, and I want to combine them into a single file.
Low-tech simple answer:
```bash
echo '<products>' > combined.xml
grep -vh '</\?products>\|<?xml' *.xml >> combined.xml
echo '</products>' >> combined.xml
```
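Limitations:

- The XML declaration and the `<products>` and `</products>` tags must each be on a line of their own, because grep drops every matching line in full.
- If `combined.xml` already exists, it will be wiped out instead of getting included.

Each of these limitations can be worked around, but not all of them easily.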
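To see where the line-based approach breaks, the pipeline can be mimicked in Python on in-memory strings. This is a minimal sketch: `combine_by_lines` and the sample documents are made up for illustration, and the regular expression is the ERE spelling of the grep pattern above.

```python
import re

# Same pattern as the grep command: lines containing <products>,
# </products>, or an XML declaration are dropped.
DROP = re.compile(r"</?products>|<\?xml")


def combine_by_lines(documents):
    """Mimic the echo/grep/echo pipeline on a list of XML strings."""
    out = ["<products>"]
    for doc in documents:
        for line in doc.splitlines():
            if not DROP.search(line):
                out.append(line)
    out.append("</products>")
    return "\n".join(out)


a = """<?xml version="1.0"?>
<products>
  <product id="1"/>
</products>"""

# Root tags and content on one line: the whole line is dropped,
# so product 2 silently disappears from the result.
b = """<products><product id="2"/></products>"""

print(combine_by_lines([a, b]))
```

The product from `a` survives because every tag grep filters sits on its own line; the product from `b` is lost with the line it shares with `<products>`.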
Alternatively, use xml_grep, which ships with the XML::Twig Perl distribution: http://search.cpan.org/dist/XML-Twig/tools/xml_grep/xml_grep
```bash
xml_grep --pretty_print indented --wrap products --descr '' --cond "product" *.xml > combined.xml
```
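Here `--wrap products` wraps the results in a `products` root element, and `--cond "product"` selects the `product` elements to collect from each file; substitute your own tag names.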
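For comparison, here is a rough Python counterpart of that xml_grep call using the standard-library ElementTree: it gathers every `product` element, wherever it sits in each document, under a fresh `products` root. It is a sketch, not a reimplementation of xml_grep; the function `collect` and its `wrap`/`cond` parameters are hypothetical names chosen to echo the command-line options.

```python
from xml.etree import ElementTree


def collect(documents, wrap="products", cond="product"):
    """Gather every <cond> element, at any depth, under a new <wrap> root."""
    root = ElementTree.Element(wrap)
    for doc in documents:
        tree = ElementTree.fromstring(doc)
        # iter() visits the element itself and all of its descendants,
        # so matches nested under other tags are found too.
        for element in tree.iter(cond):
            root.append(element)
    return root


a = '<products><product id="1"/><product id="2"/></products>'
b = '<catalog><section><product id="3"/></section></catalog>'
merged = collect([a, b])
print(ElementTree.tostring(merged, encoding="unicode"))
```

Because the search is by tag name rather than by position, the input files do not even need the same root element. If `product` elements can be nested inside each other, this sketch emits the inner ones twice (once inside the parent, once on their own).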
High-tech answer:
Save this Python script as xmlcombine.py:
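```python
#!/usr/bin/env python3
import sys
from xml.etree import ElementTree


def run(files):
    first = None
    for filename in files:
        # Parse each file and keep its root element
        data = ElementTree.parse(filename).getroot()
        if first is None:
            # The first file's root becomes the root of the result
            first = data
        else:
            # Append the children of every later root to the first root
            first.extend(data)
    if first is not None:
        # encoding="unicode" returns str instead of bytes in Python 3
        print(ElementTree.tostring(first, encoding="unicode"))


if __name__ == "__main__":
    run(sys.argv[1:])
```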
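To combine files, run the following (the `?.xml` glob matches only one-character names such as `A.xml`, so an existing `combined.xml` is not picked up):

```bash
python3 xmlcombine.py ?.xml > combined.xml
```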
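To see what `run()` does without creating any files, here is the same extend-based merge applied to in-memory strings. The helper `merge_strings` and the three sample documents are made up for illustration; only the loop mirrors the script.

```python
from xml.etree import ElementTree


def merge_strings(documents):
    """Same logic as run() in xmlcombine.py, but on XML strings."""
    first = None
    for doc in documents:
        data = ElementTree.fromstring(doc)
        if first is None:
            first = data          # the first root becomes the output root
        else:
            first.extend(data)    # later roots contribute only their children
    return first


parts = [
    '<products source="A"><product>a1</product></products>',
    '<products source="B"><product>b1</product><product>b2</product></products>',
    '<products source="C"><product>c1</product></products>',
]
merged = merge_strings(parts)
print(ElementTree.tostring(merged, encoding="unicode"))
```

Note that the attributes on the B and C roots are discarded: only the first file's root element, with its attributes, survives, and everything else is appended beneath it in file order.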
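For further enhancement, consider using:

- `chmod +x xmlcombine.py`: allows you to omit `python3` on the command line. The commands below assume the script is on your `PATH`; otherwise, call it as `./xmlcombine.py`.
- `xmlcombine.py !(combined).xml > combined.xml`: collects all XML files except the output, but requires bash's `extglob` option (`shopt -s extglob`).
- `xmlcombine.py *.xml | sponge combined.xml`: collects the contents of an existing `combined.xml` as well, but requires the `sponge` program (from moreutils).
- `import lxml.etree as ElementTree` in place of `from xml.etree import ElementTree`: uses a potentially faster XML parser.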
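The `sponge` trick and the lxml swap can also be folded into the script itself. The sketch below parses every input before touching the output, then writes to a temporary file in the same directory and renames it over the target with `os.replace`, so the output file may safely appear among the inputs. The function name `combine_into`, the temp-file approach, and writing an XML declaration are assumptions, not part of the original script; the `try`/`except` falls back to the standard library when lxml is not installed.

```python
import os
import tempfile

try:
    # Potentially faster parser, if installed
    import lxml.etree as ElementTree
except ImportError:
    from xml.etree import ElementTree


def combine_into(output, inputs):
    """Merge inputs into output; output may itself be one of the inputs."""
    first = None
    for filename in inputs:
        data = ElementTree.parse(filename).getroot()
        if first is None:
            first = data
        else:
            # list() snapshots the children first: lxml moves (rather than
            # copies) appended elements out of their old parent.
            first.extend(list(data))
    if first is None:
        return
    # Everything is parsed before the output is touched, which is the job
    # sponge does in the shell pipeline.
    directory = os.path.dirname(os.path.abspath(output))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".xml")
    os.close(fd)
    try:
        ElementTree.ElementTree(first).write(
            tmp, encoding="utf-8", xml_declaration=True)
    except Exception:
        os.remove(tmp)  # do not leave a half-written temp file behind
        raise
    # On POSIX, a rename within one filesystem is atomic.
    os.replace(tmp, output)
```

For example, `combine_into("combined.xml", ["combined.xml", "A.xml", "B.xml"])` appends the contents of A.xml and B.xml to whatever `combined.xml` already holds.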