How to use lxml and Python to pretty print a subtree of an XML file?

Submitted by 前提是你 on 2021-02-08 17:55:50


I have the following code, using Python with lxml, to pretty print the file example.xml:

python -c '
from lxml import etree;
from sys import stdout, stdin;

parser=etree.XMLParser(remove_blank_text=True, strip_cdata=False);
tree=etree.parse(stdin, parser)
tree.write(stdout, pretty_print = True)' < example.xml

I'm using lxml because it is important that I preserve the fidelity of the original file, including preserving the CDATA idioms. Here's the file example.xml that I'm using it on:

<projects><project name="helloworld" threads="1" pubsub="auto" heartbeat-interval="1">
<description><![CDATA[This is a sample project]]></description>  <metadata>    <meta id="studioUploadedBy">anonymous</meta>
<meta id="studioUploaded">1550863090439</meta>    <meta id="studioModifiedBy">anonymous</meta>
<meta id="studioModified">1550863175384</meta>    <meta id="studioTags">helloworld</meta>
<meta id="studioVersionNotes">This is just a sample project</meta>    <meta id="layout">{"cq1":{"Source1":{"x":50,"y":-290}}}</meta>
</metadata>  <contqueries>    <contquery name="cq1">      <windows>        <window-source pubsub="true" name="Source1">
<schema>            <fields>              <field name="name" type="string" key="true"/>            </fields>
</schema>        </window-source>      </windows>    </contquery>  </contqueries> </project></projects>

It generates the following output:

<projects>
  <project name="helloworld" threads="1" pubsub="auto" heartbeat-interval="1">
    <description><![CDATA[This is a sample project]]></description>
    <metadata>
      <meta id="studioUploadedBy">anonymous</meta>
      <meta id="studioUploaded">1550863090439</meta>
      <meta id="studioModifiedBy">anonymous</meta>
      <meta id="studioModified">1550863175384</meta>
      <meta id="studioTags">helloworld</meta>
      <meta id="studioVersionNotes">This is just a sample project</meta>
      <meta id="layout">{"cq1":{"Source1":{"x":50,"y":-290}}}</meta>
    </metadata>
    <contqueries>
      <contquery name="cq1">
        <windows>
          <window-source pubsub="true" name="Source1">
            <schema>
              <fields>
                <field name="name" type="string" key="true"/>
              </fields>
            </schema>
          </window-source>
        </windows>
      </contquery>
    </contqueries>
  </project>
</projects>

This is nearly what I want, except that I'd like to print only a subtree: just the element <project name="helloworld"...> through </project>. How would I modify the above lxml-based Python code to do that?


You can use tree.find to get the XML element you want to extract. Then wrap that element in its own ElementTree (et in this case) and call write on it.

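python -c '
from lxml import etree;
from sys import stdout, stdin;

parser=etree.XMLParser(remove_blank_text=True, strip_cdata=False);
tree=etree.parse(stdin, parser)
# find() searches relative to the root element, so this selects <projects>/<project>
e = tree.find("project")
# wrap the element in its own ElementTree so that write() is available
et = etree.ElementTree(e)
# write() emits bytes, so use stdout.buffer where it exists (Python 3)
et.write(getattr(stdout, "buffer", stdout), pretty_print = True)' < example.xml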
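For anyone who prefers a script to a one-liner, here is a minimal sketch of the same find-then-wrap approach, assuming Python 3 and lxml. The function name pretty_subtree and the shortened inline sample are made up for illustration and are not part of the question. The function writes into a BytesIO buffer, so the result is plain bytes regardless of how stdout is configured.

```python
from io import BytesIO

from lxml import etree

# Hypothetical sample, shortened from example.xml in the question.
SAMPLE = b"""<projects><project name="helloworld">
<description><![CDATA[This is a sample project]]></description>  <metadata>
<meta id="studioTags">helloworld</meta>  </metadata> </project></projects>"""


def pretty_subtree(source, path):
    """Return the pretty-printed bytes of the first element matching path.

    source is a binary file-like object; path is an ElementPath
    expression evaluated relative to the root element.
    """
    # Same parser settings as in the question: drop ignorable whitespace
    # so pretty printing can re-indent, and keep CDATA sections intact.
    parser = etree.XMLParser(remove_blank_text=True, strip_cdata=False)
    tree = etree.parse(source, parser)

    element = tree.find(path)
    if element is None:
        raise LookupError("no element matches %r" % path)

    # Wrap the element in its own ElementTree so write() can be used.
    out = BytesIO()
    etree.ElementTree(element).write(out, pretty_print=True)
    return out.getvalue()


result = pretty_subtree(BytesIO(SAMPLE), "project")
print(result.decode("utf-8"), end="")
```

To read from a pipe instead, passing sys.stdin.buffer as the source should work the same way.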


We can capture a nested Element using XPath. Element objects do not provide the same .write() capability, so we'll need a different output mechanism.

How about...

python -c '
from lxml import etree;
from sys import stdout, stdin;

parser=etree.XMLParser(remove_blank_text=True, strip_cdata=False);
tree=etree.parse(stdin, parser)
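# assuming there will be exactly 1 project
project = tree.xpath("/projects/project")[0]
# encoding="unicode" makes tostring() return text rather than bytes
print(etree.tostring(project, pretty_print = True, encoding = "unicode"))' < example.xml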
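If the file may hold more than one project, the XPath route can pick one by attribute. The following is only a sketch under assumptions: it targets Python 3, the function name pretty_project and the two-project sample are invented for illustration, and it relies on lxml's support for passing XPath variables (here $wanted) as keyword arguments.

```python
from lxml import etree

# Hypothetical input with two projects, to show selection by name.
SAMPLE = b"""<projects><project name="helloworld"><description><![CDATA[first]]></description></project>
<project name="other"><description><![CDATA[second]]></description></project></projects>"""


def pretty_project(xml_bytes, name):
    """Return the pretty-printed text of the single project called name."""
    parser = etree.XMLParser(remove_blank_text=True, strip_cdata=False)
    root = etree.fromstring(xml_bytes, parser)

    # $wanted is an XPath variable bound through the keyword argument,
    # which avoids building the expression by string concatenation.
    matches = root.xpath("/projects/project[@name=$wanted]", wanted=name)
    if len(matches) != 1:
        raise LookupError(
            "expected exactly 1 project named %r, found %d" % (name, len(matches))
        )

    # encoding="unicode" returns str instead of bytes.
    return etree.tostring(matches[0], pretty_print=True, encoding="unicode")


text = pretty_project(SAMPLE, "other")
print(text, end="")
```

Raising an error when the match count is not exactly one makes the "exactly 1 project" assumption explicit instead of silently taking the first hit.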

