Finding the line number of the element's ending tag in lxml

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-09 07:13:32

Question


While parsing an XML document with lxml, I want to find the starting and ending line numbers of a particular tag. I can get the starting tag's position from the sourceline property of lxml.etree.Element, but I am struggling to find the closing tag's line number.

A trivial example of my attempt:

import lxml.etree as ET

xml_sample = b'''<?xml version="1.0" encoding="utf-8"?>
<collection>
    <item>
        <value>foo</value>
    </item>
    <item>
        <value>
            bar
        </value>
    </item>
</collection>'''

for el in ET.fromstring(xml_sample).getroottree().findall('//value'):
    print('Found value "{el.text}" starting on line {el.sourceline} '
          'and ending on line ???.'.format(el=el))

Is it possible to get the closing tag line numbers of the value elements in the above example?


Answer 1:


With the lxml.etree.tostring() trick (ET here is lxml.etree, as imported in the question):

...
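root = ET.fromstring(xml_sample)
for el in root.findall('.//value'):
    # Serialize the element without its tail and count the newlines it spans;
    # splitting on whitespace would miscount text such as "foo bar".
    endline_num = el.sourceline + ET.tostring(el, with_tail=False).count(b'\n')
    print('Found value "{el.text}" starting on line {el.sourceline} '
          'and ending on line {end_num}.'.format(el=el, end_num=endline_num))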

The output:

Found value "foo" starting on line 4 and ending on line 4.
Found value "
            bar
        " starting on line 7 and ending on line 9.
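
The tostring() approach infers the end line from a re-serialized copy of the element, so it can drift whenever serialization does not reproduce the original line breaks. As a rough alternative, and a different technique from the answer above, the sketch below reads line numbers directly from the parser events using the standard library's xml.parsers.expat instead of lxml. The helper name element_line_spans is hypothetical and chosen only for illustration.

```python
import xml.parsers.expat

xml_sample = b'''<?xml version="1.0" encoding="utf-8"?>
<collection>
    <item>
        <value>foo</value>
    </item>
    <item>
        <value>
            bar
        </value>
    </item>
</collection>'''


def element_line_spans(xml_bytes):
    """Return (tag, start_line, end_line) for every element, in closing order.

    Hypothetical helper: it relies on expat reporting, inside a handler,
    the line on which the current start or end tag begins.
    """
    parser = xml.parsers.expat.ParserCreate()
    open_elements = []  # stack of (tag, start_line) for elements not yet closed
    spans = []

    def on_start(name, attrs):
        open_elements.append((name, parser.CurrentLineNumber))

    def on_end(name):
        tag, start_line = open_elements.pop()
        spans.append((tag, start_line, parser.CurrentLineNumber))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return spans


# Keep only the <value> elements from the question's sample.
value_spans = [span for span in element_line_spans(xml_sample) if span[0] == 'value']
for tag, start_line, end_line in value_spans:
    print('<{}> starts on line {} and ends on line {}.'.format(tag, start_line, end_line))
```

For the sample document this should agree with the line numbers in the output above. The trade-off is that it yields plain tuples rather than lxml elements, so matching the spans back to an lxml tree (for example by document order) is left to the caller.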


Source: https://stackoverflow.com/questions/47902528/finding-the-line-number-of-the-elements-ending-tag-in-lxml
