How do I get the whole content between two xml tags in Python?

我寻月下人不归 2020-12-15 09:13

I am trying to get the whole content between an opening XML tag and its closing counterpart.

Getting the content in straightforward cases, such as title below, is easy.

5 answers
  •  盖世英雄少女心
    2020-12-15 09:26

    Here's something that works for me and your sample:

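    from lxml import etree

    # Sample document (the inner element name is illustrative).
    doc = etree.XML(
    """
    <review>
      <title>Some testing stuff</title>
      <text>Some text with <extradata>data</extradata> in it.</text>
    </review>
    """
    )

    def flatten(seq):
      # Join text nodes and serialized child elements (Python 2 syntax).
      r = []
      for item in seq:
        if isinstance(item,(str,unicode)):
          # Text node: lxml returns it as a str/unicode subclass.
          r.append(unicode(item))
        elif isinstance(item,(etree._Element,)):
          # Element: serialize it without its tail text.
          r.append(etree.tostring(item,with_tail=False))
      return u"".join(r)

    print flatten(doc.xpath('/review/text/node()'))

    Yields:

    Some text with <extradata>data</extradata> in it.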
    

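    The XPath selects all child nodes of the text element. Each node is either converted to unicode directly, if it is a string/unicode subclass (lxml returns text nodes as "smart strings" such as lxml.etree._ElementUnicodeResult), or serialized with etree.tostring if it is an Element. Passing with_tail=False avoids duplicating the tail, which the XPath already returns as a separate text node.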

    You may need to handle other node types if they are present.
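
If lxml is not available, a similar result can probably be had with the standard library alone. The sketch below swaps in xml.etree.ElementTree for lxml's XPath; it is a different technique from the answer above, and the helper name inner_xml and the sample document are assumptions made for illustration. It relies on ET.tostring including each child's tail text, so no with_tail handling is needed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element):
    """Return the content of `element` as a string, inner tags included.

    `inner_xml` is a hypothetical helper name used for illustration.
    """
    # element.text is the text before the first child (None if absent);
    # escape it so the result stays well-formed XML.
    parts = [escape(element.text or "")]
    for child in element:
        # ET.tostring() serializes the child together with its tail text,
        # so the text following each child is emitted exactly once.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


# Illustrative sample document (assumed).
review = ET.fromstring(
    "<review>"
    "<title>Some testing stuff</title>"
    "<text>Some text with <extradata>data</extradata> in it.</text>"
    "</review>"
)

print(inner_xml(review.find("text")))
# Some text with <extradata>data</extradata> in it.
```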
