Get text from mixed element xml tags with ElementTree

不要未来只要你来 2021-01-29 08:09

I'm using ElementTree to parse an XML document. I'm extracting the text from the `u` tags, but some of them have mixed content that I need to filter out or keep…

1 Answer

没有蜡笔的小新 (original poster)

2021-01-29 09:07

The missing text fragments, "¿Sí?" and "A mí no me suena.", are available as the `tail` property of each `vocal` element. The tail is the text that follows the element's end tag.
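The sketch below shows where each piece of text ends up. It parses a cut-down `<u>` element from a string, so the element itself is hypothetical. It is written for Python 3, unlike the Python 2.7 answer code. It also shows why `itertext()` is not a complete fix on its own: it does return the tail, but it joins the `desc` text and the tail with no separator.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down <u>: "¿Sí?" comes after </vocal>, so it is stored in vocal.tail
u = ET.fromstring('<u><vocal type="filler"><desc>eh</desc></vocal>¿Sí?</u>')
v = u.find("vocal")

print(repr(u.text))           # None: there is no text between <u> and <vocal>
print(v.findtext("desc"))     # eh
print(v.tail)                 # ¿Sí?
print("".join(u.itertext()))  # eh¿Sí? (desc text and tail, unseparated)
```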

Here is one way to get the desired output (tested with Python 2.7).

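Assume that vocal.xml looks like this. Any root element name works, and the second `vocal` may have any `type` other than `filler`:

```xml
<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí?
  </u>

  <u>Pues...
     <vocal type="nonlexical">
       <desc>laugh</desc>
     </vocal>A mí no me suena.
  </u>
</root>
```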

Code:

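```python
# Python 2.7
from xml.etree import ElementTree as ET

tree = ET.parse("vocal.xml")

for u in tree.findall(".//u"):
    v = u.find("vocal")

    if v is None:
        # No <vocal> child: only the element's own text is relevant
        frags = [u.text]
    elif v.get("type") == "filler":
        # Keep the filler word from <desc> plus the text around <vocal>
        frags = [u.text, v.findtext("desc"), v.tail]
    else:
        # Other vocal types (e.g. laugh): drop <desc>, keep the surrounding text
        frags = [u.text, v.tail]

    # text and tail may be None or whitespace-only; skip those fragments
    parts = [t.strip().encode("utf-8") for t in frags if t and t.strip()]
    print " ".join(parts)
```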

Output:

```
eh ¿Sí?
Pues... A mí no me suena.
```
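Below is a possible Python 3 version of the same tail-based approach. It walks every child of `u` instead of calling `find("vocal")` once, so it also handles a `u` with no `vocal` or with several. Other children contribute only their tail. Several names here are assumptions for illustration and do not come from the answer: the helper `utterance_text`, the `KEEP_TYPES` set, the `root` element and the `nonlexical` type value. The XML is inlined instead of read from vocal.xml so the sketch runs on its own.

```python
import xml.etree.ElementTree as ET

# Inlined sample document (root name and "nonlexical" type are assumptions)
XML = """<root>
  <u>
    <vocal type="filler">
      <desc>eh</desc>
    </vocal>¿Sí?
  </u>
  <u>Pues...
     <vocal type="nonlexical">
       <desc>laugh</desc>
     </vocal>A mí no me suena.
  </u>
</root>"""

# Assumption: only descriptions of these vocal types are kept in the output
KEEP_TYPES = {"filler"}


def utterance_text(u):
    """Join the text of a <u> element, keeping <desc> only for kept vocal types."""
    frags = [u.text]  # text before the first child
    for child in u:
        if child.tag == "vocal" and child.get("type") in KEEP_TYPES:
            frags.append(child.findtext("desc"))
        # Text after a child's end tag is stored in that child's tail
        frags.append(child.tail)
    # text and tail may be None or whitespace-only; drop those fragments
    return " ".join(t.strip() for t in frags if t and t.strip())


root = ET.fromstring(XML)
for u in root.iter("u"):
    print(utterance_text(u))
```

With this sample data it prints the same two lines as the output above. To keep other kinds of vocal descriptions, add their `type` values to `KEEP_TYPES`.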
