Traversing TEI in Python 3, text comes up empty for some entities

Submitted by 青春壹個敷衍的年華 on 2020-01-16 04:59:08

Question


I have a TEI-encoded XML file with elements as follows:

<sp>
    <speaker rend="italic">Sampson.</speaker>
    <ab>
         <lb n="5"/>
         <hi rend="italic">Gregory:</hi>
         <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/>
    </ab>
</sp>
<sp>
     <speaker rend="italic">Greg.</speaker>
     <ab>No, for then we should be Colliars.
         <lb n="7" rend="rj"/>
     </ab>
</sp>

The full file is very large but can be accessed here: http://ota.ox.ac.uk/desc/5721. I'm attempting to use Python 3 to traverse the XML and get all the text associated with the <ab> tag, which is where the dialogue is found.

import xml.etree.ElementTree as etree
tree = etree.parse('romeo_juliet_5721.xml')
doc = tree.getroot()
for i in doc.iter(tag='{http://www.tei-c.org/ns/1.0}ab'):
    print(i.tag, i.text)
>>> {http://www.tei-c.org/ns/1.0}ab 
>>>                  
>>> {http://www.tei-c.org/ns/1.0}ab No, for then we should be Colliars.

The loop finds the <ab> elements just fine, but it doesn't report "my word wee'l not carry coales" as the text of the first <ab>. If that text belongs to a different element, I'm not seeing it. I've thought about converting the entire element to a string and extracting the text with a regex (or by stripping all XML tags), but I would rather understand what's happening here. Thanks for any help you can provide.


Answer 1:


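That's because in the ElementTree model, the text " my word wee'l not carry coales." is considered the tail of the <seg> element rather than the text of <ab>. An element's text is only what comes before its first child; anything that follows a child's closing tag is stored as that child's tail. To get the text of an element together with the tails of the elements nested inside it, you can try this: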

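for i in doc.iter(tag='{http://www.tei-c.org/ns/1.0}ab'):
    # i.text may be None; skip i itself so its own tail (the text after </ab>) is left out
    tails = ''.join((el.tail or '') for el in i.iter() if el is not i)
    innerText = ((i.text or '') + tails).strip()
    print(i.tag, innerText)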
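
As an alternative, here is a minimal sketch using Element.itertext() from the standard library, which yields the text and tail strings inside an element in document order. Unlike the loop above, it also includes the text of child elements (such as "Gregory:" and "A"), which may or may not be what you want. The <body> wrapper, the SAMPLE string, and the ab_lines helper are assumptions made for illustration; they are not part of the original file or of ElementTree.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Fragment from the question, wrapped in a hypothetical <body> root
# that declares the TEI namespace so the sample parses on its own.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
<sp>
    <speaker rend="italic">Sampson.</speaker>
    <ab>
         <lb n="5"/>
         <hi rend="italic">Gregory:</hi>
         <seg type="homograph">A</seg> my word wee'l not carry coales.<lb n="6"/>
    </ab>
</sp>
<sp>
     <speaker rend="italic">Greg.</speaker>
     <ab>No, for then we should be Colliars.
         <lb n="7" rend="rj"/>
     </ab>
</sp>
</body>"""


def ab_lines(root):
    """Return the whitespace-normalized text of every <ab> under root."""
    lines = []
    for ab in root.iter(TEI_NS + "ab"):
        # itertext() walks ab and its descendants, yielding each text and
        # tail string in document order (ab's own tail is not included).
        raw = "".join(ab.itertext())
        # Collapse the indentation and newlines left over from the markup.
        lines.append(" ".join(raw.split()))
    return lines


root = ET.fromstring(SAMPLE)

# The "missing" words are stored as the tail of <seg>, not as text of <ab>.
seg = root.find(".//" + TEI_NS + "seg")
print(repr(seg.text), repr(seg.tail))

for line in ab_lines(root):
    print(line)
```

Printing seg.text and seg.tail side by side makes the text/tail split visible, and the whitespace normalization is only one possible choice; drop it if line breaks inside <ab> matter to you.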


Source: https://stackoverflow.com/questions/37062825/traversing-tei-in-python-3-text-comes-up-empty-for-some-entities
