How do I use xml namespaces with find/findall in lxml?

星月不相逢 2020-12-05 02:50

I'm trying to parse content in an OpenOffice ODS spreadsheet. The ODS format is essentially just a zip file containing a number of documents. The content of the spreadsheet is sto

4 answers
  •  谎友^ (original poster)
    2020-12-05 03:10

    Etree won't find namespaced elements if there are no xmlns definitions in the XML file. For instance:

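    import lxml.etree as etree
    
    # Assumed example input: the "ns:" prefix is used but never declared.
    xml_doc = '<ns:root><ns:child/></ns:root>'
    
    # A strict parse rejects the undeclared prefix, so parse in recovery mode.
    parser = etree.XMLParser(recover=True)
    tree = etree.fromstring(xml_doc, parser)
    
    # finds nothing:
    tree.find('.//ns:root', {'ns': 'foo'})
    tree.find('.//{foo}root', {'ns': 'foo'})
    # without a prefix map, this form raises SyntaxError instead:
    # tree.find('.//ns:root')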
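
    A quick way to check whether your input has this problem is to try a strict parse first. This is only a rough sketch: the sample strings and the parses_strictly helper are my own illustrative assumptions, and the fallback import is there only in case lxml isn't installed (the standard library parser also rejects an unbound prefix).

```python
try:
    import lxml.etree as etree
except ImportError:
    # Assumption: fall back to the standard library, which offers the same fromstring() call.
    import xml.etree.ElementTree as etree

# Hypothetical input: uses the "ns:" prefix but never declares it.
UNDECLARED = '<ns:root><ns:child/></ns:root>'
# The same markup with the prefix bound to the namespace URI "foo".
DECLARED = '<ns:root xmlns:ns="foo"><ns:child/></ns:root>'


def parses_strictly(xml_text):
    """Return True if the default (strict) parser accepts xml_text."""
    try:
        etree.fromstring(xml_text)
    except SyntaxError:
        # lxml's XMLSyntaxError and the stdlib ParseError both derive from SyntaxError.
        return False
    return True


print(parses_strictly(UNDECLARED))
print(parses_strictly(DECLARED))
```

    If the strict parse fails on the first kind of input, the namespace declaration is what's missing, not the search expression.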
    

    Sometimes that is the data you are given. So, what can you do when the namespace is never declared?

    My solution: add one.

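    import lxml.etree as etree
    
    # Assumed example input: the "ns:" prefix is used but never declared.
    xml_doc = '<ns:root><ns:child/></ns:root>'
    # Wrap it in an element that declares the prefix (the wrapper name is arbitrary).
    xml_doc_with_ns = '<wrapper xmlns:ns="foo">%s</wrapper>' % xml_doc
    
    tree = etree.fromstring(xml_doc_with_ns)
    
    # finds what you're looking for:
    tree.find('.//{foo}root')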
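
    The same idea can be packaged as a small helper, after which both lookup styles work: the prefix form with a prefix map, and the {uri}tag form without one. This is a minimal sketch under my own assumptions: wrap_with_namespace, the wrapper element name and the sample fragment are all hypothetical, and the fallback import is only for environments without lxml.

```python
try:
    import lxml.etree as etree
except ImportError:
    # Assumption: fall back to the standard library, whose find()/findall() accept the same paths.
    import xml.etree.ElementTree as etree


def wrap_with_namespace(fragment, prefix, uri, wrapper="wrapper"):
    """Hypothetical helper: wrap an XML fragment in an element that binds prefix to uri."""
    return '<%s xmlns:%s="%s">%s</%s>' % (wrapper, prefix, uri, fragment, wrapper)


# Hypothetical fragment that uses "ns:" without declaring it.
fragment = '<ns:root><ns:child/></ns:root>'
tree = etree.fromstring(wrap_with_namespace(fragment, "ns", "foo"))

ns = {"ns": "foo"}
root = tree.find(".//ns:root", ns)           # prefix form: needs the prefix map
same = tree.find(".//{foo}root")             # {uri}tag form: no map required
children = tree.findall(".//ns:child", ns)   # findall takes the same map

print(root.tag, same.tag, len(children))
```

    Note that the fragment must not start with an XML declaration (<?xml ...?>), since that is only allowed at the very beginning of a document; strip it before wrapping.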
    
