lxml etree xmlparser remove unwanted namespace

感动是毒 2020-11-29 00:04

I have an XML document that I am trying to parse using lxml.etree.


  
4 answers
  •  南方客 (original poster)
    2020-11-29 00:08

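    import io
    import lxml.etree as ET

    # Sample document in a default namespace. The tag names other than Body
    # are assumed; only Body and the namespace URI are used below.
    content = b'''\
    <Envelope xmlns="http://www.example.com/zzz/yyy">
      <Header>
        <Version>1</Version>
      </Header>
      <Body>
        some stuff
      </Body>
    </Envelope>
    '''
    dom = ET.parse(io.BytesIO(content))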

    You can find namespace-aware nodes using the xpath method:

    body = dom.xpath('//ns:Body', namespaces={'ns': 'http://www.example.com/zzz/yyy'})
    print(body)
    # [<Element {http://www.example.com/zzz/yyy}Body at 0x...>]
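
    If the goal is only to locate nodes, the namespace can stay in place. The sketch below shows three lookups that should return the same Body element: a prefix map passed to find(), Clark notation, and an XPath local-name() test. The sample document and its Envelope, Header and Version tag names are assumptions made for illustration; only Body and the namespace URI come from the answer.

```python
import lxml.etree as ET

NS = 'http://www.example.com/zzz/yyy'

# Hypothetical sample document; Envelope/Header/Version are assumed names.
content = (
    b'<Envelope xmlns="http://www.example.com/zzz/yyy">'
    b'<Header><Version>1</Version></Header>'
    b'<Body>some stuff</Body>'
    b'</Envelope>'
)
root = ET.fromstring(content)

# 1. Prefix map: 'ns' is an arbitrary local prefix bound to the URI.
by_prefix = root.find('ns:Body', namespaces={'ns': NS})

# 2. Clark notation: the namespace URI in braces before the local name.
by_clark = root.find('{%s}Body' % NS)

# 3. XPath local-name() matches on the local name, whatever the namespace.
by_local = root.xpath('//*[local-name()="Body"]')[0]

print(by_prefix.tag)
print(by_clark.text)
print(ET.QName(by_local).localname)
```

    The local-name() form is convenient when the namespace URI varies between documents, at the cost of also matching same-named elements from other namespaces.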
    

    If you really want to remove namespaces, you could use an XSL transformation:

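    # http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
    xslt = b'''\
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="no"/>

    <!-- Copy the document root, comments and processing instructions as-is. -->
    <xsl:template match="/|comment()|processing-instruction()">
        <xsl:copy>
          <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>

    <!-- Recreate each element under its local name, without a namespace. -->
    <xsl:template match="*">
        <xsl:element name="{local-name()}">
          <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
    </xsl:template>

    <!-- Do the same for attributes. -->
    <xsl:template match="@*">
        <xsl:attribute name="{local-name()}">
          <xsl:value-of select="."/>
        </xsl:attribute>
    </xsl:template>
    </xsl:stylesheet>
    '''

    xslt_doc = ET.parse(io.BytesIO(xslt))
    transform = ET.XSLT(xslt_doc)
    dom = transform(dom)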
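
    As an alternative to the stylesheet, the same effect can be approximated in plain Python by rewriting each tag to its local name. This is a minimal sketch, not part of the original answer: strip_namespaces is a hypothetical helper name and the sample document is assumed, while ET.QName and ET.cleanup_namespaces are existing lxml calls.

```python
import lxml.etree as ET

def strip_namespaces(root):
    """Rewrite element and attribute names to their local names, in place."""
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(el.tag, str):
            continue
        el.tag = ET.QName(el).localname
        # Copy the items first, because the attribute map is modified below.
        for name, value in list(el.attrib.items()):
            local = ET.QName(name).localname
            if local != name:
                del el.attrib[name]
                el.attrib[local] = value
    # Drop namespace declarations that no element or attribute uses any more.
    ET.cleanup_namespaces(root)
    return root

# Hypothetical sample document; Envelope is an assumed root tag name.
root = ET.fromstring(
    b'<Envelope xmlns="http://www.example.com/zzz/yyy">'
    b'<Body>some stuff</Body>'
    b'</Envelope>'
)
strip_namespaces(root)
print(ET.tostring(root, encoding='unicode'))
print(root.find('Body').text)
```

    Unlike the XSLT route, this modifies the tree in place rather than producing a new one, so keep a copy if the namespaced original is still needed.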
    

    Here we see the namespace has been removed:

    print(ET.tostring(dom, encoding='unicode'))
    # <Envelope>
    #   <Header>
    #     <Version>1</Version>
    #   </Header>
    #   <Body>
    #     some stuff
    #   </Body>
    # </Envelope>

    So you can now find the Body node this way:

    print(dom.find("Body"))
    # <Element Body at 0x...>
    
