In lxml, how do I remove a tag but retain all contents?

故里飘歌 2020-12-05 04:54

The problem is this: I have an XML fragment like so:

text1 inner1 text2 inner2 text3         


        
2 Answers
    谎友^ (original poster)
    2020-12-05 05:48

    Use lxml's Cleaner class to remove tags from HTML content; an example of what you want follows. For an HTML document, Cleaner is a better general solution than strip_elements, because in cases like this you usually want to strip out more than just the tag: you also want to get rid of things like onclick=function() attributes on other tags.

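    from lxml.html.clean import Cleaner

    cleaner = Cleaner()
    # Drop only the <p> tags themselves; their content is pulled up into the parent.
    cleaner.remove_tags = ['p']
    # html_source is a placeholder for your HTML string (or parsed lxml.html document).
    cleaned = cleaner.clean_html(html_source)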
    From the lxml documentation for remove_tags:

    A list of tags to remove. Only the tags will be removed, their content will get pulled up into the parent tag.
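    For a plain XML fragment like the one in the question, where no HTML sanitizing is needed, lxml.etree.strip_tags may be the more direct tool: it removes the named tags and merges their text and children into the parent. This differs from strip_elements, which drops each element together with its content. The snippet below is only a sketch: the tag names in the question did not survive on this page, so the fragment and its tags (root, a, b) are assumptions made up for illustration.

```python
from lxml import etree

# Hypothetical fragment: the tag names <root>, <a>, <b> are assumptions for illustration.
xml = "<root>text1 <a>inner1</a> text2 <b>inner2</b> text3</root>"

# strip_tags removes the <a> and <b> tags but keeps their text (and any children) in place.
root = etree.fromstring(xml)
etree.strip_tags(root, "a", "b")
kept = etree.tostring(root, encoding="unicode")
print(kept)

# strip_elements removes the elements together with their content;
# with_tail=False preserves the text that follows each removed element.
root2 = etree.fromstring(xml)
etree.strip_elements(root2, "a", "b", with_tail=False)
dropped = etree.tostring(root2, encoding="unicode")
print(dropped)
```

    Both functions modify the tree in place, so parse a fresh copy if you need the original as well.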
