Remove class attribute from HTML using Python and lxml

Posted by 夙愿已清 on 2019-12-03 23:17:11

I can't test this at the moment, but this appears to be the general idea:

for tag in node.xpath('//*[@class]'):
    tag.attrib.pop('class')
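
A minimal runnable sketch of the same idea, assuming the input is a small HTML fragment parsed with lxml.html.fromstring. The helper name strip_class_attributes and the sample markup are made up for illustration and are not part of the original answer:

```python
from lxml.html import fromstring, tostring


def strip_class_attributes(html):
    """Return the markup with every class attribute removed (hypothetical helper)."""
    root = fromstring(html)
    # '//*[@class]' selects every element in the tree that has a class attribute.
    for tag in root.xpath('//*[@class]'):
        tag.attrib.pop('class')
    # encoding='unicode' makes tostring() return str instead of bytes.
    return tostring(root, encoding='unicode')


# Sample input invented for this sketch.
result = strip_class_attributes('<div class="box"><p class="note" id="p1">Hi</p></div>')
print(result)
```

Other attributes, such as id here, are left alone because only the 'class' key is popped.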

For an lxml element, the .attrib object is a dict-like mapping of the element's attributes, so you can remove any attribute from it with del.
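
A short sketch of how that mapping behaves when an attribute is missing. The sample element is an assumption for illustration: del raises KeyError on a missing key, as it does on a plain dict, while pop() with a default does not.

```python
from lxml.html import fromstring

# Sample element invented for this sketch.
el = fromstring('<p class="note" id="p1">Hi</p>')

# The first del removes the attribute; the second finds nothing to delete.
del el.attrib['class']
try:
    del el.attrib['class']
    missing_raised = False
except KeyError:
    missing_raised = True

# pop() with a default is a quiet alternative when the attribute may be absent.
removed = el.attrib.pop('class', None)

# Copy the remaining attributes into a plain dict for inspection.
remaining = dict(el.attrib)
print(missing_raised, removed, remaining)
```

If you are not sure every element carries the attribute, pop(name, None) avoids having to check membership first.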

Below is a simple example showing how to rename an attribute in HTML.

Given HTML:

<div><img src="http://www.example.com/logo.png"></div>

Code:

from lxml.html import fromstring, tostring

html = '<div><img src="http://www.example.com/logo.png"></div>'
doc = fromstring(html)
for el in doc.iter('img'):
    if "src" in el.attrib:
        # Copy the value to the new attribute name, then drop the old one.
        el.set('data-src', el.get('src'))
        del el.attrib["src"]
# Use the public tostring() rather than the private _transform_result helper.
print(tostring(doc, encoding='unicode'))

Result:

<div><img data-src="http://www.example.com/logo.png"></div>