lxml: cssselect(): AttributeError: 'lxml.etree._Element' object has no attribute 'cssselect'

Anonymous (unverified), submitted 2019-12-03 10:09:14

Question:

Can someone explain why the first cssselect() call (on root) works, while the second (on root2) fails?

    from lxml.html import fromstring
    from lxml import etree

    html = '<html><a href="http://example.com">example</a></html>'
    root = fromstring(html)
    print('via fromstring', repr(root))  # via fromstring <Element html at 0x...>
    print(root.cssselect("a"))

    root2 = etree.HTML(html)
    print('via etree.HTML()', repr(root2))  # via etree.HTML() <Element html at 0x...>
    root2.cssselect("a")  # --> Exception

I get:

    Traceback (most recent call last):
      File "/home/foo_eins_d/src/foo.py", line 11, in <module>
        root2.cssselect("a")
    AttributeError: 'lxml.etree._Element' object has no attribute 'cssselect'

Version: lxml==3.4.4

Answer 1:

The difference is in the type of element that each function returns. For example:

    In [12]: root = etree.HTML(html)

    In [13]: root = fromstring(html)

    In [14]: root2 = etree.HTML(html)

    In [15]: type(root)
    Out[15]: lxml.html.HtmlElement

    In [16]: type(root2)
    Out[16]: lxml.etree._Element

lxml.html.HtmlElement provides a cssselect() method. HtmlElement is a subclass of etree._Element that adds HTML-specific methods such as this one.

But lxml.etree._Element does not have that method (at least in lxml 3.4.4, as the traceback shows).
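A minimal sketch that checks the type relationships described above, assuming Python 3 and a recent lxml (the question used lxml 3.4.4, so details may differ there). It only inspects types and never calls cssselect(), so it runs even without the separate cssselect package.

```python
# Check which element class each parsing entry point produces.
# Assumes Python 3 and a recent lxml installation.
import lxml.html
from lxml import etree

html = '<html><a href="http://example.com">example</a></html>'

root = lxml.html.fromstring(html)  # parsed via lxml.html
root2 = etree.HTML(html)           # parsed via the plain etree HTML parser

print(type(root))   # <class 'lxml.html.HtmlElement'>
print(type(root2))  # <class 'lxml.etree._Element'>

# HtmlElement is a subclass of the generic element class...
print(isinstance(root, etree._Element))          # True
# ...but a plain etree element is not an HtmlElement.
print(isinstance(root2, lxml.html.HtmlElement))  # False
```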

If you want to parse HTML and use HTML-specific methods such as cssselect(), use lxml.html.
