Can someone explain why the first call to root.cssselect()
works, while the second fails?
from lxml.html import fromstring
from lxml import etree
html = '<html><a href="http://example.com">example</a></html>'
root = fromstring(html)
print 'via fromstring', repr(root) # via fromstring <Element html at 0x...>
print root.cssselect("a")
root2 = etree.HTML(html)
print 'via etree.HTML()', repr(root2) # via etree.HTML() <Element html at 0x...>
root2.cssselect("a") # --> Exception
I get:
Traceback (most recent call last):
  File "/home/foo_eins_d/src/foo.py", line 11, in <module>
    root2.cssselect("a")
AttributeError: 'lxml.etree._Element' object has no attribute 'cssselect'
Version: lxml==3.4.4
The difference is in the type of element each call returns. For example:
In [12]: root = etree.HTML(html)
In [13]: root = fromstring(html)
In [14]: root2 = etree.HTML(html)
In [15]: type(root)
Out[15]: lxml.html.HtmlElement
In [16]: type(root2)
Out[16]: lxml.etree._Element
lxml.html.HtmlElement has the method cssselect(). Also, HtmlElement is a subclass of etree._Element. But a plain lxml.etree._Element, which is what etree.HTML() returns, does not have that method in lxml 3.4.4.
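The class relationship can be checked directly. This is only a minimal sketch: it uses the sample document from the question (with the closing tag completed), and it is written so that it also runs on Python 3. The variable names is_html_element, both_etree and hrefs are made up for illustration. It falls back on xpath(), which every etree element provides, rather than on cssselect().

```python
from lxml import etree
from lxml.html import HtmlElement, fromstring

html = '<html><a href="http://example.com">example</a></html>'

root = fromstring(html)    # parsed by lxml.html
root2 = etree.HTML(html)   # parsed by the plain etree HTML parser

# Both roots are etree elements, but only the first is an HtmlElement.
is_html_element = [isinstance(el, HtmlElement) for el in (root, root2)]
both_etree = [isinstance(el, etree._Element) for el in (root, root2)]

# xpath() is defined on the etree base class, so it works for either tree.
hrefs = [a.get("href") for a in root2.xpath("//a")]

print(is_html_element, both_etree, hrefs)
```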
If you want to parse HTML, you should use lxml.html.
Source: https://stackoverflow.com/questions/32264533/lxml-cssselect-attributeerror-lxml-etree-element-object-has-no-attribute