When parsing HTML, why do I need item.text sometimes and item.text_content() at other times?

花落未央 2021-01-17 19:23

Still learning lxml. I discovered that sometimes I cannot get to the text of an item from a tree using item.text. If I use item.text_content() I am good to go. I am not s

2 answers

[愿得一人] (original poster)

2021-01-17 19:52
According to the docs, the text_content method:

Returns the text content of the element, including the text content of its children, with no markup.

So for example,
```
import lxml.html as lh
data = """<a><b><c>blah</c></b></a>"""
doc = lh.fromstring(data)
print(doc)
# <Element a at 0x...>
```
doc is the Element a. The a tag has no text immediately following it (between the <a> and the <b>), so doc.text is None:

print(doc.text) # None

But there is text after the opening <c> tag, further down the tree, so doc.text_content() is not None:

print(doc.text_content()) # blah
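
The same distinction shows up with more typical markup. The following is a minimal sketch, assuming a made-up `<div>` fragment that is not from the question: `.text` only covers the text that sits directly after an element's opening tag, before its first child, while `.text_content()` also gathers the text of all descendants.

```python
import lxml.html as lh

# Hypothetical markup for illustration only; not taken from the question.
doc = lh.fromstring("<div><p>Hello <b>world</b></p></div>")
p = doc.find("p")
b = p.find("b")

div_text = doc.text            # None: nothing sits between <div> and <p>
p_text = p.text                # only the text before the <b> child
p_content = p.text_content()   # text of <p> plus the text of its descendants
b_text = b.text                # the text directly inside <b>

print(repr(div_text), repr(p_text), repr(p_content), repr(b_text))
```

So when item.text comes back as None or looks incomplete, it is usually because the text you want lives inside a child element.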

PS. The lxml docs give a clear description of the meaning of the text attribute. Although that description is part of the docs for lxml.etree.Element, I think the meaning of the text and tail attributes applies equally well to lxml.html.Element objects.
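
As a rough check of that claim, here is a small sketch that parses the same fragment with both lxml.etree and lxml.html and compares the text and tail attributes. The sample markup and variable names are assumptions made for illustration.

```python
import lxml.html as lh
from lxml import etree

# Hypothetical sample: text before, inside, and after a child element.
markup = "<p>Hello <b>world</b>, again</p>"

xml_p = etree.fromstring(markup)   # lxml.etree element
html_p = lh.fromstring(markup)     # lxml.html element

# .text is the text before the first child; .tail is the text that
# follows an element's closing tag, up to the next sibling or parent end.
xml_parts = (xml_p.text, xml_p[0].text, xml_p[0].tail)
html_parts = (html_p.text, html_p[0].text, html_p[0].tail)

# Joining every text and tail piece in document order gives the same
# string that text_content() returns for this fragment.
joined = "".join(html_p.itertext())

print(xml_parts)
print(html_parts)
print(joined)
```

For this fragment both parsers report the same three pieces, which is why text after a child element turns up in that child's tail rather than in the parent's text.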