How can I grab CData out of BeautifulSoup?

春和景丽 2020-12-03 12:16

I have a website that I'm scraping that has a structure similar to the following. I'd like to be able to grab the info out of the CData block.

I'm using BeautifulSo

5 answers

广开言路 (original poster)

2020-12-03 13:11
BeautifulSoup sees CData as a special case (subclass) of "navigable strings". So for example:
```
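# BeautifulSoup 3 module layout (Python 2)
import BeautifulSoup

# Sample markup containing a CDATA section (contents are illustrative)
txt = '''We have
       <![CDATA[some data here]]>
       and more.
       '''

soup = BeautifulSoup.BeautifulSoup(txt)
# CData is a NavigableString subclass, so walk every text node and filter by type
for cd in soup.findAll(text=True):
    if isinstance(cd, BeautifulSoup.CData):
        print('CData contents: %r' % cd)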
```
In your case of course you could look in the subtree starting at the div with the 'main-contents' ID, rather than all over the document tree.
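
Restricting the search to that div might look like the sketch below. It assumes the newer `bs4` package (BeautifulSoup 4) on Python 3 with the built-in `html.parser`, rather than the BeautifulSoup 3 module used above. The sample markup and the helper name `cdata_in` are hypothetical; only the 'main-contents' ID comes from the question.

```python
from bs4 import BeautifulSoup, CData

# Hypothetical page: only the 'main-contents' id comes from the question.
html = '''<div id="sidebar"><![CDATA[ignore me]]></div>
<div id="main-contents">We have
  <![CDATA[some data here]]>
  and more.
</div>'''


def cdata_in(root):
    """Return the CData strings found anywhere below `root`."""
    return [s for s in root.find_all(string=True) if isinstance(s, CData)]


soup = BeautifulSoup(html, "html.parser")
# Start from the target div instead of walking the whole document tree.
main = soup.find("div", id="main-contents")
blocks = cdata_in(main) if main is not None else []
for cd in blocks:
    print("CData contents: %r" % cd)
```

Because the search starts at `main`, the CDATA section in the sidebar div is never visited. Other parsers (for example lxml) may treat CDATA sections differently, so the choice of `html.parser` here is part of the assumption.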