How can I grab CData out of BeautifulSoup?

春和景丽 2020-12-03 12:16

I have a website that I'm scraping that has a structure similar to the following. I'd like to be able to grab the info out of the CData block.

I'm using BeautifulSo

5 answers
  •  广开言路
    2020-12-03 13:11

    BeautifulSoup sees CData as a special case (subclass) of "navigable strings". So for example:

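    import BeautifulSoup  # BeautifulSoup 3 module layout

    # Sample markup; the CDATA payload is an illustrative stand-in.
    txt = '''We have
           <![CDATA[some data here]]>
           and more.
           '''

    soup = BeautifulSoup.BeautifulSoup(txt)
    # text=True matches every string node; CData is a NavigableString subclass.
    for cd in soup.findAll(text=True):
        if isinstance(cd, BeautifulSoup.CData):
            print('CData contents: %r' % cd)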
    

    In your case of course you could look in the subtree starting at the div with the 'main-contents' ID, rather than all over the document tree.
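
Restricting the search to that subtree might look like the sketch below. It rests on several assumptions that are not in the answer: it uses the newer `bs4` package rather than the BeautifulSoup 3 module shown above, it picks the `html.parser` backend (which has historically kept CDATA sections as `CData` nodes, while other parsers may treat them differently), and both the sample markup and the helper name `cdata_in` are made up for illustration. Only the 'main-contents' ID comes from the answer.

```python
from bs4 import BeautifulSoup
from bs4.element import CData

# Hypothetical page; only the 'main-contents' id comes from the answer above.
HTML = """
<html><body>
<div id="sidebar"><![CDATA[ignored block]]></div>
<div id="main-contents">
  <p>Intro text</p>
  <![CDATA[payload inside main-contents]]>
</div>
</body></html>
"""


def cdata_in(html, element_id):
    """Return the CDATA strings found under the element with the given id."""
    # Assumption: html.parser preserves CDATA sections as CData nodes.
    soup = BeautifulSoup(html, "html.parser")
    root = soup.find(id=element_id)
    if root is None:
        return []
    # string=True matches every text node in the subtree, CData included.
    return [str(node) for node in root.find_all(string=True)
            if isinstance(node, CData)]


blocks = cdata_in(HTML, "main-contents")
print(blocks)
```

Starting from the div instead of the whole soup means the CDATA block in the sidebar is never visited, so no extra filtering is needed afterwards.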
