How do I ignore tags while getting the .string of a Beautiful Soup element?

忘掉有多难 2020-12-20 00:50

I'm working with HTML elements that have child tags, which I want to "ignore" or remove, so that the text is still there. Just now, if I try to .string any e

2 Answers
  • 2020-12-20 01:33
    import bs4

    # Print the text of every child tag, skipping bare strings between tags
    for child in soup.find(id='main'):
        if isinstance(child, bs4.Tag):
            print(child.text)
    

    This prints:

    This is a paragraph.
    This is a paragraph with a tag.
    This is another paragraph.
    
  • 2020-12-20 01:37

    Use the .strings iterable instead, and pass it to ''.join() to concatenate all the strings:

    print(''.join(main.strings))
    

    Iterating over .strings yields each and every contained string, directly or in child tags.

    Demo:

    >>> print(''.join(main.strings))
    
    This is a paragraph. 
    This is a paragraph with a tag. 
    This is another paragraph. 
    
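For anyone who wants to try both answers side by side, here is a minimal sketch. The question's HTML is cut off above, so the `html` string below is a hypothetical stand-in reconstructed from the printed output, and the names `second`, `per_child`, and `joined` are only illustrative. It assumes Python 3 with the `bs4` package installed and uses the built-in `html.parser` backend.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample markup (assumption): the original HTML is not shown in the question.
html = """<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph with a <span>tag</span>.</p>
<p>This is another paragraph.</p>
</div>"""

soup = BeautifulSoup(html, "html.parser")
main = soup.find(id="main")

# .string is None when an element has more than one child node,
# which is what happens for the paragraph that contains a <span>.
second = main.find_all("p")[1]
print(second.string)  # None

# First answer: take .text from each direct child that is a Tag,
# skipping the newline strings that sit between the <p> elements.
per_child = [child.text for child in main if isinstance(child, Tag)]
print(per_child)

# Second answer: join every string nested anywhere under the element.
joined = "".join(main.strings)
print(joined)
```

The two approaches should give the same text here; the joined version also keeps the newlines between the paragraphs, which is why the demo output starts with a blank line.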