BeautifulSoup getText from between <p>, not picking up subsequent paragraphs

南方客 2020-12-23 17:18

Firstly, I am a complete newbie when it comes to Python. However, I have written a piece of code to look at an RSS feed, open the link and extract the text from the article.

2 answers
  •  天涯浪人
    2020-12-23 17:59

    This works well for specific articles where the text is all wrapped in <p> tags. Since the web is an ugly place, that is not always the case.

    Often, websites have text scattered all over, wrapped in different types of tags (e.g. in a <span>, a <div>, or an <li>).

    To find all text nodes in the DOM, you can use soup.find_all(text=True).
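
    As a rough illustration of that call, the sketch below parses a small made-up HTML string (the markup and variable names are assumptions, not taken from the question) and lists every text node. It uses string=True, which newer BeautifulSoup releases prefer over the equivalent text=True.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the real article markup is not shown in the question.
html = """
<html><head><style>p { color: red; }</style></head>
<body>
<p>First paragraph.</p>
<div>Text in a div.</div>
<ul><li>List item.</li></ul>
<script>var x = 1;</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# string=True matches every text node, regardless of the tag that wraps it.
text_nodes = soup.find_all(string=True)

# Drop whitespace-only nodes so the result is easier to inspect.
stripped = [node.strip() for node in text_nodes if node.strip()]
print(stripped)
```

    Note that the printed list contains the text of every element, not only the text from the paragraph tags.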

    This is going to return some undesired text, such as the contents of <script> and <style> elements, so those text nodes need to be filtered out.
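
    One possible way to do that filtering is to skip text nodes by the name of their parent tag. The sketch below is only an assumption about how this could look: the helper name visible_text, the SKIP_PARENTS set, and the sample markup are hypothetical and should be adapted to the pages being scraped.

```python
from bs4 import BeautifulSoup, Comment

# Parent tag names whose text is not shown to readers.
# This set is an assumption; extend or shrink it for your pages.
SKIP_PARENTS = {"style", "script", "head", "title", "meta", "[document]"}

def visible_text(html):
    """Return the reader-visible text of an HTML string, joined by spaces."""
    soup = BeautifulSoup(html, "html.parser")
    chunks = []
    for node in soup.find_all(string=True):
        # Skip HTML comments and text that sits inside non-visible elements.
        if isinstance(node, Comment) or node.parent.name in SKIP_PARENTS:
            continue
        text = node.strip()
        if text:
            chunks.append(text)
    return " ".join(chunks)

# Hypothetical sample input.
sample = (
    "<html><head><title>T</title><style>p{}</style></head>"
    "<body><p>One.</p><!-- note --><div>Two.</div>"
    "<script>x=1</script></body></html>"
)
print(visible_text(sample))
```

    Filtering by parent name keeps text from any visible tag, which is the point of not relying on paragraph tags alone.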
