BeautifulSoup Grab Visible Webpage Text

北恋 2020-11-22 07:35

Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the

10 answers
  •  天涯浪人
    2020-11-22 08:00

    While I would generally recommend using Beautiful Soup, if anyone needs to display the visible parts of malformed HTML (e.g. where you have just a segment or a single line of a web page) for whatever reason, the following will remove everything between < and >, i.e. the tags themselves:

    import re  # regex fallback for malformed HTML fragments only; not efficient

    def display_visible_html_using_re(text):
        # Strip anything between "<" and ">" (non-greedy), keeping the text in between.
        return re.sub(r"<.*?>", "", text)
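
One caveat with stripping tags alone is that the contents of script and style elements, which a browser never displays, are left in the output. A minimal sketch of how the same regex idea might be extended to drop those blocks and HTML comments as well is shown here. The helper name `visible_text_from_fragment` and the pattern names are hypothetical, not part of the original answer, and this is still a best-effort approach for fragments rather than a substitute for a real parser such as Beautiful Soup.

```python
import re

# Hypothetical helper patterns, for illustration only.
# Matches <script>...</script> and <style>...</style> including their contents.
_INVISIBLE_BLOCKS = re.compile(
    r"<(script|style)\b.*?</\1\s*>", re.DOTALL | re.IGNORECASE
)
# Matches HTML comments, which may span several lines.
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
# Matches any remaining tag.
_TAGS = re.compile(r"<[^>]*>")


def visible_text_from_fragment(fragment):
    """Best-effort visible text from an HTML fragment, using only regexes."""
    text = _INVISIBLE_BLOCKS.sub(" ", fragment)  # drop script/style and their contents
    text = _COMMENTS.sub(" ", text)              # drop HTML comments
    text = _TAGS.sub(" ", text)                  # drop remaining tags, keep their text
    return " ".join(text.split())                # collapse runs of whitespace


sample = "<p>Hello <b>world</b></p><script>var x = 1;</script><!-- note -->"
print(visible_text_from_fragment(sample))  # prints: Hello world
```

Tags are replaced with a space rather than an empty string so that text from adjacent elements does not run together. The trade-off is that a word split by inline markup, such as `wor<b>ld</b>`, comes out as two tokens.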
    
