Basically, I want to use BeautifulSoup to grab strictly the visible text on a webpage. For instance, this webpage is my test case. And I mainly want to just get the
While, i would completely suggest using beautiful-soup in general, if anyone is looking to display the visible parts of a malformed html (e.g. where you have just a segment or line of a web-page) for whatever-reason, the the following will remove content between <
and >
tags:
import re ## only use with malformed html - this is not efficient
def display_visible_html_using_re(text):
return(re.sub("(\<.*?\>)", "",text))