Strip HTML from strings in Python

难免孤独 2020-11-22 02:50
from mechanize import Browser

br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()
# Print the raw HTML of the page, one line at a time
for line in html:
    print(line)

When p

26 answers
  •  萌比男神i
    2020-11-22 03:10

    If you want to strip all HTML tags, the easiest way I have found is to use BeautifulSoup:

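    from bs4 import BeautifulSoup  # Or from BeautifulSoup import BeautifulSoup

    def stripHtmlTags(htmlTxt):
        # Pass None through unchanged instead of handing it to the parser
        if htmlTxt is None:
            return None
        else:
            # Collect every text node and join them, dropping all tags
            return ''.join(BeautifulSoup(htmlTxt).findAll(text=True))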
    

    I tried the code of the accepted answer but I was getting "RuntimeError: maximum recursion depth exceeded", which didn't happen with the above block of code.
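
    If installing BeautifulSoup is not an option, roughly the same idea (keep the text nodes, drop the tags) can be sketched with the standard library's html.parser module on Python 3. This is only an illustration and not the code from any other answer in this thread; the names TagStripper and strip_tags are made up for the example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper that keeps only the text between tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags
        self.chunks.append(data)


def strip_tags(html_text):
    # Mirror the None handling of stripHtmlTags above
    if html_text is None:
        return None
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.chunks)


print(strip_tags('<p>Hello <b>world</b>!</p>'))
```

    Like the BeautifulSoup version, this also keeps the text inside script and style elements, so those would need to be filtered separately if they matter.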
