Strip HTML from strings in Python

难免孤独 2020-11-22 02:50
from mechanize import Browser
br = Browser()
br.open('http://somewebpage')
html = br.response().readlines()
for line in html:
    print(line)

When p

26 answers
  •  滥情空心
    2020-11-22 03:14

    There's a simple way to do this:

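    def remove_html_markup(s):
        tag = False    # True while the scanner is inside <...>
        quote = False  # True while inside a quoted attribute value

        out = ""

        for c in s:
            if c == '<' and not quote:
                tag = True
            elif c == '>' and not quote:
                tag = False
            elif (c == '"' or c == "'") and tag:
                # Quotes only matter inside a tag; in plain text they are kept.
                quote = not quote
            elif not tag:
                out = out + c

        return out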

    The idea is explained here: http://youtu.be/2tu9LTDujbw

    You can see it working here: http://youtu.be/HPkNPcYed9M?t=35s

    PS: If you're interested in the class (about smart debugging with Python), here is a link: http://www.udacity.com/overview/Course/cs259/CourseRev/1. It's free!

    You're welcome! :)
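
One possible refinement, offered as a sketch and not as the answerer's method: the function above toggles a single quote flag on either quote character, so an attribute value such as title="it's" leaves the scanner stuck inside the tag and the rest of the text is dropped. Remembering which character opened the quote avoids that. The names strip_tags, in_tag and open_quote are made up for this illustration.

```python
def strip_tags(text):
    """Remove HTML tags, tracking which quote character opened an attribute value."""
    in_tag = False
    open_quote = None  # the quote character currently open inside a tag, if any
    out = []

    for ch in text:
        if in_tag:
            if open_quote:
                # Inside a quoted value: only the matching quote closes it.
                if ch == open_quote:
                    open_quote = None
            elif ch in ('"', "'"):
                open_quote = ch
            elif ch == '>':
                in_tag = False
        elif ch == '<':
            in_tag = True
        else:
            out.append(ch)

    return "".join(out)


print(strip_tags('<a title="it\'s > here">link</a> text'))  # prints: link text
```

Like the original, this is a character scanner and not an HTML parser: a bare < in ordinary text (for example "a < b") is still treated as the start of a tag.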
