BeautifulSoup: maximum recursion depth reached

予麋鹿 2021-01-18 07:21

This is a BeautifulSoup procedure that grabs content within all HTML tags. After grabbing content from some web pages, I get an error th

3 answers
  •  陌清茗 (original poster)
    2021-01-18 07:46

    I had the same problem. If you have nested tags about 480 levels deep and you try to convert the tag to a string/unicode, you get a RuntimeError (maximum recursion depth exceeded). Each level needs two nested method calls, so you soon hit the default limit of 1000 nested Python calls. You can raise this limit, or you can use this helper, which extracts all text from the HTML and displays it in a pre environment:

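    def beautiful_soup_tag_to_unicode(tag):
        """Return the tag as a string; for very deep trees, fall back to its text in a <pre>."""
        try:
            # On Python 2, use unicode(tag) here instead.
            return str(tag)
        except RuntimeError as e:
            # RecursionError (Python 3.5+) is a subclass of RuntimeError.
            if not str(e).startswith('maximum recursion'):
                raise
            # With more than about 480 levels of nested tags you can hit the maximum recursion depth.
            out = []
            for mystring in tag.findAll(text=True):
                mystring = mystring.strip()
                if not mystring:
                    continue
                out.append(mystring)
            return '<pre>%s</pre>' % '\n'.join(out)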
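
The other option mentioned above, raising the recursion limit, could look roughly like the sketch below. The name raised_recursion_limit is made up for illustration and is not part of BeautifulSoup. It relies only on the standard sys.getrecursionlimit and sys.setrecursionlimit calls. Note that a very high limit can crash the interpreter with a C stack overflow, so a moderate value is probably safer.

```python
import sys
from contextlib import contextmanager


@contextmanager
def raised_recursion_limit(limit):
    """Temporarily raise the recursion limit, then restore the old value.

    Hypothetical helper for illustration only.
    """
    old_limit = sys.getrecursionlimit()
    # Never lower the limit below its current value.
    sys.setrecursionlimit(max(old_limit, limit))
    try:
        yield
    finally:
        # Restore the previous limit even if the body raised.
        sys.setrecursionlimit(old_limit)


# Possible usage, assuming `tag` is a deeply nested BeautifulSoup tag:
#
#     with raised_recursion_limit(5000):
#         html = str(tag)
```

Restoring the old limit afterwards keeps the change local to the one conversion that needs it, rather than raising the limit for the whole program.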
