Python: not getting the full response

蹲街弑〆低调 submitted on 2019-11-29 15:18:04

You may have to call read() several times, stopping only when it returns an empty string, which indicates EOF:

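import urllib2

def get_page(url):
    """ loads a webpage into a string, reading it chunk by chunk """
    src = ''

    req = urllib2.Request(url)

    try:
        response = urllib2.urlopen(req)
        try:
            # a single read() may return only part of the body, so keep
            # reading until it returns '' (EOF)
            chunk = True
            while chunk:
                chunk = response.read(1024)
                src += chunk
        finally:
            # close the connection even if a read fails
            response.close()
    except IOError:
        # in Python 2, urllib2.URLError is a subclass of IOError
        print "can't open", url

    return src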
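The same read-until-EOF loop can be sketched in Python 3, where urllib2 became urllib.request and read() returns bytes, so the EOF sentinel is b'' rather than ''. This is a minimal sketch, not the answer's own code: read_all is a hypothetical helper, and the offline demo uses io.BytesIO as a stand-in for a network response, on the assumption that the response behaves like any binary file-like object.

```python
import io
import urllib.request
from functools import partial


def read_all(stream, chunk_size=1024):
    """Read a binary file-like object until read() returns b'' (EOF)."""
    chunks = []
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # and stops as soon as the call returns the sentinel b''
    for chunk in iter(partial(stream.read, chunk_size), b''):
        chunks.append(chunk)
    return b''.join(chunks)


def get_page(url, chunk_size=1024):
    """Python 3 variant (hypothetical): fetch url and return the body bytes."""
    # The response supports the context-manager protocol, so it is
    # closed even if a read raises.
    with urllib.request.urlopen(url) as response:
        return read_all(response, chunk_size)


# Offline demonstration: an in-memory stream instead of a socket.
fake_response = io.BytesIO(b'x' * 3000)
body = read_all(fake_response)
print(len(body))  # 3000
```

Collecting chunks in a list and joining once avoids building a new string on every iteration, which is what repeated `src += chunk` does.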

I had the same problem. I thought it was urllib, but it was bs4.

Instead of using

BeautifulSoup(src)

or

soup = bs4.BeautifulSoup(html, 'html.parser')

try using

soup = bs4.BeautifulSoup(html, 'html5lib')
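Note that 'html5lib' is a separate package (pip install html5lib); if it is missing, BeautifulSoup raises bs4.FeatureNotFound. A minimal Python 3 sketch that prefers html5lib when it is installed and otherwise falls back could look like this. pick_parser and make_soup are hypothetical helper names, the ordering follows the BeautifulSoup documentation's description of html5lib as the most lenient (browser-like) parser, then lxml, then the stdlib html.parser, and make_soup assumes bs4 itself is installed.

```python
import importlib.util


def pick_parser():
    """Return the most lenient BeautifulSoup parser that is installed."""
    # html5lib parses like a browser; lxml is also lenient.
    for name in ('html5lib', 'lxml'):
        if importlib.util.find_spec(name) is not None:
            return name
    # html.parser ships with Python, so it is always available.
    return 'html.parser'


def make_soup(html):
    """Build a BeautifulSoup tree with the parser chosen by pick_parser()."""
    import bs4  # imported lazily so this module loads even without bs4
    return bs4.BeautifulSoup(html, pick_parser())
```

Usage would be `soup = make_soup(src)` in place of `BeautifulSoup(src)`. If the page still comes out truncated, passing 'html5lib' explicitly makes a missing install fail loudly instead of silently falling back.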