Python urllib2.open Connection reset by peer error

依然范特西╮ 提交于 2019-12-20 02:27:20

问题


I'm trying to scrape a page using python

The problem is, I keep getting Errno54 Connection reset by peer.

The error comes when I run this code -

urllib2.urlopen("http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10161&langId=-1&programId=562&termId=100020629&divisionDisplayName=Stanford&departmentDisplayName=ILAC&courseDisplayName=126&sectionDisplayName=01&demoKey=d&purpose=browse")

this happens for all the urls on this pag- what is the issue?


回答1:


$> telnet www.bkstr.com 80
Trying 64.37.224.85...
Connected to www.bkstr.com.
Escape character is '^]'.
GET /webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10161&langId=-1&programId=562&termId=100020629&divisionDisplayName=Stanford&departmentDisplayName=ILAC&courseDisplayName=126&sectionDisplayName=01&demoKey=d&purpose=browse HTTP/1.0

Connection closed by foreign host.

You're not going to have any joy fetching that URL from python, or anywhere else. If it works in your browser then there must be something else going on, like cookies or authentication or some such. Or, possibly, the server's broken or they've changed their configuration.

Try opening it in a browser that you've never accessed that site in before to check. Then log in and try it again.

Edit: It was cookies after all:

import cookielib, urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
#Need to set a cookie
opener.open("http://www.bkstr.com/")
#Now open the page we want
data = opener.open("http://www.bkstr.com/webapp/wcs/stores/servlet/CourseMaterialsResultsView?catalogId=10001&categoryId=9604&storeId=10161&langId=-1&programId=562&termId=100020629&divisionDisplayName=Stanford&departmentDisplayName=ILAC&courseDisplayName=126&sectionDisplayName=01&demoKey=d&purpose=browse").read()

The output looks ok, but you'll have to check that it does what you want :)




回答2:


I came across a similar error just recently. The connection was dropping out and being reset. I tried cookiejars, extended delays and different headers/useragents, but nothing worked. In the end the fix was simple. I went from urllib2 to requests. The old;

import urllib2
opener = urllib2.build_opener()
buf = opener.open(url).read()

The new;

import requests
buf = requests.get(url).text

After that everything worked perfectly.



来源:https://stackoverflow.com/questions/7377494/python-urllib2-open-connection-reset-by-peer-error

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!