urllib2

Unable to load ASP.NET page using Python urllib2

Submitted by 别说谁变了你拦得住时间么 on 2019-12-02 09:05:47
I am trying to send a POST request to https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx in order to scrape data. Here is my current code:

```
from urllib import urlencode
import urllib2

# Configuration
uri = 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx'
headers = {
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
    'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    'Accept-Charset':
```
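An ASP.NET page will typically reject a bare POST unless the hidden form fields it rendered (`__VIEWSTATE`, `__EVENTVALIDATION`, and friends) are echoed back. A minimal sketch of collecting those fields and building the request body, written with Python 3's `urllib.parse` (the successor to the `urllib` used above); the sample HTML snippet and the `ctl00$...` field name are hypothetical:

```python
import re
from urllib.parse import urlencode

def extract_hidden_fields(html):
    """Collect the hidden inputs (__VIEWSTATE etc.) the ASP.NET form expects back."""
    return dict(re.findall(
        r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"', html))

# Hypothetical snippet of the rendered form:
sample = ('<input type="hidden" name="__VIEWSTATE" value="abc123" />'
          '<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />')

fields = extract_hidden_fields(sample)
fields['ctl00$MainContent$txtPermit'] = '003-12345'  # hypothetical search field
body = urlencode(fields).encode('utf-8')             # bytes body ready to POST
```

A real scrape would first GET the page, run `extract_hidden_fields` over the response, then POST `body` back to the same URL.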

urllib2 opener hangs if run inside a thread

Submitted by 爱⌒轻易说出口 on 2019-12-02 08:39:43
I have code that runs fine on its own (it connects to a page and gets PHPSESSID). When I put that code in a function and then started it in a thread:

```
Gdk.threads_enter()
threading.Thread(target=self.do_login, args=()).start()
Gdk.threads_leave()
```

the code hangs on f = opener.open(req). Any ideas why? When I force-close the application, it completes everything and prints everything in the terminal without errors. Why does it hang on that particular line only inside a thread, and not outside of one? Okay, I just repost the comment here so that the question can get solved. As has been mentioned on other
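One way to make such hangs diagnosable is to give every request an explicit timeout, so a blocked opener raises an exception instead of stalling the worker thread forever. A sketch using Python 3's urllib.request (urllib2's successor); the 15- and 10-second values are arbitrary choices:

```python
import socket
import urllib.request

# Global fallback: any socket operation without its own timeout gives up
# after 15 seconds instead of blocking a worker thread indefinitely.
socket.setdefaulttimeout(15)

def fetch(url, timeout=10):
    """Open url with an explicit per-request timeout; a stalled connection
    raises socket.timeout / URLError rather than hanging the thread."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```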

Scrapy response different from browser response

Submitted by 六眼飞鱼酱① on 2019-12-02 08:10:57
I am trying to scrape this page with scrapy: http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391 and the response I get is different from what I see in the browser. The browser response has the correct page, while the scrapy response is the http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=1 page. I have tried urllib2 as well but still have the same issue. Any help is much appreciated. I don't really understand the issue, but usually a different response for a browser and scrapy is caused by one of these: the server analyzes your User-Agent header, and returns a specially crafted
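When a server fingerprints clients like this, a first experiment is to replay the request with browser-like headers and see whether the response changes. A minimal urllib.request sketch (Python 3 naming); the header values below are plausible examples, not the exact set this server checks:

```python
import urllib.request

# Plausible browser-like headers; real debugging would copy the exact set
# from the browser's network inspector.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/115.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}

def browser_like_request(url):
    """Build a Request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = browser_like_request('http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391')
```

Pagination via `startat` may also depend on cookies set while browsing earlier pages, so a cookie-enabled opener is the next thing worth trying.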

How to use urllib2 to access an FTP/HTTP server through a proxy with authentication

Submitted by 霸气de小男生 on 2019-12-02 08:01:44
Update: see the comments for my solution. My Python code uses urllib2 to access an FTP server through a proxy with a user and password. I use both a urllib2.ProxyHandler and a urllib2.ProxyBasicAuthHandler to implement this, following the urllib2 examples:

```
import urllib2
proxy_host = 'host.proxy.org:3128'  # only host name, no scheme (http/ftp)
proxy_handler = urllib2.ProxyHandler({'ftp': proxy_host})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password(None, proxy_host, proxy_user, proxy_passwd)
opener_thru_proxy = urllib2.build_opener(proxy_handler,
```
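A commonly suggested alternative to the handler pair is to embed the credentials directly in the proxy URL handed to ProxyHandler, which sidesteps the auth handler's realm matching. A sketch in Python 3's urllib.request; the host, port, and credentials are placeholders:

```python
import urllib.request

# Placeholder proxy and credentials embedded in the proxy URL.
proxy = 'http://proxy_user:proxy_passwd@host.proxy.org:3128'

# Route both ftp:// and http:// requests through the same authenticated proxy;
# ProxyHandler parses the user:password pair out of the URL itself.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'ftp': proxy, 'http': proxy}))

# opener.open('ftp://ftp.example.org/file.txt') would now go via the proxy.
```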

urllib2 HTTPPasswordMgr not working - Credentials not sent error

Submitted by 混江龙づ霸主 on 2019-12-02 05:53:58
Question: The following curl call, driven from Python, succeeds:

```
>>> import subprocess
>>> args = ['curl', '-H', 'X-Requested-With: Demo',
...         'https://username:password@qualysapi.qualys.com/qps/rest/3.0/count/was/webapp']
>>> xml_output = subprocess.check_output(args).decode('utf-8')
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
138   276    0   276    0     0    190      0 --:--:--  0:00:01 --:--:--   315
>>> xml_output
u'<?xml version="1.0" encoding="UTF-8"?>
```
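The usual cause of a "credentials not sent" failure with HTTPPasswordMgr is that it only answers a 401 challenge, and some APIs never issue one. Sending the Authorization header preemptively, as curl does with credentials in the URL, avoids the challenge round-trip. A sketch using Python 3's urllib.request; 'username'/'password' are the placeholders from the question:

```python
import base64
import urllib.request

def preemptive_basic_auth_request(url, username, password):
    """Attach the Authorization header up front; HTTPPasswordMgr only sends
    credentials after a 401 challenge, which some APIs never issue."""
    token = base64.b64encode(
        '{}:{}'.format(username, password).encode('ascii')).decode('ascii')
    return urllib.request.Request(url, headers={
        'Authorization': 'Basic ' + token,
        'X-Requested-With': 'Demo',
    })

req = preemptive_basic_auth_request(
    'https://qualysapi.qualys.com/qps/rest/3.0/count/was/webapp',
    'username', 'password')
```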

Getting a file from an authenticated site (with python urllib, urllib2)

Submitted by 99封情书 on 2019-12-02 05:06:16
Question: I'm trying to get a queried Excel file from a site. When I enter the direct link, it leads to a login page, and once I've entered my username and password, it proceeds to download the Excel file automatically. I am trying to avoid installing any module that is not part of standard Python (this script will be running on a "standardized machine" and it won't work if the module is not installed). I've tried the following, but I see "page login" information in the Excel file
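Seeing the login page inside the downloaded file means the session cookie from the login step was never carried over to the download request. The standard library alone can handle this with http.cookiejar plus HTTPCookieProcessor (shown with Python 3 module names; the form field names and URLs are placeholders):

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

# Cookie-aware opener: cookies set by the login response are replayed
# automatically on the follow-up download request.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def download_report(login_url, file_url, username, password):
    """Log in first so the session cookie lands in `jar`, then fetch the file."""
    creds = urlencode({'username': username, 'password': password}).encode('utf-8')
    opener.open(login_url, creds)   # POSTs the login form, stores the cookie
    return opener.open(file_url).read()
```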

Do objects created by urllib2.urlopen() represent a constant connection?

Submitted by 风格不统一 on 2019-12-02 05:04:24
Question: In the following code, is the connection to the remote server held open until close() is called, or is it recreated every time read() is called? I do see new network communication happen every time read() is called, rather than the remote file being buffered as soon as urlopen() is called.

```
import urllib2

handle = urllib2.urlopen('http://download.thinkbroadband.com/5MB.zip')
while True:
    buff = handle.read(64*1024)  # Is a new connection to the server created here?
    if len
```
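What the question observes is streaming over a single connection: urlopen() opens the socket once, and each read() pulls the next chunk of the still-open response rather than reconnecting. The same loop can be exercised locally with a data: URL (supported by Python 3's urllib.request), which needs no network at all:

```python
import urllib.request

def read_in_chunks(url, chunk_size=64 * 1024):
    """One urlopen(), many read()s: each read pulls the next chunk off the
    same open response instead of creating a new connection."""
    chunks = []
    with urllib.request.urlopen(url) as handle:   # connection opens here
        while True:
            buff = handle.read(chunk_size)        # more bytes, same socket
            if len(buff) == 0:
                break
            chunks.append(buff)
    return b''.join(chunks)

# A data: URL exercises the loop without any network traffic.
body = read_in_chunks('data:text/plain,hello', chunk_size=2)
```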

Alternative to urllib.urlretrieve in Python 3.5

Submitted by 随声附和 on 2019-12-02 04:54:01
Question: I am currently taking a machine learning course on Udacity. There they have written some code in Python 2.7, but as I am using Python 3.5, I am getting an error. This is the code:

```
import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.urlretrieve(url, filename="../enron_mail_20150507.tgz")
print ("download complete!")
```

I tried urllib.request:

```
import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.request(url,
```
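In Python 3 the function itself still exists; it simply moved into the urllib.request module (kept as a legacy interface), so urllib.request.urlretrieve is the drop-in replacement. A sketch, with the actual download commented out so the snippet runs without network access:

```python
import urllib.request

url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"

# Same call, new home; uncomment to actually download the archive:
# urllib.request.urlretrieve(url, filename="../enron_mail_20150507.tgz")
# print("download complete!")
```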

Python: Urllib2 | [Errno 54] Connection reset by peer

Submitted by 不问归期 on 2019-12-02 03:38:15
Question: I'm calling a list of URLs from the same domain and returning a snippet of their HTML for a few thousand domains, but I am getting this error about 1,000 rows or so in. Is there anything I can do to avoid it? Does it make sense to add a wait step after every row? Every few hundred rows? Is there a better way to get around this?

```
File "/Users.../ap.py", line 144, in <module> simpleProg()
File "/Users.../ap.py", line 21, in simpleProg()
File "/Users.../ap.py", line 57, in first_step()
File
```
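Connection resets like this often just mean the server (or an intermediary) is throttling a rapid series of requests. Rather than sleeping after every row, a retry loop with a growing backoff around each fetch handles the occasional reset and only pauses when needed. A sketch; the flaky() function stands in for the real opener call:

```python
import time
import urllib.error

def fetch_with_retry(open_fn, retries=5, backoff=1.0):
    """Call open_fn(); on a connection reset, sleep and retry with a
    linearly growing delay before giving up."""
    for attempt in range(retries):
        try:
            return open_fn()
        except (ConnectionResetError, urllib.error.URLError):
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))

# Stand-in for the real opener call: fails twice, then succeeds.
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise ConnectionResetError(54, 'Connection reset by peer')
    return 'ok'

result = fetch_with_retry(flaky, backoff=0.0)
```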

POST binary data using httplib causes Unicode exceptions

Submitted by 微笑、不失礼 on 2019-12-02 03:15:59
Question: When I try to send an image with urllib2, a UnicodeDecodeError exception occurs. HTTP POST body:

```
f = open(imagepath, "rb")
binary = f.read()
mimetype, devnull = mimetypes.guess_type(urllib.pathname2url(imagepath))
body = """Content-Length: {size}
Content-Type: {mimetype}

{binary}
""".format(size=os.path.getsize(imagepath), mimetype=mimetype, binary=binary)
request = urllib2.Request(url, body, headers)
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel=1))
response = opener
```
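The exception comes from formatting raw image bytes into a text template: str.format forces a decode of the binary data. The fix is to assemble the entire body as bytes, encoding only the textual envelope around the payload. A sketch building a multipart/form-data body (the field name, filename, and PNG payload are illustrative):

```python
import mimetypes
import uuid

def multipart_body(field, filename, payload):
    """Assemble the whole request body as bytes; mixing raw bytes into a
    text template is what raises UnicodeDecodeError."""
    boundary = uuid.uuid4().hex
    mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    head = ('--{b}\r\n'
            'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
            'Content-Type: {m}\r\n\r\n').format(b=boundary, f=field,
                                                n=filename, m=mimetype)
    tail = '\r\n--{b}--\r\n'.format(b=boundary)
    # Only the textual envelope is encoded; the payload stays untouched bytes.
    body = head.encode('ascii') + payload + tail.encode('ascii')
    content_type = 'multipart/form-data; boundary=' + boundary
    return body, content_type

body, ctype = multipart_body('image', 'photo.png', b'\x89PNG\r\n\x1a\n')
```

The resulting `body` and `ctype` can be passed to a Request as the data and Content-Type header, respectively.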