urllib2

Unable to load ASP.NET page using Python urllib2

Submitted by 别说谁变了你拦得住时间么 on 2019-12-02 09:05:47
I am trying to send a POST request to https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx in order to scrape data. Here is my current code:

```
from urllib import urlencode
import urllib2

# Configuration
uri = 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx'
headers = {
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
    'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    'Accept-Charset':
```
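An ASP.NET page will typically reject a bare POST unless the hidden form fields it rendered (`__VIEWSTATE`, `__EVENTVALIDATION`, and friends) are echoed back. A minimal sketch of collecting those fields and building the request body, written with Python 3's `urllib.parse` (the successor to the `urllib` used above); the sample HTML snippet and the `ctl00$...` field name are hypothetical:

```python
import re
from urllib.parse import urlencode

def extract_hidden_fields(html):
    """Collect the hidden inputs (__VIEWSTATE etc.) the ASP.NET form expects back."""
    return dict(re.findall(
        r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"', html))

# Hypothetical snippet of the rendered form:
sample = ('<input type="hidden" name="__VIEWSTATE" value="abc123" />'
          '<input type="hidden" name="__EVENTVALIDATION" value="xyz789" />')

fields = extract_hidden_fields(sample)
fields['ctl00$MainContent$txtPermit'] = '003-12345'  # hypothetical search field
body = urlencode(fields).encode('utf-8')             # bytes body ready to POST
```

A real scrape would first GET the page, run `extract_hidden_fields` over the response, then POST `body` back to the same URL.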

urllib2 opener hangs if run inside a thread

Submitted by 爱⌒轻易说出口 on 2019-12-02 08:39:43
I have code that runs fine on its own (it connects to a page and gets PHPSESSID). When I put that code in a function and then started it in a thread:

```
Gdk.threads_enter()
threading.Thread(target=self.do_login, args=()).start()
Gdk.threads_leave()
```

the code hangs on f = opener.open(req). Any ideas why? When I force-close the application, it completes everything and prints everything in the terminal without errors. Why does it hang on that particular line only inside a thread, and not outside of one? Okay, I just repost the comment here so that the question can get solved. As has been mentioned on other
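One way to make such hangs diagnosable is to give every request an explicit timeout, so a blocked opener raises an exception instead of stalling the worker thread forever. A sketch using Python 3's urllib.request (urllib2's successor); the 15- and 10-second values are arbitrary choices:

```python
import socket
import urllib.request

# Global fallback: any socket operation without its own timeout gives up
# after 15 seconds instead of blocking a worker thread indefinitely.
socket.setdefaulttimeout(15)

def fetch(url, timeout=10):
    """Open url with an explicit per-request timeout; a stalled connection
    raises socket.timeout / URLError rather than hanging the thread."""
    req = urllib.request.Request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```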

Scrapy response different from browser response

Submitted by 六眼飞鱼酱① on 2019-12-02 08:10:57
I am trying to scrape this page with scrapy: http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391 and the response I get is different from what I see in the browser. The browser response has the correct page, while the scrapy response is the http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=1 page. I have tried urllib2 as well but still have the same issue. Any help is much appreciated. I don't really understand the issue, but usually a different response for a browser and scrapy is caused by one of these: the server analyzes your User-Agent header, and returns a specially crafted
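When a server fingerprints clients like this, a first experiment is to replay the request with browser-like headers and see whether the response changes. A minimal urllib.request sketch (Python 3 naming); the header values below are plausible examples, not the exact set this server checks:

```python
import urllib.request

# Plausible browser-like headers; real debugging would copy the exact set
# from the browser's network inspector.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/115.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
}

def browser_like_request(url):
    """Build a Request carrying browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)

req = browser_like_request('http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391')
```

Pagination via `startat` may also depend on cookies set while browsing earlier pages, so a cookie-enabled opener is the next thing worth trying.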

How to use urllib2 to access an FTP/HTTP server through a proxy with authentication

Submitted by 霸气de小男生 on 2019-12-02 08:01:44
Update: see the comments for my solution. My Python code uses urllib2 to access an FTP server through a proxy with a user and password. I use both a urllib2.ProxyHandler and a urllib2.ProxyBasicAuthHandler to implement this, following the urllib2 examples:

```
import urllib2
proxy_host = 'host.proxy.org:3128'  # only host name, no scheme (http/ftp)
proxy_handler = urllib2.ProxyHandler({'ftp': proxy_host})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password(None, proxy_host, proxy_user, proxy_passwd)
opener_thru_proxy = urllib2.build_opener(proxy_handler,
```
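A commonly suggested alternative to the handler pair is to embed the credentials directly in the proxy URL handed to ProxyHandler, which sidesteps the auth handler's realm matching. A sketch in Python 3's urllib.request; the host, port, and credentials are placeholders:

```python
import urllib.request

# Placeholder proxy and credentials embedded in the proxy URL.
proxy = 'http://proxy_user:proxy_passwd@host.proxy.org:3128'

# Route both ftp:// and http:// requests through the same authenticated proxy;
# ProxyHandler parses the user:password pair out of the URL itself.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'ftp': proxy, 'http': proxy}))

# opener.open('ftp://ftp.example.org/file.txt') would now go via the proxy.
```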

urllib2 HTTPPasswordMgr not working - Credentials not sent error

Submitted by 混江龙づ霸主 on 2019-12-02 05:53:58
Question: The following curl call, driven from Python, succeeds:

```
>>> import subprocess
>>> args = ['curl', '-H', 'X-Requested-With: Demo',
...         'https://username:password@qualysapi.qualys.com/qps/rest/3.0/count/was/webapp']
>>> xml_output = subprocess.check_output(args).decode('utf-8')
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
138   276    0   276    0     0    190      0 --:--:--  0:00:01 --:--:--   315
>>> xml_output
u'<?xml version="1.0" encoding="UTF-8"?>
```
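The usual cause of a "credentials not sent" failure with HTTPPasswordMgr is that it only answers a 401 challenge, and some APIs never issue one. Sending the Authorization header preemptively, as curl does with credentials in the URL, avoids the challenge round-trip. A sketch using Python 3's urllib.request; 'username'/'password' are the placeholders from the question:

```python
import base64
import urllib.request

def preemptive_basic_auth_request(url, username, password):
    """Attach the Authorization header up front; HTTPPasswordMgr only sends
    credentials after a 401 challenge, which some APIs never issue."""
    token = base64.b64encode(
        '{}:{}'.format(username, password).encode('ascii')).decode('ascii')
    return urllib.request.Request(url, headers={
        'Authorization': 'Basic ' + token,
        'X-Requested-With': 'Demo',
    })

req = preemptive_basic_auth_request(
    'https://qualysapi.qualys.com/qps/rest/3.0/count/was/webapp',
    'username', 'password')
```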

Getting a file from an authenticated site (with python urllib, urllib2)

Submitted by 99封情书 on 2019-12-02 05:06:16
Question: I'm trying to get a queried Excel file from a site. When I enter the direct link, it leads to a login page, and once I've entered my username and password, it proceeds to download the Excel file automatically. I am trying to avoid installing any module that is not part of standard Python (this script will be running on a "standardized machine" and it won't work if the module is not installed). I've tried the following, but I see "page login" information in the Excel file
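Seeing the login page inside the downloaded file means the session cookie from the login step was never carried over to the download request. The standard library alone can handle this with http.cookiejar plus HTTPCookieProcessor (shown with Python 3 module names; the form field names and URLs are placeholders):

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

# Cookie-aware opener: cookies set by the login response are replayed
# automatically on the follow-up download request.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def download_report(login_url, file_url, username, password):
    """Log in first so the session cookie lands in `jar`, then fetch the file."""
    creds = urlencode({'username': username, 'password': password}).encode('utf-8')
    opener.open(login_url, creds)   # POSTs the login form, stores the cookie
    return opener.open(file_url).read()
```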

Do objects created by urllib2.urlopen() represent a constant connection?

Submitted by 风格不统一 on 2019-12-02 05:04:24
Question: In the following code, is the connection to the remote server held open until close() is called, or is it recreated every time read() is called? I do see new network communication happen every time read() is called, rather than the remote file being buffered as soon as urlopen() is called.

```
import urllib2

handle = urllib2.urlopen('http://download.thinkbroadband.com/5MB.zip')
while True:
    buff = handle.read(64*1024)  # Is a new connection to the server created here?
    if len
```
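What the question observes is streaming over a single connection: urlopen() opens the socket once, and each read() pulls the next chunk of the still-open response rather than reconnecting. The same loop can be exercised locally with a data: URL (supported by Python 3's urllib.request), which needs no network at all:

```python
import urllib.request

def read_in_chunks(url, chunk_size=64 * 1024):
    """One urlopen(), many read()s: each read pulls the next chunk off the
    same open response instead of creating a new connection."""
    chunks = []
    with urllib.request.urlopen(url) as handle:   # connection opens here
        while True:
            buff = handle.read(chunk_size)        # more bytes, same socket
            if len(buff) == 0:
                break
            chunks.append(buff)
    return b''.join(chunks)

# A data: URL exercises the loop without any network traffic.
body = read_in_chunks('data:text/plain,hello', chunk_size=2)
```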

Alternative to urllib.urlretrieve in Python 3.5

Submitted by 随声附和 on 2019-12-02 04:54:01
Question: I am currently taking a machine learning course on Udacity. There they have written some code in Python 2.7, but as I am using Python 3.5, I am getting an error. This is the code:

```
import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.urlretrieve(url, filename="../enron_mail_20150507.tgz")
print ("download complete!")
```

I tried urllib.request:

```
import urllib
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.request(url,
```
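In Python 3 the function itself still exists; it simply moved into the urllib.request module (kept as a legacy interface), so urllib.request.urlretrieve is the drop-in replacement. A sketch, with the actual download commented out so the snippet runs without network access:

```python
import urllib.request

url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"

# Same call, new home; uncomment to actually download the archive:
# urllib.request.urlretrieve(url, filename="../enron_mail_20150507.tgz")
# print("download complete!")
```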

Python: Urllib2 | [Errno 54] Connection reset by peer

Submitted by 不问归期 on 2019-12-02 03:38:15
Question: I'm calling a list of URLs from the same domain and returning a snippet of their HTML for a few thousand domains, but I am getting this error about 1,000 rows or so in. Is there anything I can do to avoid it? Does it make sense to add a wait step after every row? Every few hundred rows? Is there a better way to get around this?

```
File "/Users.../ap.py", line 144, in <module> simpleProg()
File "/Users.../ap.py", line 21, in simpleProg()
File "/Users.../ap.py", line 57, in first_step()
File
```
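Connection resets like this often just mean the server (or an intermediary) is throttling a rapid series of requests. Rather than sleeping after every row, a retry loop with a growing backoff around each fetch handles the occasional reset and only pauses when needed. A sketch; the flaky() function stands in for the real opener call:

```python
import time
import urllib.error

def fetch_with_retry(open_fn, retries=5, backoff=1.0):
    """Call open_fn(); on a connection reset, sleep and retry with a
    linearly growing delay before giving up."""
    for attempt in range(retries):
        try:
            return open_fn()
        except (ConnectionResetError, urllib.error.URLError):
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (attempt + 1))

# Stand-in for the real opener call: fails twice, then succeeds.
attempts = {'n': 0}
def flaky():
    attempts['n'] += 1
    if attempts['n'] < 3:
        raise ConnectionResetError(54, 'Connection reset by peer')
    return 'ok'

result = fetch_with_retry(flaky, backoff=0.0)
```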

POST binary data using httplib causes Unicode exceptions

Submitted by 微笑、不失礼 on 2019-12-02 03:15:59
Question: When I try to send an image with urllib2, a UnicodeDecodeError exception occurs. HTTP POST body:

```
f = open(imagepath, "rb")
binary = f.read()
mimetype, devnull = mimetypes.guess_type(urllib.pathname2url(imagepath))
body = """Content-Length: {size}
Content-Type: {mimetype}

{binary}
""".format(size=os.path.getsize(imagepath), mimetype=mimetype, binary=binary)
request = urllib2.Request(url, body, headers)
opener = urllib2.build_opener(urllib2.HTTPSHandler(debuglevel=1))
response = opener
```
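The exception comes from formatting raw image bytes into a text template: str.format forces a decode of the binary data. The fix is to assemble the entire body as bytes, encoding only the textual envelope around the payload. A sketch building a multipart/form-data body (the field name, filename, and PNG payload are illustrative):

```python
import mimetypes
import uuid

def multipart_body(field, filename, payload):
    """Assemble the whole request body as bytes; mixing raw bytes into a
    text template is what raises UnicodeDecodeError."""
    boundary = uuid.uuid4().hex
    mimetype = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    head = ('--{b}\r\n'
            'Content-Disposition: form-data; name="{f}"; filename="{n}"\r\n'
            'Content-Type: {m}\r\n\r\n').format(b=boundary, f=field,
                                                n=filename, m=mimetype)
    tail = '\r\n--{b}--\r\n'.format(b=boundary)
    # Only the textual envelope is encoded; the payload stays untouched bytes.
    body = head.encode('ascii') + payload + tail.encode('ascii')
    content_type = 'multipart/form-data; boundary=' + boundary
    return body, content_type

body, ctype = multipart_body('image', 'photo.png', b'\x89PNG\r\n\x1a\n')
```

The resulting `body` and `ctype` can be passed to a Request as the data and Content-Type header, respectively.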