urllib2

Multiprocessing useless with urllib2?

落花浮王杯 submitted on 2019-11-27 14:09:44
I recently tried to speed up a little tool (which uses urllib2 to send a request to the (unofficial) Twitter button-count URL (> 2000 URLs) and parses its results) with the multiprocessing module (and its worker pools). I read several discussions here about multithreading (which slowed the whole thing down compared to a standard, non-threaded version) and multiprocessing, but I couldn't find an answer to a (probably very simple) question: can you speed up URL calls with multiprocessing, or is the bottleneck something like the network adapter? I don't see which part of, for example, the
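Since fetching URLs is network-bound rather than CPU-bound, a worker pool mostly buys overlapping waits. A minimal sketch of the pattern in question, assuming a hypothetical fetch() helper and placeholder URLs:

    import urllib2
    from multiprocessing import Pool

    def fetch(url):
        # Each worker fetches independently; the speedup comes from
        # overlapping network waits, not from extra CPU work.
        try:
            return url, urllib2.urlopen(url, timeout=10).read()
        except urllib2.URLError:
            return url, None

    if __name__ == '__main__':
        urls = ['http://example.com/page%d' % i for i in range(2000)]  # placeholders
        pool = Pool(processes=16)
        results = pool.map(fetch, urls)
        pool.close()
        pool.join()

For I/O-bound work like this, a thread pool usually does as well as processes without the fork overhead, so if threading was slower, the thread count or shared-state locking is a likelier culprit than the network adapter.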

Python: Clicking a button with urllib or urllib2

风流意气都作罢 submitted on 2019-11-27 14:05:09
Question: I want to click a button with Python; the info for the form is automatically filled in by the webpage. The HTML code for the button that sends the request is: <INPUT type="submit" value="Place a Bid"> How would I go about doing this? Is it possible to click the button with just urllib or urllib2, or will I need to use something like mechanize or twill? Answer 1: Use the form target and send any input as post data, like this: <form target="http://mysite.com/blah.php" method="GET"> ...... ...... ......
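At the HTTP level a submit button is never "clicked"; submitting the form just means sending its fields to the form's action URL. A minimal urllib2 sketch, with the field names hypothetical and the blah.php URL borrowed from the answer:

    import urllib
    import urllib2

    # Field names must match the form's inputs; 'item_id' is made up here.
    data = urllib.urlencode({'item_id': '123', 'submit': 'Place a Bid'})
    req = urllib2.Request('http://mysite.com/blah.php', data)  # data implies POST
    print urllib2.urlopen(req).read()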

Fetch data from variables inside a script tag in Python (content added by JS)

空扰寡人 submitted on 2019-11-27 13:56:31
Question: I want to fetch data from another URL, for which I am using urllib and Beautiful Soup. My data is inside a table tag (which I figured out using the Firefox console). But when I tried to fetch the table using its id, the result was None, so I guess this table must be dynamically added via some JS code. I have tried both parsers, 'lxml' and 'html5lib', but I still can't get that table data. I have also tried one more thing: web = urllib.urlopen("my url") html = web.read() soup = BeautifulSoup(html,
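urllib returns only the raw HTML, so anything JavaScript builds afterwards is invisible to Beautiful Soup. When the data sits in the page as a JS variable, one workaround is to pull it straight out of the script tag with a regex; a sketch, with the URL and variable name purely hypothetical:

    import re
    import json
    import urllib2

    html = urllib2.urlopen('http://example.com/page').read()
    # Grab the JS array literal assigned to the (hypothetical) tableData.
    match = re.search(r'var\s+tableData\s*=\s*(\[.*?\]);', html, re.DOTALL)
    if match:
        rows = json.loads(match.group(1))

If the script instead fetches the data over XHR, the browser console's network tab usually reveals a JSON endpoint that can be requested directly; failing that, a real browser engine such as Selenium is needed.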

Scrape a web page that requires they give you a session cookie first

旧街凉风 submitted on 2019-11-27 12:51:33
Question: I'm trying to scrape an Excel file from a government "muster roll" database. However, the URL I have to access this Excel file: http://nrega.ap.gov.in/Nregs/FrontServlet?requestType=HouseholdInf_engRH&hhid=192420317026010002&actionVal=musterrolls&type=Normal requires that I have a session cookie from the government site attached to the request. How could I grab the session cookie with an initial request to the landing page (when they give you the session cookie) and then use it to hit the URL
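A CookieJar wired into an opener handles exactly this round trip; a sketch, assuming the servlet root serves as the landing page:

    import cookielib
    import urllib2

    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    # First request: the server drops its session cookie into the jar.
    opener.open('http://nrega.ap.gov.in/Nregs/')
    # Second request: the same opener sends that cookie back automatically.
    data = opener.open('http://nrega.ap.gov.in/Nregs/FrontServlet'
                       '?requestType=HouseholdInf_engRH'
                       '&hhid=192420317026010002'
                       '&actionVal=musterrolls&type=Normal').read()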

How to get the URL of a redirect with Python

十年热恋 submitted on 2019-11-27 12:04:20
In Python, I'm using urllib2 to open a URL. This URL redirects to another URL, which redirects to yet another URL. I wish to print out the URL after each redirect. For example ("->" meaning "redirects to"): A -> B -> C -> D. I want to print the URLs of B, C and D (A is already known because it's the start URL). Wooble: Probably the best way is to subclass urllib2.HTTPRedirectHandler. Dive Into Python's chapter on redirects may be helpful. You can easily get D by just asking for the current URL:

    req = urllib2.Request(starturl, datagen, headers)
    res = urllib2.urlopen(req)
    finalurl = res.geturl()

To deal with
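A sketch of the subclassing approach the answer points at; redirect_request() is invoked once per hop, so printing newurl there logs B and C, while geturl() still yields D:

    import urllib2

    class LoggingRedirectHandler(urllib2.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            # Log each intermediate URL, then let the default logic follow it.
            print newurl
            return urllib2.HTTPRedirectHandler.redirect_request(
                self, req, fp, code, msg, headers, newurl)

    opener = urllib2.build_opener(LoggingRedirectHandler())
    res = opener.open('http://example.com/start')  # placeholder for A
    print res.geturl()                             # D, the final URL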

Python: download files from Google Drive using a URL

萝らか妹 submitted on 2019-11-27 11:42:43
I am trying to download files from Google Drive, and all I have is the drive's URL. I have read about the Google API, which talks about some drive_service and MedioIO and also requires some credentials (mainly a JSON file / OAuth), but I can't work out how it fits together. I also tried urllib2's urlretrieve, but my case is to get files from Drive. Tried 'wget' too, but no use. Tried the PyDrive library; it has good upload functions to Drive but no download options. Any help will be appreciated. Thanks. turdus-merula: If by "drive's url" you mean the shareable link of a file on Google Drive, then
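For a publicly shared file, the file id from the shareable link can be plugged into Drive's direct-download endpoint, so no OAuth is needed; a sketch (the file id is a placeholder, and large files additionally trigger a virus-scan confirmation page this doesn't handle):

    import urllib2

    file_id = '0B-placeholder'  # lifted from the shareable link
    url = 'https://drive.google.com/uc?export=download&id=' + file_id
    data = urllib2.urlopen(url).read()
    with open('downloaded.bin', 'wb') as f:  # hypothetical output name
        f.write(data)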

Download and decompress gzipped file in memory?

China☆狼群 submitted on 2019-11-27 11:34:08
I would like to download a file using urllib and decompress the file in memory before saving. This is what I have right now:

    response = urllib2.urlopen(baseURL + filename)
    compressedFile = StringIO.StringIO()
    compressedFile.write(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    outfile = open(outFilePath, 'w')
    outfile.write(decompressedFile.read())

This ends up writing empty files. How can I achieve what I'm after? Updated Answer:

    #! /usr/bin/env python2
    import urllib2
    import StringIO
    import gzip

    baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/" #
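The empty files come from the buffer position: after write(), the StringIO cursor sits at the end, so GzipFile has nothing left to read. Rewinding with seek(0), or constructing the StringIO directly from the bytes, fixes it; a sketch reusing the names from the question:

    import urllib2
    import StringIO
    import gzip

    response = urllib2.urlopen(baseURL + filename)
    # Build the buffer already holding the data, so the read position
    # starts at 0 instead of at the end.
    compressedFile = StringIO.StringIO(response.read())
    decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
    outfile = open(outFilePath, 'wb')
    outfile.write(decompressedFile.read())
    outfile.close()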

urllib2 POST progress monitoring

纵饮孤独 submitted on 2019-11-27 11:23:01
Question: I'm uploading a fairly large file with urllib2 to a server-side script via POST. I want to display a progress indicator that shows the current upload progress. Is there a hook or a callback provided by urllib2 that allows me to monitor upload progress? I know you can do it for downloads using successive calls to the connection's read() method, but I don't see a write() method; you just add data to the request. Answer 1: It is possible, but you need to do a few things: fake out the urllib2
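One way to get such a hook without patching urllib2's internals: httplib (which urllib2 uses underneath) streams a file-like body in chunks via read(), so a wrapper that counts bytes as they are read doubles as a progress callback. A sketch, with the file name and upload URL hypothetical; note that Content-Length must be set by hand, and the count reflects bytes handed to the socket, not bytes confirmed received:

    import os
    import urllib2

    class ProgressFile(object):
        # File-like wrapper: httplib pulls the POST body via read(),
        # so counting bytes here tracks the upload as it is sent.
        def __init__(self, path):
            self._f = open(path, 'rb')
            self.total = os.path.getsize(path)
            self.sent = 0

        def read(self, size=-1):
            chunk = self._f.read(size)
            self.sent += len(chunk)
            print '%d / %d bytes' % (self.sent, self.total)
            return chunk

    body = ProgressFile('upload.dat')
    req = urllib2.Request('http://example.com/upload', body,
                          {'Content-Length': str(body.total)})
    urllib2.urlopen(req)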

How do you get default headers in a urllib2 Request?

廉价感情. submitted on 2019-11-27 10:58:39
Question: I have a Python web client that uses urllib2. It is easy enough to add HTTP headers to my outgoing requests: I just create a dictionary of the headers I want to add and pass it to the Request initializer. However, other "standard" HTTP headers get added to the request as well as the custom ones I explicitly add. When I sniff the request using Wireshark, I see headers besides the ones I add myself. My question is: how do I get access to these headers? I want to log every request (including
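Those extra headers (Host, Connection, Accept-Encoding, User-Agent, ...) are mostly added by httplib at send time, so they never appear on the Request object. One way to see them without Wireshark is httplib's debug switch, which echoes the exact wire-level request; a minimal sketch:

    import urllib2

    # debuglevel=1 makes httplib print the raw request line and every
    # header actually sent, including the ones added automatically.
    opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
    opener.open('http://example.com')  # placeholder URL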

How to convert a dictionary to query string in Python?

不羁岁月 submitted on 2019-11-27 10:55:11
Question: After using cgi.parse_qs(), how do I convert the result (a dictionary) back to a query string? I'm looking for something similar to urllib.urlencode(). Answer 1: Python 3: urllib.parse.urlencode(query, doseq=False, [...]) — "Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string." (Python 3 urllib.parse docs.) A dict is a mapping. Legacy Python: urllib.urlencode(query[, doseq]) — Convert a mapping object or a sequence of
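Worth noting that cgi.parse_qs() maps every key to a list, so round-tripping needs doseq=True; a quick Python 2 sketch:

    import cgi
    import urllib

    d = cgi.parse_qs('q=python&tag=web&tag=http')
    # d == {'q': ['python'], 'tag': ['web', 'http']}
    print urllib.urlencode(d, doseq=True)  # e.g. q=python&tag=web&tag=http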