urllib2

(Repost) How do I download and install the urllib2 package for Python 3.x?

Anonymous (unverified), submitted 2019-12-02 22:51:30
Python 3.x does not need a separate urllib2 package: urllib and urllib2 were merged into the single urllib package. So the question becomes: how do you call urllib2.urlopen() in Python 3.x? Answer:

import urllib.request
resp = urllib.request.urlopen('http://www.baidu.com')

Source: https://www.cnblogs.com/zdlfb/p/6130724.html (via 博客园, author 天线宝宝6: https://www.cnblogs.com/yourwit/p/11673096.html)
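For code that has to run under both versions, a common compatibility shim (a sketch, not part of the original answer) is:

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

html = urlopen('http://www.baidu.com').read()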

How to POST an XML element in Python

∥☆過路亽.° submitted 2019-12-02 19:33:44
Basically I have this xml element (xml.etree.ElementTree) and I want to POST it to a URL. Currently I'm doing something like:

xml_string = xml.etree.ElementTree.tostring(my_element)
data = urllib.urlencode({'xml': xml_string})
response = urllib2.urlopen(url, data)

I'm pretty sure that works and all, but I was wondering if there is some better practice, or a way to do it without converting it to a string first. Thanks!

If this is your own API, I would consider POSTing as application/xml. The default is application/x-www-form-urlencoded, which is meant for HTML form data, not a single XML document.
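A minimal sketch of that suggestion (Python 2 urllib2; the endpoint URL and element name are placeholders): serialize the element once, then send it as the raw request body with an explicit Content-Type.

import xml.etree.ElementTree as ET
import urllib2

my_element = ET.Element('note')
xml_string = ET.tostring(my_element)
req = urllib2.Request('http://example.com/api',  # placeholder endpoint
                      data=xml_string,
                      headers={'Content-Type': 'application/xml'})
response = urllib2.urlopen(req)

Note that the element still has to be serialized at some point; the improvement is in how the body is labeled, not in avoiding tostring().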

How do I scrape pages with dynamically generated URLs using Python?

|▌冷眼眸甩不掉的悲伤 submitted 2019-12-02 17:43:47
I am trying to scrape http://www.dailyfinance.com/quote/NYSE/international-business-machines/IBM/financial-ratios , but the traditional URL string-building technique doesn't work because the full company name is inserted into the path, and the exact full company name isn't known in advance; only the company symbol, "IBM", is known. Essentially, the way I scrape is by looping through an array of company symbols and building the URL string before sending it to urllib2.urlopen(url). But in this case, that can't be done. For example, the CSCO string is http://www.dailyfinance.com/quote/NASDAQ
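One common workaround (a sketch under an assumption that would need verifying for dailyfinance.com: that the site redirects a symbol-only URL to the canonical slugged page) is to let urllib2 follow the redirect and read the final URL back with geturl():

import urllib2

symbols = ['IBM', 'CSCO']
for symbol in symbols:
    lookup = 'http://www.dailyfinance.com/quote/%s' % symbol  # hypothetical lookup URL
    resp = urllib2.urlopen(lookup)       # urlopen follows HTTP redirects
    print symbol, '->', resp.geturl()    # canonical URL after redirects
    html = resp.read()

If no such redirect exists, the fallback is to query the site's own search endpoint for each symbol and parse the company link out of the results.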

Extract news article content from stored .html pages

早过忘川 submitted 2019-12-02 17:18:39
I am reading text from HTML files and doing some analysis. These .html files are news articles. Code:

html = open(filepath, 'r').read()
raw = nltk.clean_html(html)
raw = unidecode(raw.decode('utf8'))

Now I just want the article content and not the rest of the text, like advertisements, headings, etc. How can I do so relatively accurately in Python? I know some tools like jsoup (a Java API) and boilerpipe, but I want to do it in Python. I could find some techniques using bs4, but they're limited to one type of page, and I have news pages from numerous sources. Also, there is a dearth of any sample code
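A rough, source-agnostic heuristic (a sketch, not a production extractor; 'article.html' is a stand-in filename): pick the <div> whose <p> descendants carry the most text, on the theory that article body text dominates ads and navigation.

from bs4 import BeautifulSoup

def extract_article_text(html):
    soup = BeautifulSoup(html, 'html.parser')
    best_text = ''
    for div in soup.find_all('div'):
        # total text carried by this div's paragraph children
        text = ' '.join(p.get_text(' ', strip=True) for p in div.find_all('p'))
        if len(text) > len(best_text):
            best_text = text
    return best_text

with open('article.html') as f:
    print(extract_article_text(f.read()))

This will misfire on some layouts; Python wrappers around the boilerpipe algorithm exist and are worth trying before rolling your own.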

Python and urllib2: how to make a GET request with parameters

泄露秘密 submitted 2019-12-02 17:01:14
I'm building an "API API": it's basically a wrapper for an in-house REST web service that the web app will be making a lot of requests to. Some of the web service calls need to be GET rather than POST, but still pass parameters. Is there a "best practice" way to encode a dictionary into a query string? e.g. ?foo=bar&bla=blah I'm looking at the urllib2 docs, and it looks like it decides by itself whether to use POST or GET based on whether you pass params or not, but maybe someone knows how to make it transform the params dictionary into a GET request. Maybe there's a package for something like this
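The standard pattern (Python 2; example.com is a placeholder): build the query string with urllib.urlencode and append it to the URL, passing no data argument so urllib2 issues a GET.

import urllib
import urllib2

params = {'foo': 'bar', 'bla': 'blah'}
url = 'http://example.com/api?' + urllib.urlencode(params)
response = urllib2.urlopen(url)  # no data argument, so this is a GET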

Urllib2 & BeautifulSoup: nice couple but too slow - urllib3 & threads?

一世执手 submitted 2019-12-02 16:53:31
I was looking for a way to optimize my code when I heard some good things about threads and urllib3. Apparently, people disagree about which solution is the best. The problem with my script below is the execution time: it's so slow!

Step 1: I fetch this page http://www.cambridgeesol.org/institutions/results.php?region=Afghanistan&type=&BULATS=on
Step 2: I parse the page with BeautifulSoup
Step 3: I put the data in an Excel doc
Step 4: I do it again, and again, and again for all the countries in my list (a big list); I am just changing "Afghanistan" in the URL to another country.

Here is my code: ws =
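A sketch of the threaded variant (Python 2; the worker count is arbitrary and the three countries stand in for the big list in the question): a shared queue of country names and several worker threads, so the network waits overlap instead of running back to back.

import threading
import urllib2
from Queue import Queue, Empty
from bs4 import BeautifulSoup

BASE = ('http://www.cambridgeesol.org/institutions/results.php'
        '?region=%s&type=&BULATS=on')
queue = Queue()
for country in ['Afghanistan', 'Albania', 'Algeria']:
    queue.put(country)

def worker():
    while True:
        try:
            country = queue.get_nowait()
        except Empty:
            return  # queue drained, thread exits
        html = urllib2.urlopen(BASE % country).read()
        soup = BeautifulSoup(html, 'html.parser')
        # ... extract the rows here and hand them off for writing ...
        queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Keep the Excel writing in a single thread; most spreadsheet libraries are not thread-safe.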

urllib2 opener hangs if run inside a thread

╄→гoц情女王★ submitted 2019-12-02 16:29:57
Question: I have code that runs fine (connect to a page, get PHPSESSID). When I put that code in a function and then ran it in a thread:

Gdk.threads_enter()
threading.Thread(target=self.do_login, args=()).start()
Gdk.threads_leave()

the code hangs on f = opener.open(req). Any ideas why? When I force-close the application, it completes everything and prints everything in the terminal without errors. Why does it hang on that particular line only in a thread? It does not hang outside of a thread.
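One common remedy (a sketch under PyGObject assumptions; do_login comes from the question, on_login_done is a hypothetical callback): don't hold the Gdk lock around the thread, let the worker run the blocking call freely, and marshal the result back to the GTK main loop with GLib.idle_add.

import threading
from gi.repository import GLib

def do_login(opener, req, on_login_done):
    def worker():
        f = opener.open(req)                    # blocking I/O off the UI thread
        GLib.idle_add(on_login_done, f.read())  # UI update runs on the main loop
        # (on_login_done should return False so idle_add fires only once)
    threading.Thread(target=worker).start()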

Which is best in Python: urllib2, PycURL or mechanize?

我们两清 submitted 2019-12-02 13:47:54
Ok so I need to download some web pages using Python and did a quick investigation of my options.

Included with Python:
- urllib - seems to me that I should use urllib2 instead; urllib has no cookie support, HTTP/FTP/local files only (no SSL)
- urllib2 - complete HTTP/FTP client, supports most needed things like cookies; does not support all HTTP verbs (only GET and POST, no TRACE, etc.)

Full featured:
- mechanize - can use/save Firefox/IE cookies, take actions like "follow second link", actively maintained (0.2.5 released in March 2011)
- PycURL - supports everything curl does (FTP, FTPS, HTTP, HTTPS,
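To illustrate the cookie support called out in the list above (Python 2; example.com is a placeholder), urllib2 handles cookies once an opener is built with a cookie processor:

import cookielib
import urllib2

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
response = opener.open('http://example.com/')
print len(jar), 'cookies captured'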

urllib2 Error 403: Forbidden

白昼怎懂夜的黑 submitted 2019-12-02 11:13:34
I have posted to this site and received really helpful guidance, so I return with another question. Where have I gone wrong here? I was pretty sure this is what is required to access information from various sites, in this case the CME Group.

import urllib2

url = "http://www.cmegroup.com/trading/energy/natural-gas/natural-gas.html"
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
content = handle.read()
splitted_page = content.split("<span class=\"cmeSubHeading\">", 1)
splitted_page = splitted_page[1].split("</span>", 1)
print splitted_page[0]

The error reads: HTTPError(req.get_full
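The usual cause of a 403 here is the server rejecting the default Python user agent. A minimal sketch of the standard workaround (the header string is just an example):

import urllib2

url = "http://www.cmegroup.com/trading/energy/natural-gas/natural-gas.html"
request = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
content = urllib2.urlopen(request).read()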

Scrapy response different than browser response

做~自己de王妃 submitted 2019-12-02 11:06:05
Question: I am trying to scrape this page with Scrapy: http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391 and the response I get is different from what I see in the browser. The browser response has the correct page, while the Scrapy response is the http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=1 page. I have tried with urllib2 but still have the same issue. Any help is much appreciated.

Answer 1: I don't really understand the issue, but usually a different response for a browser and
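A quick way to check whether the difference is header-driven (Python 2 sketch; the header values are illustrative, and cookies may also matter):

import urllib2

url = 'http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391'
bare = urllib2.urlopen(url).read()
req = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml',
})
browser_like = urllib2.urlopen(req).read()
print len(bare), len(browser_like)  # very different sizes suggest header-based serving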