urllib2

(Repost) How do I download and install the urllib2 package for Python 3.x?

Anonymous (unverified), submitted 2019-12-02 22:51:30
Python 3.x does not need a separate urllib2 package: urllib and urllib2 were merged into the single urllib package. So the question becomes: how do you call urllib2.urlopen() in Python 3.x? Answer:

import urllib.request
resp = urllib.request.urlopen('http://www.baidu.com')

Source: https://www.cnblogs.com/zdlfb/p/6130724.html (via 博客园, author 天线宝宝6: https://www.cnblogs.com/yourwit/p/11673096.html)
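For code that has to run under both versions, a common compatibility shim (a sketch, not part of the original answer) is:

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen         # Python 2

html = urlopen('http://www.baidu.com').read()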

How to POST an XML element in Python

∥☆過路亽.° submitted 2019-12-02 19:33:44
Basically I have this xml element (xml.etree.ElementTree) and I want to POST it to a URL. Currently I'm doing something like:

xml_string = xml.etree.ElementTree.tostring(my_element)
data = urllib.urlencode({'xml': xml_string})
response = urllib2.urlopen(url, data)

I'm pretty sure that works and all, but I was wondering if there is some better practice, or a way to do it without converting it to a string first. Thanks!

If this is your own API, I would consider POSTing as application/xml. The default is application/x-www-form-urlencoded, which is meant for HTML form data, not a single XML document.
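A minimal sketch of that suggestion (Python 2 urllib2; the endpoint URL and element name are placeholders): serialize the element once, then send it as the raw request body with an explicit Content-Type.

import xml.etree.ElementTree as ET
import urllib2

my_element = ET.Element('note')
xml_string = ET.tostring(my_element)
req = urllib2.Request('http://example.com/api',  # placeholder endpoint
                      data=xml_string,
                      headers={'Content-Type': 'application/xml'})
response = urllib2.urlopen(req)

Note that the element still has to be serialized at some point; the improvement is in how the body is labeled, not in avoiding tostring().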

How do I scrape pages with dynamically generated URLs using Python?

|▌冷眼眸甩不掉的悲伤 submitted 2019-12-02 17:43:47
I am trying to scrape http://www.dailyfinance.com/quote/NYSE/international-business-machines/IBM/financial-ratios , but the traditional URL string-building technique doesn't work because the full company name is inserted into the path, and the exact full company name isn't known in advance; only the company symbol, "IBM", is known. Essentially, the way I scrape is by looping through an array of company symbols and building the URL string before sending it to urllib2.urlopen(url). But in this case, that can't be done. For example, the CSCO string is http://www.dailyfinance.com/quote/NASDAQ
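One common workaround (a sketch under an assumption that would need verifying for dailyfinance.com: that the site redirects a symbol-only URL to the canonical slugged page) is to let urllib2 follow the redirect and read the final URL back with geturl():

import urllib2

symbols = ['IBM', 'CSCO']
for symbol in symbols:
    lookup = 'http://www.dailyfinance.com/quote/%s' % symbol  # hypothetical lookup URL
    resp = urllib2.urlopen(lookup)       # urlopen follows HTTP redirects
    print symbol, '->', resp.geturl()    # canonical URL after redirects
    html = resp.read()

If no such redirect exists, the fallback is to query the site's own search endpoint for each symbol and parse the company link out of the results.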

Extract news article content from stored .html pages

早过忘川 submitted 2019-12-02 17:18:39
I am reading text from HTML files and doing some analysis. These .html files are news articles. Code:

html = open(filepath, 'r').read()
raw = nltk.clean_html(html)
raw = unidecode(raw.decode('utf8'))

Now I just want the article content and not the rest of the text, like advertisements, headings, etc. How can I do so relatively accurately in Python? I know some tools like jsoup (a Java API) and boilerpipe, but I want to do it in Python. I could find some techniques using bs4, but they're limited to one type of page, and I have news pages from numerous sources. Also, there is a dearth of any sample code
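A rough, source-agnostic heuristic (a sketch, not a production extractor; 'article.html' is a stand-in filename): pick the <div> whose <p> descendants carry the most text, on the theory that article body text dominates ads and navigation.

from bs4 import BeautifulSoup

def extract_article_text(html):
    soup = BeautifulSoup(html, 'html.parser')
    best_text = ''
    for div in soup.find_all('div'):
        # total text carried by this div's paragraph children
        text = ' '.join(p.get_text(' ', strip=True) for p in div.find_all('p'))
        if len(text) > len(best_text):
            best_text = text
    return best_text

with open('article.html') as f:
    print(extract_article_text(f.read()))

This will misfire on some layouts; Python wrappers around the boilerpipe algorithm exist and are worth trying before rolling your own.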

Python and urllib2: how to make a GET request with parameters

泄露秘密 submitted 2019-12-02 17:01:14
I'm building an "API API": it's basically a wrapper for an in-house REST web service that the web app will be making a lot of requests to. Some of the web service calls need to be GET rather than POST, but still pass parameters. Is there a "best practice" way to encode a dictionary into a query string? e.g. ?foo=bar&bla=blah I'm looking at the urllib2 docs, and it looks like it decides by itself whether to use POST or GET based on whether you pass params or not, but maybe someone knows how to make it transform the params dictionary into a GET request. Maybe there's a package for something like this
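The standard pattern (Python 2; example.com is a placeholder): build the query string with urllib.urlencode and append it to the URL, passing no data argument so urllib2 issues a GET.

import urllib
import urllib2

params = {'foo': 'bar', 'bla': 'blah'}
url = 'http://example.com/api?' + urllib.urlencode(params)
response = urllib2.urlopen(url)  # no data argument, so this is a GET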

Urllib2 & BeautifulSoup: nice couple but too slow - urllib3 & threads?

一世执手 submitted 2019-12-02 16:53:31
I was looking for a way to optimize my code when I heard some good things about threads and urllib3. Apparently, people disagree about which solution is the best. The problem with my script below is the execution time: it's so slow!

Step 1: I fetch this page http://www.cambridgeesol.org/institutions/results.php?region=Afghanistan&type=&BULATS=on
Step 2: I parse the page with BeautifulSoup
Step 3: I put the data in an Excel doc
Step 4: I do it again, and again, and again for all the countries in my list (a big list); I am just changing "Afghanistan" in the URL to another country.

Here is my code: ws =
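A sketch of the threaded variant (Python 2; the worker count is arbitrary and the three countries stand in for the big list in the question): a shared queue of country names and several worker threads, so the network waits overlap instead of running back to back.

import threading
import urllib2
from Queue import Queue, Empty
from bs4 import BeautifulSoup

BASE = ('http://www.cambridgeesol.org/institutions/results.php'
        '?region=%s&type=&BULATS=on')
queue = Queue()
for country in ['Afghanistan', 'Albania', 'Algeria']:
    queue.put(country)

def worker():
    while True:
        try:
            country = queue.get_nowait()
        except Empty:
            return  # queue drained, thread exits
        html = urllib2.urlopen(BASE % country).read()
        soup = BeautifulSoup(html, 'html.parser')
        # ... extract the rows here and hand them off for writing ...
        queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Keep the Excel writing in a single thread; most spreadsheet libraries are not thread-safe.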

urllib2 opener hangs if run inside a thread

╄→гoц情女王★ submitted 2019-12-02 16:29:57
Question: I have code that runs fine (connect to a page, get PHPSESSID). When I put that code in a function and then ran it in a thread:

Gdk.threads_enter()
threading.Thread(target=self.do_login, args=()).start()
Gdk.threads_leave()

the code hangs on f = opener.open(req). Any ideas why? When I force-close the application, it completes everything and prints everything in the terminal without errors. Why does it hang on that particular line only in a thread? It does not hang outside of a thread.
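One common remedy (a sketch under PyGObject assumptions; do_login comes from the question, on_login_done is a hypothetical callback): don't hold the Gdk lock around the thread, let the worker run the blocking call freely, and marshal the result back to the GTK main loop with GLib.idle_add.

import threading
from gi.repository import GLib

def do_login(opener, req, on_login_done):
    def worker():
        f = opener.open(req)                    # blocking I/O off the UI thread
        GLib.idle_add(on_login_done, f.read())  # UI update runs on the main loop
        # (on_login_done should return False so idle_add fires only once)
    threading.Thread(target=worker).start()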

Which is best in Python: urllib2, PycURL or mechanize?

我们两清 submitted 2019-12-02 13:47:54
Ok so I need to download some web pages using Python and did a quick investigation of my options.

Included with Python:
- urllib - seems to me that I should use urllib2 instead; urllib has no cookie support, HTTP/FTP/local files only (no SSL)
- urllib2 - complete HTTP/FTP client, supports most needed things like cookies; does not support all HTTP verbs (only GET and POST, no TRACE, etc.)

Full featured:
- mechanize - can use/save Firefox/IE cookies, take actions like "follow second link", actively maintained (0.2.5 released in March 2011)
- PycURL - supports everything curl does (FTP, FTPS, HTTP, HTTPS,
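To illustrate the cookie support called out in the list above (Python 2; example.com is a placeholder), urllib2 handles cookies once an opener is built with a cookie processor:

import cookielib
import urllib2

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
response = opener.open('http://example.com/')
print len(jar), 'cookies captured'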

urllib2 Error 403: Forbidden

白昼怎懂夜的黑 submitted 2019-12-02 11:13:34
I have posted to this site and received really helpful guidance, so I return with another question. Where have I gone wrong here? I was pretty sure this is what is required to access information from various sites, in this case the CME Group.

import urllib2

url = "http://www.cmegroup.com/trading/energy/natural-gas/natural-gas.html"
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
content = handle.read()
splitted_page = content.split("<span class=\"cmeSubHeading\">", 1)
splitted_page = splitted_page[1].split("</span>", 1)
print splitted_page[0]

The error reads: HTTPError(req.get_full
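The usual cause of a 403 here is the server rejecting the default Python user agent. A minimal sketch of the standard workaround (the header string is just an example):

import urllib2

url = "http://www.cmegroup.com/trading/energy/natural-gas/natural-gas.html"
request = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
content = urllib2.urlopen(request).read()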

Scrapy response different than browser response

做~自己de王妃 submitted 2019-12-02 11:06:05
Question: I am trying to scrape this page with Scrapy: http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391 and the response I get is different from what I see in the browser. The browser response has the correct page, while the Scrapy response is the http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=1 page. I have tried with urllib2 but still have the same issue. Any help is much appreciated.

Answer 1: I don't really understand the issue, but usually a different response for a browser and
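A quick way to check whether the difference is header-driven (Python 2 sketch; the header values are illustrative, and cookies may also matter):

import urllib2

url = 'http://www.barnesandnoble.com/s?dref=4815&sort=SA&startat=7391'
bare = urllib2.urlopen(url).read()
req = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml',
})
browser_like = urllib2.urlopen(req).read()
print len(bare), len(browser_like)  # very different sizes suggest header-based serving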