urllib2

python, not getting full response

蹲街弑〆低调 Submitted on 2019-11-29 15:18:04
When I want to get a page using urllib2, I don't get the full page. Here is the code in Python:

    import urllib2
    import urllib
    import socket
    from bs4 import BeautifulSoup

    # default timeout for http requests
    socket.setdefaulttimeout(5)

    # getting the page
    def get_page(url):
        """ loads a webpage into a string """
        src = ''
        req = urllib2.Request(url)
        try:
            response = urllib2.urlopen(req)
            src = response.read()
            response.close()
        except IOError:
            print 'can\'t open', url
            return src
        return src

    def write_to_file(soup):
        ''' i know that I should use try and catch'''
        # writing to file, you can check if you
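A frequent cause of a short body is the 5-second socket timeout cutting off slow servers, or the server returning different HTML to non-browser clients. A minimal, more defensive sketch of the fetch, assuming Python 2 / urllib2 (the User-Agent string and chunk size are arbitrary choices, not from the question):

    import urllib2

    def get_page(url, timeout=30):
        """ loads a webpage into a string, reading in chunks until EOF """
        req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        try:
            response = urllib2.urlopen(req, timeout=timeout)
        except IOError:
            print 'can\'t open', url
            return ''
        chunks = []
        while True:
            chunk = response.read(64 * 1024)   # keep reading until the server is done
            if not chunk:
                break
            chunks.append(chunk)
        response.close()
        return ''.join(chunks)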

Github-api giving 404 when passing json-data with python + urllib2

匆匆过客 Submitted on 2019-11-29 15:13:30
I have the following code, which should perform the first part of creating a new download on GitHub. It should send the JSON data with POST.

    jsonstring = '{"name": "test", "size": "4"}'
    req = urllib2.Request("https://api.github.com/repos/<user>/<repo>/downloads")
    req.add_header('Authorization', 'token ' + '<token>')
    result = urllib2.urlopen(req, jsonstring)

If I remove the , jsonstring from urlopen(), it does not fail and gives me the list of available downloads. However, if I try to POST the JSON string, I get a 404 error. The problem has to be with the JSON, or in the way I send it, but
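Two things worth checking: the API expects real JSON types (so "size" as a number, not a string), and with urllib2 it helps to set Content-Type explicitly; note also that GitHub can answer 404 instead of 403 when the token lacks access, which makes the error easy to misread. A hedged sketch along those lines (the <user>, <repo> and <token> placeholders are unchanged from the question):

    import json
    import urllib2

    # serialise with json.dumps, and send size as an integer
    payload = json.dumps({"name": "test", "size": 4})
    req = urllib2.Request("https://api.github.com/repos/<user>/<repo>/downloads", payload)
    req.add_header('Authorization', 'token ' + '<token>')
    req.add_header('Content-Type', 'application/json')
    result = urllib2.urlopen(req)
    print result.read()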

Python crawler: requests

China☆狼群 Submitted on 2019-11-29 14:52:48
Python crawler: requests. Notes: written against Python 3; covers the main methods, exceptions, parameters, and the session object. Of course a urllib2-style workflow is still possible, but I started out on the py2 road and stuck with that habit for a while. Now that I'm deliberately moving to py3, the requests library has become the priority. There isn't actually much to explain: with an urllib2 background and a simple "usage comparison table", everything falls into place. Naturally, before it becomes second nature I don't always remember the details, and lately I've been flipping back to these notes whenever I use requests, which just about covers my needs. Bad habits like mine are hard to fix ("unless something forces me, I leave everything to time"), but I firmly believe that given enough time, mastery is only a matter of when. The main differences I've clearly noticed in practice: 1. No need for urllib.urlencode() to encode the URL: for GET and POST you pass the parameters directly, as params and data respectively. 2. With urllib2 the response returned by the server needs no extra encoding handling, while with requests it does, via: response.encoding = response.apparent_encoding 3. urllib2 reads both text and images (more precisely, binary files) with response.read(); requests uses the text and content attributes respectively, i.e.:
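A rough sketch of those three points using the standard requests API (the URLs and parameter names are just placeholders):

    import requests

    # 1. no urllib.urlencode(): pass parameters directly
    r = requests.get('http://httpbin.org/get', params={'wd': 'python'})
    r = requests.post('http://httpbin.org/post', data={'wd': 'python'})

    # 2. fix the text encoding using what requests detects from the body
    r.encoding = r.apparent_encoding

    # 3. r.text for decoded text, r.content for raw bytes (e.g. an image)
    html = r.text
    raw = r.content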

Python urllib2. URLError: <urlopen error [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted>

断了今生、忘了曾经 Submitted on 2019-11-29 14:25:31
I'm making multiple connections to an API, issuing DELETE requests, and I got this error around the 3000th request. The code looks like this:

    def delete_request(self, path):
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request('%s%s' % (self.endpoint, path))
        signature = self._gen_auth('DELETE', path, '')
        request.add_header('X-COMPANY-SIGNATURE-AUTH', signature)
        request.get_method = lambda: 'DELETE'
        resp = opener.open(request)

Then in the console:

    for i in xrange(300000):
        con.delete_request('/integration/sitemap/item.xml/media/%d/' % i)

After about the 3000th request it says: URLError: urlopen error
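Errno 10048 on Windows usually means the ephemeral ports are exhausted, because each request opens a fresh socket that then lingers in TIME_WAIT. One common workaround, sketched here assuming Python 2's httplib and a plain-HTTP endpoint (use httplib.HTTPSConnection for https; the host and signature are placeholders), is to reuse a single keep-alive connection instead of a new opener per request:

    import httplib

    conn = httplib.HTTPConnection('api.example.com')       # hypothetical host
    for i in xrange(300000):
        path = '/integration/sitemap/item.xml/media/%d/' % i
        signature = '<signature>'                           # placeholder for _gen_auth()
        conn.request('DELETE', path,
                     headers={'X-COMPANY-SIGNATURE-AUTH': signature})
        resp = conn.getresponse()
        resp.read()    # drain the body so the connection can be reused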

I am downloading a file using Python urllib2. How do I check how large the file size is?

馋奶兔 Submitted on 2019-11-29 13:38:33
Question: And if it is large... then stop the download? I don't want to download files that are larger than 12MB.

    request = urllib2.Request(ep_url)
    request.add_header('User-Agent', random.choice(agents))
    thefile = urllib2.urlopen(request).read()

Answer 1: There's no need to drop down to httplib as bobince did; you can do all of that with urllib2 directly:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://dalkescientific.com")
    >>> f.headers.items()
    [('content-length', '7535'), ('accept-ranges', 'bytes'), ('server',
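Building on that answer, a minimal sketch of a size guard, assuming Python 2 / urllib2 (the URL is a stand-in for ep_url; the 12MB limit is from the question, and the chunked fallback covers servers that omit Content-Length):

    import urllib2

    MAX_BYTES = 12 * 1024 * 1024
    request = urllib2.Request('http://example.com/big.file')   # stand-in for ep_url
    response = urllib2.urlopen(request)

    length = response.headers.getheader('content-length')
    if length is not None and int(length) > MAX_BYTES:
        response.close()                    # the header already says it is too big
        thefile = None
    else:
        chunks, total = [], 0
        while total <= MAX_BYTES:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            chunks.append(chunk)
            total += len(chunk)
        thefile = ''.join(chunks) if total <= MAX_BYTES else None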

Get Request Headers for Urllib2.Request?

浪子不回头ぞ Submitted on 2019-11-29 12:32:30
Question: Is there a way to get the headers from a request created with urllib2, or to confirm the HTTP headers sent with urllib2.urlopen?

Answer 1: An easy way to see request (and response) headers is to enable debug output:

    opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))

You can then see the precise headers sent/received:

    >>> opener.open('http://python.org')
    send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: python.org\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
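Another approach worth sketching (Python 2 / urllib2): the library records the headers it adds on the Request object itself, so after the call you can inspect them programmatically instead of reading debug output:

    import urllib2

    req = urllib2.Request('http://python.org')
    urllib2.urlopen(req)
    # headers urllib2 added (Host, User-agent, ...) plus any you set yourself
    print req.header_items()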

Use python to access a site with PKI security

空扰寡人 Submitted on 2019-11-29 12:31:47
Question: I have a site that has PKI security enabled. Each client uses either a card reader to load their certificate, or has the certificate installed in the IE certificate store on their machine. So my questions are: How can I use either the card-reader certificate or the certificate stored on the system to authenticate? How do I pass the credentials on to the site to say, "hey, it's me, and I can access the service"? The example can use soft certificates. I can figure out the card reader part
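For the soft-certificate case, one common urllib2 recipe is sketched below, assuming Python 2 and that the certificate and private key have been exported to PEM files (the class name, file paths, and URL are hypothetical): hand the key and cert to the underlying HTTPSConnection so they are presented during the TLS handshake.

    import httplib
    import urllib2

    class ClientCertHTTPSHandler(urllib2.HTTPSHandler):
        """Presents a client certificate/key (PEM files) during the TLS handshake."""
        def __init__(self, key_file, cert_file):
            urllib2.HTTPSHandler.__init__(self)
            self.key_file = key_file
            self.cert_file = cert_file

        def https_open(self, req):
            return self.do_open(self._make_connection, req)

        def _make_connection(self, host, timeout=300):
            return httplib.HTTPSConnection(host, timeout=timeout,
                                           key_file=self.key_file,
                                           cert_file=self.cert_file)

    opener = urllib2.build_opener(ClientCertHTTPSHandler('client.key', 'client.crt'))
    print opener.open('https://pki.example.com/service').read()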

Getting the final destination of a javascript redirect on a website

我是研究僧i Submitted on 2019-11-29 12:11:53
I parse a website with Python. It uses a lot of redirects, and it does them by calling JavaScript functions, so simply fetching the site with urllib doesn't help me: I can't find the destination URL in the returned HTML. Is there a way to access the DOM and call the correct JavaScript function from my Python code? All I need is the URL the redirect takes me to. I looked into Selenium, and if you are not limited to a pure script (i.e. you have a display and can start a "normal" browser) the solution is actually quite simple:

    from selenium import webdriver
    driver
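A minimal sketch of that idea, assuming Selenium with a local Firefox install (the URL and the fixed sleep are placeholders; a real script would wait for the specific navigation): load the page, let the JavaScript redirect fire, then read the final address back from the browser.

    import time
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('http://example.com/page-with-js-redirect')   # hypothetical URL
    time.sleep(5)                        # crude wait for the redirect to complete
    final_url = driver.current_url       # destination after the JavaScript redirect
    driver.quit()
    print final_url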

Python unable to retrieve form with urllib or mechanize

让人想犯罪 __ Submitted on 2019-11-29 11:00:02
I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php . The page is in Dutch, but this is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php . First of all, this is the urllib/urllib2 approach I've tried:

    import urllib, urllib2
    import socket, cookielib

    url = 'http://zrs.leidenuniv.nl/ul/start.php'
    params
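The usual urllib2 shape for a PHP form like this is sketched below (the form field names are hypothetical; the real ones have to be read from the form's HTML): visit start.php first so the session cookie is captured, then POST the encoded fields to query.php with the same opener.

    import urllib, urllib2, cookielib

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    opener.open('http://zrs.leidenuniv.nl/ul/start.php')      # picks up the session cookie
    params = urllib.urlencode({'day': '1', 'month': '1', 'year': '2013'})  # hypothetical fields
    resp = opener.open('http://zrs.leidenuniv.nl/ul/query.php', params)
    print resp.read()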

Opening Local File Works with urllib but not with urllib2

梦想的初衷 Submitted on 2019-11-29 10:56:01
Question: I'm trying to open a local file using urllib2. How can I go about doing this? When I try the following line with urllib:

    resp = urllib.urlopen(url)

it works correctly, but when I switch it to:

    resp = urllib2.urlopen(url)

I get:

    ValueError: unknown url type: /path/to/file

where that file definitely does exist. Thanks!

Answer 1: Just put "file://" in front of the path:

    >>> import urllib2
    >>> urllib2.urlopen("file:///etc/debian_version").read()
    'wheezy/sid\n'

Answer 2: In the urllib.urlopen method: If the URL
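For arbitrary local paths, a small sketch of the same idea (Python 2; the path is the placeholder from the question): build the file:// URL with urllib.pathname2url so the path is escaped correctly.

    import urllib, urllib2

    path = '/path/to/file'                         # placeholder from the question
    url = 'file://' + urllib.pathname2url(path)
    resp = urllib2.urlopen(url)
    print resp.read()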