urllib2

python, not getting full response

蹲街弑〆低调 Submitted on 2019-11-29 15:18:04
When I want to get a page using urllib2, I don't get the full page. Here is the code in Python:

    import urllib2
    import urllib
    import socket
    from bs4 import BeautifulSoup

    # default timeout for http requests
    socket.setdefaulttimeout(5)

    # getting the page
    def get_page(url):
        """ loads a webpage into a string """
        src = ''
        req = urllib2.Request(url)
        try:
            response = urllib2.urlopen(req)
            src = response.read()
            response.close()
        except IOError:
            print 'can\'t open', url
            return src
        return src

    def write_to_file(soup):
        ''' i know that I should use try and catch'''
        # writing to file, you can check if you
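A frequent cause of a short body is the 5-second socket timeout cutting off slow servers, or the server returning different HTML to non-browser clients. A minimal, more defensive sketch of the fetch, assuming Python 2 / urllib2 (the User-Agent string and chunk size are arbitrary choices, not from the question):

    import urllib2

    def get_page(url, timeout=30):
        """ loads a webpage into a string, reading in chunks until EOF """
        req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        try:
            response = urllib2.urlopen(req, timeout=timeout)
        except IOError:
            print 'can\'t open', url
            return ''
        chunks = []
        while True:
            chunk = response.read(64 * 1024)   # keep reading until the server is done
            if not chunk:
                break
            chunks.append(chunk)
        response.close()
        return ''.join(chunks)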

Github-api giving 404 when passing json-data with python + urllib2

匆匆过客 Submitted on 2019-11-29 15:13:30
I have the following code, which should perform the first part of creating a new download on GitHub. It should send the JSON data with POST.

    jsonstring = '{"name": "test", "size": "4"}'
    req = urllib2.Request("https://api.github.com/repos/<user>/<repo>/downloads")
    req.add_header('Authorization', 'token ' + '<token>')
    result = urllib2.urlopen(req, jsonstring)

If I remove the , jsonstring from urlopen(), it does not fail and gives me the list of available downloads. However, if I try to POST the JSON string, I get a 404 error. The problem has to be with the JSON, or in the way I send it, but
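Two things worth checking: the API expects real JSON types (so "size" as a number, not a string), and with urllib2 it helps to set Content-Type explicitly; note also that GitHub can answer 404 instead of 403 when the token lacks access, which makes the error easy to misread. A hedged sketch along those lines (the <user>, <repo> and <token> placeholders are unchanged from the question):

    import json
    import urllib2

    # serialise with json.dumps, and send size as an integer
    payload = json.dumps({"name": "test", "size": 4})
    req = urllib2.Request("https://api.github.com/repos/<user>/<repo>/downloads", payload)
    req.add_header('Authorization', 'token ' + '<token>')
    req.add_header('Content-Type', 'application/json')
    result = urllib2.urlopen(req)
    print result.read()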

Python crawler: requests

China☆狼群 Submitted on 2019-11-29 14:52:48
Python crawler: requests. Notes: written against Python 3; covers the main methods, exceptions, parameters, and the session object. Of course a urllib2-style workflow is still possible, but I started out on the py2 road and stuck with that habit for a while. Now that I'm deliberately moving to py3, the requests library has become the priority. There isn't actually much to explain: with an urllib2 background and a simple "usage comparison table", everything falls into place. Naturally, before it becomes second nature I don't always remember the details, and lately I've been flipping back to these notes whenever I use requests, which just about covers my needs. Bad habits like mine are hard to fix ("unless something forces me, I leave everything to time"), but I firmly believe that given enough time, mastery is only a matter of when. The main differences I've clearly noticed in practice: 1. No need for urllib.urlencode() to encode the URL: for GET and POST you pass the parameters directly, as params and data respectively. 2. With urllib2 the response returned by the server needs no extra encoding handling, while with requests it does, via: response.encoding = response.apparent_encoding 3. urllib2 reads both text and images (more precisely, binary files) with response.read(); requests uses the text and content attributes respectively, i.e.:
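A rough sketch of those three points using the standard requests API (the URLs and parameter names are just placeholders):

    import requests

    # 1. no urllib.urlencode(): pass parameters directly
    r = requests.get('http://httpbin.org/get', params={'wd': 'python'})
    r = requests.post('http://httpbin.org/post', data={'wd': 'python'})

    # 2. fix the text encoding using what requests detects from the body
    r.encoding = r.apparent_encoding

    # 3. r.text for decoded text, r.content for raw bytes (e.g. an image)
    html = r.text
    raw = r.content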

Python urllib2. URLError: <urlopen error [Errno 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted>

断了今生、忘了曾经 Submitted on 2019-11-29 14:25:31
I'm making multiple connections to an API, issuing DELETE requests, and I got this error around the 3000th request. The code looks like this:

    def delete_request(self, path):
        opener = urllib2.build_opener(urllib2.HTTPHandler)
        request = urllib2.Request('%s%s' % (self.endpoint, path))
        signature = self._gen_auth('DELETE', path, '')
        request.add_header('X-COMPANY-SIGNATURE-AUTH', signature)
        request.get_method = lambda: 'DELETE'
        resp = opener.open(request)

Then in the console:

    for i in xrange(300000):
        con.delete_request('/integration/sitemap/item.xml/media/%d/' % i)

After about the 3000th request it says: URLError: urlopen error
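Errno 10048 on Windows usually means the ephemeral ports are exhausted, because each request opens a fresh socket that then lingers in TIME_WAIT. One common workaround, sketched here assuming Python 2's httplib and a plain-HTTP endpoint (use httplib.HTTPSConnection for https; the host and signature are placeholders), is to reuse a single keep-alive connection instead of a new opener per request:

    import httplib

    conn = httplib.HTTPConnection('api.example.com')       # hypothetical host
    for i in xrange(300000):
        path = '/integration/sitemap/item.xml/media/%d/' % i
        signature = '<signature>'                           # placeholder for _gen_auth()
        conn.request('DELETE', path,
                     headers={'X-COMPANY-SIGNATURE-AUTH': signature})
        resp = conn.getresponse()
        resp.read()    # drain the body so the connection can be reused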

I am downloading a file using Python urllib2. How do I check how large the file size is?

馋奶兔 Submitted on 2019-11-29 13:38:33
Question: And if it is large... then stop the download? I don't want to download files that are larger than 12MB.

    request = urllib2.Request(ep_url)
    request.add_header('User-Agent', random.choice(agents))
    thefile = urllib2.urlopen(request).read()

Answer 1: There's no need to drop down to httplib as bobince did; you can do all of that with urllib2 directly:

    >>> import urllib2
    >>> f = urllib2.urlopen("http://dalkescientific.com")
    >>> f.headers.items()
    [('content-length', '7535'), ('accept-ranges', 'bytes'), ('server',
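Building on that answer, a minimal sketch of a size guard, assuming Python 2 / urllib2 (the URL is a stand-in for ep_url; the 12MB limit is from the question, and the chunked fallback covers servers that omit Content-Length):

    import urllib2

    MAX_BYTES = 12 * 1024 * 1024
    request = urllib2.Request('http://example.com/big.file')   # stand-in for ep_url
    response = urllib2.urlopen(request)

    length = response.headers.getheader('content-length')
    if length is not None and int(length) > MAX_BYTES:
        response.close()                    # the header already says it is too big
        thefile = None
    else:
        chunks, total = [], 0
        while total <= MAX_BYTES:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            chunks.append(chunk)
            total += len(chunk)
        thefile = ''.join(chunks) if total <= MAX_BYTES else None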

Get Request Headers for Urllib2.Request?

浪子不回头ぞ Submitted on 2019-11-29 12:32:30
Question: Is there a way to get the headers from a request created with urllib2, or to confirm the HTTP headers sent with urllib2.urlopen?

Answer 1: An easy way to see request (and response) headers is to enable debug output:

    opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))

You can then see the precise headers sent/received:

    >>> opener.open('http://python.org')
    send: 'GET / HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: python.org\r\nConnection: close\r\nUser-Agent: Python-urllib/2.7\r\n\r\n'
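Another approach worth sketching (Python 2 / urllib2): the library records the headers it adds on the Request object itself, so after the call you can inspect them programmatically instead of reading debug output:

    import urllib2

    req = urllib2.Request('http://python.org')
    urllib2.urlopen(req)
    # headers urllib2 added (Host, User-agent, ...) plus any you set yourself
    print req.header_items()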

Use python to access a site with PKI security

空扰寡人 Submitted on 2019-11-29 12:31:47
Question: I have a site that has PKI security enabled. Each client uses either a card reader to load their certificate, or has the certificate installed in the IE certificate store on their machine. So my questions are: How can I use either the card-reader certificate or the certificate stored on the system to authenticate? How do I pass the credentials on to the site to say, "hey, it's me, and I can access the service"? The example can use soft certificates. I can figure out the card reader part
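For the soft-certificate case, one common urllib2 recipe is sketched below, assuming Python 2 and that the certificate and private key have been exported to PEM files (the class name, file paths, and URL are hypothetical): hand the key and cert to the underlying HTTPSConnection so they are presented during the TLS handshake.

    import httplib
    import urllib2

    class ClientCertHTTPSHandler(urllib2.HTTPSHandler):
        """Presents a client certificate/key (PEM files) during the TLS handshake."""
        def __init__(self, key_file, cert_file):
            urllib2.HTTPSHandler.__init__(self)
            self.key_file = key_file
            self.cert_file = cert_file

        def https_open(self, req):
            return self.do_open(self._make_connection, req)

        def _make_connection(self, host, timeout=300):
            return httplib.HTTPSConnection(host, timeout=timeout,
                                           key_file=self.key_file,
                                           cert_file=self.cert_file)

    opener = urllib2.build_opener(ClientCertHTTPSHandler('client.key', 'client.crt'))
    print opener.open('https://pki.example.com/service').read()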

Getting the final destination of a javascript redirect on a website

我是研究僧i Submitted on 2019-11-29 12:11:53
I parse a website with Python. It uses a lot of redirects, and it does them by calling JavaScript functions, so simply fetching the site with urllib doesn't help me: I can't find the destination URL in the returned HTML. Is there a way to access the DOM and call the correct JavaScript function from my Python code? All I need is the URL the redirect takes me to. I looked into Selenium, and if you are not limited to a pure script (i.e. you have a display and can start a "normal" browser) the solution is actually quite simple:

    from selenium import webdriver
    driver
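A minimal sketch of that idea, assuming Selenium with a local Firefox install (the URL and the fixed sleep are placeholders; a real script would wait for the specific navigation): load the page, let the JavaScript redirect fire, then read the final address back from the browser.

    import time
    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get('http://example.com/page-with-js-redirect')   # hypothetical URL
    time.sleep(5)                        # crude wait for the redirect to complete
    final_url = driver.current_url       # destination after the JavaScript redirect
    driver.quit()
    print final_url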

Python unable to retrieve form with urllib or mechanize

让人想犯罪 __ Submitted on 2019-11-29 11:00:02
I'm trying to fill out and submit a form using Python, but I'm not able to retrieve the resulting page. I've tried both mechanize and urllib/urllib2 to post the form, but both run into problems. The form I'm trying to retrieve is here: http://zrs.leidenuniv.nl/ul/start.php . The page is in Dutch, but this is irrelevant to my problem. It may be noteworthy that the form action redirects to http://zrs.leidenuniv.nl/ul/query.php . First of all, this is the urllib/urllib2 approach I've tried:

    import urllib, urllib2
    import socket, cookielib

    url = 'http://zrs.leidenuniv.nl/ul/start.php'
    params
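The usual urllib2 shape for a PHP form like this is sketched below (the form field names are hypothetical; the real ones have to be read from the form's HTML): visit start.php first so the session cookie is captured, then POST the encoded fields to query.php with the same opener.

    import urllib, urllib2, cookielib

    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

    opener.open('http://zrs.leidenuniv.nl/ul/start.php')      # picks up the session cookie
    params = urllib.urlencode({'day': '1', 'month': '1', 'year': '2013'})  # hypothetical fields
    resp = opener.open('http://zrs.leidenuniv.nl/ul/query.php', params)
    print resp.read()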

Opening Local File Works with urllib but not with urllib2

梦想的初衷 Submitted on 2019-11-29 10:56:01
Question: I'm trying to open a local file using urllib2. How can I go about doing this? When I try the following line with urllib:

    resp = urllib.urlopen(url)

it works correctly, but when I switch it to:

    resp = urllib2.urlopen(url)

I get:

    ValueError: unknown url type: /path/to/file

where that file definitely does exist. Thanks!

Answer 1: Just put "file://" in front of the path:

    >>> import urllib2
    >>> urllib2.urlopen("file:///etc/debian_version").read()
    'wheezy/sid\n'

Answer 2: In the urllib.urlopen method: If the URL
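For arbitrary local paths, a small sketch of the same idea (Python 2; the path is the placeholder from the question): build the file:// URL with urllib.pathname2url so the path is escaped correctly.

    import urllib, urllib2

    path = '/path/to/file'                         # placeholder from the question
    url = 'file://' + urllib.pathname2url(path)
    resp = urllib2.urlopen(url)
    print resp.read()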