urllib2

Get outgoing port number from urllib2

核能气质少年 submitted on 2019-12-11 09:58:36
Question: I am using Python 2.6.x and urllib2 to do some web scraping, but I need really low-level socket information (really just the port number of the local socket) for each HTTP request. Does anyone know how to get that? Thanks. EDIT: Okay, I'm still trying to get this right, so I did what I thought should work, but I'm not getting the output when I try to use the new stuff. What am I doing wrong here? from urllib2 import * class AbstractHTTPHandler(AbstractHTTPHandler): def do_open(self, http_class
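The excerpt above is cut off mid-class, so here is a minimal Python 3 sketch of the same idea (the class name is ours, not from the original): rather than patching AbstractHTTPHandler, override connect() on an HTTPConnection and read the bound local address off the socket with getsockname().

```python
import http.client

class PortReportingConnection(http.client.HTTPConnection):
    """HTTPConnection that records the local (outgoing) port of each request.

    After connect() the socket exists, so getsockname() can report the
    (local_ip, local_port) pair the OS assigned to this connection.
    """
    def connect(self):
        super().connect()
        # getsockname() -> (local_ip, local_port) of the just-opened socket
        self.local_port = self.sock.getsockname()[1]
```

In Python 2 the equivalent move would be subclassing httplib.HTTPConnection and passing it as the http_class argument to do_open, which is presumably where the questioner's AbstractHTTPHandler override was heading.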

How to send a request without a 'Host' header using Python?

大兔子大兔子 submitted on 2019-12-11 09:22:39
Question: I have been trying for many days now, so here I am finally asking; this may be a dumb question to most of the experts. I am using PyUnit for API testing of my application. The application under test is deployed on one of the local servers here. The application prevents hackers from doing malicious activities, so I access any website (protected by this application) through it, e.g. http://my-security-app/stackoverflow/login , http://my-security-app/website-to-be
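Since urllib always adds a Host header itself, one way to send a request without one is to drop down to http.client's low-level API, whose putrequest() accepts skip_host=True. A hedged Python 3 sketch (function name is ours):

```python
import http.client

def request_without_host(host, port, path="/"):
    """Send a GET request with no Host header.

    urllib adds Host automatically, so we build the request line and
    headers by hand; skip_host=True suppresses the Host header.
    """
    conn = http.client.HTTPConnection(host, port)
    conn.putrequest("GET", path, skip_host=True)  # no Host header emitted
    conn.endheaders()
    return conn.getresponse()
```

Note that HTTP/1.1 without Host is technically invalid, so strict servers may answer 400, which is often exactly the behaviour a security-testing setup like the one described wants to probe.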

How to follow a redirect with urllib?

柔情痞子 submitted on 2019-12-11 09:02:43
Question: I'm creating a script in Python 3 which accesses a page like example.com/daora/zz.asp?x=qqrzzt using urllib.request.urlopen("example.com/daora/zz.asp?x=qqrzzt"), but this code just gives me the same page (example.com/daora/zz.asp?x=qqrzzt), while in the browser I get redirected to a page like example.com/egg.aspx. What could I do to retrieve example.com/egg.aspx instead of example.com/daora/zz.asp?x=qqrzzt? I think this is the relevant code, from "example.com/daora/zz.asp?x
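For ordinary HTTP 3xx redirects, urllib.request already follows them; geturl() on the response reports where you ended up. A small sketch (names ours):

```python
import urllib.request

def fetch_following_redirects(url):
    """urllib.request follows HTTP 3xx redirects automatically; geturl()
    reports the final URL actually fetched, which may differ from the
    URL originally requested."""
    with urllib.request.urlopen(url) as resp:
        return resp.geturl(), resp.read()
```

If geturl() still shows the original zz.asp address, the redirect is probably not an HTTP one at all but a meta-refresh tag or JavaScript in the page body, which urllib will never follow; in that case the target (egg.aspx here) has to be parsed out of the returned HTML.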

Timeout error when downloading .html files from URLs

China☆狼群 submitted on 2019-12-11 09:00:30
Question: I get the following error when downloading HTML pages from the URLs. Error: raise URLError(err) urllib2.URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond> Code: import urllib2 hdr = {'User-Agent': 'Mozilla/5.0'} for i, site in enumerate(urls[index]): print (site) req = urllib2.Request(site, headers=hdr) page = urllib2
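Errno 10060 is a Windows socket timeout: the host simply never answered. The usual mitigation is an explicit timeout plus a bounded retry loop, so one dead host cannot stall the whole crawl. A Python 3 sketch of that loop (function name ours):

```python
import urllib.error
import urllib.request

def fetch_with_retry(url, attempts=3, timeout=10):
    """Download a page with an explicit socket timeout and a few retries,
    instead of hanging on unresponsive hosts.  Re-raises the last error
    if every attempt fails, so the caller can log and skip the URL."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    last_err = None
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.URLError as err:
            last_err = err  # remember, retry
    raise last_err
```

In a loop over many sites, wrap the call in try/except and continue past failures rather than letting one URLError abort the run.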

Python 2.7 urllib2 raising urllib2.HTTPError 301 when hitting redirect with xml content

落爺英雄遲暮 submitted on 2019-12-11 08:49:50
Question: I'm using urllib2 to request a particular S3 bucket at hxxp://s3.amazonaws.com/mybucket . Amazon sends back an HTTP code of 301 along with some XML data (the redirect being to hxxp://mybucket.s3.amazonaws.com/ ). Instead of following the redirect, Python raises urllib2.HTTPError: HTTP Error 301: Moved Permanently . According to the official Python docs in "HOWTO Fetch Internet Resources Using urllib2", "the default handlers handle redirects (codes in the 300 range)". Is Python handling this
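A plausible explanation: urllib's redirect handler only follows a 301 when the response carries a Location (or URI) header; S3's PermanentRedirect answer for a wrong-endpoint request puts the target in the XML body instead, so the handler gives up and the HTTPError surfaces. A hedged sketch that catches the 301 and pulls the endpoint out of the XML itself (the Endpoint element name is assumed from S3's error-document format):

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

def resolve_s3_redirect(url):
    """Fetch url; on a 301 whose redirect target is only in the XML body
    (no usable Location header, hence urllib raises), extract the real
    endpoint from the <Endpoint> element of the S3 error document."""
    try:
        with urllib.request.urlopen(url) as resp:
            return url, resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 301:
            body = err.read()  # HTTPError is file-like; body is the XML
            endpoint = ET.fromstring(body).findtext("Endpoint")
            return endpoint, body
        raise
```

The returned endpoint (e.g. mybucket.s3.amazonaws.com) can then be used to build the corrected URL and re-request.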

How does Python urllib2 https work?

妖精的绣舞 submitted on 2019-12-11 08:48:48
Question: Looking at the documentation for urllib2, it says it supports HTTPS connections. However, what it doesn't make clear is how you enable it. Do you, for example, take HTTPBasicAuth and replace the HTTP with HTTPS, or do you just need to pass an HTTPS URL when you actually open the connection? Answer 1: For Python versions before 2.7.9: you can simply pass an HTTPS URL when you open the connection. Heed the warning in the urllib2 documentation that states: "Warning: HTTPS requests do not do any verification of the server
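So: yes, just passing an https:// URL is enough to get TLS. The warning quoted in the answer is about certificate verification, which only became the default in Python 2.7.9 / 3.4.3. In modern Python you can make the verification behaviour explicit with an ssl context, as this sketch shows:

```python
import ssl
import urllib.request

# Since Python 2.7.9 / 3.4.3, urllib verifies server certificates by
# default; on the older versions the quoted warning refers to, HTTPS
# gave you encryption but no server-identity check.  An explicit
# context makes the intended behaviour visible in the code:
context = ssl.create_default_context()  # verify against the system CA store

def fetch_https(url):
    with urllib.request.urlopen(url, context=context) as resp:
        return resp.read()
```

No special handler class is needed: the scheme of the URL selects HTTPS, and HTTPBasicAuth-style handlers compose with it unchanged.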

Python urllib simple login script

删除回忆录丶 submitted on 2019-12-11 08:16:39
Question: I am trying to make a script to log into the "check card balance" service for my university using Python. Basically it's a web form where we fill in our PIN and PASS and it shows us how much $$$ is left on our card (for food)... This is the webpage: http://www.wcu.edu/11407.asp This is the form I am filling in: <FORM method=post action=https://itapp.wcu.edu/BanAuthRedirector/Default.aspx><INPUT value=https://cf.wcu.edu/busafrs/catcard/idsearch.cfm type=hidden name=wcuirs_uri> <P><B
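Submitting a form like this with urllib means URL-encoding the field values and POSTing them to the form's action URL. A sketch under stated assumptions: the hidden wcuirs_uri value is taken from the quoted HTML, but the PIN/PASS field names are guesses, since the form excerpt is cut off before them; read the real input names from the page before relying on this.

```python
import urllib.parse
import urllib.request

FORM_URL = "https://itapp.wcu.edu/BanAuthRedirector/Default.aspx"

def build_login_request(pin, password):
    """Build the POST request urllib would send for the form above."""
    fields = {
        # hidden field copied from the quoted <INPUT type=hidden ...>
        "wcuirs_uri": "https://cf.wcu.edu/busafrs/catcard/idsearch.cfm",
        "PIN": pin,        # assumed field name
        "PASS": password,  # assumed field name
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    # passing data= makes urllib issue a POST, matching method=post
    return urllib.request.Request(FORM_URL, data=data)
```

urllib.request.urlopen(build_login_request(pin, password)) then submits the form; to stay logged in across requests, add an HTTPCookieProcessor-based opener.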

Requests: bind to an IP

老子叫甜甜 submitted on 2019-12-11 07:42:52
Question: I have a script that makes some requests with urllib2. I use the trick suggested elsewhere on Stack Overflow to bind another IP to the application, where my computer has two IP addresses (IP A and IP B). I would like to switch to using the requests library. Does anyone know how I can achieve the same functionality with that library? Answer 1: Looking into the requests module, it looks like it uses httplib to send the HTTP requests. httplib uses socket.create_connection() to connect to the www
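With requests, the idiomatic hook is a transport adapter rather than monkey-patching socket.create_connection. urllib3's connection pool accepts a source_address that it forwards down to the same socket.create_connection() call the answer mentions. A sketch (requests-toolbelt ships a ready-made SourceAddressAdapter much like this):

```python
import requests
from requests.adapters import HTTPAdapter

class SourceAddressAdapter(HTTPAdapter):
    """Transport adapter that binds outgoing connections to a local IP."""

    def __init__(self, source_address, **kwargs):
        # port 0 = let the OS pick any free local port on that interface
        self.source_address = (source_address, 0)
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **pool_kwargs):
        pool_kwargs["source_address"] = self.source_address
        super().init_poolmanager(connections, maxsize, block=block, **pool_kwargs)

session = requests.Session()
session.mount("http://", SourceAddressAdapter("127.0.0.1"))   # put IP A or IP B here
session.mount("https://", SourceAddressAdapter("127.0.0.1"))
```

Every request made through this session then originates from the bound address, so switching between IP A and IP B is just a matter of mounting a different adapter.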

Python urllib2 HTML parsing problem

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-11 07:39:37
Question: I am using mechanize to parse the HTML of a website, but with this website I get a strange result. from mechanize import Browser br = Browser() r = br.open("http://www.heavenplaza.com") result = r.read() The result is something I cannot understand; you can see it here: http://paste2.org/p/1556077 Does anyone have a method to get that website's HTML, with mechanize or urllib? Thanks. Answer 1: import urllib2, StringIO, gzip f = urllib2.urlopen("http://www.heavenplaza.com") data = StringIO.StringIO(f.read())
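The "something I cannot understand" is almost certainly a gzip-compressed body: some servers send gzip even to clients that did not advertise Accept-Encoding. The answer's StringIO+gzip approach, rewritten as a Python 3 sketch (function name ours):

```python
import gzip
import urllib.request

def fetch_text(url):
    """Fetch a page and transparently decompress gzip-encoded bodies,
    which otherwise read as binary garbage."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            data = gzip.decompress(data)
    return data
```

In Python 3, gzip.decompress() replaces the StringIO.StringIO + gzip.GzipFile dance from the Python 2 answer.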

How to check a redirected web page's address without downloading it in Python

梦想与她 submitted on 2019-12-11 06:49:21
Question: For a given URL, how can I detect the final Internet location after HTTP redirects, without downloading the final page (e.g. with a HEAD request), using Python? I am trying to write a mass downloader, and my downloading mechanism needs to know a page's Internet location before downloading it. Edit: I ended up doing this; I hope it helps other people. I am still open to other methods. import urlparse import httplib def getFinalUrl(url): "Navigates Through redirections to get final url." parsed = urlparse
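The getFinalUrl() excerpt is cut off, but the approach — follow the redirect chain manually with HEAD requests so no body is ever downloaded — can be sketched in Python 3 like this (a rewrite of the idea, not the original code; note that urllib.request's automatic redirect handling is avoided on purpose, since it may re-issue redirected requests as GET):

```python
import http.client
import urllib.parse

def final_url(url, max_hops=10):
    """Follow HTTP redirects with HEAD requests only, returning the
    final URL without downloading any page body."""
    for _ in range(max_hops):
        parts = urllib.parse.urlsplit(url)
        conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                    else http.client.HTTPConnection)
        conn = conn_cls(parts.netloc)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        conn.close()
        if resp.status in (301, 302, 303, 307, 308) and location:
            url = urllib.parse.urljoin(url, location)  # relative Location allowed
        else:
            return url
    raise RuntimeError("too many redirects")
```

The max_hops cap guards against redirect loops, which a mass downloader will eventually hit in the wild.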