urllib2

python urllib2: connection reset by peer

强颜欢笑 posted on 2019-12-10 12:43:45
Question: I have a Perl program that retrieves data from my university library's database, and it works well. Now I want to rewrite it in Python, but I encounter the error <urlopen error [errno 104] connection reset by peer>. The Perl code is: my $ua = LWP::UserAgent->new; $ua->cookie_jar( HTTP::Cookies->new() ); $ua->timeout(30); $ua->env_proxy; my $response = $ua->get($url); The Python code I wrote is: cj = CookieJar(); request = urllib2.Request(url); # url: target web page opener = urllib2.build
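A minimal sketch of a urllib2 opener that mirrors the Perl setup (cookie jar, 30-second timeout, environment proxies). The url value and the browser-style User-Agent are placeholder assumptions; servers sometimes reset connections when they see urllib2's default User-Agent, so sending a browser-like one is a common first fix:

```python
import urllib2
from cookielib import CookieJar

url = 'http://library.example.edu/search'  # placeholder for the real URL

cj = CookieJar()
# build_opener already includes a ProxyHandler that honours environment
# proxy settings, matching Perl's $ua->env_proxy
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
request = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0'  # some servers reset Python's default agent
})
response = opener.open(request, timeout=30)  # matches $ua->timeout(30)
print response.read()
```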

“post” method to communicate directly with a server

别来无恙 posted on 2019-12-10 12:19:31
Question: I started with Python not long ago, and I'm learning to use the "post" method to communicate directly with a server. A fun script I'm working on right now posts comments on WordPress. The script does post comments on my local site, but I don't know why it raises HTTP Error 404, which means the page was not found. Here's my code; please help me find what's wrong: import urllib2 import urllib url='http://localhost/wp-comments-post.php' user_agent='Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
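A hedged sketch of such a POST, under the assumption that the 404 comes from WordPress rejecting a request with missing form fields or no Referer. The field names follow the standard WordPress comment form, and the post ID and comment text are placeholders:

```python
import urllib
import urllib2

url = 'http://localhost/wp-comments-post.php'  # local WordPress install
values = {
    'author': 'Test User',
    'email': 'test@example.com',
    'url': '',
    'comment': 'Hello from urllib2!',
    'comment_post_ID': '1',  # ID of the post being commented on
    'comment_parent': '0',
}
headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Referer': 'http://localhost/?p=1',  # some setups reject empty referers
}
data = urllib.urlencode(values)
request = urllib2.Request(url, data, headers)
response = urllib2.urlopen(request)
print response.getcode()
```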

urllib2 won't use my proxy

ぃ、小莉子 posted on 2019-12-10 11:28:51
Question: I'm trying to open a URL with urllib2 using an opener I built with an HTTPS proxy; however, it makes the request from my normal IP, not through the proxy I gave it. import urllib2 proxy = urllib2.ProxyHandler({'https': 'IP:PORT'}) opener = urllib2.build_opener(proxy) my_ip = opener.open('http://whatthehellismyip.com/?ipraw').read() print my_ip Can anyone please tell me what I am doing wrong here? Answer 1: You forgot to install the opener. This should work: import urllib2 proxy = urllib2.ProxyHandler({'https
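A sketch of the corrected version. Beyond installing the opener, note that a ProxyHandler entry only applies to URLs of the matching scheme: the question registers only an 'https' proxy but then fetches an http:// URL, so the proxy is never consulted. 'IP:PORT' is kept as a placeholder:

```python
import urllib2

# A proxy entry is used only for URLs of the matching scheme, so cover
# both http and https. 'IP:PORT' stands in for the real proxy address.
proxy = urllib2.ProxyHandler({'http': 'IP:PORT', 'https': 'IP:PORT'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)  # make it the default for urlopen()

my_ip = urllib2.urlopen('http://whatthehellismyip.com/?ipraw').read()
print my_ip
```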

Python SOAP request using urllib2

别等时光非礼了梦想. posted on 2019-12-10 10:56:33
Question: I'm attempting to write a script to communicate with SharePoint via SOAP using urllib2 in Python. My code connects successfully to a SharePoint list, but does not do anything once connected. Could my SOAP request be wrong? It seems to return nothing, despite 2 list items existing on the SharePoint site. import urllib2 from ntlm import HTTPNtlmAuthHandler user = r'DOMAIN\myusername' password = "password" url = "https://mysecuresite.com/site/_vti_bin/Lists.asmx" passman = urllib2
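A hedged sketch of the usual shape of such a request with python-ntlm: NTLM authentication plus a GetListItems call. The envelope below is a minimal assumed example ('MyList' is a placeholder; the site URL and credentials come from the question), and Lists.asmx generally needs the SOAPAction header set or it returns nothing useful:

```python
import urllib2
from ntlm import HTTPNtlmAuthHandler

user = r'DOMAIN\myusername'
password = 'password'
url = 'https://mysecuresite.com/site/_vti_bin/Lists.asmx'

# NTLM handshake via python-ntlm
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
auth = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
opener = urllib2.build_opener(auth)
urllib2.install_opener(opener)

# Minimal assumed GetListItems envelope; 'MyList' is a placeholder.
envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>MyList</listName>
    </GetListItems>
  </soap:Body>
</soap:Envelope>"""

request = urllib2.Request(url, envelope, {
    'Content-Type': 'text/xml; charset=utf-8',
    'SOAPAction': 'http://schemas.microsoft.com/sharepoint/soap/GetListItems',
})
print urllib2.urlopen(request).read()
```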

getting value of location header using python urllib2

岁酱吖の posted on 2019-12-10 10:49:35
Question: When I use urllib2 and list the headers, I cannot see the 'Location' header. In [19]: p = urllib2.urlopen('http://www.example.com') In [21]: p.headers.items() Out[21]: [('transfer-encoding', 'chunked'), ('vary', 'Accept-Encoding'), ('server', 'Apache/2.2.3 (CentOS)'), ('last-modified', 'Wed, 09 Feb 2011 17:13:15 GMT'), ('connection', 'close'), ('date', 'Fri, 25 May 2012 03:00:02 GMT'), ('content-type', 'text/html; charset=UTF-8')] If I use telnet and GET: telnet www.example.com 80 Trying 192.0
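urllib2 follows 3xx redirects automatically, so by the time urlopen returns, the Location header belonged to an intermediate response that has already been consumed. A sketch of a common workaround (not part of the quoted question): a redirect handler that returns the redirect response itself, keeping the header visible:

```python
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    # Return the redirect response itself instead of following it,
    # so the Location header stays visible to the caller.
    def http_error_302(self, req, fp, code, msg, headers):
        return fp
    http_error_301 = http_error_303 = http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
response = opener.open('http://www.example.com')
# None if the server did not actually redirect
print response.headers.get('Location')
```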

Why urllib2.urlopen can not open pages like “http://localhost/new-post#comment-29”?

人盡茶涼 posted on 2019-12-10 10:29:32
Question: I'm curious: how come I get a 404 error running this line: urllib2.urlopen("http://localhost/new-post#comment-29") while everything works fine surfing http://localhost/new-post#comment-29 in any browser... Does the urlopen method not parse URLs with "#" in them? Does anybody know? Answer 1: In the HTTP protocol, the fragment (from # onwards) is not sent to the server across the network: it's locally retained by the browser and used, once the server's response is fully received, to somehow "visually locate"
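Since the fragment is purely client-side, a sketch of stripping it before the request with urlparse.urldefrag (the URL is the one from the question):

```python
import urllib2
import urlparse

url = 'http://localhost/new-post#comment-29'
# The fragment is never meant for the server; strip it before requesting.
defragmented, fragment = urlparse.urldefrag(url)
response = urllib2.urlopen(defragmented)  # fetches http://localhost/new-post
print fragment  # 'comment-29', kept for any local use
```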

Using urllib2 via proxy

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-10 10:29:28
Question: I am trying to use urllib2 through a proxy; however, after trying just about every variation of passing my verification details to urllib2, I either get a request that hangs forever and returns nothing, or I get 407 errors. I can connect to the web fine using my browser, which connects to a proxy auto-config (PAC) file and redirects accordingly; however, I can't seem to do anything via the command line with curl, wget, urllib2, etc., even if I use the proxies that the PAC file redirects to. I tried setting my
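A 407 means the proxy itself is demanding authentication. A minimal sketch that embeds credentials in the proxy URL; every value below is a placeholder for the asker's real proxy host, port, and account:

```python
import urllib2

# All values are placeholders: substitute the proxy the PAC file points to.
proxy = urllib2.ProxyHandler({
    'http': 'http://username:password@proxy.example.com:8080',
    'https': 'http://username:password@proxy.example.com:8080',
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

print urllib2.urlopen('http://www.example.com').read()[:200]
```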

How to send Multipart/related requests in Python to SOAP server?

笑着哭i posted on 2019-12-09 19:46:56
Question: I have to send a file to a SOAP server via a multipart/related HTTP POST. I have built the message from scratch like this: from email.mime.application import MIMEApplication from email.encoders import encode_7or8bit from email.mime.multipart import MIMEMultipart from email.mime.base import MIMEBase envelope = """<?xml version="1.0" encoding="UTF-8"?> <SOAP-ENV:Envelope xmlns:SOAP-ENV="http://www.w3.org/2003/05/soap-envelope" xmlns:SOAP-ENC="http://www.w3.org/2003/05/soap-encoding" xmlns:xsi=
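A sketch of one way to finish this approach: assemble the multipart/related body with the email package, then hand urllib2 the body together with a manually built Content-Type carrying the generated boundary (as_string() may fold long headers, so the header is rebuilt rather than copied). The service URL, attachment file, and envelope stub are assumptions:

```python
import urllib2
from email.encoders import encode_7or8bit
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart

envelope = '<?xml version="1.0" encoding="UTF-8"?><SOAP-ENV:Envelope/>'  # stub
file_data = open('attachment.bin', 'rb').read()  # placeholder attachment

related = MIMEMultipart('related')
related.attach(MIMEApplication(envelope, 'soap+xml', encode_7or8bit))
related.attach(MIMEApplication(file_data, 'octet-stream'))

# The body is everything after the first blank line of the generated message.
body = related.as_string().split('\n\n', 1)[1]
headers = {
    'Content-Type': 'multipart/related; boundary="%s"' % related.get_boundary(),
}
request = urllib2.Request('https://soap.example.com/service', body, headers)
response = urllib2.urlopen(request)
print response.getcode()
```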

Why am I getting “'ResultSet' has no attribute 'findAll'” using BeautifulSoup in Python?

隐身守侯 posted on 2019-12-09 16:17:27
Question: So I am learning Python slowly, and am trying to make a simple function that will pull data from the high-scores page of an online game. This is someone else's code that I rewrote into one function (which might be the problem), but I am getting this error. Here is the code: >>> from urllib2 import urlopen >>> from BeautifulSoup import BeautifulSoup >>> def create(el): source = urlopen(el).read() soup = BeautifulSoup(source) get_table = soup.find('table', {'id':'mini_player'}) get_rows = get
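The error named in the title appears when findAll is called on a ResultSet (the list-like return value of findAll) rather than on a single Tag. A sketch of the function with that distinction made explicit; the table id comes from the question, the row handling is an assumption:

```python
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup

def create(el):
    source = urlopen(el).read()
    soup = BeautifulSoup(source)
    # find() returns a single Tag, which supports findAll();
    # findAll() returns a ResultSet, which does not.
    table = soup.find('table', {'id': 'mini_player'})
    rows = table.findAll('tr')
    for row in rows:
        print row
```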

Python - Easy way to scrape Google, download top N hits (entire .html documents) for given search?

|▌冷眼眸甩不掉的悲伤 posted on 2019-12-09 15:14:28
Question: Is there an easy way to scrape Google and save the text (just the text) of the top N (say, 1000) .html (or whatever) documents for a given search? As an example, imagine searching for the phrase "big bad wolf" and downloading just the text from the top 1000 hits -- i.e., actually downloading the text from those 1000 web pages (but just those pages, not the entire site). I'm assuming this would use the urllib2 library? I use Python 3.1 if that helps. Answer 1: The official way to get results from
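Once a list of result URLs is in hand (e.g., from the official search API the truncated answer refers to), downloading each page is straightforward. A Python 3 sketch, since the asker is on 3.1 where urllib2 is replaced by urllib.request; the urls list is a placeholder:

```python
import urllib.request

urls = ['http://example.com/a', 'http://example.com/b']  # placeholder result URLs

for i, url in enumerate(urls):
    try:
        response = urllib.request.urlopen(url, timeout=30)
        html = response.read()
        response.close()
        with open('hit_%04d.html' % i, 'wb') as f:
            f.write(html)
    except Exception as exc:
        print('failed to fetch %s: %s' % (url, exc))
```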