urllib2

SOCKS5 proxy using urllib2 and PySocks

Posted by 女生的网名这么多〃 on 2019-12-08 12:30:49
Question: I'm trying to connect to a SOCKS5 proxy using urllib2 and PySocks. My proxy has a username and password, and I use the code below, but I always get a socks.SOCKS5Error: 0x02: Connection not allowed by ruleset message when I try to connect. Would anyone know what I'm doing wrong?

import socket
import socks
import urllib2

socks.set_default_proxy(socks.SOCKS5, "xx.xx.xx", 8080, 'username', 'pass')
socket.socket = socks.socksocket
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64
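Error 0x02 means the proxy itself rejected the request after the handshake, often because the ruleset expects the proxy (not the client) to resolve hostnames. With the requests library this is the difference between the socks5 and socks5h schemes. A minimal sketch of building such a proxy URL, assuming requests with the SOCKS extra is available (host and credentials are placeholders from the question):

```python
def socks_proxy_url(host, port, user=None, password=None, remote_dns=True):
    """Build a requests-style SOCKS proxy URL. 'socks5h' asks the proxy
    to resolve hostnames remotely, which some rulesets require."""
    scheme = "socks5h" if remote_dns else "socks5"
    auth = "{}:{}@".format(user, password) if user else ""
    return "{}://{}{}:{}".format(scheme, auth, host, port)

proxy = socks_proxy_url("xx.xx.xx", 8080, "username", "pass")
proxies = {"http": proxy, "https": proxy}
# requests.get(url, proxies=proxies)  # requires the requests[socks] extra
```

If urllib2 must be kept, the equivalent knob in PySocks is the rdns argument to set_default_proxy.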

Big requests issue: GET doesn't release/reset TCP connections, loop crashes

Posted by 与世无争的帅哥 on 2019-12-08 09:18:33
I'm using Python 3.3 and the requests module to scrape links from an arbitrary webpage. My program works as follows: I have a list of URLs which, at the beginning, contains just the starting URL. The program loops over that list and passes the URLs to a procedure GetLinks, where I use requests.get and BeautifulSoup to extract all links. Before that procedure appends links to my URL list, it passes them to another procedure, testLinks, to see whether each one is an internal, external, or broken link. In testLinks I use requests.get as well, to be able to handle redirects etc. The program worked really well
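Part of the load on the connection pool can be avoided outright: the internal/external decision in testLinks needs no network round-trip at all, only URL parsing, so a requests.get (ideally through a shared requests.Session, closed with a with block so sockets are released) is only needed for the broken-link check. A sketch of that host comparison (function names are illustrative, not from the original program):

```python
from urllib.parse import urljoin, urlparse

def classify_link(base_url, link):
    """'internal' if the link resolves to the same host as base_url,
    'external' otherwise; relative links resolve against base_url."""
    resolved = urljoin(base_url, link)
    if urlparse(resolved).netloc == urlparse(base_url).netloc:
        return "internal"
    return "external"
```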

503 error when trying to access Google Patents using python

Posted by 柔情痞子 on 2019-12-08 08:17:56
Question: Earlier today I was able to pull data from Google Patents using the code below:

import urllib2

url = 'http://www.google.com/search?tbo=p&q=ininventor:"John-Mudd"&hl=en&tbm=pts&source=lnt&tbs=ptso:us'
req = urllib2.Request(url, headers={'User-Agent' : "foobar"})
response = urllib2.urlopen(req)

Now when I run it I get the following 503 error. I had only looped through this code maybe 30 times (I'm trying to get all the patents owned by a list of 30 people). HTTPError Traceback (most
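A 503 after roughly 30 rapid requests is almost certainly Google rate-limiting the client rather than a bug in the code, so the usual remedy is to slow down and retry with increasing delays. A sketch of an exponential backoff schedule (the base and cap values are arbitrary choices, not anything Google documents):

```python
def backoff_delays(retries, base=1.0, cap=60.0):
    """Delays in seconds for successive retries: base, 2*base, 4*base, ...
    capped so a long run never sleeps more than `cap` seconds."""
    return [min(cap, base * (2 ** i)) for i in range(retries)]

# for delay in backoff_delays(5):
#     attempt the request; on HTTPError 503, time.sleep(delay) and retry
```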

Website form login using Python urllib2

Posted by 纵然是瞬间 on 2019-12-08 08:03:39
Question: I've been trying to learn to use the urllib2 package in Python. I tried to log in as a student (the left form) on a signup page for maths students: http://reg.maths.lth.se/. I have inspected the code (using Firebug), and the left form should obviously be called using POST with a key called pnr whose value should be a string 10 characters long (the last part perhaps cannot be seen from the HTML code, but it is basically my social security number, so I know how long it should be). Note that
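In urllib2 (urllib.request in Python 3) a form POST is just a Request whose data is the urlencoded field set; supplying data is what switches the method from GET to POST. A sketch against the form described above, shown with the Python 3 names (the pnr value is a placeholder, not a real number):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_login_request(url, pnr):
    """Submit the 10-character pnr field the way a browser posts the form."""
    body = urlencode({"pnr": pnr}).encode("ascii")
    return Request(url, data=body)  # non-None data makes this a POST

req = build_login_request("http://reg.maths.lth.se/", "0123456789")
# urllib.request.urlopen(req) would actually submit the form
```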

Trying to split the file download buffer into separate threads

Posted by 大城市里の小女人 on 2019-12-08 07:42:45
Question: I am trying to download the buffer of a file in 5 threads, but it seems like it's getting garbled.

from numpy import arange
import requests
from threading import Thread
import urllib2

url = 'http://pymotw.com/2/urllib/index.html'
sizeInBytes = r = requests.head(url, headers={'Accept-Encoding': 'identity'}).headers['content-length']
splitBy = 5
splits = arange(splitBy + 1) * (float(sizeInBytes)/splitBy)
dataLst = []

def bufferSplit(url, idx, splits):
    req = urllib2.Request(url, headers={'Range':
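Garbled output from ranged downloads usually comes from one of two things: non-integer or overlapping byte boundaries (arange produces floats here), or chunks appended in thread-completion order instead of index order. Integer, non-overlapping Range headers can be computed directly; a sketch (sizes are illustrative):

```python
def byte_ranges(size, parts):
    """Non-overlapping 'bytes=start-end' ranges covering exactly `size` bytes."""
    step = size // parts
    ranges = []
    for i in range(parts):
        start = i * step
        # the last part absorbs any remainder so no byte is dropped
        end = size - 1 if i == parts - 1 else (i + 1) * step - 1
        ranges.append("bytes={}-{}".format(start, end))
    return ranges
```

Each thread should then write its chunk into a slot keyed by its index, and the slots be joined in index order only after every thread has finished.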

Create new TCP Connections for every HTTP request in python

Posted by 本秂侑毒 on 2019-12-08 06:01:20
Question: For my college project I am trying to develop a Python-based traffic generator. I have created 2 CentOS machines on VMware, and I am using one as my client and one as my server machine. I have used the IP aliasing technique to increase the number of clients and servers using just a single client/server machine. Up to now I have created 50 IP aliases on my client machine and 10 IP aliases on my server machine. I am also using the multiprocessing module to generate traffic concurrently from all 50 clients to all 10
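For a traffic generator the goal is usually one TCP handshake per request, which means defeating keep-alive. In the stdlib, each http.client.HTTPConnection object opens its own fresh connection, and its source_address parameter lets every request bind to one of the aliased client IPs. A sketch, demonstrated against a throwaway local server so it is self-contained (the source_ip argument is where one of the 50 client aliases would go):

```python
import http.client
import http.server
import threading

def fetch_once(host, port, path="/", source_ip=None):
    """One GET over a brand-new TCP connection, closed afterwards."""
    source = (source_ip, 0) if source_ip else None
    conn = http.client.HTTPConnection(host, port, source_address=source)
    try:
        conn.request("GET", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()  # guarantees the socket is never reused

# Demo against a local throwaway server (port 0 = pick a free port).
server = http.server.HTTPServer(("127.0.0.1", 0),
                                http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, _ = fetch_once("127.0.0.1", server.server_address[1])
server.shutdown()
```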

Download an internet resource in Python and save it on my desired location

Posted by 只愿长相守 on 2019-12-08 05:59:32
Question: I am new to Python and I am using urllib2 to download files over the internet. I am using this code:

import urllib2
response = urllib2.urlopen('http://www.example.com/myfile.zip')
...

This code actually saves the zip file in my temp folder. I don't want it to be like that; I want to save it to my desired location. Is that possible?

Answer 1: You can use the urllib.urlretrieve function to download the remote file to your local filesystem.

>>> import urllib
>>> urllib.urlretrieve('http://www.example

Extracting source code from html file using python3.1 urllib.request

Posted by 落爺英雄遲暮 on 2019-12-08 05:41:57
Question: I'm trying to obtain data using regular expressions from an HTML file by implementing the following code:

import re
import urllib.request

def extract_words(wdict, urlname):
    uf = urllib.request.urlopen(urlname)
    text = uf.read()
    print(text)
    match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)

which returns an error:

File "extract.py", line 33, in extract_words
    match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
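The error comes from Python 3's bytes/str split: urlopen().read() returns bytes, and re.findall with a str pattern refuses to search a bytes object. Decoding first (or using a bytes pattern) fixes it. A sketch with a deliberately simplified pattern, standing in for the original character classes:

```python
import re

def extract_rows(html_bytes):
    """Decode fetched bytes, then search with an ordinary str pattern."""
    text = html_bytes.decode("utf-8", errors="replace")
    return re.findall(r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>", text)
```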

urllib2.HTTPError Python

Posted by 坚强是说给别人听的谎言 on 2019-12-08 05:30:05
Question: I have a file with GI numbers and would like to get FASTA sequences from NCBI.

from Bio import Entrez
import time

Entrez.email = "eigtw59tyjrt403@gmail.com"
f = open("C:\\bioinformatics\\gilist.txt")
for line in iter(f):
    handle = Entrez.efetch(db="nucleotide", id=line, retmode="xml")
    records = Entrez.read(handle)
    print ">GI "+line.rstrip()+" "+records[0]["GBSeq_primary-accession"]+" "+records[0]["GBSeq_definition"]+"\n"+records[0]["GBSeq_sequence"]
    time.sleep(1) # to make sure not many
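Entrez.efetch can raise HTTPError transiently, since NCBI throttles clients and occasionally returns server errors, so the usual fix alongside time.sleep is to retry each fetch a few times before giving up. A generic sketch with the Python 3 exception name (retry count and delay are arbitrary; the flaky callable below only simulates a failing endpoint):

```python
import time
from urllib.error import HTTPError  # urllib2.HTTPError in Python 2

def fetch_with_retry(fetch, retries=3, delay=0.0):
    """Call fetch(); on HTTPError, retry up to `retries` attempts."""
    for attempt in range(retries):
        try:
            return fetch()
        except HTTPError:
            if attempt == retries - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)

# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HTTPError("http://example.org", 502, "Bad Gateway", None, None)
    return "ok"

result = fetch_with_retry(flaky, retries=5)
```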

How do I translate Python urllib.request code to Java code

Posted by 三世轮回 on 2019-12-08 05:17:30
Question: This is the Python code:

import urllib.request as urllib2
import json

data = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["id", "regex"],
            "Values": [["0", "the regex value"],]
        },
    },
    "GlobalParameters": {
        "Database query": "select * from expone",
    }
}

body = str.encode(json.dumps(data))
url = 'https://ussouthcentral.services.azureml.net/workspaces/4729545551a741e1a2e606d37' \
      'ae61ce0/services/ac7c34ad134d43ca9fdc65e292ce35d3/execute?api-version=2.0&details=true'
api_key = '8ku5P6fR3F8ykgMHK5Y8
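The Java translation maps almost line for line onto HttpURLConnection: json.dumps becomes a JSON serializer (e.g. Gson), each header becomes a setRequestProperty call, and setDoOutput(true) plus writing the body gives the POST that urllib2 performs when data is supplied. The full Azure ML sample typically also sends an Authorization: Bearer header built from api_key; that is an assumption here, since the snippet is cut off before the headers. The pieces the Java side needs, collected in one place:

```python
import json

def build_request_parts(api_key, data):
    """Method, headers, and UTF-8 body bytes for an HttpURLConnection port.
    (The Bearer header is assumed from the usual Azure ML sample.)"""
    body = json.dumps(data).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return "POST", headers, body
```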