urllib

replace special characters in a string python

蓝咒 submitted on 2019-11-27 01:09:31
Question: I am using urllib to get a string of HTML from a website and need to put each word in the HTML document into a list. Here is the code I have so far; I keep getting an error, which I have also copied below.

import urllib.request
url = input("Please enter a URL: ")
z = urllib.request.urlopen(url)
z = str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
words = removeSpecialChars.split()
print("Words list: ", words[0:20])

Here is the error. Please enter a URL:
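The error comes from calling str.replace on the str class itself with only two arguments, rather than on the downloaded string with (old, new). A minimal corrected sketch, assuming the goal is just to strip the listed punctuation before splitting, with example.com standing in for any URL:

import urllib.request

url = "http://example.com"  # placeholder; the original prompts with input()
z = urllib.request.urlopen(url)
z = z.read().decode("utf-8", errors="replace")

# Replace each special character with a space, one character at a time,
# then split on whitespace to get the individual words.
for ch in "!@#$%^&*()[]{};:,./<>?\\|`~-=_+":
    z = z.replace(ch, " ")
words = z.split()
print("Words list: ", words[0:20])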

Making a POST call instead of GET using urllib2

爱⌒轻易说出口 submitted on 2019-11-27 00:55:28
There's a lot of material out there on urllib2 and POST calls, but I'm stuck on a problem. I'm trying to do a simple POST call to a service:

url = 'http://myserver/post_service'
data = urllib.urlencode({'name' : 'joe', 'age' : '10'})
content = urllib2.urlopen(url=url, data=data).read()
print content

I can see in the server logs that I'm making GET calls, even though I'm passing the data argument to urlopen. The library raises a 404 error (not found), which is correct for a GET call; POST calls are handled fine (I also tried a POST from within an HTML form). Gregg: This may have been
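For reference, urlopen issues a POST whenever a non-None data argument is supplied, so a server-side redirect (which urllib follows with a GET) is a common culprit here. A minimal Python 3 sketch of the same call, using the placeholder URL from the question (urllib2's functionality lives in urllib.request in Python 3):

import urllib.parse
import urllib.request

url = 'http://myserver/post_service'  # placeholder from the question
# data must be bytes in Python 3; a non-None data argument makes this a POST
data = urllib.parse.urlencode({'name': 'joe', 'age': '10'}).encode('ascii')
content = urllib.request.urlopen(url, data=data).read()
print(content)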

How to unquote a urlencoded unicode string in python?

你说的曾经没有我的故事 submitted on 2019-11-27 00:19:02
I have a unicode string like "Tanım" which is encoded as "Tan%u0131m" somehow. How can I convert this encoded string back to the original unicode? Apparently urllib.unquote does not support unicode.

%uXXXX is a non-standard encoding scheme that has been rejected by the W3C, despite the fact that an implementation continues to live on in JavaScript land. The more common technique is to UTF-8 encode the string and then %-escape the resulting bytes using %XX. This scheme is supported by urllib.unquote:

>>> urllib2.unquote("%0a")
'\n'

Unfortunately, if you really need to support %uXXXX, you
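A small sketch of one way to handle the non-standard %uXXXX escapes, converting them to characters before normal unquoting (written with Python 3 names; the original answer is Python 2):

import re
import urllib.parse

def unquote_u(source):
    # Decode non-standard %uXXXX escapes into the corresponding characters,
    # then let unquote handle any ordinary %XX escapes that remain.
    decoded = re.sub(r"%u([0-9a-fA-F]{4})",
                     lambda m: chr(int(m.group(1), 16)),
                     source)
    return urllib.parse.unquote(decoded)

print(unquote_u("Tan%u0131m"))  # -> Tanım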

Overriding urllib2.HTTPError or urllib.error.HTTPError and reading response HTML anyway

余生长醉 submitted on 2019-11-27 00:11:44
Question: I receive an 'HTTP Error 500: Internal Server Error' response, but I still want to read the data inside the error HTML. With Python 2.6, I normally fetch a page using:

import urllib2
url = "http://google.com"
data = urllib2.urlopen(url)
data = data.read()

When attempting to use this on the failing URL, I get the exception urllib2.HTTPError:

urllib2.HTTPError: HTTP Error 500: Internal Server Error

How can I fetch such error pages (with or without urllib2), all while they are returning
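An HTTPError is itself a file-like response object, so the error page's body can still be read from the exception. A minimal sketch in Python 3 terms (urllib2 became urllib.request/urllib.error), with a placeholder URL:

import urllib.error
import urllib.request

url = "http://example.com/failing-page"  # placeholder URL
try:
    body = urllib.request.urlopen(url).read()
except urllib.error.HTTPError as e:
    # HTTPError doubles as a response object: the status is in e.code
    # and the error page's HTML can still be read from it.
    print(e.code)
    body = e.read()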

AttributeError: 'module' object has no attribute 'urlretrieve'

廉价感情. submitted on 2019-11-27 00:02:46
Question: I am trying to write a program that will download mp3s off of a website and then join them together, but whenever I try to download the files I get this error:

Traceback (most recent call last):
  File "/home/tesla/PycharmProjects/OldSpice/Voicemail.py", line 214, in <module>
    main()
  File "/home/tesla/PycharmProjects/OldSpice/Voicemail.py", line 209, in main
    getMp3s()
  File "/home/tesla/PycharmProjects/OldSpice/Voicemail.py", line 134, in getMp3s
    raw_mp3.add = urllib.urlretrieve("http://www-scf.usc
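This AttributeError is the usual symptom of calling the Python 2 API on Python 3, where urlretrieve moved into urllib.request. A minimal sketch with a placeholder URL:

import urllib.request

# In Python 3, urlretrieve lives in urllib.request rather than on the
# top-level urllib module, which is why urllib.urlretrieve raises
# AttributeError there.
urllib.request.urlretrieve("http://example.com/file.mp3", "file.mp3")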

urllib.request.urlretrieve with proxy?

泪湿孤枕 submitted on 2019-11-26 23:09:22
Question: Somehow I can't download files through a proxy server, and I don't know what I have done wrong. I just get a timeout. Any advice?

import urllib.request
urllib.request.ProxyHandler({"http" : "myproxy:123"})
urllib.request.urlretrieve("http://myfile", "file.file")

Answer 1: You need to use your proxy object, not just instantiate it (you created an object, but didn't assign it to a variable and therefore can't use it). Try using this pattern:

#create the object, assign it to a variable
proxy = urllib
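Completing the pattern the answer starts: the handler has to be wired into an opener and the opener installed globally so that urlretrieve actually uses it. A sketch with the placeholder proxy address from the question:

import urllib.request

# Create the handler and assign it to a variable ("myproxy:123" is the
# placeholder address from the question).
proxy = urllib.request.ProxyHandler({"http": "http://myproxy:123"})
# Build an opener that routes requests through the proxy and install it
# globally, so module-level helpers like urlretrieve pick it up.
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
urllib.request.urlretrieve("http://myfile", "file.file")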

Web Scraping Basics

删除回忆录丶 submitted on 2019-11-26 22:42:23
Part 1: The urllib library. urllib is a library for web scraping that ships with Python; its main purpose is to let code simulate a browser sending requests. Its most commonly used submodules are urllib.request and urllib.parse in Python 3; in Python 2 they are urllib and urllib2.

Part 2: Scraper programs, from easy to hard:

1. Fetch all of the data from the Baidu homepage

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# import the packages
import urllib.request
import urllib.parse

if __name__ == "__main__":
    # the URL of the page to scrape
    url = 'http://www.baidu.com/'
    # send a request to the URL via urlopen; it returns a response object
    response = urllib.request.urlopen(url=url)
    # calling read() on the response returns the data sent back to the
    # client (the scraped data)
    data = response.read()  # the returned data is bytes, not str
    print(data)  # print the scraped data

# Supplementary note: urlopen signature: urllib.request.urlopen(url, data=None,
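The post names urllib.parse but the example above doesn't use it yet. A short sketch of where it typically comes in, URL-encoding query parameters before a GET (the search term here is just an illustration):

import urllib.parse
import urllib.request

# urlencode percent-escapes the parameters, including non-ASCII terms,
# so they can be appended safely to the URL.
params = urllib.parse.urlencode({"wd": "爬虫"})
url = "http://www.baidu.com/s?" + params
response = urllib.request.urlopen(url)
print(response.read()[:200])  # first bytes of the response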

Parallel fetching of files

戏子无情 submitted on 2019-11-26 22:37:43
Question: In order to download files, I'm creating a urlopen object (urllib2 class) and reading it in chunks. I would like to connect to the server several times and download the file in six different sessions; that way, the download speed should be faster. Many download managers have this feature. I thought about specifying the part of the file I would like to download in each session and somehow processing all the sessions at the same time, but I'm not sure how I can achieve this.

Answer 1: Sounds like you want
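One common way to do this is with HTTP Range requests, one per session, joined at the end. A rough sketch, assuming the server supports partial content (status 206) and with a placeholder URL:

import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://example.com/big.file"  # placeholder URL
SESSIONS = 6

def total_size():
    # HEAD request just to read the Content-Length header.
    req = urllib.request.Request(URL, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

def fetch_range(start, end):
    # Request one slice of the file; the server must honour Range headers.
    req = urllib.request.Request(URL, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def parallel_download():
    size = total_size()
    chunk = size // SESSIONS
    ranges = [(i * chunk, size - 1 if i == SESSIONS - 1 else (i + 1) * chunk - 1)
              for i in range(SESSIONS)]
    with ThreadPoolExecutor(max_workers=SESSIONS) as pool:
        parts = pool.map(lambda r: fetch_range(*r), ranges)
    return b"".join(parts)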

How to route urllib requests through the TOR network? [duplicate]

孤者浪人 submitted on 2019-11-26 22:36:43
Question: This question already has answers here: How to make urllib2 requests through Tor in Python? (12 answers). Closed 3 years ago. How can I route urllib requests through the TOR network?

Answer 1: This works for me (using urllib2, haven't tried urllib):

import urllib2

def req(url):
    proxy_support = urllib2.ProxyHandler({"http" : "127.0.0.1:8118"})
    opener = urllib2.build_opener(proxy_support)
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    return opener.open(url).read()

print req('http://google.com')

Answer 2: Tor works
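Port 8118 is the conventional Privoxy HTTP-proxy front end for Tor (Tor itself speaks SOCKS on 9050). A Python 3 rendering of the same idea, assuming such an HTTP proxy is running locally:

import urllib.request

def req(url):
    # Route traffic through a local HTTP proxy (e.g. Privoxy on 8118,
    # which forwards to Tor's SOCKS port).
    proxy_support = urllib.request.ProxyHandler({"http": "http://127.0.0.1:8118"})
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    return opener.open(url).read()

print(req("http://google.com"))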

Python 'requests' library - define specific DNS?

泄露秘密 submitted on 2019-11-26 22:14:09
Question: In my project I'm handling all HTTP requests with the Python requests library. Now I need to query the HTTP server using a specific DNS - there are two environments, each using its own DNS, and changes are made to them independently. So when the code is running, it should use the DNS specific to the environment, not the DNS specified by my internet connection. Has anyone tried this using python-requests? I've only found a solution for urllib2: https://stackoverflow.com/questions/4623090/python-set-custom
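One technique that works with requests (which resolves names through the standard library) is to intercept socket.getaddrinfo and substitute the address you want for specific hosts. This pins hostnames to addresses rather than pointing at an alternate DNS server, but it often suffices for per-environment switching. A sketch with hypothetical host and IP values:

import socket
import requests

# Hypothetical per-environment overrides: hostname -> IP it should resolve to.
DNS_OVERRIDES = {"api.myservice.example": "10.0.0.42"}

_orig_getaddrinfo = socket.getaddrinfo

def patched_getaddrinfo(host, *args, **kwargs):
    # Redirect lookups for overridden hosts; everything else resolves normally.
    return _orig_getaddrinfo(DNS_OVERRIDES.get(host, host), *args, **kwargs)

socket.getaddrinfo = patched_getaddrinfo

print(requests.get("http://api.myservice.example/health").status_code)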