urllib2

Does the Python urllib2 library use the IE proxy settings by default on Windows?

眉间皱痕 submitted on 2019-12-13 07:26:21
Question: I noticed that the urllib2 library used my IE proxy settings. Is there any official explanation for this? Thanks a lot.

Answer 1: See the urllib2 documentation on ProxyHandler. The default is to read the list of proxies from the <protocol>_proxy environment variables. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry's Internet Settings section, and in a Mac OS X environment proxy information is retrieved from the OS X System Configuration Framework.
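As an illustration, a minimal sketch of overriding that default discovery in Python 2; passing an empty ProxyHandler disables proxy use entirely (the URL is a placeholder):

    import urllib2

    # An empty ProxyHandler overrides the environment-variable /
    # Windows-registry / OS X System Configuration proxy lookup.
    opener = urllib2.build_opener(urllib2.ProxyHandler({}))
    response = opener.open("http://example.com")
    print response.getcode()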

Python - YouTube: get the URL list of a video playlist

谁都会走 submitted on 2019-12-13 06:13:03
Question: I am working with Python 2.7. I want to create a txt file with the list of videos in a particular YouTube playlist: example list. I wrote (I'm totally new to Python):

    from bs4 import BeautifulSoup
    import urllib2
    import re

    url = 'https://www.youtube.com/playlist?list=PLYjSYQBFeM-zQeZFpWeZ_4tnhc3GQWNj8'
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    href_tags = soup.find_all(href=True)
    ff = open("C:/exp/file.txt", "w")

and then this worked:

    for i in href_tags:
        ff.write(str(i))
    ff.close()

But, …
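A sketch of narrowing that loop to just the playlist's video links; the '/watch?v=' substring test and the absolute-URL prefix are my assumptions, not from the original post:

    from bs4 import BeautifulSoup
    import urllib2

    url = 'https://www.youtube.com/playlist?list=PLYjSYQBFeM-zQeZFpWeZ_4tnhc3GQWNj8'
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    ff = open("C:/exp/file.txt", "w")
    for tag in soup.find_all(href=True):
        href = tag['href']
        if '/watch?v=' in href:               # keep only links to individual videos
            ff.write('https://www.youtube.com' + href + '\n')
    ff.close()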

URLError: <urlopen error [Errno 11004] getaddrinfo failed>

梦想的初衷 submitted on 2019-12-13 05:45:38
Question: I am writing code to check whether links are direct links, error pages, redirects, or file-download links. When I write these lines:

    import urllib2
    response = urllib2.urlopen('http://google.com')

I get the following error:

    Traceback (most recent call last):
      File "<pyshell#1>", line 1, in <module>
        response = urllib2.urlopen('http://google.com')
      File "C:\Python27\lib\urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\Python27\lib\urllib2.py", line 391, in open
    …
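Errno 11004 is a Windows DNS-resolution failure, usually a network or proxy problem rather than the code itself. A minimal sketch of guarding the call so the failure is reported instead of raised:

    import urllib2

    try:
        response = urllib2.urlopen('http://google.com', timeout=10)
    except urllib2.URLError as e:
        print 'Could not reach host:', e.reason   # getaddrinfo failures land here
    else:
        print 'Status:', response.getcode()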

How to send cookies inside a POST request

蓝咒 submitted on 2019-12-13 04:31:56
Question: I am trying to send a POST request using the cookies from a previous GET request on my PC:

    #!/usr/bin/python
    import re       # regex
    import urllib
    import urllib2

    # GET request
    x = urllib2.urlopen("http://www.example.com")
    cookies = x.headers['set-cookie']   # get the cookies from the GET request

    # to learn the values, type any password and inspect the cookies
    url = 'http://example'
    values = {"username": "admin", "passwd": password, "lang": "",
              "option": "com_login", "task": "login", "return": "aW5kZXgucGhw"}
    …
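A sketch of one way to finish this flow: urlencode the form values (supplying data makes urlopen issue a POST) and send the captured cookie back as a Cookie header. The password value here is a placeholder:

    import urllib
    import urllib2

    x = urllib2.urlopen("http://www.example.com")     # GET request
    cookies = x.headers['set-cookie']                 # cookies from the GET response

    url = 'http://example'
    values = {"username": "admin", "passwd": "secret", "lang": "",
              "option": "com_login", "task": "login", "return": "aW5kZXgucGhw"}
    data = urllib.urlencode(values)                   # providing data makes this a POST
    req = urllib2.Request(url, data, {"Cookie": cookies})
    response = urllib2.urlopen(req)
    print response.read()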

prettify() error using Python 2.7

。_饼干妹妹 submitted on 2019-12-13 03:43:19
Question: Code:

    import urllib2
    from bs4 import BeautifulSoup

    page1 = urllib2.urlopen("http://en.wikipedia.org/wiki/List_of_human_stampedes")
    soup = BeautifulSoup(page1)
    print(soup.prettify())

Error:

    Traceback (most recent call last):
      File "C:\Users\sony\Desktop\Trash\Crawler Try\try2.py", line 7, in <module>
        print(soup.prettify())
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 8775: ordinal not in range(128)
    [Finished in 2.4s with exit code 1]

I can't seem to get the error …
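The usual fix is to encode the Unicode string before printing, since Python 2's print falls back to ASCII when the console encoding can't represent a character; a sketch:

    import urllib2
    from bs4 import BeautifulSoup

    page1 = urllib2.urlopen("http://en.wikipedia.org/wiki/List_of_human_stampedes")
    soup = BeautifulSoup(page1)
    # prettify() returns unicode; encode it explicitly so print does not
    # attempt an implicit ASCII conversion.
    print(soup.prettify().encode('utf-8'))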

urllib2/pycurl in Django: Fetch XML, check HTTP status, check HTTPS connection

瘦欲@ submitted on 2019-12-13 01:26:43
Question: I need to make an API call (of sorts) in Django as part of the custom authentication system we require. A username and password are sent to a specific URL over SSL (using GET for those parameters), and the response should be an HTTP 200 "OK" response with a body containing XML with the user's info. On an unsuccessful auth, it will return an HTTP 401 "Unauthorized" response. For security reasons, I need to check:

- The request was sent over an HTTPS connection
- The server certificate's public …
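For the status-code half, a sketch with urllib2 (note that Python 2's urllib2 does not verify server certificates, which is exactly why the certificate checks would need pycurl or similar; the URL and parameter names are placeholders):

    import urllib
    import urllib2

    params = urllib.urlencode({'username': 'alice', 'password': 'secret'})
    url = 'https://auth.example.com/check?' + params   # credentials via GET over SSL

    try:
        response = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.code == 401:
            print 'Unauthorized'            # auth failed
        else:
            raise
    else:
        xml_body = response.read()          # HTTP 200: body is the user's XML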

HTTPError when using urllib2 read()

霸气de小男生 submitted on 2019-12-13 00:29:14
Question: I'm trying to scrape a web page using urllib2 and BeautifulSoup. It was working fine, but when I put an input() in a different part of my code to try to debug something, I got an HTTPError. When I tried running my program again, I got an HTTPError when calling read(). The error stack is below:

    [2013-07-17 16:47:07,415: ERROR/MainProcess] Task program.tasks.testTask[460db7cf-ff58-4a51-9c0f-749affc66abb] raised exception: IOError()
    16:47:07 celeryd.1 | Traceback (most recent call …
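A sketch of wrapping the call so the status code and error body are visible when this happens (the URL is a placeholder):

    import urllib2

    try:
        response = urllib2.urlopen('http://example.com/page')
        html = response.read()
    except urllib2.HTTPError as e:
        # HTTPError doubles as a response object: both the status code
        # and the error page body can be inspected.
        print 'Server returned', e.code
        print e.read()[:200]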

Why do I get messy characters when opening a URL using urllib2?

被刻印的时光 ゝ submitted on 2019-12-13 00:28:06
Question: Here's my code; you can test it out as well. I always get messed-up characters instead of the page source.

    Header = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
    Req = urllib2.Request("http://rlslog.net", None, Header)
    Response = urllib2.urlopen(Req)
    Html = Response.read()
    print Html[:1000]

Normally Html should be the page source, but it ended up being tons of messed-up characters. Does anybody know why? BTW: I'm …
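"Messed-up characters" from an otherwise working request usually means the body came back gzip-compressed. A sketch of detecting and decompressing that, reusing the Header dict from the question (the gzip diagnosis is a guess at the cause, not confirmed by the original post):

    import gzip
    import urllib2
    from StringIO import StringIO

    Req = urllib2.Request("http://rlslog.net", None, Header)
    Response = urllib2.urlopen(Req)
    Html = Response.read()
    if Response.info().get('Content-Encoding') == 'gzip':
        Html = gzip.GzipFile(fileobj=StringIO(Html)).read()
    print Html[:1000]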

Python crawler page-fetching basics: a pitfall when installing and using the urllib2 package on Python 3

拥有回忆 submitted on 2019-12-12 22:22:03
On Python 3.6.6 (or Python 3.x in general), after you fix the syntax errors caused by urllib2 not being found, you will get an error saying the urllib2 package is not installed. Installing it with pip install urllib2 reports that the package cannot be found, and pip3 install urllib2 fails the same way. This happens because builtwith depends on the urllib2 package, but Python 2's urllib2 was split in Python 3 into the urllib.request and urllib.error modules, so the package can neither be found nor installed. The fix is to use urllib.request and urllib.error instead: in the builtwith package, change import urllib2 to import urllib.request and import urllib.error. The function calls in the code also need updating; essentially, change urllib2.xxx to urllib.request.xxx.

Example

In Python 2:

    import urllib2
    req = urllib2.Request('xxxx')
    data = urllib2.urlopen(req).read()
    print(data)

In Python 3:

    import …
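The Python 3 example is cut off above; a sketch of the equivalent, keeping the same 'xxxx' placeholder URL:

    # Python 3: urllib2's functionality lives in urllib.request / urllib.error
    import urllib.request

    req = urllib.request.Request('xxxx')
    data = urllib.request.urlopen(req).read()
    print(data)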

Proxy seems to be ignored by Mechanize?

微笑、不失礼 submitted on 2019-12-12 19:20:20
Question: I am using an HTTP proxy and the Mechanize module. I initialize the mechanize object and set the proxy like so:

    self.br = mechanize.Browser()
    self.br.set_proxies({"http": proxyAddress})   # proxy address is like 1.1.1.1:8080

Then I open the site like so:

    response = self.br.open("http://google.com")

My problem is that mechanize seems to be completely ignoring the proxy. If I debug and inspect the br object, I can see my proxy settings under the proxy handler. However, even if I give a bad proxy …
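A sketch of verifying what mechanize actually does on the wire, using its debug switch and supplying the proxy for both schemes (a proxy keyed only under "http" is not consulted for https:// URLs):

    import mechanize

    br = mechanize.Browser()
    br.set_proxies({"http": "1.1.1.1:8080", "https": "1.1.1.1:8080"})
    br.set_debug_http(True)      # dump the raw HTTP traffic to stdout
    response = br.open("http://google.com")
    print response.geturl()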