urllib

Download pdf using urllib?

↘锁芯ラ submitted on 2019-11-26 16:08:15
Question: I am trying to download a PDF file from a website using urllib. This is what I have so far:

    import urllib

    def download_file(download_url):
        web_file = urllib.urlopen(download_url)
        local_file = open('some_file.pdf', 'w')
        local_file.write(web_file.read())
        web_file.close()
        local_file.close()

    if __name__ == 'main':
        download_file('http://www.example.com/some_file.pdf')

When I run this code, all I get is an empty PDF file. What am I doing wrong?

Answer 1: Here is an example that works: import urllib2 def
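A minimal Python 3 sketch of the same download, where urllib.request.urlopen replaces Python 2's urllib.urlopen. It opens the local file in binary mode ("wb"), because a PDF is binary data and the question's "w" mode is meant for text. Separately, the question's guard compares __name__ with 'main'; a script's __name__ is '__main__', so as written download_file() is never called. The function name mirrors the question; passing the local path as a parameter is an assumption of this sketch.

```python
import shutil
import urllib.request


def download_file(download_url, local_path):
    """Copy the resource at download_url into local_path byte for byte."""
    # urlopen() returns a binary file-like response; "wb" keeps the bytes unchanged.
    # The with statement closes both objects, even if an error occurs.
    with urllib.request.urlopen(download_url) as web_file, \
            open(local_path, "wb") as local_file:
        shutil.copyfileobj(web_file, local_file)


# Usage (requires network access):
# download_file("http://www.example.com/some_file.pdf", "some_file.pdf")
```

shutil.copyfileobj streams the data in chunks, so large files are not read into memory all at once.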

Web scraping: urllib

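泪湿孤枕 submitted on 2019-11-26 16:05:34
Built-in HTTP request library. Modules: urllib.request (request module); urllib.error (exception handling module); urllib.parse (URL parsing module); urllib.robotparser (robots.txt parsing module). Source: https://www.cnblogs.com/huay/p/11325639.html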
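The four modules listed above can be exercised offline in a few lines. This is an illustrative sketch: the robots.txt rules are a made-up example passed straight to RobotFileParser.parse() instead of being downloaded with set_url() and read(), and the example.com URLs are placeholders.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.robotparser

# urllib.parse: split a URL into components and build a query string
parts = urllib.parse.urlparse("https://www.cnblogs.com/huay/p/11325639.html")
query = urllib.parse.urlencode({"q": "urllib", "page": 2})

# urllib.robotparser: check whether a crawler may fetch a URL.
# Hypothetical rules, parsed directly so that no network access is needed.
rules = urllib.robotparser.RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /private/"])

# urllib.request: build a request object (urlopen(req) would send it)
req = urllib.request.Request("https://example.com/search?" + query)

# urllib.error: exceptions raised by urllib.request; HTTPError is a URLError subclass
assert issubclass(urllib.error.HTTPError, urllib.error.URLError)
```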

How to retrieve the values of dynamic html content using Python

北城以北 submitted on 2019-11-26 15:25:44
I'm using Python 3 and I'm trying to retrieve data from a website. However, this data is dynamically loaded and the code I have right now doesn't work:

    url = eveCentralBaseURL + str(mineral)
    print("URL : %s" % url);
    response = request.urlopen(url)
    data = str(response.read(10000))
    data = data.replace("\\n", "\n")
    print(data)

Where I'm trying to find a particular value, I'm finding a template instead, e.g. "{{formatPrice median}}" instead of "4.48". How can I make it so that I can retrieve the value instead of the placeholder text?

Edit: This is the specific page I'm trying to extract information
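Note that urllib only downloads the raw HTML; it does not run the page's JavaScript, so if placeholders such as {{formatPrice median}} are filled in by scripts running in the browser, they will still be present in what urlopen returns. The sketch below addresses a separate issue in the code above: str(response.read()) produces the repr of a bytes object (b'...' with literal \n escapes), which is why the replace("\\n", "\n") step was needed. Decoding the bytes avoids that. fetch_text is a hypothetical helper name, and UTF-8 is assumed when the server does not announce a charset.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Download url and return its body as a decoded str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes, not str
        # Prefer the charset from the Content-Type header, else the default.
        charset = response.headers.get_content_charset(default_charset)
    return raw.decode(charset)


# str() on bytes gives their repr, not the text:
print(str(b"a\nb"))             # b'a\nb'  (a backslash and an n, not a newline)
print(b"a\nb".decode("utf-8"))  # a, then b on the next line
```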

catch specific HTTP error in python

陌路散爱 submitted on 2019-11-26 15:25:31
Question: I want to catch a specific HTTP error, not any member of the entire family. What I was trying to do is:

    import urllib2

    try:
        urllib2.urlopen("some url")
    except urllib2.HTTPError:
        <whatever>

But what I end up doing is catching every kind of HTTP error, whereas I want to catch it only if the specified web page doesn't exist, which is probably HTTP error 404. I don't know how to catch only error 404 and let the system run the default handler for other errors. Any suggestions?

Answer 1: Just catch
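A minimal Python 3 sketch of catching only the 404 case: catch urllib.error.HTTPError, inspect its code attribute, and re-raise everything else with a bare raise so other errors propagate exactly as before. The function name is hypothetical, and the opener parameter exists only so the function can be tested without a network.

```python
import urllib.error
import urllib.request


def fetch_unless_missing(url, opener=urllib.request.urlopen):
    """Return the response body, or None if the server answers 404.

    Any other HTTP error is re-raised unchanged.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # page does not exist: handled here
        raise  # 403, 500, ...: left to the caller / default handling
```

The same pattern works in Python 2 with urllib2.HTTPError, which also carries the status in its code attribute.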

Python URLLib / URLLib2 POST

点点圈 submitted on 2019-11-26 15:18:52
I'm trying to create a super-simplistic Virtual In / Out Board using wx/Python. I've got the following code in place for one of my requests to the server where I'll be storing the data:

    data = urllib.urlencode({'q': 'Status'})
    u = urllib2.urlopen('http://myserver/inout-tracker', data)
    for line in u.readlines():
        print line

Nothing special going on there. The problem I'm having is that, based on how I read the docs, this should perform a POST request because I've provided the data parameter, and that's not happening. I have this code in the index for that URL: if (!isset($_POST['q'])) { die ('No
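In urllib, whether a request goes out as GET or POST depends only on whether data is supplied, and that can be checked without contacting the server. A Python 3 sketch reusing the URL and field from the question (build_status_request is a hypothetical helper name):

```python
import urllib.parse
import urllib.request


def build_status_request(url, status="Status"):
    # urlencode() returns str; Python 3 needs the request body as bytes.
    data = urllib.parse.urlencode({"q": status}).encode("ascii")
    # Supplying data makes this a POST. When the request is sent, urllib also
    # adds "Content-Type: application/x-www-form-urlencoded" if no
    # Content-Type header was set, which PHP needs to populate $_POST.
    return urllib.request.Request(url, data=data)


req = build_status_request("http://myserver/inout-tracker")
print(req.get_method())  # POST
print(urllib.request.Request("http://myserver/inout-tracker").get_method())  # GET

# Sending it (requires the server to be reachable):
# with urllib.request.urlopen(req) as u:
#     for line in u:
#         print(line.decode().rstrip())
```

If the server still sees a GET, one thing worth checking (an assumption, not something the question confirms) is whether the URL answers with a 301, 302 or 303 redirect: urllib follows such a redirect for a POST by issuing a new GET request without the body.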

Django: add image in an ImageField from image url

雨燕双飞 submitted on 2019-11-26 14:55:57
Please excuse my poor English ;-) Imagine this very simple model:

    class Photo(models.Model):
        image = models.ImageField('Label', upload_to='path/')

I would like to create a Photo from an image URL (i.e., not by hand in the Django admin site). I think that I need to do something like this:

    from myapp.models import Photo
    import urllib

    img_url = 'http://www.site.com/image.jpg'
    img = urllib.urlopen(img_url)
    # Here I need to retrieve the image (the same way as if I put it in an input from the admin site)
    photo = Photo.objects.create(image=image)

I hope that I've explained the problem well,
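A sketch of the download half of the task, assuming Python 3: it fetches the image bytes and derives a file name from the last segment of the URL path. download_image is a hypothetical helper, and the "image" fallback name is an arbitrary choice.

```python
import os
import urllib.parse
import urllib.request


def download_image(img_url):
    """Return (file_name, image_bytes) for the image at img_url."""
    # Take the name from the URL path, ignoring any query string.
    name = os.path.basename(urllib.parse.urlparse(img_url).path) or "image"
    with urllib.request.urlopen(img_url) as response:
        return name, response.read()
```

In Django, those bytes can then be wrapped in django.core.files.base.ContentFile and stored through the field file's save() method, e.g. photo = Photo(); photo.image.save(name, ContentFile(data), save=True), which writes the file under upload_to and saves the model. That part needs a configured Django project and is not exercised here.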

Python 3 urllib produces TypeError: POST data should be bytes or an iterable of bytes. It cannot be of type str

倾然丶 夕夏残阳落幕 submitted on 2019-11-26 14:26:29
Question: I am trying to convert working Python 2.7 code into Python 3 code, and I am receiving a type error from the urllib request module. I used the built-in 2to3 Python tool to convert the following working urllib and urllib2 Python 2.7 code:

    import urllib2
    import urllib

    url = "https://www.customdomain.com"
    d = dict(parameter1="value1", parameter2="value2")

    req = urllib2.Request(url, data=urllib.urlencode(d))
    f = urllib2.urlopen(req)
    resp = f.read()

The output from the 2to3 module was the below Python 3
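The error in the title comes from passing a str as data: in Python 3, urllib.parse.urlencode() returns a str, while the request body must be bytes. A sketch using the question's URL and parameters, with UTF-8 as an assumed encoding:

```python
import urllib.parse
import urllib.request

url = "https://www.customdomain.com"
d = dict(parameter1="value1", parameter2="value2")

# urlencode() returns str; encode it so the POST body is bytes.
data = urllib.parse.urlencode(d).encode("utf-8")
req = urllib.request.Request(url, data=data)

# Sending is unchanged (commented out so the sketch runs offline):
# with urllib.request.urlopen(req) as f:
#     resp = f.read()
```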

download image from url using python urllib but receiving HTTP Error 403: Forbidden

邮差的信 submitted on 2019-11-26 14:16:52
Question: I want to download an image file from a URL using the Python module "urllib.request". This works for some websites (e.g. mangastream.com) but not for another (mangadoom.co), which returns "HTTP Error 403: Forbidden". What could be the problem in the latter case, and how can I fix it? I am using Python 3.4 on OS X.

    import urllib.request

    # does not work
    img_url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
    img_filename = 'my_img.png'
    urllib.request.urlretrieve(img_url, img_filename)
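One frequent cause of a 403 from sites like this, offered here as an assumption rather than a confirmed diagnosis, is that the server rejects urllib's default User-Agent (Python-urllib/3.x). urlretrieve() takes no headers argument, so the sketch builds a Request with a browser-like User-Agent and copies the response to disk itself. The User-Agent string and the helper names are arbitrary examples.

```python
import shutil
import urllib.request

# Arbitrary browser-like value; any realistic User-Agent string would do.
BROWSER_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/537.36"


def make_request(img_url, user_agent=BROWSER_UA):
    """Build a Request that overrides urllib's default User-Agent."""
    return urllib.request.Request(img_url, headers={"User-Agent": user_agent})


def save_image(img_url, img_filename):
    """Save img_url to img_filename, sending the custom User-Agent."""
    with urllib.request.urlopen(make_request(img_url)) as response, \
            open(img_filename, "wb") as out:
        shutil.copyfileobj(response, out)


# Usage (requires network access):
# save_image('http://mangadoom.co/wp-content/manga/5170/886/005.png', 'my_img.png')
```

Request stores header names capitalized ("User-agent"), which is the spelling to use with get_header() when inspecting it.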

AttributeError: 'module' object has no attribute 'urlopen'

蹲街弑〆低调 submitted on 2019-11-26 14:16:40
I'm trying to use Python to download the HTML source code of a website, but I'm receiving this error:

    Traceback (most recent call last):
      File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
        file = urllib.urlopen("http://www.python.org")
    AttributeError: 'module' object has no attribute 'urlopen'

I'm following the guide here: http://www.boddie.org.uk/python/HTML.html

    import urllib

    file = urllib.urlopen("http://www.python.org")
    s = file.read()
    f.close()

    #I'm guessing this would output the html source code?
    print(s)

I'm using Python 3, thanks for the
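The error occurs because in Python 3 urlopen() lives in urllib.request; the Python 2 function urllib.urlopen no longer exists. A Python 3 sketch of the question's code, where get_html is a hypothetical name and UTF-8 is an assumed page encoding:

```python
import urllib.request


def get_html(url):
    """Return the HTML source of url as a str (assumes UTF-8)."""
    # The with statement closes the response, replacing the explicit close().
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8")


# Usage (requires network access):
# s = get_html("http://www.python.org")
# print(s)
```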

The urllib library

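旧巷老猫 submitted on 2019-11-26 14:09:49
Python: basic usage of the urllib library

Official documentation: https://docs.python.org/3/library/urllib.html

What is urllib? urllib is Python's built-in HTTP request library. It includes the following modules: urllib.request (request module), urllib.error (exception handling module), urllib.parse (URL parsing module) and urllib.robotparser (robots.txt parsing module).

urlopen

The parameters of urllib.request.urlopen (cafile, capath and cadefault are deprecated and were removed in Python 3.13):

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Using the url parameter. First, a simple example:

    import urllib.request

    response = urllib.request.urlopen('http://www.baidu.com')
    print(response.read().decode('utf-8'))

In practice, urlopen is usually called with three parameters:

    urllib.request.urlopen(url, data, timeout)

response.read() retrieves the content of the web page. Without read(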
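A short sketch of the three commonly used parameters together: url, data (None means GET, bytes means POST) and timeout in seconds. open_page and encode_fields are hypothetical helper names, and UTF-8 is assumed for both the form data and the response.

```python
import urllib.parse
import urllib.request


def encode_fields(fields):
    """Turn a dict into a form-encoded POST body (bytes), or None for GET."""
    if not fields:
        return None
    return urllib.parse.urlencode(fields).encode("utf-8")


def open_page(url, fields=None, timeout=10):
    """Call urlopen with url, data and timeout; return the decoded body."""
    data = encode_fields(fields)  # None -> GET, bytes -> POST
    # timeout: seconds to wait on blocking operations such as connecting
    with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (requires network access):
# print(open_page("http://www.baidu.com", timeout=5))
# POST to a hypothetical form endpoint:
# print(open_page("http://example.com/form", fields={"word": "hello"}))
```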