urllib2

Getting a file from an authenticated site (with python urllib, urllib2)

佐手、 submitted on 2019-12-02 02:57:18
I'm trying to download a queried Excel file from a site. When I enter the direct link, it leads to a login page, and once I've entered my username and password, the Excel file downloads automatically. I'm trying to avoid installing any module that isn't part of standard Python (this script will run on a "standardized machine" and won't work if the module isn't installed). I've tried the following, but the downloaded Excel file contains the login page itself :-|

    import urllib

    url = "myLink_queriedResult/result.xls"
    urllib.urlretrieve(url, "C:\\test.xls")

SO..
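A standard-library-only route, sketched here rather than taken from the question: keep a cookielib.CookieJar across requests, submit the login form first, then fetch the file through the same opener. The login URL and form field names are assumptions that would have to be read off the site's actual login page.

    import cookielib
    import urllib
    import urllib2

    LOGIN_URL = "https://example.com/login"       # hypothetical login endpoint
    FILE_URL = "myLink_queriedResult/result.xls"  # placeholder from the question

    # Keep cookies across requests so the authenticated session survives.
    cookie_jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

    # Submit the login form first; the field names here are guesses.
    credentials = urllib.urlencode({"username": "user", "password": "secret"})
    opener.open(LOGIN_URL, credentials)

    # The session cookie is now in the jar, so the direct link returns
    # the spreadsheet instead of the login page.
    response = opener.open(FILE_URL)
    with open("C:\\test.xls", "wb") as fh:
        fh.write(response.read())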

Python: Urllib2 | [Errno 54] Connection reset by peer

╄→尐↘猪︶ㄣ submitted on 2019-12-02 01:16:26
I'm calling a list of URLs from the same domain and returning a snippet of their HTML, for a few thousand domains, but I'm hitting this error after about 1,000 rows. Is there anything I can do to avoid it? Does it make sense to add a wait step after every row? Every few hundred rows? Is there a better way around this?

    File "/Users.../ap.py", line 144, in <module>
      simpleProg()
    File "/Users.../ap.py", line 21, in simpleProg
    File "/Users.../ap.py", line 57, in first_step
    File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
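One pragmatic answer, sketched with plain urllib2: treat the reset as transient, retry each fetch a few times with a growing pause, and keep a short delay between rows. The retry count and delays below are arbitrary starting points, not tuned values, and the URL list is a placeholder.

    import socket
    import time
    import urllib2

    def fetch_with_retry(url, retries=3, delay=2.0):
        # Resets surface as urllib2.URLError or a bare socket.error;
        # retry with a growing pause before finally giving up.
        for attempt in range(retries):
            try:
                return urllib2.urlopen(url, timeout=30).read()
            except (urllib2.URLError, socket.error):
                if attempt == retries - 1:
                    raise
                time.sleep(delay * (attempt + 1))

    urls = ["http://example.com/page/%d" % i for i in range(5)]  # placeholder list
    for url in urls:
        html = fetch_with_retry(url)
        time.sleep(0.5)  # small pause between rows to stay polite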

Python 100-Day Tutorial (Python videos / Python learning path): Day 01 – Getting to Know Python

烂漫一生 submitted on 2019-12-02 01:00:41
So, what does the May 20 "Valentine's Day" have to do with us programmers? What's all the fuss about; shouldn't you be studying Python? Ha ha, it depends on your situation: if you don't have a partner, hurry up and join Tangtang in the 100-day Python tutorial! Starting today, Tangtang will publish a systematic 100-day Python course with exercises. From novice to master in 100 days: you can do it too!

Python 100-Day Tutorial (Python videos / Python learning path): Day 01 – Getting to Know Python

An introduction to Python. The history of Python:

Christmas 1989: Guido van Rossum began writing a compiler for the Python language.

February 1991: the first Python compiler (which was also an interpreter) was born. It was implemented in C (Java and C# implementations, Jython and IronPython, came later, along with other implementations such as PyPy, Brython, and Pyston) and could call C library functions. Even this earliest version already supported building blocks such as classes, functions, and exception handling, provided core data types such as lists and dictionaries, and supported a module-based extension system.

January 1994: Python 1.0 was officially released.

October 16, 2000: Python 2.0 was released, adding a complete garbage collector and Unicode support. At the same time, Python's development process became more transparent, the community's influence on development steadily grew, and an ecosystem slowly began to form.

December 3, 2008: Python 3.0 was released

Multi threaded web scraper using urlretrieve on a cookie-enabled site

若如初见. submitted on 2019-12-01 23:44:26
I am trying to write my first Python script, and with lots of Googling I think I am just about done. However, I need some help getting across the finish line. I need a script that logs onto a cookie-enabled site, scrapes a bunch of links, and then spawns a few processes to download the files. I have the program running single-threaded, so I know the code works. But when I tried to create a pool of download workers, I ran into a wall.

    # manager.py
    import Fetch  # the module name where worker lives
    from multiprocessing import pool

    def FetchReports(links, Username
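A rough sketch of the shape that tends to work here, not the poster's actual Fetch module: let each worker process log in once via a Pool initializer, because cookie jars and openers cannot be pickled and shared across process boundaries. All URLs and form fields below are placeholders.

    # workers.py -- hypothetical stand-in for the Fetch module
    import cookielib
    import urllib
    import urllib2
    from multiprocessing import Pool

    opener = None  # per-process global, set up by init_worker in each child

    def init_worker(login_url, credentials):
        # Runs once in every worker: build a cookie-aware opener and log in.
        global opener
        jar = cookielib.CookieJar()
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
        opener.open(login_url, urllib.urlencode(credentials))

    def download(job):
        url, filename = job
        data = opener.open(url).read()  # carries the session cookie
        with open(filename, "wb") as fh:
            fh.write(data)

    if __name__ == "__main__":
        jobs = [("https://example.com/reports/1.xls", "1.xls")]  # placeholder links
        pool = Pool(4, init_worker,
                    ("https://example.com/login", {"user": "u", "pass": "p"}))
        pool.map(download, jobs)
        pool.close()
        pool.join()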

urllib2 HTTPPasswordMgr not working - Credentials not sent error

淺唱寂寞╮ submitted on 2019-12-01 23:34:35
The following Python curl call succeeds:

    >>> import subprocess
    >>> args = ['curl', '-H', 'X-Requested-With: Demo', 'https://username:password@qualysapi.qualys.com/qps/rest/3.0/count/was/webapp']
    >>> xml_output = subprocess.check_output(args).decode('utf-8')
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    138   276    0   276    0     0    190      0 --:--:--  0:00:01 --:--:--   315
    >>> xml_output
    u'<?xml version="1.0" encoding="UTF-8"?>\n<ServiceResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation=
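One plausible explanation, with a sketch: curl sends the credentials preemptively, while HTTPPasswordMgr and HTTPBasicAuthHandler wait for a 401 challenge that some APIs (Qualys reportedly among them) never issue. Building the Basic Authorization header by hand sidesteps the challenge round-trip.

    import base64
    import urllib2

    url = "https://qualysapi.qualys.com/qps/rest/3.0/count/was/webapp"
    request = urllib2.Request(url)
    request.add_header("X-Requested-With", "Demo")

    # Send the credentials up front instead of waiting for a 401 challenge.
    auth = base64.b64encode("username:password")
    request.add_header("Authorization", "Basic %s" % auth)

    xml_output = urllib2.urlopen(request).read().decode("utf-8")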

Using PDFMiner (Python) with online pdf files. Encode the url?

江枫思渺然 submitted on 2019-12-01 21:18:01
I want to extract the content of PDF files available online using PDFMiner. My code is based on the documentation example for extracting the content of PDF files on the hard disk:

    # Open a PDF file.
    fp = open('mypdf.pdf', 'rb')
    # Create a PDF parser object associated with the file object.
    parser = PDFParser(fp)
    # Create a PDF document object that stores the document structure.
    document = PDFDocument(parser)

That works quite well with some small changes. Now I have tried urllib2.urlopen for online PDFs, but that doesn't work. I get an error message: coercing to Unicode:
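A sketch of one likely fix: PDFParser needs a seekable file-like object, so neither the URL string nor the raw urllib2 response will do. Reading the bytes into a StringIO first usually works. The URL is a placeholder, and the import paths follow the newer pdfminer layout that the question's PDFDocument(parser) call implies.

    import urllib2
    from cStringIO import StringIO

    from pdfminer.pdfparser import PDFParser
    from pdfminer.pdfdocument import PDFDocument

    url = "http://example.com/some.pdf"  # hypothetical online PDF

    # Download the whole file, then wrap the bytes in a seekable buffer;
    # PDFParser seeks around the file, which a socket response cannot do.
    remote = urllib2.urlopen(url)
    fp = StringIO(remote.read())

    parser = PDFParser(fp)
    document = PDFDocument(parser)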

How to use urllib2 to get a webpage using SSLv3 encryption

廉价感情. submitted on 2019-12-01 18:44:29
I'm using Python 2.7 and I'd like to get the contents of a webpage that requires SSLv3. Currently, when I try to access the page, I get the error SSL23_GET_SERVER_HELLO, and some searching on the web led me to the following solution, which fixes things in Python 3:

    urllib.request.install_opener(
        urllib.request.build_opener(
            urllib.request.HTTPSHandler(
                context=ssl.SSLContext(ssl.PROTOCOL_TLSv1))))

How can I get the same effect in Python 2.7? I can't seem to find the equivalent of the context argument for the HTTPSHandler class. I realize this response is a few years too late, but I also ran into
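A sketch of the usual Python 2.7 workaround: subclass HTTPSConnection so the socket is wrapped with a fixed protocol version, then route urllib2 through a matching HTTPSHandler. TLSv1 is shown; ssl.PROTOCOL_SSLv3 follows the same pattern where the local OpenSSL build still ships it.

    import httplib
    import socket
    import ssl
    import urllib2

    class TLSv1Connection(httplib.HTTPSConnection):
        def connect(self):
            # Pin the protocol version at the socket level, since 2.7's
            # HTTPSHandler predates the context argument.
            sock = socket.create_connection((self.host, self.port), self.timeout)
            self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file,
                                        ssl_version=ssl.PROTOCOL_TLSv1)

    class TLSv1Handler(urllib2.HTTPSHandler):
        def https_open(self, req):
            return self.do_open(TLSv1Connection, req)

    opener = urllib2.build_opener(TLSv1Handler())
    print(opener.open("https://example.com/").read())  # hypothetical URL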

scrape google resultstats with python [closed]

╄→尐↘猪︶ㄣ submitted on 2019-12-01 18:28:27
I would like to get the estimated results number from Google for a keyword. I'm using Python 3.3 and try to accomplish this task with BeautifulSoup and urllib.request. This is my simple code so far:

    def numResults():
        try:
            page_google = '''http://www.google.de/#output=search&sclient=psy-ab&q=pokerbonus&oq=pokerbonus&gs_l=hp.3..0i10l2j0i10i30l2.16503.18949.0.20819.10.9.0.1.1.0.413.2110.2-6j1j1.8.0....0...1c.1.19.psy-ab.FEBvxrgi0KU&pbx=1&bav=on.2,or.r_qf.&bvm=bv.48705608,d.Yms&'''
            req_google = Request(page_google)
            req_google.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0)
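A sketch of a version that can work, resting on two assumptions: everything after '#' in the posted URL is a fragment that the browser never sends to the server, so the plain /search?q=... form is requested instead, and the resultStats element id reflects Google's markup of that era and may well be stale by now.

    import urllib.parse
    import urllib.request

    from bs4 import BeautifulSoup

    def num_results(keyword):
        # Request the real search endpoint; fragments (#...) never reach Google.
        url = "http://www.google.de/search?" + urllib.parse.urlencode({"q": keyword})
        req = urllib.request.Request(url, headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0)",
        })
        html = urllib.request.urlopen(req).read()
        soup = BeautifulSoup(html, "html.parser")
        stats = soup.find(id="resultStats")  # assumed id, subject to change
        return stats.get_text() if stats else None

    print(num_results("pokerbonus"))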

Python urllib2 force IPv4

不打扰是莪最后的温柔 submitted on 2019-12-01 17:44:07
I am running a Python script that uses urllib2 to grab data from a weather API and display it on screen. I have hit the problem that when I query the server, I get a "no address associated with hostname" error. I can view the API's output in a web browser, and I can download the file with wget, but I have to force IPv4 to make that work. Is it possible to force IPv4 in urllib2 when using urllib2.urlopen? Not directly, no. So, what can you do? One possibility is to explicitly resolve the hostname to IPv4 yourself, and then use the IPv4 address instead of the name as the host. For
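A sketch of that name-it-yourself approach: resolve the host to an IPv4 address with getaddrinfo, request by IP, and send the original hostname in the Host header so virtual hosting still routes correctly. The API host and path are placeholders, and the trick breaks certificate validation over HTTPS, so it suits plain HTTP endpoints.

    import socket
    import urllib2

    host = "api.example-weather.com"   # hypothetical API host
    path = "/v1/current?q=London"      # hypothetical query

    # AF_INET restricts the lookup to IPv4; the first result's sockaddr
    # tuple starts with the dotted-quad address.
    ipv4 = socket.getaddrinfo(host, 80, socket.AF_INET)[0][4][0]

    request = urllib2.Request("http://%s%s" % (ipv4, path))
    request.add_header("Host", host)   # keep the real name for virtual hosting
    print(urllib2.urlopen(request).read())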