urllib2

Scraping Google Images using Selenium in Python

风格不统一 submitted on 2021-02-11 08:44:29
Question: I have been trying to scrape Google Images using the following code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    import os
    import time
    import requests
    import re
    import urllib2
    from threading import Thread
    import json

    # Assuming I have a folder named Pictures1, the images are downloaded there.
    def threaded_func(url,i):
        raw_img = urllib2.urlopen(url).read()
        cntr = len([i for i in os.listdir("Pictures1"

Getting actual facebook and twitter image urls using python

浪尽此生 submitted on 2021-02-10 05:27:13
Question: I want to write Python code that downloads the 'main' image from URLs that point to pages containing images. I have URLs like these in my data (text files):

http://t.co/fd9F0Gp1P1 points to a Facebook image
http://t.co/0Ldy6j26fb points to a Twitter image

Their expanded URLs do not end in .jpg or .png. Instead, they lead to a page that contains the desired image. How do I download images from these URLs?

Answer 1: Here you will find an example of how I downloaded the plane image from the facebook page, you can
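
One common way to locate a page's 'main' image is to read its Open Graph `og:image` meta tag. The sketch below is offered only as an illustration and is not the approach from the truncated answer above. It assumes the target page exposes that tag, which is not guaranteed. The names `OgImageFinder` and `find_main_image` and the sample HTML are hypothetical, and Python 3's standard library stands in for urllib2.

```python
from html.parser import HTMLParser


class OgImageFinder(HTMLParser):
    """Remembers the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta" or self.image_url is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image":
            self.image_url = attr_map.get("content")


def find_main_image(html_text):
    """Return the og:image URL found in html_text, or None if absent."""
    finder = OgImageFinder()
    finder.feed(html_text)
    return finder.image_url


# Hypothetical page source, used here instead of a live request.
sample_page = (
    "<html><head>"
    '<meta property="og:title" content="Plane" />'
    '<meta property="og:image" content="https://example.com/plane.jpg" />'
    "</head><body></body></html>"
)

print(find_main_image(sample_page))
```

In practice the page source would come from opening the short URL (the opener follows the redirects), and the returned image URL would then be downloaded in a second request.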

urllib2 error no host given

会有一股神秘感。 submitted on 2021-02-08 04:37:25
Question: EDIT (SOLVED): When I read the values in from my file, a newline character (\n) is added onto the end, and this splits my request string at that point. I think it has to do with how I saved the values to the file in the first place. Many thanks.

I have the following code:

    results = 'http://www.myurl.com/'+str(mystring)
    print str(results)
    request = urllib2.Request(results)
    request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
    opener = urllib2.build
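
The fix described in the edit above amounts to stripping the trailing newline before the value is joined to the URL. A minimal sketch follows. The helper name `build_url` and the sample values are hypothetical, and no request is sent.

```python
def build_url(base, value):
    """Join base and value, dropping surrounding whitespace such as '\\n'."""
    return base + str(value).strip()


# Lines as they typically come back from iterating over a text file.
lines_from_file = ["page1\n", "page2\r\n"]

urls = [build_url("http://www.myurl.com/", line) for line in lines_from_file]
print(urls)
```

Calling `strip()` at the point where the file is read, rather than where the URL is built, would work equally well; the important part is that the newline never reaches the request.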

How to send POST request with no data

北慕城南 submitted on 2021-01-29 18:01:21
Question: Is it possible, using urllib or urllib2, to send a POST request with no data? It sounds odd, but the API I am using sends blank data in the POST request. I've tried the following, but it seems to issue a GET request because there is no POST data.

    url = 'https://site.com/registerclaim?cid=' + str(cid)
    values = {}
    headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36', 'X-CRFS-Token' : csrfToken, 'X
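
A sketch of one way to get a POST with an empty body, using Python 3's `urllib.request` as a stand-in for urllib2. The request method is chosen by whether `data` is `None`, so an empty byte string is enough to select POST. The `cid` value is hypothetical, and nothing is sent over the network here.

```python
import urllib.request

cid = 42  # hypothetical claim id
url = "https://site.com/registerclaim?cid=" + str(cid)

# No data at all: the request defaults to GET.
req_default = urllib.request.Request(url)

# An empty body is still "data", so the request becomes a POST.
req_empty_body = urllib.request.Request(url, data=b"")

# Python 3 also allows naming the method explicitly.
req_explicit = urllib.request.Request(url, method="POST")

print(req_default.get_method())
print(req_empty_body.get_method())
print(req_explicit.get_method())
```

As far as I know, urllib2 in Python 2 applies the same `None` versus not-`None` test, so passing `''` as the data argument should have the same effect there, but that is worth verifying against the version in use.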

Python: open a URL with accent

≡放荡痞女 submitted on 2021-01-29 11:04:50
Question: In Python 2.7, I want to open a URL which contains accents (in the link itself, not in the page it points to). If I use the following:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import urllib2
    test = "https://www.notifymydevice.com/push?ApiKey=K6HGFJJCCQE04G29OHSRBIXI&PushTitle=Les%20accents%20:%20éèçà&PushText=Messages%20éèçà&"
    urllib2.urlopen(test)

my accents are converted to gibberish (Ã, ¨, ©, etc. rather than the éèà I expect). I've searched for that kind of issue and so I tried
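
A usual remedy for accented characters in a query string is to percent-encode them as UTF-8 before opening the URL. The sketch below shows this with Python 3's `urllib.parse`. It assumes the receiving service decodes query parameters as UTF-8, which the question does not confirm. The `YOUR_API_KEY` placeholder is hypothetical, and no request is made.

```python
from urllib.parse import quote, urlencode

base = "https://www.notifymydevice.com/push"
params = {
    "ApiKey": "YOUR_API_KEY",  # placeholder, not a real key
    "PushTitle": "Les accents : éèçà",
    "PushText": "Messages éèçà",
}

# quote_via=quote encodes spaces as %20 and non-ASCII text as UTF-8 bytes.
query = urlencode(params, quote_via=quote)
url = base + "?" + query

print(url)
```

In Python 2.7 the equivalent step would be `urllib.quote` applied to the UTF-8 encoded byte string of each parameter value.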

URL component % and \x

孤者浪人 submitted on 2021-01-28 11:52:08
Question: I have a question.

    st = "b%C3%BCrokommunikation"
    urllib2.unquote(st)

OUTPUT: 'b\xc3\xbcrokommunikation'

But if I print it:

    print urllib2.unquote(st)

OUTPUT: bürokommunikation

Why the difference? I have to write bürokommunikation, not 'b\xc3\xbcrokommunikation', into a file. My problem is that I have lots of data with such values extracted from URLs, and I have to store them as, e.g., bürokommunikation in a text file.

Answer 1: When you print the string, your terminal emulator recognizes the unicode
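
The two outputs in the question are the same bytes shown two ways: the escaped form is the byte-string representation, and the printed form is those bytes interpreted as UTF-8. A sketch in Python 3, where the distinction is explicit, follows. The temporary file path is an assumption made only so the example is self-contained.

```python
import os
import tempfile
from urllib.parse import unquote, unquote_to_bytes

st = "b%C3%BCrokommunikation"

raw = unquote_to_bytes(st)   # the undecoded UTF-8 bytes
text = unquote(st)           # the same bytes decoded as UTF-8 text

# The byte representation shows \xc3\xbc, matching the escaped output above.
print(repr(raw))

# Writing with an explicit encoding stores the readable word in the file.
path = os.path.join(tempfile.mkdtemp(), "words.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text + "\n")

with open(path, encoding="utf-8") as handle:
    round_trip = handle.read().strip()
```

In Python 2 the unquoted byte string can be written to a file opened in binary mode as-is; the escapes only appear when the interpreter shows its `repr`.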

How to handle 307 redirection using urllib2 from http to https

こ雲淡風輕ζ submitted on 2021-01-28 05:19:34
Question:

    try:
        client_result = urllib2.urlopen(
            "http://" + data['src_ip'] + "/iperf/iperf_main.py.fcgi",
            urllib.urlencode(data),
            3
        )
        client_response_text = client_result.read()
        return('200 OK', response_headers, ['success', server_response_text, client_response_text])
    except urllib2.HTTPError as e:
        return(str(e.code) + ' Error' , response_headers, [e.read()])

The code snippet above makes an HTTP POST request and gets a 307 redirect to the same address, except HTTPS version. However, the code control
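
The default redirect handler follows a 307 only for GET and HEAD requests and raises `HTTPError` for a POST, which matches the behaviour described above. One possible workaround, sketched here with Python 3's `urllib.request`, is a handler subclass that re-issues the POST to the new location. The class name `PostRedirectHandler` and the example.com URLs are hypothetical, and the handler is exercised directly so that no network access is needed.

```python
import urllib.request


class PostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-sends the POST body when the server answers with 307 or 308."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code in (307, 308) and req.get_method() == "POST":
            return urllib.request.Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
                method="POST",
            )
        # Every other case keeps the standard library behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# An opener built this way uses the subclass instead of the default handler.
opener = urllib.request.build_opener(PostRedirectHandler())

# Exercise the handler directly with a simulated 307 response.
original = urllib.request.Request("http://example.com/iperf", data=b"a=1")
redirected = PostRedirectHandler().redirect_request(
    original, None, 307, "Temporary Redirect", {}, "https://example.com/iperf"
)
print(redirected.get_method(), redirected.full_url)
```

Re-sending a request body to a redirect target is only appropriate when that target is trusted. If the HTTPS address is known in advance, posting to it directly avoids the redirect altogether.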

Parsing HTML page containing & using Python

て烟熏妆下的殇ゞ submitted on 2021-01-27 16:13:12
Question: I am trying to parse an HTML page in Python using urllib2 and ElementTree, and I am having trouble parsing the HTML. The web page contains "&" within a quoted string, but ElementTree throws a ParseError for lines containing &.

Script:

    import urllib2
    url = 'http://eciresults.nic.in/ConstituencywiseU011.htm'
    req = urllib2.Request(url, headers={'Content-type': 'text/xml'})
    r = urllib2.urlopen(req).read()
    import xml.etree.ElementTree as ET
    htmlpage=ET.fromstring(r)

This throws following error in Python 2.7
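
ElementTree is a strict XML parser, and a bare `&` is not well-formed XML, so real-world HTML often fails in this way. A sketch of one alternative is to use a lenient HTML parser instead; the example uses Python 3's standard `html.parser`. The class name `CellTextCollector` and the sample markup are hypothetical and are not taken from the page in the question.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Gathers the text of every <td> cell, tolerating a bare '&'."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buffer = None  # list of text chunks while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None


snippet = "<table><tr><td>Votes & Share</td><td>42</td></tr></table>"

# The strict XML parser rejects the unescaped ampersand.
try:
    ET.fromstring(snippet)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# The HTML parser accepts the same input.
collector = CellTextCollector()
collector.feed(snippet)
collector.close()
print(xml_ok, collector.cells)
```

Third-party HTML parsers are another common choice for this kind of page, but the standard library is enough to show the difference between strict XML parsing and lenient HTML parsing.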
