urllib2 | 易学教程

Scraping Google Images using Selenium in Python

Question: I have been trying to scrape Google Images using the following code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    import os
    import time
    import requests
    import re
    import urllib2
    from threading import Thread
    import json

    # Assuming I have a folder named Pictures1, the images are downloaded there.
    def threaded_func(url, i):
        raw_img = urllib2.urlopen(url).read()
        cntr = len([i for i in os.listdir("Pictures1"

Getting actual facebook and twitter image urls using python

Question: I want to write Python code that downloads the 'main' image from URLs that lead to pages containing images. I have URLs like these in my data (text files):

    http://t.co/fd9F0Gp1P1 points to a Facebook image
    http://t.co/0Ldy6j26fb points to a Twitter image

Their expanded URLs don't end in .jpg or .png. Instead they direct us to a page that contains the desired image. How do I download images from these URLs?

Answer 1: Here you will find an example of how I downloaded the plane image from the facebook page, you can
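One common way to find a page's main image is to read its Open Graph `og:image` meta tag. This is a swapped-in technique, not necessarily what the truncated answer above does, and it assumes the target page exposes that tag. The sketch below uses Python 3's standard `html.parser`; the names `OgImageParser` and `find_og_image` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class OgImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self):
        super().__init__()
        self.image_url = None

    def handle_starttag(self, tag, attrs):
        # Only the first og:image tag is kept; later ones are ignored.
        if tag != "meta" or self.image_url is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:image":
            self.image_url = attr_map.get("content")


def find_og_image(html_text):
    """Return the og:image URL found in html_text, or None if there is none."""
    parser = OgImageParser()
    parser.feed(html_text)
    return parser.image_url


# Hypothetical page source, standing in for the HTML behind an expanded URL.
sample_page = (
    '<html><head>'
    '<meta property="og:image" content="https://example.com/pic.jpg">'
    '</head><body></body></html>'
)
image_url = find_og_image(sample_page)
print(image_url)
```

The returned URL can then be fetched like any other image URL. Pages that do not publish `og:image` would need a different strategy.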

urllib2 error no host given

Question: EDIT (SOLVED): When I read the values in from my file, a newline character (\n) is left on the end of each one, and this splits my request string at that point. I think it has to do with how I saved the values to the file in the first place. Many thanks.

I have the following code:

    results = 'http://www.myurl.com/'+str(mystring)
    print str(results)
    request = urllib2.Request(results)
    request.add_header('User-Agent','Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)')
    opener = urllib2.build
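The trailing-newline problem described in this question can be sketched as follows: strip each value read from the file before appending it to the base URL. The helper name `build_url` and the sample values are hypothetical; the base URL is the placeholder used in the question.

```python
def build_url(base, raw_value):
    """Join base and a value read from a file, dropping surrounding whitespace."""
    # Lines read from a file keep their trailing "\n" (or "\r\n");
    # strip() removes it so the newline never ends up inside the URL.
    return base + str(raw_value).strip()


# Simulate values as they would come back from file.readlines().
lines = ["first\n", "second\r\n"]
urls = [build_url("http://www.myurl.com/", line) for line in lines]
print(urls)
```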

How to send POST request with no data

Question: Is it possible, using urllib or urllib2, to send a POST request with no data? It sounds odd, but the API I am using sends blank data in the POST request. I've tried the following, but it seems to issue a GET request because there is no POST data.

    url = 'https://site.com/registerclaim?cid=' + str(cid)
    values = {}
    headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36', 'X-CRFS-Token' : csrfToken, 'X
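A minimal sketch of one way to get a body-less POST, written against Python 3's `urllib.request` (the successor to `urllib2`): the request method is chosen by whether `data` is `None`, so an empty body still counts as a POST. As far as I know the same idea works in Python 2's `urllib2` with `data=''`. The helper name `empty_post` and the `cid` value are hypothetical, and nothing is sent over the network here.

```python
from urllib.request import Request


def empty_post(url, headers=None):
    """Build a request whose body is empty but whose method is still POST."""
    # An empty bytes body is "not None", so Request treats it as a POST.
    return Request(url, data=b"", headers=headers or {})


req = empty_post("https://site.com/registerclaim?cid=123")
print(req.get_method())

# For comparison, leaving data out entirely produces a GET.
plain = Request("https://site.com/registerclaim?cid=123")
print(plain.get_method())
```

Passing the request to `urlopen` would then issue the POST. Python 3 also accepts an explicit `method="POST"` argument on `Request`.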

Python: open a URL with accent

Question: In Python 2.7, I want to open a URL which contains accents (in the link itself, not the page it points to). If I use the following:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import urllib2
    test = "https://www.notifymydevice.com/push?ApiKey=K6HGFJJCCQE04G29OHSRBIXI&PushTitle=Les%20accents%20:%20éèçà&PushText=Messages%20éèçà&"
    urllib2.urlopen(test)

My accents are converted to gibberish (Ã, ¨, ©, etc. rather than the éèà I expect). I've searched for that kind of issue and so I tried
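One common approach to accented characters in a query string is to percent-encode each parameter value as UTF-8 before building the URL. The sketch below assumes the receiving server expects UTF-8 and uses Python 3's `urllib.parse.quote`; in Python 2 the rough equivalent is `urllib.quote` applied to a UTF-8 byte string. The helper name `encode_param` is hypothetical, and the API key is left out on purpose.

```python
from urllib.parse import quote


def encode_param(text):
    """Percent-encode one query parameter value as UTF-8."""
    # quote() encodes the text as UTF-8, then escapes every byte that is
    # not an unreserved character; safe="" also escapes "/".
    return quote(text, safe="")


title = encode_param("Les accents : éèçà")
url = "https://www.notifymydevice.com/push?PushTitle=" + title
print(url)
```

Only the values are encoded, not the whole URL, so the `?`, `&` and `=` separators stay intact.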

URL component % and \x

Question: I have a question.

    st = "b%C3%BCrokommunikation"
    urllib2.unquote(st)

OUTPUT: 'b\xc3\xbcrokommunikation'

But if I print it:

    print urllib2.unquote(st)

OUTPUT: bürokommunikation

Why the difference? I have to write bürokommunikation, not 'b\xc3\xbcrokommunikation', into a file. My problem is that I have lots of data with such values extracted from URLs, and I have to store them as, e.g., bürokommunikation in a text file.

Answer 1: When you print the string, your terminal emulator recognizes the unicode
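The two outputs above are the same bytes shown two ways: the interactive prompt displays the `repr`, while `print` sends the raw UTF-8 bytes to the terminal. A minimal sketch of writing the decoded values to a file, using Python 3 names (`urllib.parse.unquote` decodes percent-escapes as UTF-8 by default) and assuming the file should be UTF-8. The helper `save_decoded` and the temporary path are hypothetical.

```python
import io
import os
import tempfile
from urllib.parse import unquote


def save_decoded(values, path):
    """Percent-decode each value and write one per line as UTF-8 text."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for value in values:
            handle.write(unquote(value) + "\n")


# Write to a throwaway directory so the sketch leaves no files behind in cwd.
path = os.path.join(tempfile.mkdtemp(), "decoded.txt")
save_decoded(["b%C3%BCrokommunikation"], path)

# Read the file back to confirm what was stored.
with io.open(path, encoding="utf-8") as handle:
    written = handle.read()
```

In Python 2, the byte string returned by `urllib2.unquote` can be written to a file opened in binary mode without any further conversion, since it is already UTF-8.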

How to handle 307 redirection using urllib2 from http to https

Question:

    try:
        client_result = urllib2.urlopen(
            "http://" + data['src_ip'] + "/iperf/iperf_main.py.fcgi",
            urllib.urlencode(data),
            3
        )
        client_response_text = client_result.read()
        return('200 OK', response_headers, ['success', server_response_text, client_response_text])
    except urllib2.HTTPError as e:
        return(str(e.code) + ' Error' , response_headers, [e.read()])

The code snippet above makes an HTTP POST request and gets a 307 redirect to the same address, except over HTTPS. However, the code control
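As I understand it, the stock redirect handler refuses to re-send a POST on a 307 and raises `HTTPError` instead, which matches the behaviour described above. One possible workaround is a redirect handler that repeats the POST at the new location. This is a sketch with Python 3's `urllib.request` names, not the answer from the original thread; the class name `PostRedirectHandler` is hypothetical, and re-posting data to a redirected URL should only be done for servers you trust.

```python
from urllib.request import HTTPRedirectHandler, Request, build_opener


class PostRedirectHandler(HTTPRedirectHandler):
    """Re-send a POST to the new location when the server answers 307."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        if code == 307 and req.get_method() == "POST":
            # Keep the original body and the caller-supplied headers.
            return Request(
                newurl,
                data=req.data,
                headers=req.headers,
                origin_req_host=req.origin_req_host,
                unverifiable=True,
            )
        # Every other case falls back to the standard behaviour.
        return super().redirect_request(req, fp, code, msg, headers, newurl)


# The subclass replaces the default redirect handler in this opener.
opener = build_opener(PostRedirectHandler())
# opener.open(url, data, timeout) would now follow the 307
# (not executed here because it needs a live server).
```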

Parsing HTML page containing & using Python

Question: I am trying to parse an HTML page in Python using urllib2 and ElementTree, and I am having trouble parsing the HTML. The web page contains "&" within a quoted string, but ElementTree throws a ParseError for lines containing &.

Script:

    import urllib2
    url = 'http://eciresults.nic.in/ConstituencywiseU011.htm'
    req = urllib2.Request(url, headers={'Content-type': 'text/xml'})
    r = urllib2.urlopen(req).read()
    import xml.etree.ElementTree as ET
    htmlpage=ET.fromstring(r)

This throws following error in Python 2.7
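ElementTree is an XML parser, and a bare "&" is not well-formed XML, so real-world HTML often fails this way. One option is to use an HTML parser instead. The sketch below uses Python 3's standard `html.parser` on a small hypothetical sample rather than the page from the question; `TextCollector` and `extract_text` are illustrative names. Third-party parsers such as BeautifulSoup or lxml.html are other common choices.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html_text):
    """Return all text found in html_text, tolerating bare ampersands."""
    parser = TextCollector()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)


sample = "<p>Tom & Jerry</p>"

# The XML parser rejects the bare ampersand...
try:
    ET.fromstring(sample)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# ...while the HTML parser passes it through as ordinary text.
text = extract_text(sample)
print(xml_ok, text)
```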
