urllib2

How to send a urllib2 request with added white spaces

橙三吉。 Submitted on 2019-12-11 02:34:40
Question: I am trying to send a request to open a web-page URL that contains white spaces, so that I can download a file from the page. In a normal browser (e.g. Chrome), when you enter the URL into the address bar the file is generated automatically and you are asked to download it. Instead of having to load a web browser every time I want a set of logs, I am trying to create a Python script that will do all the hard work for me. Example: url = http (ip-address)/supportlog.xml/getlogs&name=0335008
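A minimal sketch of one common fix, assuming the spaces sit in the path or query of the URL: percent-encode them with urllib.quote before handing the URL to urllib2. The host, path, and output filename below are placeholders, not taken from the question:

import urllib
import urllib2

# Hypothetical URL containing literal spaces.
raw_url = 'http://192.168.1.1/support log.xml/getlogs&name=0335008'

# Percent-encode unsafe characters (spaces become %20) while leaving
# the URL's structural characters intact.
safe_url = urllib.quote(raw_url, safe='/:&?=')

response = urllib2.urlopen(safe_url)
with open('logs.xml', 'wb') as f:
    f.write(response.read())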

Making a POST request using urllib

会有一股神秘感。 Submitted on 2019-12-11 01:40:33
Question: I am trying to make a request to an API provider:

curl "https://api.infermedica.com/dev/parse" \
  -X "POST" \
  -H "App_Id: 4c177c" -H "App_Key: 6852599182ba85d70066986ca2b3" \
  -H "Content-Type: application/json" \
  -d '{"text": "i feel smoach pain but no couoghing today"}'

This curl request returns a response, but when I try to make the same request in code: self.headers = { "App_Id": "4c177c", "App_Key": "6852599182ba85d70066986ca2b3", "Content-Type": "application/json", "User-Agent": "M$ self.url = "https:/
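A minimal urllib2 sketch of the same call, using the endpoint, headers, and body from the curl command (treat the App_Id/App_Key values as placeholders):

import json
import urllib2

url = 'https://api.infermedica.com/dev/parse'
headers = {
    'App_Id': '4c177c',
    'App_Key': '6852599182ba85d70066986ca2b3',
    'Content-Type': 'application/json',
}
body = json.dumps({'text': 'i feel smoach pain but no couoghing today'})

# urllib2 issues a POST automatically when a request body is supplied.
req = urllib2.Request(url, data=body, headers=headers)
response = urllib2.urlopen(req)
print response.read()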

Web scraping using Python

半世苍凉 Submitted on 2019-12-11 00:52:08
Question: I am trying to scrape the website http://www.nseindia.com using urllib2 and BeautifulSoup. Unfortunately, I keep getting 403 Forbidden when I try to access the page through Python. I thought it was a user-agent issue, but changing that did not help. Then I thought it might have something to do with cookies, but apparently loading the page through links with cookies turned off works fine. What may be blocking requests through urllib? Answer 1: http://www.nseindia.com/ seems to require an Accept
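The answer is truncated, but it points at the Accept header; a minimal sketch of that fix (the header values below are an assumption, not from the answer):

import urllib2

req = urllib2.Request('http://www.nseindia.com/', headers={
    # Plausible browser-like headers; the exact values the site
    # checks for are an assumption.
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0',
})
html = urllib2.urlopen(req).read()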

Automate downloading images off Google

梦想与她 Submitted on 2019-12-11 00:49:08
Question: I'm very new to Python and I'm trying to create a tool that automates downloading images off Google. So far, I have the following code:

import urllib

def google_image(x):
    search = x.split()
    search = '%20'.join(map(str, search))
    url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' % search

But I'm not sure where to continue, or if I'm even on the right track. Can someone please help? Answer 1: See the Scrapy documentation for the image pipeline: ITEM_PIPELINES = {'scrapy.contrib
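The answer is cut off mid-setting; as a sketch of how the question's own approach could continue, assuming the old Google AJAX image-search JSON API (long deprecated) and its documented responseData.results shape:

import json
import urllib

def google_image(query):
    search = '%20'.join(query.split())
    url = ('http://ajax.googleapis.com/ajax/services/search/images'
           '?v=1.0&q=%s&safe=off' % search)
    data = json.loads(urllib.urlopen(url).read())
    # The old AJAX API nested hits under responseData.results.
    for i, result in enumerate(data['responseData']['results']):
        urllib.urlretrieve(result['url'], 'image%d.jpg' % i)

google_image('kittens')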

Python urlparse: small issue

蓝咒 Submitted on 2019-12-10 23:58:48
Question: I'm making an app that parses HTML and gets images from it. Parsing is easy using Beautiful Soup, and downloading the HTML and the images works too with urllib2. I do have a problem with urlparse when making absolute paths out of relative ones. The problem is best explained with an example:

>>> import urlparse
>>> urlparse.urljoin("http://www.example.com/", "../test.png")
'http://www.example.com/../test.png'

As you can see, urlparse doesn't take the ../ away. This gives a problem when I
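A sketch of one common workaround: normalize the joined path with posixpath so leftover ../ segments collapse the way browsers collapse them (urljoin's output above is actually RFC-conformant, so treating it as wrong is an assumption about what the asker needs):

import posixpath
import urlparse

def resolve(base, link):
    joined = urlparse.urljoin(base, link)
    parts = urlparse.urlparse(joined)
    # normpath collapses the '..' segments that urljoin leaves in place
    # when they would climb above the root.
    return urlparse.urlunparse(parts._replace(path=posixpath.normpath(parts.path)))

print resolve("http://www.example.com/", "../test.png")
# -> http://www.example.com/test.png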

urllib2 returns a different page than the browser does?

 ̄綄美尐妖づ Submitted on 2019-12-10 23:35:17
Question: I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone come across this before? How can I get around it? This is the code I'm using:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> page = urllib2.urlopen("http://192.168.1.254/index.cgi?active_page=9133&active_page_str=page_bt_home&req_mode=0&mimic_button_field=btn_tab_goto:+9133..&request_id=36590071&button_value=9133")
>>> soup =
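The snippet is cut off, but a common first thing to try with embedded admin pages, sketched below, is sending browser-like headers, since many routers vary their output on User-Agent or Referer (the header values here are assumptions):

import urllib2

url = ("http://192.168.1.254/index.cgi?active_page=9133"
       "&active_page_str=page_bt_home&req_mode=0"
       "&mimic_button_field=btn_tab_goto:+9133.."
       "&request_id=36590071&button_value=9133")

# Mimic the browser's identifying headers as closely as possible.
req = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36',
    'Referer': 'http://192.168.1.254/',
})
page = urllib2.urlopen(req).read()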

regex not working in bs4

余生颓废 Submitted on 2019-12-10 22:02:19
Question: I am trying to extract the links for a specific filehoster from the watchseriesfree.to website. In the following case I want rapidvideo links, so I use a regex to filter out those tags whose text contains rapidvideo:

import re
import urllib2
from bs4 import BeautifulSoup

def gethtml(link):
    req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    html = con.read()
    return html

def findLatest():
    url = "https://watchseriesfree.to/serie/Madam-Secretary"
    head =
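The snippet is cut off before the BeautifulSoup call, but a pitfall worth knowing when mixing bs4 and regexes, sketched on made-up HTML below: text= only matches a tag whose own string contains the pattern, so when the text sits inside child tags a filter over get_text() is usually what you want.

import re
from bs4 import BeautifulSoup

html = '<table><tr><td><a href="/x">RapidVideo mirror</a></td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# Matches nothing here: the td's text lives inside the <a> child.
tds = soup.find_all('td', text=re.compile(r'rapidvideo', re.I))

# Matches on the tag's full text, including text in child tags.
tds = [td for td in soup.find_all('td')
       if re.search(r'rapidvideo', td.get_text(), re.I)]
print tds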

Python - urllib2 timeout

[亡魂溺海] Submitted on 2019-12-10 21:03:51
Question: Below is a snippet of my code:

opener = urllib2.build_opener(redirect_handler.MyHTTPRedirectHandler())
opener.addheaders = [('Accept-encoding', 'gzip')]
fetch_timeout = 12
self.response = opener.open(url, timeout=fetch_timeout)

However, the code still waits ~60 seconds before timing out. Any clues?

Answer 1: At a guess, you probably need to set the socket timeout:

import socket
default_timeout = 12
socket.setdefaulttimeout(default_timeout)

Answer 2: Which version are you using? It was added
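Answer 2 is cut off; the timeout argument to urlopen/opener.open was added in Python 2.6, so on 2.5 the module-level socket default from Answer 1 is the only option. A combined sketch, with a placeholder URL:

import socket
import urllib2

socket.setdefaulttimeout(12)  # also covers DNS lookup and connect

opener = urllib2.build_opener()
opener.addheaders = [('Accept-encoding', 'gzip')]
response = opener.open('http://example.com/', timeout=12)  # Python >= 2.6 only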

gaierror: [Errno -2] Name or service not known

流过昼夜 Submitted on 2019-12-10 20:41:19
Question:

def make_req(data, url, method='POST'):
    params = urllib.urlencode(data)
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPSConnection(url)
    conn.request(method, url, params, headers)
    response = conn.getresponse()
    response_data = response.read()
    conn.close()

But it is throwing:

in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known

What is the reason? What is this error?
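No answer survives in this excerpt, but gaierror [Errno -2] means the hostname could not be resolved, and the usual culprit in code shaped like this is passing a full URL to HTTPSConnection, which expects a bare host. A sketch of that fix, assuming this is indeed the cause:

import httplib
import urllib
import urlparse

def make_req(data, url, method='POST'):
    params = urllib.urlencode(data)
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    parsed = urlparse.urlparse(url)
    # HTTPSConnection wants just the host (and optional port),
    # e.g. 'example.com', not 'https://example.com/path'.
    conn = httplib.HTTPSConnection(parsed.netloc)
    conn.request(method, parsed.path or '/', params, headers)
    response = conn.getresponse()
    response_data = response.read()
    conn.close()
    return response_data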

Does httplib2 support HTTP proxies at all? SOCKS proxies work but not HTTP

99封情书 Submitted on 2019-12-10 19:44:40
Question: Here is my code. I cannot get any HTTP proxy to work; SOCKS proxies (socks4/5) work fine, though. Any ideas why? urllib2 works fine with proxies, so I am confused. Thanks. Code:

import socks
import httplib2
import BeautifulSoup

httplib2.debuglevel = 4

http = httplib2.Http(proxy_info=httplib2.ProxyInfo(3, '213.30.160.160', 80))

main_url = 'http://cuil.com'

response, content = http.request(main_url, 'GET')

#html_content = BeautifulSoup(content)

print
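No answer survives in this excerpt either; one detail worth checking, sketched below, is the proxy-type constant: 3 is socks.PROXY_TYPE_HTTP, and spelling it out makes the intent explicit. Older httplib2 releases routed all proxying through the bundled socks module, whose HTTP path was known to be fragile, so this may be a library limitation rather than a configuration error:

import socks
import httplib2

proxy = httplib2.ProxyInfo(
    proxy_type=socks.PROXY_TYPE_HTTP,  # same as the literal 3 in the question
    proxy_host='213.30.160.160',
    proxy_port=80,
)
http = httplib2.Http(proxy_info=proxy)
response, content = http.request('http://cuil.com', 'GET')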