urllib2

How to send a urllib2 request with added white spaces

橙三吉。 Submitted on 2019-12-11 02:34:40
Question: I am trying to send a request to open a web-page URL that contains white spaces, so that I can download a file from the page. In a normal browser (e.g. Chrome), when you enter the URL into the address bar the file is generated automatically and you are asked to download it. Instead of having to load a web browser every time I want a set of logs, I am trying to create a Python script that will do all the hard work for me. Example: url = http (ip-address)/supportlog.xml/getlogs&name=0335008
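A minimal sketch of one common fix, assuming the spaces sit in the path or query of the URL: percent-encode them with urllib.quote before handing the URL to urllib2. The host, path, and output filename below are placeholders, not taken from the question:

import urllib
import urllib2

# Hypothetical URL containing literal spaces.
raw_url = 'http://192.168.1.1/support log.xml/getlogs&name=0335008'

# Percent-encode unsafe characters (spaces become %20) while leaving
# the URL's structural characters intact.
safe_url = urllib.quote(raw_url, safe='/:&?=')

response = urllib2.urlopen(safe_url)
with open('logs.xml', 'wb') as f:
    f.write(response.read())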

Making a POST request using urllib

会有一股神秘感。 Submitted on 2019-12-11 01:40:33
Question: I am trying to make a request to an API provider:

curl "https://api.infermedica.com/dev/parse" \
  -X "POST" \
  -H "App_Id: 4c177c" -H "App_Key: 6852599182ba85d70066986ca2b3" \
  -H "Content-Type: application/json" \
  -d '{"text": "i feel smoach pain but no couoghing today"}'

This curl request returns a response, but when I try to make the same request in code: self.headers = { "App_Id": "4c177c", "App_Key": "6852599182ba85d70066986ca2b3", "Content-Type": "application/json", "User-Agent": "M$ self.url = "https:/
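A minimal urllib2 sketch of the same call, using the endpoint, headers, and body from the curl command (treat the App_Id/App_Key values as placeholders):

import json
import urllib2

url = 'https://api.infermedica.com/dev/parse'
headers = {
    'App_Id': '4c177c',
    'App_Key': '6852599182ba85d70066986ca2b3',
    'Content-Type': 'application/json',
}
body = json.dumps({'text': 'i feel smoach pain but no couoghing today'})

# urllib2 issues a POST automatically when a request body is supplied.
req = urllib2.Request(url, data=body, headers=headers)
response = urllib2.urlopen(req)
print response.read()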

Web scraping using Python

半世苍凉 Submitted on 2019-12-11 00:52:08
Question: I am trying to scrape the website http://www.nseindia.com using urllib2 and BeautifulSoup. Unfortunately, I keep getting 403 Forbidden when I try to access the page through Python. I thought it was a user-agent issue, but changing that did not help. Then I thought it might have something to do with cookies, but apparently loading the page through links with cookies turned off works fine. What may be blocking requests through urllib? Answer 1: http://www.nseindia.com/ seems to require an Accept
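The answer is truncated, but it points at the Accept header; a minimal sketch of that fix (the header values below are an assumption, not from the answer):

import urllib2

req = urllib2.Request('http://www.nseindia.com/', headers={
    # Plausible browser-like headers; the exact values the site
    # checks for are an assumption.
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0',
})
html = urllib2.urlopen(req).read()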

Automate downloading images off Google

梦想与她 Submitted on 2019-12-11 00:49:08
Question: I'm very new to Python and I'm trying to create a tool that automates downloading images off Google. So far, I have the following code:

import urllib

def google_image(x):
    search = x.split()
    search = '%20'.join(map(str, search))
    url = 'http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=%s&safe=off' % search

But I'm not sure where to continue, or if I'm even on the right track. Can someone please help? Answer 1: See the Scrapy documentation for the image pipeline: ITEM_PIPELINES = {'scrapy.contrib
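The answer is cut off mid-setting; as a sketch of how the question's own approach could continue, assuming the old Google AJAX image-search JSON API (long deprecated) and its documented responseData.results shape:

import json
import urllib

def google_image(query):
    search = '%20'.join(query.split())
    url = ('http://ajax.googleapis.com/ajax/services/search/images'
           '?v=1.0&q=%s&safe=off' % search)
    data = json.loads(urllib.urlopen(url).read())
    # The old AJAX API nested hits under responseData.results.
    for i, result in enumerate(data['responseData']['results']):
        urllib.urlretrieve(result['url'], 'image%d.jpg' % i)

google_image('kittens')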

Python urlparse: small issue

蓝咒 Submitted on 2019-12-10 23:58:48
Question: I'm making an app that parses HTML and gets images from it. Parsing is easy using Beautiful Soup, and downloading the HTML and the images works too with urllib2. I do have a problem with urlparse when making absolute paths out of relative ones. The problem is best explained with an example:

>>> import urlparse
>>> urlparse.urljoin("http://www.example.com/", "../test.png")
'http://www.example.com/../test.png'

As you can see, urlparse doesn't take the ../ away. This gives a problem when I
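A sketch of one common workaround: normalize the joined path with posixpath so leftover ../ segments collapse the way browsers collapse them (urljoin's output above is actually RFC-conformant, so treating it as wrong is an assumption about what the asker needs):

import posixpath
import urlparse

def resolve(base, link):
    joined = urlparse.urljoin(base, link)
    parts = urlparse.urlparse(joined)
    # normpath collapses the '..' segments that urljoin leaves in place
    # when they would climb above the root.
    return urlparse.urlunparse(parts._replace(path=posixpath.normpath(parts.path)))

print resolve("http://www.example.com/", "../test.png")
# -> http://www.example.com/test.png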

urllib2 returns a different page than the browser does?

 ̄綄美尐妖づ Submitted on 2019-12-10 23:35:17
Question: I'm trying to scrape a page (my router's admin page), but the device seems to be serving a different page to urllib2 than to my browser. Has anyone come across this before? How can I get around it? This is the code I'm using:

>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> page = urllib2.urlopen("http://192.168.1.254/index.cgi?active_page=9133&active_page_str=page_bt_home&req_mode=0&mimic_button_field=btn_tab_goto:+9133..&request_id=36590071&button_value=9133")
>>> soup =
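The snippet is cut off, but a common first thing to try with embedded admin pages, sketched below, is sending browser-like headers, since many routers vary their output on User-Agent or Referer (the header values here are assumptions):

import urllib2

url = ("http://192.168.1.254/index.cgi?active_page=9133"
       "&active_page_str=page_bt_home&req_mode=0"
       "&mimic_button_field=btn_tab_goto:+9133.."
       "&request_id=36590071&button_value=9133")

# Mimic the browser's identifying headers as closely as possible.
req = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36',
    'Referer': 'http://192.168.1.254/',
})
page = urllib2.urlopen(req).read()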

regex not working in bs4

余生颓废 Submitted on 2019-12-10 22:02:19
Question: I am trying to extract the links for a specific filehoster from the watchseriesfree.to website. In the following case I want rapidvideo links, so I use a regex to filter out those tags whose text contains rapidvideo:

import re
import urllib2
from bs4 import BeautifulSoup

def gethtml(link):
    req = urllib2.Request(link, headers={'User-Agent': "Magic Browser"})
    con = urllib2.urlopen(req)
    html = con.read()
    return html

def findLatest():
    url = "https://watchseriesfree.to/serie/Madam-Secretary"
    head =
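The snippet is cut off before the BeautifulSoup call, but a pitfall worth knowing when mixing bs4 and regexes, sketched on made-up HTML below: text= only matches a tag whose own string contains the pattern, so when the text sits inside child tags a filter over get_text() is usually what you want.

import re
from bs4 import BeautifulSoup

html = '<table><tr><td><a href="/x">RapidVideo mirror</a></td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# Matches nothing here: the td's text lives inside the <a> child.
tds = soup.find_all('td', text=re.compile(r'rapidvideo', re.I))

# Matches on the tag's full text, including text in child tags.
tds = [td for td in soup.find_all('td')
       if re.search(r'rapidvideo', td.get_text(), re.I)]
print tds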

Python - urllib2 timeout

[亡魂溺海] Submitted on 2019-12-10 21:03:51
Question: Below is a snippet of my code:

opener = urllib2.build_opener(redirect_handler.MyHTTPRedirectHandler())
opener.addheaders = [('Accept-encoding', 'gzip')]
fetch_timeout = 12
self.response = opener.open(url, timeout=fetch_timeout)

However, the code still waits ~60 seconds before timing out. Any clues?

Answer 1: At a guess, you probably need to set the socket timeout:

import socket
default_timeout = 12
socket.setdefaulttimeout(default_timeout)

Answer 2: Which version are you using? It was added
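Answer 2 is cut off; the timeout argument to urlopen/opener.open was added in Python 2.6, so on 2.5 the module-level socket default from Answer 1 is the only option. A combined sketch, with a placeholder URL:

import socket
import urllib2

socket.setdefaulttimeout(12)  # also covers DNS lookup and connect

opener = urllib2.build_opener()
opener.addheaders = [('Accept-encoding', 'gzip')]
response = opener.open('http://example.com/', timeout=12)  # Python >= 2.6 only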

gaierror: [Errno -2] Name or service not known

流过昼夜 Submitted on 2019-12-10 20:41:19
Question:

def make_req(data, url, method='POST'):
    params = urllib.urlencode(data)
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    conn = httplib.HTTPSConnection(url)
    conn.request(method, url, params, headers)
    response = conn.getresponse()
    response_data = response.read()
    conn.close()

But it is throwing:

in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
gaierror: [Errno -2] Name or service not known

What is the reason? What is this error?
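No answer survives in this excerpt, but gaierror [Errno -2] means the hostname could not be resolved, and the usual culprit in code shaped like this is passing a full URL to HTTPSConnection, which expects a bare host. A sketch of that fix, assuming this is indeed the cause:

import httplib
import urllib
import urlparse

def make_req(data, url, method='POST'):
    params = urllib.urlencode(data)
    headers = {"Content-type": "application/x-www-form-urlencoded",
               "Accept": "text/plain"}
    parsed = urlparse.urlparse(url)
    # HTTPSConnection wants just the host (and optional port),
    # e.g. 'example.com', not 'https://example.com/path'.
    conn = httplib.HTTPSConnection(parsed.netloc)
    conn.request(method, parsed.path or '/', params, headers)
    response = conn.getresponse()
    response_data = response.read()
    conn.close()
    return response_data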

Does httplib2 support HTTP proxies at all? SOCKS proxies work but not HTTP

99封情书 Submitted on 2019-12-10 19:44:40
Question: Here is my code. I cannot get any HTTP proxy to work; SOCKS proxies (socks4/5) work fine, though. Any ideas why? urllib2 works fine with proxies, so I am confused. Thanks. Code:

import socks
import httplib2
import BeautifulSoup

httplib2.debuglevel = 4

http = httplib2.Http(proxy_info=httplib2.ProxyInfo(3, '213.30.160.160', 80))

main_url = 'http://cuil.com'

response, content = http.request(main_url, 'GET')

#html_content = BeautifulSoup(content)

print
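No answer survives in this excerpt either; one detail worth checking, sketched below, is the proxy-type constant: 3 is socks.PROXY_TYPE_HTTP, and spelling it out makes the intent explicit. Older httplib2 releases routed all proxying through the bundled socks module, whose HTTP path was known to be fragile, so this may be a library limitation rather than a configuration error:

import socks
import httplib2

proxy = httplib2.ProxyInfo(
    proxy_type=socks.PROXY_TYPE_HTTP,  # same as the literal 3 in the question
    proxy_host='213.30.160.160',
    proxy_port=80,
)
http = httplib2.Http(proxy_info=proxy)
response, content = http.request('http://cuil.com', 'GET')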