python-requests

How to scrape data past a radio button using requests in Python 3?

依然范特西╮ submitted on 2021-01-29 05:47:05
Question: I want to scrape data from this website. After visiting it, we need to select the radio button criterion 'TIN', enter the TIN number '27680809621V', and click the submit button. I'm stuck and don't know how to do this, as there is no name or value.

    import requests
    from bs4 import BeautifulSoup

    s = requests.session()
    req = s.get('https://mahagst.gov.in/en/know-your-taxpayer')
    soup = BeautifulSoup(req.text, 'lxml')
    dictinfo = {i['name']: i.get('value', '') for i in soup.select('input[name]')}

Someone please …
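A radio button is submitted like any other form field: include its name with the value of the option you want selected in the POST payload. Below is a minimal offline sketch of that pattern; the field names (criteria, number, csrf_token) are assumptions for illustration, since the real field names on mahagst.gov.in may differ.

    from bs4 import BeautifulSoup

    # Simplified stand-in for the page's form markup (assumed field names).
    html = """
    <form action="/know-your-taxpayer" method="post">
      <input type="radio" name="criteria" value="TIN">
      <input type="radio" name="criteria" value="PAN">
      <input type="text" name="number" value="">
      <input type="hidden" name="csrf_token" value="abc123">
    </form>
    """

    soup = BeautifulSoup(html, "html.parser")
    # Collect every named input so hidden fields (e.g. CSRF tokens) are kept.
    payload = {i["name"]: i.get("value", "") for i in soup.select("input[name]")}
    # Selecting the radio button just means sending its name with the
    # desired option's value; same for the text field.
    payload["criteria"] = "TIN"
    payload["number"] = "27680809621V"
    print(payload)

The payload would then be sent with `s.post(form_action_url, data=payload)` using the session from the question, so that any cookies set on the first GET are carried over.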

HTTPSConnectionPool SSL Error certificate verify failed

馋奶兔 submitted on 2021-01-29 05:40:32
Question: I'm working on web scraping some particular websites, and for that I use the Python 3 requests package and BeautifulSoup. While running a test over some websites I got this error:

    requests.exceptions.SSLError: HTTPSConnectionPool(host='autoglassbodyrepair.lawshield.co.uk', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)))

    import requests as rq
    import bs4 …
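"certificate verify failed" means the server's certificate chain could not be validated against the local CA bundle. The proper fix is to point requests at the right CA bundle (via the `verify=` parameter or the `REQUESTS_CA_BUNDLE` environment variable); as a last-resort workaround for a scraping job you can disable verification on a session, shown here as a sketch, accepting the man-in-the-middle risk that entails:

    import warnings
    import requests
    from urllib3.exceptions import InsecureRequestWarning

    session = requests.Session()
    session.verify = False  # disables certificate checks for every request on this session

    # Silence the InsecureRequestWarning urllib3 would otherwise emit per call.
    warnings.simplefilter("ignore", InsecureRequestWarning)

    # resp = session.get("https://autoglassbodyrepair.lawshield.co.uk/")

Prefer `session.verify = "/path/to/ca-bundle.pem"` with the site's actual CA chain whenever you can obtain it.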

Creating URLs in a loop

﹥>﹥吖頭↗ submitted on 2021-01-29 05:22:58
Question: I am trying to create a list of URLs using a for loop. It prints all the correct URLs but does not save them in a list. Ultimately I want to download multiple files using urlretrieve.

    for i, j in zip(range(0, 17), range(1, 18)):
        if i < 8 or j < 10:
            url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
            print(url)
        if i == 9 and j == 10:
            url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
            print(url)
        if i > 9:
            if i > 9 or j < 8:
                url = "https://Here is a …
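The loop only ever prints; nothing is ever appended to a list. The branching on digit width can also be replaced by zero-padding with a format spec. A sketch, using https://example.com as a placeholder for the elided base URL:

    urls = []
    for i, j in zip(range(0, 17), range(1, 18)):
        # {:02d} zero-pads j, so no separate branches for j < 10 are needed.
        url = "https://example.com/P200{}-{:02d}.xls".format(i, j)
        urls.append(url)

    print(urls[0])
    print(urls[-1])

Each collected URL can then be downloaded with `urllib.request.urlretrieve(url, filename)` inside a second loop.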

how to input data via a post request using requests in python

青春壹個敷衍的年華 submitted on 2021-01-29 03:57:34
Question: What I'm trying to do: I am trying to automate the process of downloading YouTube videos using a particular website. The idea is that the website yields the source of the video I input, and I download it. The website: https://en.savefrom.net/1-youtube-video-downloader-4/. Here is the input text field defined in the HTML:

    <input type="text" name="sf_url" value="" autofocus="" placeholder="Paste your video link here" onfocus="if(this.value && this.select){this.select()}" id="sf_url">

I …
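The input's name attribute, sf_url, is the key to send in the form payload. A minimal sketch of building that request follows; note the endpoint URL is an assumption (the page URL itself), and the site actually resolves download links with JavaScript, so a plain POST may not return a usable link — a dedicated tool such as yt-dlp is usually more robust for this job.

    import requests

    video = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    # The form field name comes from the question's HTML: name="sf_url".
    payload = {"sf_url": video}

    # Endpoint assumed; the real download resolution happens in JavaScript.
    # resp = requests.post("https://en.savefrom.net/1-youtube-video-downloader-4/", data=payload)
    print(payload["sf_url"])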

Scraping part of a Wikipedia Infobox

百般思念 submitted on 2021-01-29 03:49:42
Question: I'm using Python 2.7, requests, and BeautifulSoup to scrape approximately 50 Wikipedia pages. I've created a column in my dataframe that holds partial URLs relating to the name of each song (these were verified previously, and I get response code 200 when testing against all of them). My code loops through and appends these individual URLs to the main Wikipedia URL. I've been able to get the heading of the page and other data, but what I really want is only the Length of the song …
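Wikipedia infobox rows are th/td pairs inside a table with class "infobox", so the Length value can be pulled by matching the row header. A sketch against a trimmed-down stand-in for the real markup (actual Wikipedia pages are messier but use the same row structure):

    from bs4 import BeautifulSoup

    # Simplified stand-in for a song page's infobox.
    html = """
    <table class="infobox vevent">
      <tr><th>Released</th><td>1975</td></tr>
      <tr><th>Length</th><td>5:55</td></tr>
    </table>
    """

    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox")
    length = None
    for row in infobox.find_all("tr"):
        header = row.find("th")
        # Match the row whose header cell reads exactly "Length".
        if header and header.get_text(strip=True) == "Length":
            length = row.find("td").get_text(strip=True)
    print(length)

In the real loop over 50 pages, `soup` would be built from each response's content instead of a literal string.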

How to find out the correct encoding when using beautifulsoup?

懵懂的女人 submitted on 2021-01-28 20:15:48
Question: In Python 3 with beautifulsoup4 I want to get information from a website after making the request. I did this:

    import requests
    from bs4 import BeautifulSoup

    req = requests.get('https://sisgvarmazenamento.blob.core.windows.net/prd/PublicacaoPortal/Arquivos/201901.htm').text
    soup = BeautifulSoup(req, 'lxml')
    soup.find("h1").text
    '\r\n CÃ\x82MARA MUNICIPAL DE SÃ\x83O PAULO'

I do not know what the encoding is, but it is a site in Brazilian Portuguese, so it should be UTF-8 or Latin-1. Please, is …
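Mojibake like "CÃ\x82MARA" is the classic symptom of UTF-8 bytes decoded as Latin-1: when an HTML response carries no charset in its Content-Type header, requests falls back to ISO-8859-1 for `.text`. An offline demonstration of the breakage and the round-trip repair:

    # "CÂMARA" encoded as UTF-8 then decoded as ISO-8859-1 reproduces
    # exactly the garbage seen in the question.
    raw = u"C\u00c2MARA MUNICIPAL DE S\u00c3O PAULO".encode("utf-8")

    garbled = raw.decode("iso-8859-1")                    # what req.text returned
    fixed = garbled.encode("iso-8859-1").decode("utf-8")  # round-trip repair
    print(fixed)

With requests itself the cleaner fixes are to set `req.encoding = req.apparent_encoding` before reading `.text`, or to pass the raw bytes (`req.content`) to BeautifulSoup and let it sniff the charset.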

Python requests with proxy results in SSLError WRONG_VERSION_NUMBER

心已入冬 submitted on 2021-01-28 14:16:39
Question: I can't use a different proxy in Python. My code:

    import requests

    proxies = {
        "https": 'https://154.16.202.22:3128',
        "http": 'http://154.16.202.22:3128'
    }
    r = requests.get('https://httpbin.org/ip', proxies=proxies)
    print(r.json())

The error I'm getting is:

    ...
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /ip (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION …
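WRONG_VERSION_NUMBER usually means a TLS handshake was attempted against an endpoint speaking plain HTTP. Most proxies on port 3128 speak plain HTTP even when tunnelling HTTPS traffic via CONNECT, so the likely fix is to use an http:// scheme in the proxy URL for both keys (assuming this particular proxy does accept plain-HTTP connections):

    import requests

    proxies = {
        "http": "http://154.16.202.22:3128",
        "https": "http://154.16.202.22:3128",  # note: http://, not https://
    }

    # r = requests.get("https://httpbin.org/ip", proxies=proxies)

The dictionary key ("https") selects which target URLs use the proxy; the value's scheme describes how to talk to the proxy itself, and those two need not match.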

(Beautiful Soup) Get data inside a button tag

早过忘川 submitted on 2021-01-28 14:11:45
Question: I am trying to scrape an ImageId from inside a button tag; the result I want is "25511e1fd64e99acd991a22d6c2d6b6c". When I try:

    drawing_url = drawing_url.find_all('button', class_='inspectBut')['onclick']

it doesn't work, giving the error:

    TypeError: list indices must be integers or slices, not str

Input:

    for article in soup.find_all('div', class_='dojoxGridRow'):
        drawing_url = article.find('td', class_='dojoxGridCell', idx='3')
        drawing_url = drawing_url.find_all('button', class_='inspectBut') …
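`find_all()` returns a list of tags, hence the TypeError when it is indexed with a string; take a single element (or use `find()`) before reading the attribute, then pull the hex id out of the onclick text. A sketch against stand-in markup (the exact JavaScript in the real onclick is an assumption, but the extraction pattern is the same):

    import re
    from bs4 import BeautifulSoup

    # Stand-in for one grid cell from the question's page.
    html = """
    <td class="dojoxGridCell" idx="3">
      <button class="inspectBut"
              onclick="inspect('25511e1fd64e99acd991a22d6c2d6b6c')"></button>
    </td>
    """

    soup = BeautifulSoup(html, "html.parser")
    # find() returns one tag, so attribute access with ['onclick'] works.
    button = soup.find("button", class_="inspectBut")
    onclick = button["onclick"]
    # The ImageId is a 32-character lowercase hex string inside the JS call.
    image_id = re.search(r"[0-9a-f]{32}", onclick).group(0)
    print(image_id)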

Converting curl with --form to python requests

拜拜、爱过 submitted on 2021-01-28 13:41:39
Question: I have a curl request like this:

    curl -X POST http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen -F "data={\"screen\":{\"screen-id\":\"57675\"}}"

I am trying to convert it to Python using something like this:

    import requests
    import json

    url = "http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen"
    payload = {"data": json.dumps({"screen": ["screen-id", "57675"]})}
    req = requests.post(url, data=payload)
    print(req.text)

but I get the following error:

    io …
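Two things differ from the curl command: `-F` sends multipart/form-data (so requests needs `files=`, not `data=`), and the field value in curl is a JSON object, `{"screen": {"screen-id": "57675"}}`, whereas the Python attempt built a list. A closer sketch:

    import json
    import requests

    # Reproduce curl's exact field value: a JSON object, not a list.
    payload = {"screen": {"screen-id": "57675"}}
    # -F means multipart/form-data; a (None, value) tuple makes requests
    # send "data" as a plain form field with no filename.
    files = {"data": (None, json.dumps(payload))}

    # req = requests.post("http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen", files=files)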