python-requests

How to scrape data past a radio button using requests in Python 3?

依然范特西╮ submitted on 2021-01-29 05:47:05
Question: I want to scrape data from this website. After visiting it, we need to select the radio button criterion 'TIN', enter the TIN number '27680809621V', and click the submit button. I'm stuck and don't know how to do this, as there is no name or value.

    import requests
    from bs4 import BeautifulSoup

    s = requests.session()
    req = s.get('https://mahagst.gov.in/en/know-your-taxpayer')
    soup = BeautifulSoup(req.text, 'lxml')
    dictinfo = {i['name']: i.get('value', '') for i in soup.select('input[name]')}

Someone please …
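A radio button is submitted like any other form field: include its name with the value of the option you want selected in the POST payload. Below is a minimal offline sketch of that pattern; the field names (criteria, number, csrf_token) are assumptions for illustration, since the real field names on mahagst.gov.in may differ.

    from bs4 import BeautifulSoup

    # Simplified stand-in for the page's form markup (assumed field names).
    html = """
    <form action="/know-your-taxpayer" method="post">
      <input type="radio" name="criteria" value="TIN">
      <input type="radio" name="criteria" value="PAN">
      <input type="text" name="number" value="">
      <input type="hidden" name="csrf_token" value="abc123">
    </form>
    """

    soup = BeautifulSoup(html, "html.parser")
    # Collect every named input so hidden fields (e.g. CSRF tokens) are kept.
    payload = {i["name"]: i.get("value", "") for i in soup.select("input[name]")}
    # Selecting the radio button just means sending its name with the
    # desired option's value; same for the text field.
    payload["criteria"] = "TIN"
    payload["number"] = "27680809621V"
    print(payload)

The payload would then be sent with `s.post(form_action_url, data=payload)` using the session from the question, so that any cookies set on the first GET are carried over.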

HTTPSConnectionPool SSL Error certificate verify failed

馋奶兔 submitted on 2021-01-29 05:40:32
Question: I'm working on web scraping some particular websites, and for that I use the Python 3 requests package and BeautifulSoup. While running a test over some websites I got this error:

    requests.exceptions.SSLError: HTTPSConnectionPool(host='autoglassbodyrepair.lawshield.co.uk', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)))

    import requests as rq
    import bs4 …
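"certificate verify failed" means the server's certificate chain could not be validated against the local CA bundle. The proper fix is to point requests at the right CA bundle (via the `verify=` parameter or the `REQUESTS_CA_BUNDLE` environment variable); as a last-resort workaround for a scraping job you can disable verification on a session, shown here as a sketch, accepting the man-in-the-middle risk that entails:

    import warnings
    import requests
    from urllib3.exceptions import InsecureRequestWarning

    session = requests.Session()
    session.verify = False  # disables certificate checks for every request on this session

    # Silence the InsecureRequestWarning urllib3 would otherwise emit per call.
    warnings.simplefilter("ignore", InsecureRequestWarning)

    # resp = session.get("https://autoglassbodyrepair.lawshield.co.uk/")

Prefer `session.verify = "/path/to/ca-bundle.pem"` with the site's actual CA chain whenever you can obtain it.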

Creating URLs in a loop

﹥>﹥吖頭↗ submitted on 2021-01-29 05:22:58
Question: I am trying to create a list of URLs using a for loop. It prints all the correct URLs but does not save them in a list. Ultimately I want to download multiple files using urlretrieve.

    for i, j in zip(range(0, 17), range(1, 18)):
        if i < 8 or j < 10:
            url = "https://Here is a URL/P200{}".format(i) + "-0{}".format(j) + ".xls"
            print(url)
        if i == 9 and j == 10:
            url = "https://Here is a URL/P200{}".format(i) + "-{}".format(j) + ".xls"
            print(url)
        if i > 9:
            if i > 9 or j < 8:
                url = "https://Here is a …
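The loop only ever prints; nothing is ever appended to a list. The branching on digit width can also be replaced by zero-padding with a format spec. A sketch, using https://example.com as a placeholder for the elided base URL:

    urls = []
    for i, j in zip(range(0, 17), range(1, 18)):
        # {:02d} zero-pads j, so no separate branches for j < 10 are needed.
        url = "https://example.com/P200{}-{:02d}.xls".format(i, j)
        urls.append(url)

    print(urls[0])
    print(urls[-1])

Each collected URL can then be downloaded with `urllib.request.urlretrieve(url, filename)` inside a second loop.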

how to input data via a post request using requests in python

青春壹個敷衍的年華 submitted on 2021-01-29 03:57:34
Question: What I'm trying to do: I am trying to automate the process of downloading YouTube videos using a particular website. The idea is that the website yields the source of the video I input, and I download it. The website: https://en.savefrom.net/1-youtube-video-downloader-4/. Here is the input text field defined in the HTML:

    <input type="text" name="sf_url" value="" autofocus="" placeholder="Paste your video link here" onfocus="if(this.value && this.select){this.select()}" id="sf_url">

I …
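The input's name attribute, sf_url, is the key to send in the form payload. A minimal sketch of building that request follows; note the endpoint URL is an assumption (the page URL itself), and the site actually resolves download links with JavaScript, so a plain POST may not return a usable link — a dedicated tool such as yt-dlp is usually more robust for this job.

    import requests

    video = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    # The form field name comes from the question's HTML: name="sf_url".
    payload = {"sf_url": video}

    # Endpoint assumed; the real download resolution happens in JavaScript.
    # resp = requests.post("https://en.savefrom.net/1-youtube-video-downloader-4/", data=payload)
    print(payload["sf_url"])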

Scraping part of a Wikipedia Infobox

百般思念 submitted on 2021-01-29 03:49:42
Question: I'm using Python 2.7, requests, and BeautifulSoup to scrape approximately 50 Wikipedia pages. I've created a column in my dataframe that holds partial URLs relating to the name of each song (these were verified previously, and I get response code 200 when testing against all of them). My code loops through and appends these individual URLs to the main Wikipedia URL. I've been able to get the heading of the page and other data, but what I really want is only the Length of the song …
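Wikipedia infobox rows are th/td pairs inside a table with class "infobox", so the Length value can be pulled by matching the row header. A sketch against a trimmed-down stand-in for the real markup (actual Wikipedia pages are messier but use the same row structure):

    from bs4 import BeautifulSoup

    # Simplified stand-in for a song page's infobox.
    html = """
    <table class="infobox vevent">
      <tr><th>Released</th><td>1975</td></tr>
      <tr><th>Length</th><td>5:55</td></tr>
    </table>
    """

    soup = BeautifulSoup(html, "html.parser")
    infobox = soup.find("table", class_="infobox")
    length = None
    for row in infobox.find_all("tr"):
        header = row.find("th")
        # Match the row whose header cell reads exactly "Length".
        if header and header.get_text(strip=True) == "Length":
            length = row.find("td").get_text(strip=True)
    print(length)

In the real loop over 50 pages, `soup` would be built from each response's content instead of a literal string.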

How to find out the correct encoding when using beautifulsoup?

懵懂的女人 submitted on 2021-01-28 20:15:48
Question: In Python 3 with beautifulsoup4 I want to get information from a website after making the request. I did this:

    import requests
    from bs4 import BeautifulSoup

    req = requests.get('https://sisgvarmazenamento.blob.core.windows.net/prd/PublicacaoPortal/Arquivos/201901.htm').text
    soup = BeautifulSoup(req, 'lxml')
    soup.find("h1").text
    '\r\n CÃ\x82MARA MUNICIPAL DE SÃ\x83O PAULO'

I do not know what the encoding is, but it is a site in Brazilian Portuguese, so it should be UTF-8 or Latin-1. Please, is …
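Mojibake like "CÃ\x82MARA" is the classic symptom of UTF-8 bytes decoded as Latin-1: when an HTML response carries no charset in its Content-Type header, requests falls back to ISO-8859-1 for `.text`. An offline demonstration of the breakage and the round-trip repair:

    # "CÂMARA" encoded as UTF-8 then decoded as ISO-8859-1 reproduces
    # exactly the garbage seen in the question.
    raw = u"C\u00c2MARA MUNICIPAL DE S\u00c3O PAULO".encode("utf-8")

    garbled = raw.decode("iso-8859-1")                    # what req.text returned
    fixed = garbled.encode("iso-8859-1").decode("utf-8")  # round-trip repair
    print(fixed)

With requests itself the cleaner fixes are to set `req.encoding = req.apparent_encoding` before reading `.text`, or to pass the raw bytes (`req.content`) to BeautifulSoup and let it sniff the charset.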

Python requests with proxy results in SSLError WRONG_VERSION_NUMBER

心已入冬 submitted on 2021-01-28 14:16:39
Question: I can't use a different proxy in Python. My code:

    import requests

    proxies = {
        "https": 'https://154.16.202.22:3128',
        "http": 'http://154.16.202.22:3128'
    }
    r = requests.get('https://httpbin.org/ip', proxies=proxies)
    print(r.json())

The error I'm getting is:

    ...
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='httpbin.org', port=443): Max retries exceeded with url: /ip (Caused by SSLError(SSLError(1, '[SSL: WRONG_VERSION …
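WRONG_VERSION_NUMBER usually means a TLS handshake was attempted against an endpoint speaking plain HTTP. Most proxies on port 3128 speak plain HTTP even when tunnelling HTTPS traffic via CONNECT, so the likely fix is to use an http:// scheme in the proxy URL for both keys (assuming this particular proxy does accept plain-HTTP connections):

    import requests

    proxies = {
        "http": "http://154.16.202.22:3128",
        "https": "http://154.16.202.22:3128",  # note: http://, not https://
    }

    # r = requests.get("https://httpbin.org/ip", proxies=proxies)

The dictionary key ("https") selects which target URLs use the proxy; the value's scheme describes how to talk to the proxy itself, and those two need not match.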

(Beautiful Soup) Get data inside a button tag

早过忘川 submitted on 2021-01-28 14:11:45
Question: I am trying to scrape an ImageId from inside a button tag; the result I want is "25511e1fd64e99acd991a22d6c2d6b6c". When I try:

    drawing_url = drawing_url.find_all('button', class_='inspectBut')['onclick']

it doesn't work, giving the error:

    TypeError: list indices must be integers or slices, not str

Input:

    for article in soup.find_all('div', class_='dojoxGridRow'):
        drawing_url = article.find('td', class_='dojoxGridCell', idx='3')
        drawing_url = drawing_url.find_all('button', class_='inspectBut') …
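`find_all()` returns a list of tags, hence the TypeError when it is indexed with a string; take a single element (or use `find()`) before reading the attribute, then pull the hex id out of the onclick text. A sketch against stand-in markup (the exact JavaScript in the real onclick is an assumption, but the extraction pattern is the same):

    import re
    from bs4 import BeautifulSoup

    # Stand-in for one grid cell from the question's page.
    html = """
    <td class="dojoxGridCell" idx="3">
      <button class="inspectBut"
              onclick="inspect('25511e1fd64e99acd991a22d6c2d6b6c')"></button>
    </td>
    """

    soup = BeautifulSoup(html, "html.parser")
    # find() returns one tag, so attribute access with ['onclick'] works.
    button = soup.find("button", class_="inspectBut")
    onclick = button["onclick"]
    # The ImageId is a 32-character lowercase hex string inside the JS call.
    image_id = re.search(r"[0-9a-f]{32}", onclick).group(0)
    print(image_id)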

Converting curl with --form to python requests

拜拜、爱过 submitted on 2021-01-28 13:41:39
Question: I have a curl request like this:

    curl -X POST http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen -F "data={\"screen\":{\"screen-id\":\"57675\"}}"

I am trying to convert it to Python using something like this:

    import requests
    import json

    url = "http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen"
    payload = {"data": json.dumps({"screen": ["screen-id", "57675"]})}
    req = requests.post(url, data=payload)
    print(req.text)

but I get the following error:

    io …
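Two things differ from the curl command: `-F` sends multipart/form-data (so requests needs `files=`, not `data=`), and the field value in curl is a JSON object, `{"screen": {"screen-id": "57675"}}`, whereas the Python attempt built a list. A closer sketch:

    import json
    import requests

    # Reproduce curl's exact field value: a JSON object, not a list.
    payload = {"screen": {"screen-id": "57675"}}
    # -F means multipart/form-data; a (None, value) tuple makes requests
    # send "data" as a plain form field with no filename.
    files = {"data": (None, json.dumps(payload))}

    # req = requests.post("http://mdom-n-plus-1.nonprod.appsight.us:8081/mesmerdom/v1/getByScreen", files=files)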