Question
I want to scrape data from this website. After visiting it, I need to select the 'TIN' radio button, enter the TIN '27680809621V', and click the Submit button. I don't know how to do this and I'm stuck, because the inputs have no name or value attributes.
import requests
from bs4 import BeautifulSoup

s = requests.Session()
req = s.get('https://mahagst.gov.in/en/know-your-taxpayer')
soup = BeautifulSoup(req.text, 'lxml')
# Collect every <input> that has a name attribute into {name: value}
dictinfo = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
Can someone please help me?
Answer 1:
Submitting the form makes a GET request to an OData endpoint, with the selected TIN passed as a filter :) You can call that endpoint directly and get a JSON response back, so there is no need for BeautifulSoup.
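from requests import Session

s = Session()
# Browser-like User-Agent; ask the OData service for JSON instead of XML
headers = {
    'User-Agent': ('Mozilla/5.0 (X11; Linux x86_64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/75.0.3770.80 Safari/537.36'),
    'Accept': 'application/json',
}
# Send these headers with every request made through the session
s.headers.update(headers)

BASE_URL = 'https://mahagst.gov.in/sap/opu/odata/sap/ZMSTD_KYT_SRV/TinDetailSet'
# OData $filter expression: the same query the form sends when "TIN" is selected
params = {
    "$filter": "(Tin eq '27680809621V')"
}
r = s.get(BASE_URL, params=params)
r.raise_for_status()  # stop with an error on a non-2xx response
data = r.json()
print(data)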
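I found the URL and params by inspecting the network requests the page sends when the form is submitted (e.g. in the browser's developer tools).
The response is clean JSON, which r.json() turns into a Python dictionary :)
It is made of nested dictionaries and lists, so you can use ordinary indexing to get the fields out, e.g. data['d']['results'].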
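A minimal sketch of pulling the records out of the parsed response, assuming the usual OData v2 JSON layout in which a collection's records sit in a list under data['d']['results']. The helper name extract_results and the field names in the sample payload (Tin, Name) are hypothetical placeholders, not the service's actual schema.

```python
from typing import Any


def extract_results(data: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the list of records from an OData v2 JSON payload.

    Assumes the layout {"d": {"results": [...]}}; returns an empty
    list if either key is missing.
    """
    return data.get("d", {}).get("results", [])


# Hypothetical payload shaped like an OData v2 response; the real
# field names returned by TinDetailSet may differ.
sample = {
    "d": {
        "results": [
            {"Tin": "27680809621V", "Name": "EXAMPLE NAME"},
        ]
    }
}

for record in extract_results(sample):
    # Print every field of each record without assuming its names
    for key, value in record.items():
        print(f"{key}: {value}")
```

With a real response you would call extract_results(r.json()) instead of using the sample dictionary.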
Hope this helps :)
Answer 2:
You can probably get the content you need from the same URL the website uses, i.e. https://mahagst.gov.in/sap/opu/odata/sap/ZMSTD_KYT_SRV/TinDetailSet?$filter=(Tin eq '27680809621V'), replacing the TIN number with the one you want.
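A small sketch of swapping in a different TIN, assuming the endpoint accepts any TIN in the same $filter expression. It lets requests handle the URL encoding and escapes single quotes by doubling them, which is how OData string literals are quoted. The helper names odata_string and build_tin_request are hypothetical, and the request is only prepared here, not sent.

```python
from urllib.parse import parse_qs, urlsplit

import requests

BASE_URL = "https://mahagst.gov.in/sap/opu/odata/sap/ZMSTD_KYT_SRV/TinDetailSet"


def odata_string(value: str) -> str:
    """Quote a value as an OData string literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def build_tin_request(tin: str) -> requests.PreparedRequest:
    """Prepare (but do not send) the GET request for one TIN."""
    params = {"$filter": f"(Tin eq {odata_string(tin)})"}
    headers = {"Accept": "application/json"}
    return requests.Request("GET", BASE_URL, params=params, headers=headers).prepare()


prepared = build_tin_request("27680809621V")
print(prepared.url)  # percent-encoded query string
# Decode the query again to check the filter that will be sent
print(parse_qs(urlsplit(prepared.url).query))

# To actually fetch the data:
# with requests.Session() as s:
#     data = s.send(prepared).json()
```

Building the filter through params, rather than pasting the TIN into a URL string by hand, keeps the spaces, quotes and parentheses correctly encoded for any TIN you pass in.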
Alternatively, you could use Selenium to check the radio button, fill the input, and get the data.
Source: https://stackoverflow.com/questions/56609913/how-to-scrape-data-bypassing-radio-button-using-request-in-python-3