Question
I am trying to write a web-scraping program in Python. However, the pages I want to scrape are behind a login. I have an account and have been trying to follow the help posted here. I think I have done everything right, but I cannot get past the login. My code is posted below:
#!/usr/bin/env python
import requests, sys, lxml.html
#logging in
s = requests.Session()
login_url = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/'
payload = {
    'ssn' : 'USERNAME',
    'pin' : 'PASSWORD'
}
s.post(login_url, data=payload, headers=dict(referer='https://login.fidelity.com'))
#page to scrape
response = s.get('https://fixedincome.fidelity.com/ftgw/fi/FIBondDetails?requestType=&displayFormat=TABLE&cusip=30382LDK1&ordersystem=TORD&preferenceName=')
print(response.content)  # redirected to the login page
Answer 1:
You are missing a few things.
The login URL is
login_url = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/Response/dj.chf.ra'
You also need to pass these two additional parameters in the POST:
'DEVICE_PRINT' : 'version%3D3.4.2.0_1%26pm_fpua%3Dmozilla%2F5.0+(x11%3B+linux+x86_64%3B+rv%3A41.0)+gecko%2F20100101+firefox%2F41.0%7C5.0+(X11)%7CLinux+x86_64%',
'SavedIdInd' : 'N',
The field names are SSN and PIN (upper case), not ssn and pin.
I tried this URL after logging in, and it worked for me:
response = s.get('https://oltx.fidelity.com/ftgw/fbc/oftop/portfolio')
print(response.content)
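Putting the pieces of this answer together, a minimal sketch might look like the following. The helper names (build_login_payload, login, redirected_to_login) are hypothetical and not part of requests. The DEVICE_PRINT value is passed in as an argument rather than hard-coded, on the assumption that it is copied from a real browser login. The value quoted above looks already percent-encoded, and requests form-encodes dict values itself, so whether it needs decoding first is untested here. The redirect check assumes a failed login ends up back on the login host, as the question describes.

```python
# URL and field names are taken from the answer above.
LOGIN_URL = 'https://login.fidelity.com/ftgw/Fas/Fidelity/RtlCust/Login/Response/dj.chf.ra'
LOGIN_HOST_PREFIX = 'https://login.fidelity.com/'


def build_login_payload(username, password, device_print):
    """Return the form fields: upper-case SSN/PIN plus the two extra params."""
    return {
        'SSN': username,
        'PIN': password,
        'DEVICE_PRINT': device_print,  # assumed: captured from a browser login
        'SavedIdInd': 'N',
    }


def login(session, username, password, device_print):
    """POST the login form on an existing requests.Session; return the response."""
    payload = build_login_payload(username, password, device_print)
    return session.post(
        LOGIN_URL,
        data=payload,
        headers={'referer': 'https://login.fidelity.com'},
    )


def redirected_to_login(response):
    """Heuristic (assumption): the final URL is still on the login host."""
    return response.url.startswith(LOGIN_HOST_PREFIX)


# Usage (needs the requests package and real credentials):
#   import requests
#   s = requests.Session()
#   login(s, 'USERNAME', 'PASSWORD', device_print_value)
#   response = s.get('https://oltx.fidelity.com/ftgw/fbc/oftop/portfolio')
#   if redirected_to_login(response):
#       raise RuntimeError('login did not succeed')
```

Passing the session in as an argument keeps the cookies set by the login POST available to later s.get() calls, which is the point of using requests.Session in the first place.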
Source: https://stackoverflow.com/questions/45084888/how-to-use-the-requests-python-module-to-login-to-fidelity-com