Logging in to a web site with Python (urllib, urllib2, cookielib): How does one find necessary information for submission?

Submitted by ╄→尐↘猪︶ㄣ on 2020-01-04 13:47:19

Question


Preface: I understand that there are many answers to similar questions on Stack Overflow. However, I haven't found anything relating to ASPX logins, nor an exact case such as this.

Problem: I need to determine what information is necessary in order to log in to https://cableone.net/login.aspx in order to scrape information from there.

Progress: Thus far I have found the input fields in the source of login.aspx and have scraped together a script in Python with urllib, urllib2, and cookielib. In my script I ignored anything that had a blank value.

<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM" />
<input name="ctl00$plhMain$txtUserName" type="text" id="ctl00_plhMain_txtUserName" />
<input name="ctl00$plhMain$txtPassword" type="password" id="ctl00_plhMain_txtPassword" />
<input type="submit" name="ctl00$plhMain$btnLogin" value="Login" id="ctl00_plhMain_btnLogin" />
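
As an aside, the fields above do not have to be copied by hand. The following is a minimal sketch, not a tested solution for this site, that collects every named <input> from a page's HTML with the standard-library HTML parser. The names InputCollector and collect_inputs are made up for illustration. Note that it keeps blank-valued fields such as __EVENTTARGET, since a browser submits those too.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class InputCollector(HTMLParser):
    """Collect name -> value for every <input> tag that has a name."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<input ... />) are routed here as well.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:
            # A missing value attribute is treated as an empty string.
            self.fields[name] = attrs.get("value") or ""


def collect_inputs(html):
    """Return a dict of form field names and their current values."""
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Feeding it the source of the login page fetched through the same cookie-aware opener would give a dictionary that can be updated with the user name and password before encoding, assuming the page contains a single form.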

I then used the above input values with Python and urllib as follows.

import urllib, urllib2
from cookielib import CookieJar

url = 'https://myaccount.cableone.net/Login.aspx'

# Keep cookies between requests so the session survives the login.
cj = CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Determine what I need to change with these values
formValues = {
    "__VIEWSTATE": "/wEPDwUIMzc1NzEwOTZkZFAEfkjXC+VNsqYoayGxa5/q4srT",
    "__EVENTVALIDATION": "/wEWBAK6lKDUCwLVx7ufCQL/+N3OBwLFgNGYD6KeUd6uNDBwc5zcR0u4hqrwv1fM",
    "ctl00$plhMain$txtUserName": "myAccount",
    "ctl00$plhMain$txtPassword": "myPassword"
    }

# URL-encode the fields; passing data to open() makes this a POST request.
data = urllib.urlencode(formValues)

response = opener.open(url, data)
thePage = response.read()
httpheaders = response.info()
print thePage

Answer 1:


The approach you outlined will be difficult if the form is dynamic in any way, for example if hidden values such as __VIEWSTATE change between page loads. A more general way is to install Google Chrome Canary, which has good developer tools, open them on the login page ("Inspect" in the page's context menu), go to the "Network" tab, and tick "Preserve log". (You may need the Canary version because, if I'm not mistaken, the regular one doesn't catch some of the data.)

With all this open, click "Login" on the page. You'll see every request with its headers, including the exact POST data that is sent to the server.
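
If the developer tools show the form data as a raw URL-encoded string, it can be turned back into field/value pairs for a script. This is only a sketch: the captured_body string below is a shortened, made-up example, not the real request for this site, and body_to_fields is a hypothetical helper name.

```python
try:
    from urllib.parse import parse_qsl, urlencode  # Python 3
except ImportError:
    from urlparse import parse_qsl  # Python 2
    from urllib import urlencode


def body_to_fields(raw_body):
    """Decode a form-encoded POST body into an ordered list of pairs."""
    # keep_blank_values=True keeps fields such as __EVENTTARGET that are
    # sent with an empty value; by default they would be dropped.
    return parse_qsl(raw_body, keep_blank_values=True)


# Hypothetical body copied from the Network tab (shortened for illustration).
captured_body = (
    "__EVENTTARGET=&__VIEWSTATE=%2FwEP%2Bx"
    "&ctl00%24plhMain%24txtUserName=myAccount"
)

fields = body_to_fields(captured_body)

# Re-encode the pairs to get a body that can be passed to opener.open().
data = urlencode(fields)
```

Keeping the result as a list of pairs rather than a dict preserves the order in which the browser sent the fields.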

Now you can replay that data from your script and remove fields one at a time to find out which ones the server actually requires. Another option for testing the requests is Advanced REST Client, by the way.
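
The "remove it one by one" step could be automated roughly like this. It is a sketch under assumptions: find_required is a made-up name, and attempt stands for a function you would write yourself that posts the given fields and returns True when the login succeeded (for instance by checking the response for text that only appears when logged in).

```python
def find_required(fields, attempt):
    """Return the field names whose removal makes attempt() fail.

    fields  -- dict of every field captured from the browser
    attempt -- caller-supplied callable: takes a dict of fields and
               returns True if the login worked with just those fields
    """
    required = []
    for name in sorted(fields):
        # Copy the captured fields, leaving out one field at a time.
        trimmed = dict((k, v) for k, v in fields.items() if k != name)
        if not attempt(trimmed):
            required.append(name)
    return required
```

This only tests one missing field at a time, so it will not notice fields that are optional individually but required in combination. Each attempt should probably also start from a fresh cookie jar so that an earlier successful login does not mask a failure.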



Source: https://stackoverflow.com/questions/15887345/logging-in-to-a-web-site-with-python-urllib-urllib2-cookielib-how-does-one-fi
