Question
I'm trying to write a program that fetches my attendance information from my college website. To do that, I wrote a script that logs in to the website, which takes me to my dashboard, then goes to the Attendance tab, gets the href, and appends it to the URL of the college website.
The tag in the attendance class looked like this:
<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>
When I clicked the Attendance link, the URL in the address bar of the resulting page looked like this:
http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8= .
So it seemed self-explanatory that I was supposed to append the href to
'http://erp.college_name.edu'. That is what I did:
L = 'http://erp.college_name.edu' + str(I.findAll('li')[4].a.get('href').replace('.', ''))
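A minimal sketch of building the same URL with urllib.parse.urljoin instead of string concatenation, so that the leading ".." is resolved without touching the "." in ".aspx". The dashboard page name Dashboard.aspx is an assumption made for illustration; it is not given anywhere in the question.

```python
from urllib.parse import urljoin


def resolve_href(page_url, href):
    """Resolve a relative href against the URL of the page it was found on."""
    return urljoin(page_url, href)


# Hypothetical URL of the page that contains the Attendance link.
dashboard_url = 'http://erp.college_name.edu/Student/Dashboard.aspx'

# The href exactly as it appears in the tag above.
href = '../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8='

print(resolve_href(dashboard_url, href))
```

With these inputs, the ".." segment is resolved against /Student/ and the query string is left unchanged, so the result has the same form as the address-bar URL shown earlier.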
But when I fetch the href, it is different from the one in the tag, and it keeps changing. When I print L, I get this, which has the form I expected:
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=
But the href I'm getting is different from the real URL, and it keeps changing every time I re-run the program. The second time, I got:
http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=
Why am I getting this? Moreover, when I click on other links on my dashboard page and then click on the Attendance tab again, the href value in the address-bar URL changes again.
After that, when I did
opens = requests.get(L)
soup_2 = BeautifulSoup(opens.text, 'lxml')
print(L)
I got this:
C:\Users\HUNTER\AppData\Local\Programs\Python\Python35-32\python.exe
C:/Users/HUNTER/PycharmProjects/dictionary/erp_1.py
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html><head><title>The page cannot be found</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<style type="text/css">
BODY { font: 8pt/12pt verdana }
H1 { font: 13pt/15pt verdana }
H2 { font: 8pt/12pt verdana }
A:link { color: red }
A:visited { color: maroon }
</style>
</head><body><table border="0" cellspacing="10" width="500"><tr><td>
<h1>The page cannot be found</h1>
The page you are looking for might have been removed, had its name
changed, or is temporarily unavailable.
<hr/>
<p>Please try the following:</p>
<ul>
<li>Make sure that the Web site address displayed in the address bar of
your browser is spelled and formatted correctly.</li>
<li>If you reached this page by clicking a link, contact
the Web site administrator to alert them that the link is incorrectly
formatted.
</li>
<li>Click the <a href="javascript:history.back(1)">Back</a> button to
try
another link.</li>
</ul>
<h2>HTTP Error 404 - File or directory not found.<br/>Internet
Information
Services (IIS)</h2>
<hr/>
<p>Technical Information (for support personnel)</p>
<ul>
<li>Go to <a href="http://go.microsoft.com/fwlink/?
linkid=8180">Microsoft
Product Support Services</a> and perform a title search for the words
<b>HTTP</b> and <b>404</b>.</li>
<li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
and search for topics titled <b>Web Site Setup</b>, <b>Common
Administrative
Tasks</b>, and <b>About Custom Error Messages</b>.</li>
</ul>
</td></tr></table></body></html>
Process finished with exit code 0
UPDATE
I replaced .replace('.', '') with [2:], because replace also removed the . from .aspx in the href, and the problem has now changed to this
But still, how does the value of the href keep changing, and how can I fetch that page?
Any help?
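One possible explanation, stated as an assumption rather than something the question confirms: the SID may be regenerated on every page load and tied to the logged-in session, in which case a bare requests.get(L) carries no login cookies. A minimal sketch of reading the current href and following it with the same session object is below. The function name fetch_attendance_html and the dashboard URL are hypothetical; the id aAttandance is taken from the tag shown in the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_attendance_html(session, dashboard_url):
    """Read the current Attendance href from the dashboard and follow it
    with the same session, so the cookies and the fresh SID go together."""
    dashboard = session.get(dashboard_url)
    soup = BeautifulSoup(dashboard.text, 'html.parser')

    # The id is spelled 'aAttandance' in the tag from the question.
    link = soup.find('a', id='aAttandance')
    if link is None:
        raise ValueError('Attendance link not found; is the session logged in?')

    # Resolve '../Student/...' against the page the link was found on.
    attendance_url = urljoin(dashboard_url, link['href'])
    return session.get(attendance_url).text
```

Here session would be a requests.Session() that has already logged in (the login request itself is not shown in the question), so the same cookies are sent on both requests. Since the SID appears to change on every load, the href is read and used within one call instead of being stored between runs.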
Source: https://stackoverflow.com/questions/42858967/how-to-get-the-exact-real-value-of-href