How to get the EXACT, REAL value of 'href'

时光毁灭记忆、已成空白 submitted on 2020-01-06 16:18:09

Question


I'm trying to write a program that fetches my attendance information from my college website. To do that, I wrote a script that logs in to the website, which takes me to my dashboard, then goes to the Attendance tab, gets the href, and appends it to the URL of the college website.
The tag in the attendance class looked like this:

<a href="../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=" id="aAttandance">Attendance</a>

When I clicked the Attendance link, the URL in the address bar looked like this:

http://erp.college_name.edu/Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8=

So it seemed self-explanatory that I was supposed to append the href to 'http://erp.college_name.edu'. So I did:

 L = 'http://erp.college_name.edu' + str(I.findAll('li')[4].a.get('href').replace('.', ''))

The problem is that the href I fetch is different from the one in the tag, and it keeps changing. When I print L, I get this, which has the form I expected:

http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=aDmK9cEFWwDqvsWw5ZzEOw==|oTeYVRfW1u8=

But the href I'm getting is different from the real URL, and IT KEEPS CHANGING WHEN I RE-RUN THE PROGRAM. The second time I got:

http://erp.college_name.edu/Student/StudentAttendanceViewaspx?SID=WM/lbVRchyyBiLsDvkORJw==|MaP8NtvvrHE=

Why am I getting this? Moreover, when I click other links on my Dashboard page and then click the Attendance tab again, the href value in the address bar URL changes again.

After that, when I did

opens = requests.get(L)
soup_2 = BeautifulSoup(opens.text, 'lxml')
print(L)  

I got this:

    C:\Users\HUNTER\AppData\Local\Programs\Python\Python35-32\python.exe 
    C:/Users/HUNTER/PycharmProjects/dictionary/erp_1.py
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" 
   "http://www.w3.org/TR/html4/strict.dtd">
  <html><head><title>The page cannot be found</title>
   <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
 <style type="text/css">
    BODY { font: 8pt/12pt verdana }
    H1 { font: 13pt/15pt verdana }
    H2 { font: 8pt/12pt verdana }
   A:link { color: red }
    A:visited { color: maroon }
 </style>
 </head><body><table border="0" cellspacing="10" width="500"><tr><td>
  <h1>The page cannot be found</h1>
  The page you are looking for might have been removed, had its name 
 changed, or is temporarily unavailable.
 <hr/>
 <p>Please try the following:</p>
 <ul>
  <li>Make sure that the Web site address displayed in the address bar of 
your browser is spelled and formatted correctly.</li>
  <li>If you reached this page by clicking a link, contact
    the Web site administrator to alert them that the link is incorrectly 
   formatted.
    </li>
    <li>Click the <a href="javascript:history.back(1)">Back</a> button to 
 try 
   another link.</li>
     </ul>
       <h2>HTTP Error 404 - File or directory not found.<br/>Internet 
    Information 
   Services (IIS)</h2>
<hr/>
 <p>Technical Information (for support personnel)</p>
 <ul>
     <li>Go to <a href="http://go.microsoft.com/fwlink/?
     linkid=8180">Microsoft 
       Product Support Services</a> and perform a title search for the words 
    <b>HTTP</b> and <b>404</b>.</li>
  <li>Open <b>IIS Help</b>, which is accessible in IIS Manager (inetmgr),
  and search for topics titled <b>Web Site Setup</b>, <b>Common 
   Administrative 
  Tasks</b>, and <b>About Custom Error Messages</b>.</li>
   </ul>
    </td></tr></table></body></html>


  Process finished with exit code 0
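
The 404 is consistent with the URL printed earlier, where the path ends in "StudentAttendanceViewaspx" with no dot. A minimal sketch of the difference between string surgery and the standard library's urljoin follows. The dashboard URL used as the base is hypothetical (it is not given in this post); in a real script it would be the URL of the page the href was scraped from.

```python
from urllib.parse import urljoin

# href copied from the tag shown earlier in this post
href = "../Student/StudentAttendanceView.aspx?SID=7JyKkZE1Eyx2EYNii7tOjQ==|yaE7DmfR9r8="

# Hypothetical dashboard URL (an assumption); in a real script this would be
# response.url of the page the href was scraped from.
page_url = "http://erp.college_name.edu/Student/Dashboard.aspx"

# replace('.', '') strips every dot, including the one in ".aspx"
broken = "http://erp.college_name.edu" + href.replace(".", "")

# urljoin resolves the leading "../" against the page URL and leaves
# the file extension and the query string untouched
resolved = urljoin(page_url, href)

print(broken)
print(resolved)
```

Using urljoin avoids having to know how many leading characters to cut, so it keeps working if the site changes the relative prefix.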

UPDATE

I replaced the .replace('.', '') call with [2:], because replace also removed the . from .aspx in the href, and the problem has now changed to this

But still, why does the value of href keep changing, and how can I fetch that page?
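
One possible explanation, not confirmed by anything in this post, is that the server generates a fresh SID each time the dashboard is rendered and ties it to the login session. If so, a changing href would be expected, and the link would only work when requested with the same cookies that fetched the dashboard, which a bare requests.get(L) does not send. A minimal sketch under that assumption follows. The names fetch_attendance and LinkById are hypothetical, the id "aAttandance" is taken from the tag above, and the standard-library HTML parser is swapped in for BeautifulSoup only to keep the sketch dependency-free.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkById(HTMLParser):
    """Record the href of the first <a> tag whose id matches link_id."""

    def __init__(self, link_id):
        super().__init__()
        self.link_id = link_id
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("id") == self.link_id and self.href is None:
            self.href = attrs.get("href")


def fetch_attendance(session, dashboard_url, link_id="aAttandance"):
    """Fetch the dashboard, then follow the attendance link with the same session.

    `session` is assumed to be an already logged-in requests.Session
    (any object with a compatible .get() works).
    """
    dashboard = session.get(dashboard_url)
    parser = LinkById(link_id)
    parser.feed(dashboard.text)
    if parser.href is None:
        raise ValueError("attendance link not found; is the session logged in?")
    # Resolve the relative href against the URL the dashboard was served from.
    attendance_url = urljoin(dashboard.url, parser.href)
    # Reuse the session so the login cookies accompany the freshly issued SID.
    return session.get(attendance_url)
```

In use, the same requests.Session() object that posted the login form would be passed in as session, so the SID is read and followed within a single session instead of being rebuilt by hand.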

Any help?

Source: https://stackoverflow.com/questions/42858967/how-to-get-the-exact-real-value-of-href
