Scraping a website with Python 3 that requires login

Posted by 我是研究僧i on 2021-02-08 09:13:38

Question


I have a question about scraping a site that requires authentication. Here is what I have so far with BeautifulSoup:

# Import requests and BeautifulSoup
import requests
from bs4 import BeautifulSoup

# Fetch the login page
page = requests.get("http://localhost:8080/login?from=%2F")
# Parse the returned HTML
soup = BeautifulSoup(page.content, 'html.parser')
print(soup.prettify())

I think this part of the output is the important bit:

 <table>
   <tr>
    <td>
     User:
    </td>
    <td>
     <input autocapitalize="off" autocorrect="off" id="j_username" name="j_username" type="text"/>
    </td>
   </tr>
   <tr>
    <td>
     Password:
    </td>
    <td>
     <input name="j_password" type="password"/>
    </td>
   </tr>
   <tr>
    <td align="right">
     <input id="remember_me" name="remember_me" type="checkbox"/>
    </td>
    <td>
     <label for="remember_me">
      Remember me on this computer
     </label>
    </td>
   </tr>
  </table>

This fetches the page fine, but the site requires a login. Here is my attempt with the MechanicalSoup library:

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()
browser.open("http://localhost:8080/login?from=%2F")
browser.get_url()
browser.get_current_page()
browser.get_current_page().find_all('form')
browser["j_username"] = "admin"
browser["j_password"] = "password"
browser.launch_browser()

However, it still won't let me log in.

Has anyone used a scraping tool for Python 3 that can scrape a site behind authentication?


Answer 1:


I see you're using requests. For a site that uses HTTP authentication, you pass the credentials through the auth parameter:

import requests
page = requests.get("http://localhost:8080/login?from=%2F", auth=('username', 'password'))

Hope this helps! You can read more about authentication here: http://docs.python-requests.org/en/master/user/authentication/
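
Note that a tuple passed as auth= is sent as HTTP Basic credentials, which is a different mechanism from the HTML login form shown in the question. For a form-based login, a rough sketch with a requests session might look like the following. The field names j_username and j_password come from the question's HTML; the helper name form_login is made up for illustration, and the code assumes the login form is the first form on the page and is submitted with POST. It reads the form's action from the page rather than guessing it.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def form_login(session, login_url, username, password):
    """Fetch the login page, fill in its form, and POST it back.

    Hypothetical helper: assumes the first <form> on the page is the
    login form and that it is submitted with POST.
    """
    page = session.get(login_url)
    soup = BeautifulSoup(page.content, "html.parser")

    form = soup.find("form")
    if form is None:
        raise ValueError("no <form> found on the login page")

    # Start from the inputs already in the form so that hidden fields
    # are sent back unchanged. Checkboxes and submit buttons are skipped.
    payload = {
        field["name"]: field.get("value", "")
        for field in form.find_all("input")
        if field.get("name") and field.get("type") not in ("checkbox", "submit")
    }
    # Field names taken from the HTML in the question.
    payload["j_username"] = username
    payload["j_password"] = password

    # Resolve the form's action against the page URL; an empty action
    # means the form is submitted to the page's own URL.
    target = urljoin(login_url, form.get("action") or "")
    return session.post(target, data=payload)


# Usage sketch (needs the server from the question to be running):
# session = requests.Session()
# response = form_login(session, "http://localhost:8080/login?from=%2F",
#                       "admin", "password")
# print(response.status_code)
```

Using one requests.Session() for both the login and the later requests matters, because the session keeps whatever cookies the server sets at login.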




Answer 2:


With MechanicalSoup, you first need to select the form you want to fill in and submit. If the page has only one form, use:

browser.select_form()

Then, after filling in the form, you need to submit it:

browser.submit_selected()

You can read the (newly written) MechanicalSoup tutorial or look at examples such as logging in to GitHub with MechanicalSoup.



Source: https://stackoverflow.com/questions/47438699/scraping-a-website-with-python-3-that-requires-login
