Using loginform with scrapy

Backend · Unresolved · 2 answers · 616 views

暗喜 2020-12-15 14:10

The scrapy framework (https://github.com/scrapy/scrapy) provides a library for use when logging into websites that require authentication: https://github.com/scrapy/loginform.

2 Answers
  •  失恋的感觉
     2020-12-15 14:44

    loginform is just a library, totally decoupled from Scrapy.

    You have to write the code that plugs it into the spider you want, probably in a callback method.

    Here is an example of a structure to do this:

    import scrapy
    from loginform import fill_login_form
    
    class MySpiderWithLogin(scrapy.Spider):
        name = 'my-spider'
    
        start_urls = [
            'http://somewebsite.com/some-login-protected-page',
            'http://somewebsite.com/another-protected-page',
        ]
    
        login_url = 'http://somewebsite.com/login-page'
    
        login_user = 'your-username'
        login_password = 'secret-password-here'
    
        def start_requests(self):
            # let's start by sending a first request to login page
            yield scrapy.Request(self.login_url, self.parse_login)
    
        def parse_login(self, response):
            # got the login page, let's fill the login form...
            data, url, method = fill_login_form(response.url, response.body,
                                                self.login_user, self.login_password)
    
            # ... and send a request with our login data
            return scrapy.FormRequest(url, formdata=dict(data),
                               method=method, callback=self.start_crawl)
    
        def start_crawl(self, response):
            # OK, we're in, let's start crawling the protected pages
            for url in self.start_urls:
                yield scrapy.Request(url)
    
        def parse(self, response):
            # do stuff with the logged-in response
            pass
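
    Since loginform is decoupled from Scrapy, it can help to see what `fill_login_form` conceptually returns before wiring it into a spider. Below is a rough stdlib-only sketch of the idea, not the real library's implementation: parse the page's login form, fill the text/password inputs with the credentials, pass hidden fields (e.g. CSRF tokens) through unchanged, and return `(form data, submit URL, method)`. The function name `fill_login_form_sketch` and the sample HTML are illustrative only.

    ```python
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class LoginFormParser(HTMLParser):
        """Collect the first form's action, method, and input fields."""
        def __init__(self):
            super().__init__()
            self.action = ''
            self.method = 'GET'
            self.fields = []          # (name, value, input type) tuples
            self._in_form = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == 'form':
                self._in_form = True
                self.action = attrs.get('action', '')
                self.method = (attrs.get('method') or 'GET').upper()
            elif tag == 'input' and self._in_form and 'name' in attrs:
                self.fields.append((attrs['name'], attrs.get('value', ''),
                                    attrs.get('type', 'text')))

        def handle_endtag(self, tag):
            if tag == 'form':
                self._in_form = False

    def fill_login_form_sketch(url, body, username, password):
        parser = LoginFormParser()
        parser.feed(body)
        data = []
        for name, value, itype in parser.fields:
            if itype == 'password':
                data.append((name, password))
            elif itype in ('text', 'email'):
                data.append((name, username))
            else:                      # hidden fields, tokens, etc. kept as-is
                data.append((name, value))
        return data, urljoin(url, parser.action), parser.method

    html = '''<form action="/do-login" method="post">
    <input type="hidden" name="csrf" value="abc123">
    <input type="text" name="user">
    <input type="password" name="pass">
    </form>'''

    data, url, method = fill_login_form_sketch(
        'http://somewebsite.com/login-page', html, 'your-username', 'secret')
    # data   → [('csrf', 'abc123'), ('user', 'your-username'), ('pass', 'secret')]
    # url    → 'http://somewebsite.com/do-login'
    # method → 'POST'
    ```

    The triple returned here maps directly onto `scrapy.FormRequest(url, formdata=dict(data), method=method, ...)` in the spider above; the real loginform library does the same job more robustly (scoring candidate forms, handling more input types).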
