Scrapy FormRequest login not working

Submitted by 僤鯓⒐⒋嵵緔 on 2021-02-08 07:51:41

Question


I'm trying to log in with Scrapy, but I keep getting "Redirecting (302)" messages. This happens both with my real credentials and with fake ones. I also tried another site, with no luck.

import scrapy
from scrapy.http import FormRequest, Request

class LoginSpider(scrapy.Spider):
    name = 'SOlogin'
    allowed_domains = ['stackoverflow.com']

    login_url = 'https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f'
    test_url = 'http://stackoverflow.com/questions/ask'

    def start_requests(self):
        yield Request(url=self.login_url, callback=self.parse_login)

    def parse_login(self, response):
        return FormRequest.from_response(response, formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl)

    def start_crawl(self, response):
        yield Request(self.test_url, callback=self.parse_item)

    def parse_item(self, response):
        print("Test URL " + response.url)

I also tried adding

meta = {'dont_redirect': True, 'handle_httpstatus_list':[302]} 

to the initial Request and the FormRequest.

Here's the output from the code above:

2017-04-17 21:48:17 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: stackoverflow)
2017-04-17 21:48:17 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'stackoverflow', 'NEWSPIDER_MODULE': 'stackoverflow.spiders', 'SPIDER_MODULES': ['stackoverflow.spiders'], 'USER_AGENT': 'Mozilla/5.0'}
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled extensions: ['scrapy.extensions.corestats.CoreStats', 'scrapy.extensions.telnet.TelnetConsole', 'scrapy.extensions.logstats.LogStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled downloader middlewares: ['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 'scrapy.downloadermiddlewares.retry.RetryMiddleware', 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled spider middlewares: ['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 'scrapy.spidermiddlewares.referer.RefererMiddleware', 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 21:48:17 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-04-17 21:48:17 [scrapy.core.engine] INFO: Spider opened
2017-04-17 21:48:17 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-04-17 21:48:17 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f> (referer: None)
2017-04-17 21:48:18 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX> (referer: https://stackoverflow.com/users/login?ssrc=head&returnurl=http%3a%2f%2fstackoverflow.com%2f)
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/questions/ask>
2017-04-17 21:48:19 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> from <GET http://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask>
2017-04-17 21:48:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask> (referer: https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX)
Test URL https://stackoverflow.com/users/login?ssrc=anon_ask&returnurl=http%3a%2f%2fstackoverflow.com%2fquestions%2fask
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Closing spider (finished)
2017-04-17 21:48:19 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1772,
 'downloader/request_count': 5,
 'downloader/request_method_count/GET': 5,
 'downloader/response_bytes': 34543,
 'downloader/response_count': 5,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/302': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 4, 17, 18, 48, 19, 470354),
 'log_count/DEBUG': 6,
 'log_count/INFO': 7,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 5,
 'scheduler/dequeued/memory': 5,
 'scheduler/enqueued': 5,
 'scheduler/enqueued/memory': 5,
 'start_time': datetime.datetime(2017, 4, 17, 18, 48, 17, 386516)}
2017-04-17 21:48:19 [scrapy.core.engine] INFO: Spider closed (finished)
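Decoding the second "Crawled (200)" URL from the log with Python's standard library makes the problem visible: the credentials went out as query-string parameters to /search, next to an empty q field, and the stats count only GET requests ('downloader/request_method_count/GET': 5), so no login POST was ever sent. This is a small diagnostic sketch; the URL is copied verbatim from the log, and nothing here involves Scrapy itself.

```python
from urllib.parse import parse_qs, urlsplit

# Second "Crawled (200)" URL, copied verbatim from the log.
logged_url = "https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX"

parts = urlsplit(logged_url)
# keep_blank_values=True keeps the empty search box (q=) instead of dropping it.
params = parse_qs(parts.query, keep_blank_values=True)

print(parts.path)      # /search
print(sorted(params))  # ['email', 'password', 'q']
```

The q parameter belongs to the site's search box, which is a strong hint about which form the spider actually submitted.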


Answer 1:


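By default, FormRequest.from_response() fills your email and password into the first form on the page (formnumber=0), and on the Stack Overflow login page the first form is the search form. That is why the log shows your credentials being sent as a GET to https://stackoverflow.com/search?q=&email=XXXXX&password=XXXXX instead of to the login endpoint. Because no login ever happened, the later request to /questions/ask is redirected (302) back to the login page. You need to select the login form explicitly with formname or formid, e.g. FormRequest.from_response(response, formid="login-form", formdata={"email": "XXXXX", "password": "XXXXX"}, callback=self.start_crawl). See the FormRequest.from_response() documentation.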



Source: https://stackoverflow.com/questions/43457801/scrapy-formrequest-login-not-working
