How to write customize Downloader Middleware for selenium and Scrapy?

天涯浪子 提交于 2019-12-19 11:32:49

问题


I am having issue communicating between selenium and scrapy object.

I am using selenium to login to some site, once I get that response I want to use scrape's functionaries to parse and process. Please can some one help me writing middleware so that every request should go through selenium web driver and response should be pass to scrapy.

Thank you!


回答1:


It's pretty straightforward, create a middleware with a webdriver and use process_request to intercept the request, discard it and use the url it had to pass it to your selenium webdriver:

from scrapy.http import HtmlResponse
from selenium import webdriver


class DownloaderMiddleware(object):
    def __init__(self):
        self.driver = webdriver.Chrome()  # your chosen driver

    def process_request(self, request, spider):
        # only process tagged request or delete this if you want all
        if not request.meta.get('selenium'):
            return
        self.driver.get(request.url)
        body = self.driver.page_source
        response = HtmlResponse(url=self.driver.current_url, body=body)
        return response

The downside of this is that you have to get rid of the concurrency in your spider since selenium webdrive can only handle one url at a time. For that see settings documentation page.



来源:https://stackoverflow.com/questions/40268815/how-to-write-customize-downloader-middleware-for-selenium-and-scrapy

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!