Question
I am having an issue communicating between Selenium and Scrapy objects.
I am using Selenium to log in to a site; once I get that response, I want to use Scrapy's functionality to parse and process it. Can someone please help me write a downloader middleware so that every request goes through the Selenium webdriver and the response is passed back to Scrapy?
Thank you!
Answer 1:
It's pretty straightforward: create a middleware that holds a webdriver and use process_request to intercept each request, discard it, and pass its URL to your Selenium webdriver instead:
from scrapy.http import HtmlResponse
from selenium import webdriver

class DownloaderMiddleware(object):

    def __init__(self):
        self.driver = webdriver.Chrome()  # your chosen driver

    def process_request(self, request, spider):
        # only process requests tagged with meta={'selenium': True};
        # delete this check if every request should go through the driver
        if not request.meta.get('selenium'):
            return
        # fetch the page with the webdriver and return its rendered source,
        # short-circuiting Scrapy's own downloader
        self.driver.get(request.url)
        body = self.driver.page_source
        return HtmlResponse(url=self.driver.current_url, body=body,
                            encoding='utf-8', request=request)
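To wire this up, enable the middleware in your project settings and tag the requests that should be rendered by Selenium. A minimal sketch, assuming the middleware lives in a hypothetical module myproject.middlewares (the module path, spider name, and URL are placeholders, not from the original answer):

# settings.py
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.DownloaderMiddleware': 543,
}

# spider
import scrapy

class LoginSpider(scrapy.Spider):
    name = 'login'

    def start_requests(self):
        # meta={'selenium': True} routes this request through the webdriver
        yield scrapy.Request('https://example.com/login',
                             callback=self.parse,
                             meta={'selenium': True})

    def parse(self, response):
        # response is the HtmlResponse built from Selenium's page source
        self.logger.info('page title: %s', response.css('title::text').get())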
The downside of this is that you have to give up concurrency in your spider, since the Selenium webdriver can only handle one URL at a time. For that, see the settings documentation page.
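In practice, that means capping Scrapy's request concurrency. A minimal sketch of the relevant settings (the exact values are my assumption, not part of the original answer):

# settings.py
CONCURRENT_REQUESTS = 1
CONCURRENT_REQUESTS_PER_DOMAIN = 1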
Source: https://stackoverflow.com/questions/40268815/how-to-write-customize-downloader-middleware-for-selenium-and-scrapy