Scrapy + Splash: can't select element

Submitted by 懵懂的女人 on 2019-12-03 00:49:14

Not a complete solution, but here is what I have so far:

import json
import re

import scrapy
from scrapy_splash import SplashRequest


class UberEatsSpider(scrapy.Spider):
    name = "ubereatspider"
    allowed_domains = ["ubereats.com"]

    def start_requests(self):
        script = """
        function main(splash)
            local url = splash.args.url
            -- forward the Scrapy request headers (User-Agent etc.) to the target site
            assert(splash:go{url, headers=splash.args.headers})
            assert(splash:wait(10))

            splash:set_viewport_full()

            -- splash:select() returns nil when nothing matches, so fail loudly instead
            local search_input = assert(splash:select('#address-selection-input'))
            search_input:send_text("Wall Street, New York")
            assert(splash:wait(5))

            local submit_button = assert(splash:select('button[class^=submitButton_]'))
            submit_button:click()

            assert(splash:wait(10))

            return {
                html = splash:html(),
                png = splash:png(),
            }
        end
        """
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36'
        }
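        # SplashRequest forwards `headers` to Splash (the script sees them as
        # splash.args.headers); `splash_headers` would only reach the Splash server itself
        yield SplashRequest('https://www.ubereats.com/new_york/', self.parse, endpoint='execute', args={
            'lua_source': script,
            'wait': 5
        }, headers=headers)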

    def parse(self, response):
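        # The store data is embedded as JSON in an inline <script> tag
        script = response.xpath("//script[contains(., 'cityName')]/text()").extract_first()
        if script is None:
            self.logger.warning("No script containing 'cityName' found on %s", response.url)
            return

        pattern = re.compile(r"window\.INITIAL_STATE = (\{.*?\});", re.DOTALL)
        match = pattern.search(script)
        if match:
            data = json.loads(match.group(1))
            for place in data["marketplace"]["marketplaceStores"]["data"]["entity"]:
                # yield items instead of printing so Scrapy's feed exports can collect them
                yield {"title": place["title"]}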

Note the changes in the Lua script: I located the search input, sent the search text to it, then located the "Find" button and clicked it. In the screenshot, the search results never appeared, no matter how long a delay I set, but I did manage to get the restaurant names from the script contents: the page embeds its state as JSON in window.INITIAL_STATE, which parse() extracts with a regular expression. Each place object contains all the information needed to filter for the restaurants you want.
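The extraction step in parse() can be tried outside Scrapy and Splash. Below is a minimal sketch, assuming the page still assigns window.INITIAL_STATE with the same key path used in the spider above. The helpers extract_places and filter_places, the title-keyword filter, and the sample data are all hypothetical, for illustration only; the real place objects may carry other fields that are better suited to filtering.

```python
import json
import re

# Same pattern as in the spider: capture the JSON object assigned to window.INITIAL_STATE
INITIAL_STATE_RE = re.compile(r"window\.INITIAL_STATE = (\{.*?\});", re.DOTALL)


def extract_places(script_text):
    """Return the list of place dicts embedded in the inline script, or [] if absent."""
    match = INITIAL_STATE_RE.search(script_text or "")
    if not match:
        return []
    data = json.loads(match.group(1))
    # Key path taken from the spider; it will break if the page structure changes
    return data["marketplace"]["marketplaceStores"]["data"]["entity"]


def filter_places(places, keyword):
    """Hypothetical filter: keep places whose title contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [place for place in places if keyword in place.get("title", "").lower()]


if __name__ == "__main__":
    # Made-up sample script text, only for illustration
    sample = (
        'window.INITIAL_STATE = {"marketplace": {"marketplaceStores": {"data": {"entity": '
        '[{"title": "Joe\'s Pizza"}, {"title": "Wall St Burger"}]}}}};'
    )
    for place in filter_places(extract_places(sample), "burger"):
        print(place["title"])  # prints: Wall St Burger
```

Because the pattern is non-greedy, it stops at the first "};" it finds, so it assumes that sequence never appears inside the embedded JSON itself.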

Also note that I'm navigating to the New York URL (https://www.ubereats.com/new_york/), not the general "stores" page.

I'm not sure why the search results page isn't loading, but I hope this is a good starting point and that you can improve on it.
