scrapy-splash

I'm getting JavaScript code instead of rendered HTML content with scrapy-splash

二次信任 submitted on 2020-07-03 17:30:08
Question: I'm trying to use scrapy-splash to load a JavaScript-based page and get its rendered HTML content, but all I get in the response is JavaScript code. Why doesn't my spider execute the page's JavaScript? These are my Scrapy settings:

    SPLASH_URL = 'http://localhost:8050'
    DOWNLOADER_MIDDLEWARES = {
        'scrapy_splash.SplashCookiesMiddleware': 723,
        'scrapy_splash.SplashMiddleware': 725,
        'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
        'scrapy
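
For comparison, the scrapy-splash README recommends roughly the following project settings. This is a sketch of that configuration, not a reconstruction of the asker's file (which is cut off above), and the Splash URL assumes a local instance on port 8050. Settings alone do not render anything: the spider also has to send its requests through Splash, for example with SplashRequest and a non-zero `wait` argument. Whether that is what the asker's spider lacks is an assumption, since the spider code is not shown.

```python
# settings.py (sketch based on the scrapy-splash README; values are assumptions)

# Address of the running Splash instance (assumed local, default port).
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that avoids storing duplicate Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```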

How to navigate through js/ajax based pagination while scraping a website?

喜欢而已 submitted on 2020-04-17 21:58:17
Question: My code works only for the first page of each category, but I want to scrape all the pages of each category. I'm not able to navigate to the next pages. The website uses AJAX to populate the data when I click the next button. I have also looked into the AJAX request that the website makes to populate the data dynamically (this is the URL that shows up in the network tab when I click the next-page button: https://www.couponcodesme.com/ae
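
When a site paginates through an AJAX endpoint, one common approach is to request that endpoint directly and step through its page parameter instead of simulating clicks. The sketch below shows only the URL-stepping part. The endpoint `https://example.com/ajax/offers` and the parameter name `page` are hypothetical placeholders, because the real URL is cut off in the excerpt.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def next_page_url(url, param="page"):
    """Return `url` with its page query parameter incremented by one.

    A missing parameter is treated as page 1, so the result asks for page 2.
    The parameter name is an assumption; check the request in the network tab.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    current = int(query.get(param, ["1"])[0])
    query[param] = [str(current + 1)]
    # doseq=True writes each list value back as a separate key=value pair.
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


# Hypothetical endpoint, for illustration only.
print(next_page_url("https://example.com/ajax/offers?category=shoes&page=2"))
```

Inside a spider callback, the same helper could feed `scrapy.Request(next_page_url(response.url), callback=self.parse)` for as long as the current response still contains items.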

Scraping a web page containing the anchor tag <a href="#"> using Scrapy

陌路散爱 submitted on 2020-01-24 20:28:10
Question: I am scraping Manulife and want to go to the next page. When I inspect the "Next" link, I get:

    <span class="pagerlink">
        <a href="#" id="next" title="Go to the next page">Next</a>
    </span>

What would be the right approach to follow it?

    # -*- coding: utf-8 -*-
    import scrapy
    import json
    from scrapy_splash import SplashRequest

    class Manulife(scrapy.Spider):
        name = 'manulife'
        #allowed_domains = ['https://manulife.taleo.net/careersection/external_global/jobsearch.ftl?lang=en']
        start_urls = ['https://manulife
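
A link with `href="#"` has no URL to follow, so the click has to happen inside the browser. One possible approach, sketched here under assumptions, is a Splash Lua script that clicks the `#next` element and returns the updated HTML. The helper name `click_next_args` and the wait times are illustrative, and the page may need longer waits or a different selector.

```python
# Lua script for Splash's "execute" endpoint: load the page, click the
# element with id "next" if it exists, wait, and return the resulting HTML.
CLICK_NEXT_LUA = """
function main(splash, args)
  assert(splash:go(args.url))
  assert(splash:wait(args.wait))
  local next_link = splash:select('#next')
  if next_link then
    next_link:mouse_click()
    assert(splash:wait(args.wait))
  end
  return {html = splash:html()}
end
"""


def click_next_args(wait=1.0):
    """Build the Splash args dict for the script above (hypothetical helper).

    scrapy-splash adds the request URL to the args itself, so only the
    script source and the wait time are set here.
    """
    return {"lua_source": CLICK_NEXT_LUA, "wait": wait}
```

In a spider this would be passed as `SplashRequest(url, self.parse, endpoint='execute', args=click_next_args())`, and the callback would read the HTML from `response.data['html']`.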

Scrapy CrawlSpider + Splash: how to follow links through linkextractor?

青春壹個敷衍的年華 submitted on 2020-01-12 07:42:04
Question: I have the following code, which is only partially working:

    class ThreadSpider(CrawlSpider):
        name = 'thread'
        allowed_domains = ['bbs.example.com']
        start_urls = ['http://bbs.example.com/diy']

        rules = (
            Rule(LinkExtractor(
                    allow=(),
                    restrict_xpaths=("//a[contains(text(), 'Next Page')]")
                ),
                callback='parse_item',
                process_request='start_requests',
                follow=True),
        )

        def start_requests(self):
            for url in self.start_urls:
                yield SplashRequest(url, self.parse_item, args={'wait': 0.5})

        def parse_item(self,
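
In the code above, `process_request='start_requests'` points the rule at a method that ignores the extracted request and re-yields the start URLs. A rule's `process_request` hook is expected to take the extracted request and return it (or a replacement). A minimal sketch of such a hook follows. It uses the `splash` meta key that scrapy-splash documents as an alternative to SplashRequest; the function name `use_splash` and the 0.5 s wait are assumptions.

```python
# Seconds Splash waits after page load before rendering (assumed value).
SPLASH_WAIT = 0.5


def use_splash(request, response=None):
    """Rule hook: mark an extracted request so scrapy-splash renders it.

    `request` only needs a dict-like `meta` attribute, as scrapy.Request has.
    `response` is accepted because newer Scrapy versions pass it as well.
    """
    request.meta["splash"] = {
        "endpoint": "render.html",
        "args": {"wait": SPLASH_WAIT},
    }
    return request
```

The rule would then read `Rule(LinkExtractor(...), callback='parse_item', process_request=use_splash, follow=True)`. Depending on the Scrapy and scrapy-splash versions, CrawlSpider may still skip link extraction on Splash responses because they are not `HtmlResponse` instances, so this hook alone may not be sufficient.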

SplashRequest - Cannot get data attribute

痴心易碎 submitted on 2020-01-05 08:28:12
Question: I'm struggling to find out why I receive this error:

    AttributeError: 'HtmlResponse' object has no attribute 'data'

From the documentation: SplashJsonResponse provides extra features: the response.data attribute contains response data decoded from JSON; you can access it like response.data['html']. Here is my sample code:

    class HeadphonesSpider(scrapy.Spider):
        name = "headphones"
        handle_httpstatus_list = [404]

        def start_requests(self):
            splash_args = {
                'html': 1,
                'png': 1,
                'width': 600,
                'render_all': 1,
            }
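
The error message names `HtmlResponse`, which suggests the response did not come back as a `SplashJsonResponse`. One plausible cause (an assumption, since the rest of the spider is cut off) is that the request was not sent to a JSON endpoint such as `render.json`, or that the Splash middlewares are not enabled. The sketch below shows arguments for `render.json` and a defensive accessor; the name `html_from` and the added `wait` value are illustrative.

```python
# Arguments for Splash's render.json endpoint. "wait" is added here because
# Splash documents that render_all=1 needs a non-zero wait (value assumed).
RENDER_JSON_ARGS = {
    "html": 1,
    "png": 1,
    "width": 600,
    "render_all": 1,
    "wait": 0.5,
}


def html_from(response):
    """Return rendered HTML from a response, whichever type it turned out to be.

    Splash JSON responses expose decoded JSON as `response.data`; plain
    Scrapy text responses only have `response.text`.
    """
    data = getattr(response, "data", None)
    if isinstance(data, dict) and "html" in data:
        return data["html"]
    return response.text
```

With scrapy-splash this would be used as `SplashRequest(url, self.parse, endpoint='render.json', args=RENDER_JSON_ARGS)`, after which `response.data['html']` should be available in the callback.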
