scrapy

Not able to log in using Scrapy

随声附和 submitted on 2021-01-29 13:39:38
Question: I'm trying to log in using Python Scrapy, but it is not working. For reference:

    import quotes as q
    import loginspidernew as login
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class ValidateURL:
        def checkURL(self, urls):
            try:
                if urls:
                    for key, value in urls.items():
                        if value['login_details']:
                            self.runScrap(value)
            except:
                return False

        def runScrap(self, data):
            if data:
                process = CrawlerProcess()
                process.crawl(login.LoginSpider,
                              login_url='http://quotes.toscrape.com/login',
                              start_urls= …
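A minimal login-spider sketch for the quotes.toscrape.com demo site; the form field names and credentials below are placeholder assumptions, and FormRequest.from_response() picks up the hidden CSRF token already present in the login form:

    import scrapy


    class LoginSpider(scrapy.Spider):
        name = "login"
        start_urls = ["http://quotes.toscrape.com/login"]

        def parse(self, response):
            # Submit the login form; credentials here are placeholders.
            return scrapy.FormRequest.from_response(
                response,
                formdata={"username": "user", "password": "pass"},
                callback=self.after_login,
            )

        def after_login(self, response):
            # The demo site shows a "Logout" link only after a successful login.
            if response.css('a[href="/logout"]'):
                self.logger.info("Login succeeded")
            else:
                self.logger.error("Login failed")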

Scrape dynamic data using scrapy [closed]

若如初见. submitted on 2021-01-29 13:13:21
Question: I would like to scrape the option chain of a stock (along with other data) from the Nasdaq website using Scrapy. Nasdaq recently updated their website. Here is the URL I am talking about. The data is not loaded with a plain spider, nor in the Scrapy shell. From the Scrapy docs, I …
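When a page builds its data with JavaScript, one common approach is to skip the HTML entirely and request the JSON endpoint the browser itself calls (visible in the developer tools' Network tab). A rough sketch under that assumption; the endpoint URL, ticker, and headers below are illustrative guesses and should be verified against what the Network tab actually shows:

    import json
    import scrapy


    class OptionChainSpider(scrapy.Spider):
        name = "option_chain"

        def start_requests(self):
            # Assumed endpoint pattern observed in the browser's Network tab; verify before use.
            url = "https://api.nasdaq.com/api/quote/AAPL/option-chain?assetclass=stocks"
            # Such APIs often reject requests without browser-like headers.
            headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
            yield scrapy.Request(url, headers=headers, callback=self.parse)

        def parse(self, response):
            # The response is JSON, not HTML, so parse it directly.
            data = json.loads(response.text)
            yield {"raw": data.get("data")}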

Scrapy response.follow query

纵然是瞬间 submitted on 2021-01-29 13:00:38
Question: I followed the instructions from this page: http://docs.scrapy.org/en/latest/intro/tutorial.html

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = [
            'http://quotes.toscrape.com/page/1/',
        ]

        def parse(self, response):
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').get(),
                    'author': quote.css('span small::text').get(),
                    'tags': quote.css('div.tags a.tag::text').getall(),
                }
            next_page = response.css('li.next a::attr(href)').get()
            if …
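For reference, this is how the official tutorial completes the truncated `if` above: response.follow() accepts the relative href from the "next" link directly, so there is no need to build an absolute URL first.

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["http://quotes.toscrape.com/page/1/"]

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("span small::text").get(),
                    "tags": quote.css("div.tags a.tag::text").getall(),
                }
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                # Equivalent to building the absolute URL and yielding a scrapy.Request.
                yield response.follow(next_page, callback=self.parse)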

Overriding the serialize_field() method in Scrapy

Deadly submitted on 2021-01-29 10:50:19
Question: I'm using code from the Scrapy documentation, with a "Product" item class created:

    from scrapy.exporter import XmlItemExporter

    class ProductXmlExporter(XmlItemExporter):
        def serialize_field(self, field, name, value):
            if field == 'price':
                return f'$ {str(value)}'
            return super(Product, self).serialize_field(field, name, value)

and I always get this error from the command line:

    return super(Product, self).serialize_field(field, name, value)
    TypeError: super(Product, obj): obj must be an instance or subtype of type
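A sketch of the likely fix: the TypeError is raised because super() is given the item class (Product) rather than the exporter class itself; on Python 3 a bare super() sidesteps the problem, and comparing the field name rather than the Field object is the safer check.

    from scrapy.exporters import XmlItemExporter


    class ProductXmlExporter(XmlItemExporter):
        def serialize_field(self, field, name, value):
            # "name" is the field name string; "field" is the Field object.
            if name == 'price':
                return f'$ {str(value)}'
            # Bare super() binds to ProductXmlExporter, which self actually is.
            return super().serialize_field(field, name, value)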

How to extract the corresponding text of a Div via xpath?

余生颓废 submitted on 2021-01-29 09:41:11
Question: While writing an XPath to extract data out of the HTML given below, I'm unable to extract the corresponding text from the corresponding elements within a div.

    <div class="Main">
      <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span></div>
      <div class="Sub">
        <div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
      <div class="Sub">
        <div class="Birth">Kate</div>
        <span class …
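A self-contained sketch (using Scrapy's Selector on a trimmed copy of the HTML above) that pairs each name with its date by iterating one "Sub" block at a time; the leading dot keeps each inner XPath relative to the current block instead of searching the whole document:

    from scrapy.selector import Selector

    html = """
    <div class="Main">
      <div class="Sub"><div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span></div>
      <div class="Sub"><div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
    </div>
    """

    sel = Selector(text=html)
    for sub in sel.xpath('//div[@class="Main"]/div[@class="Sub"]'):
        # Relative XPaths (leading ".") stay scoped to this "Sub" block.
        name = sub.xpath('.//div[@class="Birth"]/text()').get()
        date = sub.xpath('.//div/span[@class="Date"]/text()').get()
        print(name, date)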

Scrapy use item and save data in a json file

这一生的挚爱 submitted on 2021-01-29 08:01:49
Question: I want to use a Scrapy item to manipulate data and save it all in a JSON file (using the JSON file like a database).

    # Spider class
    class Spider(scrapy.Spider):
        name = 'productpage'
        start_urls = ['https://www.productpage.com']

        def parse(self, response):
            for product in response.css('article'):
                link = product.css('a::attr(href)').get()
                id = link.split('/')[-1]
                title = product.css('a > span::attr(content)').get()
                product = Product(self.name, id, title, price, '', link)
                yield scrapy.Request('{}.json'.format …
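One way to treat a JSON file as a simple store is an item pipeline that collects items in memory and writes them out when the spider closes. A minimal sketch under that assumption; the output path "items.json" is a placeholder, and the pipeline still has to be enabled via ITEM_PIPELINES in settings.py:

    import json


    class JsonWriterPipeline:
        def open_spider(self, spider):
            # Accumulate items here; fine for small crawls, not for huge ones.
            self.items = []

        def process_item(self, item, spider):
            self.items.append(dict(item))
            return item

        def close_spider(self, spider):
            # Write everything out once, when the crawl finishes.
            with open("items.json", "w", encoding="utf-8") as f:
                json.dump(self.items, f, ensure_ascii=False, indent=2)

For a plain dump without a custom pipeline, Scrapy's built-in feed export does the same job: scrapy crawl productpage -o items.json.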

log_count/ERROR while scraping site with Scrapy

本小妞迷上赌 submitted on 2021-01-29 07:22:15
Question: I am getting the following log_count/ERROR while scraping a site with Scrapy. I can see that it has made 43 requests and got 43 responses. Everything looks fine, so what is the error for?

    2018-03-19 00:31:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 18455,
     'downloader/request_count': 43,
     'downloader/request_method_count/GET': 43,
     'downloader/response_bytes': 349500,
     'downloader/response_count': 43,
     'downloader/response_status_count/200': 38,
     'downloader …
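log_count/ERROR simply counts how many ERROR-level lines were written to the log, so the concrete cause is in the log output itself rather than in the stats (note that only 38 of the 43 responses were 200s). A sketch, with illustrative names and URL, for making per-request failures explicit in the log:

    import scrapy


    class DebugSpider(scrapy.Spider):
        name = "debug"
        start_urls = ["http://example.com"]
        # Let non-200 responses reach the callback instead of being filtered out.
        custom_settings = {"HTTPERROR_ALLOW_ALL": True}

        def start_requests(self):
            for url in self.start_urls:
                # errback catches download-level failures (DNS, timeouts, etc.).
                yield scrapy.Request(url, callback=self.parse, errback=self.on_error)

        def parse(self, response):
            if response.status != 200:
                self.logger.warning("Non-200 response %s from %s", response.status, response.url)

        def on_error(self, failure):
            self.logger.error("Request failed: %r", failure)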

Scrapy repeating rows

邮差的信 submitted on 2021-01-29 07:21:08
Question: I'm trying to scrape this site: https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I'm trying to scrape the three tables on the site. I've managed that with

    tables = response.xpath('//*[@class="table table-stripefd"]')

Then I'd like to get each of the rows of the table, which I did with

    rows = tables.xpath('//tr')

The problem here is that, after scraping and printing out some of the data, I noticed that there are multiple entries for some rows. For example, the …
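The duplication most likely comes from the second XPath: '//tr' restarts at the document root for every table, so each call returns every row on the page. A sketch of the fix, keeping the row query relative to the current table (the spider name and yielded fields are illustrative; the class selector is copied from the question):

    import scrapy


    class TahkoSpider(scrapy.Spider):
        name = "tahko"
        start_urls = ["https://www.tahko.com/fi/menovinkit/?ql=tapahtumat"]

        def parse(self, response):
            for table in response.xpath('//*[@class="table table-stripefd"]'):
                # './/tr' (note the leading dot) stays inside the current table,
                # whereas '//tr' would return every row on the page each time.
                for row in table.xpath('.//tr'):
                    cells = row.xpath('.//td//text()').getall()
                    yield {"cells": [c.strip() for c in cells if c.strip()]}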