scrapy

Not able to log in using Scrapy

随声附和 submitted on 2021-01-29 13:39:38
Question: I'm trying to log in using Python Scrapy, but it is not working. For reference:

    import quotes as q
    import loginspidernew as login
    import scrapy
    from scrapy.crawler import CrawlerProcess

    class ValidateURL:
        def checkURL(self, urls):
            try:
                if urls:
                    for key, value in urls.items():
                        if value['login_details']:
                            self.runScrap(value)
            except:
                return False

        def runScrap(self, data):
            if data:
                process = CrawlerProcess()
                process.crawl(login.LoginSpider,
                              login_url='http://quotes.toscrape.com/login',
                              start_urls= …
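A minimal login-spider sketch for the quotes.toscrape.com demo site; the form field names and credentials below are placeholder assumptions, and FormRequest.from_response() picks up the hidden CSRF token already present in the login form:

    import scrapy


    class LoginSpider(scrapy.Spider):
        name = "login"
        start_urls = ["http://quotes.toscrape.com/login"]

        def parse(self, response):
            # Submit the login form; credentials here are placeholders.
            return scrapy.FormRequest.from_response(
                response,
                formdata={"username": "user", "password": "pass"},
                callback=self.after_login,
            )

        def after_login(self, response):
            # The demo site shows a "Logout" link only after a successful login.
            if response.css('a[href="/logout"]'):
                self.logger.info("Login succeeded")
            else:
                self.logger.error("Login failed")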

Scrape dynamic data using scrapy [closed]

若如初见. submitted on 2021-01-29 13:13:21
Question: I would like to scrape the option chain of a stock (along with other data) from the Nasdaq website using Scrapy. Nasdaq recently updated their website. Here is the URL I am talking about. The data is not loaded with a plain spider, nor in the Scrapy shell. From the Scrapy docs, I …
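When a page builds its data with JavaScript, one common approach is to skip the HTML entirely and request the JSON endpoint the browser itself calls (visible in the developer tools' Network tab). A rough sketch under that assumption; the endpoint URL, ticker, and headers below are illustrative guesses and should be verified against what the Network tab actually shows:

    import json
    import scrapy


    class OptionChainSpider(scrapy.Spider):
        name = "option_chain"

        def start_requests(self):
            # Assumed endpoint pattern observed in the browser's Network tab; verify before use.
            url = "https://api.nasdaq.com/api/quote/AAPL/option-chain?assetclass=stocks"
            # Such APIs often reject requests without browser-like headers.
            headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
            yield scrapy.Request(url, headers=headers, callback=self.parse)

        def parse(self, response):
            # The response is JSON, not HTML, so parse it directly.
            data = json.loads(response.text)
            yield {"raw": data.get("data")}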

Scrapy response.follow query

纵然是瞬间 submitted on 2021-01-29 13:00:38
Question: I followed the instructions from this page: http://docs.scrapy.org/en/latest/intro/tutorial.html

    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = [
            'http://quotes.toscrape.com/page/1/',
        ]

        def parse(self, response):
            for quote in response.css('div.quote'):
                yield {
                    'text': quote.css('span.text::text').get(),
                    'author': quote.css('span small::text').get(),
                    'tags': quote.css('div.tags a.tag::text').getall(),
                }
            next_page = response.css('li.next a::attr(href)').get()
            if …
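For reference, this is how the official tutorial completes the truncated `if` above: response.follow() accepts the relative href from the "next" link directly, so there is no need to build an absolute URL first.

    import scrapy


    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["http://quotes.toscrape.com/page/1/"]

        def parse(self, response):
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("span small::text").get(),
                    "tags": quote.css("div.tags a.tag::text").getall(),
                }
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                # Equivalent to building the absolute URL and yielding a scrapy.Request.
                yield response.follow(next_page, callback=self.parse)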

Overriding the serialize_field() method in Scrapy

Deadly submitted on 2021-01-29 10:50:19
Question: I'm using code from the Scrapy documentation, with a "Product" item class created:

    from scrapy.exporter import XmlItemExporter

    class ProductXmlExporter(XmlItemExporter):
        def serialize_field(self, field, name, value):
            if field == 'price':
                return f'$ {str(value)}'
            return super(Product, self).serialize_field(field, name, value)

and I always get this error from the command line:

    return super(Product, self).serialize_field(field, name, value)
    TypeError: super(Product, obj): obj must be an instance or subtype of type
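A sketch of the likely fix: the TypeError is raised because super() is given the item class (Product) rather than the exporter class itself; on Python 3 a bare super() sidesteps the problem, and comparing the field name rather than the Field object is the safer check.

    from scrapy.exporters import XmlItemExporter


    class ProductXmlExporter(XmlItemExporter):
        def serialize_field(self, field, name, value):
            # "name" is the field name string; "field" is the Field object.
            if name == 'price':
                return f'$ {str(value)}'
            # Bare super() binds to ProductXmlExporter, which self actually is.
            return super().serialize_field(field, name, value)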

How to extract the corresponding text of a Div via xpath?

余生颓废 submitted on 2021-01-29 09:41:11
Question: While writing an XPath to extract data out of the HTML given below, I'm unable to extract the corresponding text from the corresponding elements within a div.

    <div class="Main">
      <div class="Sub">
        <div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span></div>
      <div class="Sub">
        <div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
      <div class="Sub">
        <div class="Birth">Kate</div>
        <span class …
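A self-contained sketch (using Scrapy's Selector on a trimmed copy of the HTML above) that pairs each name with its date by iterating one "Sub" block at a time; the leading dot keeps each inner XPath relative to the current block instead of searching the whole document:

    from scrapy.selector import Selector

    html = """
    <div class="Main">
      <div class="Sub"><div class="Birth">Jack</div>
        <span class="Date"><div><span class="Date">6 June 2018</span></div></span></div>
      <div class="Sub"><div class="Birth">Hurley</div>
        <span class="Date"><div><span class="Date">21 June 2011</span></div></span></div>
    </div>
    """

    sel = Selector(text=html)
    for sub in sel.xpath('//div[@class="Main"]/div[@class="Sub"]'):
        # Relative XPaths (leading ".") stay scoped to this "Sub" block.
        name = sub.xpath('.//div[@class="Birth"]/text()').get()
        date = sub.xpath('.//div/span[@class="Date"]/text()').get()
        print(name, date)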

Scrapy use item and save data in a json file

这一生的挚爱 submitted on 2021-01-29 08:01:49
Question: I want to use a Scrapy item to manipulate data and save it all in a JSON file (using the JSON file like a database).

    # Spider class
    class Spider(scrapy.Spider):
        name = 'productpage'
        start_urls = ['https://www.productpage.com']

        def parse(self, response):
            for product in response.css('article'):
                link = product.css('a::attr(href)').get()
                id = link.split('/')[-1]
                title = product.css('a > span::attr(content)').get()
                product = Product(self.name, id, title, price, '', link)
                yield scrapy.Request('{}.json'.format …
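One way to treat a JSON file as a simple store is an item pipeline that collects items in memory and writes them out when the spider closes. A minimal sketch under that assumption; the output path "items.json" is a placeholder, and the pipeline still has to be enabled via ITEM_PIPELINES in settings.py:

    import json


    class JsonWriterPipeline:
        def open_spider(self, spider):
            # Accumulate items here; fine for small crawls, not for huge ones.
            self.items = []

        def process_item(self, item, spider):
            self.items.append(dict(item))
            return item

        def close_spider(self, spider):
            # Write everything out once, when the crawl finishes.
            with open("items.json", "w", encoding="utf-8") as f:
                json.dump(self.items, f, ensure_ascii=False, indent=2)

For a plain dump without a custom pipeline, Scrapy's built-in feed export does the same job: scrapy crawl productpage -o items.json.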

log_count/ERROR while scraping site with Scrapy

本小妞迷上赌 submitted on 2021-01-29 07:22:15
Question: I am getting the following log_count/ERROR while scraping a site with Scrapy. I can see that it has made 43 requests and got 43 responses. Everything looks fine, so what is the error for?

    2018-03-19 00:31:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 18455,
     'downloader/request_count': 43,
     'downloader/request_method_count/GET': 43,
     'downloader/response_bytes': 349500,
     'downloader/response_count': 43,
     'downloader/response_status_count/200': 38,
     'downloader …
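log_count/ERROR simply counts how many ERROR-level lines were written to the log, so the concrete cause is in the log output itself rather than in the stats (note that only 38 of the 43 responses were 200s). A sketch, with illustrative names and URL, for making per-request failures explicit in the log:

    import scrapy


    class DebugSpider(scrapy.Spider):
        name = "debug"
        start_urls = ["http://example.com"]
        # Let non-200 responses reach the callback instead of being filtered out.
        custom_settings = {"HTTPERROR_ALLOW_ALL": True}

        def start_requests(self):
            for url in self.start_urls:
                # errback catches download-level failures (DNS, timeouts, etc.).
                yield scrapy.Request(url, callback=self.parse, errback=self.on_error)

        def parse(self, response):
            if response.status != 200:
                self.logger.warning("Non-200 response %s from %s", response.status, response.url)

        def on_error(self, failure):
            self.logger.error("Request failed: %r", failure)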

Scrapy repeating rows

邮差的信 submitted on 2021-01-29 07:21:08
Question: I'm trying to scrape this site: https://www.tahko.com/fi/menovinkit/?ql=tapahtumat. In particular, I'm trying to scrape the three tables on the site. I've managed that with

    tables = response.xpath('//*[@class="table table-stripefd"]')

Then I'd like to get each of the rows of the table, which I did with

    rows = tables.xpath('//tr')

The problem here is that, after scraping and printing out some of the data, I noticed that there are multiple entries for some rows. For example, the …
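The duplication most likely comes from the second XPath: '//tr' restarts at the document root for every table, so each call returns every row on the page. A sketch of the fix, keeping the row query relative to the current table (the spider name and yielded fields are illustrative; the class selector is copied from the question):

    import scrapy


    class TahkoSpider(scrapy.Spider):
        name = "tahko"
        start_urls = ["https://www.tahko.com/fi/menovinkit/?ql=tapahtumat"]

        def parse(self, response):
            for table in response.xpath('//*[@class="table table-stripefd"]'):
                # './/tr' (note the leading dot) stays inside the current table,
                # whereas '//tr' would return every row on the page each time.
                for row in table.xpath('.//tr'):
                    cells = row.xpath('.//td//text()').getall()
                    yield {"cells": [c.strip() for c in cells if c.strip()]}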