scrapy

Split on comma using python and scrapy

Submitted by 末鹿安然 on 2020-06-29 03:40:18

Question: I am using Scrapy to extract data from a certain website. One field I extract returns both the city and the region. I want to split the returned value on the comma, store the first part in the city field, and store the second part in the region field. The code I am using to extract the data is: loader.add_css('region', '.seller-box__seller-address__label::text') The output is a column named region with, for example, this value: Elbląg, Warmińsko-mazurskie
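A minimal sketch of one way to do this split with ItemLoader input processors. The CSS selector comes from the question; the item class, loader class, and the separate city field are assumptions for illustration, and on older Scrapy versions the processors are imported from scrapy.loader.processors instead of itemloaders.processors:

import scrapy
from scrapy.loader import ItemLoader
from itemloaders.processors import MapCompose, TakeFirst

class OfferItem(scrapy.Item):
    city = scrapy.Field()
    region = scrapy.Field()

def take_city(value):
    # "Elbląg, Warmińsko-mazurskie" -> "Elbląg"
    return value.split(',')[0].strip()

def take_region(value):
    # "Elbląg, Warmińsko-mazurskie" -> "Warmińsko-mazurskie"
    parts = value.split(',', 1)
    return parts[1].strip() if len(parts) > 1 else ''

class OfferLoader(ItemLoader):
    default_item_class = OfferItem
    default_output_processor = TakeFirst()
    city_in = MapCompose(take_city)
    region_in = MapCompose(take_region)

# Inside the spider's parse() method, feed both fields from the same selector:
#     loader = OfferLoader(response=response)
#     loader.add_css('city', '.seller-box__seller-address__label::text')
#     loader.add_css('region', '.seller-box__seller-address__label::text')
#     yield loader.load_item()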

Scrapyd-Deploy: Errors due to using os path to set directory

Submitted by 蓝咒 on 2020-06-28 05:26:05

Question: I am trying to deploy a Scrapy project to a remote scrapyd server via scrapyd-deploy. The project itself is functional and works perfectly on my local machine, and also on the remote server when I deploy it via git push prod. With scrapyd-deploy I get this error: % scrapyd-deploy example -p apo { "node_name": "spider1", "status": "error", "message": "/usr/local/lib/python3.8/dist-packages/scrapy/utils/project.py:90: ScrapyDeprecationWarning: Use of environment variables
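For reference, scrapyd-deploy resolves the example target and the apo project name used in the command above from the project's scrapy.cfg. A minimal sketch of such a configuration; the server URL and the settings module name are placeholders, not values taken from the question:

[settings]
default = apo.settings

[deploy:example]
url = http://your-scrapyd-server:6800/
project = apo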

Extract class name in scrapy

Submitted by 依然范特西╮ on 2020-06-27 13:54:27

Question: I am trying to scrape ratings off of trustpilot.com. Is it possible to extract a class name using Scrapy? I am trying to scrape a rating that is made up of five individual images, but the images are in a class whose name contains the rating. For example, if the rating is 2 stars: <div class="star-rating count-2 size-medium clearfix">... and if it is 3 stars: <div class="star-rating count-3 size-medium clearfix">... So is there a way I can scrape the class count-2 or count-3 assuming a
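Yes: the class attribute can be selected like any other attribute. A hedged sketch that pulls the count-N token out with a regular expression; only the div.star-rating markup comes from the question, while the spider name, start URL, and yielded keys are illustrative assumptions:

import re
import scrapy

class RatingSpider(scrapy.Spider):
    name = "ratings"
    start_urls = ["https://www.trustpilot.com/review/example.com"]  # placeholder URL

    def parse(self, response):
        for div in response.css("div.star-rating"):
            # e.g. "star-rating count-2 size-medium clearfix"
            classes = div.attrib.get("class", "")
            match = re.search(r"count-(\d+)", classes)
            if match:
                yield {"rating": int(match.group(1))}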

How to extract social information from a given website?

Submitted by ▼魔方 西西 on 2020-06-27 06:38:22

Question: I have a website URL like www.example.com. I want to collect social information from this website, such as the Facebook URL (facebook.com/example), the Twitter URL (twitter.com/example), and so on, if available anywhere, on any page of the website. How can I complete this task? Please suggest any tutorials, blogs, or technologies. Answer 1: Since you don't know exactly where (on which page of the website) those links are located, you probably want to base your spider on the CrawlSpider class. Such a spider lets you define rules for
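A hedged sketch of the CrawlSpider approach the answer points to: crawl every internal page and collect outbound links to known social networks. The domain and the set of social hosts are illustrative assumptions:

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class SocialLinksSpider(CrawlSpider):
    name = "social_links"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    # Follow every internal link and run parse_page on each crawled page.
    rules = (
        Rule(LinkExtractor(allow_domains=["example.com"]), callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        # Collect outbound links that point at known social networks.
        social = response.xpath(
            "//a[contains(@href, 'facebook.com') or contains(@href, 'twitter.com')]/@href"
        ).getall()
        if social:
            yield {"page": response.url, "social_links": social}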

How can Scrapy deal with Javascript

Submitted by 我只是一个虾纸丫 on 2020-06-24 13:50:50

Question: Spider for reference:

import scrapy
from scrapy.spiders import Spider
from scrapy.selector import Selector
from script.items import ScriptItem

class RunSpider(scrapy.Spider):
    name = "run"
    allowed_domains = ["stopitrightnow.com"]
    start_urls = (
        'http://www.stopitrightnow.com/',
    )

    def parse(self, response):
        for widget in response.xpath('//div[@class="shopthepost-widget"]'):
            #print widget.extract()
            item = ScriptItem()
            item['url'] = widget.xpath('.//a/@href').extract()
            url = item['url']
            #print
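If the links inside the shopthepost-widget div are injected by JavaScript, plain Scrapy will not see them in the downloaded HTML. One common approach is to render the page with Splash via the scrapy-splash package. This is a hedged sketch, not the accepted answer to the question: it assumes a Splash instance is running and that the scrapy-splash middlewares are enabled in settings.py.

import scrapy
from scrapy_splash import SplashRequest

class RunJsSpider(scrapy.Spider):
    name = "run_js"
    allowed_domains = ["stopitrightnow.com"]

    def start_requests(self):
        # Ask Splash to render the page (including its JavaScript) before parsing.
        yield SplashRequest(
            "http://www.stopitrightnow.com/",
            callback=self.parse,
            args={"wait": 2},  # give the widget time to load
        )

    def parse(self, response):
        for widget in response.xpath('//div[@class="shopthepost-widget"]'):
            yield {"urls": widget.xpath('.//a/@href').extract()}

Another common route, which avoids rendering altogether, is to find the widget's underlying AJAX or embed URL in the browser's network tab and request that directly.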

Get xpath() to return empty values

Submitted by 主宰稳场 on 2020-06-24 07:47:47

Question: I have a situation where I have a lot of <b> tags: <b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b> As you can see, the second-to-last tag is empty. When I call: sel.xpath('b/text()').extract() it gives me: ['12', '13', '14', '121'] I would like to have: ['12', '13', '14', '', '121'] Is there a way to get the empty value? My current workaround is to call: sel.xpath('b').extract() and then parse each HTML tag myself (the empty tags are there, which is what I want). Answer 1: This is where
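A hedged sketch of one way to keep a value for the empty tag: iterate over the <b> selectors themselves and fall back to '' when a node has no text. It uses parsel, the selector library that ships with Scrapy; inside a spider the same pattern works on response.xpath("//b"):

from parsel import Selector

html = "<div><b>12</b> <b>13</b> <b>14</b> <b></b> <b>121</b></div>"
sel = Selector(text=html)

# text() matches nothing for the empty <b></b>, so .get(default="") fills in ''.
values = [b.xpath("text()").get(default="") for b in sel.xpath("//b")]
print(values)  # ['12', '13', '14', '', '121']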

Unable to import scrapy

Submitted by 拥有回忆 on 2020-06-21 05:42:49

Question: I am trying to scrape a website. However, I get this error inside the VS Code editor: Unable to import 'scrapy' pylint(import-error) Unable to import 'scrapy' pylint(import-error) These are the important parts of the code I am using; I just didn't copy all the variables I am scraping:

class xxxxxxxxSpider(scrapy.Spider):
    name = 'xxxxxxx'
    allowed_domains = ['www.drsaina.com/ConsultationDoctor/%D9%85%D8%B4%D8%A7%D9%88%D8%B1%D9%87- %D8%A2%D9%86%D9%84%D8%A7%DB%8C%D9%86']
    start_urls = ['https://www
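This pylint import-error usually means the interpreter VS Code is configured to use does not have Scrapy installed. A hedged sanity check, not taken from an answer in this listing: run the snippet with that same interpreter; if the import fails, Scrapy is missing from that environment and pylint will flag it.

import sys

print(sys.executable)  # the interpreter actually being used

import scrapy  # raises ImportError if Scrapy is not installed in this environment

print(scrapy.__version__)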

Can I fill web forms with Scrapy?

Submitted by 放肆的年华 on 2020-06-12 04:15:58

Question: I am currently using iMacros to extract data from a website and to fill and submit forms with that data, but iMacros is an expensive tool. I need a free library, and I have read about Scrapy for data mining. It is a little more complex to program with, but cost is the deciding factor. The question is whether I can fill HTML forms with Scrapy and submit them to the web page. I don't want to use JavaScript; I want to use Python scripts exclusively. I searched http://doc.scrapy.org/ but found nothing about form submission. Answer 1:
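Scrapy can submit HTML forms with FormRequest. A hedged sketch; the URL, form field names, and credentials below are placeholders, not values from the question:

import scrapy
from scrapy.http import FormRequest

class FormSpider(scrapy.Spider):
    name = "form_example"
    start_urls = ["http://example.com/login"]  # placeholder URL

    def parse(self, response):
        # from_response() pre-fills hidden fields from the page's <form> and
        # lets us override the visible ones before submitting.
        yield FormRequest.from_response(
            response,
            formdata={"username": "user", "password": "secret"},
            callback=self.after_submit,
        )

    def after_submit(self, response):
        self.logger.info("Form submitted, landed on %s", response.url)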

Scrapy Body Text Only

Submitted by 六月ゝ 毕业季﹏ on 2020-06-11 20:12:23

Question: I am trying to scrape only the text from the body using Python Scrapy, but I haven't had any luck yet. I am hoping someone can help me scrape all the text from the <body> tag. Answer 1: Scrapy uses XPath notation to extract parts of an HTML document. So, have you tried just using the /html/body path to extract <body>? (assuming it's nested in <html>). It might be even simpler to use the //body selector: x.select("//body").extract() # extract body You can find more information
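The x.select(...) call quoted in the answer uses the old pre-1.0 selector API; in a current spider the equivalent is response.xpath(). A hedged sketch that keeps only the text nodes under <body> rather than the raw HTML; the spider name and start URL are placeholders:

import scrapy

class BodyTextSpider(scrapy.Spider):
    name = "body_text"
    start_urls = ["http://example.com/"]  # placeholder URL

    def parse(self, response):
        # //body//text() returns every text node under <body>; joining the
        # non-empty ones gives plain text without markup. Note this still
        # includes the contents of <script> and <style> tags, which may need
        # to be filtered out separately.
        body_text = " ".join(
            t.strip() for t in response.xpath("//body//text()").getall() if t.strip()
        )
        yield {"url": response.url, "body_text": body_text}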