scrapy

export python data to csv file

Submitted by 烂漫一生 on 2020-01-06 23:49:54
Question: I'm trying to export my file via the command line: scrapy crawl tunisaianet -o save.csv -t csv, but nothing is happening. Any help? Here is my code:

import scrapy
import csv
from tfaw.items import TfawItem

class TunisianetSpider(scrapy.Spider):
    name = "tunisianet"
    allowed_domains = ["tunisianet.com.tn"]
    start_urls = [
        'http://www.tunisianet.com.tn/466-consoles-jeux/',
    ]

    def parse(self, response):
        item = TfawItem()
        data = []
        out = open('out.csv', 'a')
        x = response.xpath('//*[contains(@class, "ajax
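
Two things stand out here. First, the spider name in the command ("tunisaianet") does not match name = "tunisianet", so the crawl would not even start. Second, and more fundamentally, the feed export (-o save.csv) only serializes items that the spider yields; writing to out.csv by hand bypasses it entirely. Below is a minimal sketch of the yielding pattern; the product selectors and field names are assumptions, since the original XPath is truncated in the question:

import scrapy

class TunisianetSpider(scrapy.Spider):
    name = "tunisianet"
    allowed_domains = ["tunisianet.com.tn"]
    start_urls = ['http://www.tunisianet.com.tn/466-consoles-jeux/']

    def parse(self, response):
        # Yield one record per product block; these selectors are placeholders,
        # not the site's real markup.
        for product in response.xpath('//div[contains(@class, "product-container")]'):
            yield {
                'title': product.xpath('.//h2/a/text()').get(),
                'price': product.xpath('.//span[contains(@class, "price")]/text()').get(),
            }

Run it with scrapy crawl tunisianet -o save.csv and the yielded records (dicts or TfawItem instances) end up in the CSV; the -t csv flag is optional because the format is inferred from the file extension.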

A week spent curating Python resources: the websites and projects needed at every stage; bookmark it and work through it at your own pace

Submitted by 大兔子大兔子 on 2020-01-06 20:49:24
Somehow, quite a few readers who follow me have recently started messaging me privately: how do I learn Python well? Is switching careers from zero background realistic, is there still hope? I'm 30 this year, can I still move into IT? Actually, I have written about making the switch from zero before; if you haven't read it, take a look: "零基础转行Python，到底路在何方？". There is also a Zhihu post with 1k+ upvotes on how to learn Python that I recommend reading: "万字谏言，给那些想学Python的人，建议收藏后细看！".

I suspect that after reading those two articles most of you will still be left with a question mark: apart from "Learn Python the Hard Way" (《笨办法》), which I recommend, is there really nothing else? And plenty of players still in the beginner village like to ask the same questions: is there any reference material? Are there any hands-on projects to learn from?

For today's article I spent a week searching, collecting, researching and filtering before settling on the final draft. I hope it helps everyone burn less fuel at the starting line and concentrate on breaking through technically. At the end of the post there is one of my projects plus 100 Python e-books, along with a large amount of hands-on code: LeetCode solutions, design-pattern exercises, crawler projects, small applications, a WeChat bot, big-data projects and more.

1. Starting out

I am not going to recommend the official documentation, because I know you won't read it. Liao Xuefeng's tutorial is something I, and I believe many readers, have at least skimmed: 「廖雪峰的官方网站」: https://www.liaoxuefeng.com/wiki

Scrapy crawlers not running simultaneously from Python script

Submitted by 为君一笑 on 2020-01-06 20:14:47
Question: I was just wondering why this might be occurring. Here is my Python script to run them all:

from scrapy import cmdline

file = open('cityNames.txt', 'r')
cityNames = file.read().splitlines()

for city in cityNames:
    url = "http://" + city + ".website.com"
    output = city + ".json"
    cmdline.execute(['scrapy', 'crawl', 'backpage_tester', '-a', "start_url=" + url, '-o', "" + output])

cityNames.txt:

chicago
sanfran
boston

It runs through the first city fine, but then stops after that. It doesn't run sanfran
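
The usual explanation is that cmdline.execute is meant for the command line: it parses the arguments, runs the crawl, and then exits the process, so the loop never reaches the second city. A minimal sketch of the common alternative, CrawlerProcess, which schedules all the crawls and runs them in a single reactor; the spider name and start_url argument are taken from the question, while the single combined FEEDS output is an assumption (splitting output per city would need FEED_URI_PARAMS or an item pipeline, omitted here):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
# One combined output file for the sketch.
settings.set('FEEDS', {'all_cities.json': {'format': 'json'}})

process = CrawlerProcess(settings)

with open('cityNames.txt') as f:
    cities = f.read().splitlines()

for city in cities:
    # Each call only schedules a crawl; nothing runs yet.
    process.crawl('backpage_tester', start_url="http://" + city + ".website.com")

process.start()  # starts the Twisted reactor once and runs every scheduled crawl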

Scrapy not finding table

Submitted by 好久不见. on 2020-01-06 20:04:13
Question: I am trying to scrape data from the table at http://www.oddsportal.com/basketball/usa/nba-2014-2015/results/. The particular table I want has class="table-main". Running response.xpath('//table') in the Scrapy shell gives:

In [28]: response.xpath('//table')
Out[28]:
[<Selector xpath='//table' data=u'<table>\n\t\t\t\t\t\t\t\t<tr>\n\t\t\t\t\t<td class="bol'>,
 <Selector xpath='//table' data=u'<table class="table-main top-event">\n\t\t\t'>,
 <Selector xpath='//table' data=u'<table>\n\t\t\t\t\t<tr>\n\t\t\t\t\t\t
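
On pages like this the results table is typically filled in by JavaScript after the initial page load, so it is simply absent from the HTML that Scrapy downloads; the selectors above only show the static tables. A quick way to confirm that, sketched under this assumption, is to search the downloaded body directly and, if the table is missing, find the XHR endpoint the page calls (browser dev tools, Network tab) and request that instead:

# Inside `scrapy shell http://www.oddsportal.com/basketball/usa/nba-2014-2015/results/`:
'table-main' in response.text            # True only if the table ships with the raw HTML
response.xpath('//table[contains(@class, "table-main")]').get()   # None if JS-rendered

# If it is missing, the data usually arrives via a separate AJAX/JSON request;
# the URL below is purely illustrative, not the real oddsportal endpoint.
# fetch('http://www.oddsportal.com/ajax/some-results-feed')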

Why does LinkExtractor not catch links that were generated by AJAX requests?

Submitted by 三世轮回 on 2020-01-06 19:51:09
Question: I'm crawling a page that generates data with infinite scrolling. I'm using CrawlSpider, and the rules are defined like this:

rules = (
    Rule(LinkExtractor(restrict_xpaths=('//*some/xpaths')),
         callback='parse_first_itmes', follow=True),
    Rule(LinkExtractor(restrict_xpaths=('//*some/other/xpaths')),
         callback='parse_second_itmes'),
)

In the parse_item function, I have a Request that makes the AJAX requests:

def parse_first_items(self, response):
    l = ItemLoader(item = AmazonCnCustomerItem(),
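
LinkExtractor can only see links that are present in the HTML Scrapy actually downloads; content added by infinite scrolling comes from later AJAX calls that a browser would make and Scrapy never executes, so those links never exist in the response the rules parse. The usual workaround is to reproduce the paginated AJAX call yourself and yield Requests to it. A sketch under that assumption; the endpoint, page parameter and JSON layout below are hypothetical and would come from the browser's Network tab:

import json
import scrapy

class InfiniteScrollSpider(scrapy.Spider):
    name = "infinite_scroll_demo"
    api_url = "https://example.com/items?page={page}"   # placeholder endpoint

    def start_requests(self):
        yield scrapy.Request(self.api_url.format(page=1),
                             callback=self.parse_page, cb_kwargs={"page": 1})

    def parse_page(self, response, page):
        data = json.loads(response.text)
        for entry in data.get("items", []):
            # Follow the detail links that the scrolled-in content would have shown.
            yield response.follow(entry["url"], callback=self.parse_detail)
        if data.get("has_more"):           # keep paging until the API says stop
            yield scrapy.Request(self.api_url.format(page=page + 1),
                                 callback=self.parse_page,
                                 cb_kwargs={"page": page + 1})

    def parse_detail(self, response):
        yield {"title": response.css("title::text").get()}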

sqlalchemy.exc.ArgumentError: Error creating backref

Submitted by 不羁岁月 on 2020-01-06 19:28:01
Question: I am trying to scrape data and store it in a database, but it is showing an error:

sqlalchemy.exc.ArgumentError: Error creating backref 'publisher_id' on relationship 'PublisherLookup.reviews': property of that name exists on mapper 'Mapper|Reviews|reviews'

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.engine.url import URL
from sqlalchemy.ext.declarative import synonym_for
from sqlalchemy.orm import
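
The error message says that the backref name ('publisher_id') clashes with a property that already exists on the Reviews mapper, which typically happens when the backref is given the same name as the foreign-key column. The usual fix is to pick a backref name that does not collide (or to use back_populates). A sketch under that assumption; the class and column names are guesses reconstructed from the error message, not the OP's actual models:

from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()

class PublisherLookup(Base):
    __tablename__ = 'publisher_lookup'
    id = Column(Integer, primary_key=True)
    name = Column(String(255))
    # The backref must not reuse the name of an existing column on Reviews,
    # so it is deliberately NOT called 'publisher_id' here.
    reviews = relationship('Reviews', backref='publisher')

class Reviews(Base):
    __tablename__ = 'reviews'
    id = Column(Integer, primary_key=True)
    text = Column(String(2000))
    publisher_id = Column(Integer, ForeignKey('publisher_lookup.id'))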

Is it possible to crawl multiple start_urls lists simultaneously?

Submitted by 痴心易碎 on 2020-01-06 17:57:46
Question: I have 3 URL files, all with the same structure, so the same spider can be used for all of the lists. A special requirement is that all three need to be crawled simultaneously. Is it possible to crawl them simultaneously without creating multiple spiders? I believe this answer

start_urls = ["http://example.com/category/top/page-%d/" % i for i in xrange(4)] + \
             ["http://example.com/superurl/top/page-%d/" % i for i in xrange(55)]

from "Scrap multiple urls with scrapy" only joins two lists, but not to run them at
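
Within a single spider, Scrapy already fetches all scheduled requests concurrently (bounded by CONCURRENT_REQUESTS and CONCURRENT_REQUESTS_PER_DOMAIN), so joining the lists generally is "simultaneous": requests from all three lists sit in one scheduler and are downloaded in parallel rather than list by list. A minimal sketch that builds the requests from three files; the file names are assumptions standing in for the OP's URL files:

import scrapy

class MultiListSpider(scrapy.Spider):
    name = "multi_list"
    url_files = ["list_a.txt", "list_b.txt", "list_c.txt"]   # placeholder file names

    def start_requests(self):
        for path in self.url_files:
            with open(path) as f:
                for url in (line.strip() for line in f if line.strip()):
                    # All requests share one scheduler and are fetched concurrently.
                    yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}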

Scrapy: MySQL Pipeline — Unexpected Errors Encountered

Submitted by 做~自己de王妃 on 2020-01-06 14:44:13
Question: I'm getting a number of errors, depending upon what is being inserted/updated. Here is the code for processing the item:

def process_item(self, item, spider):
    try:
        if 'producer' in item:
            self.cursor.execute("""INSERT INTO Producers (title, producer)
                                   VALUES (%s, %s)""",
                                (item['title'], item['producer']))
        elif 'actor' in item:
            self.cursor.execute("""INSERT INTO Actors (title, actor)
                                   VALUES (%s, %s)""",
                                (item['title'], item['actor']))
        elif 'director' in item:
            self.cursor.execute("""INSERT INTO
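
The snippet opens a try block but the matching except and any commit are cut off, and a very common source of "unexpected errors" in pipelines like this is a missing commit or a connection left unusable after one failed statement. A minimal sketch of a more defensive pipeline, assuming a DB-API driver such as pymysql and the table layout shown above; the connection parameters are placeholders, not the OP's settings:

import pymysql  # assumption: any DB-API driver (MySQLdb, pymysql, ...) follows the same pattern

class MySQLStorePipeline:
    def open_spider(self, spider):
        self.conn = pymysql.connect(host='localhost', user='user',
                                    password='secret', db='movies',
                                    charset='utf8mb4')
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()

    def process_item(self, item, spider):
        try:
            if 'producer' in item:
                self.cursor.execute(
                    "INSERT INTO Producers (title, producer) VALUES (%s, %s)",
                    (item['title'], item['producer']))
            elif 'actor' in item:
                self.cursor.execute(
                    "INSERT INTO Actors (title, actor) VALUES (%s, %s)",
                    (item['title'], item['actor']))
            self.conn.commit()          # nothing is persisted without a commit
        except pymysql.MySQLError as exc:
            self.conn.rollback()        # keep the connection usable after a failure
            spider.logger.error("MySQL error %s on item %r", exc, item)
        return item                     # always hand the item to the next pipeline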

Crawlers (16): The Scrapy Framework (3): Spider Middleware and Item Pipeline

Submitted by 我怕爱的太早我们不能终老 on 2020-01-06 14:09:16
1. Spider Middleware

Spider Middleware is a hook framework that plugs into Scrapy's Spider-processing mechanism. After the Downloader generates a Response, the Response is sent to the Spider; before it reaches the Spider, it first passes through the Spider Middleware. After the Spider processes the Response and generates Items and Requests, those Items and Requests also pass through the Spider Middleware.

Spider Middleware has three uses (a minimal custom middleware is sketched at the end of this section):

- We can process a Response after the Downloader generates it, before it is sent to the Spider.
- We can process a Request after the Spider generates it, before it is sent to the Scheduler.
- We can process an Item after the Spider generates it, before it is sent to the Item Pipeline.

1.1 Usage notes

It should be noted that Scrapy already provides many Spider Middlewares, which are defined by the SPIDER_MIDDLEWARES_BASE variable. The contents of the SPIDER_MIDDLEWARES_BASE variable are as follows:

{ 'scrapy
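
To make the three hooks concrete, here is a minimal sketch of a custom spider middleware; the class name and the logging it performs are illustrative and not part of the original article. It would be enabled through the SPIDER_MIDDLEWARES setting alongside the built-in ones listed above:

import scrapy

class DemoSpiderMiddleware:
    # Hypothetical example; enable it via the SPIDER_MIDDLEWARES setting, e.g.
    # SPIDER_MIDDLEWARES = {'myproject.middlewares.DemoSpiderMiddleware': 543}

    def process_spider_input(self, response, spider):
        # Called for each Response on its way from the Downloader to the Spider.
        spider.logger.debug("Response %s entering spider", response.url)
        return None  # returning None lets processing continue

    def process_spider_output(self, response, result, spider):
        # Called with everything the Spider yields: Requests before they reach
        # the Scheduler and Items before they reach the Item Pipeline.
        for obj in result:
            if isinstance(obj, scrapy.Request):
                spider.logger.debug("Request for %s heading to the scheduler", obj.url)
            else:
                spider.logger.debug("Item heading to the pipelines: %r", obj)
            yield obj

    def process_spider_exception(self, response, exception, spider):
        # Called when the Spider or a later middleware raises an exception.
        spider.logger.warning("Spider error on %s: %s", response.url, exception)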