Database insertion fails without error with scrapy

Submitted by 孤街浪徒 on 2019-12-31 04:11:49

Question


I'm working with Scrapy and dataset (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data), which is a layer on top of SQLAlchemy, and I'm trying to load data into a SQLite table as a follow-up to "Sqlalchemy : Dynamically create table from Scrapy item".

Using the dataset package, I have:

class DynamicSQLlitePipeline(object):

    def __init__(self,table_name):

        db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
        db = dataset.connect(db_path)
        self.table = db[table_name].table


    def process_item(self, item, spider):

        try:
            print('TEST DATASET..')
            self.table.insert(dict(name='John Doe', age=46, country='China'))
            print('INSERTED')
        except IntegrityError:
            print('THIS IS A DUP')
        return item

After running my spider, I see the print statements from the try/except block and no errors. But when I look at the table after the run completes (screenshot), it contains no data. What am I doing wrong?


Answer 1:


The code you posted does not work as-is for me:

TypeError: __init__() takes exactly 2 arguments (1 given)

That's because the __init__ method expects a table_name argument which is not being passed. You need to implement the from_crawler class method in the pipeline object, something like:

@classmethod
def from_crawler(cls, crawler):
    return cls(table_name=crawler.spider.name)

That creates a pipeline object that uses the spider name as the table name. You can of course use any name you want.

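Also, the line self.table = db[table_name].table should be replaced by self.table = db[table_name] (https://dataset.readthedocs.io/en/latest/quickstart.html#storing-data). The .table attribute is the underlying SQLAlchemy Table object, and calling insert() on it only builds an INSERT statement without executing it, which would explain why no error is raised and no row is written.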

After that, the data is stored.




Answer 2:


There may be a problem with the database connection. Wrap this snippet in a try/except block to check for it:

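# requires "import traceback" at the top of the module
try:
   db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
   db = dataset.connect(db_path)
   self.table = db[table_name].table
except Exception:
   # print the full traceback of the exception that was just caught
   traceback.print_exc()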
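The standard-library call for this is traceback.print_exc(), which prints the traceback of the exception currently being handled to stderr. As a rough, self-contained illustration of the pattern, the sketch below uses two hypothetical helpers, connect_or_report and failing_connect, where failing_connect stands in for a dataset.connect call that fails:

```python
import traceback


def connect_or_report(connect, url):
    """Call connect(url); on failure print the traceback and return None."""
    try:
        return connect(url)
    except Exception:
        # Writes the full traceback of the caught exception to stderr.
        traceback.print_exc()
        return None


def failing_connect(url):
    # Hypothetical stand-in for a connection call that raises.
    raise ValueError("cannot open " + url)


result = connect_or_report(failing_connect, "sqlite:///missing/data.db")
```

Here result is None and the ValueError traceback appears on stderr, so a connection problem becomes visible instead of passing silently.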


Source: https://stackoverflow.com/questions/41273314/database-insertion-fails-without-error-with-scrapy
