Writing items to a MySQL database in Scrapy

囚心锁ツ 2020-12-02 08:48

I am new to Scrapy. I have the following spider code:

class Example_spider(BaseSpider):
   name = "example"
   allowed_domains = ["www.example.com"]

   def start_re         


        
3 Answers
  • 2020-12-02 09:15

    Try the following code in your pipeline:

    import MySQLdb

    class MySQLStorePipeline(object):
        def __init__(self):
            # The connection arguments are placeholders: host, user, password, database.
            self.conn = MySQLdb.connect('host', 'user', 'passwd',
                                        'dbname', charset="utf8",
                                        use_unicode=True)
            self.cursor = self.conn.cursor()

        def process_item(self, item, spider):
            try:
                # Parameterized query: the driver escapes the values.
                # On Python 3, str values are passed as-is and encoded
                # by the driver using the connection charset.
                self.cursor.execute("""INSERT INTO example_book_store (book_name, price)
                            VALUES (%s, %s)""",
                           (item['book_name'], item['price']))
                self.conn.commit()
            except MySQLdb.Error as e:
                print("Error %d: %s" % (e.args[0], e.args[1]))
            return item
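
A variation on the pipeline above, as a minimal sketch rather than a definitive implementation: it opens the connection in open_spider, closes it in close_spider, and rolls back a failed insert before re-raising. The connect argument is a hypothetical hook added here so the class can be exercised without a running MySQL server, and the connection values are the same placeholders as above.

```python
class MySQLStorePipeline:
    """Sketch: one connection per crawl, rollback on a failed insert."""

    INSERT_SQL = (
        "INSERT INTO example_book_store (book_name, price) VALUES (%s, %s)"
    )

    def __init__(self, connect=None):
        # `connect` is a hypothetical hook (not part of Scrapy): any
        # zero-argument callable that returns a DB-API connection.
        self._connect = connect or self._default_connect
        self.conn = None
        self.cursor = None

    @staticmethod
    def _default_connect():
        # Imported lazily so the class can be loaded without the driver.
        import MySQLdb
        return MySQLdb.connect(host="host", user="user", passwd="passwd",
                               db="dbname", charset="utf8", use_unicode=True)

    def open_spider(self, spider):
        # Called once when the spider starts.
        self.conn = self._connect()
        self.cursor = self.conn.cursor()

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.cursor.close()
        self.conn.close()

    def process_item(self, item, spider):
        try:
            self.cursor.execute(self.INSERT_SQL,
                                (item["book_name"], item["price"]))
            self.conn.commit()
        except Exception:
            # Undo the failed statement, then let the error surface.
            self.conn.rollback()
            raise
        return item
```

Re-raising instead of printing is a design choice of this sketch; the answer's version swallows the error and still returns the item.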
    
  • 2020-12-02 09:17

    Your process_item method should be declared as def process_item(self, item, spider): rather than def process_item(self, spider, item): because you swapped the item and spider arguments.

    The exception exceptions.NameError: global name 'Exampleitem' is not defined means that Exampleitem was not imported in your pipeline module. Try adding from myspiders.myitems import Exampleitem (adjusted to your actual module and class names, of course).
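
To illustrate why the argument order matters, here is a small framework-free sketch. It assumes the caller passes the item first and the spider second, positionally; call_like_scrapy and both class names are hypothetical stand-ins, not Scrapy APIs.

```python
class SwappedPipeline:
    # Wrong order: the parameter named `spider` receives the item.
    def process_item(self, spider, item):
        return {"item_param": item, "spider_param": spider}


class CorrectPipeline:
    # Expected order: item first, spider second.
    def process_item(self, item, spider):
        return {"item_param": item, "spider_param": spider}


def call_like_scrapy(pipeline, item, spider):
    # Stand-in for the framework call: arguments are positional,
    # so parameter names in the pipeline do not affect the binding.
    return pipeline.process_item(item, spider)


item = {"book_name": "Some book"}
spider = "example-spider"

swapped = call_like_scrapy(SwappedPipeline(), item, spider)
correct = call_like_scrapy(CorrectPipeline(), item, spider)
```

With the swapped signature, swapped["item_param"] holds the spider, so any item['...'] lookup inside the method would operate on the wrong object.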

  • 2020-12-02 09:17

    I think this way is better and more concise:

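    # Item
    class pictureItem(scrapy.Item):
        topic_id = scrapy.Field()
        url = scrapy.Field()

    # SQL: the placeholder names must match the item's field names
    self.save_picture = "insert into picture(`url`,`id`) values(%(url)s,%(topic_id)s);"

    # Usage: the driver fills the named placeholders from the dict
    cur.execute(self.save_picture, dict(item))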
    

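    This is equivalent to passing the parameters as a dict in the second argument, so that the driver escapes the values:

    cur.execute("insert into picture(`url`,`id`) values(%(url)s,%(id)s)", {"url":someurl,"id":1})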
    

    Reason (you can read more about Items in the Scrapy documentation):

    The Field class is just an alias to the built-in dict class and doesn’t provide any extra functionality or attributes. In other words, Field objects are plain-old Python dicts.
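
Building on the named-placeholder idea, the statement itself can be derived from the item's keys. This is only a sketch: build_insert is a hypothetical helper, it assumes every field name is also the column name, and the table and column names must come from trusted code, never from scraped input. A plain dict stands in for dict(item).

```python
def build_insert(table, row):
    """Return (sql, params) using %(name)s placeholders for each key in row."""
    columns = sorted(row)  # fixed order keeps the statement stable
    column_list = ", ".join("`%s`" % name for name in columns)
    # "%%(%s)s" % "url" yields the literal placeholder "%(url)s".
    placeholders = ", ".join("%%(%s)s" % name for name in columns)
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (
        table, column_list, placeholders)
    return sql, dict(row)


# A plain dict standing in for dict(item).
row = {"url": "http://www.example.com/a.png", "topic_id": 1}
sql, params = build_insert("picture", row)
# The pair would then be passed to the driver: cur.execute(sql, params)
```

Only the identifiers are interpolated into the string; the values stay in params and are escaped by the driver at execute time.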
