Scrapy returns more results than expected

∥☆過路亽.° submitted on 2019-12-24 09:48:58

Question


This is a continuation of the question: Extract from dynamic JSON response with Scrapy

I have a Scrapy spider that extracts values from a JSON response. It works and extracts the right values, but it somehow ends up looping and returns more results than expected (duplicate results).

For example, for the 17 values provided in the test.txt file it returns 289 results, which is 17 times as many as expected.

Spider content below:

import scrapy
import json
from whois.items import WhoisItem

class whoislistSpider(scrapy.Spider):
    name = "whois_list"
    start_urls = []
    f = open('test.txt', 'r')
    global lines
    lines = f.read().splitlines()
    f.close()
    def __init__(self):
        for line in lines:
            self.start_urls.append('http://www.example.com/api/domain/check/%s/com' % line)

    def parse(self, response):
        for line in lines:
            jsonresponse = json.loads(response.body_as_unicode())
            item = WhoisItem()
            domain_name = list(jsonresponse['domains'].keys())[0]
            item["avail"] = jsonresponse["domains"][domain_name]["avail"]
            item["domain"] = domain_name
            yield item

items.py content below:

import scrapy

class WhoisItem(scrapy.Item):
    avail = scrapy.Field()
    domain = scrapy.Field()

pipelines.py content below:

class WhoisPipeline(object):
    def process_item(self, item, spider):
        return item

Thank you in advance for all the replies.


Answer 1:


The parse function should be like this:

def parse(self, response):
    jsonresponse = json.loads(response.body_as_unicode())
    item = WhoisItem()
    domain_name = list(jsonresponse['domains'].keys())[0]
    item["avail"] = jsonresponse["domains"][domain_name]["avail"]
    item["domain"] = domain_name
    yield item

Notice that I removed the for loop.

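What was happening: the spider makes 17 requests, one per line in test.txt, and Scrapy calls parse once for each response. Inside parse, the loop over lines then parsed that same response 17 times and yielded the same item each time. That gives 17 * 17 = 289 records, of which only 17 are distinct.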
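The counting can be checked without running Scrapy at all. The following is only a rough simulation, not the spider itself: the domain names, the `fake_response_body` helper, and the exact JSON shape (a "domains" object keyed by the full domain name) are assumptions made up for illustration, since the real API response is not shown in the question.

```python
import json

# Hypothetical stand-ins for the 17 lines of test.txt (the real values are not shown).
lines = ["name%d" % i for i in range(17)]


def fake_response_body(name):
    """Build a JSON body with the shape the spider indexes into (assumed shape)."""
    return json.dumps({"domains": {name + ".com": {"avail": True}}})


def parse_with_loop(body):
    """Mirrors the original parse(): one item per line of test.txt, per response."""
    for _ in lines:
        data = json.loads(body)
        domain_name = list(data["domains"].keys())[0]
        yield {"domain": domain_name, "avail": data["domains"][domain_name]["avail"]}


def parse_once(body):
    """Mirrors the corrected parse(): exactly one item per response."""
    data = json.loads(body)
    domain_name = list(data["domains"].keys())[0]
    yield {"domain": domain_name, "avail": data["domains"][domain_name]["avail"]}


# One response per start URL, as Scrapy would deliver them to parse().
bodies = [fake_response_body(name) for name in lines]

looped = [item for body in bodies for item in parse_with_loop(body)]
fixed = [item for body in bodies for item in parse_once(body)]

print(len(looped))  # 289
print(len(fixed))   # 17
```

Both versions produce the same 17 distinct domains. The looped version simply repeats each one 17 times, which matches the duplicates described in the question.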



Source: https://stackoverflow.com/questions/38315087/scrapy-returns-more-results-than-expected
