How to collect data from multiple pages into single data structure with scrapy

两盒软妹~` 提交于 2019-12-03 04:04:10

here is a way you need to deal. you need to yield/return item once when item has all attributes

yield Request(page1,
              callback=self.page1_data)

def page1_data(self, response):
    hxs = HtmlXPathSelector(response)
    i = TestItem()
    i['name']='name'
    i['age']='age'
    url_profile_page = 'url to the profile page'

    yield Request(url_profile_page,
                  meta={'item':i},
    callback=self.profile_page)


def profile_page(self,response):
    hxs = HtmlXPathSelector(response)
    old_item=response.request.meta['item']
    # parse other fileds
    # assign them to old_item

    yield old_item
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!