i want to print a proper table out of data scrapped using scrapy

匆匆过客 提交于 2021-02-11 17:20:38

问题


so i have written all the code to scrap table from [http://www.rarityguide.com/cbgames_view.php?FirstRecord=21][1] but i am getting output like

# the output that i get

{'EXG': (['17.00',
          '10.00',
          '90.00',
          '9.00',
          '13.00',
          '17.00',
          '16.00',
          '43.00',
          '125.00',
          '16.00',
          '11.00',
          '150.00',
          '17.00',
          '24.00',
          '15.00',
          '24.00',



'21.00',
          '36.00',
          '270.00',
          '280.00'],),
 'G': ['8.00',
       '5.00',
       '38.00',
       '2.00',
       '6.00',
       '7.00',
       '6.00',
       '20.00',
       '40.00',
       '7.00',
       '5.00',
       '70.00',
       '6.00',
       '12.00',
       '7.00',
       '12.00',
       '10.00',
       '15.00',
       '120.00',
       '140.00'],
 'company': (['Milton Bradley',
              'Lowell',
              'Milton Bradley',
              'Transogram',
              'Milton Bradley',
              'Transogram',
              'Standard Toykraft',
              'Ideal',
              'Game Gems',
              'Milton Bradley',
              'Parker Brothers',
              'CPC',
              'Parker Brothers',
              'Whitman',
              'Ideal',
              'Transogram',
              'King Features',
              'Westinghouse',
              'Parker Brothers',
              'Parker Brothers'],),
 'mnm': (['26.00',
          '19.00',
          '195.00',
          '15.00',
          '30.00',
          '29.00',
          '31.00',
          '65.00',
          '204.00',
          '25.00',
          '22.00',
          '250.00',
          '27.00',
          '42.00',
          '23.00',
          '37.00',
          '40.00',
          '57.00',
          '415.00',
          '435.00'],),
 'rarity': ([],),
 'title': (['Beat the Clock',
            'Beat the Clock',
            'Beatles - Flip Your Wig',
            'Ben Casey M.D.',
            'Bermuda Triangle',
            'Betsy Ross and the Flag',
            'Beverly Hillbillies',
            'Beware the Spider',
            'Bewitched',
            'Bewitched - Stymie Card Game',
            'Bionic Woman',
            'Blade Runner',
            'Blondie',
            'Blondie - Playing Card Game',
            'Blondie - Sunday Funnies',
            'Blondie - The Hurry Scurry Game',
            "Blondie and Dagwood's Race for the Office",
            'Blondie Goes to Leisureland',
            'Boom or Bust',
            'Boom or Bust'],),
 'year': (['1969',
           '1954',
           '1964',
           '1961',
           '1976',
           '1961',
           '1963',
           '1980',
           '1965',
           '1964',
           '1976',
           '1982',
           '1969',
           '1941',
           '1972',
           '1966',
           '1950',
           '1935',
           '1951',
           '1959'],)}

can ayone help me achieve output like

# the output that i want!
{"EXG": ["17.00"],
  "MNM": ["26.00"],
  "year": ["1969"],
  "company": ["Milton Bradley"],
  "Title": ["Beat the Clock"] }

{"EXG": ["10.00"],
  "MNM": ["19.00"],
  "year": ["1954"],
  "company": ["Lowell"],
  "Title": ["Beat the Clock"] }
and then so on for all values.

basically i want to have one dictionary containing all the key value pairs instead of having one entire dictionary for each key. also here's my spider's code

import scrapy
from ..items import RarityItem


class RarityScrapper(scrapy.Spider):
    name = "rarity"
    start_urls = [
        "http://www.rarityguide.com/cbgames_view.php?FirstRecord=21"
    ]

    def parse(self, response):
        table = response.css(
            "form")

        items = RarityItem()

        for contents in table:
            title = contents.css("td:nth-child(2)::text").extract()
            company = contents.css("td:nth-child(3)::text").extract()
            year = contents.css("td:nth-child(4)::text").extract()
            rarity = contents.css("td:nth-child(5)::text").extract()
            mnm = contents.css("td:nth-child(6)::text").extract()
            EXG = contents.css("td:nth-child(7)::text").extract()
            G = contents.css("td:nth-child(8)::text").extract()

            items["title"] = title,
            items["company"] = company,
            items["year"] = year,
            items["rarity"] = rarity,
            items["mnm"] = mnm,
            items["EXG"] = EXG,
            items["G"] = G

            yield items

回答1:


If all lists are same length, after this line

G = contents.css("td:nth-child(8)::text").extract():

Add this ode snippet:

arr = []
for _ in range(len(title)):
    arr.append({
        'EXP': title[_], 'company': company[_], 'year': year[_], 'rarity': rarity[_],
        'MNM': mnm[_], 'EXG': EXG[_], 'G': G[_]})

Then type this:

for _ in arr:
    print(_)

to see output array




回答2:


You need to iterate through each row in table and process row data separately. As all row have the same length you can use list unpacking to write data into dict item:

def parse(self, response):
    table = response.css(
        "form table")

    for row in table.css("tr"):
        i = {}
        _, i["title"], i["company"], i["year"], _, i["mnm"], i["EXG"], i["G"] = row.css("td::text").extract()
        i["rarity"] = row.css("td img::alt").extract_first("")
        yield i


来源:https://stackoverflow.com/questions/61351415/i-want-to-print-a-proper-table-out-of-data-scrapped-using-scrapy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!