Sort CSV using a key computed from two columns, grab first n largest values

被撕碎了的回忆 2021-01-17 03:11

Python amateur here... let's say I have a snippet of an example CSV file:

Country, Year, GDP, Population
Country1         


        
3 Answers
  •  梦谈多话
    2021-01-17 03:31

    This approach needs only one scan of the file to get the top 10 rows for each year.

    It is possible to do this without pandas by using the heapq module. The following is untested, but should give you a base from which to consult the appropriate documentation and adapt for your purposes:

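    import csv
    import heapq
    from itertools import islice

    freqs = {}
    with open('yourfile', newline='') as fin:
        csvin = csv.reader(fin)
        # Skip the header row, drop rows missing GDP or population, and
        # prepend GDP per capita so that each row sorts by that key first.
        rows_with_gdp = ([float(row[2]) / float(row[3])] + row for row in islice(csvin, 1, None) if row[2] and row[3])
        for row in rows_with_gdp:
            # One fixed-size min-heap per year, padded with empty placeholders.
            cnt = freqs.setdefault(row[2], [[]] * 10) # 2 = year, 10 = num to keep
            heapq.heappushpop(cnt, row)  # push the row, then drop the smallest entry

    for year, vals in freqs.items():
        # filter(None, ...) removes unused placeholders; [1:] strips the computed key.
        print(year, [row[1:] for row in sorted(filter(None, vals), reverse=True)])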
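
    For reference, here is a minimal self-contained sketch of the same bounded-heap idea, written for Python 3. The function name top_n_per_year, the sample rows, and the use of skipinitialspace=True are assumptions made for illustration; they are not part of the original question or answer. Only the column layout (Country, Year, GDP, Population) comes from the question.

```python
import csv
import heapq
import io
from collections import defaultdict


def top_n_per_year(lines, n=10):
    """Return {year: rows}, keeping the n rows with the largest GDP per capita.

    `lines` is any iterable of CSV lines, such as an open file object.
    """
    heaps = defaultdict(list)  # year -> min-heap of (gdp_per_capita, row)
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 4 or not row[2] or not row[3]:
            continue  # skip blank lines and rows missing GDP or population
        key = float(row[2]) / float(row[3])
        heap = heaps[row[1]]
        if len(heap) < n:
            heapq.heappush(heap, (key, row))
        else:
            # Heap is full: push the new entry and discard the smallest one.
            heapq.heappushpop(heap, (key, row))
    # Sort each heap from largest to smallest key and drop the key itself.
    return {year: [row for _, row in sorted(heap, reverse=True)]
            for year, heap in heaps.items()}


# Hypothetical sample data, for illustration only.
SAMPLE = """Country, Year, GDP, Population
A, 2000, 100, 10
B, 2000, 300, 10
C, 2000, 200, 10
A, 2001, 90, 10
B, 2001, , 10
"""

if __name__ == "__main__":
    for year, rows in top_n_per_year(io.StringIO(SAMPLE), n=2).items():
        print(year, rows)
```

    Each heap never holds more than n entries, so memory stays proportional to the number of years times n rather than to the size of the file.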
    
