Read random lines from huge CSV file in Python

独厮守ぢ 2020-12-05 02:27

I have a fairly large CSV file (15 GB) and I need to read about 1 million random lines from it. As far as I can see - and implement - the CSV utility in Python only allows t

11 Answers
  •  刺人心 (OP)
    2020-12-05 02:49

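    import csv
    import random

    # "big.csv" is a placeholder path; assumes one CSV record per line
    with open("big.csv", newline="") as file:
        # pass 1, count the number of rows in the file
        rowcount = sum(1 for line in file)
        # pass 2, select random lines
        file.seek(0)
        remaining = 1000000
        for row in csv.reader(file):
            # keep this row with probability remaining / rowcount
            if random.randrange(rowcount) < remaining:
                print(row)
                remaining -= 1
            rowcount -= 1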
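
The two-pass approach above is a form of selection sampling: each row is kept with probability remaining / rowcount, so exactly the requested number of rows comes out, in file order. A minimal sketch of the same idea wrapped in a function follows. The name `sample_rows`, the `rng` parameter, and the small in-memory demo data are assumptions made for illustration and are not part of the original answer. It also assumes every CSV record sits on a single line, so the line count matches the row count.

```python
import csv
import io
import random


def sample_rows(file, k, rng=random):
    """Return k rows chosen uniformly at random, in file order (two passes)."""
    # Pass 1: count the lines (assumes one CSV record per line).
    file.seek(0)
    rowcount = sum(1 for _ in file)
    remaining = min(k, rowcount)
    # Pass 2: keep each row with probability remaining / rowcount.
    file.seek(0)
    picked = []
    for row in csv.reader(file):
        if rng.randrange(rowcount) < remaining:
            picked.append(row)
            remaining -= 1
        rowcount -= 1
    return picked


# Tiny in-memory stand-in for the large file (hypothetical data).
demo = io.StringIO("".join(f"{i},value{i}\n" for i in range(100)))
rows = sample_rows(demo, 10)
print(len(rows))
```

For a real file, the same function could be called on a handle opened with `open(path, newline="")`. Only the selected rows are held in memory, so the file size itself should not be a problem, though both passes still read the whole file.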
    
