How do I read a random line from one file?

灰色年华 2020-12-04 20:03

Is there a built-in method to do it? If not, how can I do this without too much overhead?

11 answers
  •  轻奢々 (original poster)
    2020-12-04 20:25

    Not built-in, but Algorithm R (Waterman's "Reservoir Algorithm") from section 3.4.2 of Knuth's "The Art of Computer Programming" works well. Here it is in a very simplified version:

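    import random

    def random_line(afile):
        """Return one uniformly chosen line from an open, non-empty file."""
        line = next(afile)  # the first line is the initial selection
        # Number the remaining lines 2, 3, 4, ...
        for num, aline in enumerate(afile, 2):
            # randrange(num) is 0 with probability 1/num: replace only then.
            if random.randrange(num) == 0:
                line = aline
        return line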
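
    As a usage sketch, the same idea can be wrapped so that it opens the file itself and tolerates an empty file. The name random_line_from_path and its encoding parameter are made up for illustration; they are not part of the answer above or of the standard library.

```python
import random

def random_line_from_path(path, encoding="utf-8"):
    """Return one uniformly chosen line from the file at `path`.

    Hypothetical helper: returns None for an empty file instead of
    raising StopIteration.
    """
    chosen = None
    with open(path, encoding=encoding) as handle:
        # Count from 1 so the first line is always the initial selection:
        # randrange(1) can only return 0.
        for num, line in enumerate(handle, 1):
            # Replace the current selection with probability 1/num.
            if random.randrange(num) == 0:
                chosen = line
    return chosen
```

    Calling random_line_from_path("quotes.txt") (the file name is only an example) reads the file once, keeps a single line in memory at a time, and returns that line with its trailing newline intact.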
    

    The num, ... in enumerate(..., 2) iterator produces the sequence 2, 3, 4... The randrange call therefore returns 0 with probability 1.0/num, which is exactly the probability with which we must replace the currently selected line, and exactly the probability with which the code does so. This is the special case of sample size 1 of the referenced algorithm (see Knuth's book for the proof of correctness), and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-)
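
    For intuition about why this is uniform: with n lines, line i ends up selected if it is picked at step i (probability 1/i) and no later line replaces it (probability i/(i+1) * (i+1)/(i+2) * ... * (n-1)/n), and the product collapses to 1/n. The sketch below checks this empirically rather than proving it. The names pick_one and selection_frequencies, the five sample lines, and the trial count are all made up for the experiment.

```python
import random
from collections import Counter

def pick_one(items, rng):
    """Size-1 reservoir pick over any iterable; returns None if it is empty."""
    chosen = None
    for num, item in enumerate(items, 1):
        if rng.randrange(num) == 0:  # true with probability 1/num
            chosen = item
    return chosen

def selection_frequencies(lines, trials, seed=0):
    """Run pick_one `trials` times and return each line's observed frequency."""
    rng = random.Random(seed)  # seeded only to make the experiment repeatable
    counts = Counter(pick_one(lines, rng) for _ in range(trials))
    return {line: counts[line] / trials for line in lines}

frequencies = selection_frequencies(["a", "b", "c", "d", "e"], trials=50_000)
for line, freq in frequencies.items():
    print(line, round(freq, 3))
```

    Each of the five frequencies should come out close to 0.2, with small deviations that shrink as the number of trials grows.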
