Is there a built-in method to do it? If not, how can I do this without too much overhead?
If you don't want to load the whole file into RAM with f.read() or f.readlines(), you can get a random line this way:
import os
import random


def get_random_line(filepath: str) -> str:
    file_size = os.path.getsize(filepath)
    with open(filepath, 'rb') as f:
        while True:
            pos = random.randint(0, file_size)
            if not pos:  # the first line is chosen
                f.seek(0)  # an earlier iteration may have moved the file position
                return f.readline().decode()  # return str
            f.seek(pos)  # seek to random position
            f.readline()  # skip possibly incomplete line
            line = f.readline()  # read next (full) line
            if line:
                return line.decode()
            # else: line is empty -> EOF -> try another position in next iteration
P.S.: Yes, this was proposed by Ignacio Vazquez-Abrams in his answer above, but a) there's no code in his answer and b) I came up with this implementation myself; it can return the first or last line. I hope it's useful to someone.
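However, if you care about the distribution, this code is not an option for you: the choice is not uniform. A line is returned with probability proportional to the byte length of the line before it, and the first line is returned only when pos happens to be 0.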
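If a uniform choice matters, one alternative that still avoids loading the whole file is reservoir sampling with a reservoir of a single line. The following is only a minimal sketch, not part of the approach above; the function name get_uniform_random_line is made up for illustration, and it assumes the file can be decoded with the default UTF-8 codec.

```python
import random
from typing import Optional


def get_uniform_random_line(filepath: str) -> Optional[str]:
    """Return one line chosen uniformly at random, or None if the file is empty.

    Hypothetical helper for illustration: reservoir sampling with a
    reservoir of size one.
    """
    chosen = None
    with open(filepath, 'rb') as f:
        # Iterating over the file object yields one line at a time,
        # so only the current line and the current pick are in memory.
        for count, line in enumerate(f, start=1):
            # Keep the count-th line with probability 1/count.
            # After n lines, each line has been kept with probability 1/n.
            if random.randrange(count) == 0:
                chosen = line
    return chosen.decode() if chosen is not None else None
```

The trade-off is that this reads the entire file once on every call, so it is slower on large files than the seek-based version, which touches only a few lines per attempt.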